Prosecution Insights
Last updated: April 19, 2026
Application No. 18/935,346

TECHNIQUES FOR CONTROLLING AUTONOMOUS VEHICLES USING VISION-LANGUAGE MODELS

Status: Non-Final OA (§103)
Filed: Nov 01, 2024
Examiner: SARWAR, BABAR
Art Unit: 3667
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)
Grant Probability: 86% (Favorable)
OA Rounds: 1-2
To Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 86%, above average (893 granted / 1043 resolved; +33.6% vs TC avg)
Interview Lift: +20.0%, a strong lift (resolved cases with interview)
Typical Timeline: 2y 7m average prosecution; 27 applications currently pending
Career History: 1070 total applications across all art units

Statute-Specific Performance

§101: 10.8% (-29.2% vs TC avg)
§103: 40.3% (+0.3% vs TC avg)
§102: 27.1% (-12.9% vs TC avg)
§112: 12.1% (-27.9% vs TC avg)

Tech Center averages are estimates; figures are based on career data from 1043 resolved cases.
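The figures above are internally consistent: subtracting each "vs TC avg" delta from the examiner's rate implies the same 40.0% Tech Center baseline for every statute, and the headline 86% allow rate follows from 893 of 1043 resolved cases. A minimal sketch of that arithmetic (variable names are illustrative, not from any real analytics API):

```python
# Reproduce the dashboard arithmetic. All figures come from the report
# itself; the variable names are illustrative only.
examiner_rate = {"101": 10.8, "103": 40.3, "102": 27.1, "112": 12.1}
delta_vs_tc = {"101": -29.2, "103": 0.3, "102": -12.9, "112": -27.9}

# Implied Tech Center average per statute: examiner rate minus delta.
implied_tc_avg = {s: round(examiner_rate[s] - delta_vs_tc[s], 1)
                  for s in examiner_rate}
print(implied_tc_avg)  # every statute implies the same 40.0% baseline

# Headline career allow rate: 893 granted out of 1043 resolved.
allow_rate = round(100 * 893 / 1043)
print(allow_rate)  # 86
```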

Office Action (Non-Final, §103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Claims 1-20 are presented for examination. Claims 1-20 are rejected.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al. (US Pub. No. 2024/0092350 A1; hereinafter “Agrawal”) in view of Gopalkrishna et al. (US Pub. No. 2023/0281963 A1; hereinafter “Gopalkrishna”).

Consider claims 1, 11, and 20: Agrawal teaches one or more non-transitory computer-readable media (See Agrawal, e.g., ¶ [0141], Figs. 1, 6 elements 100-122, 600-654, “…The memory 622, 638, and/or 652 are examples of non-transitory computer-readable media.
The memory 622, 638, and/or 652 can store an operating system and one or more software applications, instructions, programs, and/or data to implement the methods…as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory capable of storing information…”), a system (Figs. 1, 6 elements 100-122, 600-654), and a computer-implemented method for controlling vehicles (See Agrawal, e.g., “…validating or determining trajectories for a vehicle are discussed herein. A trajectory management component can receive status and/or error data from other safety system components and select or otherwise determine safe and valid vehicle trajectories…validate a trajectory upon which the trajectory management component can wait for selecting a vehicle trajectory, validate trajectories stored in a queue, and/or utilize kinematics for validation of trajectories…filter out objects based on trajectories stored in a queue…determine the collision states based on trajectories stored in a queue or determine a collision state upon which the trajectory management component can wait for selecting or otherwise determining a vehicle trajectory…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708), the method comprising: generating, based on sensor data, a first plan for controlling a vehicle (See Agrawal, e.g., “…The trajectory management component can select or otherwise determine a trajectory from among trajectories generated by, and received from, the primary system based on data generated by, and received from, other components of the safety system…include planned trajectories, first safe stop trajectories, second safe stop trajectories…the trajectories can be managed and stored via a state transition model of the trajectory management component 108. 
The trajectory management component 108 can store the current trajectory(ies), including the selected trajectory…retain one or more previous trajectories (e.g., any of the previous trajectory(ies) received and/or determined by the trajectory management component 108)…provide a failsafe in the event that the vehicle safety system does not receive an updated trajectory from the primary system…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708); generating, using a trained neural network (See Agrawal, e.g., “…The vehicle safety system 102 can receive data (or “primary system data”) from a primary system of the vehicle system 100. The data received from the primary system can include sensor data 104 and trajectory data 106. The sensor data 104 can include data received from one or more sensors of any of one or more systems (e.g., the sensor system(s)…The vehicle safety system 102 can include a trajectory management component 108 (e.g., a first component), a perception component 110 (e.g., a second component), a filter component 112 (e.g., a third component), and a collision detection (or “machine learned (ML)”) or (“neural network”) component 114 (e.g., a fourth component)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708), a final plan for controlling the vehicle based on the first plan and a second plan (See Agrawal, e.g., “…The trajectory data 106 can include one or more trajectories (e.g., trajectory(ies) of different types) for the vehicle. 
The trajectory(ies) (or “received trajectory(ies)”) (or “candidate trajectory(ies)”) can include a planned trajectory, and/or one or more other trajectories (or “alternative trajectory(ies)”) (e.g., a first safe stop trajectory (or “error trajectory”), a second safe stop trajectory (or “high-priority error trajectory”), a third safe stop trajectory (or “immediate stop trajectory”), etc.)… The data output from the trajectory management component 108 can include control data 122. The control data 122 can be utilized to control the vehicle…can be determined based on one or more of the results (e.g., the validated trajectory(ies) and/or the collision probability(ies)) received from the perception component 110, the results (e.g., the filtered object(s) and/or resulting object(s) based on the filtered object(s) being excluded) received from the filter component 112, and the results (e.g., the collision state(s)) received from the collision detection component 114. The control data 122 can include one or more of validation information (e.g., any results from any of the perception component 110, the filter component 112, and/or the collision detection component 114), the planned trajectory, the first safe stop trajectory, the second safe stop trajectory, the n-th safe stop trajectory, etc…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). Agrawal further teaches and controlling the vehicle based on the final plan (See Agrawal, e.g., “…The trajectory management component 108 can output data based on results from components of the vehicle safety system 102. The data output from the trajectory management component 108 can include control data 122. 
The control data 122 can be utilized to control the vehicle…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708).

However, Agrawal does not explicitly teach a visual language model (VLM). In an analogous field of endeavor, Gopalkrishna teaches a visual language model (VLM) (See Gopalkrishna, e.g., “…method is provided for pretraining vision and language models that includes receiving image-text pairs, each including an image and a text describing the image…” of Abstract, ¶ [0005]-¶ [0007], ¶ [0016], ¶ [0031], ¶ [0041], ¶ [0048]-¶ [0056], and Figs. 3-6 elements 300-364, 600-699, steps 400-460A). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to combine “…validating or determining trajectories for a vehicle are discussed herein. A trajectory management component can receive status and/or error data from other safety system components and select or otherwise determine safe and valid vehicle trajectories…validate a trajectory upon which the trajectory management component can wait for selecting a vehicle trajectory, validate trajectories stored in a queue, and/or utilize kinematics for validation of trajectories…filter out objects based on trajectories stored in a queue…determine the collision states based on trajectories stored in a queue or determine a collision state upon which the trajectory management component can wait for selecting or otherwise determining a vehicle trajectory…”, as disclosed in Agrawal with “a visual language model (VLM)”, as taught in Gopalkrishna, with a reasonable expectation of success, to yield a method and a system for implementing an enhanced, improved, and robust alignment of the visual and text modalities.
Consider claims 2, 12: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claims 1, 11. In addition, Agrawal teaches wherein generating the final plan comprises: processing a plurality of embeddings or tokens associated with the sensor data (See Agrawal, e.g., “…The trajectory management component can select or otherwise determine a trajectory from among trajectories generated by, and received from, the primary system based on data generated by, and received from, other components of the safety system…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708), one or more detections based on the sensor data, and the first plan via the trained neural network to generate a risk score (See Agrawal, e.g., “…the collision detection component 114 can determine machine learned (ML) based one or more collision states based on the stored trajectory(ies), and/or based on one or more object trajectories determined for corresponding object(s) in the environment.
The collision detection component 114 can transmit a signal (e.g., a third signal) 120 based on results of operating the collision detection component 114…determine to operate according to higher likelihood(s) of collision from corresponding component(s) by changing to another trajectory (e.g., safer trajectory), regardless of remaining component(s) indicating borderline, or lower likelihood(s) of collision (e.g., the trajectory management component 108 can ignore results associated with borderline, or lower likelihood(s) of collision, and instead perform trajectory selection based on the results associated with the higher likelihood(s) of collision)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708); and selecting the first plan or the second plan as the final plan based on the risk score (See Agrawal, e.g., “…the trajectory management component 108 can determine to operate according to higher likelihood(s) of collision from corresponding component(s) by changing to another trajectory (e.g., safer trajectory), regardless of remaining component(s) indicating borderline, or lower likelihood(s) of collision (e.g., the trajectory management component 108 can ignore results associated with borderline, or lower likelihood(s) of collision, and instead perform trajectory selection based on the results associated with the higher likelihood(s) of collision)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). 
Gopalkrishna teaches a visual language model (VLM) (See Gopalkrishna, e.g., “…method is provided for pretraining vision and language models that includes receiving image-text pairs, each including an image and a text describing the image…” of Abstract, ¶ [0005]-¶ [0007], ¶ [0016], ¶ [0031], ¶ [0041], ¶ [0048]-¶ [0056], and Figs. 3-6 elements 300-364, 600-699, steps 400-460A). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Agrawal with the teachings of Gopalkrishna so as, with a reasonable expectation of success, to yield a method, and a system for implementing an enhanced, improved, and robust alignment of the visual and text modalities.

Consider claims 3, 13: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claims 2, 12. In addition, Agrawal teaches wherein at least one of geometric information or physics information associated with the one or more detections (See Agrawal, e.g., “…the collision detection component 114 can determine machine learned (ML) based one or more collision states based on the stored trajectory(ies), and/or based on one or more object trajectories determined for corresponding object(s) in the environment. The collision detection component 114 can transmit a signal (e.g., a third signal) 120 based on results of operating the collision detection component 114…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708) is also processed via the trained neural network to generate the risk score (See Agrawal, e.g., “…the trajectory management component 108 can determine to operate according to higher likelihood(s) of collision from corresponding component(s) by changing to another trajectory (e.g., safer trajectory), regardless of remaining component(s) indicating borderline, or lower likelihood(s) of collision (e.g., the trajectory management component 108 can ignore results associated with borderline, or lower likelihood(s) of collision, and instead perform trajectory selection based on the results associated with the higher likelihood(s) of collision)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). Gopalkrishna teaches a visual language model (VLM) (See Gopalkrishna, e.g., “…method is provided for pretraining vision and language models that includes receiving image-text pairs, each including an image and a text describing the image…” of Abstract, ¶ [0005]-¶ [0007], ¶ [0016], ¶ [0031], ¶ [0041], ¶ [0048]-¶ [0056], and Figs. 3-6 elements 300-364, 600-699, steps 400-460A). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Agrawal with the teachings of Gopalkrishna so as, with a reasonable expectation of success, to yield a method, and a system for implementing an enhanced, improved, and robust alignment of the visual and text modalities.

Consider claims 4, 14: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claims 2, 12.
In addition, Agrawal teaches further comprising: computing at least one collision, trajectory, or simulation based on the one or more detections (See Agrawal, e.g., “…the collision detection component 114 can determine machine learned (ML) based one or more collision states based on the stored trajectory(ies), and/or based on one or more object trajectories determined for corresponding object(s) in the environment. The collision detection component 114 can transmit a signal (e.g., a third signal) 120 based on results of operating the collision detection component 114…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708), wherein the at least one collision, trajectory, or simulation is also processed via the trained neural network to generate the risk score (See Agrawal, e.g., “…the trajectory management component 108 can determine to operate according to higher likelihood(s) of collision from corresponding component(s) by changing to another trajectory (e.g., safer trajectory), regardless of remaining component(s) indicating borderline, or lower likelihood(s) of collision (e.g., the trajectory management component 108 can ignore results associated with borderline, or lower likelihood(s) of collision, and instead perform trajectory selection based on the results associated with the higher likelihood(s) of collision)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). 
Gopalkrishna teaches a visual language model (VLM) (See Gopalkrishna, e.g., “…method is provided for pretraining vision and language models that includes receiving image-text pairs, each including an image and a text describing the image…” of Abstract, ¶ [0005]-¶ [0007], ¶ [0016], ¶ [0031], ¶ [0041], ¶ [0048]-¶ [0056], and Figs. 3-6 elements 300-364, 600-699, steps 400-460A). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Agrawal with the teachings of Gopalkrishna so as, with a reasonable expectation of success, to yield a method, and a system for implementing an enhanced, improved, and robust alignment of the visual and text modalities.

Consider claim 5: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claim 2. In addition, Agrawal teaches wherein the one or more detections include at least one of a detected object, a bounding box, or map information (See Agrawal, e.g., “…Using detection algorithms, the perception component 626 can generate a two-dimensional bounding box and/or a perception-based three-dimensional bounding box associated with the object. The perception component 626 can further generate a three-dimensional bounding box associated with the object. As discussed above, the three-dimensional bounding box can provide additional information such as a location, orientation, pose, and/or size (e.g., length, width, height, etc.) associated with the object…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], ¶ [0120], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708).

Consider claims 6, 16: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claims 1, 11.
In addition, Agrawal teaches wherein generating the final plan comprises: processing a plurality of embeddings or tokens associated with the sensor data (See Agrawal, e.g., “…The trajectory management component can select or otherwise determine a trajectory from among trajectories generated by, and received from, the primary system based on data generated by, and received from, other components of the safety system…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708), one or more detections based on the sensor data, and the first plan via the trained neural network to generate program code (See Agrawal, e.g., “…the collision detection component 114 can determine machine learned (ML) based one or more collision states based on the stored trajectory(ies), and/or based on one or more object trajectories determined for corresponding object(s) in the environment. The collision detection component 114 can transmit a signal (e.g., a third signal) 120 based on results of operating the collision detection component 114…determine to operate according to higher likelihood(s) of collision from corresponding component(s) by changing to another trajectory (e.g., safer trajectory), regardless of remaining component(s) indicating borderline, or lower likelihood(s) of collision (e.g., the trajectory management component 108 can ignore results associated with borderline, or lower likelihood(s) of collision, and instead perform trajectory selection based on the results associated with the higher likelihood(s) of collision)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 
7 steps 700-708); and executing the program code to select the first plan or the second plan as the final plan (See Agrawal, e.g., “…the collision detection component 114 can determine machine learned (ML) based one or more collision states based on the stored trajectory(ies), and/or based on one or more object trajectories determined for corresponding object(s) in the environment. The collision detection component 114 can transmit a signal (e.g., a third signal) 120 based on results of operating the collision detection component 114…determine to operate according to higher likelihood(s) of collision from corresponding component(s) by changing to another trajectory (e.g., safer trajectory), regardless of remaining component(s) indicating borderline, or lower likelihood(s) of collision (e.g., the trajectory management component 108 can ignore results associated with borderline, or lower likelihood(s) of collision, and instead perform trajectory selection based on the results associated with the higher likelihood(s) of collision)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). Gopalkrishna teaches a visual language model (VLM) (See Gopalkrishna, e.g., “…method is provided for pretraining vision and language models that includes receiving image-text pairs, each including an image and a text describing the image…” of Abstract, ¶ [0005]-¶ [0007], ¶ [0016], ¶ [0031], ¶ [0041], ¶ [0048]-¶ [0056], and Figs. 3-6 elements 300-364, 600-699, steps 400-460A). 
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Agrawal with the teachings of Gopalkrishna so as, with a reasonable expectation of success, to yield a method, and a system for implementing an enhanced, improved, and robust alignment of the visual and text modalities. Consider claims 7, 17: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claims 6, 16. In addition, Agrawal teaches wherein executing the program code comprises invoking one or more functions to compute geometric or physics information associated with the one or more detections (See Agrawal, e.g., “…the collision detection component 114 can determine machine learned (ML) based one or more collision states based on the stored trajectory(ies), and/or based on one or more object trajectories determined for corresponding object(s) in the environment. The collision detection component 114 can transmit a signal (e.g., a third signal) 120 based on results of operating the collision detection component 114…determine to operate according to higher likelihood(s) of collision from corresponding component(s) by changing to another trajectory (e.g., safer trajectory), regardless of remaining component(s) indicating borderline, or lower likelihood(s) of collision (e.g., the trajectory management component 108 can ignore results associated with borderline, or lower likelihood(s) of collision, and instead perform trajectory selection based on the results associated with the higher likelihood(s) of collision)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). 
Consider claims 8, 18: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claims 1, 11. In addition, Agrawal teaches further comprising performing one or more operations to re-train a pre-trained neural network (See Agrawal, e.g., “…the collision detection component 114 can determine machine learned (ML) based one or more collision states based on the stored trajectory(ies), and/or based on one or more object trajectories determined for corresponding object(s) in the environment. The collision detection component 114 can transmit a signal (e.g., a third signal) 120 based on results of operating the collision detection component 114. The signal 120 can include the ML based on collision state(s)…physical based collision states received from the perception component 110, objects received based on filtering performed by the filter component 112, and machine learned (ML) collision states received from the collision detection component 114, can be utilized by the trajectory management component 108 to more accurately select vehicle trajectory(ies)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708) based on at least one of one or more predefined labels or one or more generated labels that are associated with additional sensor data to generate the trained neural network (See Agrawal, e.g., “…The primary system may generally perform processing to control how the vehicle maneuvers within an environment. The primary system may implement various Artificial Intelligence (AI) techniques, such as machine learning, to understand an environment around the vehicle and/or instruct the vehicle to move within the environment. 
For example, the primary system may implement the AI techniques to localize the vehicle, detect an object around the vehicle, segment sensor data, determine a classification of the object, predict an object track, generate a trajectory for the vehicle, and so on. In examples, the primary system processes data from multiple types of sensors on the vehicle, such as light detection and ranging (lidar) sensors, radar sensors, image sensors, depth sensors (time of flight, structured light, etc.), and the like…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). Gopalkrishna teaches a visual language model (VLM) (See Gopalkrishna, e.g., “…method is provided for pretraining vision and language models that includes receiving image-text pairs, each including an image and a text describing the image…” of Abstract, ¶ [0005]-¶ [0007], ¶ [0016], ¶ [0031], ¶ [0041], ¶ [0048]-¶ [0056], and Figs. 3-6 elements 300-364, 600-699, steps 400-460A). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Agrawal with the teachings of Gopalkrishna so as, with a reasonable expectation of success, to yield a method, and a system for implementing an enhanced, improved, and robust alignment of the visual and text modalities. Consider claim 9: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claim 1. In addition, Agrawal teaches wherein the second plan is a predefined plan (See Agrawal, e.g., “…The trajectory data 106 can include one or more trajectories (e.g., trajectory(ies) of different types) for the vehicle. 
The trajectory(ies) (or “received trajectory(ies)”) (or “candidate trajectory(ies)”) can include a planned trajectory, and/or one or more other trajectories (or “alternative trajectory(ies)”) (e.g., a first safe stop trajectory (or “error trajectory”), a second safe stop trajectory (or “high-priority error trajectory”), a third safe stop trajectory (or “immediate stop trajectory”), etc.)… The data output from the trajectory management component 108 can include control data 122. The control data 122 can be utilized to control the vehicle…can be determined based on one or more of the results (e.g., the validated trajectory(ies) and/or the collision probability(ies)) received from the perception component 110, the results (e.g., the filtered object(s) and/or resulting object(s) based on the filtered object(s) being excluded) received from the filter component 112, and the results (e.g., the collision state(s)) received from the collision detection component 114…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). Consider claims 10, 19: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claims 1, 11. In addition, Agrawal teaches further comprising generating the second plan based on the sensor data (See Agrawal, e.g., “…The trajectory management component can select or otherwise determine a trajectory from among trajectories generated by, and received from, the primary system based on data generated by, and received from, other components of the safety system…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). 
Consider claim 15: The combination of Agrawal, Gopalkrishna teaches everything claimed as implemented above in the rejection of claim 11. In addition, Agrawal teaches wherein generating the final plan comprises modifying the first plan (See Agrawal, e.g., “…he trajectory management component 108 can utilize any results from some or all of the perception component 110, the filter component 112, and/or the collision detection component 114 to avoid collisions. In other words, if any results from one or more of the perception component 110, the filter component 112, and/or the collision detection component 114 indicate a potential collision, the trajectory management component 108 can modify the trajectory for the vehicle (e.g., change to a safer trajectory, such as from the planned trajectory to a safe-stop trajectory)…”, of Abstract, ¶ [0010]-¶ [0017], ¶ [0020]-¶ [0030], ¶ [0051]-¶ [0061], ¶ [0063]-¶ [0068], ¶ [0070]-¶ [0078], ¶ [0081]-¶ [0093], ¶ [0113]-¶ [0114], and Figs. 1-2 elements 100-218, Figs. 3-6 elements 300-654, Fig. 7 steps 700-708). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Chintakindi (US Pat. No.: 12,246,750 B2) teaches “Apparatuses, systems, and methods are provided for the utilization of vehicle control systems to cause a vehicle to take preventative action responsive to the detection of a near short term adverse driving scenario. A vehicle control system may receive information corresponding to vehicle operation data and ancillary data. Based on the received vehicle operation data and the received ancillary data, a multi-dimension risk score module may calculate risk scores associated with the received vehicle operation data and the received ancillary data. 
Subsequently, the vehicle control systems may cause the vehicle to perform at least one of a close call detection action and a close call detection alert to lessen the risk associated with the received vehicle operation data and the received ancillary data.”

KIM et al. (US Pub. No.: 2024/0383473 A1) teaches “A method predicting a path of an object includes: recognizing the object by using at least one sensor of the vehicle; generating movement path data associated with the object by tracking a movement path of the object during a first time interval when the object is recognized; and generating prediction path data including at least one prediction path associated with the object during a second time interval based on the generated movement path data.”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BABAR SARWAR, whose telephone number is (571) 270-5584. The examiner can normally be reached Mon-Fri, 9:00 AM-5:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Faris S. Almatrahi, can be reached at (313) 446-4821. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair.
Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BABAR SARWAR/
Primary Examiner, Art Unit 3667
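The two-phase object path prediction described in the cited KIM reference (track an object's movement over a first time interval, then generate a predicted path over a second interval from that movement data) can be illustrated with a toy sketch. The constant-velocity model, function name, and sampling assumptions below are hypothetical, not KIM's disclosed method.

```python
# Toy illustration of two-interval path prediction: observe positions
# during a first interval, then extrapolate over a second interval.
# A constant-velocity model is assumed purely for illustration.
def predict_path(observed, n_future):
    """observed: list of (x, y) positions sampled at a fixed rate during
    the first time interval (at least two samples).
    Returns n_future extrapolated positions for the second interval,
    using the velocity estimated from the last two observations."""
    (x0, y0), (x1, y1) = observed[-2], observed[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per sampling step
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, n_future + 1)]


tracked = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]   # first-interval track
print(predict_path(tracked, 3))  # -> [(3.0, 1.5), (4.0, 2.0), (5.0, 2.5)]
```

Real systems would typically replace the constant-velocity assumption with a learned or filtered motion model; this sketch only shows the track-then-predict structure the abstract describes.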

Prosecution Timeline

Nov 01, 2024
Application Filed
Feb 20, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12600370
VEHICULAR CONTROL SYSTEM
2y 5m to grant Granted Apr 14, 2026
Patent 12602800
TIRE STATE ESTIMATION METHOD
2y 5m to grant Granted Apr 14, 2026
Patent 12602933
VEHICULAR SENSING SYSTEM WITH OCCLUSION ESTIMATION FOR USE IN CONTROL OF VEHICLE
2y 5m to grant Granted Apr 14, 2026
Patent 12594947
DISPLAY CONTROL DEVICE, DISPLAY CONTROL METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM
2y 5m to grant Granted Apr 07, 2026
Patent 12586465
METHOD AND APPARATUS FOR ASSISTING RIGHT TURN OF AUTONOMOUS VEHICLE BASED ON UWB COMMUNICATION AND V2X COMMUNICATION AT INTERSECTION
2y 5m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
86%
Grant Probability
99%
With Interview (+20.0%)
2y 7m
Median Time to Grant
Low
PTA Risk
Based on 1043 resolved cases by this examiner. Grant probability derived from career allow rate.
