Last updated: April 18, 2026
Application No. 18/791,995
ENVIRONMENTAL TEXT PERCEPTION AND TOLL EVALUATION USING VISION LANGUAGE MODELS

Final Rejection §103
Filed: Aug 01, 2024
Examiner: CHALHOUB, JEFFREY ROBERT
Art Unit: 3663
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: Nvidia Corporation
OA Round: 2 (Final)
Interview Optional: +52.7% interview lift. This examiner has a relatively high allow rate; a written response may suffice.
Based on 146 resolved cases, 2023–2026
Examiner Intelligence

CHALHOUB, JEFFREY ROBERT
Career Allow Rate: 66% (97 granted / 146 resolved), above average; +14.4% vs TC avg
Interview Lift: +52.7% (resolved cases with interview vs. without)
Avg Prosecution: 2y 10m (typical timeline); 18 currently pending
Total Applications: 164 (career history, across all art units)
Statute-Specific Performance

§101: 25.0% (-15.0% vs TC avg)
§103: 48.8% (+8.8% vs TC avg)
§102: 11.4% (-28.6% vs TC avg)
§112: 14.0% (-26.0% vs TC avg)
The Tech Center average is an estimate. Based on career data from 146 resolved cases.
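As a quick arithmetic check on the figures above, the sketch below recomputes the career allow rate from the granted and resolved counts and backs out the Tech Center baseline implied by each statute row. It assumes that each "vs TC avg" delta is a percentage-point difference, which the page does not state; the variable and function names are illustrative only.

```python
# Figures copied from the examiner profile above. Treating each "vs TC avg"
# delta as a percentage-point difference is an assumption.
granted, resolved = 97, 146
allow_rate = round(100 * granted / resolved, 1)  # career allow rate in percent

# statute -> (examiner rate in percent, delta vs TC avg in percentage points)
statute_rates = {
    "101": (25.0, -15.0),
    "103": (48.8, 8.8),
    "102": (11.4, -28.6),
    "112": (14.0, -26.0),
}

def implied_baseline(rate: float, delta: float) -> float:
    """Return the Tech Center average implied by a rate and its delta."""
    return round(rate - delta, 1)

baselines = {s: implied_baseline(r, d) for s, (r, d) in statute_rates.items()}

print(allow_rate)
print(baselines)
```

Under that assumption, every statute row implies the same baseline, which fits the page's description of the Tech Center average as an estimate rather than a per-statute measurement.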
Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in reply to the Application Number 18/791,995 filed on 08/01/2024.
Claims 1-20 are currently pending and have been examined.
This action is made FINAL in response to the “Amendment” and “Remarks” filed on 12/09/2025.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description:

“110a-n”, 
“120a-n”,
“P1”,
“P0”,
“1004”,
“1316”.

The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference character(s) not mentioned in the description:

“405”,
“1004(B)”,
“1004(A)”.

Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-2, 4, 11-13, 15, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gerges (U.S. Pub. No. 2020/0027344 A1) in view of Uziel (U.S. Pub. No. 2025/0162613 A1).

Regarding Claim 1:
Gerges teaches:
One or more processors comprising processing circuitry to: identify image data generated using one or more cameras of an ego-machine, the image data including a depiction of at least a portion of one or more toll signs; (See (Gerges: Detailed Description – 38th-50th, 53rd-58th, and 119th-125th paragraphs))
whether to navigate in one or more toll lanes based at least on the image data; (See (Gerges: Summary – 13th-22nd paragraphs))
Gerges does not teach but Uziel teaches:
apply, to a vision-language model (VLM) of the ego-machine, a multimodal prompt comprising the image data representing the one or more toll signs and a text prompt to cause the VLM to generate one or more responses determining, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
and control one or more operations of the ego-machine based at least on the one or more responses. (See (Uziel: Introduction – 3rd paragraph and Detailed Description – 33rd-38th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
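The claim 1 mapping above splits a single data flow across the two references: a camera image of a toll sign, a multimodal prompt to a VLM, a response on whether to use the toll lane, and a control operation based on that response. The sketch below illustrates only that flow. It is not taken from the application, Gerges, or Uziel; every name in it (MultimodalPrompt, build_prompt, stub_vlm, and so on) is hypothetical, and a trivial stub stands in for the model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MultimodalPrompt:
    image: bytes  # camera frame depicting the toll sign (hypothetical format)
    text: str     # text instruction accompanying the image


# Any callable that maps a prompt to a text response can play the VLM role.
VLM = Callable[[MultimodalPrompt], str]


def build_prompt(image: bytes) -> MultimodalPrompt:
    """Pair the sign image with a text question, as in a multimodal prompt."""
    question = ("Based on the toll sign in this image, should the vehicle "
                "use the toll lane? Answer YES or NO.")
    return MultimodalPrompt(image=image, text=question)


def decide_toll_lane(image: bytes, vlm: VLM) -> bool:
    """Ask the VLM and reduce its free-text response to a yes/no decision."""
    response = vlm(build_prompt(image))
    return response.strip().upper().startswith("YES")


def control_action(use_toll_lane: bool) -> str:
    """Map the decision to a (hypothetical) control operation label."""
    return "enter_toll_lane" if use_toll_lane else "stay_in_general_lane"


def stub_vlm(prompt: MultimodalPrompt) -> str:
    """Stand-in for a real model: keys off a marker in the fake image bytes."""
    return "YES" if b"HOV 2+ FREE" in prompt.image else "NO"


action = control_action(decide_toll_lane(b"sign: HOV 2+ FREE", stub_vlm))
print(action)
```

Passing the model in as a callable keeps the prompt construction and the control mapping separate from any particular VLM, which mirrors how the rejection attributes those pieces to different references.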
Regarding Claim 2:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges further teaches:
The one or more processors of claim 1, wherein the processing circuitry is further to initiate monitoring for the one or more toll signs, (See (Gerges: Detailed Description – 38th-50th, 53rd-58th, and 64th paragraphs))
Gerges does not teach but Uziel teaches:
[…] based at least on the ego-machine entering a detected highway driving mode. (See (Uziel: Detailed Description – 32nd paragraph))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Regarding Claim 4:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges further teaches:
[…] evaluate the image data representing the one or more toll signs in response to verifying legibility of the one or more toll signs. (See (Gerges: Detailed Description – 38th-50th, 53rd-60th, and 65th-68th paragraphs))
Gerges does not teach but Uziel teaches:
The one or more processors of claim 1, wherein the processing circuitry is further to apply the multimodal prompt to cause the VLM to, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Regarding Claim 11:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges does not teach but Uziel teaches:
The one or more processors of claim 1, wherein the one or more processors are comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system implementing one or more vision language models (VLMs); a system for generating synthetic data; a system for generating synthetic data using AI; a system for performing one or more generative AI operations; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources. (See (Uziel: Introduction – 4th-13th paragraphs and Detailed Description – 33rd-38th and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Regarding Claim 12:
Gerges teaches:
A system comprising one or more processors to, (See (Gerges: Summary – 4th-5th paragraphs and Detailed Description – 119th-125th paragraphs))
whether to drive in one or more toll lanes based at least on image data representing one or more toll signs. (See (Gerges: Summary – 13th-22nd paragraphs and Detailed Description – 38th-50th and 53rd-58th paragraphs))
Gerges does not teach but Uziel teaches:
control one or more operations of an ego-machine, (See (Uziel: Introduction – 3rd paragraph and Detailed Description – 35th-38th paragraphs))
based at least on applying, to a vision-language model (VLM) of the ego-machine, a multimodal prompt to generate one or more responses determining, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Regarding Claim 13:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 12. Gerges further teaches:
The system of claim 12, wherein the one or more processors are further to initiate monitoring for the one or more toll signs, (See (Gerges: Detailed Description – 38th-50th, 53rd-58th, and 64th paragraphs))
Gerges does not teach but Uziel teaches:
[…] based at least on the ego-machine entering a detected highway driving mode. (See (Uziel: Detailed Description – 32nd paragraph))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Regarding Claim 15:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 12. Gerges further teaches:
[…] the image data representing the one or more toll signs in response to verifying legibility of the one or more toll signs. (See (Gerges: Detailed Description – 38th-50th, 53rd-60th, and 65th-68th paragraphs))
Gerges does not teach but Uziel teaches:
The system of claim 12, wherein the one or more processors are further to apply the multimodal prompt to cause the VLM to evaluate, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Regarding Claim 18:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 12. Gerges does not teach but Uziel teaches:
The system of claim 12, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system implementing one or more vision language models (VLMs); a system for generating synthetic data; a system for generating synthetic data using AI; a system for performing one or more generative AI operations; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources. (See (Uziel: Introduction – 4th-13th paragraphs and Detailed Description – 33rd-38th and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Regarding Claim 19:
Gerges teaches:
A method comprising: (See (Gerges: Summary – 14th-22nd paragraphs))
evaluating a multimodal prompt comprising a representation of one or more signs detected in an environment exterior to the ego-machine, (See (Gerges: Detailed Description – 38th-50th and 53rd-58th paragraphs))
Gerges does not teach but Uziel teaches:
generating, based at least on a vision-language model (VLM) of an ego-machine, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
one or more responses determining whether to execute one or more navigational operations based at least on the one or more signs; and controlling one or more operations of the ego-machine based at least on the one or more responses. (See (Uziel: Introduction – 3rd paragraph and Detailed Description – 33rd-38th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Regarding Claim 20:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 19. Gerges does not teach but Uziel teaches:
The method of claim 19, wherein the method is performed by at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system implementing one or more vision language models (VLMs); a system for generating synthetic data; a system for generating synthetic data using AI; a system for performing one or more generative AI operations; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources. (See (Uziel: Introduction – 4th-13th paragraphs and Detailed Description – 33rd-38th and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)

Claims 3, 5-6, 8-10, 14, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Gerges (U.S. Pub. No. 2020/0027344 A1) in view of Uziel (U.S. Pub. No. 2025/0162613 A1) in further view of Borras (U.S. Pub. No. 2023/0099361 A1).

Regarding Claim 3:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges further teaches:
The one or more processors of claim 1, wherein the processing circuitry is further to initiate monitoring for the one or more toll signs, (See (Gerges: Detailed Description – 38th-50th, 53rd-58th, and 64th paragraphs))
Gerges does not teach but Borras teaches:
[…] based at least on the ego-machine entering or approaching one or more geo-tagged locations. (See (Borras: Detailed Description – 84th-89th paragraphs, FIGS. 19-21))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with these above aforementioned teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment in order for an autonomous vehicle to determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)
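The claim 3 limitation, starting toll-sign monitoring when the ego-machine enters or approaches a geo-tagged location, can be pictured as a simple proximity test. The sketch below uses the standard haversine great-circle distance and a fixed radius. The radius, the coordinates, and the function names are assumptions made for illustration; nothing here is drawn from Borras, Gerges, or the application.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_monitor(position: tuple[float, float],
                   geo_tags: list[tuple[float, float]],
                   radius_m: float = 500.0) -> bool:
    """True if the position lies within radius_m of any geo-tagged location."""
    lat, lon = position
    return any(haversine_m(lat, lon, t_lat, t_lon) <= radius_m
               for t_lat, t_lon in geo_tags)


# Hypothetical geo-tagged toll-lane entrance about 111 m north of the vehicle.
tags = [(37.001, -122.0)]
print(should_monitor((37.0, -122.0), tags))  # vehicle near the tag
print(should_monitor((38.0, -122.0), tags))  # vehicle about 111 km away
```

A fixed radius is the simplest possible trigger; a real system might instead use map-matched route distance or polygonal geofences, which this sketch does not attempt.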
Regarding Claim 5:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges further teaches:
[…] based at least on verifying legibility of the one or more toll signs. (See (Gerges: Detailed Description – 38th-50th, 53rd-60th, and 65th-68th paragraphs))
Gerges does not teach but Borras teaches:
The one or more processors of claim 1, wherein the processing circuitry is further to generate a list of one or more upcoming exits of the one or more toll lanes, (See (Borras: Detailed Description – 74th-76th, 84th, 94th-96th, and 116th-117th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with these above aforementioned teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment in order for an autonomous vehicle to determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)
Regarding Claim 6:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 5. Gerges does not teach but Uziel teaches:
The one or more processors of claim 5, wherein the text prompt of the multimodal prompt is further to cause the VLM to, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Gerges in view of Uziel does not teach but Borras teaches:
[…] determine whether to drive in the one or more toll lanes based at least on a list of one or more upcoming exits of the one or more toll lanes. (See (Borras: Detailed Description – 74th-76th, 84th, 94th-96th, and 116th-117th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with these above aforementioned teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment in order for an autonomous vehicle to determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)
Regarding Claim 8:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges does not teach but Uziel teaches:
The one or more processors of claim 1, wherein the text prompt of the multimodal prompt is further to cause the VLM to, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges with these above aforementioned teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Gerges in view of Uziel does not teach but Borras teaches:
[…] determine whether to drive in the one or more toll lanes based at least on a planned exit associated with an active mapping route. (See (Borras: Detailed Description – 41st-47th, 56th-65th, 74th-84th, and 94th-96th paragraphs))
It would have been obvious to one of ordinary skill in the art at the time of filing, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with these above aforementioned teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At the time the invention was filed, one of ordinary skill in the art would have been motivated to incorporate Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment in order for an autonomous vehicle to determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)
Regarding Claim 9:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges further teaches:
[…] determine a cost to drive on one or more upcoming segments of the one or more toll lanes […], (See (Gerges: Summary – 5th-19th paragraphs and Detailed Description – 77th-86th, 92nd-93rd, 101st-102nd, and 109th-110th paragraphs))
Gerges does not teach but Uziel teaches:
The one or more processors of claim 1, wherein the text prompt of the multimodal prompt is further to cause the VLM to, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges with the above teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Gerges in view of Uziel does not teach but Borras teaches:
[…] based at least on a detected number of occupants of the ego-machine. (See (Borras: Detailed Description – 43rd, 47th, and 70th paragraphs))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with the above teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment so that an autonomous vehicle can determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)
Regarding Claim 10:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges does not teach but Uziel teaches:
The one or more processors of claim 1, wherein the one or more operations of the ego-machine comprise, (See (Uziel: Introduction – 3rd paragraph and Detailed Description – 33rd-38th paragraphs))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges with the above teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Gerges in view of Uziel does not teach but Borras teaches:
[…] initiating a merge into or out of the one or more toll lanes. (See (Borras: Detailed Description – 94th paragraph))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with the above teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment so that an autonomous vehicle can determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)
Regarding Claim 14:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 12. Gerges further teaches:
The system of claim 12, wherein the one or more processors are further to initiate monitoring for the one or more toll signs, (See (Gerges: Detailed Description – 38th-50th, 53rd-58th, and 64th paragraphs))
Gerges does not teach but Borras teaches:
[…] based at least on the ego-machine entering or approaching one or more geo-tagged locations. (See (Borras: Detailed Description – 84th-89th paragraphs, FIGS. 19-21))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with the above teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment so that an autonomous vehicle can determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)
Regarding Claim 16:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 12. Gerges further teaches:
[…] based at least on verifying legibility of the one or more toll signs. (See (Gerges: Detailed Description – 38th-50th, 53rd-60th, and 65th-68th paragraphs))
Gerges does not teach but Borras teaches:
The system of claim 12, wherein the one or more processors are further to generate a list of one or more upcoming exits of the one or more toll lanes, (See (Borras: Detailed Description – 74th-76th, 84th, 94th-96th, and 116th-117th paragraphs))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with the above teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment so that an autonomous vehicle can determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)
Regarding Claim 17:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 12. Gerges does not teach but Uziel teaches:
The system of claim 12, wherein the text prompt of the multimodal prompt is further to cause the VLM to, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges with the above teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Gerges in view of Uziel does not teach but Borras teaches:
[…] determine whether to drive in the one or more toll lanes based at least on a list of one or more upcoming exits of the one or more toll lanes. (See (Borras: Detailed Description – 74th-76th, 84th, 94th-96th, and 116th-117th paragraphs))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with the above teachings from Borras in order to create a user-friendly environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Borras’s high accuracy geo-location system and method for mobile payment so that an autonomous vehicle can determine whether to navigate toll lanes based on: entering or approaching geo-tagged locations, a list of upcoming exits of the toll lanes, a planned exit associated with an active mapping route, and a detected number of occupants of the autonomous vehicle. Combining Gerges and Borras would thus provide “a method and apparatus to improve location accuracy for a variety of vehicular-related applications.” (Borras: Background – 8th paragraph)

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Gerges (U.S. Pub. No. 2020/0027344 A1) in view of Uziel (U.S. Pub. No. 2025/0162613 A1) in further view of Mahlawat (U.S. Pub. No. 2023/0267516 A1).

Regarding Claim 7:
Gerges in view of Uziel, as shown in the rejection above, discloses the limitations of claim 1. Gerges does not teach but Uziel teaches:
The one or more processors of claim 1, wherein the text prompt of the multimodal prompt is further to cause the VLM to, (See (Uziel: Summary – 4th-13th paragraphs and Detailed Description – 44th-59th, 68th-75th, and 80th-86th paragraphs))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges with the above teachings from Uziel in order to create a smart environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Uziel’s image-based system for generation of descriptive and perceptive messages of automotive scenes in order to prompt a vision-language model of an autonomous vehicle to generate responses indicating whether to navigate toll lanes and control the autonomous vehicle based on the responses. Combining Gerges and Uziel would thus provide “perception systems for describing and responding to environmental situations.” (Uziel: Introduction – 2nd paragraph)
Gerges in view of Uziel does not teach but Mahlawat teaches:
[…] determine whether to drive in the one or more toll lanes based at least on a designated maximum toll. (See (Mahlawat: Detailed Description – 70th-72nd paragraphs))
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Gerges in view of Uziel with the above teachings from Mahlawat in order to create a cost-efficient environmental text perception and toll evaluation system using vision language models. At that time, one of ordinary skill in the art would have been motivated to combine Gerges’s toll road detection and reporting system with Mahlawat’s system and method for providing customized toll pricing so that an autonomous vehicle can determine whether to navigate toll lanes based on a designated maximum toll. Combining Gerges and Mahlawat would thus provide “a system and method for efficiently collecting toll from users, ensuring optimal utilization of toll roads, reducing inconvenience for the users, and maximizing revenue generation for toll agencies.” (Mahlawat: Background of the Invention – 7th paragraph)

Response to Arguments

The 35 U.S.C. 101 rejection set forth in the Non-Final Rejection mailed on October 10th, 2025 has been withdrawn as the “Amendments” and “Remarks” filed by the Applicant on December 9th, 2025 satisfactorily overcome this rejection.
Applicant’s arguments filed on December 9th, 2025 with regard to the 35 U.S.C. 103 rejection have been fully considered but are not persuasive.
With regard to the 35 U.S.C. 103 rejection, the limitations are taught in the combination of Uziel and Gerges as has been set forth above, contrary to the Applicant’s assertions. Therefore, the Applicant’s amendments and arguments are insufficient to overcome these prior art rejections.
More specifically, Uziel mentions “A system is disclosed and includes […] to generate the text message.” Furthermore, Uziel states “The image encoder 204 encodes […] in an autoregressive manner (word by word) as described.” Uziel further states “At 420, […] with corresponding captions.” Uziel mentions “In this application, […] and Python®.” In doing so, Uziel addresses the Applicant’s limitation of “apply, to a vision-language model (VLM) of the ego-machine, a multimodal prompt comprising the image data representing the one or more toll signs and a text prompt to cause the VLM to generate one or more responses determining” as set forth in claim 1 and similarly in claims 12 and 19.
Moreover, Gerges mentions “The sign recognition unit may be configured to […] the driver of the lane information.” In doing so, Gerges addresses the Applicant’s limitation of “whether to navigate in one or more toll lanes based at least on the image data” as set forth in claim 1 and similarly in claims 12 and 19.
As a result, the combination of Uziel and Gerges addresses "apply[ing], to a vision-language model (VLM) of the ego-machine, a multimodal prompt comprising the image data representing the one or more toll signs and a text prompt to cause the VLM to generate one or more responses determining whether to navigate in one or more toll lanes based at least on the image data" as set forth by the Applicant in claim 1 and similarly in claims 12 and 19 and the claimed "multimodal prompt" or "text prompt of the multimodal prompt" as set forth by the Applicant in claims 4, 6-9, 12, 15, and 17.

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
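The timing rule in the preceding paragraph can be sketched in a few lines of Python. This is an illustrative reading of that paragraph only, not a docketing tool: the function names `add_months` and `reply_period_end` are hypothetical, the dates are made up, "within TWO MONTHS" is assumed to include the two-month date itself, and weekend or holiday roll-over is not modeled.

```python
from datetime import date
import calendar


def add_months(d: date, n: int) -> date:
    """Return the date n calendar months after d, clamped to month end."""
    index = d.month - 1 + n
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_period_end(mailed, first_reply=None, advisory_mailed=None):
    """End of the shortened statutory period, per the paragraph above.

    Assumption: a first reply on the two-month date counts as 'within'.
    """
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)
    end = three_months
    replied_early = (
        first_reply is not None and first_reply <= add_months(mailed, 2)
    )
    advisory_late = (
        advisory_mailed is not None and advisory_mailed > three_months
    )
    if replied_early and advisory_late:
        # Period runs to the advisory action's mailing date instead.
        end = advisory_mailed
    # Never later than six months from the final action's mailing date.
    return min(end, six_months)


# Hypothetical dates, for illustration only.
mailed = date(2025, 1, 15)
print(reply_period_end(mailed))
print(reply_period_end(mailed, date(2025, 3, 1), date(2025, 5, 2)))
print(reply_period_end(mailed, date(2025, 3, 1), date(2025, 8, 1)))
```

The three calls cover the default three-month case, the case where an early reply and a late advisory action move the end of the period to the advisory mailing date, and the six-month cap.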
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jeffrey Chalhoub, whose telephone number is (571) 272-9754. The examiner can normally be reached Mon-Fri 8:30-5:30. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivek Koppikar, can be reached at (571) 272-5109. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.R.C./Examiner, Art Unit 3667
/VIVEK D KOPPIKAR/Supervisory Patent Examiner
Art Unit 3667
January 24, 2026
Prosecution Timeline

Aug 01, 2024: Application Filed
Oct 02, 2025: Non-Final Rejection (§103)
Dec 08, 2025: Examiner Interview Summary
Dec 09, 2025: Response Filed
Jan 21, 2026: Final Rejection (§103)
Apr 13, 2026: Request for Continued Examination
Apr 15, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/277,976 (Patent 12600377): Cooperative Vehicle Infrastructure Information Processing Method and Apparatus, and Terminal Device. 2y 5m to grant; granted Apr 14, 2026.
17/955,016 (Patent 12594835): SYSTEM FOR CONTROLLING VEHICLE DISPLAY BASED ON OCCUPANT'S GAZE. 2y 5m to grant; granted Apr 07, 2026.
18/630,162 (Patent 12573305): ARTIFICIALLY INTELLIGENT SKYWAY. 2y 5m to grant; granted Mar 10, 2026.
18/264,865 (Patent 12559131): METHOD OF A CONTROL CENTER FOR OPERATING AN AUTOMATED VEHICLE AND AUTOMATED VEHICLE. 2y 5m to grant; granted Feb 24, 2026.
18/739,345 (Patent 12534086): VEHICLE AND COMPUTER PROGRAM. 2y 5m to grant; granted Jan 27, 2026.
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 66%
With Interview (+52.7%): 99%
Median Time to Grant: 2y 10m
PTA Risk: Moderate
Based on 146 resolved cases by this examiner. Grant probability derived from career allow rate.