Prosecution Insights
Last updated: April 19, 2026
Application No. 18/791,977

ENVIRONMENTAL TEXT PERCEPTION AND PARKING EVALUATION USING VISION LANGUAGE MODELS

Non-Final OA: §101, §103, §112
Filed: Aug 01, 2024
Examiner: MCCLEARY, CAITLIN RENEE
Art Unit: 3669
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)
Grant Probability: 57% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
Grant Probability with Interview: 89%

Examiner Intelligence

Career Allow Rate: 57% (54 granted / 95 resolved; +4.8% vs TC avg)
Interview Lift: +32.0% (strong), comparing resolved cases with an interview against those without
Typical Timeline: 2y 11m average prosecution; 56 applications currently pending
Career History: 151 total applications across all art units

Statute-Specific Performance

§101: 12.9% (-27.1% vs TC avg)
§103: 43.5% (+3.5% vs TC avg)
§102: 14.0% (-26.0% vs TC avg)
§112: 27.4% (-12.6% vs TC avg)
Based on career data from 95 resolved cases; Tech Center averages are estimates.
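The headline figures above can be reproduced from the raw counts shown on the cards. A minimal sketch (the one-decimal rounding and the reading of "lift" as a percentage-point difference are assumptions about how the dashboard computes them):

```python
# Reproduce the dashboard's headline examiner statistics from raw counts.
# Rounding conventions are assumptions; the counts come from the cards above.

def pct(numerator: int, denominator: int) -> float:
    """Percentage, rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)

granted, resolved = 54, 95                   # career totals shown above
career_allow_rate = pct(granted, resolved)   # 56.8, displayed on the card as 57%

# Interview lift, read as the point difference between the grant
# probability with an interview (89%) and the baseline (57%).
with_interview, baseline = 89.0, 57.0
interview_lift = with_interview - baseline   # 32.0 points

print(f"allow rate {career_allow_rate}%, interview lift +{interview_lift} pts")
```

Note that 54/95 rounds to 56.8%, which the dashboard displays as 57%.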

Office Action

§101 §103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are currently pending and have been examined in this application. This communication is the first action on the merits (FAOM).

Examiner's Note

Examiner has cited particular paragraphs/columns and line numbers or figures in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that the applicant, in preparing the responses, fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner. Applicant is reminded that the Examiner is entitled to give the broadest reasonable interpretation to the language of the claims. Furthermore, the Examiner is not limited to a definition advanced by Applicant that is not specifically set forth in the disclosure.

Claim Objections

Claims 1-8, 10-11, and 14-17 are objected to because of the following informalities: Claim 1 recites “processing circuitry to” but should instead recite --processing circuitry configured to--. Claims 2-8 and 10-11 recite “processing circuitry is further to” but should instead recite --processing circuitry is further configured to--. Claims 14-17 recite “the one or more processors are further to” but should instead recite --the one or more processors are further configured to--. Appropriate correction is required.

Claim Interpretation

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function. Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material, or acts to perform that function.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a system implementing one or more vision language models (VLMs)” (see the 35 U.S.C. 112(b) rejection below; this is interpreted as a system implementing the vision language model) in claims 12 and 18, “a control system” in claim 20, “a perception system” in claim 20, each instance of “a system for” in claim 20, each instance of “a system implementing” in claim 20, and each instance of “a system incorporating” in claim 20.

Examiner note: Claim 12 recites “wherein the one or more processors are comprised in at least one of” and the only option listed in claim 12 that is linked to a function performed in claim 1 is “a system implementing one or more vision language models (VLMs)”. Therefore, the rest of the limitations in claim 12 do not meet the three-prong requirement of 35 U.S.C. 112(f). Claim 18 recites “wherein the system is comprised in at least one of” and the only option listed in claim 18 that is linked to a function performed in claim 13 is “a system implementing one or more vision language models (VLMs)”. Therefore, the rest of the limitations in claim 18 do not meet the three-prong requirement of 35 U.S.C. 112(f). Claim 20 recites “wherein the method is performed by at least one of” and therefore links the functions performed in claim 19 to the options listed in claim 20. The only limitations in claim 20 that do not meet the three-prong requirement of 35 U.S.C. 112(f) are “a system implemented using an edge device”, “a system implemented using a robot”, “a system implemented at least partially in a data center”, and “a system implemented at least partially using cloud computing resources.”

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

The above-referenced claim limitations have been interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because: “a system implementing one or more vision language models (VLMs)” in claims 12 and 18, “a control system” in claim 20, “a perception system” in claim 20, each instance of “a system for” in claim 20, each instance of “a system implementing” in claim 20, and each instance of “a system incorporating” in claim 20 all use the generic placeholder “system” coupled with functional language without reciting sufficient structure to achieve the function. Furthermore, the generic placeholder is not preceded by a structural modifier. Since the claim limitation(s) invoke(s) 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, the claims have been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.

If applicant wishes to provide further explanation or dispute the examiner's interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action. If applicant does not intend to have the claim limitation(s) treated under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may amend the claim(s) so that it/they will clearly not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, or present a sufficient showing that the claim(s) recite(s) sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 12, 18, and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The specification does not recite the structural elements of the following limitations: “a system implementing one or more vision language models (VLMs)” in claims 12 and 18, “a control system” in claim 20, “a perception system” in claim 20, each instance of “a system for” in claim 20, each instance of “a system implementing” in claim 20, and each instance of “a system incorporating” in claim 20.

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 12, 18, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claim limitations “a system implementing one or more vision language models (VLMs)” in claims 12 and 18, “a control system” in claim 20, “a perception system” in claim 20, each instance of “a system for” in claim 20, each instance of “a system implementing” in claim 20, and each instance of “a system incorporating” in claim 20 invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.
Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Applicant may: (a) amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph; (b) amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or (c) amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).

If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: (a) amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or (b) stating on the record what corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim 12 recites “a system implementing one or more vision language models (VLMs)” and it is unclear whether the VLM recited in claim 1 is included in these one or more VLMs. The metes and bounds of the claim limitation are vague and ill-defined, rendering the claim indefinite.
As best understood, the claim will be interpreted broadly such that claim 12 is referring to a system implementing the vision language model (VLM).

Claim 18 recites “a system implementing one or more vision language models (VLMs)” and it is unclear whether the VLM recited in claim 13 is included in these one or more VLMs. Furthermore, since claim 18 depends from claim 13, which introduces a system, it is unclear whether the various instances of “system” in claim 18 refer to the same system as the one introduced in claim 13 or to a different system. The metes and bounds of the claim limitation are vague and ill-defined, rendering the claim indefinite. As best understood, the claim will be interpreted broadly such that claim 18 includes either a new system or the same system as in claim 13 that is implementing the vision language model (VLM), and each separate instance of “system” refers to either a new system or the same system as is introduced in claim 13.

Claim 20 recites “a system implementing one or more vision language models (VLMs)” and it is unclear whether the VLM recited in claim 19 is included in these one or more VLMs. The metes and bounds of the claim limitation are vague and ill-defined, rendering the claim indefinite. As best understood, the claim will be interpreted broadly such that claim 20 is referring to a system implementing the vision language model (VLM).

Claim Rejections - 35 USC § 101

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims are directed to one or more processors, a system, or a method, each of which is one of the statutory categories of invention. (Step 1: YES.) The examiner has identified claim 1 as the claim that represents the claimed invention for analysis.
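The eligibility analysis that follows applies the USPTO's two-step Alice/Mayo framework (MPEP § 2106): Step 1 (statutory category), Step 2A Prong One (judicial exception), Step 2A Prong Two (practical application), and Step 2B (inventive concept). As a purely illustrative decision-flow sketch of that framework (an editor's aid, not part of the Office Action):

```python
# Illustrative decision flow for the Alice/Mayo subject-matter eligibility
# analysis (MPEP § 2106). This is a sketch of the framework, not legal advice.

def eligible(statutory_category: bool,
             recites_judicial_exception: bool,
             integrated_into_practical_application: bool,
             adds_inventive_concept: bool) -> bool:
    if not statutory_category:                  # Step 1
        return False
    if not recites_judicial_exception:          # Step 2A, Prong One
        return True
    if integrated_into_practical_application:   # Step 2A, Prong Two
        return True
    return adds_inventive_concept               # Step 2B

# Plugging in the examiner's findings for claim 1 in this Office Action:
claim_1 = eligible(statutory_category=True,                      # Step 1: YES
                   recites_judicial_exception=True,              # Prong One: mental process
                   integrated_into_practical_application=False,  # Prong Two: NO
                   adds_inventive_concept=False)                 # Step 2B: NO
print(claim_1)  # prints False: claim 1 held ineligible
```

Any "YES" at Prong Two or Step 2B would have rescued the claim, which is why responses typically argue practical application first.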
Claim 1 recites the limitations of: “One or more processors comprising processing circuitry to: identify image data generated using one or more cameras of an ego-machine and representing one or more parking signs; prompt a vision-language model (VLM) to generate one or more responses indicating whether parking is permitted in one or more candidate parking spaces based at least on the image data representing the one or more parking signs; and control, using an Advanced Driver Assistance System (ADAS) of the ego-machine, one or more parking operations of the ego-machine with respect to at least one candidate parking space of the one or more candidate parking spaces based at least on the one or more responses.”

The limitations of identifying and generating, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “one or more processors comprising processing circuitry”, nothing in the claim element precludes the step from practically being performed in the human mind. For example, but for the “one or more processors comprising processing circuitry” language, identifying and generating in the context of the claim encompasses a person looking at images of parking signs which have been captured by a camera, determining whether parking in the space is permitted, and deciding whether to park in the space. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “mental processes” grouping of abstract ideas. (Step 2A, Prong 1: YES. The claims are abstract.)

This judicial exception is not integrated into a practical application.
Limitations that are not indicative of integration into a practical application include: (1) adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)); (2) adding insignificant extra-solution activity to the judicial exception (MPEP 2106.05(g)); and (3) generally linking the use of the judicial exception to a particular technological environment or field of use (MPEP 2106.05(h)).

In particular, the claims recite additional elements of using one or more processors comprising processing circuitry to perform the recited steps. The one or more processors are recited at a high level of generality (i.e., as generic processors performing generic computer functions) such that they amount to no more than mere instructions to apply the exception using a generic computer component.

The claims also recite the additional element of a vision language model (VLM). The VLM is used to generally apply the abstract idea without placing any limits on how the VLM functions. Rather, these limitations only recite the outcome of “prompting… to generate one or more responses” and do not include any details about how the “prompting… to generate” is accomplished. See MPEP 2106.05(f). The recitation of “prompting a vision language model (VLM) to generate…” also merely indicates a field of use or technological environment in which the judicial exception is performed. Although this additional element limits the identified judicial exception, this type of limitation merely confines the use of the abstract idea to a particular technological environment (language models) and thus fails to add an inventive concept to the claims. See MPEP 2106.05(h).
The limitation of controlling, using an ADAS, one or more parking operations based on the one or more responses is also considered an additional element. The ADAS is recited at a high level of generality and is thus considered insignificant extra-solution activity. The step of controlling one or more parking operations, under the broadest reasonable interpretation (see claim 9, for example), is considered a visual or audible output (i.e., a generic output in response to the generating step) and is therefore considered insignificant post-solution activity.

Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, claim 1 is directed to an abstract idea without a practical application. (Step 2A, Prong 2: NO. The additional claimed elements are not integrated into a practical application.)

The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered separately and as an ordered combination, they do not add significantly more (also known as an "inventive concept") to the exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than generally linking the use of the judicial exception to a particular technological environment or field of use, and the additional elements claimed amount to insignificant extra-solution activities. See MPEP 2106.05(g) for more details. Generally linking the use of the judicial exception to a particular technological environment or field of use cannot provide an inventive concept, rendering the claim patent ineligible. Thus, claim 1 (and similarly claims 13 and 19) is not patent eligible. (Step 2B: NO. The claims do not provide significantly more.)

Claims 2-12, 14-18, and 20 further define the abstract idea that is present in their respective independent claims and hence are abstract for at least the reasons presented above. The dependent claims do not include any additional elements that integrate the abstract idea into a practical application or that are sufficient to amount to significantly more than the judicial exception, when considered both individually and as an ordered combination. Therefore, the dependent claims are directed to an abstract idea, and the aforementioned claims are not patent-eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 12-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yalla (US 2018/0321685 A1) in view of Uziel (US 2025/0162613 A1, cited in the IDS dated 9/3/2025).
Regarding claim 1, Yalla discloses one or more processors comprising processing circuitry (see at least [0019] – controller 150 may be implemented as one or more processors) to: identify image data generated using one or more cameras of an ego-machine and representing one or more parking signs (see at least [0014, 0027-0029] – sensors 212 may use digital photographic imaging, computer vision, and the like to determine the position and/or identities of objects relative to autonomous vehicle 204... detect sign 208); prompt to generate one or more responses indicating whether parking is permitted in one or more candidate parking spaces based at least on the image data representing the one or more parking signs (see at least [0027-0029, 0031] - sensors 212 may use digital photographic imaging, computer vision, and the like to determine the position and/or identities of objects relative to autonomous vehicle 204... detect sign 208… determine whether or not the autonomous vehicle 204 may be permissioned to park in parking place 206); and control, using an Advanced Driver Assistance System (ADAS) of the ego-machine, one or more parking operations of the ego-machine with respect to at least one candidate parking space of the one or more candidate parking spaces based at least on the one or more responses (see at least [0029, 0031] – when the autonomous vehicle 204 may be permissioned to park in parking place 206 the vehicle control unit 220 initiates a command to position autonomous vehicle 204 in parking place 206… when autonomous vehicle 204 is not permissioned to park in parking place 206, vehicle-parking controller 156 can then initiate a command to eliminate parking place 206 as a parking option and to continue to search for a permissioned parking place). Yalla does not appear to explicitly disclose a vision-language model (VLM). 
Uziel, in the same field of endeavor, teaches the following limitations: prompt a vision-language model (VLM) to generate one or more responses (see at least [0044-0045, 0054-0056] – vision language model). It would have been obvious to one of ordinary skill in the art before the effective filing date to have incorporated the teachings of Uziel into the invention of Yalla, with a reasonable expectation of success, for the purpose of enhancing the perception and decision-making abilities of the vehicle (Uziel – [0031]). This is applying a known technique (a VLM) to a known application (sign recognition in vehicle applications) ready for improvement to yield predictable results. Uziel's vision-language model is an improvement over Yalla's computer vision/text recognition since VLMs bridge vision and text, enabling complex multimodal tasks. This is particularly advantageous in vehicle applications, where the environment and scenarios are rapidly changing and extremely complex.

Regarding claim 2, Yalla discloses wherein the processing circuitry is further to initiate monitoring for the one or more parking signs based at least on the ego-machine entering a detected parking mode (see at least [0020-0025] - Autonomous vehicle 204 may schedule routing to arrive at City Hall at 1:30 PM to initiate a search for parking within a range of distance from which the passenger may walk (e.g., at a predicted rate of walking speeds) from a parking space to the destination. Upon arrival at City Hall, autonomous vehicle 204 can initiate a search for available parking beginning at the entrance of City Hall and continuing along the perimeter of the nearest city block until an available parking place is determined.).
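For orientation, the claim 1 arrangement at issue in these rejections (camera image data, a VLM prompt, and ADAS control gated on the response) can be sketched roughly as follows. Every name here is hypothetical, the VLM call is stubbed, and this is not the applicant's implementation or either reference's disclosure:

```python
# Hypothetical sketch of the claim 1 arrangement: identify parking-sign image
# data, prompt a vision-language model (VLM), and gate an ADAS parking
# operation on the responses. The VLM is stubbed; all names are illustrative.
from dataclasses import dataclass

@dataclass
class SignImage:
    camera_id: str
    text: str  # stand-in for pixel data in this sketch

def prompt_vlm(image: SignImage, question: str) -> str:
    """Stubbed VLM: a real system would send the image and the prompt to a
    multimodal model and parse its free-text response."""
    return "no" if "NO PARKING" in image.text.upper() else "yes"

def parking_decision(images: list) -> bool:
    """Permit the parking operation only if every detected sign allows it."""
    question = "Is parking permitted in the candidate space?"
    return all(prompt_vlm(img, question) == "yes" for img in images)

signs = [SignImage("front_cam", "2 HR PARKING 8AM-6PM"),
         SignImage("front_cam", "NO PARKING TUESDAY")]
if parking_decision(signs):
    print("ADAS: initiate parking maneuver")
else:
    print("ADAS: continue searching for a permitted space")
```

The gating step mirrors Yalla's permissioned-parking logic quoted above; the stub stands in for the VLM prompting that the examiner maps to Uziel.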
Regarding claim 3, Yalla discloses wherein the processing circuitry is further to detect a parking domain of the ego-machine by performing at least one of (BRI requires only one of the following): using a mapping application (see at least [0020-0025, 0033] - Autonomous vehicle 204 may schedule routing to arrive at City Hall at 1:30 PM to initiate a search for parking within a range of distance from which the passenger may walk (e.g., at a predicted rate of walking speeds) from a parking space to the destination. Upon arrival at City Hall, autonomous vehicle 204 can initiate a search for available parking beginning at the entrance of City Hall and continuing along the perimeter of the nearest city block until an available parking place is determined… Utilizing computer vision, digital photographic imaging, text recognition, onboard database(s) and systems, external database(s), and/or GPS location, autonomous vehicle 204 can identify one or more sign details 210 related to multiple signs to determine permissioned parking relative to multiple classes of restricted parking. Thus, based on sign details 210, Saturday at 11:00 AM would be a permissioned time to park in parking place 206. However, if a temporary additional sign (not depicted), such as a construction sign (e.g., “NO PARKING on Tuesday”), is present adjacent to parking place 206, one or more sensors 212 can detect sign details 210 to provide permission and/or restriction data related to parking place 206. Based on sign details 210 of at least one sign 208, vehicle control unit 220 will initiate a command to park autonomous vehicle 204 in parking place 206 or to initiate a command to continue to search for permissioned parking.), or prompting the VLM to detect the parking domain based at least on one or more frames comprising at least some of the image data. 
Regarding claim 4, Yalla discloses wherein the processing circuitry is further to detect the one or more parking signs based at least on detecting one or more classes of parking signs associated with a detected parking domain of the ego-machine (see at least [0033] - Utilizing computer vision, digital photographic imaging, text recognition, onboard database(s) and systems, external database(s), and/or GPS location, autonomous vehicle 204 can identify one or more sign details 210 related to multiple signs to determine permissioned parking relative to multiple classes of restricted parking. Thus, based on sign details 210, Saturday at 11:00 AM would be a permissioned time to park in parking place 206. However, if a temporary additional sign (not depicted), such as a construction sign (e.g., “NO PARKING on Tuesday”), is present adjacent to parking place 206, one or more sensors 212 can detect sign details 210 to provide permission and/or restriction data related to parking place 206. Based on sign details 210 of at least one sign 208, vehicle control unit 220 will initiate a command to park autonomous vehicle 204 in parking place 206 or to initiate a command to continue to search for permissioned parking.). 
Regarding claim 12, Yalla discloses wherein the one or more processors are comprised in at least one of (BRI requires only one of the following): a control system for an autonomous or semi-autonomous machine (see at least [0014, 0019] - autonomy controller 150 including a sensor fusion module 154, an ancillary sensor manager 110, a vehicle-parking controller 156, and a vehicle control unit 113); a perception system for an autonomous or semi-autonomous machine (see at least [0014, 0019] - autonomy controller 150 including a sensor fusion module 154, an ancillary sensor manager 110, a vehicle-parking controller 156, and a vehicle control unit 113); a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system implementing one or more language models; a system implementing one or more large language models (LLMs); a system implementing one or more vision language models (VLMs); a system for generating synthetic data; a system for generating synthetic data using AI; a system for performing one or more generative AI operations; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

Uziel, in the same field of endeavor, also teaches the following limitations: a system implementing one or more vision language models (VLMs) (see at least [0044-0045, 0054-0056]).
The motivation to combine Yalla and Uziel is the same as in the rejection of claim 1 above.

Regarding claims 13 and 19, all the limitations have been analyzed in view of claim 1, and it has been determined that claims 13 and 19 do not recite or define any new limitations beyond those previously recited in claim 1; therefore, claims 13 and 19 are also rejected under the same rationale as claim 1.

Regarding claim 14, all the limitations have been analyzed in view of claim 2, and it has been determined that claim 14 does not recite or define any new limitations beyond those previously recited in claim 2; therefore, claim 14 is also rejected under the same rationale as claim 2.

Regarding claim 15, all the limitations have been analyzed in view of claim 3, and it has been determined that claim 15 does not recite or define any new limitations beyond those previously recited in claim 3; therefore, claim 15 is also rejected under the same rationale as claim 3.

Regarding claim 16, all the limitations have been analyzed in view of claim 4, and it has been determined that claim 16 does not recite or define any new limitations beyond those previously recited in claim 4; therefore, claim 16 is also rejected under the same rationale as claim 4.

Regarding claims 18 and 20, all the limitations have been analyzed in view of claim 12, and it has been determined that claims 18 and 20 do not recite or define any new limitations beyond those previously recited in claim 12; therefore, claims 18 and 20 are also rejected under the same rationale as claim 12.

Claims 5-7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yalla in view of Uziel and Chen (CN 116052182 A; a machine translation is attached and relied upon).
Regarding claim 5, Yalla does not appear to explicitly disclose wherein the processing circuitry is further to verify legibility of the one or more parking signs based at least on one or more detected regions of interest representing the one or more parking signs. However, Yalla does disclose wherein the processing circuitry is further to identify the one or more parking signs based at least on one or more detected regions of interest representing the one or more parking signs (see at least [0027-0029, 0031] - sensors 212 may use digital photographic imaging, computer vision, and the like to determine the position and/or identities of objects relative to autonomous vehicle 204... detect sign 208… determine whether or not the autonomous vehicle 204 may be permissioned to park in parking place 206).

Chen, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to verify legibility of the one or more detected regions of interest (see at least [0048-0062] – text detection performed on a target object in the current video frame to obtain text detection information… based on the text detection confidence score of the text region to be identified, a quality score for the text region to be identified is determined). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Chen into the invention of Yalla with a reasonable expectation of success for the purpose of reducing the amount of time required to recognize text in video frames in the field of transportation (Chen – [0002-0004, 0034]).

Regarding claim 6, Yalla does not appear to explicitly disclose wherein the processing circuitry is further to prompt the VLM to verify legibility of the one or more parking signs.
However, Yalla does disclose wherein the processing circuitry is further to prompt to identify the one or more parking signs (see at least [0027-0029, 0031] - sensors 212 may use digital photographic imaging, computer vision, and the like to determine the position and/or identities of objects relative to autonomous vehicle 204... detect sign 208… determine whether or not the autonomous vehicle 204 may be permissioned to park in parking place 206).

Uziel, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to prompt the VLM to identify the one or more parking signs (see at least [0044-0045, 0054-0056]). The motivation to combine Yalla and Uziel is the same as in the rejection of claim 1 above.

Chen, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to prompt to verify legibility of one or more detected regions of interest (see at least [0048-0062] – text detection performed on a target object in the current video frame to obtain text detection information… based on the text detection confidence score of the text region to be identified, a quality score for the text region to be identified is determined). The motivation to combine Yalla and Chen is the same as in the rejection of claim 5 above.

Regarding claim 7, Yalla does not appear to explicitly disclose wherein the processing circuitry is further to: cache the image data representing the one or more parking signs based at least on verifying legibility of the one or more parking signs, and prompt the VLM to evaluate the cached image data in response to detecting the one or more candidate parking spaces.
However, Yalla does disclose wherein the processing circuitry is further to: prompt to evaluate the image data in response to detecting the one or more candidate parking spaces (see at least [0025-0034] - In an embodiment, autonomous vehicle 204 utilizes one or more sensors 212 to determine a permissioned parking place 206 relative to multiple classes of restricted parking. Examples of computations or logic that can be used to determine permissions for a parking place include cross-referencing databases of information related to types of parking at various locations. For instance, a central database of disabled person parking locations and their respective GPS positions may be analyzed with data received from one or more sensors 212 to confirm that a parking place is reserved for disabled persons. Examples of restricted parking may be: disabled person parking, commercial parking only, time-restricted parking, loading zone only, employee only, customer parking only, residents only, and the like. Examples of permissioned parking can be: a parking place not subject to a parking restriction, paid parking, parking lots, parking garages, reserved parking for a passenger who possesses a permission to park such as an identifying placard, sign, badge, identification card, sticker, etc. Likewise, various embodiments can employ similar technology for use on city streets, in parking garages, in parking lots, etc.).

Uziel, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to prompt the VLM to evaluate the image data (see at least [0044-0045, 0054-0056]). The motivation to combine Yalla and Uziel is the same as in the rejection of claim 1 above.
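Claim 7's cache-then-evaluate flow (retain sign imagery that passes a legibility check, then prompt a model with the best cached crop once a candidate space is detected) might be sketched as follows. The class, threshold value, and function names are illustrative assumptions, not code from Yalla, Uziel, or Chen:

```python
# Illustrative sketch: cache sign crops whose confidence-derived quality score
# clears a legibility threshold, then evaluate the best cached crop on demand.
class SignCache:
    def __init__(self, legibility_threshold=0.5):
        self.threshold = legibility_threshold
        self.frames = []  # (quality_score, image_data) pairs

    def maybe_cache(self, image_data, quality_score):
        """Cache only crops whose detection-confidence-based score is legible."""
        if quality_score >= self.threshold:
            self.frames.append((quality_score, image_data))

    def best_frame(self):
        """Return the highest-quality cached crop, or None if cache is empty."""
        return max(self.frames, default=(None, None))[1]

def on_candidate_space_detected(cache, prompt_vlm):
    """Prompt the model with the best cached crop when a space is detected."""
    frame = cache.best_frame()
    if frame is not None:
        return prompt_vlm(frame)  # e.g., "is parking permitted here?"
    return None

cache = SignCache()
cache.maybe_cache("blurry_crop", 0.3)  # below threshold, not cached
cache.maybe_cache("sharp_crop", 0.9)   # cached
print(on_candidate_space_detected(cache, lambda img: f"evaluated {img}"))
# prints "evaluated sharp_crop"
```

Decoupling capture (cache on legibility) from evaluation (prompt on space detection) is what lets the system avoid re-running recognition on every frame, which matches the time-saving rationale the rejection attributes to Chen.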
Chen, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to: cache the image data representing one or more detected regions of interest based at least on verifying legibility of the one or more detected regions of interest (see at least [0063-0068] – selecting the text region to be identified corresponding to the highest quality score in the current video frame… text region to be recognized corresponding to the target object in the current video frame is stored into the cached image sequence to obtain the updated cached image sequence), and prompt to evaluate the cached image data (see at least [0069-0073] – video frame corresponding to the highest quality score in the updated cached image sequence is selected and input into the text recognition model for text recognition). The motivation to combine Yalla and Chen is the same as in the rejection of claim 5 above.

Regarding claim 17, all the limitations have been analyzed in view of claim 5, and it has been determined that claim 17 does not recite or define any new limitations beyond those previously recited in claim 5; therefore, claim 17 is also rejected under the same rationale as claim 5.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Yalla in view of Uziel and Borras (US 2021/0183169 A1).

Regarding claim 8, Yalla does not appear to explicitly disclose wherein the processing circuitry is further to prompt the VLM to evaluate whether parking is permitted in the one or more candidate parking spaces based at least on one or more geo-tagged parking permits. Borras, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to evaluate whether parking is permitted in the one or more candidate parking spaces based at least on one or more geo-tagged parking permits (see at least Figs. 
10, 14, [0047, 0078-0079, 0082-0083] – determine whether the vehicle is parked in parking space based on geo-location coordinates… user has an active parking account… an indicator can be activated in the vehicle or the cellular phone device to indicate that the vehicle is properly parked and payment will be accepted to avoid a parking violation). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Borras into the invention of Yalla with a reasonable expectation of success. The motivation for doing so is to derive high geo-location accuracy determination for dynamically defined tolling lanes and parking spaces for mobile payments (Borras – [0002]). This method would save vehicle occupants time in evaluating whether or not they can park in a space.

Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Yalla in view of Uziel and Malczyk (US 2020/0278218 A1).

Regarding claim 9, Yalla does not appear to explicitly disclose wherein the one or more parking operations of the ego-machine comprise outputting at least one of a visual or an audible representation of whether parking is permitted in the one or more candidate parking spaces. Malczyk, in the same field of endeavor, teaches the following limitations: wherein the one or more parking operations of the ego-machine comprise outputting at least one of a visual or an audible representation of whether parking is permitted in the one or more candidate parking spaces (see at least Fig. 3B, [0041-0042] – display an icon for the parking bay in which parking is or is not permitted). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Malczyk into the invention of Yalla with a reasonable expectation of success for the purpose of enabling the user to easily obtain parking information, such as whether parking is permitted in certain bays and also the price of parking in a bay (Malczyk – [0017, 0041-0042]).

Regarding claim 10, Yalla does not appear to explicitly disclose wherein the processing circuitry is further to prompt the VLM to determine a cost to park in the one or more candidate parking spaces for a designated duration of time. Uziel, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to prompt the VLM (see at least [0044-0045, 0054-0056]). The motivation to combine Yalla and Uziel is the same as in the rejection of claim 1 above. Malczyk, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to determine a cost to park in the one or more candidate parking spaces for a designated duration of time (see at least Fig. 3B, [0042] – display the price of parking in the bay). The motivation to combine Yalla and Malczyk is the same as in the rejection of claim 9 above.

Regarding claim 11, Yalla does not appear to explicitly disclose wherein the processing circuitry is further to output at least one of a visual or an audible representation of a cost to park in the one or more candidate parking spaces determined using the VLM. Uziel, in the same field of endeavor, teaches the following limitations: the VLM (see at least [0044-0045, 0054-0056]). The motivation to combine Yalla and Uziel is the same as in the rejection of claim 1 above. 
Malczyk, in the same field of endeavor, teaches the following limitations: wherein the processing circuitry is further to output at least one of a visual or an audible representation of a cost to park in the one or more candidate parking spaces determined (see at least Fig. 3B, [0042] – display the price of parking in the bay). The motivation to combine Yalla and Malczyk is the same as in the rejection of claim 9 above.

Conclusion

The prior art made of record, and not relied upon, considered pertinent to applicant’s disclosure or directed to the state of the art, is listed on the enclosed PTO-892. The following is a brief description of relevant prior art that was cited but not applied:

Graefe (US 2021/0114586 A1) is directed to systems and methods for automated driving vehicle parking detection. The systems and methods are directed to detecting an available space, the available space comprising a space dimension larger than the automated driving vehicle, detecting a road marker associated with the available space, and determining that the available space is not a parking space based on the road marker and the space dimension. The systems and methods are also directed to guiding the automated driving vehicle to park in the available space in response to determining that the available space is not the parking space, and activating an occlusion prevention subsystem.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAITLIN MCCLEARY, whose telephone number is (703) 756-1674. The examiner can normally be reached Monday - Friday, 10:00 am - 7:00 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Navid Z. Mehdizadeh, can be reached at (571) 272-7691. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/C.R.M./
Examiner, Art Unit 3669

/NAVID Z. MEHDIZADEH/
Supervisory Patent Examiner, Art Unit 3669

Prosecution Timeline

Aug 01, 2024
Application Filed
Dec 11, 2025
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12589771
VEHICLE CONTROL DEVICE, STORAGE MEDIUM FOR STORING COMPUTER PROGRAM FOR VEHICLE CONTROL, AND METHOD FOR CONTROLLING VEHICLE
2y 5m to grant Granted Mar 31, 2026
Patent 12583670
LIFT ARM ASSEMBLY FOR A FRONT END LOADING REFUSE VEHICLE
2y 5m to grant Granted Mar 24, 2026
Patent 12552379
STAGGERING DETERMINATION DEVICE, STAGGERING DETERMINATION METHOD, AND STORAGE MEDIUM
2y 5m to grant Granted Feb 17, 2026
Patent 12539840
SYSTEM AND METHOD FOR PROBING PROPERTIES OF A TRAILER TOWED BY A TOWING VEHICLE IN A HEAVY-DUTY VEHICLE COMBINATION
2y 5m to grant Granted Feb 03, 2026
Patent 12509934
Sensor Device
2y 5m to grant Granted Dec 30, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
57%
Grant Probability
89%
With Interview (+32.0%)
2y 11m
Median Time to Grant
Low
PTA Risk
Based on 95 resolved cases by this examiner. Grant probability derived from career allow rate.
