Prosecution Insights
Last updated: April 19, 2026
Application No. 18/474,591

USING LANGUAGE MODELS TO VERIFY MAP DATA IN MAP GENERATION SYSTEMS AND APPLICATIONS

Status: Final Rejection (§101, §103, §112)
Filed: Sep 26, 2023
Examiner: HUTCHESON, CODY DOUGLAS
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 2 (Final)

Grant Probability: 62% (Moderate)
Projected OA Rounds: 3-4
Projected Time to Grant: 2y 10m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 62% (grants 62% of resolved cases; 15 granted / 24 resolved; +0.5% vs TC avg)
Interview Lift: +47.1% (strong; allowance among resolved cases with an interview vs. without)
Avg Prosecution: 2y 10m typical timeline; 34 applications currently pending
Career History: 58 total applications across all art units

Statute-Specific Performance

§101: 33.9% (-6.1% vs TC avg)
§103: 40.9% (+0.9% vs TC avg)
§102: 14.8% (-25.2% vs TC avg)
§112: 7.5% (-32.5% vs TC avg)

Deltas are relative to the Tech Center average estimate (shown as the black line in the original chart). Based on career data from 24 resolved cases.
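The headline figures above follow from the page's own counts. Below is a minimal sketch of the arithmetic; the interview-split rates are invented placeholders (only the +47.1% lift itself is reported above), and, notably, the four statute deltas as displayed all imply the same 40.0% Tech Center baseline.

    # Sketch of the dashboard arithmetic; values marked "assumed" are
    # hypothetical placeholders, everything else comes from the figures above.

    granted, resolved = 15, 24
    print(f"Career allow rate: {granted / resolved:.0%}")        # 62%

    # Interview lift = allowance rate with interview minus without.
    rate_with, rate_without = 0.90, 0.429                        # assumed split
    print(f"Interview lift: {rate_with - rate_without:+.1%}")    # +47.1%

    # Statute-specific deltas vs the Tech Center average estimate. Each
    # displayed delta recovers the same 40.0% baseline.
    examiner_rate = {"101": 0.339, "103": 0.409, "102": 0.148, "112": 0.075}
    tc_avg_estimate = 0.400
    for statute, rate in examiner_rate.items():
        print(f"§{statute}: {rate:.1%} ({rate - tc_avg_estimate:+.1%} vs TC avg)")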

Office Action

§101 §103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

1. Regarding the objections to claims 5, 12, and 13, Applicant has amended the claims to address the minor informalities. Accordingly, the objections have been withdrawn.

2. Regarding the rejection of claims 19 and 20 under 35 U.S.C. § 112(b), Applicant has amended each claim. Claim 20 has been amended to now recite "the system", which has proper antecedent basis provided in claim 16. Accordingly, the rejection of claim 20 under 35 U.S.C. § 112(b) has been withdrawn. Claim 19 has been amended to recite the "first, second, and third language models", each of which has antecedent basis. While the rejection of claim 19 under 35 U.S.C. § 112(b) has been withdrawn, the claim is objected to due to minor informalities.

3. Regarding the rejection of claims 1-20 under 35 U.S.C. § 101, Applicant's arguments filed 12/09/2025 on pgs. 9-17 of the Remarks have been fully considered, but they are not persuasive.

Applicant first argues on pgs. 11-12 that claim 1 does not recite an abstract idea under Step 2A Prong 1, as a human cannot practically perform the claimed language-model processing, tokenized-description generation, issue detection based on token sequences, or map-data updating for ingestion by a machine-readable repository, which would require trained machine learning models, specialized computation, and algorithmic processing (see pg. 12, 2nd para.). The Examiner respectfully disagrees. Claim 1 as currently amended contains several limitations which, under their broadest reasonable interpretation, can be performed by a person mentally with the aid of pen and paper. A person can write down a tokenized description for a preliminary map section (e.g., write down what they see in the map, such as buildings, roads, and trees, and how different components interact/connect). Furthermore, a person can identify issues in the map using this written tokenized description (e.g., can determine that there is a missing stop sign at an intersection which should have one). Based on this determination, a person can further generate textual representations/recommendations regarding the issues (e.g., can decide that the map can be fixed by adding the missing stop sign), and can update the map using this representation/recommendation (e.g., can draw on top of the map to add in the missing sign). Therefore, claim 1 recites abstract ideas in the form of mental processes.

Applicant further argues on pgs. 12-14 that claim 1 integrates the alleged abstract ideas into a practical application under Step 2A Prong 2. Specifically, Applicant argues that claim 1 improves a technical field related to computer-implemented map validation, reconstruction, and automated generation of machine-readable environmental representations by generating tokenized descriptions using a language model, identifying issues using semantic and topological reasoning, generating textual recommendations using a second language model, and updating map data (see pg. 14, 1st para.). Applicant further argues that the claims reflect improvements to traditional map generation and map verification by reciting a concrete sequence of steps that solve the specific technical problems through automated tokenization, semantic and topological inference, issue detection, and map-data updating (see pg. 15, 1st and 2nd paras.). The Examiner respectfully disagrees.
Under Step 2A Prong 2, the additional elements are identified to determine whether, alone or in combination, they integrate the judicial exception into a practical application. The only additional limitations currently recited in claim 1 are mere instructions to implement the judicial exception using a generic computer. Specifically, the step of generating a tokenized description of a map, which can be performed as a mental process, is being performed by "a first language model". The limitation is recited broadly and merely amounts to implementing a mental process "based at least on a first language model". Furthermore, the step of generating textual representations and textual recommendations is similarly performed "based at least on a second language model", which merely implements a mental process using a generic computer component. The limitation of "a machine-readable map repository used by an autonomous or semi-autonomous navigation system" is similarly recited broadly, and further recites generic computer components. None of the identified additional elements integrate the judicial exception into a practical application, as they do not impose any meaningful limits on practicing the mental process. Further detail regarding the structural components would be needed in claim 1 in order to integrate the judicial exception into a practical application.

Applicant's argument on pgs. 15-17 that claim 1 is directed toward a specific implementation of a solution to a problem in software is not persuasive for the same reasons as stated above. Applicant's arguments on pg. 17 regarding claims 2-20 are not persuasive for the same reasons as stated above. Hence, Applicant's arguments regarding the rejection under 35 U.S.C. § 101 are not persuasive.

4. Regarding the following rejections:
Of claims 1-4 and 7-9 under 35 U.S.C. § 102(a)(2) over Xie;
Of claim 5 under 35 U.S.C. § 103 over Xie in view of Vala; and
Of claims 6 and 10-20 under 35 U.S.C. § 103 over Xie in view of Barut:
Applicant's arguments with respect to the above rejections have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Objections

5. Claim 19 is objected to because of the following informalities: claim 19 recites "wherein at least two of the first, second, and third language models, the second language model, and the third language model are portions of a single language model". The claim is currently unclear, as both the second language model and the third language model are recited twice in the list to choose from, and it is difficult to interpret the "at least two of" limitation as it is written. For example, one possible way of interpreting the claim is at least two of:
(i) the first, second, and third language models (all three models);
(ii) the second language model; and
(iii) the third language model.
It would not make sense to select, for example, (i) and (ii) from the above list, as (i) already includes all three models. The Examiner recommends amending the claim for clarity purposes, such as by instead reciting "wherein at least two of the first language model, the second language model, and the third language model are portions of a single language model".
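The interpretation problem the Examiner identifies can be made concrete with a small sketch: enumerating pairs from the three candidate readings shows that any selection involving (i) is degenerate, since (i) already contains the other candidates. This is purely illustrative; the labels follow the (i)-(iii) list above.

    # Enumerate pairs of readings of claim 19's "at least two of" list.
    from itertools import combinations

    readings = {
        "(i)":   {"first", "second", "third"},  # all three models
        "(ii)":  {"second"},
        "(iii)": {"third"},
    }
    for (a, sa), (b, sb) in combinations(readings.items(), 2):
        note = " (degenerate: one choice adds nothing)" if sa >= sb or sb >= sa else ""
        print(f"{a} + {b} -> {sorted(sa | sb)}{note}")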
The Examiner further notes that claim 19 recites the "first…language model", which is not the same terminology introduced in independent claim 16 (which only refers to a language model, not "a first language model"), and thus recommends keeping the terminology for this language model consistent between the claims for clarity purposes. Appropriate correction is required.

Claim Rejections - 35 U.S.C. § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

6. Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Independent claims 1, 10, and 16 were each amended to recite "wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system". These features were not described in the specification in such a way as to reasonably convey to a person of ordinary skill in the art that the inventor(s) had possession of the claimed invention at the time the application was filed. The specification describes storing map data to a map repository (see, for example, para. 0085), but does not specifically describe that the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system, as recited by the claims. These limitations are therefore new matter and are rejected for inadequate written description. Dependent claims 2-9, 11-15, and 17-20 inherit the rejections from independent claims 1, 10, and 16, respectively.

Claim Rejections - 35 U.S.C. § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
7. Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, "A method" is recited, which is directed to one of the four statutory categories of invention (process) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes, which fall into the category of abstract ideas (Step 2A Prong 1: YES). The following limitations, under their broadest reasonable interpretation, recite mental processes:
"generating…based at least on…processing data associated with at least a section of a preliminary map, a tokenized description of the section of the preliminary map": a person looks at a map and writes down a tokenized description (e.g., words) describing the map;
"identifying, based in part on the tokenized description, one or more potential issues with respect to the section of the preliminary map": a person uses the description to determine issues in the map;
"generating, based at least on…processing data for the one or more potential issues, one or more textual representations or textual recommendations regarding the one or more potential issues": a person analyzes the issues they identified in order to write down textual representations or recommendations regarding the issues;
"updating, using the one or more textual representations or textual recommendations regarding the one or more potential issues, map data of the preliminary map to generate updated map data": a person uses the representations or recommendations to update a map (e.g., adding a drawing to a map to fix a particular issue);
"wherein the updated map data is formatted for ingestion": a person writes down the updated map data in a particular format.

Claim 1 does not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are "based at least on a first language model", "based at least on a second language model", and "wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system". These limitations are recited broadly and amount to mere instructions to implement the judicial exception using a generic computer, which do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, claim 1 is directed to an abstract idea.

Claim 1 does not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the additional limitations amount to mere instructions to implement the judicial exception using a generic computer, which do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claim 1 is not patent eligible.
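For orientation, the flow that claim 1 recites, as characterized in this rejection, can be sketched in a few lines: a first language model yields a tokenized description, issues are identified from the token sequence, a second language model yields textual recommendations, and the map data is updated and formatted for ingestion. The stubs, token vocabulary, and "repo-ingest-v1" format below are invented illustrations, not the application's disclosed implementation.

    # Hypothetical sketch of the flow recited in claim 1. The model functions
    # are stand-ins for the claimed "first" and "second" language models.

    def first_language_model(section: dict) -> list[str]:
        # Generate a tokenized description of a preliminary-map section.
        tokens: list[str] = []
        for road in section.get("roads", []):
            tokens += ["ROAD", road["id"]]
        for sign in section.get("signs", []):
            tokens += ["SIGN", sign["type"], "AT", sign["road"]]
        return tokens

    def identify_issues(tokens: list[str], section: dict) -> list[str]:
        # Flag potential issues from the token sequence, e.g. an intersection
        # with no associated stop sign (the Examiner's own example above).
        stop_signed = {tokens[i + 3] for i, t in enumerate(tokens)
                       if t == "SIGN" and tokens[i + 1] == "stop"}
        return [f"missing stop sign at {x['road']}"
                for x in section.get("intersections", [])
                if x["road"] not in stop_signed]

    def second_language_model(issues: list[str]) -> list[str]:
        # Generate textual recommendations for the identified issues.
        return [f"RECOMMEND: add stop sign ({issue})" for issue in issues]

    def update_map(section: dict, recommendations: list[str]) -> dict:
        # Apply recommendations and tag the result with an (assumed) format
        # expected by a machine-readable map repository.
        return {**section, "annotations": recommendations, "format": "repo-ingest-v1"}

    section = {"roads": [{"id": "R1"}], "signs": [],
               "intersections": [{"road": "R1"}]}
    recs = second_language_model(identify_issues(first_language_model(section), section))
    print(update_map(section, recs))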
Regarding dependent claims 2-9, "The method" is recited, which is directed to one of the four statutory categories of invention (process) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes, which fall into the category of abstract ideas (Step 2A Prong 1: YES). The following limitations, under their broadest reasonable interpretation, recite mental processes:

Claim 2: Claim 2 contains the additional limitation "wherein the first language model and the second language model are portions of a single language model", which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 3: Claim 3 contains the additional limitation "wherein the identifying of the one or more potential issues is performed using a third language model, wherein the third language model is one of a standalone language model, part of the first language model, part of the second language model, or part of the single language model", which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 4: "wherein the one or more potential issues relate to at least one of a correction of an identified error, an addition of information determined to be absent from the preliminary map, or an enhancement to the preliminary map": a person identifies issues such as correcting an error, adding missing information to the map, or enhancing the map.

Claim 5: "generating, based at least on…processing data associated with a second section of the preliminary map, a tokenized description of the second section of the preliminary map": a person writes down a tokenized description of a viewed map; "determining, based in part on the tokenized description of the second section of the preliminary map, that there are no potential modifications to be made to the second section of the preliminary map": a person analyzes the description they wrote and determines that no modifications are to be made; "providing indication of validation of the second section of the preliminary map": a person writes down the result (e.g., a message saying that the map is validated). Claim 5 contains the additional limitation "based at least on the first language model", which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 6: "presenting the one or more textual representations or textual recommendations regarding the one or more potential issues; and based at least on the presenting, receiving data corresponding to one or more data inputs indicating whether to implement one or more modifications to the map data of the preliminary map": a person presents the results to a user and obtains data indicating whether or not to implement the modifications.

Claim 7: "wherein the tokenized description is a tokenized text string representative of the section of a preliminary map, the tokenized text string including a sequence of tokens associated with objects in the section of the preliminary map": a person writes down a description, including tokens describing objects they see in the map.

Claim 8: "wherein the tokenized text string is written in a road topology language (RTL) or a domain specific language (DSL)": a person writes down the text string in a domain-specific language, i.e., writes down the description in a specific format (see the illustrative sketch after this list).

Claim 9: "wherein the one or more textual representations or textual recommendations are tokenized text strings": a person writes down the representations or recommendations as a series of text tokens with pen and paper.
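Neither the claims nor this Office Action spell out the road topology language (RTL) or DSL of claim 8, but a toy token string may help picture what claims 7-9 describe; the vocabulary and syntax below are invented for illustration only.

    # Toy tokenized map description in an invented road-topology-style DSL.

    rtl_string = "LANE L1 CONNECTS J1 J2 ; SIGN STOP AT J2 ; LANE L2 CONNECTS J2 J3"
    tokens = rtl_string.split()

    # "a sequence of tokens associated with objects in the section" (claim 7):
    objects = [tokens[i + 1] for i, t in enumerate(tokens) if t in ("LANE", "SIGN")]
    print(tokens)   # ['LANE', 'L1', 'CONNECTS', 'J1', 'J2', ';', 'SIGN', ...]
    print(objects)  # ['L1', 'STOP', 'L2']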
Claims 2-9 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). As discussed above, the additional limitations amount to mere instructions to implement the judicial exception using a generic computer, which do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, claims 2-9 are directed to an abstract idea. Claims 2-9 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 2-9 are not patent eligible.

Regarding claim 10, "A processor" is recited, which is directed to one of the four statutory categories of invention (machine) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes, which fall into the category of abstract ideas (Step 2A Prong 1: YES). The following limitations, under their broadest reasonable interpretation, recite mental processes:
"generate… a tokenized description of at least a section of preliminary map data": a person looks at a map and writes down a tokenized description (e.g., words) describing the map;
"…determine probability values for individual tokens of the tokenized description": a person can write down an accompanying probability for each token (e.g., the probability of the token being correct/no modifications needed);
"identify, based at least on the probability values, one or more potential modifications to be made with respect to the section of the preliminary map data": a person uses the values they wrote down to determine whether to identify modifications to be made;
"update, using the identified one or more potential modifications, the preliminary map data to generate updated map data": a person uses the identified modifications to update a map (e.g., adding a drawing to a map to fix a particular issue);
"wherein the updated map data is formatted for ingestion": a person writes down the updated map data in a particular format.

Claim 10 does not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are "A processor, comprising: one or more circuits to", "using a first language model", "use a second language model", and "wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system". These limitations are recited broadly and amount to mere instructions to implement the judicial exception using a generic computer, which do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, claim 10 is directed to an abstract idea. Claim 10 does not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the additional limitations are mere instructions to implement the judicial exception using a generic computer, which do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claim 10 is not patent eligible.
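Claim 10's probability mechanism, as characterized above, can likewise be pictured with a short sketch: each token of the description carries a model-assigned probability, and low-confidence tokens mark candidate modifications. The scores and threshold below are invented; the claim does not specify how the probabilities are produced.

    # Sketch of claim 10's probability-based identification: flag tokens whose
    # (assumed) model probability falls below a confidence threshold.

    description = ["LANE", "L1", "CONNECTS", "J1", "J2", "SIGN", "YIELD", "AT", "J2"]
    probabilities = [0.98, 0.95, 0.97, 0.96, 0.94, 0.91, 0.42, 0.93, 0.95]

    THRESHOLD = 0.5  # assumed confidence cutoff
    modifications = [(i, tok) for i, (tok, p) in enumerate(zip(description, probabilities))
                     if p < THRESHOLD]
    print(modifications)  # [(6, 'YIELD')] -> candidate modification: re-check the sign type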
Regarding dependent claims 11-15, "The processor" is recited, which is directed to one of the four statutory categories of invention (machine) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes, which fall into the category of abstract ideas (Step 2A Prong 1: YES). The following limitations, under their broadest reasonable interpretation, recite mental processes:

Claim 11: "generate, based at least on processing the one or more potential modifications, one or more textual representations or textual recommendations regarding the one or more potential modifications": a person analyzes the modifications they identified and then writes down representations or recommendations regarding the issues. Claim 11 contains the additional limitation "use at least one language model to", which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 12: "wherein the one or more potential modifications relate to at least one of a correction of an identified error, an addition of information determined to be absent from the preliminary map data, or an enhancement to the preliminary map data": a person identifies issues such as correcting an error, adding missing information to the map, or enhancing the map.

Claim 13: "wherein the tokenized description is a tokenized text string representative of the section of a preliminary map data, the tokenized text string including a sequence of tokens associated with objects in the section of the preliminary map data": a person writes down a description, including tokens describing objects they see in the map.

Claim 14: Claim 14 contains the additional limitation "wherein the first language model and the second language model are portions of a single language model", which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 15: "wherein the processor is comprised in at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system for performing generative AI operations using a large language model (LLM); a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for performing generative operations using a language model (LM); a system for synthetic data generation; a collaborative content creation platform for 3D assets; or a system implemented at least partially using cloud computing resources": this limitation is recited at a high level of generality and amounts to mere instructions to implement the judicial exception using a generic computer.

Claims 11-15 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO).
As discussed above, the additional limitations amount to mere instructions to implement the judicial exception using a generic computer, which do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, claims 11-15 are directed to an abstract idea. Claims 11-15 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 11-15 are not patent eligible.

Regarding claim 16, "A system" is recited, which is directed to one of the four statutory categories of invention (machine) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes, which fall into the category of abstract ideas (Step 2A Prong 1: YES). The following limitations, under their broadest reasonable interpretation, recite mental processes:
"identify one or more modifications to be performed with respect to at least a section of a map based in part on a tokenized description of at least the section of the map": a person looks at a map and identifies modifications to be made to the map based on writing down a tokenized description of the map;
"update, using the identified one or more modifications, map data of the map to generate updated map data": a person uses the identified modifications to update a map (e.g., adding a drawing to a map to fix a particular issue);
"wherein the updated map data is formatted for ingestion": a person writes down the updated map data in a particular format.

Claim 16 does not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are "A system comprising: one or more processors to", "use a language model", and "wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system". These limitations are recited broadly and amount to mere instructions to implement the judicial exception using a generic computer, which do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, claim 16 is directed to an abstract idea. Claim 16 does not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the additional limitations amount to mere instructions to implement the judicial exception using a generic computer, which do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claim 16 is not patent eligible.

Regarding dependent claims 17-20, "The system" is recited, which is directed to one of the four statutory categories of invention (machine) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes, which fall into the category of abstract ideas (Step 2A Prong 1: YES).
The following limitations, under their broadest reasonable interpretation, recite mental processes:

Claim 17: "generate the tokenized description of the section of the map": a person writes down a tokenized description describing the map. Claim 17 contains the additional limitation "use a second language model to", which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 18: "generate, based at least on processing the one or more modifications, one or more textual representations or textual recommendations regarding the one or more modifications": a person writes down representations or recommendations regarding the modifications. Claim 18 contains the additional limitation "use a third language model to", which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 19: Claim 19 contains the additional limitation "wherein the at least two of the first, second, and third language models, the second language model, and the third language model are portions of a single language model", which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 20: "wherein the simulation system comprises at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system for performing generative AI operations using a large language model (LLM); a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for performing generative operations using a language model (LM); a system for synthetic data generation; a collaborative content creation platform for 3D assets; or a system implemented at least partially using cloud computing resources": this limitation is recited at a high level of generality and amounts to mere instructions to implement the judicial exception using a generic computer.

Claims 17-20 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). As discussed above, the additional limitations amount to mere instructions to implement the judicial exception using a generic computer, which do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, claims 17-20 are directed to an abstract idea. Claims 17-20 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 17-20 are not patent eligible.

Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

8. Claims 1-4 and 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Xie et al. (US 2023/0394855 A1, hereinafter Xie) in view of Levinson & Sibley (US 2017/0248963 A1, hereinafter Levinson).

Regarding claim 1, Xie discloses generating, based at least on a first language model processing data associated with at least a section of a preliminary map (para. 0019 "Architecture 100 intakes an image 102 and has a vision language model 112, a captioner 114, and an object detector 116. Object detector 116 detects a plurality of objects 118 in image 102. Vision language model 112 and captioner 114 produce visual information 120 from image 102 and plurality of objects 118."), a tokenized description of the section of the preliminary map (para. 0020 "Visual information 120 comprises text that describes what is contained within image 102, such as image tags 122, an initial image caption 124, and object information 126."; para. 0028 "A baseline graph 304 is generated for an image 302, using either tags for objects automatically detected in image 302"); identifying, based in part on the tokenized description, one or more potential issues with respect to the section of the preliminary map (para. 0026 "Visual clues 130 are provided to generative language model 140, which in some examples, comprises an autoregressive language model that uses deep learning to produce human-like text. Generative language model 140 produces crisp language descriptions that are informative to a user, without being cluttered with irrelevant information from visual clues 130."); and generating, based at least on a second language model processing data for the one or more potential issues (para. 0027 "In some examples, vision language model 150 directly selects selected story caption 154 from among plurality of image story caption candidates 144, whereas in other examples, vision language model 150 scores plurality of image story caption candidates 144 and a down selection component 152 selects selected story caption 154 based on at least the scores from vision language model 150."), one or more textual representations or textual recommendations regarding the one or more potential issues (para. 0025 "A vision language model 150 selects a selected story caption 154").
Xie does not specifically disclose updating, using the one or more textual representations or textual recommendations regarding the one or more potential issues, map data of the preliminary map to generate updated map data, wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system.

Levinson teaches updating, using the one or more textual representations or textual recommendations regarding the one or more potential issues (para. 0066 "For example, perception engine 366 may be able to detect and classify external objects as pedestrians, bicyclists, dogs, other vehicles, etc. (e.g., perception engine 366 is configured to classify the objects in accordance with a type of classification, which may be associated with semantic information, including a label). Based on the classification of these external objects, the external objects may be labeled as dynamic objects or static objects. For example, an external object classified as a tree may be labeled as a static object, while an external object classified as a pedestrian may be labeled as a static object. External objects labeled as static may or may not be described in map data. Examples of external objects likely to be labeled as static include traffic cones, cement barriers arranged across a roadway, lane closure signs, newly-placed mailboxes or trash cans adjacent a roadway, etc. Examples of external objects likely to be labeled as dynamic include bicyclists, pedestrians, animals, other vehicles, etc. If the external object is labeled as dynamic, and further data about the external object may indicate a typical level of activity and velocity, as well as behavior patterns associated with the classification type."; para. 0109 "Classifier 2360 is configured to identify an object and to classify that object by classification type (e.g., as a pedestrian, cyclist, etc.) and by energy/activity (e.g. whether the object is dynamic or static), whereby data representing classification is described by a semantic label."), map data of the preliminary map to generate updated map data (para. 0157 "Data change detector 3653 is configured to detect changes in data sets 3655a and 3655b, which are examples of any number of data sets of 3-D map data. Data change detector 3653 also is configured to generate data identifying a portion of map data that has changed, as well as optionally identifying or classifying an object associated with the changed portion of map data. …At time, T2, however, data change detector 3653 may detect that another number of data sets, including data set 3655b, includes data representing the presence of external objects in portions of map data 3665 of 3-D model data 3661, whereby portions of map data 3665 coincide with portions of map data 3664 at different times. Therefore, data change detector 3653 may detect changes in map data, and may further adaptively modify map data to include the changed map data (e.g., as updated map data)."; para. 0159 "As shown, map data 3692 stored map repository 3605a is associated with, or linked to, indication data ("delta data") 3694 that indicated that an associated portion of map data has changed. Further to the example shown, indication data 3694 may identify a set of traffic cones, as changed portions of map data 3665, disposed in a physical environment associated with 3-D model 3661 through which an autonomous vehicle travels.";
para. 0160 "A tile generator 3656 may be configured to generate two-dimensional or three-dimensional map tiles based on map data from data sets 3655a and 3655b. The map tiles may be transmitted for storage in map repository 3605a. Tile generator 3656 may generate map tiles that include indicator data for indicating a portion of the map is an updated portion of map data."), wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system (para. 0160 "Tile generator 3656 may generate map tiles that include indicator data for indicating a portion of the map is an updated portion of map data. Further, an updated map portion may be incorporated into a reference data repository 3605 in an autonomous vehicle…").

Xie and Levinson are considered to be analogous to the claimed invention as they both are in the same field of image processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie to incorporate the teachings of Levinson in order to specifically update, using the one or more textual representations or textual recommendations regarding the one or more potential issues, map data of the preliminary map to generate updated map data, wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system. Doing so would be beneficial, as this would enhance the accuracy of localization functions for autonomous vehicles (Levinson, para. 0146).

Regarding claim 2, Xie in view of Levinson discloses wherein the first language model and the second language model are portions of a single language model (Xie, para. 0025 "In some examples, vision language model 112 and vision language model 150 both comprise a common vision language model.").

Regarding claim 3, Xie in view of Levinson discloses wherein the identifying of the one or more potential issues is performed using a third language model, wherein the third language model is one of a standalone model, part of the first language model, part of the second language model, or part of the single language model (Xie, standalone model: para. 0026 "Visual clues 130 are provided to generative language model 140, which in some examples, comprises an autoregressive language model that uses deep learning to produce human-like text.").

Regarding claim 4, Xie in view of Levinson discloses wherein the one or more potential issues relate to at least one of a correction of an identified error, an addition of information determined to be absent from the preliminary map, or an enhancement to the preliminary map (Xie, para. 0018 "A rich semantic representation of an input image, such as image tags, object attributes and locations, and captions, is constructed as a structured textual prompt, termed "visual clues", using a vision foundation model."; para. 0021 "A generative language model 140 generates a plurality of image story caption candidates 144, which includes story captions 141, 142, and 143, from visual information 120, which includes, from visual clues 130.").

Regarding claim 7, Xie in view of Levinson discloses wherein the tokenized description is a tokenized text string representative of the section of a preliminary map (Xie,
para. 0020 "Visual information 120 comprises text that describes what is contained within image 102, such as image tags 122, an initial image caption 124, and object information 126."), the tokenized text string including a sequence of tokens associated with objects in the section of the preliminary map (Xie, para. 0020 "In some examples, image tags 122 includes one or more tags identifying objects within image 102, and object information 126 includes additional tags, captions, attributes, and locations for objects within image 102. Visual information 120 also includes visual clues 130 that is based on at least image tags 122, initial image caption 124, and object information 126. Visual clues 130 is a semantic representation of image 102 and comprises semantic components from object and attribute tags to localized detection regions and region captions.").

Regarding claim 8, Xie in view of Levinson discloses wherein the tokenized text string is written in a road topology language (RTL) or a domain specific language (Xie, embedded visual clues reads on a domain specific language: para. 0020 "Visual clues 130 is a semantic representation of image 102 and comprises semantic components from object and attribute tags to localized detection regions and region captions.").

Regarding claim 9, Xie in view of Levinson discloses wherein the one or more textual representations or textual recommendations are tokenized text strings (Xie, para. 0021 "A vision language model 150 selects a selected story caption 154, which was previously story caption 141"; Fig. 2, selected caption 154).

9. Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Xie in view of Levinson, and further in view of Vala et al. (US 2023/0113292, hereinafter Vala).

Regarding claim 5, Xie in view of Levinson discloses generating, based at least on the first language model processing data associated with a second section of the preliminary map, a tokenized description of the second section of the preliminary map (Xie, para. 0019 "Architecture 100 intakes an image 102 and has a vision language model 112, a captioner 114, and an object detector 116. Object detector 116 detects a plurality of objects 118 in image 102. Vision language model 112 and captioner 114 produce visual information 120 from image 102 and plurality of objects 118."; para. 0020 "Visual information 120 comprises text that describes what is contained within image 102, such as image tags 122, an initial image caption 124, and object information 126."; para. 0028 "A baseline graph 304 is generated for an image 302, using either tags for objects automatically detected in image 302"). Xie in view of Levinson does not specifically disclose determining, based in part on the tokenized description of the second section of the preliminary map, that there are no potential modifications to be made to the second section of the preliminary map; and providing indication of validation of the second section of the preliminary map.

Vala teaches determining, based in part on the tokenized description of the second section of the preliminary map (para. 0068 "Further, the electronic device (100) extracts the tabular features at operation 509 from the table blocks and the textual features at operation 508 (i.e. second text) from the text blocks using an Optical Character Recognition (OCR) method.";
para. 0069 "Further, the electronic device (100) extracts the visual features from the image blocks and concatenates at operation 511 the visual features with the textual features, the tabular features, the text and the context."), that there are no potential modifications to be made to the second section of the preliminary map (para. 0062 "At operation 302, the method includes analyzing the content with reference to the determined context. At operation 303, the method includes identifying the portions of the content as non-shareable based on the analysis. At operation 304, the method includes suggesting the modification action on the portions identified as non-shareable."; para. 0097 "At 1205, upon detecting the private medium and the private class of the contact, the electronic device (100) shares the image of the financial card without any modification to the known contact (1203) through the private messaging application (1103)."); and providing indication of validation of the second section of the preliminary map (para. 0097 "At 1205, upon detecting the private medium and the private class of the contact, the electronic device (100) shares the image of the financial card without any modification to the known contact (1203) through the private messaging application (1103). At 1206, even the medium is private, but due to the public class of the contact, the electronic device (100) shares the image of the financial card to the unknown contact (1204) through the private messaging application (1103) after pixilating the critical information in the image."; an image without modifications (e.g., pixelated regions) indicates validation, i.e., no potential modifications are proposed).

Xie, Levinson, and Vala are considered to be analogous to the claimed invention as they are in the same field of image processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie in view of Levinson to incorporate the teachings of Vala in order to determine, based on a tokenized description, that no potential modifications are to be made with respect to the preliminary map section and provide an indication of validation of the preliminary map section. Doing so would be beneficial, as this would allow for modifications to be made to those images requiring modification without a person having to manually perform the modification themselves, improving user experience (Vala, para. 0004).

10. Claims 6 and 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xie in view of Levinson, and further in view of Barut et al. (US 12,045,288, hereinafter Barut).

Regarding claim 6, Xie in view of Levinson discloses presenting the one or more textual representations or textual recommendations regarding the one or more potential issues (Xie, para. 0022 "Example applications for leveraging the capability of architecture 100, which may be instructed using caption focus 132, include visual storytelling, automatic advertisement generation, social media posting, background explanation, accessibility, and machine learning (ML) training data annotation."; para. 0023 "For automatic advertisement generation, a seller uploads image 104 and caption focus 132 may be "Write a product description to sell in an online marketplace." In some examples, caption focus 132 may indicate a number of different objects within image 104 for which to generate an advertisement.
For a social media posting, the actual posting may be performed by a bot, and caption focus 132 may be "Social media post." In some applications, the user may wish to edit selected story caption 154."). Xie in view of Levinson does not specifically disclose, based at least on the presenting, receiving data corresponding to one or more data inputs indicating whether to implement one or more modifications to the map data of the preliminary map.

Barut teaches, based at least on the presenting, receiving data corresponding to one or more data inputs indicating whether to implement one or more modifications to the map data of the preliminary map (Col. 16 Lines 8-29 "In some examples, the ranking component 320 may determine that two different skills are equally applicable for processing the input data. In such examples, the decider engine 332 may determine that disambiguation should occur. Accordingly, the routing plan 334 may include sending the input data to a dialog skill 352 that may output (via TTS) one or more questions (e.g., a disambiguation request) used to prompt the user to disambiguate between the two equally likely (or approximately equally likely) interpretations of the input data. For example, it may be unclear, based on a user's request, whether the user intended to invoke a movie playback skill or a music playback skill, as a movie and a soundtrack for the movie may be identified using the same name. …the dialog skill 352 may inquire whether the user intended to play the movie or the soundtrack.").

Xie, Levinson, and Barut are considered to be analogous to the claimed invention as they are in the same field of image processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie in view of Levinson to incorporate the teachings of Barut in order to receive data corresponding to one or more data inputs indicating whether to implement one or more modifications to the preliminary map data based at least on the presenting. Doing so would be beneficial, as this would allow for disambiguation to occur when it is unclear which modification should be performed (Barut, Col. 16 Lines 8-29), ensuring that the desired action is performed, which improves user experience.

Regarding claim 10, Xie discloses A processor (Fig. 6, 614), comprising: one or more circuits to (para. 0084 "Processor(s) 614 may include any quantity of processing units that read data from various entities, such as memory 612 or I/O components 620. Specifically, processor(s) 614 are programmed to execute computer-executable instructions for implementing aspects of the disclosure. … Moreover, in some examples, the processor(s) 614 represent an implementation of analog techniques to perform the operations described herein. … One skilled in the art will understand and appreciate that computer data may be presented in a number of ways, such as visually in a graphical user interface (GUI), audibly through speakers, wirelessly between computing devices 600, across a wired connection,"): generate, using a first language model, a tokenized description of at least a section of preliminary map data (para. 0019 "Architecture 100 intakes an image 102 and has a vision language model 112, a captioner 114, and an object detector 116. Object detector 116 detects a plurality of objects 118 in image 102. Vision language model 112 and captioner 114 produce visual information 120 from image 102 and plurality of objects 118."); use a second language model
(para. 0027 "In some examples, vision language model 150 directly selects selected story caption 154 from among plurality of image story caption candidates 144, whereas in other examples, vision language model 150 scores plurality of image story caption candidates 144 and a down selection component 152 selects selected story caption 154 based on at least the scores from vision language model 150.")…

Xie does not specifically disclose update, using the identified one or more potential modifications, the preliminary map data to generate updated map data, wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system.

Levinson teaches update, using the identified one or more potential modifications (para. 0066 "For example, perception engine 366 may be able to detect and classify external objects as pedestrians, bicyclists, dogs, other vehicles, etc. (e.g., perception engine 366 is configured to classify the objects in accordance with a type of classification, which may be associated with semantic information, including a label). Based on the classification of these external objects, the external objects may be labeled as dynamic objects or static objects. For example, an external object classified as a tree may be labeled as a static object, while an external object classified as a pedestrian may be labeled as a static object. External objects labeled as static may or may not be described in map data. Examples of external objects likely to be labeled as static include traffic cones, cement barriers arranged across a roadway, lane closure signs, newly-placed mailboxes or trash cans adjacent a roadway, etc. Examples of external objects likely to be labeled as dynamic include bicyclists, pedestrians, animals, other vehicles, etc. If the external object is labeled as dynamic, and further data about the external object may indicate a typical level of activity and velocity, as well as behavior patterns associated with the classification type."; para. 0109 "Classifier 2360 is configured to identify an object and to classify that object by classification type (e.g., as a pedestrian, cyclist, etc.) and by energy/activity (e.g. whether the object is dynamic or static), whereby data representing classification is described by a semantic label."), the preliminary map data to generate updated map data (para. 0157 "Data change detector 3653 is configured to detect changes in data sets 3655a and 3655b, which are examples of any number of data sets of 3-D map data. Data change detector 3653 also is configured to generate data identifying a portion of map data that has changed, as well as optionally identifying or classifying an object associated with the changed portion of map data. …At time, T2, however, data change detector 3653 may detect that another number of data sets, including data set 3655b, includes data representing the presence of external objects in portions of map data 3665 of 3-D model data 3661, whereby portions of map data 3665 coincide with portions of map data 3664 at different times. Therefore, data change detector 3653 may detect changes in map data, and may further adaptively modify map data to include the changed map data (e.g., as updated map data)."; para. 0159 "As shown, map data 3692 stored map repository 3605a is associated with, or linked to, indication data ("delta data") 3694 that indicated that an associated portion of map data has changed.
Further to the example shown, indication data 3694 may identify a set of traffic cones, as changed portions of map data 3665, disposed in a physical environment associated with 3-D model 3661 through which an autonomous vehicle travels."; para. 0160 "A tile generator 3656 may be configured to generate two-dimensional or three-dimensional map tiles based on map data from data sets 3655a and 3655b. The map tiles may be transmitted for storage in map repository 3605a. Tile generator 3656 may generate map tiles that include indicator data for indicating a portion of the map is an updated portion of map data."), wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system (para. 0160 "Tile generator 3656 may generate map tiles that include indicator data for indicating a portion of the map is an updated portion of map data. Further, an updated map portion may be incorporated into a reference data repository 3605 in an autonomous vehicle…").

Xie and Levinson are considered to be analogous to the claimed invention as they both are in the same field of image processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie to incorporate the teachings of Levinson in order to specifically update, using the one or more potential modifications, preliminary map data to generate updated map data, wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system. Doing so would be beneficial, as this would enhance the accuracy of localization functions for autonomous vehicles (Levinson, para. 0146).

Xie in view of Levinson does not specifically disclose [use a second language model] to determine probability values for individual tokens of the tokenized description; identify, based at least on the probability values, one or more potential modifications to be made with respect to the section of the preliminary map data.

Barut teaches [use a second language model] (Col. 8 Lines 25-40 "NLU component 160 may employ a number of different natural language understanding strategies in order to understand natural language input data. In various examples, if NLU component 160 is multi-modal, image data 112 (representing the image data corresponding in time with the audio data 102) may be sent to NLU component 160. In such examples, the multi-modal NLU component 160 may be effective to generate not only the word embedding data 123 representing the user's utterance, but also the region of interest data 117 (comprising visual feature data representing a predicted object-of-interest), attention maps 119 (e.g., attention map data), and relative location data 121, as described in further detail below. In various other examples, NLU component 160 may generate word embedding data (or other token embeddings) representing the user's utterance") to determine probability values for individual tokens of the tokenized description (Col. 9 Lines 47-51 "In various examples, a softmax layer may be used to generate the probability of the input query corresponding to each candidate object detected by the object detector 115.
Accordingly, the selected object data 172 may represent the candidate object with the highest probability.”); Identify, based at least on the probability values, one or more potential modifications to be made with respect to the section of the preliminary map data (Col. 9 Lines 57-67 and Col. 10 Lines 1-9 “The selected object data 172 may be sent back to NLU component 160 and/or to some other component of natural language processing system 120 to take an action requested in the input query (e.g., an action requested in the user utterance). For example, if the user query is “Computer, zoom in on the blue chair on the right,” the selected object data 172 may represent a blue chair on the right side of displayed image 182. The image data (e.g., visual feature data and/or pixel data representing the blue chair) may be sent to natural language processing system as entity data related to the input query, for example. Thereafter, the NLU component 160 may determine an intent related to the user query (e.g., a zoom-in intent). Data representing the intent, the entity data representing the blue chair, and/or the image data 112 may be sent to a skill effective to perform the zoom-in action in response to the intent. Accordingly, the skill may zoom in on the blue chair and the modified, zoomed-in image of the blue chair may be displayed on the user's display.”). Xie, Levinson, and Barut are considered to be analogous to the claimed invention as they are in the same field of image processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie in view of Levinson to incorporate the teachings of Barut in order to use a second language model to calculate probability values for tokens in the tokenized description, and to identify, based on the probabilities, one or more potential modifications to perform with respect to preliminary map data. Doing so would be beneficial as this would enable the system to process user commands, aiding the user in requests without the user having to perform independent research (Col. 2 Lines 19-26). Regarding claim 11, Xie in view of Levinson and Barut discloses wherein the one or more circuits are further to use at least one language model (Barut, Col. 13 Lines 9-11 “Accordingly, NLU output data 306 may comprise data representing the selected object selected by the multi-modal transformer 170.”) to generate, based at least on processing the one or more potential modifications (Col. 14 Lines 28-35 “NLU output data 306 (which may, in some examples, include data representing the object selected by the multi-modal transformer 170) and top K skills 308 may be sent by NLU component 160 to orchestrator 330. Orchestrator 330 may send the top K skills 308 and the NLU output data 306 to routing service 312. Routing service 312 may send the top K skills 308 and NLU output data 306 to skill proposal component 314.”; Fig. 3 Processing steps 314, 316, 320, and 332), one or more textual representations or textual recommendations regarding the one or more modifications (Col. 16 Lines 1-29 “Decider engine 332 may output plan data that comprises a routing plan 334 for processing the input data. …In some examples, the ranking component 320 may determine that two different skills are equally applicable for processing the input data. In such examples, the decider engine 332 may determine that disambiguation should occur. 
Accordingly, the routing plan 334 may include sending the input data to a dialog skill 352 that may output (via TTS) one or more questions (e.g., a disambiguation request) used to prompt the user to disambiguate between the two equally likely (or approximately equally likely) interpretations of the input data. …Accordingly, the routing plan 334 may route the input data to the dialog skill 352, and the dialog skill 352 may inquire whether the user intended to play the movie or the soundtrack.”; text representation generated for use in text-to-speech in dialog skill). Xie, Levinson, and Barut are considered to be analogous to the claimed invention as they are in the same field of processing images. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie to incorporate the teachings of Barut in order to use at least one language model to generate, based on processing the one or more modifications, one or more textual representations regarding the one or more modifications. Doing so would be beneficial, as this would allow for disambiguation to occur when it is unclear which modification should be performed (Col. 16 Lines 8-29), ensuring that the desired action is performed, which improves user experience. Regarding claim 12, Xie in view of Levinson and Barut discloses wherein the one or more potential modifications relate to at least one of a correction of an identified error, an addition of information determined to be absent from the preliminary map data, or an enhancement to the preliminary map data (Barut, enhancement: Col. 9 Lines 57-67 and Col. 10 Lines 1-9 “The selected object data 172 may be sent back to NLU component 160 and/or to some other component of natural language processing system 120 to take an action requested in the input query (e.g., an action requested in the user utterance). For example, if the user query is “Computer, zoom in on the blue chair on the right,” the selected object data 172 may represent a blue chair on the right side of displayed image 182. The image data (e.g., visual feature data and/or pixel data representing the blue chair) may be sent to natural language processing system as entity data related to the input query, for example. Thereafter, the NLU component 160 may determine an intent related to the user query (e.g., a zoom-in intent). Data representing the intent, the entity data representing the blue chair, and/or the image data 112 may be sent to a skill effective to perform the zoom-in action in response to the intent. Accordingly, the skill may zoom in on the blue chair and the modified, zoomed-in image of the blue chair may be displayed on the user's display.”). Xie, Levinson, and Barut are considered to be analogous to the claimed invention as they are in the same field of image processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie in view of Levinson to incorporate the teachings of Barut in order to specifically have the modification be an enhancement to the preliminary map data. Doing so would be beneficial, as this would enable the user to view enhanced images (e.g. an easier-to-see image that has been zoomed in, Col. 9 Lines 52-67), improving user experience. 
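For readers mapping the claim language onto an implementation, the disputed "probability values for individual tokens" limitation corresponds to ordinary causal-language-model token scoring. The following minimal sketch is illustrative only; it is not drawn from the application, Xie, Levinson, or Barut, and the model, the description format, and the threshold are all assumptions made for this example.

```python
# Illustrative sketch only (not from the record or any cited reference):
# score each token of a tokenized map description with a causal language
# model and flag low-probability tokens as candidate issues.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Hypothetical tokenized description of one map section.
description = "intersection: main_st meets oak_ave; traffic control: none"

enc = tokenizer(description, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, vocab_size)

# Probability the model assigns to each actual token given its prefix.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
targets = enc.input_ids[:, 1:]
token_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1).exp()

THRESHOLD = 0.001  # hypothetical cutoff for "surprising" tokens
for tok, p in zip(targets[0].tolist(), token_probs[0].tolist()):
    if p < THRESHOLD:
        print(f"potential modification near {tokenizer.decode([tok])!r} (p={p:.5f})")
```

Under this reading, tokens the model finds highly improbable (for instance, an intersection described as having no traffic control) become the candidates for the claimed "one or more potential modifications."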
Regarding claim 13, Xie in view of Levinson and Barut discloses wherein the tokenized description is a tokenized text string representative of the section of the preliminary map data (Xie, para. 0020 “Visual information 120 comprises text that describes what is contained within image 102, such as image tags 122, an initial image caption 124, and object information 126.”), the tokenized text string including a sequence of tokens associated with objects in the section of the preliminary map data (Xie, para. 0020 “In some examples, image tags 122 includes one or more tags identifying objects within image 102, and object information 126 includes additional tags, captions, attributes, and locations for objects within image 102. Visual information 120 also includes visual clues 130 that is based on at least image tags 122, initial image caption 124, and object information 126. Visual clues 130 is a semantic representation of image 102 and comprises semantic components from object and attribute tags to localized detection regions and region captions.”). Regarding claim 14, Xie in view of Levinson and Barut discloses wherein the first language model and the second language model are portions of a single language model (Xie, para. 0025 “In some examples, vision language model 112 and vision language model 150 both comprise a common vision language model.”). Regarding claim 15, Xie in view of Levinson and Barut discloses wherein the processor is comprised in at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system for performing generative AI operations using a large language model (LLM); a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for performing generative operations using a language model (LM); a system for synthetic data generation; a collaborative content creation platform for 3D assets; or a system implemented at least partially using cloud computing resources (Xie, a system for performing generative operations using a language model: Fig. 6 discloses system, Fig. 2, “Generative Language Model 140”). Regarding claim 16, Xie discloses A system (Fig. 6) comprising: one or more processors (Fig. 6, 614) to use a language model (para. 0027 “In some examples, vision language model 150 directly selects selected story caption 154 from among plurality of image story caption candidates 144, whereas in other examples, vision language model 150 scores plurality of image story caption candidates 144 and a down selection component 152 selects selected story caption 154 based on at least the scores from vision language model 150.”) … with respect to at least a section of a map based in part on a tokenized description of at least the section of the map (para. 0019 “Architecture 100 intakes an image 102 and has a vision language model 112, a captioner 114, and an object detector 116. 
Object detector 116 detects a plurality of objects 118 in image 102. Vision language model 112 and captioner 114 produce visual information 120 from image 102 and plurality of objects 118.”). Xie does not specifically disclose update, using the identified one or more modifications, map data of the map to generate updated map data, wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system. Levinson teaches update, using the identified one or more potential modifications (para. 0066 “For example, perception engine 366 may be able to detect and classify external objects as pedestrians, bicyclists, dogs, other vehicles, etc. (e.g., perception engine 366 is configured to classify the objects in accordance with a type of classification, which may be associated with semantic information, including a label). Based on the classification of these external objects, the external objects may be labeled as dynamic objects or static objects. For example, an external object classified as a tree may be labeled as a static object, while an external object classified as a pedestrian may be labeled as a dynamic object. External objects labeled as static may or may not be described in map data. Examples of external objects likely to be labeled as static include traffic cones, cement barriers arranged across a roadway, lane closure signs, newly-placed mailboxes or trash cans adjacent a roadway, etc. Examples of external objects likely to be labeled as dynamic include bicyclists, pedestrians, animals, other vehicles, etc. If the external object is labeled as dynamic, and further data about the external object may indicate a typical level of activity and velocity, as well as behavior patterns associated with the classification type.”; para. 0109 “Classifier 2360 is configured to identify an object and to classify that object by classification type (e.g., as a pedestrian, cyclist, etc.) and by energy/activity (e.g. whether the object is dynamic or static), whereby data representing classification is described by a semantic label.”), the preliminary map data to generate updated map data (para. 0157 “Data change detector 3653 is configured to detect changes in data sets 3655a and 3655b, which are examples of any number of data sets of 3-D map data. Data change detector 3653 also is configured to generate data identifying a portion of map data that has changed, as well as optionally identifying or classifying an object associated with the changed portion of map data. …At time, T2, however, data change detector 3653 may detect that another number of data sets, including data set 3655b, includes data representing the presence of external objects in portions of map data 3665 of 3-D model data 3661, whereby portions of map data 3665 coincide with portions of map data 3664 at different times. Therefore, data change detector 3653 may detect changes in map data, and may further adaptively modify map data to include the changed map data (e.g., as updated map data).”; para. 0159 “As shown, map data 3692 stored map repository 3605a is associated with, or linked to, indication data (“delta data”) 3694 that indicated that an associated portion of map data has changed. Further to the example shown, indication data 3694 may identify a set of traffic cones, as changed portions of map data 3665, disposed in a physical environment associated with 3-D model 3661 through which an autonomous vehicle travels.”; para. 
0160 “A tile generator 3656 may be configured to generate two-dimensional or three-dimensional map tiles based on map data from data sets 3655a and 3655b. The map tiles may be transmitted for storage in map repository 3605a. Tile generator 3656 may generate map tiles that include indicator data for indicating a portion of the map is an updated portion of map data.”), wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system (para. 0160 “Tile generator 3656 may generate map tiles that include indicator data for indicating a portion of the map is an updated portion of map data. Further, an updated map portion may be incorporated into a reference data repository 3605 in an autonomous vehicle…”). Xie and Levinson are considered to be analogous to the claimed invention as they both are in the same field of image processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie to incorporate the teachings of Levinson in order to specifically update, using the one or more potential modifications, the preliminary map data to generate updated map data, wherein the updated map data is formatted for ingestion by a machine-readable map repository used by an autonomous or semi-autonomous navigation system. Doing so would be beneficial, as this would enhance the accuracy of localization functions for autonomous vehicles (Levinson, para. 0146). Xie in view of Levinson does not specifically disclose to use a language model to identify one or more modifications to be performed with respect to at least a section of a map. Barut teaches [to use a language model] (Col. 8 Lines 25-40 “NLU component 160 may employ a number of different natural language understanding strategies in order to understand natural language input data. In various examples, if NLU component 160 is multi-modal, image data 112 (representing the image data corresponding in time with the audio data 102) may be sent to NLU component 160…”) to identify one or more modifications to be performed with respect to at least a section of a map (Col. 9 Lines 57-67 and Col. 10 Lines 1-9 “The selected object data 172 may be sent back to NLU component 160 and/or to some other component of natural language processing system 120 to take an action requested in the input query (e.g., an action requested in the user utterance). For example, if the user query is “Computer, zoom in on the blue chair on the right,” the selected object data 172 may represent a blue chair on the right side of displayed image 182. The image data (e.g., visual feature data and/or pixel data representing the blue chair) may be sent to natural language processing system as entity data related to the input query, for example. Thereafter, the NLU component 160 may determine an intent related to the user query (e.g., a zoom-in intent). Data representing the intent, the entity data representing the blue chair, and/or the image data 112 may be sent to a skill effective to perform the zoom-in action in response to the intent. Accordingly, the skill may zoom in on the blue chair and the modified, zoomed-in image of the blue chair may be displayed on the user's display.”). Xie, Levinson, and Barut are considered to be analogous to the claimed invention as they are in the same field of image processing. 
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Xie in view of Levinson to incorporate the teachings of Barut in order to use a language model to identify one or more potential modifications to perform with respect to preliminary map data. Doing so would be beneficial as this would enable the system to process user commands, aiding the user in requests without the user having to perform independent research (Col. 2 Lines 19-26). Regarding claim 17, Xie in view of Levinson and Barut discloses wherein the one or more processors are further to use a second language model to generate the tokenized description of the section of the map (Xie, para. 0019 “Architecture 100 intakes an image 102 and has a vision language model 112, a captioner 114, and an object detector 116. Object detector 116 detects a plurality of objects 118 in image 102. Vision language model 112 and captioner 114 produce visual information 120 from image 102 and plurality of objects 118.”; para. 0020 “Visual information 120 comprises text that describes what is contained within image 102, such as image tags 122, an initial image caption 124, and object information 126.”; para. 0028 “A baseline graph 304 is generated for an image 302, using either tags for objects automatically detected in image 302”). Regarding claim 18, Xie in view of Levinson and Barut discloses wherein the one or more processors are further to use a third language model (Barut, Col. 13 Lines 9-11 “Accordingly, NLU output data 306 may comprise data representing the selected object selected by the multi-modal transformer 170.”) to generate, based at least on processing the one or more modifications (Col. 14 Lines 28-35 “NLU output data 306 (which may, in some examples, include data representing the object selected by the multi-modal transformer 170) and top K skills 308 may be sent by NLU component 160 to orchestrator 330. Orchestrator 330 may send the top K skills 308 and the NLU output data 306 to routing service 312. Routing service 312 may send the top K skills 308 and NLU output data 306 to skill proposal component 314.”; Fig. 3 Processing steps 314, 316, 320, and 332), one or more textual representations or textual recommendations regarding the one or more modifications (Col. 16 Lines 1-29 “Decider engine 332 may output plan data that comprises a routing plan 334 for processing the input data. …In some examples, the ranking component 320 may determine that two different skills are equally applicable for processing the input data. In such examples, the decider engine 332 may determine that disambiguation should occur. Accordingly, the routing plan 334 may include sending the input data to a dialog skill 352 that may output (via TTS) one or more questions (e.g., a disambiguation request) used to prompt the user to disambiguate between the two equally likely (or approximately equally likely) interpretations of the input data. …Accordingly, the routing plan 334 may route the input data to the dialog skill 352, and the dialog skill 352 may inquire whether the user intended to play the movie or the soundtrack.”; text representation generated for use in text-to-speech in dialog skill). Regarding claim 19, Xie in view of Levinson and Barut discloses wherein at least two of the first language model, the second language model, and the third language model are portions of a single language model (para. 
0025 “In some examples, vision language model 112 and vision language model 150 both comprise a common vision language model.”). Regarding claim 20, Xie in view of Levinson and Barut discloses wherein the system is comprised in at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system for performing generative AI operations using a large language model (LLM); a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for performing generative operations using a language model (LM); a system for synthetic data generation; a collaborative content creation platform for 3D assets; or a system implemented at least partially using cloud computing resources (Xie, a system for performing generative operations using a language model: Fig. 6 discloses system, Fig. 2, “Generative Language Model 140”). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Agrawal et al. (US 2024/0212265 A1): building VR world map data utilizing tokenized description of user speech (Fig. 3); Palanisamy & Mudalige (US 2020/0356828 A1): updating map data with new lane markings, building structures, construction zones, etc. (para. 0052); Soryal & Reid (US 2020/0209852 A1): map instance as lightweight text file, used for detecting discrepancy between generated map instance and received map instances (Fig. 6, para. 0070), generating a new map instance in response to discrepancy (Abstract). Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to CODY DOUGLAS HUTCHESON whose telephone number is (703) 756-1601. The examiner can normally be reached M-F 8:00AM-5:00PM EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. 
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /CODY DOUGLAS HUTCHESON/ Examiner, Art Unit 2659 /PIERRE LOUIS DESIR/ Supervisory Patent Examiner, Art Unit 2659
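To make the repeatedly cited Levinson passages concrete, the sketch below shows one plausible shape for updated map data that carries indicator ("delta") data and is serialized for ingestion by a machine-readable map repository. It is a hypothetical illustration only: the class, field names, and JSON layout are invented for this example and appear in neither Levinson nor the application.

```python
# Illustrative sketch only: one way updated map data with indicator
# ("delta") data might be structured and serialized for a repository.
# All names are hypothetical, not from any cited reference.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class MapTile:
    tile_id: str
    features: list[dict]                 # e.g. lanes, signs, barriers
    updated: bool = False                # indicator that this tile changed
    delta: list[dict] = field(default_factory=list)  # the changed portions

def apply_modification(tile: MapTile, modification: dict) -> MapTile:
    """Apply one potential modification (e.g. a missing stop sign) and
    record it as delta data so consumers can see exactly what changed."""
    tile.features.append(modification)
    tile.delta.append(modification)
    tile.updated = True
    return tile

tile = MapTile(tile_id="z14/2623/6333", features=[{"type": "lane", "id": "L1"}])
tile = apply_modification(tile, {"type": "stop_sign", "at": "L1/end"})

# Serialized form a map repository could ingest.
print(json.dumps(asdict(tile), indent=2))
```

Recording the delta alongside the full feature set mirrors the cited tile-generator idea: a downstream consumer can re-validate only the changed portions of a tile rather than re-ingesting the whole map.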

Prosecution Timeline

Sep 26, 2023
Application Filed
Aug 13, 2025
Non-Final Rejection — §101, §103, §112
Nov 12, 2025
Applicant Interview (Telephonic)
Nov 12, 2025
Examiner Interview Summary
Dec 09, 2025
Response Filed
Feb 13, 2026
Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603096
VOICE ENHANCEMENT METHODS AND SYSTEMS
2y 5m to grant Granted Apr 14, 2026
Patent 12591750
GENERATIVE LANGUAGE MODEL UNLEARNING
2y 5m to grant Granted Mar 31, 2026
Patent 12579447
TECHNIQUES FOR TWO-STAGE ENTITY-AWARE DATA AUGMENTATION
2y 5m to grant Granted Mar 17, 2026
Patent 12537018
METHOD AND SYSTEM FOR PREDICTING A MENTAL CONDITION OF A SPEAKER
2y 5m to grant Granted Jan 27, 2026
Patent 12530529
DOMAIN-SPECIFIC NAMED ENTITY RECOGNITION VIA GRAPH NEURAL NETWORKS
2y 5m to grant Granted Jan 20, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
62%
Grant Probability
99%
With Interview (+47.1%)
2y 10m
Median Time to Grant
Moderate
PTA Risk
Based on 24 resolved cases by this examiner. Grant probability derived from career allow rate.
