Last updated: May 29, 2026
Application No. 18/385,496
METHOD, DEVICE, SYSTEM AND COMPUTER READABLE STORAGE MEDIUM FOR LOCATING VEHICLES

Non-Final OA §101 §103
Filed: Oct 31, 2023
Priority: Oct 31, 2022 (CN 202211346634.3)
Examiner: AWORUNSE, OLUWABUSAYO ADEBANJO
Art Unit: 3662
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: Volvo Car Corporation
OA Round: 3 (Non-Final)
Interview Optional

Interview lift (+0.0%) is below the 15.0% threshold. A written response is recommended.
Based on 4 resolved cases, 2023–2026
Examiner Intelligence

AWORUNSE, OLUWABUSAYO ADEBANJO
Career Allowance Rate: 0% (0 granted / 4 resolved), -52.0% vs TC avg
Interview Lift: +0.0% (minimal), comparing resolved cases with and without an interview
Avg Prosecution (typical timeline): 3y 1m
27 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101: 6.3% (-33.7% vs TC avg)
§103: 87.3% (+47.3% vs TC avg)
§102: 3.8% (-36.2% vs TC avg)
§112: 2.5% (-37.5% vs TC avg)
Based on career data from 4 resolved cases
Office Action

§101 §103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, 9, 10, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Luo (US 20190065863 A1) in view of Flint (US 20160350926 A1).

Regarding Claim 1,
Luo teaches or renders obvious:
A method for vehicle positioning, comprising:
See at least: “Embodiments of the present disclosure provide a method of localization for a non-transitory computer readable storage medium storing one or more programs...” ([0010]) and “The field of the disclosure is in general related to autonomous vehicles and, in particular, to a system and a method for localization using a camera-based reconstructed submap and a LiDAR-based global map” ([0002]).
Rationale: Luo expressly discloses a localization method for an autonomous vehicle using camera images and LiDAR data. One of ordinary skill in the art would have understood such localization to constitute vehicle positioning under the broadest reasonable interpretation of the claim language.
before a vehicle enters a scenario where the vehicle is located, comprising:
See at least: “before computing matching score, the method further comprises: constructing at least one 3D submap; and constructing a global map” ([0012]) and “In operation 13, a three-dimensional (3D) submap and a global map are constructed” ([0069]).
Rationale: Although Luo does not expressly recite that the vehicle obtains the map before entering the scenario, Luo does disclose that the maps used for localization are prepared before matching and position estimation are performed. Thus, it would have been understood by one of ordinary skill in the art that the localization map is made available in advance of vehicle use within the mapped environment. Accordingly, this temporal aspect is at least implicit in Luo, or would have been obvious therefrom.
obtaining from a cloud network by the vehicle and storing in a memory of the vehicle a previously constructed fused map of the scenario
See at least: “the system comprises an internet server, comprising: an I/O port, configured to transmit and receive electrical signals to and from a client device; a memory...” ([0023]), “the computer server 62 is configured to utilize the I/O port 65 communicate with external devices via a network 68, such as a wireless network” ([0102]), “the system 60 includes ... a memory 69” ([0099]), and “constructing at least one 3D submap; and constructing a global map” ([0012]).
Rationale: Luo does not expressly disclose the precise implementation in which the vehicle downloads a previously constructed fused map from a cloud network and stores that map in vehicle memory. However, Luo does disclose a networked server/client architecture, memory within the localization system, and the use of previously constructed map data for localization. In view of these teachings, it would have been obvious to one of ordinary skill in the art to provide the previously prepared map data to the vehicle via the disclosed network architecture and to store that data locally in memory for use during onboard localization. Such an implementation amounts to no more than the predictable use of known server-client data distribution and local caching techniques to support real-time localization with reduced latency and improved operational efficiency.
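The server-to-vehicle distribution and local storage discussed in the rationale above can be illustrated with a minimal sketch. It is offered only as a reading aid and is not drawn from Luo or from the claims: the class `VehicleMapStore`, the `fetch` callable, and the scenario identifier are hypothetical names, and the network transfer is replaced by a stub function.

```python
from typing import Callable, Dict


class VehicleMapStore:
    """Hypothetical on-vehicle store that keeps downloaded maps in local memory."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch          # stand-in for a network download (assumption)
        self._memory: Dict[str, bytes] = {}
        self.fetch_count = 0         # counts how often the network stub was called

    def prefetch(self, scenario_id: str) -> None:
        """Download the map before the vehicle enters the scenario, once only."""
        if scenario_id not in self._memory:
            self._memory[scenario_id] = self._fetch(scenario_id)
            self.fetch_count += 1

    def load(self, scenario_id: str) -> bytes:
        """Read the map from local memory; no network access happens here."""
        return self._memory[scenario_id]


def fake_cloud_fetch(scenario_id: str) -> bytes:
    # Stub standing in for a server response; a real system would make a network call.
    return b"fused-map:" + scenario_id.encode("utf-8")


store = VehicleMapStore(fake_cloud_fetch)
store.prefetch("garage-A")   # performed before entry into the scenario
store.prefetch("garage-A")   # repeated request is served from memory
print(store.load("garage-A"), store.fetch_count)
```

In this sketch the only network interaction occurs in `prefetch`, so localization code that later calls `load` reads purely from local memory, which is the latency benefit the rationale relies on.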
wherein the fused map includes a point cloud basemap and a vector map describing the scenario
See at least: “constructing a city-scale 3D map based on the data from the LiDAR, using LiDAR mapping” ([0014]), “the global map is constructed, based on data from the LiDAR, using LiDAR mapping. The global map includes a 3D city-scale map” ([0069]), and “the structured features include at least one of planes, straight lines and curved lines, and the unstructured features include sparse 3D points” ([0017]); see also “The extracted features include structured features such as planes, straight lines and curved lines, and unstructured features such as sparse 3D points” ([0082]).
Rationale: Luo expressly discloses a LiDAR-derived 3D global map and sparse 3D point features, which reasonably correspond to the claimed point cloud basemap. Luo further discloses structured geometric features, such as planes, straight lines, and curved lines, which one of ordinary skill in the art would have recognized as vector-like descriptors of the mapped environment. Because Luo employs both the LiDAR-derived map data and the structured feature representation together for localization, it would have been obvious to treat these complementary map representations as a fused mapping framework describing the scenario. The claimed fused map, therefore, is at least rendered obvious by Luo’s disclosed dual-map architecture and its known use for a common positioning purpose.
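For orientation, the two-layer arrangement at issue (a point cloud basemap plus a vector map describing the same scenario) might be represented along the following lines. This is a sketch under assumptions, not a structure disclosed by Luo or recited in the claims: `FusedMap`, `VectorElement`, and all field names are hypothetical, and both layers are assumed to share one coordinate frame.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class VectorElement:
    """Hypothetical vector-map primitive: a labeled polyline such as a lane edge."""
    label: str
    polyline: List[Point3D]


@dataclass
class FusedMap:
    """Hypothetical container pairing both layers in one shared coordinate frame."""
    scenario_id: str
    point_cloud_basemap: List[Point3D] = field(default_factory=list)
    vector_map: List[VectorElement] = field(default_factory=list)

    def summary(self) -> str:
        return (f"{self.scenario_id}: {len(self.point_cloud_basemap)} points, "
                f"{len(self.vector_map)} vector elements")


fused = FusedMap(
    scenario_id="garage-A",
    point_cloud_basemap=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.5)],
    vector_map=[VectorElement("lane_edge", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])],
)
print(fused.summary())
```

Keeping both layers in one object reflects the point debated in the rationale: whether two complementary representations used for a common localization purpose amount to a single fused map.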
capturing at least one image frame of a surrounding environment of the vehicle within the scenario through a camera unit and extracting a plurality of feature points from the at least one image frame
See at least: “constructing at least one 3D submap comprises: obtaining images from a camera; and constructing at least one 3D submap based on the images, using visual SLAM” ([0013]), “images of the environment are captured by the camera in approximately 30 Hz” ([0068]), and “the unstructured features include sparse 3D points” ([0017]); see also “The extracted features include ... unstructured features such as sparse 3D points” ([0082]).
Rationale: Luo expressly teaches capturing camera images of the surrounding environment and extracting features, including sparse 3D points, as part of the map construction and localization process. Since the 3D submap is expressly built from the captured camera images, one of ordinary skill in the art would have understood the extracted feature points to be derived from the image frames obtained by the camera unit.
performing a matching of the plurality of feature points with point cloud data in the point cloud basemap to determine a position of the vehicle within the vector map according to a result of the matching
See at least: “computing, in response to features from a 3D submap and features from a global map, matching score between corresponding features of a same class between the 3D submap and the global map...” ([0010]), “the features extracted from the 3D submap are matched against the features extracted from the global map” ([0071]), “location of the 3D submap is iteratively estimated” ([0072]), “for features classified in a same class, a matching score ... is computed based on the distribution of 3D points” ([0085]), and “coordinate of the 3D submap is transformed to coordinate of the global map” ([0091]).
Rationale: Luo expressly teaches matching features derived from the camera-based 3D submap against features of the LiDAR-based global map and determining location based on the result of that matching process. The LiDAR-based global map and sparse 3D point disclosures reasonably correspond to the claimed point cloud basemap under the broadest reasonable interpretation. To the extent the claim recites position determination within a vector map, that aspect is rendered obvious for the reasons set forth above, namely, Luo’s use of structured geometric map descriptors in conjunction with the LiDAR-derived map data for the same localization function.
Luo teaches or renders obvious the core camera/LiDAR localization workflow of Claim 1. The pre-entry timing, cloud-acquisition, vehicle-memory storage of the downloaded map, fused-map characterization, and vector-map characterization are properly understood as obvious implementations or characterizations arising from Luo’s disclosed server/network/map architecture and dual-map localization framework.
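The matching step discussed above (image-derived feature points compared against point cloud data to yield a position) can be pictured with a deliberately simplified sketch. It does not reproduce Luo's class-based matching score; it substitutes a plain nearest-neighbor pairing followed by a translation-only averaging step, and the function names and sample coordinates are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def nearest(p: Point3D, cloud: List[Point3D]) -> Point3D:
    """Return the map point closest to p (brute-force search)."""
    return min(cloud, key=lambda q: math.dist(p, q))


def estimate_translation(features: List[Point3D],
                         cloud: List[Point3D]) -> Tuple[float, ...]:
    """Pair each feature with its nearest map point and average the offsets.

    The result is the shift that moves the features onto the map, i.e. a
    translation-only correction of the assumed vehicle position.
    """
    if not features or not cloud:
        raise ValueError("features and cloud must be non-empty")
    pairs = [(f, nearest(f, cloud)) for f in features]
    n = len(pairs)
    return tuple(sum(m[i] - f[i] for f, m in pairs) / n for i in range(3))


# Hypothetical basemap points and the same points as observed with a small offset.
cloud = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
features = [(-0.5, 0.25, 0.0), (9.5, 0.25, 0.0), (-0.5, 10.25, 0.0)]
shift = estimate_translation(features, cloud)
print(shift)
```

A full implementation would iterate this step and also estimate rotation, consistent with Luo's statement that location is “iteratively estimated”; the single pass here only shows how a matching result translates into a position correction.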




Flint teaches:
and measuring a relative displacement of the vehicle within the scenario through an inertial measurement unit and updating the position of the vehicle within the vector map according to the relative displacement.
See at least: “The device 100 also includes an inertial measurement unit 104. The IMU 104 may include several electronic hardware components, including a tri-axial gyroscope and accelerometer, for recording inertial data...” ([0030]), “Images recorded by the image sensor 102 and processed by the module 106 may also be referred to as ‘frames’” ([0031]), “The propagation algorithm takes a sequence of inertial readings between a first time and a second time ... and produces an estimate for the inertial state of the device at the second time” ([0036]), and “During recording of the images and inertial data, the visual-based inertial navigation system calculates and stores estimates of the state of the computing device relative to a starting point, a process commonly referred to as ‘dead reckoning’” ([0029]).
Rationale: Flint expressly discloses an inertial measurement unit that records inertial data and further discloses propagating device state over time based on inertial readings, including dead reckoning. One of ordinary skill in the art would have understood such state propagation to include relative displacement-based updating of position. Accordingly, Flint teaches the claimed IMU-based relative displacement measurement and position updating functionality. To the extent the claim places that updating within the previously discussed map context, that context is supplied by Luo.
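The dead-reckoning concept relied on above can be sketched as follows. This is not Flint's propagation algorithm; it is a simple planar Euler integration written for illustration, and the state layout, the `ImuReading` fields, and the sample values are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ImuReading:
    """Hypothetical planar IMU sample."""
    accel_forward: float  # m/s^2 along the vehicle heading
    yaw_rate: float       # rad/s about the vertical axis
    dt: float             # seconds since the previous sample


def relative_displacement(readings: Iterable[ImuReading],
                          heading: float = 0.0,
                          speed: float = 0.0) -> Tuple[float, float]:
    """Integrate IMU samples into an (x, y) displacement from the start point."""
    x = y = 0.0
    for r in readings:
        heading += r.yaw_rate * r.dt          # gyroscope updates the heading
        speed += r.accel_forward * r.dt       # accelerometer updates the speed
        x += speed * math.cos(heading) * r.dt
        y += speed * math.sin(heading) * r.dt
    return x, y


def update_position(position: Tuple[float, float],
                    displacement: Tuple[float, float]) -> Tuple[float, float]:
    """Shift the last known map position by the measured relative displacement."""
    return position[0] + displacement[0], position[1] + displacement[1]


# Ten samples of constant 2 m/s^2 forward acceleration, driving straight ahead.
samples = [ImuReading(accel_forward=2.0, yaw_rate=0.0, dt=0.1) for _ in range(10)]
dx, dy = relative_displacement(samples)
new_position = update_position((100.0, 50.0), (dx, dy))
print(dx, dy, new_position)
```

The displacement is relative to the starting point only, which is why the Office Action looks to Luo for the map context in which that displacement is applied.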

Motivation to Combine Luo and Flint
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo and Flint before them, to modify Luo’s camera/LiDAR map-based localization process by incorporating Flint’s IMU-based propagation and dead-reckoning update so that Luo’s position estimate could be advanced between successive map-matching updates, because Luo already employs an inertial navigation module within its sensing and localization framework, and Flint teaches a known and compatible technique for deriving short-interval positional change from IMU readings. One of ordinary skill in the art would have looked to Flint to improve continuity, robustness, and update rate in Luo’s localization process, with the predictable result of smoother and more reliable vehicle positioning. Further, such modification would merely apply a known inertial-update technique to a known map-based localization system, would not alter Luo’s basic principle of operation, and would not render either reference unsatisfactory for its intended purpose. After combining the teachings of Luo and Flint, Claim 1 is rendered obvious by the combination of Luo in view of Flint. In particular, Luo teaches the camera/LiDAR map-based localization workflow, while Flint teaches IMU-based relative displacement measurement and position updating. The remaining aspects concerning pre-entry availability of map data, network-based provision, local memory storage, and characterization of the mapping framework as fused point-cloud and vector-map data would have been obvious to one of ordinary skill in the art in view of Luo’s disclosed architecture and the predictable implementation choices available at the time of the invention.

Regarding Claim 2,
The combination of Luo and Flint establishes the method of Claim 1, which is the basis for Claim 2.
Luo teaches or renders obvious:
wherein
See at least: “The one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform...” ([0010]).
Rationale: Luo discloses a method implemented through a series of executed operations. Thus, the recited “wherein” clause is properly understood as introducing additional conditions on how the previously recited localization method is carried out.
determining the position of the vehicle
See at least: “vehicle poses, including position and orientation, are collected in an ‘east north up’ (ENU) coordinate by the inertial navigation module in approximately 50 Hz” ([0068]) and “location of the 3D submap is iteratively estimated until a distance between corresponding features is minimized” ([0072]).
Rationale: Luo expressly discloses determining vehicle-related position/location within the localization workflow. In particular, Luo discloses estimation of the location of the 3D submap and collection of vehicle poses, thereby supporting the claimed position-determination aspect.
according to the result of the matching
See at least: “the features extracted from the 3D submap are matched against the features extracted from the global map” ([0071]) and “location of the 3D submap is iteratively estimated until a distance between corresponding features is minimized” ([0072]).
Rationale: Luo expressly discloses that position determination follows from the feature-matching process. Thus, Luo teaches determining position according to the result of matching.
is executed at a first frequency,
See at least: “images of the environment are captured by the camera in approximately 30 Hz. LiDAR scans are collected in the form of a sweep in approximately 20 Hz” ([0068]).
Rationale: Luo does not expressly use the phrase “first frequency” for the matching-based position determination. However, Luo expressly discloses the operative frequencies of the sensed data used in the matching/localization workflow, namely camera images at approximately 30 Hz and LiDAR scans at approximately 20 Hz. In view of these teachings, one of ordinary skill in the art would have understood the matching-based determination of position to be executed at a frequency associated with, and practically bounded by, the incoming image/LiDAR data used for the matching process. Accordingly, Luo at least renders obvious that such matching-based position determination is executed at a first operating frequency.

Luo teaches or renders obvious the matching-based position-determination portion of Claim 2, including that such determination is performed at an operating frequency associated with the image/LiDAR localization pipeline. The more specific comparative relationship between that frequency and a higher inertial-update frequency is more directly shown by Flint.

Flint teaches:
and updating the position of the vehicle
See at least: “The propagation algorithm takes a sequence of inertial readings between a first time and a second time together with the inertial state of the device at the first time and produces an estimate for the inertial state of the device at the second time” ([0036]) and “Using features tracked in the images and corresponding information obtained from an IMU also operating on the device, the SWF obtains estimates for the inertial state of the device (e.g., position, orientation, velocity...)” ([0028]).
Rationale: Flint expressly discloses updating device position/state through IMU-based propagation between successive times. Thus, Flint teaches the claimed updating of position.
according to the relative displacement
See at least: “During recording of the images and inertial data, the visual-based inertial navigation system calculates and stores estimates of the state of the computing device relative to a starting point, a process commonly referred to as ‘dead reckoning’” ([0029]) and “The IMU 104 may include ... a tri-axial gyroscope and accelerometer, for recording inertial data...” ([0030]).
Rationale: Flint expressly discloses dead reckoning based on inertial measurements, which one of ordinary skill in the art would have understood to constitute updating position according to relative displacement.
is executed at a second frequency,
See at least: “the module 106 receives the inertial readings from the IMU 104 and identifies readings that occur at or close to the time at which each keyframe was captured ... and containing all the inertial readings occurring between those endpoints” ([0034]-[0035]) and “The propagation algorithm takes a sequence of inertial readings between a first time and a second time ... and produces an estimate for the inertial state of the device at the second time” ([0036]).
Rationale: Flint expressly discloses repeated inertial-readings-based propagation between image keyframes. Thus, Flint teaches that the inertial-update process is executed at its own operating frequency driven by the sequence of IMU readings, i.e., a second frequency.
wherein the first frequency is lower than the second frequency.
See at least: “the pre-processing module 106 also is configured to select a subset of the frames received from the image sensor 102 as ‘keyframes’” ([0032]) and “the module 106 receives the inertial readings from the IMU 104 ... and containing all the inertial readings occurring between those endpoints.” ([0034]); see also “The propagation algorithm takes a sequence of inertial readings between a first time and a second time ... and produces an estimate...” ([0036]).
See also Luo: “images of the environment are captured by the camera in approximately 30 Hz. LiDAR scans are collected in the form of a sweep in approximately 20 Hz. Vehicle poses, including position and orientation, are collected ... by the inertial navigation module in approximately 50 Hz.” ([0068]).
Rationale: Flint expressly teaches that inertial readings are processed between image keyframes, thereby supporting a higher-rate inertial-update process relative to the lower-rate image/keyframe-based processing. Flint alone does not provide the specific numerical comparison. Luo, however, expressly discloses numerical rates showing camera images at approximately 30 Hz, LiDAR sweeps at approximately 20 Hz, and inertial-navigation-module vehicle poses at approximately 50 Hz. In view of these combined teachings, one of ordinary skill in the art would have understood that the matching-based position determination associated with the camera/LiDAR localization pipeline is executed at a lower frequency than the inertial relative-displacement update, which is executed at the higher IMU-driven frequency. Accordingly, the claimed relationship that the first frequency is lower than the second frequency is rendered obvious by the combination of Luo and Flint.
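The keyframe-interval grouping quoted from Flint ([0034]-[0035]) can be illustrated with a short sketch. The function name, the half-open interval convention, and the integer-millisecond timestamps are assumptions chosen for the example; Flint's own handling of readings “at or close to” a keyframe time is not reproduced here.

```python
from bisect import bisect_left
from typing import List


def group_between_keyframes(keyframe_times: List[int],
                            imu_times: List[int]) -> List[List[int]]:
    """Return, for each interval [k_i, k_(i+1)), the IMU timestamps inside it.

    Both input lists are assumed to be sorted in ascending order.
    """
    groups = []
    for start, end in zip(keyframe_times, keyframe_times[1:]):
        lo = bisect_left(imu_times, start)   # first reading at or after start
        hi = bisect_left(imu_times, end)     # first reading at or after end
        groups.append(imu_times[lo:hi])
    return groups


# Hypothetical timeline in milliseconds: keyframes every 100 ms, IMU every 20 ms.
keyframes = [0, 100, 200]
imu = list(range(0, 201, 20))
groups = group_between_keyframes(keyframes, imu)
print(groups)
```

With these sample rates each keyframe interval contains five inertial readings, which shows in miniature why propagation between keyframes implies a higher inertial rate than the keyframe rate.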

Motivation to Combine Luo and Flint
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo and Flint before them, to modify Luo’s map-matching-based vehicle-localization process so that the matching-based position determination is performed at the lower camera/LiDAR processing frequency while the relative-displacement-based position updating is performed at the higher inertial-measurement frequency taught by Flint, because Luo already discloses a multi-sensor localization architecture including camera, LiDAR, and inertial navigation inputs, and Flint teaches the known and compatible practice of propagating position/state using IMU readings between successive image/keyframe updates. One of ordinary skill in the art would have made such modification to improve responsiveness, continuity, and smoothness of vehicle localization between lower-rate map-matching determinations, with the predictable benefit of providing higher-rate pose updating while preserving the accuracy advantages of the lower-rate feature-matching correction. Such a modification would merely apply a known higher-rate inertial-update technique to a known lower-rate map-matching localization framework and would not alter the principle of operation of either reference or render either reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo and Flint, Claim 2 is rendered obvious by the combination of Luo in view of Flint. In particular, Luo teaches the matching-based position-determination portion of the claimed method and discloses the operative camera/LiDAR and inertial frequencies, while Flint teaches inertial-readings-based position propagation and updating between image/keyframe events.
The claimed frequency relationship, namely that the first frequency is lower than the second frequency, would have been understood by one of ordinary skill in the art from Luo’s disclosed rates and Flint’s higher-rate IMU propagation framework.
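The two-rate arrangement addressed in Claim 2 can be pictured with a small simulation. It is a sketch under stated assumptions rather than either reference's method: the 50 Hz inertial rate and the 20 Hz LiDAR rate are the figures quoted from Luo ([0068]), but using 20 Hz as the matching rate, the one-dimensional motion model, and the bias value are choices made only for this example.

```python
from typing import Tuple

IMU_PERIOD_MS = 20     # 50 Hz, inertial pose rate quoted from Luo [0068]
MATCH_PERIOD_MS = 50   # 20 Hz, LiDAR sweep rate quoted from Luo [0068],
                       # used here as an assumed rate for map matching


def simulate(duration_ms: int = 1000,
             speed: float = 10.0,
             imu_bias: float = 0.5) -> Tuple[float, int, int]:
    """Run a 1-D vehicle for duration_ms and count both kinds of update.

    Returns (final position estimate in metres, IMU updates, matching fixes).
    """
    estimate = 0.0
    imu_updates = 0
    match_fixes = 0
    for t in range(1, duration_ms + 1):      # integer clock avoids float drift
        if t % IMU_PERIOD_MS == 0:
            # Second (higher) frequency: add a biased relative displacement.
            estimate += (speed + imu_bias) * IMU_PERIOD_MS / 1000.0
            imu_updates += 1
        if t % MATCH_PERIOD_MS == 0:
            # First (lower) frequency: a matching result replaces the drifting
            # estimate with an absolute position (assumed exact here).
            estimate = speed * t / 1000.0
            match_fixes += 1
    return estimate, imu_updates, match_fixes


result = simulate()
print(result)
```

Over one simulated second the inertial branch runs 50 times and the matching branch 20 times, so the first frequency is lower than the second, and each matching fix removes the drift accumulated by the biased inertial updates since the previous fix.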
Regarding Claim 9,
Luo renders obvious:
A device for vehicle positioning,
See at least: “FIG. 6 is a block diagram of a system 60 for localization, in accordance with some embodiments.” ([0098]) and “The field of the disclosure is in general related to autonomous vehicles and, in particular, to a system and a method for localization using a camera-based reconstructed submap and a LiDAR-based global map.” ([0002]).
Rationale: Luo expressly discloses a localization system/device for an autonomous-vehicle positioning context. One of ordinary skill in the art would have understood such a localization system to constitute a device for vehicle positioning under the broadest reasonable interpretation of the claim language.
comprising:
See at least: “Referring to FIG. 6, the system 60 includes a processor 61, a computer server 62, a network interface 63, an input and output (I/O) device 65, a storage device 67, a memory 69, and a bus or network 68.” ([0099]).
Rationale: Luo expressly discloses a device/system including multiple recited components, thereby satisfying the open-ended transitional phrase.
a memory having stored computer instructions thereon;
See at least: “the system 60 includes ... a memory 69” ([0099]) and “The processor 61 is configured to execute program instructions that include a tool module configured to perform a method as described and illustrated with reference to FIGS. 1 to 5.” ([0104]).
Rationale: Luo expressly discloses a memory and expressly discloses program instructions executed by the processor to perform the disclosed localization method. One of ordinary skill in the art would have understood the program instructions to be stored in the disclosed system memory for processor execution.
and
See at least: “the system 60 includes a processor 61 ... and a memory 69...” ([0099]).
Rationale: Luo expressly discloses the conjunction of the recited device components.
a processor,
See at least: “the system 60 includes a processor 61...” ([0099]) and “the processor 61 is configured to enable the computer server 62 ... to perform specific operations disclosed herein.” ([0100]).
Rationale: Luo expressly discloses the claimed processor.
wherein the instructions, when executed by the processor, cause the processor to perform a method for vehicle positioning,
See at least: “The processor 61 is configured to execute program instructions that include a tool module configured to perform a method as described and illustrated with reference to FIGS. 1 to 5.” ([0104]) and “Embodiments of the present disclosure provide a method of localization...” ([0010]).
Rationale: Luo expressly discloses the processor executing program instructions to perform the disclosed localization method. As discussed above, localization of the vehicle relative to the environment/map constitutes vehicle positioning under a broad but reasonable reading of the claim.
the method comprising:
See at least: “the tool module is configured to execute the operations including: performing data alignment, analyzing data collected in an environment using sensors including a camera, a LiDAR and an inertial navigation module, constructing at least one 3D submap and a global map, extracting features from the 3D submap and the global map, matching features extracted from the 3D submap against those from the global map, refining feature correspondence and refining the 3D submap.” ([0104]).
Rationale: Luo expressly discloses that the processor-executed instructions carry out a sequence of localization-method operations, thereby satisfying the recited method wrapper.
before a vehicle enters a scenario where the vehicle is located,
See at least: “before computing matching score, the method further comprises: constructing at least one 3D submap; and constructing a global map.” ([0012]) and “In operation 13, a three-dimensional (3D) submap and a global map are constructed.” ([0069]).
Rationale: Although Luo does not expressly recite that the vehicle obtains the map before entering the scenario, Luo does disclose that the maps used for localization are prepared before matching and location estimation are performed. Thus, one of ordinary skill in the art would have understood that the localization map is made available in advance of vehicle use within the mapped environment. Accordingly, this temporal aspect is at least implicit in Luo, or would have been obvious therefrom.
obtaining from a cloud network by the vehicle
See at least: “the system comprises an internet server...” ([0023]) and “the computer server 62 is configured to utilize the I/O port 65 communicate with external devices via a network 68, such as a wireless network ... the internet server 62 is configured to utilize the I/O port 65 to wirelessly communicate with a client device 64...” ([0102]).
Rationale: Luo does not expressly disclose the precise implementation in which the vehicle obtains the previously constructed map from a cloud network. However, Luo does disclose an internet server, network communication, and client-device communication over a wireless network. In view of these teachings, it would have been obvious to one of ordinary skill in the art to provide the previously prepared localization map data to the vehicle through the disclosed network/server architecture as a predictable deployment choice for enabling onboard localization.
and storing in a memory of the vehicle
See at least: “the system 60 includes ... a memory 69...” (Luo, [0099]) and “The positions may be used to compute the path 10, which is stored in memory of the device...” (Flint, [0029]).
Rationale: Luo expressly discloses memory within the localization system, and Flint expressly discloses position/path information stored in device memory. Thus, in the combined teachings, one of ordinary skill in the art would have found it obvious to store localization-related map and position information in device memory for use during execution of the positioning method.
a previously constructed fused map of the scenario,
See at least: “constructing at least one 3D submap; and constructing a global map.” ([0012]) and “In operation 13, a three-dimensional (3D) submap and a global map are constructed. In an embodiment, the 3D submap is constructed, based on images from the camera, using visual SLAM ... the global map is constructed, based on data from the LiDAR, using LiDAR mapping.” ([0069]).
Rationale: Luo expressly discloses a previously constructed 3D submap and a previously constructed global map used together in the localization process. Luo does not expressly use the phrase “fused map.” However, one of ordinary skill in the art would have understood that jointly using these complementary map representations for a common localization purpose amounts to a fused mapping framework, or at least would have found such characterization obvious in view of Luo’s disclosed dual-map architecture.
wherein the fused map includes a point cloud basemap
See at least: “the global map is constructed, based on data from the LiDAR, using LiDAR mapping. The global map includes a 3D city-scale map.” ([0069]) and “the unstructured features may include sparse 3D points.” ([0070]).
Rationale: Luo expressly discloses a LiDAR-derived 3D global map and sparse 3D points. One of ordinary skill in the art would have understood such LiDAR-derived 3D map data to correspond to a point-cloud-type basemap under the broadest reasonable interpretation of the claim language.
and a vector map describing the scenario,
See at least: “the structured features may include, for example, planes, straight lines and curved lines...” ([0070]) and “The extracted features include structured features such as planes, straight lines and curved lines...” ([0082]).
Rationale: Luo does not expressly recite a “vector map.” However, Luo expressly discloses structured geometric features describing the environment, such as planes, straight lines, and curved lines. One of ordinary skill in the art would have recognized such structured geometric descriptors as vector-like representations of the mapped scenario. Accordingly, this aspect is at least rendered obvious by Luo’s structured-feature map representation.
wherein the fused map includes a point cloud basemap of the scenario
See at least: “the global map includes a 3D city-scale map.” ([0069]) and “the unstructured features may include sparse 3D points.” ([0070]).
Rationale: For the reasons discussed above, Luo’s LiDAR-derived global map and sparse 3D point disclosure reasonably correspond to a point cloud basemap of the scenario.
and a vector map describing the scenario;
See at least: “the structured features may include, for example, planes, straight lines and curved lines...” ([0070]) and “extracting structured features and unstructured features from the 3D submap and the 3D global map...” ([0106]).
Rationale: For the reasons discussed above, Luo’s structured-feature representation of the environment would have been understood by one of ordinary skill in the art as, or at least obvious as, a vector-like description of the scenario.
capturing at least one image frame of a surrounding environment of the vehicle within the scenario through a camera unit
See at least: “images of the environment are captured by the camera in approximately 30 Hz.” ([0068]) and “Images recorded by the image sensor 102 and processed by the module 106 may also be referred to as ‘frames.’” ([0031]).
Rationale: Luo expressly discloses capturing images of the environment through a camera, and Flint expressly identifies recorded images as frames. Thus, in the combined teachings, the claimed image-frame capture through a camera unit is expressly taught.
and extracting a plurality of feature points from the at least one image frame;
See at least: “features from the 3D submap and the global map are extracted ... the unstructured features may include sparse 3D points.” ([0070]) and “the pre-processing module 106 performs feature tracking within the recorded frames...” ([0031]).
Rationale: Luo expressly discloses feature extraction, including sparse 3D points, from the submap/global-map pipeline, while Flint expressly discloses feature tracking within recorded frames. One of ordinary skill in the art would have understood the extracted feature points to be derived from the image frames used to construct and process the submap.
performing a matching of the plurality of feature points with point cloud data in the point cloud basemap
See at least: “the features extracted from the 3D submap are matched against the features extracted from the global map...” ([0071]) and “for features classified in a same class, correspondence of corresponding features between the 3D submap and the global map is established.” ([0095]).
Rationale: Luo expressly discloses matching features extracted from the 3D submap against features extracted from the LiDAR-derived global map. In view of Luo’s disclosure of sparse 3D points in the global map, one of ordinary skill in the art would have understood this to correspond to matching feature points against point-cloud-type map data.
to determine a position of the vehicle within the vector map
See at least: “location of the 3D submap is iteratively estimated...” ([0072]) and “coordinate of the 3D submap is transformed to coordinate of the global map.” ([0091]).
Rationale: Luo expressly discloses determining location from the submap/global-map relationship. To the extent the claim recites determining position within a vector map, that map-context aspect is rendered obvious by Luo’s structured-feature map representation, as discussed above.
according to a result of the matching; and
See at least: “the features extracted from the 3D submap are matched against the features extracted from the global map...” ([0071]) and “location of the 3D submap is iteratively estimated until a distance between corresponding features is minimized.” ([0072]).
Rationale: Luo expressly discloses that the location estimation follows from the matching process and its resulting feature correspondences. Accordingly, the position determination according to the result of the matching is expressly taught.

Luo discloses or renders obvious the processor/memory device framework and the core camera/LiDAR map-based localization workflow of Claim 9. The pre-entry timing, cloud-acquisition, local memory storage of downloaded map content, fused-map characterization, and vector-map characterization are better understood as obvious implementations or characterizations arising from Luo’s disclosed server/network/map architecture and dual-map localization framework.
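
For illustration only, the kind of workflow described in the cited passages (matching submap features against global-map features, then iteratively estimating location until the distance between corresponding features is minimized) can be sketched in Python as follows. This is not Luo’s algorithm. It is a minimal sketch that assumes 2D points, a translation-only offset, and nearest-neighbor correspondences; the names `nearest` and `estimate_offset` and the sample coordinates are hypothetical.

```python
import math

def nearest(point, candidates):
    """Return the candidate closest to point (Euclidean distance)."""
    return min(candidates, key=lambda c: math.dist(point, c))

def estimate_offset(submap_pts, global_pts, iterations=20, tol=1e-9):
    """Iteratively estimate the (dx, dy) shift aligning submap_pts to global_pts.

    Assumptions (not from the references): translation only, 2D points,
    nearest-neighbor correspondences.
    """
    dx, dy = 0.0, 0.0
    for _ in range(iterations):
        shifted = [(x + dx, y + dy) for x, y in submap_pts]
        # Establish correspondences between shifted submap points and map points.
        pairs = [(p, nearest(p, global_pts)) for p in shifted]
        # Mean residual between each shifted point and its matched map point.
        step_x = sum(q[0] - p[0] for p, q in pairs) / len(pairs)
        step_y = sum(q[1] - p[1] for p, q in pairs) / len(pairs)
        dx += step_x
        dy += step_y
        # Stop once the correction no longer changes the estimate.
        if math.hypot(step_x, step_y) < tol:
            break
    return dx, dy

# Hypothetical data: four map points, and the same points seen with an unknown shift.
global_pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
true_shift = (1.0, -0.5)
submap_pts = [(x - true_shift[0], y - true_shift[1]) for x, y in global_pts]
offset = estimate_offset(submap_pts, global_pts)
```

In this toy case the estimated offset recovers the shift that was applied to the submap points, which stands in for locating the submap within the global map.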

Flint discloses:
measuring a relative displacement of the vehicle within the scenario through an inertial measurement unit
See at least: “The device 100 also includes an inertial measurement unit 104 ... including a tri-axial gyroscope and accelerometer, for recording inertial data of the device 100.” ([0030]) and “the portable electronic computing device also includes an IMU having components ... that record inertial data such as linear acceleration and angular velocity of the device.” ([0029]).
Rationale: Flint expressly discloses an inertial measurement unit that records inertial data indicative of movement. One of ordinary skill in the art would have understood such inertial measurements to provide relative-displacement information for updating position over time.
and updating the position of the vehicle within the vector map according to the relative displacement.
See at least: “During recording of the images and inertial data, the visual-based inertial navigation system calculates and stores estimates of the state of the computing device relative to a starting point, a process commonly referred to as ‘dead reckoning.’” ([0029]), “The propagation algorithm takes a sequence of inertial readings between a first time and a second time ... and produces an estimate for the inertial state of the device at the second time.” ([0036]), and “Using features tracked in the images and corresponding information obtained from an IMU ... the SWF obtains estimates for the inertial state of the device (e.g., position, orientation, velocity...)” ([0028]).
Rationale: Flint expressly discloses dead-reckoning-based and propagation-based updating of the device position/state using IMU-derived information. To the extent the claim places that updating within the previously discussed map context, that context is supplied by Luo’s map-based localization framework. Accordingly, the combined teachings render obvious updating the position according to relative displacement within the map representation.

Motivation to Combine Luo and Flint
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo and Flint before them, to modify Luo’s processor-executed camera/LiDAR map-based vehicle-localization device so as to incorporate Flint’s IMU-based relative-displacement propagation and dead-reckoning position updating, because Luo already discloses a localization architecture using camera, LiDAR, and inertial-navigation inputs, and Flint teaches a known and compatible technique for using IMU readings to propagate and update position between map-matching determinations. One of ordinary skill in the art would have made such modification to improve continuity, responsiveness, and robustness of the positioning process while preserving the map-based accuracy provided by Luo’s feature-matching localization. Such a combination would merely apply a known inertial-update technique to a known vehicle-localization device and would not alter the principle of operation of either reference or render either reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo and Flint, Claim 9 is rendered obvious by the combination of Luo in view of Flint. In particular, Luo teaches the device framework including memory, processor, stored instructions, and the core camera/LiDAR map-based localization workflow, while Flint teaches IMU-based relative-displacement measurement and position updating. The remaining aspects concerning pre-entry map availability, network-based provision of map content, local memory storage, and characterization of the mapping framework as fused point-cloud and vector-map data would have been obvious to one of ordinary skill in the art in view of Luo’s disclosed architecture and the predictable implementation choices available at the time of the invention.
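
For illustration only, the IMU-based propagation relied upon above (taking a sequence of inertial readings between a first time and a second time, together with the state at the first time, and producing a state estimate at the second time) can be sketched in Python as follows. This is not Flint’s sliding-window filter. It is a simplified planar dead-reckoning sketch; the `State` fields, the `propagate` function, and the sample readings are hypothetical assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class State:
    x: float = 0.0        # position, meters
    y: float = 0.0        # position, meters
    heading: float = 0.0  # radians
    speed: float = 0.0    # meters per second

def propagate(state, readings, dt):
    """Propagate a planar state through (forward_accel, yaw_rate) readings.

    Assumption: each reading covers a fixed interval dt (seconds) and motion
    is confined to a plane; sensor biases and noise are ignored.
    """
    x, y, heading, speed = state.x, state.y, state.heading, state.speed
    for accel, yaw_rate in readings:
        speed += accel * dt
        heading += yaw_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return State(x, y, heading, speed)

# Hypothetical case: last matched position (5, 2), 10 m/s straight ahead,
# 50 readings spaced 0.02 s apart (one second of travel).
start = State(x=5.0, y=2.0, heading=0.0, speed=10.0)
readings = [(0.0, 0.0)] * 50
end = propagate(start, readings, dt=0.02)
displacement = (end.x - start.x, end.y - start.y)
```

The resulting `displacement` is the relative displacement accumulated since the last map-matched position, which is then added to that position to update it.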

Regarding Claim 10,
The combination of Luo and Flint renders obvious the device of Claim 9, from which Claim 10 depends.
Luo teaches or renders obvious:
wherein
See at least: “The processor 61 is configured to execute program instructions that include a tool module configured to perform a method as described and illustrated with reference to FIGS. 1 to 5.” ([0104]).
Rationale: Luo expressly discloses a device whose processor-executed instructions perform the recited localization operations. Thus, the “wherein” clause properly introduces additional conditions on execution of the device-implemented method.
determining the position of the vehicle
See at least: “computing location of the 3D submap in the global map, using the inertial navigation module, and aligning the 3D submap with the global map.” ([0105]) and “Subsequently, in operation 17, location of the 3D submap is iteratively estimated until a distance between corresponding features is minimized.” ([0072]).
Rationale: Luo expressly discloses determining vehicle-related location within the localization process. In particular, Luo discloses computing location of the 3D submap in the global map and iteratively estimating that location, thereby teaching the claimed position-determination aspect.
according to the result of the matching
See at least: “the features extracted from the 3D submap are matched against the features extracted from the global map” ([0071]) and “location of the 3D submap is iteratively estimated until a distance between corresponding features is minimized.” ([0072]).
Rationale: Luo expressly discloses that location determination follows from the matching process between features extracted from the 3D submap and the global map. Thus, Luo teaches determining position according to the result of the matching.
is executed at a first frequency,
See at least: “images of the environment are captured by the camera in approximately 30 Hz. LiDAR scans are collected in the form of a sweep in approximately 20 Hz.” ([0068]).
Rationale: Luo does not expressly use the phrase “first frequency” for the matching-based position determination. However, Luo expressly discloses the operating rates of the sensor data used for the map-matching localization process, namely approximately 30 Hz for camera images and approximately 20 Hz for LiDAR sweeps. In view of these teachings, one of ordinary skill in the art would have understood the matching-based position determination to be executed at a frequency associated with, and practically bounded by, the image/LiDAR localization pipeline. Accordingly, Luo at least renders obvious that such matching-based position determination is executed at a first operating frequency.

Luo teaches or renders obvious the matching-based position-determination portion of Claim 10, including that such determination is performed at an operating frequency associated with the image/LiDAR localization workflow. The more specific higher-frequency inertial-update relationship is more directly shown by Flint, with Luo supplying the express numerical rates.

Flint teaches:
and updating the position of the vehicle
See at least: “Using features tracked in the images and corresponding information obtained from an IMU also operating on the device, the SWF obtains estimates for the inertial state of the device (e.g., position, orientation, velocity, and/or gyro and accelerometer biases)” ([0028]) and “The propagation algorithm takes a sequence of inertial readings between a first time and a second time together with the inertial state of the device at the first time and produces an estimate for the inertial state of the device at the second time.” ([0036]).
Rationale: Flint expressly discloses updating device position/state through IMU-based propagation between successive times. Thus, Flint teaches the claimed updating of position.
according to the relative displacement
See at least: “During recording of the images and inertial data, the visual-based inertial navigation system calculates and stores estimates of the state of the computing device relative to a starting point, a process commonly referred to as ‘dead reckoning.’” ([0029]) and “The portable electronic computing device also includes an IMU having components (e.g., accelerometer(s) and gyroscope(s)) that record inertial data such as linear acceleration and angular velocity of the device.” ([0029]).
Rationale: Flint expressly discloses dead reckoning based on inertial measurements, which one of ordinary skill in the art would have understood to constitute updating position according to relative displacement.
is executed at a second frequency,
See at least: “the pre-processing module 106 also is configured to select a subset of the frames received from the image sensor 102 as ‘keyframes’” ([0032]) and “the module 106 receives the inertial readings from the IMU 104 and identifies readings that occur at or close to the time at which each keyframe was captured ... and containing all the inertial readings occurring between those endpoints.” ([0034]).
See also: “The propagation algorithm takes a sequence of inertial readings between a first time and a second time together with the inertial state of the device at the first time and produces an estimate for the inertial state of the device at the second time.” ([0036]).
Rationale: Flint expressly discloses repeated IMU-readings-based propagation between image keyframes. Thus, Flint teaches that the inertial-update process is executed at its own operating frequency driven by the sequence of inertial readings, i.e., a second frequency.
wherein the first frequency is lower than the second frequency.
See at least: “the pre-processing module 106 also is configured to select a subset of the frames received from the image sensor 102 as ‘keyframes’” ([0032]) and “the module 106 receives the inertial readings from the IMU 104 ... and containing all the inertial readings occurring between those endpoints.” ([0034]); see also “The propagation algorithm takes a sequence of inertial readings between a first time and a second time ... and produces an estimate...” ([0036]).
See also Luo: “images of the environment are captured by the camera in approximately 30 Hz. LiDAR scans are collected in the form of a sweep in approximately 20 Hz. Vehicle poses, including position and orientation, are collected ... by the inertial navigation module in approximately 50 Hz.” ([0068]).
Rationale: Flint expressly teaches that inertial readings are processed between image keyframes, thereby supporting a higher-rate inertial-update process relative to the lower-rate image/keyframe-based processing. Flint alone does not provide the specific numerical comparison. Luo, however, expressly discloses numerical rates showing camera images at approximately 30 Hz, LiDAR sweeps at approximately 20 Hz, and inertial-navigation-module vehicle poses at approximately 50 Hz. In view of these combined teachings, one of ordinary skill in the art would have understood that the matching-based position determination associated with the camera/LiDAR localization pipeline is executed at a lower frequency than the inertial relative-displacement update, which is executed at the higher IMU-driven frequency. Accordingly, the claimed relationship that the first frequency is lower than the second frequency is rendered obvious by the combination of Luo and Flint.

Motivation to Combine Luo and Flint
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo and Flint before them, to modify Luo’s device-implemented map-matching-based vehicle-positioning process so that the matching-based position determination is performed at the lower camera/LiDAR processing frequency while the relative-displacement-based position updating is performed at the higher inertial-measurement frequency taught by Flint, because Luo already discloses a multi-sensor localization architecture including camera, LiDAR, and inertial-navigation inputs, and Flint teaches the known and compatible practice of propagating position/state using IMU readings between successive image/keyframe updates. One of ordinary skill in the art would have made such modification to improve responsiveness, continuity, and smoothness of vehicle positioning between lower-rate map-matching determinations, with the predictable benefit of providing higher-rate pose updating while preserving the accuracy advantages of the lower-rate feature-matching correction. Such a modification would merely apply a known higher-rate inertial-update technique to a known lower-rate map-matching localization framework and would not alter the principle of operation of either reference or render either reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo and Flint, Claim 10 is rendered obvious by the combination of Luo in view of Flint. In particular, Luo teaches the device framework and the matching-based position-determination portion of the claimed positioning method, including the operative camera/LiDAR frequencies, while Flint teaches inertial-readings-based position propagation and updating between image/keyframe events. 
The claimed frequency relationship, namely that the first frequency is lower than the second frequency, is supported by Luo’s express numerical rates and Flint’s teaching of higher-rate IMU propagation between image/keyframe updates.
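
For illustration only, the frequency relationship discussed above can be sketched as a simple schedule in Python. The sketch assumes, purely for the example, that map matching runs at Luo’s approximately 20 Hz LiDAR rate and that inertial updates run at Luo’s approximately 50 Hz inertial-navigation rate; neither reference states that matching itself runs at exactly 20 Hz. The function `schedule` and the event labels are hypothetical.

```python
MATCH_HZ = 20  # assumed rate of matching-based position determination
IMU_HZ = 50    # assumed rate of inertial relative-displacement updates

def schedule(duration_ms, match_hz=MATCH_HZ, imu_hz=IMU_HZ):
    """List (time_ms, event) pairs for both processes over duration_ms.

    Assumption: both rates divide 1000 evenly, so integer millisecond
    periods can be used without floating-point drift.
    """
    match_period = 1000 // match_hz  # 50 ms between map matches
    imu_period = 1000 // imu_hz      # 20 ms between inertial updates
    events = []
    for t in range(duration_ms):
        if t % imu_period == 0:
            events.append((t, "imu_update"))
        if t % match_period == 0:
            events.append((t, "map_match"))
    return events

events = schedule(1000)
imu_count = sum(1 for _, kind in events if kind == "imu_update")
match_count = sum(1 for _, kind in events if kind == "map_match")
```

Over one second this schedule yields 20 map matches and 50 inertial updates, so several inertial updates fall between consecutive matching-based determinations.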

Regarding Claim 17,
Luo renders obvious:
A non-transitory computer-readable storage medium
See at least: “Embodiments of the present disclosure provide a method of localization for a non-transitory computer readable storage medium storing one or more programs.” ([0010]) and “In some embodiments in accordance with the present disclosure, a non-transitory, i.e., non-volatile, computer readable storage medium is provided.” ([0055]).
Rationale: Luo expressly discloses the claimed non-transitory computer-readable storage medium.
storing instructions
See at least: “The one or more programs comprise instructions, which when executed by a computing device...” ([0010]) and “The non-transitory computer readable storage medium is stored with one or more programs.” ([0055]).
Rationale: Luo expressly discloses a storage medium storing programs/instructions.
that cause a processor
See at least: “When the program is executed by the processing unit of a computing device ... the computing device is caused to conduct specific operations...” ([0055]) and “The processor 61 is configured to execute program instructions that include a tool module configured to perform a method...” ([0104]).
Rationale: Luo expressly discloses that the stored instructions, when executed, cause the processing unit/processor to carry out the disclosed operations.
to perform a method for vehicle positioning,
See at least: “Embodiments of the present disclosure provide a method of localization...” ([0010]) and “The field of the disclosure is in general related to autonomous vehicles and, in particular, to a system and a method for localization using a camera-based reconstructed submap and a LiDAR-based global map.” ([0002]).
Rationale: Luo expressly discloses a localization method for an autonomous vehicle. One of ordinary skill in the art would have understood such localization to constitute vehicle positioning under the broadest reasonable interpretation of the claim language.
the method comprising:
See at least: “The one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform ... using the following steps comprising...” ([0010]).
Rationale: Luo expressly discloses that the processor-executed instructions perform a sequence of method steps, thereby satisfying the recited method wrapper.
before a vehicle enters a scenario where the vehicle is located,
See at least: “before computing matching score, the method further comprises: constructing at least one 3D submap; and constructing a global map.” ([0012]) and “In operation 13, a three-dimensional (3D) submap and a global map are constructed.” ([0069]).
Rationale: Although Luo does not expressly recite that the vehicle obtains the map before entering the scenario, Luo does disclose that the maps used for localization are prepared before matching and location estimation are performed. Thus, one of ordinary skill in the art would have understood that the localization map is made available in advance of vehicle use within the mapped environment. Accordingly, this temporal aspect is at least implicit in Luo, or would have been obvious therefrom.
obtaining from a cloud network by the vehicle
See at least: “the system comprises an internet server...” ([0023]) and “the computer server 62 is configured to utilize the I/O port 65 communicate with external devices via a network 68, such as a wireless network.” ([0102]).
Rationale: Luo does not expressly disclose the precise implementation in which the vehicle obtains the previously constructed map from a cloud network. However, Luo does disclose an internet server and network communication over a wireless network. In view of these teachings, it would have been obvious to one of ordinary skill in the art to provide the previously prepared localization map data to the vehicle through the disclosed network/server architecture as a predictable deployment choice for enabling onboard localization.
and storing in a memory of the vehicle
See at least: “The processor 61 is configured to execute program instructions...” ([0104]) and “Referring to FIG. 6, the system 60 includes ... a memory 69...” ([0099]).
Rationale: Luo expressly discloses memory within the localization system and expressly discloses processor-executed instructions that perform the localization method. Thus, one of ordinary skill in the art would have found it obvious to store the map/instruction data used by the localization process in the disclosed system memory for device-side operation.
a previously constructed fused map of the scenario,
See at least: “constructing at least one 3D submap; and constructing a global map.” ([0012]) and “In operation 13, a three-dimensional (3D) submap and a global map are constructed ... the 3D submap is constructed, based on images from the camera, using visual SLAM ... the global map is constructed, based on data from the LiDAR, using LiDAR mapping.” ([0069]).
Rationale: Luo expressly discloses a previously constructed 3D submap and a previously constructed global map used together in localization. Luo does not expressly use the phrase “fused map.” However, one of ordinary skill in the art would have understood that jointly using these complementary map representations for a common localization purpose amounts to a fused mapping framework, or at least would have found such characterization obvious in view of Luo’s disclosed dual-map architecture.
wherein the fused map includes a point cloud basemap
See at least: “constructing a city-scale 3D map based on the data from the LiDAR, using LiDAR mapping.” ([0014]) and “the global map includes a 3D city-scale map.” ([0069]); see also “the unstructured features may include sparse 3D points.” ([0017]).
Rationale: Luo expressly discloses a LiDAR-derived 3D global map and sparse 3D point features. One of ordinary skill in the art would have understood such LiDAR-derived 3D map data to correspond to a point-cloud-type basemap under the broadest reasonable interpretation of the claim language.
and a vector map describing the scenario;
See at least: “the structured features include at least one of planes, straight lines and curved lines...” ([0017]) and “The structured features may include, for example, planes, straight lines and curved lines...” ([0070]).
Rationale: Luo does not expressly recite a “vector map.” However, Luo expressly discloses structured geometric features describing the environment, such as planes, straight lines, and curved lines. One of ordinary skill in the art would have recognized such structured geometric descriptors as vector-like representations of the mapped scenario. Accordingly, this aspect is at least rendered obvious by Luo’s structured-feature map representation.
capturing at least one image frame of a surrounding environment of the vehicle within the scenario through a camera unit
See at least: “constructing at least one 3D submap comprises: obtaining images from a camera...” ([0013]) and “images of the environment are captured by the camera in approximately 30 Hz.” ([0068]).
Rationale: Luo expressly discloses capturing images of the environment using a camera. One of ordinary skill in the art would have understood such captured images to correspond to image frames within the claimed camera-based localization method.
and extracting a plurality of feature points from the at least one image frame;
See at least: “extracting structured features and unstructured features from 3D submap and the global map.” ([0016]) and “the unstructured features may include sparse 3D points.” ([0017]); see also “In operation 14, features from the 3D submap and the global map are extracted.” ([0070]).
Rationale: Luo expressly discloses feature extraction, including sparse 3D points, from the camera-based submap/global-map pipeline. Since the 3D submap is expressly built from camera images, one of ordinary skill in the art would have understood the extracted feature points to be derived from the image frames obtained by the camera.
performing a matching of the plurality of feature points with point cloud data in the point cloud basemap
See at least: “computing, in response to features from a 3D submap and features from a global map, matching score between corresponding features...” ([0010]) and “the features extracted from the 3D submap are matched against the features extracted from the global map...” ([0071]).
Rationale: Luo expressly discloses matching features extracted from the 3D submap against features extracted from the LiDAR-derived global map. In view of Luo’s disclosure of sparse 3D points in the global map, one of ordinary skill in the art would have understood this to correspond to matching feature points against point-cloud-type map data.
to determine a position of the vehicle within the vector map
See at least: “Subsequently, in operation 17, location of the 3D submap is iteratively estimated...” ([0072]) and “coordinate of the 3D submap is transformed to coordinate of the global map.” ([0091]).
Rationale: Luo expressly discloses determining location from the submap/global-map relationship. To the extent the claim recites determining position within a vector map, that map-context aspect is rendered obvious by Luo’s structured-feature map representation, as discussed above.
according to a result of the matching;
See at least: “the features extracted from the 3D submap are matched against the features extracted from the global map...” ([0071]) and “location of the 3D submap is iteratively estimated until a distance between corresponding features is minimized.” ([0072]).
Rationale: Luo expressly discloses that the location estimation follows from the matching process and its resulting feature correspondences. Accordingly, position determination according to the result of the matching is expressly taught.

Luo teaches or renders obvious the non-transitory storage-medium framework and the core camera/LiDAR map-based localization workflow of Claim 17. The pre-entry timing, cloud-acquisition, local memory storage of downloaded map content, fused-map characterization, and vector-map characterization are better understood as obvious implementations or characterizations arising from Luo’s disclosed server/network/map architecture and dual-map localization framework.

Flint teaches:
measuring a relative displacement of the vehicle within the scenario through an inertial measurement unit
See at least: “The device 100 also includes an inertial measurement unit 104... The IMU 104 may include several electronic hardware components, including a tri-axial gyroscope and accelerometer, for recording inertial data of the device 100.” ([0030]) and “the portable electronic computing device also includes an IMU having components ... that record inertial data such as linear acceleration and angular velocity of the device.” ([0029]).
Rationale: Flint expressly discloses an inertial measurement unit that records inertial data indicative of movement. One of ordinary skill in the art would have understood such inertial measurements to provide relative-displacement information for updating position over time.
and updating the position of the vehicle within the vector map according to the relative displacement.
See at least: “During recording of the images and inertial data, the visual-based inertial navigation system calculates and stores estimates of the state of the computing device relative to a starting point, a process commonly referred to as ‘dead reckoning.’” ([0029]), “The propagation algorithm takes a sequence of inertial readings between a first time and a second time together with the inertial state of the device at the first time and produces an estimate for the inertial state of the device at the second time.” ([0036]), and “Using features tracked in the images and corresponding information obtained from an IMU also operating on the device, the SWF obtains estimates for the inertial state of the device (e.g., position, orientation, velocity...)” ([0028]).
Rationale: Flint expressly discloses dead-reckoning-based and propagation-based updating of the device position/state using IMU-derived information. To the extent the claim places that updating within the previously discussed map context, that context is supplied by Luo’s map-based localization framework. Accordingly, the combined teachings render obvious updating the position according to relative displacement within the map representation.

Motivation to Combine Luo and Flint
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo and Flint before them, to modify Luo’s processor-executed storage-medium-based vehicle-localization method so as to incorporate Flint’s IMU-based relative-displacement propagation and dead-reckoning position updating, because Luo already discloses a localization architecture using camera, LiDAR, and inertial-navigation inputs, and Flint teaches a known and compatible technique for using IMU readings to propagate and update position between map-matching determinations. One of ordinary skill in the art would have made such modification to improve continuity, responsiveness, and robustness of the positioning process while preserving the map-based accuracy provided by Luo’s feature-matching localization. Such a combination would merely apply a known inertial-update technique to a known localization framework and would not alter the principle of operation of either reference or render either reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo and Flint, Claim 17 is rendered obvious by the combination of Luo in view of Flint. In particular, Luo teaches the non-transitory storage-medium framework and the core camera/LiDAR map-based localization workflow, while Flint teaches IMU-based relative-displacement measurement and position updating. The remaining aspects concerning pre-entry map availability, network-based provision of map content, local memory storage, and characterization of the mapping framework as fused point-cloud and vector-map data would have been obvious to one of ordinary skill in the art in view of Luo’s disclosed architecture and the predictable implementation choices available at the time of the invention.

Regarding Claim 18,
The combination of Luo and Flint renders obvious the non-transitory computer-readable storage medium of Claim 17, from which Claim 18 depends.

Luo renders obvious:
wherein
See at least: “The one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform...” ([0010]) and “When the program is executed by the processing unit of a computing device ... the computing device is caused to conduct specific operations...” ([0055]).
Rationale: The reference expressly discloses that the stored instructions, when executed, cause the processor/computing device to carry out the recited localization operations. Thus, the “wherein” clause properly introduces additional execution conditions on that storage-medium-implemented method.
determining the position of the vehicle
See at least: “computing, in response to features from a 3D submap and features from a global map, matching score...” ([0010]) and “In yet still another embodiment, the method further comprises: refining location of the 3D submap.” ([0021]); see also “refining location of the 3D submap comprises: performing an iterative estimation of location of the 3D submap...” ([0022]).
Rationale: The reference expressly discloses determining vehicle-related location within the localization process through matching and subsequent refinement of the 3D submap location. One of ordinary skill in the art would have understood such refined localization to constitute determining vehicle position.
according to the result of the matching
See at least: “computing, in response to features from a 3D submap and features from a global map, matching score...” ([0010]) and “selecting, for each feature in the 3D submap, a corresponding feature with the highest matching score from the global map...” ([0010]).
Rationale: The reference expressly discloses that the localization process relies on matching features from the 3D submap and the global map. Thus, position determination according to the result of the matching is taught.
is executed at a first frequency,
See at least: “images of the environment are captured by the camera in approximately 30 Hz. LiDAR scans are collected in the form of a sweep in approximately 20 Hz.” ([0068]).
Rationale: Although Luo does not expressly recite a “first frequency” for the matching-based position determination, Luo expressly discloses the operating rates of the camera and LiDAR data used in the map-matching localization pipeline. In view of these teachings, one of ordinary skill in the art would have understood the matching-based position determination to be executed at a frequency associated with, and practically bounded by, the image/LiDAR localization process. Accordingly, this first-frequency aspect is at least rendered obvious by Luo.

Luo teaches or renders obvious the matching-based position-determination portion of Claim 18, including that such determination is performed at an operating frequency associated with the image/LiDAR localization workflow. The more specific higher-frequency inertial-update relationship is more directly shown by Flint, with Luo supplying the express numerical rates.

Flint discloses:
and updating the position of the vehicle
See at least: “Using features tracked in the images and corresponding information obtained from an IMU also operating on the device, the SWF obtains estimates for the inertial state of the device (e.g., position, orientation, velocity...)” ([0028]) and “The propagation algorithm takes a sequence of inertial readings between a first time and a second time together with the inertial state of the device at the first time and produces an estimate for the inertial state of the device at the second time.” ([0036]).
Rationale: The reference expressly discloses updating device position/state through IMU-based propagation between successive times. Thus, Flint teaches the claimed updating of position.
according to the relative displacement
See at least: “During recording of the images and inertial data, the visual-based inertial navigation system calculates and stores estimates of the state of the computing device relative to a starting point, a process commonly referred to as ‘dead reckoning.’” ([0029]) and “the portable electronic computing device also includes an IMU having components ... that record inertial data such as linear acceleration and angular velocity...” ([0029]).
Rationale: The reference expressly discloses dead reckoning based on inertial measurements, which one of ordinary skill in the art would have understood to constitute updating position according to relative displacement.
is executed at a second frequency,
See at least: “the pre-processing module 106 also is configured to select a subset of the frames received from the image sensor 102 as ‘keyframes’” ([0032]) and “the module 106 receives the inertial readings from the IMU 104 ... and containing all the inertial readings occurring between those endpoints.” ([0034]); see also “The propagation algorithm takes a sequence of inertial readings between a first time and a second time ... and produces an estimate...” ([0036]).
Rationale: The reference expressly discloses repeated IMU-readings-based propagation between image keyframes. Thus, Flint teaches that the inertial-update process is executed at its own operating frequency driven by the sequence of inertial readings, i.e., a second frequency.
wherein the first frequency is lower than the second frequency.
See at least: “the pre-processing module 106 also is configured to select a subset of the frames received from the image sensor 102 as ‘keyframes’” ([0032]) and “the module 106 receives the inertial readings from the IMU 104 ... and containing all the inertial readings occurring between those endpoints.” ([0034]); see also “The propagation algorithm takes a sequence of inertial readings between a first time and a second time ... and produces an estimate...” ([0036]).
See also Luo: “images of the environment are captured by the camera in approximately 30 Hz. LiDAR scans are collected in the form of a sweep in approximately 20 Hz. Vehicle poses, including position and orientation, are collected ... by the inertial navigation module in approximately 50 Hz.” ([0068]).
Rationale: Flint expressly teaches that inertial readings are processed between image keyframes, thereby supporting a higher-rate inertial-update process relative to the lower-rate image/keyframe-based processing. Flint alone does not provide the specific numerical comparison. Luo, however, expressly discloses numerical rates showing camera images at approximately 30 Hz, LiDAR sweeps at approximately 20 Hz, and inertial-navigation-module vehicle poses at approximately 50 Hz. In view of these combined teachings, one of ordinary skill in the art would have understood that the matching-based position determination associated with the camera/LiDAR localization pipeline is executed at a lower frequency than the inertial relative-displacement update, which is executed at the higher IMU-driven frequency. Accordingly, the claimed relationship that the first frequency is lower than the second frequency is rendered obvious by the combination of Luo and Flint.
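For illustration only, the rate relationship relied upon above can be sketched as a two-rate loop. The sketch is not taken from Luo or Flint. It assumes a one-dimensional position, a constant velocity, and a hypothetical map_match() stand-in, and it borrows only the approximate 20 Hz and 50 Hz figures quoted from Luo ([0068]). Treating the LiDAR sweep rate as the matching rate is likewise an assumption.

```python
from fractions import Fraction

# Approximate rates quoted from Luo [0068]. Using the LiDAR sweep rate as the
# matching rate is an assumption made for this illustration only.
MATCH_HZ = 20  # lower "first frequency": map-matching position determination
IMU_HZ = 50    # higher "second frequency": inertial relative-displacement update


def simulate(duration_s=1, velocity_mps=10.0):
    """Count matching fixes and IMU updates over a window of whole seconds.

    map_match() is a hypothetical stand-in that returns the true position,
    so the example shows only the scheduling of the two update rates.
    """
    def map_match(t):
        return velocity_mps * float(t)

    imu_dt = Fraction(1, IMU_HZ)
    match_dt = Fraction(1, MATCH_HZ)
    position = 0.0
    next_match = match_dt
    n_match = n_imu = 0
    t = Fraction(0)
    for _ in range(duration_s * IMU_HZ):
        t += imu_dt
        # Dead-reckoning step: add the displacement accumulated over one tick.
        position += velocity_mps * float(imu_dt)
        n_imu += 1
        # Lower-rate correction, applied at the tick where a fix becomes due.
        while next_match <= t:
            position = map_match(t)
            n_match += 1
            next_match += match_dt
    return n_match, n_imu, position
```

Under these assumptions, a one-second window contains fewer matching fixes than inertial updates, which is the ordering the claim recites. Exact rational time steps are used so that the tick counts do not depend on floating-point rounding.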

Motivation to Combine Luo and Flint
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo and Flint before them, to modify Luo’s storage-medium-implemented map-matching-based vehicle-positioning process so that the matching-based position determination is performed at the lower camera/LiDAR processing frequency while the relative-displacement-based position updating is performed at the higher inertial-measurement frequency taught by Flint, because Luo already discloses a multi-sensor localization architecture including camera, LiDAR, and inertial-navigation inputs, and Flint teaches the known and compatible practice of propagating position/state using IMU readings between successive image/keyframe updates. One of ordinary skill in the art would have made such modification to improve responsiveness, continuity, and smoothness of vehicle positioning between lower-rate map-matching determinations, with the predictable benefit of providing higher-rate pose updating while preserving the accuracy advantages of the lower-rate feature-matching correction. Such a modification would merely apply a known higher-rate inertial-update technique to a known lower-rate map-matching localization framework and would not alter the principle of operation of either reference or render either reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo and Flint, Claim 18 is rendered obvious by the combination of Luo in view of Flint. In particular, Luo teaches the non-transitory storage-medium framework and the matching-based position-determination portion of the claimed positioning method, including the operative camera/LiDAR frequencies, while Flint teaches inertial-readings-based position propagation and updating between image/keyframe events. 
The claimed frequency relationship, namely that the first frequency is lower than the second frequency, is supported by Luo’s express numerical rates and Flint’s teaching of higher-rate IMU propagation between image/keyframe updates.

Claims 3, 4, 11, 12, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Luo, in view of Flint, and in view of Zhao (Automatic Vector-Based Road Structure Mapping Using Multibeam LiDAR).

Regarding Claim 3,
The combination of Luo, Flint, and Zhao establishes the method of Claim 1, which is the basis for Claim 3.

Luo teaches or renders obvious:
wherein
See at least: “The one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform...” ([0010]).
Rationale: The reference expressly discloses a processor-executed localization method. Thus, the “wherein” clause properly introduces additional conditions on the previously established method.
the point cloud basemap includes point cloud data
See at least: “constructing a global map comprises: obtaining the data from the LiDAR; and constructing a city-scale 3D map based on the data from the LiDAR, using LiDAR mapping.” ([0014]) and “the unstructured features may include sparse 3D points.” ([0017]); see also “The global map includes a 3D city-scale map.” ([0076]).
Rationale: Luo expressly discloses a LiDAR-derived 3D global map and sparse 3D point features. One of ordinary skill in the art would have understood such LiDAR-derived 3D map data to correspond to point-cloud-type map data. Accordingly, Luo teaches or at least renders obvious the point-cloud-data aspect of the claimed point cloud basemap.
describing measurement of objects within the scenario
See at least: “obtaining the data from the LiDAR; and constructing a city-scale 3D map based on the data from the LiDAR” ([0014]) and “the unstructured features may include sparse 3D points.” ([0017]).
Rationale: Luo expressly discloses LiDAR-acquired 3D map data and 3D point-based features representing measured environmental content. Thus, Luo teaches that the point-cloud-type map data describes measured objects/features in the mapped scene, although Luo does not expressly tie those objects to roads and intersections specifically.
and the vector map includes vector graphic elements
See at least: “the structured features include at least one of planes, straight lines and curved lines...” ([0017]) and “The extracted features include structured features such as planes, straight lines and curved lines...” ([0082]).
Rationale: Luo does not expressly use the phrase “vector graphic elements.” However, Luo expressly discloses structured geometric features such as planes, straight lines, and curved lines. One of ordinary skill in the art would have recognized such structured geometric descriptors as vector-like elements of a map representation. Accordingly, Luo at least renders obvious the vector-element aspect of the claim.

Luo renders obvious the point-cloud-data aspect of Claim 3 and further provides a strong basis for treating structured geometric map features as vector-like elements. The road-specific, intersection-specific, subset-generation, and explicit mapping/construction aspects are more directly shown by Zhao.

Zhao teaches:
describing measurement of objects within the scenario
See at least: “The LiDAR-based SLAM generates accurate maps represented by occupancy grids or 3D point clouds...” (page 2) and “the 3D point cloud of a scene” and “the 2D projection” and “the road boundaries extracted in one frame” and “the multiframe probabilistic fusion” (page 5, Figure 2 and accompanying text).
Rationale: Zhao expressly discloses LiDAR-based mapping from 3D point clouds of a scene and extraction of road-boundary objects from those measurements. Thus, Zhao teaches point-cloud data describing measured objects within the mapped scenario.
at roads and intersections within the scenario,
See at least: “The high-precision high-definition (HD) map of the road environment is now recognized as one of the cornerstones for autonomous driving. The reliable mapping of road boundaries, lanes, and other road structures...” (page 2) and “Figure 11c-I, II, and VI show successfully mapped intersections; all the road boundaries along the trajectory are mapped...” (page 13).
Rationale: Zhao expressly discloses that its mapping concerns road structures in road environments and further expressly identifies mapped intersections. Accordingly, Zhao teaches the road- and intersection-specific context missing from Luo.
and the vector map includes vector graphic elements
See at least: “we propose a vector-based SLAM method for the road structure mapping using vehicle-mounted multibeam LiDAR. We propose using polylines as the primary mapping element instead of grid maps or point clouds because the vector-based representation is lightweight and precise.” (page 2) and “Vector primitives, for example, polygons and line segments, are widely adopted as the representation of road geometry...” (page 3).
Rationale: Zhao expressly discloses vector-based mapping and expressly identifies polylines, polygons, and line segments as vector-based mapping elements. Thus, Zhao teaches the claimed vector graphic elements.
describing geometric characteristics of roads and intersections within the scenario,
See at least: “The high-precision high-definition (HD) map of the road environment ... The reliable mapping of road boundaries, lanes, and other road structures...” (page 2) and “Vector primitives ... are widely adopted as the representation of road geometry...” (page 3).
Rationale: Zhao expressly discloses vector-based representation of road geometry and road structures, and further shows mapped intersections in the resulting vectorized map discussion. One of ordinary skill in the art would have understood such vector elements to describe geometric characteristics of roads and intersections within the scenario.
and wherein,
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2).
Rationale: Zhao expressly discloses an ordered mapping pipeline in which additional map-construction operations are performed. Thus, the clause properly introduces further map-preconstruction steps.
the method further comprises pre-constructing the fused map by:
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2) and “First, the LVMs are fused without concatenation... Finally, a reconstruction method is employed to globally reconstruct the vector map...” (page 9).
Rationale: Zhao expressly discloses a mapping workflow in which road-structure data is extracted, vectorized, fused, and reconstructed into a vector map. One of ordinary skill in the art would have understood this to be a pre-construction process for creating the map used later for localization.
creating corresponding point cloud data subsets
See at least: “the virtual scan method ... is applied to eliminate obstacles unrelated to the road boundary...” (page 5) and “the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6) and “The fused LVMs ... To filter out the ghost effects, we utilize the trajectories of the vehicle and employ ‘probing’ of the innermost road boundaries along the trajectory.” (page 9).
Rationale: Zhao expressly discloses extracting road-boundary-related subsets from larger point-cloud measurements and then operating on those localized/fused map portions. Although Zhao does not use the exact phrase “point cloud data subsets,” one of ordinary skill in the art would have understood the extracted road-boundary portions and local maps derived from the point-cloud scene data to constitute corresponding point-cloud-data subsets for later map generation.
respectively for each road and each intersection within the scenario
See at least: “The data collecting path is approximately 2 km in length, passing through all road segments, including 3 cross-intersections and 8 T-junctions...” (page 13) and “Figure 11c-I, II, and VI show successfully mapped intersections; all the road boundaries along the trajectory are mapped...” (page 13).
Rationale: Zhao expressly discloses mapping across multiple road segments and intersections, including cross-intersections and T-junctions. Thus, Zhao teaches road- and intersection-specific map generation, which one of ordinary skill in the art would have understood as requiring corresponding subsets or local portions associated with those respective road/intersection regions.
to generate the point cloud data; and
See at least: “To fuse multiframes locally with high precision, we develop a reckoning system...” (page 5) and “As a result, the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6).
Rationale: Zhao expressly discloses generating locally fused map data from multiple LiDAR-derived frames. Thus, Zhao teaches generating point-cloud-derived map data from extracted/fused subsets of the measured scene.
mapping the point cloud data with the vector graphic elements
See at least: “The LGM generated in the previous step is still a grid map-based representation of road boundaries. To convert the grid map into a vector-based map...” (page 6) and “we employ the virtual scan method to locally vectorize the LGM” (page 6); see also “In Figure 9e, the ordered points belonging to a sector of the road boundaries are connected to realize the initial vectorization.” (page 11).
Rationale: Zhao expressly discloses converting LiDAR/point-cloud-derived road-boundary map data into vector-based mapping elements through vectorization. In other words, Zhao teaches mapping the point-cloud-derived road data with vector graphic elements.
to construct the fused map.
See at least: “The optimization and reconstruction of local vector-based maps into a global vector map of the road structure is achieved.” (page 3) and “Finally, a reconstruction method is employed to globally reconstruct the vector map from sampled LVMs.” (page 9).
Rationale: Zhao expressly discloses constructing a final global vector map by fusing, reconstructing, and vectorizing the road-structure data. In view of Luo’s point-cloud/global-map framework and Zhao’s vector-road-map construction workflow, one of ordinary skill in the art would have found it obvious to construct the claimed fused map by associating point-cloud-derived road/intersection data with vector graphic elements.
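
For illustration only, the association discussed above (a point-cloud subset per road or intersection, mapped with a vector graphic element) can be sketched as a simple keyed data structure. Nothing in this sketch is taken from Luo, Zhao, or the application: the names FusedElement, FusedMap, and build_fused_map, the string identifiers, and the choice of a polyline as the vector element are all hypothetical.

```python
from dataclasses import dataclass, field

Point3D = tuple[float, float, float]
Vertex2D = tuple[float, float]


@dataclass
class FusedElement:
    """One road or intersection: its point subset plus its vector geometry."""
    element_id: str
    kind: str                 # "road" or "intersection" (hypothetical labels)
    points: list[Point3D]     # point-cloud data subset for this element
    polyline: list[Vertex2D]  # vector graphic element for this element


@dataclass
class FusedMap:
    elements: dict[str, FusedElement] = field(default_factory=dict)

    def points_for(self, element_id: str) -> list[Point3D]:
        return self.elements[element_id].points


def build_fused_map(point_subsets, vector_elements) -> FusedMap:
    """Pair each point subset with the vector element sharing its identifier.

    point_subsets maps id -> (kind, points); vector_elements maps id -> polyline.
    """
    fused = FusedMap()
    for element_id, (kind, points) in point_subsets.items():
        if element_id not in vector_elements:
            raise KeyError(f"no vector element for {element_id!r}")
        fused.elements[element_id] = FusedElement(
            element_id, kind, list(points), list(vector_elements[element_id])
        )
    return fused
```

The sketch keys both representations by the same identifier so that a lookup of a road or intersection yields its measured points and its geometry together. That is one possible reading of "mapping the point cloud data with the vector graphic elements" and is offered only as an assumption.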

Motivation to Combine Luo, Flint, and Zhao
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, and Zhao before them, to modify Luo’s camera/LiDAR map-based vehicle-localization method, as further informed by Flint’s IMU-based relative-displacement updating used for the parent Claim 1 combination, so that the fused map used in localization includes road- and intersection-specific point-cloud data subsets and corresponding vector graphic road-geometry elements constructed through a vectorization and reconstruction workflow as taught by Zhao, because Luo already discloses a LiDAR-derived global map, a camera-derived submap, and structured/unstructured map features for localization, Flint teaches a known and compatible inertial-update technique for maintaining vehicle position between map-matching determinations, and Zhao teaches a known and compatible technique for generating vector-based road-structure maps from LiDAR-derived road-boundary data, including road and intersection mapping, local subset generation, vectorization, and reconstruction. One of ordinary skill in the art would have made such modification to provide a more lightweight, road-geometry-aware, and structured fused map representation for localization while preserving the continuous vehicle-position updating already supported by the Luo/Flint combination, with the predictable benefit of improving map usability and localization performance in road/intersection environments. Such a modification would merely apply a known vector-road-map generation technique to a known LiDAR-based localization framework, while retaining known inertial-update processing, and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. 
Therefore, after combining the teachings of Luo, Flint, and Zhao, Claim 3 is rendered obvious by the combination of Luo in view of Flint and Zhao. In particular, Luo teaches the underlying camera/LiDAR localization framework and point-cloud/global-map features, Flint teaches the IMU-based updating portion carried through from parent Claim 1, and Zhao teaches road- and intersection-specific vector-map construction from LiDAR-derived road-structure data, including point-cloud-derived subsets, vector graphic elements, and reconstruction of the fused/vector map. The additional map-content and pre-construction aspects of Claim 3 would have been obvious to one of ordinary skill in the art in view of Zhao’s disclosed vector-road-mapping architecture and its predictable integration with the Luo/Flint localization framework.

Regarding Claim 4,
The combination of Luo, Flint, and Zhao establishes the method of Claim 3, which is the basis for Claim 4.

Zhao renders obvious:
wherein
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2).
Rationale: Zhao expressly discloses an ordered mapping pipeline with additional processing stages. Thus, the “wherein” clause properly introduces additional conditions on the subset-generation portion of the already established method.
creating corresponding point cloud data subsets
See at least: “Figure 2. Road boundary extraction. (a) the 3D point cloud of a scene ... (d) the road boundaries extracted in one frame. (e) the multiframe probabilistic fusion.” (page 5) and “the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6).
Rationale: Zhao expressly discloses beginning with scene point-cloud data, extracting road-boundary portions therefrom, and then generating local map representations from those extracted portions. One of ordinary skill in the art would have understood those extracted road-boundary portions and local maps to constitute corresponding subsets of the original point-cloud data for subsequent processing.
respectively for each road and each intersection within the scenario
See at least: “The data collecting path is approximately 2 km in length, passing through all road segments, including 3 cross-intersections and 8 T-junctions...” (page 13) and “Figure 11c-I, II, and VI show successfully mapped intersections; all the road boundaries along the trajectory are mapped...” (page 13).
Rationale: Zhao expressly discloses mapping multiple road segments and multiple intersections, including cross-intersections and T-junctions. Accordingly, Zhao teaches road- and intersection-specific local mapping, which one of ordinary skill in the art would have understood as involving corresponding subsets or local portions associated with the respective road and intersection regions.
comprises:
See at least: “We explored a combined pipeline of road structure extraction and vectorization...” (page 2).
Rationale: Zhao expressly discloses an open-ended sequence of mapping operations, thereby supporting the recited transitional phrase for the additional subset-processing steps.
obtaining dense point cloud data subsets
See at least: “However, the extracted road boundary in one frame is sparse and can be incomplete; therefore, it should be further densified by the multiframe probabilistic fusion.” (page 5) and “To fuse multiframes locally with high precision...” (page 5).
Rationale: Zhao expressly discloses that sparse single-frame road-boundary extraction is further densified by multiframe probabilistic fusion. One of ordinary skill in the art would have understood the densified local fused boundary representation to constitute dense point-cloud-derived subsets for the corresponding road/intersection regions.
for each road and each intersection within the scenario,
See at least: “all road segments, including 3 cross-intersections and 8 T-junctions...” (page 13) and “all the road boundaries along the trajectory are mapped” and “successfully mapped intersections” (page 13).
Rationale: Zhao expressly discloses dense local road-boundary mapping across multiple road segments and intersections. Thus, the densified subset generation is taught in the road- and intersection-specific context required by the claim.
and determining whether a first total amount of data
See at least: “Since the vectorized result can be noisy and dense due to the selected angular resolution of the virtual scan...” (page 7) and “The simplified representation can be reduced to merely averaging 6% of the points in the original polyline...” (page 7).
Rationale: Zhao does not expressly recite computing a “first total amount of data” in those words. However, Zhao expressly distinguishes between denser and lighter-weight representations and expressly quantifies that simplification reduces the retained representation to about 6% of the original points. In view of these teachings, one of ordinary skill in the art would have understood that the mapping pipeline evaluates whether the amount of point/polyline data is sufficiently large to warrant simplification or sparsification. Accordingly, this data-amount determination is at least implicit in Zhao, or would have been obvious therefrom.
for respective dense point cloud data subsets
See at least: “the extracted road boundary in one frame is sparse and can be incomplete; therefore, it should be further densified by the multiframe probabilistic fusion.” (page 5) and “the fused local grid map of road boundaries (LGM)” (page 11 figure description).
Rationale: Zhao expressly discloses locally fused, densified road-boundary representations derived from multiframe point-cloud data. One of ordinary skill in the art would have understood these fused local representations to be the relevant dense subsets whose amount of data may be evaluated for subsequent simplification.
exceeds a predetermined threshold,
See at least: “Since the vectorized result can be noisy and dense due to the selected angular resolution of the virtual scan, we employ the Ramer-Douglas-Peucker algorithm [41] to optimize polylines in the LVM.” (page 7) and “The comparison of the error statistics ... the matching with simplified-LVMs outperforms raw-LVM-based matching ... because of its lightweight and less noisy polyline-based representation.” (page 11).
Rationale: Zhao does not expressly state a “predetermined threshold” for a first total amount of data. However, Zhao expressly teaches that when the representation is noisy and dense, a simplification operation is applied to reduce the amount of retained map data. One of ordinary skill in the art would have understood that, in implementing such density-driven simplification, some threshold or criterion must be used to decide when the dense representation should be retained and when it should be simplified. Use of a predetermined threshold for that decision would have been a routine and predictable implementation choice.
wherein when the first total amount of data does not exceed the predetermined threshold,
See at least: “For a clear description, we refer to the simplified version of LVM as simplified LVM and the original version as raw LVM.” (page 7).
Rationale: Zhao expressly distinguishes between an original, denser representation and a simplified representation. Although Zhao does not expressly recite the decision branch in the exact words of the claim, one of ordinary skill in the art would have understood that if the amount of data is not excessive, the original denser representation may simply be retained rather than simplified. This branch is therefore rendered obvious by Zhao’s express raw-versus-simplified framework.
taking the respective dense point cloud data subsets as the point cloud data,
See at least: “the 3D point cloud of a scene” and “the road boundaries extracted in one frame” and “the multiframe probabilistic fusion” (page 5).
Rationale: Zhao expressly discloses use of the extracted and fused road-boundary data as the operative point-cloud-derived map content. Where simplification is not invoked, one of ordinary skill in the art would have understood the dense subsets themselves to serve as the point-cloud data used downstream.
and when the first total amount of data exceeds the predetermined threshold,
See at least: “Since the vectorized result can be noisy and dense due to the selected angular resolution of the virtual scan, we employ the Ramer-Douglas-Peucker algorithm [41] to optimize polylines in the LVM.” (page 7).
Rationale: Zhao expressly teaches that when the representation is dense and noisy, simplification/optimization is applied. Although Zhao does not expressly frame this in terms of a threshold comparison, the conditional relationship between excessive density and subsequent simplification is clearly taught, and the use of a predetermined threshold to trigger that simplification would have been an obvious implementation detail.
converting the respective dense point cloud data subsets
See at least: “The LGM generated in the previous step is still a grid map-based representation of road boundaries. To convert the grid map into a vector-based map...” (page 6) and “we employ the virtual scan method to locally vectorize the LGM.” (page 6).
Rationale: Zhao expressly discloses converting denser local road-boundary representations into a more compact representation. Although the claim specifically recites converting dense point-cloud subsets into sparse point-cloud subsets, Zhao’s express teaching of transforming dense local road-boundary data into a lighter-weight representation strongly supports the claimed conversion concept, at least as rendered obvious to one of ordinary skill in the art.
into sparse point cloud data subsets
See at least: “the extracted road boundary in one frame is sparse and can be incomplete” (page 5) and “The simplified representation can be reduced to merely averaging 6% of the points in the original polyline...” (page 7).
Rationale: Zhao expressly discloses sparse extracted road-boundary data and expressly discloses a simplified representation using a greatly reduced number of points. Thus, Zhao teaches the concept of sparsifying a denser representation into a lighter-weight, reduced-point subset. Even if Zhao’s simplified product is discussed in a vectorized/polyline context, one of ordinary skill in the art would have found it obvious to apply the same density-reduction principle to point-cloud subsets used in the fused-map construction pipeline.
to generate the point cloud data.
See at least: “the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6) and “The simplified representation can be reduced to merely averaging 6% of the points in the original polyline ... which is both lightweight and beneficial to the following matching process.” (page 7).
Rationale: Zhao expressly discloses generation of operative local map data from extracted/fused road-boundary information and further discloses that a simplified, reduced-point representation is beneficial for subsequent matching. Accordingly, Zhao teaches or at least renders obvious generating the point-cloud/map data from either the denser or reduced subset representation, depending on the implemented density-handling choice.

Zhao teaches or renders obvious the dense/sparse subset-processing concept of Claim 4, including dense local road/intersection data generation, sparsification of overly dense representations, and reduced-point representations beneficial for later matching. The specific “predetermined threshold” wording is not expressly disclosed, but is a routine and predictable implementation detail for deciding when to retain a dense subset and when to convert it into a sparser subset.
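By way of illustration only, the retain-or-sparsify branch discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Zhao, Luo, Flint, or the application: the function names, the use of point count as the "total amount of data," the threshold parameter, and the choice of voxel-grid downsampling as the sparsification step are all hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def voxel_downsample(points: List[Point], voxel_size: float) -> List[Point]:
    """Hypothetical sparsifier: keep one centroid per occupied voxel."""
    buckets: Dict[Tuple[int, int, int], List[Point]] = {}
    for x, y, z in points:
        # Floor division assigns each point to the voxel that contains it.
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets.setdefault(key, []).append((x, y, z))
    sparse: List[Point] = []
    for pts in buckets.values():
        n = len(pts)
        sparse.append((
            sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            sum(p[2] for p in pts) / n,
        ))
    return sparse


def select_subset(dense: List[Point], threshold: int,
                  voxel_size: float = 1.0) -> List[Point]:
    """Retain the dense subset, or sparsify it when it exceeds the threshold.

    Assumption: the 'total amount of data' is measured as a point count.
    """
    if len(dense) <= threshold:
        return dense  # amount of data is acceptable: keep the dense subset
    return voxel_downsample(dense, voxel_size)  # excessive: convert to sparse
```

In this sketch, each road or intersection subset would be passed through `select_subset` independently, so the branch is evaluated per subset rather than once for the whole map.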

Motivation to Combine Luo, Flint, and Zhao
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, and Zhao before them, to modify the Luo/Flint/Zhao combination used for Claims 1 and 3 so that the road- and intersection-specific point-cloud subsets generated during fused-map pre-construction are retained in denser form when the amount of subset data is acceptable, and are converted into sparser subsets when the amount of subset data becomes excessive, because Zhao already teaches densification of sparse road-boundary data through multiframe fusion, further teaches that dense/noisy local representations are simplified into much lighter-weight representations for efficiency and improved matching, Luo teaches a point-cloud/global-map localization framework in which such point-based map content is used for matching-based position determination, and Flint remains part of the parent-claim combination for the IMU-based updating aspect of Claim 1. One of ordinary skill in the art would have made such modification to control map-data volume and noise while preserving the utility of the point-cloud subsets for subsequent matching and localization, with the predictable benefit of improved storage efficiency and matching performance. The use of a predetermined threshold to decide when to retain dense data and when to sparsify it would have been a routine design choice within the level of ordinary skill in the art. Such a modification would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo, Flint, and Zhao, Claim 4 is rendered obvious by the combination of Luo in view of Flint and Zhao.
In particular, Luo teaches the underlying point-cloud/global-map localization framework, Flint remains part of the parent-claim combination for the IMU-based updating aspect of Claim 1, and Zhao teaches dense local road/intersection map generation, subsequent simplification of overly dense/noisy representations, and reduced-point representations that are beneficial for later matching. The remaining aspects concerning an explicit predetermined threshold and the branch between retaining dense subsets versus converting them into sparse subsets would have been obvious to one of ordinary skill in the art as routine implementation choices in view of Zhao’s disclosed densification/simplification workflow and its predictable integration with the Luo/Flint localization framework.

Regarding Claim 11,
The combination of Luo, Flint, and Zhao establishes the device of Claim 9, which is the basis for Claim 11.

Luo renders obvious:
wherein
See at least: “The processor 61 is configured to execute program instructions that include a tool module configured to perform a method…” ([0104]).
Rationale: Luo expressly discloses a device whose processor-executed instructions perform the recited localization operations. Thus, the “wherein” clause properly introduces further conditions on that already established device-implemented method.
the point cloud basemap includes point cloud data
See at least: “constructing a global map comprises: obtaining the data from the LiDAR; and constructing a city-scale 3D map based on the data from the LiDAR, using LiDAR mapping.” ([0014]) and “the unstructured features may include sparse 3D points.” ([0017]); see also “The global map includes a 3D city-scale map.” ([0076]).
Rationale: Luo expressly discloses a LiDAR-derived 3D global map and sparse 3D point features. One of ordinary skill in the art would have understood such LiDAR-derived 3D map data to correspond to point-cloud-type map data. Accordingly, Luo teaches or at least renders obvious the point-cloud-data aspect of the claimed point cloud basemap.
describing measurement of objects within the scenario
See at least: “obtaining the data from the LiDAR; and constructing a city-scale 3D map based on the data from the LiDAR” ([0014]) and “the unstructured features may include sparse 3D points.” ([0017]).
Rationale: Luo expressly discloses LiDAR-acquired 3D map data and 3D point-based features representing measured environmental content. Thus, Luo teaches that the point-cloud-type map data describes measured objects/features in the mapped scene, although Luo does not expressly tie those objects to roads and intersections specifically.
and the vector map includes vector graphic elements
See at least: “the structured features include at least one of planes, straight lines and curved lines…” ([0017]) and “The extracted features include structured features such as planes, straight lines and curved lines…” ([0082]).
Rationale: Luo does not expressly use the phrase “vector graphic elements.” However, Luo expressly discloses structured geometric features such as planes, straight lines, and curved lines. One of ordinary skill in the art would have recognized such structured geometric descriptors as vector-like elements of a map representation. Accordingly, Luo at least renders obvious the vector-element aspect of the claim.

Luo teaches or renders obvious the device-based point-cloud/global-map framework of Claim 11 and further provides a strong basis for treating structured geometric map features as vector-like elements. The road-specific, intersection-specific, subset-generation, and explicit mapping/construction aspects are more directly shown by Zhao, while Flint remains part of the parent-claim combination for Claim 9.

Zhao teaches:
describing measurement of objects within the scenario
See at least: “The LiDAR-based SLAM generates accurate maps represented by occupancy grids or 3D point clouds...” (page 2) and “the 3D point cloud of a scene” and “the road boundaries extracted in one frame” and “the multiframe probabilistic fusion” (page 5, Figure 2 and accompanying text).
Rationale: Zhao expressly discloses LiDAR-based mapping from 3D point clouds of a scene and extraction of road-boundary objects from those measurements. Thus, Zhao teaches point-cloud data describing measured objects within the mapped scenario.
at roads and intersections within the scenario,
See at least: “The high-precision high-definition (HD) map of the road environment is now recognized as one of the cornerstones for autonomous driving. The reliable mapping of road boundaries, lanes, and other road structures...” (page 2) and “Figure 11c-I, II, and VI show successfully mapped intersections; all the road boundaries along the trajectory are mapped...” (page 13).
Rationale: Zhao expressly discloses that its mapping concerns road structures in road environments and further expressly identifies mapped intersections. Accordingly, Zhao teaches the road- and intersection-specific context missing from Luo.
and the vector map includes vector graphic elements
See at least: “we propose a vector-based SLAM method for the road structure mapping using vehicle-mounted multibeam LiDAR. We propose using polylines as the primary mapping element instead of grid maps or point clouds because the vector-based representation is lightweight and precise.” (page 2) and “Vector primitives, for example, polygons and line segments, are widely adopted as the representation of road geometry...” (page 3).
Rationale: Zhao expressly discloses vector-based mapping and expressly identifies polylines, polygons, and line segments as vector-based mapping elements. Thus, Zhao teaches the claimed vector graphic elements.
describing geometric characteristics of roads and intersections within the scenario,
See at least: “The high-precision high-definition (HD) map of the road environment ... The reliable mapping of road boundaries, lanes, and other road structures...” (page 2) and “Vector primitives ... are widely adopted as the representation of road geometry...” (page 3).
Rationale: Zhao expressly discloses vector-based representation of road geometry and road structures, and further shows mapped intersections in the resulting vectorized map discussion. One of ordinary skill in the art would have understood such vector elements to describe geometric characteristics of roads and intersections within the scenario.
and wherein,
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2).
Rationale: Zhao expressly discloses an ordered mapping pipeline in which additional map-construction operations are performed. Thus, the clause properly introduces further map-preconstruction steps.
the method further comprises pre-constructing the fused map by:
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2) and “First, the LVMs are fused without concatenation... Finally, a reconstruction method is employed to globally reconstruct the vector map...” (page 9).
Rationale: Zhao expressly discloses a mapping workflow in which road-structure data is extracted, vectorized, fused, and reconstructed into a vector map. One of ordinary skill in the art would have understood this to be a pre-construction process for creating the map used later for localization.
creating corresponding point cloud data subsets
See at least: “Figure 2. Road boundary extraction. (a) the 3D point cloud of a scene ... (d) the road boundaries extracted in one frame. (e) the multiframe probabilistic fusion.” (page 5) and “the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6).
Rationale: Zhao expressly discloses beginning with scene point-cloud data, extracting road-boundary portions therefrom, and then generating local map representations from those extracted portions. One of ordinary skill in the art would have understood those extracted road-boundary portions and local maps to constitute corresponding subsets of the original point-cloud data for subsequent processing.
respectively for each road and each intersection within the scenario
See at least: “The data collecting path is approximately 2 km in length, passing through all road segments, including 3 cross-intersections and 8 T-junctions...” (page 13) and “Figure 11c-I, II, and VI show successfully mapped intersections; all the road boundaries along the trajectory are mapped...” (page 13).
Rationale: Zhao expressly discloses mapping across multiple road segments and intersections, including cross-intersections and T-junctions. Thus, Zhao teaches road- and intersection-specific map generation, which one of ordinary skill in the art would have understood as involving corresponding subsets or local portions associated with the respective road and intersection regions.
to generate the point cloud data; and
See at least: “To fuse multiframes locally with high precision, we develop a reckoning system...” (page 5) and “As a result, the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6).
Rationale: Zhao expressly discloses generating locally fused map data from multiple LiDAR-derived frames. Thus, Zhao teaches generating point-cloud-derived map data from extracted/fused subsets of the measured scene.
mapping the point cloud data with the vector graphic elements
See at least: “The LGM generated in the previous step is still a grid map-based representation of road boundaries. To convert the grid map into a vector-based map...” (page 6) and “we employ the virtual scan method to locally vectorize the LGM” (page 6); see also “In Figure 9e, the ordered points belonging to a sector of the road boundaries are connected to realize the initial vectorization.” (page 11).
Rationale: Zhao expressly discloses converting LiDAR/point-cloud-derived road-boundary map data into vector-based mapping elements through vectorization. In other words, Zhao teaches mapping the point-cloud-derived road data with vector graphic elements.
to construct the fused map.
See at least: “The optimization and reconstruction of local vector-based maps into a global vector map of the road structure is achieved.” (page 3) and “Finally, a reconstruction method is employed to globally reconstruct the vector map from sampled LVMs.” (page 9).
Rationale: Zhao expressly discloses constructing a final global vector map by fusing, reconstructing, and vectorizing the road-structure data. In view of Luo’s point-cloud/global-map framework and Zhao’s vector-road-map construction workflow, one of ordinary skill in the art would have found it obvious to construct the claimed fused map by associating point-cloud-derived road/intersection data with vector graphic elements.

Motivation to Combine Luo, Flint, and Zhao
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, and Zhao before them, to modify Luo’s device-based camera/LiDAR map-based vehicle-localization system, as further informed by Flint’s IMU-based relative-displacement updating used for the parent Claim 9 combination, so that the fused map used by the device includes road- and intersection-specific point-cloud data subsets and corresponding vector graphic road-geometry elements constructed through a vectorization and reconstruction workflow as taught by Zhao, because Luo already discloses a device framework having processor-executed localization using a LiDAR-derived global map, a camera-derived submap, and structured/unstructured map features, Flint teaches a known and compatible inertial-update technique for maintaining vehicle position between map-matching determinations, and Zhao teaches a known and compatible technique for generating vector-based road-structure maps from LiDAR-derived road-boundary data, including road and intersection mapping, local subset generation, vectorization, and reconstruction. One of ordinary skill in the art would have made such modification to provide a more lightweight, road-geometry-aware, and structured fused map representation for use by Luo’s localization device while preserving the continuous vehicle-position updating already supported by the Luo/Flint combination, with the predictable benefit of improving map usability and localization performance in road/intersection environments. Such a modification would merely apply a known vector-road-map generation technique to a known LiDAR-based localization device, while retaining known inertial-update processing, and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. 
Therefore, after combining the teachings of Luo, Flint, and Zhao, Claim 11 is rendered obvious by the combination of Luo in view of Flint and Zhao. In particular, Luo teaches the underlying device framework and point-cloud/global-map localization features, Flint teaches the IMU-based updating portion carried through from parent Claim 9, and Zhao teaches road- and intersection-specific vector-map construction from LiDAR-derived road-structure data, including point-cloud-derived subsets, vector graphic elements, and reconstruction of the fused/vector map. The additional map-content and pre-construction aspects of Claim 11 would have been obvious to one of ordinary skill in the art in view of Zhao’s disclosed vector-road-mapping architecture and its predictable integration with the Luo/Flint device-based localization framework.

Regarding Claim 12,
The combination of Luo, Flint, and Zhao establishes the device of Claim 11, which is the basis for Claim 12.

Zhao teaches or renders obvious:
wherein
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2)
Rationale: Zhao expressly discloses an ordered mapping pipeline with additional processing stages. Thus, the “wherein” clause properly introduces additional conditions on the subset-generation portion of the already established device-based method.
creating corresponding point cloud data subsets
See at least: “Figure 2. Road boundary extraction. (a) the 3D point cloud of a scene ... (d) the road boundaries extracted in one frame. (e) the multiframe probabilistic fusion.” (page 5) and “the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6)
Rationale: Zhao expressly discloses beginning with scene point-cloud data, extracting road-boundary portions therefrom, and generating local map representations from those extracted portions. One of ordinary skill in the art would have understood those extracted road-boundary portions and local maps to constitute corresponding subsets of the original point-cloud data for subsequent processing.
respectively for each road and each intersection within the scenario
See at least: “The data collecting path is approximately 2 km in length, passing through all road segments, including 3 cross-intersections and 8 T-junctions...” (page 13) and “Figure 11c-I, II, and VI show successfully mapped intersections; all the road boundaries along the trajectory are mapped...” (page 13)
Rationale: Zhao expressly discloses mapping across multiple road segments and intersections, including cross-intersections and T-junctions. Thus, Zhao teaches road- and intersection-specific map generation, which one of ordinary skill in the art would have understood as involving corresponding subsets or local portions associated with the respective road and intersection regions.
comprises:
See at least: “We explored a combined pipeline of road structure extraction and vectorization...” (page 2)
Rationale: Zhao expressly discloses an open-ended sequence of mapping operations, thereby supporting the recited transitional phrase for the additional subset-processing steps.
obtaining dense point cloud data subsets
See at least: “However, the extracted road boundary in one frame is sparse and can be incomplete; therefore, it should be further densified by the multiframe probabilistic fusion.” (page 5) and “To fuse multiframes locally with high precision...” (page 5)
Rationale: Zhao expressly discloses that sparse single-frame road-boundary extraction is further densified by multiframe probabilistic fusion. One of ordinary skill in the art would have understood the densified local fused boundary representation to constitute dense point-cloud-derived subsets for the corresponding road/intersection regions.
for each road and each intersection within the scenario,
See at least: “all road segments, including 3 cross-intersections and 8 T-junctions...” (page 13) and “all the road boundaries along the trajectory are mapped” and “successfully mapped intersections” (page 13)
Rationale: Zhao expressly discloses dense local road-boundary mapping across multiple road segments and intersections. Thus, the densified subset generation is taught in the road- and intersection-specific context required by the claim.
and determining whether a first total amount of data
See at least: “Since the vectorized result can be noisy and dense due to the selected angular resolution of the virtual scan...” (page 7) and “The simplified representation can be reduced to merely averaging 6% of the points in the original polyline...” (page 7)
Rationale: Zhao does not expressly recite computing a “first total amount of data” in those exact words. However, Zhao expressly distinguishes between denser and lighter-weight representations and expressly quantifies that simplification reduces the retained representation to about 6% of the original points. In view of these teachings, one of ordinary skill in the art would have understood that the mapping pipeline evaluates whether the amount of point/polyline data is sufficiently large to warrant simplification or sparsification. Accordingly, this data-amount determination is at least implicit in Zhao, or would have been obvious therefrom.
for respective dense point cloud data subsets
See at least: “the extracted road boundary in one frame is sparse and can be incomplete; therefore, it should be further densified by the multiframe probabilistic fusion.” (page 5) and “the fused local grid map of road boundaries (LGM)” (page 11 figure description)
Rationale: Zhao expressly discloses locally fused, densified road-boundary representations derived from multiframe point-cloud data. One of ordinary skill in the art would have understood these fused local representations to be the relevant dense subsets whose amount of data may be evaluated for subsequent simplification.
exceeds a predetermined threshold,
See at least: “Since the vectorized result can be noisy and dense due to the selected angular resolution of the virtual scan, we employ the Ramer-Douglas-Peucker algorithm [41] to optimize polylines in the LVM.” (page 7) and “The comparison of the error statistics ... the matching with simplified-LVMs outperforms raw-LVM-based matching ... because of its lightweight and less noisy polyline-based representation.” (page 11)
Rationale: Zhao does not expressly state a “predetermined threshold” for a first total amount of data. However, Zhao expressly teaches that when the representation is noisy and dense, a simplification operation is applied to reduce the amount of retained map data. One of ordinary skill in the art would have understood that, in implementing such density-driven simplification, some threshold or criterion must be used to decide when the dense representation should be retained and when it should be simplified. Use of a predetermined threshold for that decision would have been a routine and predictable implementation choice.
wherein when the first total amount of data does not exceed the predetermined threshold,
See at least: “For a clear description, we refer to the simplified version of LVM as simplified LVM and the original version as raw LVM.” (page 7)
Rationale: Zhao expressly distinguishes between an original, denser representation and a simplified representation. Although Zhao does not expressly recite the decision branch in the exact words of the claim, one of ordinary skill in the art would have understood that if the amount of data is not excessive, the original denser representation may simply be retained rather than simplified. This branch is therefore rendered obvious by Zhao’s express raw-versus-simplified framework.
taking the respective dense point cloud data subsets as the point cloud data,
See at least: “the 3D point cloud of a scene” and “the road boundaries extracted in one frame” and “the multiframe probabilistic fusion” (page 5)
Rationale: Zhao expressly discloses use of the extracted and fused road-boundary data as the operative point-cloud-derived map content. Where simplification is not invoked, one of ordinary skill in the art would have understood the dense subsets themselves to serve as the point-cloud data used downstream.
and when the first total amount of data exceeds the predetermined threshold,
See at least: “Since the vectorized result can be noisy and dense due to the selected angular resolution of the virtual scan, we employ the Ramer-Douglas-Peucker algorithm [41] to optimize polylines in the LVM.” (page 7)
Rationale: Zhao expressly teaches that when the representation is dense and noisy, simplification/optimization is applied. Although Zhao does not expressly frame this in terms of a threshold comparison, the conditional relationship between excessive density and subsequent simplification is clearly taught, and the use of a predetermined threshold to trigger that simplification would have been an obvious implementation detail.
converting the respective dense point cloud data subsets
See at least: “The LGM generated in the previous step is still a grid map-based representation of road boundaries. To convert the grid map into a vector-based map...” (page 6) and “we employ the virtual scan method to locally vectorize the LGM.” (page 6)
Rationale: Zhao expressly discloses converting denser local road-boundary representations into a more compact representation. Although the claim specifically recites converting dense point-cloud subsets into sparse point-cloud subsets, Zhao’s express teaching of transforming dense local road-boundary data into a lighter-weight representation strongly supports the claimed conversion concept, at least as rendered obvious to one of ordinary skill in the art.
into sparse point cloud data subsets
See at least: “the extracted road boundary in one frame is sparse and can be incomplete” (page 5) and “The simplified representation can be reduced to merely averaging 6% of the points in the original polyline...” (page 7)
Rationale: Zhao expressly discloses sparse extracted road-boundary data and expressly discloses a simplified representation using a greatly reduced number of points. Thus, Zhao teaches the concept of sparsifying a denser representation into a lighter-weight, reduced-point subset. Even if Zhao’s simplified product is discussed in a vectorized/polyline context, one of ordinary skill in the art would have found it obvious to apply the same density-reduction principle to point-cloud subsets used in the fused-map construction pipeline.
to generate the point cloud data.
See at least: “the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6) and “The simplified representation can be reduced to merely averaging 6% of the points in the original polyline ... which is both lightweight and beneficial to the following matching process.” (page 7)
Rationale: Zhao expressly discloses generation of operative local map data from extracted/fused road-boundary information and further discloses that a simplified, reduced-point representation is beneficial for subsequent matching. Accordingly, Zhao teaches or at least renders obvious generating the point-cloud/map data from either the denser or reduced subset representation, depending on the implemented density-handling choice.
Zhao teaches or renders obvious the dense/sparse subset-processing concept of Claim 12, including dense local road/intersection data generation, sparsification of overly dense representations, and reduced-point representations beneficial for later matching. The specific “predetermined threshold” wording is not expressly disclosed, but is a routine and predictable implementation detail for deciding when to retain a dense subset and when to convert it into a sparser subset.
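For reference, the Ramer-Douglas-Peucker simplification that the quoted Zhao passages name as the polyline-reduction step can be sketched as below. This is a generic textbook form of the algorithm, not Zhao's implementation: the function names, the 2D point type, and the `epsilon` tolerance parameter are assumptions introduced here for illustration.

```python
import math
from typing import List, Tuple

Point2D = Tuple[float, float]


def _perp_distance(p: Point2D, a: Point2D, b: Point2D) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len = math.hypot(dx, dy)
    if seg_len == 0.0:
        # Degenerate segment: fall back to the distance to the endpoint.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / seg_len


def rdp(points: List[Point2D], epsilon: float) -> List[Point2D]:
    """Simplify a polyline, dropping points within epsilon of the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first -> last.
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [first, last]  # every interior point is within tolerance
    # Otherwise keep the farthest point and recurse on both halves.
    left = rdp(points[: idx + 1], epsilon)
    right = rdp(points[idx:], epsilon)
    return left[:-1] + right  # drop the duplicated split point
```

Note that the algorithm is driven by a geometric tolerance rather than by a data-volume comparison; a larger `epsilon` retains fewer points.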

Motivation to Combine Luo, Flint, and Zhao
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, and Zhao before them, to modify the Luo/Flint/Zhao combination used for Claims 9 and 11 so that the road- and intersection-specific point-cloud subsets generated during fused-map pre-construction for the device are retained in denser form when the amount of subset data is acceptable, and are converted into sparser subsets when the amount of subset data becomes excessive, because Zhao already teaches densification of sparse road-boundary data through multiframe fusion, further teaches that dense/noisy local representations are simplified into much lighter-weight representations for efficiency and improved matching, Luo teaches a device-based point-cloud/global-map localization framework in which such point-based map content is used for matching-based position determination, and Flint remains part of the parent-claim combination for the IMU-based updating aspect of Claim 9. One of ordinary skill in the art would have made such modification to control map-data volume and noise while preserving the utility of the point-cloud subsets for subsequent matching and localization by the device, with the predictable benefit of improved storage efficiency and matching performance. The use of a predetermined threshold to decide when to retain dense data and when to sparsify it would have been a routine design choice within the level of ordinary skill in the art. Such a modification would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo, Flint, and Zhao, Claim 12 is rendered obvious by the combination of Luo in view of Flint and Zhao. 
In particular, Luo teaches the underlying device-based point-cloud/global-map localization framework, Flint remains part of the parent-claim combination for the IMU-based updating aspect of Claim 9, and Zhao teaches dense local road/intersection map generation, subsequent simplification of overly dense/noisy representations, and reduced-point representations that are beneficial for later matching. The remaining aspects concerning an explicit predetermined threshold and the branch between retaining dense subsets versus converting them into sparse subsets would have been obvious to one of ordinary skill in the art as routine implementation choices in view of Zhao’s disclosed densification/simplification workflow and its predictable integration with the Luo/Flint/Zhao device-based localization framework.

Regarding Claim 19,
The combination of Luo, Flint, and Zhao establishes the non-transitory computer-readable storage medium of Claim 17, which is the basis for Claim 19.

Luo discloses or renders obvious:
wherein
See at least: “The one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform...” ([0010]).
Rationale: The reference expressly discloses that the stored instructions cause a computing device/processor to carry out the localization method. Thus, the “wherein” clause properly introduces additional conditions on that already established storage-medium-implemented method.
the point cloud basemap includes point cloud data
See at least: “constructing a global map comprises: obtaining the data from the LiDAR; and constructing a city-scale 3D map based on the data from the LiDAR, using LiDAR mapping.” ([0014]) and “the unstructured features may include sparse 3D points.” ([0017]); see also “The global map includes a 3D city-scale map.” ([0076]).
Rationale: The reference expressly discloses a LiDAR-derived 3D global map and sparse 3D point features. One of ordinary skill in the art would have understood such LiDAR-derived 3D map data to correspond to point-cloud-type map data. Accordingly, Luo teaches or at least renders obvious the point-cloud-data aspect of the claimed point cloud basemap.
describing measurement of objects within the scenario
See at least: “obtaining the data from the LiDAR; and constructing a city-scale 3D map based on the data from the LiDAR” ([0014]) and “the unstructured features may include sparse 3D points.” ([0017]).
Rationale: The reference expressly discloses LiDAR-acquired 3D map data and 3D point-based features representing measured environmental content. Thus, Luo teaches that the point-cloud-type map data describes measured objects/features in the mapped scene, although Luo does not expressly tie those objects to roads and intersections specifically.
and the vector map includes vector graphic elements
See at least: “the structured features include at least one of planes, straight lines and curved lines...” ([0017]) and “The extracted features include structured features such as planes, straight lines and curved lines...” ([0082]).
Rationale: Luo does not expressly use the phrase “vector graphic elements.” However, Luo expressly discloses structured geometric features such as planes, straight lines, and curved lines. One of ordinary skill in the art would have recognized such structured geometric descriptors as vector-like elements of a map representation. Accordingly, Luo at least renders obvious the vector-element aspect of the claim.

Luo teaches or renders obvious the non-transitory-storage-medium-based point-cloud/global-map framework of Claim 19 and further provides a strong basis for treating structured geometric map features as vector-like elements. The road-specific, intersection-specific, subset-generation, and explicit mapping/construction aspects are more directly shown by Zhao, while Flint remains part of the parent-claim combination for Claim 17.

Zhao discloses:
describing measurement of objects within the scenario
See at least: “The LiDAR-based SLAM generates accurate maps represented by occupancy grids or 3D point clouds...” (page 2) and “the 3D point cloud of a scene” and “the 2D projection” and “the road boundaries extracted in one frame” and “the multiframe probabilistic fusion” (page 5, Figure 2 and accompanying text).
Rationale: Zhao expressly discloses LiDAR-based mapping from 3D point clouds of a scene and extraction of road-boundary objects from those measurements. Thus, Zhao teaches point-cloud data describing measured objects within the mapped scenario.

at roads and intersections within the scenario,
See at least: “The high-precision high-definition (HD) map of the road environment is now recognized as one of the cornerstones for autonomous driving. The reliable mapping of road boundaries, lanes, and other road structures...” (page 2) and “Figure 11c-I, II, and VI show successfully mapped intersections; all the road boundaries along the trajectory are mapped...” (page 13).
Rationale: Zhao expressly discloses that its mapping concerns road structures in road environments and further expressly identifies mapped intersections. Accordingly, Zhao teaches the road- and intersection-specific context missing from Luo.

and the vector map includes vector graphic elements
See at least: “we propose a vector-based SLAM method for the road structure mapping using vehicle-mounted multibeam LiDAR. We propose using polylines as the primary mapping element instead of grid maps or point clouds because the vector-based representation is lightweight and precise.” (page 2) and “Vector primitives, for example, polygons and line segments, are widely adopted as the representation of road geometry...” (page 3).
Rationale: Zhao expressly discloses vector-based mapping and expressly identifies polylines, polygons, and line segments as vector-based mapping elements. Thus, Zhao teaches the claimed vector graphic elements.

describing geometric characteristics of roads and intersections within the scenario,
See at least: “The high-precision high-definition (HD) map of the road environment ... The reliable mapping of road boundaries, lanes, and other road structures...” (page 2) and “Vector primitives ... are widely adopted as the representation of road geometry...” (page 3).
Rationale: Zhao expressly discloses vector-based representation of road geometry and road structures, and further shows mapped intersections in the resulting vectorized map discussion. One of ordinary skill in the art would have understood such vector elements to describe geometric characteristics of roads and intersections within the scenario.

and wherein,
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2).
Rationale: Zhao expressly discloses an ordered mapping pipeline in which additional map-construction operations are performed. Thus, the clause properly introduces further map-preconstruction steps.

the method further comprises pre-constructing the fused map by:
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2) and “First, the LVMs are fused without concatenation... Finally, a reconstruction method is employed to globally reconstruct the vector map...” (page 9).
Rationale: Zhao expressly discloses a mapping workflow in which road-structure data is extracted, vectorized, fused, and reconstructed into a vector map. One of ordinary skill in the art would have understood this to be a pre-construction process for creating the map used later for localization.

creating corresponding point cloud data subsets
See at least: “Figure 2. Road boundary extraction. (a) the 3D point cloud of a scene ... (d) the road boundaries extracted in one frame. (e) the multiframe probabilistic fusion.” (page 5) and “the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6).
Rationale: Zhao expressly discloses beginning with scene point-cloud data, extracting road-boundary portions therefrom, and then generating local map representations from those extracted portions. One of ordinary skill in the art would have understood those extracted road-boundary portions and local maps to constitute corresponding subsets of the original point-cloud data for subsequent processing.

respectively for each road and each intersection within the scenario
See at least: “The data collecting path is approximately 2 km in length, passing through all road segments, including 3 cross-intersections and 8 T-junctions...” (page 13) and “Figure 11c-I, II, and VI show successfully mapped intersections; all the road boundaries along the trajectory are mapped...” (page 13).
Rationale: Zhao expressly discloses mapping across multiple road segments and intersections, including cross-intersections and T-junctions. Thus, Zhao teaches road- and intersection-specific map generation, which one of ordinary skill in the art would have understood as involving corresponding subsets or local portions associated with the respective road and intersection regions.

to generate the point cloud data; and
See at least: “To fuse multiframes locally with high precision, we develop a reckoning system...” (page 5) and “As a result, the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” (page 6).
Rationale: Zhao expressly discloses generating locally fused map data from multiple LiDAR-derived frames. Thus, Zhao teaches generating point-cloud-derived map data from extracted/fused subsets of the measured scene.

mapping the point cloud data with the vector graphic elements
See at least: “The LGM generated in the previous step is still a grid map-based representation of road boundaries. To convert the grid map into a vector-based map...” (page 6) and “we employ the virtual scan method to locally vectorize the LGM” (page 6); see also “In Figure 9e, the ordered points belonging to a sector of the road boundaries are connected to realize the initial vectorization.” (page 11).
Rationale: Zhao expressly discloses converting LiDAR/point-cloud-derived road-boundary map data into vector-based mapping elements through vectorization. In other words, Zhao teaches mapping the point-cloud-derived road data with vector graphic elements.

to construct the fused map.
See at least: “The optimization and reconstruction of local vector-based maps into a global vector map of the road structure is achieved.” (page 3) and “Finally, a reconstruction method is employed to globally reconstruct the vector map from sampled LVMs.” (page 9).
Rationale: Zhao expressly discloses constructing a final global vector map by fusing, reconstructing, and vectorizing the road-structure data. In view of Luo’s point-cloud/global-map framework and Zhao’s vector-road-map construction workflow, one of ordinary skill in the art would have found it obvious to construct the claimed fused map by associating point-cloud-derived road/intersection data with vector graphic elements.
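
For illustration only, the association of per-road and per-intersection point-cloud subsets with vector graphic elements discussed for Claim 19 might be sketched as follows. This is a minimal sketch and is not taken from the application, Luo, or Zhao: the names `FusedMapEntry` and `build_fused_map`, the region identifiers, and the choice to pair subsets and polylines by a shared key are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


@dataclass
class FusedMapEntry:
    """One road or intersection in a hypothetical fused map."""
    region_id: str            # hypothetical identifier, e.g. "road_1"
    kind: str                 # "road" or "intersection"
    points: List[Point3D]     # point-cloud data subset for this region
    polyline: List[Point2D]   # vector graphic element for this region


def build_fused_map(
    subsets: Dict[str, List[Point3D]],
    vectors: Dict[str, List[Point2D]],
    kinds: Dict[str, str],
) -> Dict[str, FusedMapEntry]:
    """Pair each point-cloud subset with the vector element sharing its key."""
    fused: Dict[str, FusedMapEntry] = {}
    for region_id, points in subsets.items():
        if region_id not in vectors:
            continue  # assumption: regions without a vector element are skipped
        fused[region_id] = FusedMapEntry(
            region_id=region_id,
            kind=kinds.get(region_id, "unknown"),
            points=points,
            polyline=vectors[region_id],
        )
    return fused
```

In this sketch each entry keeps both representations side by side, so a localization routine could match against either the points or the polyline for a given region.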



Motivation to Combine Luo, Flint, and Zhao
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, and Zhao before them, to modify Luo’s storage-medium-based camera/LiDAR map-based vehicle-localization method, as further informed by Flint’s IMU-based relative-displacement updating used for the parent Claim 17 combination, so that the fused map used by the processor-executed instructions includes road- and intersection-specific point-cloud data subsets and corresponding vector graphic road-geometry elements constructed through a vectorization and reconstruction workflow as taught by Zhao. This modification would have been obvious because: (1) Luo already discloses a non-transitory storage medium storing instructions for localization using a LiDAR-derived global map, a camera-derived submap, and structured/unstructured map features; (2) Flint teaches a known and compatible inertial-update technique for maintaining vehicle position between map-matching determinations; and (3) Zhao teaches a known and compatible technique for generating vector-based road-structure maps from LiDAR-derived road-boundary data, including road and intersection mapping, local subset generation, vectorization, and reconstruction. One of ordinary skill in the art would have made such a modification to provide a more lightweight, road-geometry-aware, and structured fused map representation for use by Luo’s storage-medium-implemented localization method while preserving the continuous vehicle-position updating already supported by the Luo/Flint combination, with the predictable benefit of improving map usability and localization performance in road/intersection environments.
Such a modification would merely apply a known vector-road-map generation technique to a known LiDAR-based localization framework, while retaining known inertial-update processing, and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo, Flint, and Zhao, Claim 19 is rendered obvious by the combination of Luo in view of Flint and Zhao. In particular, Luo teaches the underlying non-transitory storage-medium framework and point-cloud/global-map localization features, Flint teaches the IMU-based updating portion carried through from parent Claim 17, and Zhao teaches road- and intersection-specific vector-map construction from LiDAR-derived road-structure data, including point-cloud-derived subsets, vector graphic elements, and reconstruction of the fused/vector map. The additional map-content and pre-construction aspects of Claim 19 would have been obvious to one of ordinary skill in the art in view of Zhao’s disclosed vector-road-mapping architecture and its predictable integration with the Luo/Flint storage-medium-based localization framework.

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Luo in view of Flint, Zhao, and Zhou (Non-iterative denoising algorithm based on a dual threshold for a 3D point cloud).

Regarding Claim 5,
The combination of Luo, Flint, and Zhao establishes the method of Claim 4, which is the basis for Claim 5.

Zhao teaches or renders obvious:
wherein
See at least: “We explored a combined pipeline of road structure extraction and vectorization, vector-based local map matching, optimization and vector reconstruction.” (page 2).
Rationale: Zhao expressly discloses an ordered mapping pipeline with additional processing stages. Thus, the “wherein” clause properly introduces further conditions on the already established sparse-subset processing workflow.

converting the respective dense point cloud data subsets
See at least: “The LGM generated in the previous step is still a grid map-based representation of road boundaries. To convert the grid map into a vector-based map...” (page 6) and “Since the vectorized result can be noisy and dense ... we employ the Ramer-Douglas-Peucker algorithm [41] to optimize polylines in the LVM.” (page 7).
Rationale: This dense-to-sparse conversion concept was already relied upon in establishing Claim 4. It is repeated here only as antecedent context for the new Claim 5 limitations concerning what is done after the sparse subsets have already been generated. The newly added limitations begin with the second-stage evaluation of those sparse subsets.
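
For illustration only, the Ramer-Douglas-Peucker polyline simplification named in the Zhao passage quoted above can be sketched as follows. This is a generic textbook form of the algorithm and is not Zhao's implementation; the function names, the 2D point representation, the `epsilon` tolerance, and the use of point-to-segment distance (rather than distance to the infinite line) are assumptions made for the example.

```python
import math


def _point_segment_distance(p, a, b):
    """Euclidean distance from 2D point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [points[0], points[-1]]
```

A larger `epsilon` discards more intermediate nodes, which is one way a dense polyline becomes a sparser one.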

Zhou teaches or renders obvious:
determining whether a second total amount of data
See at least: “In this study, a non-iterative dual threshold framework for 3D point cloud denoising is proposed.” (page 2) and “Finally, we calculate the small threshold of the 3D point cloud ... where Nsum is the number of all points in P′.” (page 4).
Rationale: Zhou expressly discloses threshold-based point-cloud processing and expressly uses the number of all points in the point cloud as part of the threshold framework. One of ordinary skill in the art would have understood this as evaluating the amount of point-cloud data to determine how further point-cloud reduction should proceed. Accordingly, the claimed determination of a second total amount of data is expressly suggested by Zhou’s point-count-based threshold processing.

of respective sparse point cloud data subsets
See at least: “The first stage removes the drift points using a small threshold. The second stage removes the drift points using a large threshold.” (page 2) and “After the non-iterative small threshold denoising of the 3D point cloud, we get P″ ... The remaining drift points are continuously deleted by the non-iterative large threshold denoising algorithm.” (page 4).
Rationale: Zhou expressly discloses that, after a first-stage reduction, a remaining point-cloud representation is subjected to a second threshold stage. One of ordinary skill in the art would have understood the post-first-stage representation to be a sparser subset of the original point cloud on which the second-stage decision is made.

exceeds the predetermined threshold,
See at least: “The first stage removes the drift points using a small threshold. The second stage removes the drift points using a large threshold.” (page 2) and “If there is only one point in the Vi,j, it is identified as a drift point; otherwise the points are regarded as non-noise points.” (page 4).
Rationale: Zhou expressly discloses predetermined thresholds governing whether point-cloud data is retained or removed in a two-stage denoising process. Thus, Zhou directly supports the claimed threshold-comparison concept applied to already sparse subsets.

wherein when the second total amount of data does not exceed the predetermined threshold,
See at least: “If there is only one point in the Vi,j, it is identified as a drift point; otherwise the points are regarded as non-noise points.” (page 4).
Rationale: Zhou expressly discloses a threshold-governed branch in which points are either removed or retained. One of ordinary skill in the art would have understood that where the post-conversion sparse data does not exceed the operative threshold, the existing sparse subset may simply be retained for downstream use.

taking the respective sparse point cloud data subsets as the point cloud data,
See at least: “We can then retain the non-noise points and delete the drift points.” (page 4).
Rationale: Zhou expressly discloses retaining the point-cloud data that survives threshold-based filtering. Accordingly, Zhou teaches the claimed branch in which the sparse subsets are taken as the operative point-cloud data.

and when the second total amount of data exceeds the predetermined threshold,
See at least: “The second stage removes the drift points using a large threshold.” (page 2) and “The remaining drift points are continuously deleted by the non-iterative large threshold denoising algorithm.” (page 4).
Rationale: Zhou expressly teaches that when the reduced representation still warrants further processing, additional threshold-based reduction is applied. Thus, Zhou supports the claimed second conditional branch when the second amount of data still exceeds the threshold.
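
For illustration only, the retain-or-delete rule in the Zhou passage quoted above (“If there is only one point in the Vi,j, it is identified as a drift point”) might be sketched as follows. This is a loose sketch of that single quoted rule and does not reproduce Zhou's threshold calculation or two-stage procedure; treating Vi,j as a cubic grid cell, the `cell_size` parameter, and the function name are assumptions made for the example.

```python
import math
from collections import defaultdict


def remove_isolated_points(points, cell_size):
    """Drop points that are the only occupant of their grid cell.

    points: iterable of (x, y, z) tuples.
    cell_size: edge length of the (assumed) cubic cells.
    """
    cells = defaultdict(list)
    for p in points:
        # Integer cell index along each axis; floor handles negative coordinates.
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(p)
    # A cell holding exactly one point is treated as holding a drift point.
    return [p for members in cells.values() if len(members) > 1 for p in members]
```

Note that the rule decides point by point whether data is retained, which is distinct from comparing a total data amount against a threshold.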

Zhao further teaches or renders obvious the road/intersection-specific treatment of the already sparse subsets:
manually sampling the sparse point cloud data subset for each road
See at least: “A probe-based sampling method is then employed to sample the fused road boundaries.” (page 8–9) and “the sampled nodes in the proposed method are not included in the optimization” (page 8).
Rationale: Zhao expressly discloses sampling road-boundary data after fusion. Zhao does not expressly recite that the sampling is “manual.” However, one of ordinary skill in the art would have understood that, where an automatically produced sparse road subset still remains too large after threshold-based reduction, manual or user-directed sampling of the road subset would have been a routine and predictable implementation option for further controlling road data density.

at a predetermined interval,
See at least: “The key difference is that we sample the nodes evenly in the referenced polyline...” (page 7) and “A probe-based sampling method is then employed to sample the fused road boundaries.” (page 8–9).
Rationale: Zhao expressly teaches even sampling of nodes and probe-based sampling of fused road boundaries. One of ordinary skill in the art would have understood even sampling to correspond to sampling at a predetermined interval. Accordingly, this interval-based sampling aspect is at least rendered obvious by Zhao.

and taking the sparse point cloud data subset for each intersection
See at least: “Figure 11c-I, II, and VI show successfully mapped intersections” (page 13) and “the global structure of the map was not taken into consideration, which is crucial for the vectorization of complex intersections.” (page 8).
Rationale: Zhao expressly emphasizes the importance of preserving detailed structure at intersections and expressly discusses successful mapped intersections. One of ordinary skill in the art would have understood from these teachings that intersection subsets are significant structural components that may appropriately be retained without the same level of reduction applied to road-only segments.

and the manually sampled sparse point cloud data subset for each road
See at least: “we sample the nodes evenly in the referenced polyline” (page 7) and “A probe-based sampling method is then employed to sample the fused road boundaries.” (page 8–9).
Rationale: Zhao expressly teaches sampling road-boundary representations and even node sampling. Thus, Zhao teaches the road-specific sampled subset aspect of the claim, at least as rendered obvious when applied after the second threshold evaluation taught by Zhou.

as the point cloud data.
See at least: “The simplified representation can be reduced to merely averaging 6% of the points ... which is both lightweight and beneficial to the following matching process.” (page 7) and “the matching with simplified-LVMs outperforms raw-LVM-based matching ... in both accuracy and efficiency” (page 11).
Rationale: Zhao expressly teaches using the reduced/sampled representation as the operative representation for downstream matching. Accordingly, Zhao teaches taking the retained intersection subset together with the sampled road subset as the point-cloud data used for subsequent processing.
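
For illustration only, the two-branch logic of the Claim 5 limitations mapped above might be sketched as follows. This is a minimal sketch under stated assumptions and is not the applicant's implementation: the function name, the dictionary-of-lists layout, measuring the “total amount of data” as a point count, and modeling the “predetermined interval” as an index stride are all assumptions. The claimed sampling is manual; the stride here merely stands in for that step.

```python
def select_point_cloud_data(road_subsets, intersection_subsets, threshold, interval):
    """Choose the point-cloud data from already sparse subsets.

    road_subsets / intersection_subsets: dict mapping a (hypothetical)
    region name to a list of points; keys are assumed not to collide.
    threshold: maximum acceptable total point count (assumed measure).
    interval: index stride standing in for the predetermined interval.
    """
    total = sum(len(pts) for pts in road_subsets.values())
    total += sum(len(pts) for pts in intersection_subsets.values())

    if total <= threshold:
        # Branch 1: the sparse subsets are taken as the point-cloud data as-is.
        return {**road_subsets, **intersection_subsets}

    # Branch 2: thin only the road subsets; intersections are kept whole.
    sampled_roads = {name: pts[::interval] for name, pts in road_subsets.items()}
    return {**sampled_roads, **intersection_subsets}
```

The asymmetry in the second branch reflects the claim language: road subsets are sampled further, while intersection subsets pass through unchanged.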

Motivation to Combine Luo, Flint, Zhao, and Zhou
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, Zhao, and Zhou before them, to modify the Luo/Flint/Zhao combination used for Claims 1, 3, and 4 so that, after the dense road- and intersection-specific point-cloud subsets have already been converted into sparse point-cloud subsets, those already sparse subsets are subjected to a second threshold-based data-volume evaluation as taught by Zhou, and, if the remaining sparse road-segment data still exceeds the acceptable threshold, the road subsets are further sampled at an even, predetermined interval while retaining the intersection subsets as taught by Zhao. This modification would have been obvious because: (1) Zhou teaches a dual-threshold point-cloud reduction framework based on point-cloud amount and threshold-driven retention/further-reduction logic; (2) Zhao teaches road-boundary sampling and even node spacing together with explicit preservation of important intersection geometry; (3) Luo provides the overall point-cloud/global-map localization framework in which the resulting point-cloud data is used for map matching and positioning; and (4) Flint remains part of the parent-claim combination for the IMU-based updating aspect of Claim 1. One of ordinary skill in the art would have made such a modification to further control data volume and noise after the initial dense-to-sparse conversion while preserving important intersection structure and maintaining lightweight road-segment data for matching, with the predictable benefit of improved efficiency, storage economy, and matching performance. The use of a second threshold and predetermined-interval road sampling would have been a routine and predictable design choice in view of Zhou’s threshold-driven point-cloud reduction and Zhao’s express even-sampling disclosures.
Such a modification would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo, Flint, Zhao, and Zhou, Claim 5 is rendered obvious by the combination of Luo in view of Flint, Zhao, and Zhou. In particular, Luo teaches the underlying point-cloud/global-map localization framework, Flint remains part of the parent-claim combination for the IMU-based updating aspect of Claim 1, Zhao teaches road/intersection-specific subset generation together with road-boundary sampling, even node spacing, and use of reduced representations for matching, and Zhou teaches a second threshold-based point-cloud reduction framework based on point-cloud amount and threshold-driven retention/further-reduction logic. The remaining aspects concerning explicit manual sampling and retaining intersection subsets while further sampling road subsets would have been obvious to one of ordinary skill in the art as routine implementation choices in view of Zhao’s road-specific sampling disclosures, Zhao’s emphasis on intersection structure, and Zhou’s disclosed multi-threshold point-cloud reduction workflow.

Regarding Claim 13,
The combination of Luo, Flint, and Zhao establishes the device of Claim 12, which is the basis for Claim 13.

Zhou discloses or renders obvious:
wherein
See at least: “The first stage removes the drift points using a small threshold. The second stage removes the drift points using a large threshold.” (page 2).
Rationale: Zhou expressly discloses a multi-stage processing pipeline in which additional conditions are applied after an initial reduction stage. Thus, the “wherein” clause properly introduces further conditions on the already established device-based sparse-subset processing workflow.

converting the respective dense point cloud data subsets
See at least: “The first stage removes the drift points using a small threshold.” (page 2) and “After the non-iterative small threshold denoising of the 3D point cloud, we get P″ ...” (page 4).
Rationale: The initial dense-to-sparse conversion concept was already relied upon in establishing Claim 12. It is repeated here only as antecedent context for the new Claim 13 limitations concerning what is done after the sparse subsets have already been generated. The newly added limitations begin with the second-stage evaluation of those sparse subsets.

into sparse point cloud data subsets
See at least: “After the non-iterative small threshold denoising of the 3D point cloud, we get P″ ...” (page 4).
Rationale: Zhou expressly discloses a post-first-stage point-cloud representation that is sparser than the original. Thus, Zhou supports the antecedent sparse-subset context on which the second-stage evaluation is carried out.

to generate the point cloud data
See at least: “The remaining drift points are continuously deleted by the non-iterative large threshold denoising algorithm.” (page 4).
Rationale: Zhou expressly discloses generation of a further-refined operative point-cloud representation after threshold-based reduction. Thus, Zhou supports the use of the sparse subsets as the point-cloud data subjected to additional evaluation.

comprises:
See at least: “In this study, a non-iterative dual threshold framework for 3D point cloud denoising is proposed.” (page 2).
Rationale: Zhou expressly discloses an open-ended, staged point-cloud processing workflow, thereby supporting the recited transitional phrase for the further sparse-subset processing steps.

determining whether a second total amount of data
See at least: “Finally, we calculate the small threshold of the 3D point cloud ... where Nsum is the number of all points in P′.” (page 4) and “The first stage removes the drift points using a small threshold. The second stage removes the drift points using a large threshold.” (page 2).
Rationale: Zhou expressly discloses threshold-based point-cloud processing and expressly uses the number of all points in the point cloud as part of the threshold framework. One of ordinary skill in the art would have understood this as evaluating the amount of point-cloud data to determine whether further point-cloud reduction should proceed. Accordingly, the claimed determination of a second total amount of data is expressly suggested by Zhou’s point-count-based threshold processing.

of respective sparse point cloud data subsets
See at least: “After the non-iterative small threshold denoising of the 3D point cloud, we get P″ ...” (page 4) and “The second stage removes the drift points using a large threshold.” (page 2).
Rationale: Zhou expressly discloses that, after a first-stage reduction, a remaining point-cloud representation is subjected to a second threshold stage. One of ordinary skill in the art would have understood the post-first-stage representation to be a sparser subset of the original point cloud on which the second-stage decision is made.

exceeds the predetermined threshold,
See at least: “The first stage removes the drift points using a small threshold. The second stage removes the drift points using a large threshold.” (page 2) and “If there is only one point in the Vi,j, it is identified as a drift point; otherwise the points are regarded as non-noise points.” (page 4).
Rationale: Zhou expressly discloses predetermined thresholds governing whether point-cloud data is retained or removed in a two-stage denoising process. Thus, Zhou directly supports the claimed threshold-comparison concept applied to already sparse subsets.

wherein when the second total amount of data does not exceed the predetermined threshold,
See at least: “If there is only one point in the Vi,j, it is identified as a drift point; otherwise the points are regarded as non-noise points.” (page 4).
Rationale: Zhou expressly discloses a threshold-governed branch in which points are either removed or retained. One of ordinary skill in the art would have understood that where the post-conversion sparse data does not exceed the operative threshold, the existing sparse subset may simply be retained for downstream use.

taking the respective sparse point cloud data subsets as the point cloud data,
See at least: “We can then retain the non-noise points and delete the drift points.” (page 4).
Rationale: Zhou expressly discloses retaining the point-cloud data that survives threshold-based filtering. Accordingly, Zhou teaches the claimed branch in which the sparse subsets are taken as the operative point-cloud data.

and when the second total amount of data exceeds the predetermined threshold,
See at least: “The second stage removes the drift points using a large threshold.” (page 2) and “The remaining drift points are continuously deleted by the non-iterative large threshold denoising algorithm.” (page 4).
Rationale: Zhou expressly teaches that when the reduced representation still warrants further processing, additional threshold-based reduction is applied. Thus, Zhou supports the claimed second conditional branch when the second amount of data still exceeds the threshold.

Zhao further teaches or renders obvious the road/intersection-specific treatment of the already sparse subsets:
manually sampling the sparse point cloud data subset for each road
See at least: “A probe-based sampling method is then employed to sample the fused road boundaries.” (page 8–9) and “the sampled nodes in the proposed method are not included in the optimization” (page 8).
Rationale: Zhao expressly discloses sampling road-boundary data after fusion. Zhao does not expressly recite that the sampling is “manual.” However, one of ordinary skill in the art would have understood that, where an automatically produced sparse road subset still remains too large after threshold-based reduction, manual or user-directed sampling of the road subset would have been a routine and predictable implementation option for further controlling road data density in a device-based workflow.

at a predetermined interval,
See at least: “The key difference is that we sample the nodes evenly in the referenced polyline...” (page 7) and “A probe-based sampling method is then employed to sample the fused road boundaries.” (page 8–9).
Rationale: Zhao expressly teaches even sampling of nodes and probe-based sampling of fused road boundaries. One of ordinary skill in the art would have understood even sampling to correspond to sampling at a predetermined interval. Accordingly, this interval-based sampling aspect is at least rendered obvious by Zhao.

and taking the sparse point cloud data subset for each intersection
See at least: “Figure 11c-I, II, and VI show successfully mapped intersections” (page 13) and “the global structure of the map was not taken into consideration, which is crucial for the vectorization of complex intersections.” (page 8).
Rationale: Zhao expressly emphasizes the importance of preserving detailed structure at intersections and expressly discusses successful mapped intersections. One of ordinary skill in the art would have understood from these teachings that intersection subsets are significant structural components that may appropriately be retained without the same level of reduction applied to road-only segments.

and the manually sampled sparse point cloud data subset for each road
See at least: “we sample the nodes evenly in the referenced polyline” (page 7) and “A probe-based sampling method is then employed to sample the fused road boundaries.” (page 8–9).
Rationale: Zhao expressly teaches sampling road-boundary representations and even node sampling. Thus, Zhao teaches the road-specific sampled subset aspect of the claim, at least as rendered obvious when applied after the second threshold evaluation taught by Zhou.

as the point cloud data.
See at least: “The simplified representation can be reduced to merely averaging 6% of the points ... which is both lightweight and beneficial to the following matching process.” (page 7) and “the matching with simplified-LVMs outperforms raw-LVM-based matching ... in both accuracy and efficiency” (page 11).
Rationale: Zhao expressly teaches using the reduced/sampled representation as the operative representation for downstream matching. Accordingly, Zhao teaches taking the retained intersection subset together with the sampled road subset as the point-cloud data used for subsequent processing.
Zhao and Zhou together teach or render obvious the new Claim 13 limitations without remapping the already-established Claim 12 conversion step. Zhou supplies the second-threshold evaluation and threshold-based retain/further-reduce logic for the already sparse subsets, while Zhao supplies the road-specific sampling, even-interval sampling concept, intersection preservation rationale, and use of the resulting reduced representation for subsequent matching in the device-based context.

Motivation to Combine Luo, Flint, Zhao, and Zhou
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, Zhao, and Zhou before them, to modify the Luo/Flint/Zhao combination used for Claims 9, 11, and 12 so that, after the dense road- and intersection-specific point-cloud subsets have already been converted into sparse point-cloud subsets for the device, those already sparse subsets are subjected to a second threshold-based data-volume evaluation as taught by Zhou, and, if the remaining sparse road-segment data still exceeds the acceptable threshold, the road subsets are further sampled at an even, predetermined interval while retaining the intersection subsets as taught by Zhao. This modification would have been obvious because: (1) Zhou teaches a dual-threshold point-cloud reduction framework based on point-cloud amount and threshold-driven retention/further-reduction logic; (2) Zhao teaches road-boundary sampling and even node spacing together with explicit preservation of important intersection geometry; (3) Luo provides the overall device-based point-cloud/global-map localization framework in which the resulting point-cloud data is used for map matching and positioning; and (4) Flint remains part of the parent-claim combination for the IMU-based updating aspect of Claim 9. One of ordinary skill in the art would have made such a modification to further control data volume and noise after the initial dense-to-sparse conversion while preserving important intersection structure and maintaining lightweight road-segment data for matching, with the predictable benefit of improved efficiency, storage economy, and matching performance. The use of a second threshold and predetermined-interval road sampling would have been a routine and predictable design choice in view of Zhou’s threshold-driven point-cloud reduction and Zhao’s express even-sampling disclosures.
Such a modification would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo, Flint, Zhao, and Zhou, Claim 13 is rendered obvious by the combination of Luo in view of Flint, Zhao, and Zhou. In particular, Luo teaches the underlying device-based point-cloud/global-map localization framework, Flint remains part of the parent-claim combination for the IMU-based updating aspect of Claim 9, Zhao teaches road/intersection-specific subset generation together with road-boundary sampling, even node spacing, and use of reduced representations for matching, and Zhou teaches a second threshold-based point-cloud reduction framework based on point-cloud amount and threshold-driven retention/further-reduction logic. The remaining aspects concerning explicit manual sampling and retaining intersection subsets while further sampling road subsets would have been obvious to one of ordinary skill in the art as routine implementation choices in view of Zhao’s road-specific sampling disclosures, Zhao’s emphasis on intersection structure, and Zhou’s disclosed multi-threshold point-cloud reduction workflow.
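As a rough illustration of the combined workflow described above for Claim 13, the following Python sketch checks already sparse road subsets against a second data-volume threshold and, only if that threshold is exceeded, samples the road subsets at a fixed interval while leaving the intersection subsets untouched. The function names, the representation of a subset as a list of points, and the use of a simple point count as the data-volume measure are assumptions made for illustration; none of them are taken from Luo, Flint, Zhao, Zhou, or the application.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def total_points(subsets: Dict[str, List[Point]]) -> int:
    """Count all points across the named subsets (assumed data-volume measure)."""
    return sum(len(points) for points in subsets.values())


def reduce_sparse_subsets(
    road_subsets: Dict[str, List[Point]],
    intersection_subsets: Dict[str, List[Point]],
    second_threshold: int,
    interval: int,
) -> Tuple[Dict[str, List[Point]], Dict[str, List[Point]]]:
    """Hypothetical second-stage reduction applied to already sparse subsets."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    # Second threshold evaluation: keep everything if the road data is small enough.
    if total_points(road_subsets) <= second_threshold:
        return road_subsets, intersection_subsets
    # Otherwise keep every interval-th road point; intersections are retained as-is.
    sampled = {name: pts[::interval] for name, pts in road_subsets.items()}
    return sampled, intersection_subsets
```

In this sketch the intersection subsets are returned unchanged in both branches, which mirrors the stated rationale of preserving intersection structure while lightening the road-segment data.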

Claims 6, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Luo in view of Flint, Zhao, and Farshidi (“Robust sequential view planning for object recognition using multiple cameras”).

Regarding Claim 6,
The combination of Luo, Flint, and Zhao establishes the method of Claim 3, which is the basis for Claim 6.

Luo teaches or renders obvious:
and wherein
See at least: “constructing at least one 3D submap comprises: obtaining images from a camera...” ([0013]).
Rationale: The reference expressly discloses a camera-based image-acquisition step within the localization process. Thus, the additional “and wherein” clause properly introduces further camera-based image-processing detail.
capturing at least one image frame of a surrounding environment of the vehicle within the scenario through the camera unit
See at least: “constructing at least one 3D submap comprises: obtaining images from a camera...” ([0013]) and “images of the environment are captured by the camera in approximately 30 Hz.” ([0068]).
Rationale: The reference expressly discloses capturing images of the surrounding environment using a camera as part of the localization workflow. One of ordinary skill in the art would have understood such captured images to correspond to image frames obtained through the camera unit.

Luo teaches the baseline vehicle-localization framework and the baseline camera-image-capture portion of Claim 6. Luo does not expressly disclose the plurality of cameras, the multiway image capture at different angles, the frame-by-frame feature extraction across multiple ways, the confidence calculation, or the preferred-frame selection. Those limitations remain outstanding after Luo.

Flint teaches:
and extracting the plurality of feature points from the at least one image frame
See at least: “Images recorded by the image sensor 102 and processed by the module 106 may also be referred to as ‘frames.’” ([0031]) and “The pre-processing module 106 performs feature tracking within the recorded frames...” ([0031]).
Rationale: Flint expressly discloses frames and expressly discloses feature tracking within those frames. Thus, Flint teaches extracting or tracking feature points from image frames.
comprises:
See at least: “The pre-processing module 106 generates feature tracks by identifying one or more image features in a first frame and then matching those one or more image features...” ([0031]).
Rationale: Flint expressly discloses an open-ended feature-processing workflow performed on frames. Thus, the recited transitional phrase is supported for the additional frame-processing operations.
extracting feature points from each way of image frames
See at least: “The pre-processing module 106 calculates ‘feature tracks,’ in which a feature track is a sequence of two-dimensional points representing the locations of a single feature tracked across two or more frames...” ([0031]).
Rationale: Flint expressly discloses identifying image features in frames and tracking them across frames. One of ordinary skill in the art would have understood this to correspond to extracting feature points from each image set once multiple ways of image frames are available.
in the multiway of image frames;
See at least: “the electronic computing system may further include... an image detection unit ... configured to receive multiple images captured by the image detection unit and derive the image data from multiple images” ([0008]) and “for each window, the multiple first variables represent 3D positions and/or orientations of image features across multiple images” ([0010]).
Rationale: Flint expressly teaches processing image features across multiple images. Thus, Flint teaches the multi-image context in which feature points are extracted once the multiway image set is provided.
selecting a preferred frame from the multiway of image frames
See at least: “The pre-processing module 106 also is configured to select a subset of the frames received from the image sensor 102 as ‘keyframes.’” ([0032]) and “A frame may be designated as a keyframe by the module 106 based on one or more parameters, such as a desired keyframe frequency, or statistics related to the number of image features identified in a current frame or matched in a pair of frames.” ([0032]).
Rationale: Flint expressly teaches selecting certain frames from a larger set of frames as keyframes and further teaches that the selection may be based on feature-related statistics. This directly supports the claimed concept of selecting a preferred frame from multiple frames based on feature-related criteria.
taking the feature points in the preferred frame
See at least: “the feature locations associated with FALSE elements ... will be ignored, whereas feature locations associated with TRUE elements ... will be used for further calculations.” ([0034]).
Rationale: Flint expressly teaches using feature locations from selected or retained frames for subsequent calculations while ignoring others. Thus, Flint teaches taking the feature points in the selected preferred frame for downstream use.
as the plurality of feature points.
See at least: “The pre-processing module 106 generates feature tracks by identifying one or more image features in a first frame...” ([0031]) and “feature locations associated with TRUE elements ... will be used for further calculations.” ([0034]).
Rationale: Flint expressly teaches identifying image features in frames and using retained feature locations for further calculations. Accordingly, Flint teaches taking the feature points from the selected frame as the operative plurality of feature points.
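
The passages quoted from Flint describe designating keyframes based on parameters such as a desired keyframe frequency or statistics on the number of image features identified in a frame. A minimal sketch of a rule of that kind is shown below, assuming a hypothetical minimum feature count and a hypothetical minimum spacing between keyframes; the function name and both parameters are illustrative and are not taken from Flint.

```python
from typing import List


def select_keyframes(feature_counts: List[int], min_features: int, min_gap: int) -> List[int]:
    """Return indices of frames designated as keyframes under the assumed rule."""
    keyframes: List[int] = []
    for index, count in enumerate(feature_counts):
        # Spacing check stands in for a desired keyframe frequency.
        far_enough = not keyframes or index - keyframes[-1] >= min_gap
        # Feature-count check stands in for a feature-related statistic.
        if count >= min_features and far_enough:
            keyframes.append(index)
    return keyframes
```

For example, with per-frame feature counts of 50, 60, 10, 70, and 80, a minimum of 40 features, and a minimum gap of 2 frames, this rule designates frames 0 and 3.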

After Luo, Flint, and Zhao, the following limitations are not explicitly disclosed: the camera unit includes a plurality of cameras; arranged around the vehicle; capturing multiway of image frames of the surrounding environment of the vehicle; at different angles; through the plurality of cameras; calculating confidences of the extracted feature points; from each way of image frames; and according to the confidences of each way of image frames.

Farshidi teaches:
the camera unit includes a plurality of cameras
See at least: “While prior relevant research in active object recognition/pose estimation has mostly focused on single-camera systems, we propose two multi-camera solutions...” and “multiple cameras simultaneously acquire images from different view angles...” (page 1).
See also: “We have developed an active two-camera vision system...” (page 3).
Rationale: Farshidi expressly discloses a plurality of cameras, specifically a two-camera system, used together for image acquisition. Thus, Farshidi teaches the plural-camera aspect of the claim.
arranged around the vehicle,
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1) and “The second unit moves along the y-axis on a linear track on the side of the workspace.” (page 3).
Rationale: Farshidi does not expressly recite cameras arranged around a vehicle. However, Farshidi expressly teaches a plural-camera arrangement positioned to obtain different views of the target from different surrounding locations. In view of Luo’s autonomous-vehicle localization context, one of ordinary skill in the art would have found it obvious to arrange the plural cameras around the vehicle so as to obtain surrounding views for localization, because doing so is the predictable vehicle implementation of Farshidi’s multi-angle plural-camera concept.
capturing multiway of image frames of the surrounding environment of the vehicle
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1) and “Such scenarios can also motivate the use of multiple cameras which can provide simultaneous views of the object from different viewing angles.” (page 8).
Rationale: Farshidi expressly discloses simultaneous acquisition of multiple images from different viewpoints. One of ordinary skill in the art would have understood such simultaneous multiple views to correspond to capturing a multiway set of image frames.
at different angles
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1).
Rationale: Farshidi expressly teaches acquisition of images at different view angles.
through the plurality of cameras;
See at least: “we propose two multi-camera solutions...” and “multiple cameras simultaneously acquire images...” (page 1).
Rationale: Farshidi expressly teaches acquiring the multi-angle images through a plurality of cameras.
calculating confidences of the extracted feature points
See at least: “the recognition algorithms attempt to classify the object, if its identity/pose can be determined with a high confidence level. Otherwise, the algorithms would compute the next most informative camera positions...” (page 1) and “Two performance indices are used to quantify the quality of observations...” (page 3).
Rationale: Farshidi does not expressly recite computing a confidence value for each extracted feature point in those exact words. However, Farshidi expressly teaches evaluating confidence level and quantifying observation quality for different views. One of ordinary skill in the art would have understood that, in a multi-view feature-selection workflow, confidence values can be assigned to feature information extracted from each view as a routine way of quantifying which view is more informative.
from each way of image frames;
See at least: “multiple cameras simultaneously acquire images from different view angles...” and “The camera positions at each recognition step are selected based on two statistical metrics quantifying the quality of the observations...” (page 1).
Rationale: Farshidi expressly teaches evaluating observations obtained from different camera views. Thus, applying a confidence or quality assessment to each way of image frames is at least rendered obvious by Farshidi’s per-view observation-quality framework.
according to the confidences of each way of image frames; and
See at least: “the recognition algorithms attempt to classify the object, if its identity/pose can be determined with a high confidence level. Otherwise, the algorithms would compute the next most informative camera positions...” (page 1) and “The camera positions at each recognition step are selected based on two statistical metrics quantifying the quality of the observations...” (page 1).
Rationale: Farshidi expressly teaches confidence- and quality-based evaluation of different views and selection of camera positions based on those measures. In view of Flint’s frame-selection teaching, one of ordinary skill in the art would have found it obvious to select a preferred frame according to confidence values associated with each way of image frames.

Motivation to Combine Luo, Flint, Zhao, and Farshidi
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, Zhao, and Farshidi before them, to modify the Luo/Flint/Zhao combination used for Claims 1 and 3 so that the camera unit includes a plurality of cameras arranged to acquire multiple image frames of the surrounding environment from different angles, and so that feature points are extracted from the multiway images, observation-quality or confidence values are evaluated for the feature information from each view, and a preferred frame is selected for use in downstream localization, because Luo already discloses the vehicle-localization framework in which camera-derived feature points are used for localization, Flint teaches extracting feature points from frames and selecting frames based on feature-related statistics, Zhao remains part of the parent-claim combination for the Claim 3 fused-map limitations, and Farshidi teaches simultaneous multi-camera image acquisition from different viewing angles together with confidence- and quality-based evaluation of different observations. One of ordinary skill in the art would have made such modification to improve viewpoint coverage, reduce ambiguity, and select the most informative camera view for localization, with the predictable benefit of more reliable feature-point extraction and improved localization performance in complex road and intersection environments. Such a modification would merely apply known multi-camera observation-selection techniques and known frame/feature-selection techniques to a known vehicle-localization framework and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo, Flint, Zhao, and Farshidi, Claim 6 is rendered obvious by the combination of Luo in view of Flint, Zhao, and Farshidi. 
In particular, Luo teaches the underlying vehicle-localization framework and baseline camera-based image capture, Flint teaches extraction of feature points from frames and frame selection based on feature-related statistics, Zhao remains part of the parent-claim combination for the Claim 3 fused-map limitations, and Farshidi teaches a plurality of cameras simultaneously acquiring images from different viewing angles together with confidence- and observation-quality-based evaluation of different views. The remaining aspects concerning explicitly arranging the plurality of cameras around the vehicle would have been obvious to one of ordinary skill in the art in view of Farshidi’s multi-angle plural-camera architecture and the predictable adaptation of that architecture to Luo’s autonomous-vehicle setting.
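
The combined Claim 6 workflow discussed above (extracting feature points from each way of image frames, calculating confidences, and taking the feature points of a preferred frame) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `FeaturePoint` type, the per-point confidence values, and the choice of the mean confidence as the per-way score are all hypothetical and are not drawn from Luo, Flint, Zhao, Farshidi, or the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FeaturePoint:
    x: float
    y: float
    confidence: float  # assumed to lie in [0, 1]


def way_confidence(points: List[FeaturePoint]) -> float:
    """Aggregate per-point confidences for one way (mean is an assumed choice)."""
    if not points:
        return 0.0
    return sum(p.confidence for p in points) / len(points)


def select_preferred_frame(
    frames: Dict[str, List[FeaturePoint]],
) -> Tuple[str, List[FeaturePoint]]:
    """Pick the camera whose frame has the highest aggregate confidence."""
    if not frames:
        raise ValueError("at least one way of image frames is required")
    best = max(frames, key=lambda cam: way_confidence(frames[cam]))
    # The feature points of the preferred frame become the operative feature points.
    return best, frames[best]
```

Other aggregation rules, such as a confidence-weighted count of feature points, would fit the same structure; the mean is used here only to keep the sketch short.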

Regarding Claim 14,
The combination of Luo, Flint, and Zhao establishes the device of Claim 11, which is the basis for Claim 14.

Luo discloses or renders obvious:
wherein
See at least: “constructing at least one 3D submap comprises: obtaining images from a camera...” ([0013]).
Rationale: The reference expressly discloses a camera-based image-acquisition step within the localization process. Thus, the additional “wherein” clause properly introduces further camera-based image-processing detail.
capturing at least one image frame of a surrounding environment of the vehicle within the scenario through the camera unit
See at least: “constructing at least one 3D submap comprises: obtaining images from a camera...” ([0013]) and “images of the environment are captured by the camera in approximately 30 Hz.” ([0068]).
Rationale: The reference expressly discloses capturing images of the surrounding environment using a camera as part of the localization workflow. One of ordinary skill in the art would have understood such captured images to correspond to image frames obtained through the camera unit.

Luo teaches the baseline device-localization framework and the baseline camera-image-capture portion of Claim 14. Luo does not expressly disclose the plurality of cameras, the multiway image capture at different angles, the frame-by-frame feature extraction across multiple ways, the confidence calculation, or the preferred-frame selection. Those limitations remain outstanding after Luo.

Flint teaches:
and extracting the plurality of feature points from the at least one image frame
See at least: “Images recorded by the image sensor 102 and processed by the module 106 may also be referred to as ‘frames.’” ([0031]) and “the pre-processing module 106 performs feature tracking within the recorded frames...” ([0031]).
Rationale: Flint expressly discloses frames and expressly discloses feature tracking within those frames. Thus, Flint teaches extracting or tracking feature points from image frames.
comprises:
See at least: “The pre-processing module 106 generates feature tracks by identifying one or more image features in a first frame and then matching those one or more image features with one or more corresponding image features in consecutive frames.” ([0031]).
Rationale: Flint expressly discloses an open-ended feature-processing workflow performed on frames. Thus, the recited transitional phrase is supported for the additional frame-processing operations.
extracting feature points from each way of image frames
See at least: “The pre-processing module 106 calculates ‘feature tracks,’ in which a feature track is a sequence of two-dimensional points representing the locations of a single feature tracked across two or more frames...” ([0031]).
Rationale: Flint expressly discloses identifying image features in frames and tracking them across frames. One of ordinary skill in the art would have understood this to correspond to extracting feature points from each image set once multiple ways of image frames are available.
in the multiway of image frames;
See at least: “the electronic computing system may further include... an image detection unit ... configured to receive multiple images captured by the image detection unit and derive the image data from multiple images” ([0008]) and “for each window, the multiple first variables represent 3D positions and/or orientations of image features across multiple images” ([0010]).
Rationale: Flint expressly teaches use of multiple images and image features across multiple images. Thus, Flint teaches the multi-image context in which feature points are extracted once the multiway image set is provided.
selecting a preferred frame from the multiway of image frames
See at least: “The pre-processing module 106 also is configured to select a subset of the frames received from the image sensor 102 as ‘keyframes.’” ([0032]) and “A frame may be designated as a keyframe by the module 106 based on one or more parameters, such as a desired keyframe frequency, or statistics related to the number of image features identified in a current frame or matched in a pair of frames.” ([0032]).
Rationale: Flint expressly teaches selecting certain frames from a larger set of frames as keyframes and further teaches that the selection may be based on feature-related statistics. This directly supports the claimed concept of selecting a preferred frame from multiple frames based on feature-related criteria.
taking the feature points in the preferred frame
See at least: “the feature locations associated with FALSE elements ... will be ignored, whereas feature locations associated with TRUE elements ... will be used for further calculations.” ([0034]).
Rationale: Flint expressly teaches using feature locations from selected or retained frames for subsequent calculations while ignoring others. Thus, Flint teaches taking the feature points in the selected preferred frame for downstream use.
as the plurality of feature points.
See at least: “The pre-processing module 106 generates feature tracks by identifying one or more image features in a first frame...” ([0031]) and “feature locations associated with TRUE elements ... will be used for further calculations.” ([0034]).
Rationale: Flint expressly teaches identifying image features in frames and using retained feature locations for further calculations. Accordingly, Flint teaches taking the feature points from the selected frame as the operative plurality of feature points.

After Luo, Flint, and Zhao, the following limitations are not explicitly disclosed: the camera unit includes a plurality of cameras; arranged around the vehicle; capturing multiway of image frames of the surrounding environment of the vehicle; at different angles; through the plurality of cameras; calculating confidences of the extracted feature points; from each way of image frames; and according to the confidences of each way of image frames.

Farshidi discloses:
the camera unit includes a plurality of cameras
See at least: “While prior relevant research in active object recognition/pose estimation has mostly focused on single-camera systems, we propose two multi-camera solutions...” and “multiple cameras simultaneously acquire images from different view angles...” (page 1).
See also: “We have developed an active two-camera vision system...” (page 3).
Rationale: Farshidi expressly discloses a plurality of cameras, specifically a two-camera system, used together for image acquisition. Thus, Farshidi teaches the plural-camera aspect of the claim.
arranged around the vehicle,
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1, Abstract) and “The first pan/tilt unit is mounted on an x–y gantry frame. The second unit moves along the y-axis on a linear track on the side of the workspace.” (page 3).
Rationale: Farshidi does not expressly recite cameras arranged around a vehicle. However, Farshidi expressly teaches a plural-camera arrangement positioned to obtain different views of the target from different surrounding locations. In view of Luo’s autonomous-vehicle localization context, one of ordinary skill in the art would have found it obvious to arrange the plural cameras around the vehicle so as to obtain surrounding views for localization, because doing so is the predictable vehicle implementation of Farshidi’s multi-angle plural-camera concept.
capturing multiway of image frames of the surrounding environment of the vehicle
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1) and “Such scenarios can also motivate the use of multiple cameras which can provide simultaneous views of the object from different viewing angles.” (page 8).
Rationale: Farshidi expressly discloses simultaneous acquisition of multiple images from different viewpoints. One of ordinary skill in the art would have understood such simultaneous multiple views to correspond to capturing a multiway set of image frames.
at different angles
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1).
Rationale: Farshidi expressly teaches acquisition of images at different view angles.
through the plurality of cameras;
See at least: “we propose two multi-camera solutions...” and “multiple cameras simultaneously acquire images from different view angles...” (page 1, Abstract).
Rationale: Farshidi expressly teaches acquiring the multi-angle images through a plurality of cameras.
calculating confidences of the extracted feature points
See at least: “the recognition algorithms attempt to classify the object, if its identity/pose can be determined with a high confidence level. Otherwise, the algorithms would compute the next most informative camera positions...” (page 1, Abstract) and “Two performance indices are used to quantify the quality of observations...” (page 3).
Rationale: Farshidi does not expressly recite computing a confidence value for each extracted feature point in those exact words. However, Farshidi expressly teaches evaluating confidence level and quantifying observation quality for different views. One of ordinary skill in the art would have understood that, in a multi-view feature-selection workflow, confidence values can be assigned to feature information extracted from each view as a routine way of quantifying which view is more informative.
from each way of image frames;
See at least: “The observation vector g includes the eigenspace coefficients corresponding to the images obtained from the two cameras, i.e. g = [g1 g2]” (page 4) and “the cameras’ observations ... are conditionally independent of each other” (page 4).
Rationale: Farshidi expressly teaches separate observations from the different camera views and evaluates those observations within a probabilistic framework. Thus, applying a confidence or quality assessment to each way of image frames is at least rendered obvious by Farshidi’s per-view observation framework.
according to the confidences of each way of image frames;
See at least: “the recognition algorithms attempt to classify the object, if its identity/pose can be determined with a high confidence level. Otherwise, the algorithms would compute the next most informative camera positions...” (page 1) and “Two performance indices are used to quantify the quality of observations in the context of state estimation and subsequently choose the next best positions of the cameras.” (page 3).
Rationale: Farshidi expressly teaches confidence- and quality-based evaluation of different views and selection of camera positions based on those measures. In view of Flint’s frame-selection teaching, one of ordinary skill in the art would have found it obvious to select a preferred frame according to confidence values associated with each way of image frames.

Motivation to Combine Luo, Flint, Zhao, and Farshidi
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, Zhao, and Farshidi before them, to modify the Luo/Flint/Zhao combination used for Claims 9 and 11 so that the device’s camera unit includes a plurality of cameras arranged to acquire multiple image frames of the surrounding environment from different angles, and so that feature points are extracted from the multiway images, observation-quality or confidence values are evaluated for the feature information from each view, and a preferred frame is selected for use in downstream localization, because Luo already discloses the device-based vehicle-localization framework in which camera-derived feature points are used for localization, Flint teaches extracting feature points from frames and selecting frames based on feature-related statistics, Zhao remains part of the parent-claim combination for the Claim 11 fused-map limitations, and Farshidi teaches simultaneous multi-camera image acquisition from different viewing angles together with confidence- and quality-based evaluation of different observations. One of ordinary skill in the art would have made such modification to improve viewpoint coverage, reduce ambiguity, and select the most informative camera view for localization, with the predictable benefit of more reliable feature-point extraction and improved localization performance in complex road and intersection environments. Such a modification would merely apply known multi-camera observation-selection techniques and known frame/feature-selection techniques to a known device-based vehicle-localization framework and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. 
Therefore, after combining the teachings of Luo, Flint, Zhao, and Farshidi, Claim 14 is rendered obvious by the combination of Luo in view of Flint, Zhao, and Farshidi. In particular, Luo teaches the underlying device-based vehicle-localization framework and baseline camera-based image capture, Flint teaches extraction of feature points from frames and frame selection based on feature-related statistics, Zhao remains part of the parent-claim combination for the Claim 11 fused-map limitations, and Farshidi teaches a plurality of cameras simultaneously acquiring images from different viewing angles together with confidence- and observation-quality-based evaluation of different views. The remaining aspects concerning explicitly arranging the plurality of cameras around the vehicle would have been obvious to one of ordinary skill in the art in view of Farshidi’s multi-angle plural-camera architecture and the predictable adaptation of that architecture to Luo’s autonomous-vehicle setting.

Regarding Claim 20,
The combination of Luo, Flint, and Zhao establishes the non-transitory computer-readable storage medium of Claim 19, which is the basis for Claim 20.

Luo renders obvious:
wherein
See at least: “The one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform...” ([0010]).
Rationale: The reference expressly discloses that the stored instructions cause a computing device to carry out the localization method. Thus, the “wherein” clause properly introduces additional conditions on that already established storage-medium-implemented method.
capturing at least one image frame of a surrounding environment of the vehicle within the scenario through the camera unit
See at least: “constructing at least one 3D submap comprises: obtaining images from a camera...” ([0013]) and “images of the environment are captured by the camera in approximately 30 Hz.” ([0068]).
Rationale: The reference expressly discloses capturing images of the surrounding environment using a camera as part of the localization workflow. One of ordinary skill in the art would have understood such captured images to correspond to image frames obtained through the camera unit.

Luo teaches the baseline storage-medium vehicle-localization framework and the baseline camera-image-capture portion of Claim 20. Luo does not expressly disclose the plurality of cameras, the multiway image capture at different angles, the frame-by-frame feature extraction across multiple ways, the confidence calculation, or the preferred-frame selection. Those limitations remain outstanding after Luo.

Flint discloses:
and extracting the plurality of feature points from the at least one image frame
See at least: “Images recorded by the image sensor 102 and processed by the module 106 may also be referred to as ‘frames.’” ([0031]) and “The pre-processing module 106 performs feature tracking within the recorded frames...” ([0031]).
Rationale: Flint expressly discloses frames and expressly discloses feature tracking within those frames. Thus, Flint teaches extracting or tracking feature points from image frames.
comprises:
See at least: “The pre-processing module 106 generates feature tracks by identifying one or more image features in a first frame and then matching those one or more image features with one or more corresponding image features in consecutive frames.” ([0031]).
Rationale: Flint expressly discloses an open-ended feature-processing workflow performed on frames. Thus, the recited transitional phrase is supported for the additional frame-processing operations.
extracting feature points from each way of image frames
See at least: “The pre-processing module 106 calculates ‘feature tracks,’ in which a feature track is a sequence of two-dimensional points representing the locations of a single feature tracked across two or more frames...” ([0031]).
Rationale: Flint expressly discloses identifying image features in frames and tracking them across frames. One of ordinary skill in the art would have understood this to correspond to extracting feature points from each image set once multiple ways of image frames are available.
in the multiway of image frames;
See at least: “the electronic computing system may further include... an image detection unit ... configured to receive multiple images captured by the image detection unit and derive the image data from multiple images” ([0008]) and “for each window, the multiple first variables represent 3D positions and/or orientations of image features across multiple images” ([0010]).
Rationale: Flint expressly teaches use of multiple images and image features across multiple images. Thus, Flint teaches the multi-image context in which feature points are extracted once the multiway image set is provided.
selecting a preferred frame from the multiway of image frames
See at least: “The pre-processing module 106 also is configured to select a subset of the frames received from the image sensor 102 as ‘keyframes.’” ([0032]) and “A frame may be designated as a keyframe by the module 106 based on one or more parameters, such as a desired keyframe frequency, or statistics related to the number of image features identified in a current frame or matched in a pair of frames.” ([0032]).
Rationale: Flint expressly teaches selecting certain frames from a larger set of frames as keyframes and further teaches that the selection may be based on feature-related statistics. This directly supports the claimed concept of selecting a preferred frame from multiple frames based on image-feature-related criteria.
taking the feature points in the preferred frame
See at least: “the feature locations associated with FALSE elements ... will be ignored, whereas feature locations associated with TRUE elements ... will be used for further calculations.” ([0034]).
Rationale: Flint expressly teaches using feature locations from selected or retained frames for subsequent calculations while ignoring others. Thus, Flint teaches taking the feature points in the selected preferred frame for downstream use.
as the plurality of feature points.
See at least: “The pre-processing module 106 generates feature tracks by identifying one or more image features in a first frame...” ([0031]) and “feature locations associated with TRUE elements ... will be used for further calculations.” ([0034]).
Rationale: Flint expressly teaches identifying image features in frames and using retained feature locations for further calculations. Accordingly, Flint teaches taking the feature points from the selected frame as the operative plurality of feature points.

After Luo, Flint, and Zhao, the following limitations are not explicitly disclosed: the camera unit includes a plurality of cameras; arranged around the vehicle; capturing multiway of image frames of the surrounding environment of the vehicle; at different angles; through the plurality of cameras; calculating confidences of the extracted feature points; from each way of image frames; and according to the confidences of each way of image frames.

Farshidi discloses:
the camera unit includes a plurality of cameras
See at least: “While prior relevant research in active object recognition/pose estimation has mostly focused on single-camera systems, we propose two multi-camera solutions...” and “multiple cameras simultaneously acquire images from different view angles...” (page 1).
See also: “We have developed an active two-camera vision system...” (page 3).
Rationale: Farshidi expressly discloses a plurality of cameras, specifically a two-camera system, used together for image acquisition. Thus, Farshidi teaches the plural-camera aspect of the claim.
arranged around the vehicle,
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1) and “The first pan/tilt unit is mounted on an x–y gantry frame. The second unit moves along the y-axis on a linear track on the side of the workspace.” (page 3).
Rationale: Farshidi does not expressly recite cameras arranged around a vehicle. However, Farshidi expressly teaches a plural-camera arrangement positioned to obtain different views of the target from different surrounding locations. In view of Luo’s autonomous-vehicle localization context, one of ordinary skill in the art would have found it obvious to arrange the plural cameras around the vehicle so as to obtain surrounding views for localization, because doing so is the predictable vehicle implementation of Farshidi’s multi-angle plural-camera concept.
capturing multiway of image frames of the surrounding environment of the vehicle
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1) and “Such scenarios can also motivate the use of multiple cameras which can provide simultaneous views of the object from different viewing angles.” (page 1078 / page 8 of the article).
Rationale: Farshidi expressly discloses simultaneous acquisition of multiple images from different viewpoints. One of ordinary skill in the art would have understood such simultaneous multiple views to correspond to capturing a multiway set of image frames.
at different angles
See at least: “multiple cameras simultaneously acquire images from different view angles...” (page 1).
Rationale: Farshidi expressly teaches acquisition of images at different view angles.
through the plurality of cameras;
See at least: “we propose two multi-camera solutions...” and “multiple cameras simultaneously acquire images from different view angles...” (page 1).
Rationale: Farshidi expressly teaches acquiring the multi-angle images through a plurality of cameras.
calculating confidences of the extracted feature points
See at least: “the recognition algorithms attempt to classify the object, if its identity/pose can be determined with a high confidence level. Otherwise, the algorithms would compute the next most informative camera positions...” (page 1) and “Two performance indices are used to quantify the quality of observations...” (page 1073 / page 3 of the article).
Rationale: Farshidi does not expressly recite computing a confidence value for each extracted feature point in those exact words. However, Farshidi expressly teaches evaluating confidence level and quantifying observation quality for different views. One of ordinary skill in the art would have understood that, in a multi-view feature-selection workflow, confidence values can be assigned to feature information extracted from each view as a routine way of quantifying which view is more informative.
from each way of image frames;
See at least: “g = [g1 g2], where g1 and g2 are observation vectors from Camera 1 and 2, respectively” and “We assume that the cameras’ observations ... are conditionally independent of each other.” (page 1075 / page 4 of the article).
Rationale: Farshidi expressly teaches separate observations from the different camera views and evaluates those observations within a probabilistic framework. Thus, applying a confidence or quality assessment to each way of image frames is at least rendered obvious by Farshidi’s per-view observation framework.
according to the confidences of each way of image frames; and
See at least: “the recognition algorithms attempt to classify the object, if its identity/pose can be determined with a high confidence level. Otherwise, the algorithms would compute the next most informative camera positions...” (page 1) and “Two performance indices are used to quantify the quality of observations in the context of state estimation and subsequently choose the next best positions of the cameras.” (page 1073 / page 3 of the article).
Rationale: Farshidi expressly teaches confidence- and quality-based evaluation of different views and selection of camera positions based on those measures. In view of Flint’s frame-selection teaching, one of ordinary skill in the art would have found it obvious to select a preferred frame according to confidence values associated with each way of image frames.

Motivation to Combine Luo, Flint, Zhao, and Farshidi
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, Zhao, and Farshidi before them, to modify the Luo/Flint/Zhao combination used for Claims 17 and 19 so that the storage-medium-implemented localization method uses a camera unit having a plurality of cameras arranged to acquire multiple image frames of the surrounding environment from different angles. Under that modification, feature points are extracted from the multiway images, observation-quality or confidence values are evaluated for the feature information from each view, and a preferred frame is selected for use in downstream localization. The modification would have been obvious because Luo already discloses the storage-medium-based vehicle-localization framework in which camera-derived feature points are used for localization, Flint teaches extracting feature points from frames and selecting frames based on feature-related statistics, Zhao remains part of the parent-claim combination for the Claim 19 fused-map limitations, and Farshidi teaches simultaneous multi-camera image acquisition from different viewing angles together with confidence- and quality-based evaluation of different observations. One of ordinary skill in the art would have made such modification to improve viewpoint coverage, reduce ambiguity, and select the most informative camera view for localization, with the predictable benefit of more reliable feature-point extraction and improved localization performance in complex road and intersection environments. Such a modification would merely apply known multi-camera observation-selection techniques and known frame/feature-selection techniques to a known storage-medium-based vehicle-localization framework and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose.
Therefore, after combining the teachings of Luo, Flint, Zhao, and Farshidi, Claim 20 is rendered obvious by the combination of Luo in view of Flint, Zhao, and Farshidi. In particular, Luo teaches the underlying non-transitory storage-medium vehicle-localization framework and baseline camera-based image capture, Flint teaches extraction of feature points from frames and frame selection based on feature-related statistics, Zhao remains part of the parent-claim combination for the Claim 19 fused-map limitations, and Farshidi teaches a plurality of cameras simultaneously acquiring images from different viewing angles together with confidence- and observation-quality-based evaluation of different views. The remaining aspects concerning explicitly arranging the plurality of cameras around the vehicle would have been obvious to one of ordinary skill in the art in view of Farshidi’s multi-angle plural-camera architecture and the predictable adaptation of that architecture to Luo’s autonomous-vehicle setting.
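
For illustration only, the confidence-based selection of a preferred frame recited in Claim 20 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's method and not an implementation taken from Luo, Flint, Zhao, or Farshidi. The names `FeaturePoint`, `frame_confidence`, and `select_preferred_frame` are hypothetical, the per-point confidence score is assumed to be given, and aggregating per-point confidences by their mean is an assumption that neither the claim nor the cited references specify.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeaturePoint:
    """Hypothetical feature point with an assumed confidence score in [0, 1]."""
    x: float
    y: float
    confidence: float


def frame_confidence(points: List[FeaturePoint]) -> float:
    """Aggregate per-point confidences for one way of image frames.

    Using the mean is an assumption made for this sketch; an empty
    frame is given a confidence of 0.0.
    """
    return mean(p.confidence for p in points) if points else 0.0


def select_preferred_frame(
    frames: Dict[str, List[FeaturePoint]]
) -> Tuple[str, List[FeaturePoint]]:
    """Return the camera id and feature points of the highest-confidence frame.

    `frames` maps a (hypothetical) camera id to the feature points
    extracted from that camera's frame.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    best = max(frames, key=lambda cam: frame_confidence(frames[cam]))
    return best, frames[best]
```

In this sketch, the feature points of the returned frame stand in for "the plurality of feature points" used for downstream matching; any other aggregation rule (for example, a count of points above a threshold) could be substituted without changing the structure.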

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Luo, in view of Flint, in view of Zhao, in view of Farshidi, in view of Lin (Autonomous Vehicle Localization with Prior Visual Point Cloud Map Constraints in GNSS-Challenged Environments), and in view of Liang (Fusion of LIDAR Point Clouds and Large Scale Vector Maps for the Reconstruction of 3-D Road Models).


Regarding Claim 7,
The combination of Luo, Flint, Zhao, and Farshidi establishes the method of Claim 6, which is the basis for Claim 7.

Luo teaches or renders obvious:
wherein
See at least: “In another embodiment, before computing matching score, the method further comprises: constructing at least one 3D submap; and constructing a global map.” ([0012]).
Rationale: Luo expressly discloses an ordered matching workflow operating on a submap and a global map. Thus, the “wherein” clause properly introduces the more specific matching details now recited.
performing a matching of the plurality of feature points
See at least: “the features extracted from the 3D submap are matched against the features extracted from the global map...” ([0071]).
Rationale: Luo expressly discloses matching the camera-derived 3D-submap features against the LiDAR-derived global-map features. One of ordinary skill in the art would have understood the camera-derived features to correspond to the claimed plurality of feature points used for matching.
with point cloud data in the point cloud basemap
See at least: “constructing a city-scale 3D map based on the data from the LiDAR, using LiDAR mapping.” ([0014]) and “the unstructured features include sparse 3D points.” ([0017]).
Rationale: Luo expressly discloses a LiDAR-based global map and sparse 3D point features. Accordingly, Luo teaches matching against point-cloud-type data in the basemap.
to determine the position of the vehicle within the vector map
See at least: “Subsequently, in operation 17, location of the 3D submap is iteratively estimated until a distance between corresponding features is minimized.” ([0072]).
Rationale: Luo expressly discloses determining the submap location from the matching result. In the already established parent-claim combination, the point-cloud basemap and vector-map framework is present; therefore, Luo’s location estimation corresponds to determining vehicle position within that mapped framework.
according to the result of the matching
See at least: “computing... matching score...” ([0010]) and “the features extracted from the 3D submap are matched against the features extracted from the global map...” ([0071]).
Rationale: Luo expressly teaches that location estimation follows from the feature-matching process. Thus, Luo teaches determining position according to the result of the matching.
comprises:
See at least: “before computing matching score, the method further comprises...” ([0011]) and “In another embodiment, before computing matching score, the method further comprises...” ([0012]).
Rationale: Luo expressly discloses an open-ended matching workflow with constituent sub-steps. Thus, the transitional phrase is supported.

Luo teaches the core matching-and-position-determination framework of Claim 7. Luo does not expressly disclose the more specific local-part restriction to current-position road/intersection subsets within a predetermined range, nor the explicit fallback from local-part matching to all point-cloud data. Those limitations remain outstanding after Luo.

Zhao teaches:
the part of the point cloud data includes point cloud data subsets
See at least: “We explored the following: (1) the extraction and vectorization of road structures based on multiframe probabilistic fusion; (2) the efficient vector-based matching between frames of road structures...” (page 2) and “Most of the current methods usually focus on the road boundary detection in a single frame... Therefore, a multiframe fusion is required for robust and precise road structure detection.” (page 4).
Rationale: Zhao expressly discloses extracting and matching local road-structure data generated from multiframe fusion. One of ordinary skill in the art would have understood these local fused road-structure portions to constitute subsets of the overall point-cloud/map data.
for the roads and intersections
See at least: “The high-precision high-definition (HD) map of the road environment is now recognized as one of the cornerstones for autonomous driving. The reliable mapping of road boundaries, lanes, and other road structures...” (page 2).
Rationale: Zhao expressly teaches road-structure mapping, including road boundaries and related road structures, in autonomous-driving environments. In the already established parent-claim context, these local subsets correspond to roads and intersections.

Zhao supplies the road-structure subset context for the local-part matching recited in Claim 7. Zhao does not expressly disclose current-position-based range restriction or fallback to matching against all point-cloud data. Those limitations remain outstanding after Luo, Flint, and Zhao.

Lin teaches:
performing a matching of the plurality of feature points
See at least: “the current visual point cloud frame is matched with the candidate sub-map by NDT.” (page 4) and “With reliable initial pose estimation, current visual point cloud frames obtained from a stereo camera is matched with the candidate sub-map...” (page 7).
Rationale: Lin expressly discloses matching the current camera-derived point-cloud/frame information against a candidate sub-map. In view of the parent-claim combination, this corresponds to matching the claimed plurality of feature points.
with a part of the point cloud data in the point cloud basemap
See at least: “The main idea was to segment the sub-map from a large point cloud...” (page 5) and “the current visual point cloud frame is matched with the candidate sub-map...” (page 4/page 7).
Rationale: Lin expressly discloses segmenting a sub-map from a large prior point cloud and matching the current frame against that candidate sub-map. Thus, Lin teaches matching against a part of the point-cloud basemap rather than the whole map.
corresponding to a current position of the vehicle,
See at least: “where Lvechicle is the position of the vehicle...” and “Bcenter is the new center point coordinates of the sub-map.” (page 5).
Rationale: Lin expressly discloses that the sub-map is centered and updated based on the vehicle position. Thus, Lin teaches that the matched sub-map corresponds to the current position of the vehicle.
wherein
See at least: “The principle of segmentation is as follows...” followed by the position-based sub-map update rule (page 5).
Rationale: Lin expressly discloses a conditional segmentation rule tied to vehicle position. Thus, the clause properly introduces the more specific position/range limitation.
within a predetermined range
See at least: “τ is the segmentation threshold, we set it to be 80 m in this paper.” (page 5).
Rationale: Lin expressly discloses a predetermined threshold distance used to define when the vehicle is near the sub-map edge and when the sub-map should be reset. One of ordinary skill in the art would have understood this threshold-based sub-map extent to correspond to a predetermined range around the current position.
of the current position of the vehicle,
See at least: “where Lvechicle is the position of the vehicle, Bedge is edge of the sub-map, τ is the segmentation threshold...” (page 5).
Rationale: Lin expressly ties the sub-map definition to the current vehicle position. Thus, the predetermined range is expressly anchored to the current position of the vehicle.
wherein
See at least: “If |Lvechicle − Bedge| > τ, continue; else, Bcenter = Lvechicle.” (page 5).
Rationale: Lin expressly discloses a conditional decision rule for the local sub-map. Thus, the clause properly introduces the succeeding/failing local-matching logic in Claim 7.
when the matching of the plurality of feature points with the part of the point cloud data succeeds,
See at least: “If the transformation parameters can make the two scans match well, the probability density of the transformed points in the candidate sub-map will be large.” (page 7).
Rationale: Lin expressly discloses the condition in which matching with the candidate sub-map succeeds, i.e., when the scans match well under the NDT metric.
determining the position of the vehicle within the vector map
See at least: “Finally, the matching result was used to update pose prediction based on the last frame for accurate localization.” (page 1/page 2/page 4).
Rationale: Lin expressly discloses using the successful sub-map matching result to update pose prediction for accurate localization. Thus, Lin teaches determining the vehicle position based on successful local-part matching.
according to the result of the matching
See at least: “Then, the current visual point cloud frame is matched with the candidate sub-map by NDT. Finally, the matching result was used to update pose prediction...” (page 4).
Rationale: Lin expressly teaches that pose refinement follows from the matching result. Thus, this limitation is expressly taught.
and when the matching of the plurality of feature points with the part of the point cloud data fails,
See at least: “It is time-consuming to match the visual point cloud generated by a stereo camera with the whole prior visual point cloud map.” (page 5) and “the current visual point cloud frame is matched with the candidate sub-map...” (page 4/page 7).
Rationale: Lin expressly contrasts whole-map matching with candidate-sub-map matching. Although Lin does not expressly recite a failure branch in those exact words, one of ordinary skill in the art would have understood that if matching against the candidate sub-map does not succeed, a predictable fallback is to broaden the search from the local part to the whole prior point-cloud map, especially because Lin expressly identifies whole-map matching as the broader baseline from which sub-map matching is an efficiency improvement.
performing a matching of the plurality of feature points
See at least: “It is time-consuming to match the visual point cloud generated by a stereo camera with the whole prior visual point cloud map.” (page 5).
Rationale: Lin expressly teaches matching against the whole prior visual point cloud map as the broader matching operation. Thus, Lin supports the fallback full-map matching concept.
with all the point cloud data in the point cloud basemap,
See at least: “match the visual point cloud generated by a stereo camera with the whole prior visual point cloud map.” (page 5).
Rationale: Lin expressly discloses whole-map matching. One of ordinary skill in the art would have understood the whole prior point cloud map to correspond to all the point-cloud data in the basemap.

Lin teaches the current-position candidate-submap matching, the predetermined-range restriction, and successful local matching followed by a localization update. Lin also supports a fallback from local-part matching to whole-map matching if local matching does not succeed, which would have been obvious to one of ordinary skill in the art. What remains after Lin is the explicit road/intersection subset content and the explicit mapping relationship between the point-cloud basemap and the vector map.
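
For illustration only, the position-based sub-map update rule quoted from Lin (“If |Lvechicle − Bedge| > τ, continue; else, Bcenter = Lvechicle,” with τ set to 80 m) can be sketched as follows. This is a minimal sketch, not Lin's implementation. The square, axis-aligned sub-map, the `half_size` field, the 2-D coordinates, and the names `SubMap` and `update_submap` are all assumptions introduced here; Lin as quoted gives only the threshold rule.

```python
from dataclasses import dataclass
from typing import Tuple

TAU_M = 80.0  # segmentation threshold quoted from Lin (80 m)


@dataclass(frozen=True)
class SubMap:
    """Hypothetical square, axis-aligned sub-map (shape is an assumption)."""
    center: Tuple[float, float]
    half_size: float  # half of the side length, in metres (assumed)

    def distance_to_edge(self, vehicle: Tuple[float, float]) -> float:
        """Distance from the vehicle to the nearest sub-map edge.

        The value is negative if the vehicle is already outside the sub-map.
        """
        dx = abs(vehicle[0] - self.center[0])
        dy = abs(vehicle[1] - self.center[1])
        return self.half_size - max(dx, dy)


def update_submap(submap: SubMap, vehicle: Tuple[float, float],
                  tau: float = TAU_M) -> SubMap:
    """Keep the sub-map while the vehicle is more than tau from its edge;
    otherwise re-center the sub-map on the vehicle position."""
    if submap.distance_to_edge(vehicle) > tau:
        return submap  # "continue": current sub-map is still usable
    return SubMap(center=vehicle, half_size=submap.half_size)
```

With a 200 m half-size, a vehicle 50 m from the center is 150 m from the nearest edge and the sub-map is kept; a vehicle 130 m from the center is 70 m from the edge and the sub-map is re-centered on the vehicle.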

Liang teaches:
for the roads and intersections
See at least: “The roadsides can be paired. The centerlines can be determined...” and “a node is generated by the intersection of two centerlines for the crossroads...” (pages 2–4) and “The test area includes highways, arterial streets, local streets, and traffic islands.” (pages 4–6).
Rationale: Liang expressly discloses road networks, road segments, centerlines, and crossroads/intersections. Thus, Liang directly supports the road-and-intersection organization of the local point-cloud subsets.
determining the position of the vehicle within the vector map
See at least: “we fuse LIDAR point clouds and large scale vector maps to reconstruct the 3-D road models” (page 1) and “the roads can be represented as a parametric model” / “the 3-D road models by flat quadrilaterals ... are presented as ribbons” (pages 1 and 4–7).
Rationale: Liang expressly discloses a fused representation in which LiDAR point-cloud information is organized with vector-map geometry. Thus, once the local/submap point-cloud match is successful, the point-cloud result can be used within the vector-map representation to determine position.
and a mapping relationship between the point cloud basemap and the vector map,
See at least: “we fuse LIDAR point clouds and large scale vector maps to reconstruct the 3-D road models” (page 1) and “the large scale vector maps used in this investigation record the boundaries of street blocks ... Then we can compute the heights of the road surface from LIDAR point clouds along those centerlines.” (pages 1–2).
Rationale: Liang expressly discloses a fused relationship between LiDAR point clouds and vector-map geometry, where the vector-map structure guides and organizes the point-cloud-derived road model. Thus, Liang teaches the claimed mapping relationship between the point cloud basemap and the vector map.
and determining the position of the vehicle within the vector map
See at least: “the centerlines are connected to form road networks” and “the complete set of the planimetric road networks” (pages 2–6).
Rationale: Liang expressly discloses a coherent vectorized road-network representation tied to fused point-cloud content. In view of the already established Luo/Lin localization workflow, one of ordinary skill in the art would have used the full-map match result within that point-cloud/vector-map relationship to determine position in the vector map.
according to the result of the matching of the plurality of feature points with all of the point cloud data.
See at least: Lin’s whole-map matching disclosure, “match the visual point cloud ... with the whole prior visual point cloud map” (page 5), together with Liang’s fusion of LiDAR point clouds and vector maps (page 1).
Rationale: Lin teaches the full-map matching fallback, and Liang teaches the point-cloud/vector-map relationship needed to interpret the result within the vector map. Accordingly, the combined teachings render obvious determining position in the vector map according to the result of matching with all point-cloud data.

Motivation to Combine Luo, Flint, Zhao, Farshidi, Lin, and Liang
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, Zhao, Farshidi, Lin, and Liang before them, to modify the Luo/Flint/Zhao/Farshidi combination so that the matching operation is first performed against a local part of the point-cloud basemap corresponding to the current vehicle position and containing road/intersection subsets within a predetermined range. Under that modification, if the local matching succeeds, the vehicle position is determined in the vector map using the matching result and the point-cloud/vector-map relationship; if the local matching does not succeed, the matching is broadened to the whole point-cloud basemap. The modification would have been obvious because Luo already teaches the underlying feature-to-map matching and localization framework, Flint and Farshidi remain part of the parent Claim 6 combination, Zhao teaches road-structure subset organization, Lin teaches current-position-based candidate-submap segmentation and matching with the candidate sub-map rather than the whole map, and Liang teaches the fused relationship between LiDAR point clouds and vector-map road geometry for roads and intersections. One of ordinary skill in the art would have made such modification to improve efficiency by attempting localized matching first while preserving robustness through whole-map fallback if localized matching is unsuccessful, with the predictable benefit of faster matching, more stable localization, and meaningful position determination within the fused point-cloud/vector-map representation. Such a modification would merely combine known local-submap matching, known whole-map matching, and known point-cloud/vector-map fusion techniques in a predictable way and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose.
Therefore, after combining the teachings of Luo, Flint, Zhao, Farshidi, Lin, and Liang, Claim 7 is rendered obvious by the combination of Luo in view of Flint, Zhao, Farshidi, Lin, and Liang. In particular, Luo teaches the underlying feature-to-point-cloud matching and position-determination framework, Flint and Farshidi remain part of the parent Claim 6 combination, Zhao teaches road-structure subset context, Lin teaches current-position candidate-submap matching within a predetermined range and supports the full-map fallback, and Liang teaches the road/intersection organization and the mapping relationship between point-cloud data and vector-map geometry. The remaining aspects concerning the explicit failure-triggered fallback from local-part matching to all point-cloud data would have been obvious to one of ordinary skill in the art in view of Lin’s express contrast between candidate-submap matching and whole-map matching and the predictable need to broaden the search when localized matching is unsuccessful.
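
For illustration only, the local-first matching with whole-map fallback recited in Claim 7 can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation from the application or from any cited reference. The matcher `toy_match` is a hypothetical nearest-neighbor stand-in (Lin as quoted uses NDT), its tolerance and success ratio are arbitrary, points are 2-D, and the final step of converting the match result into a vector-map position through the point-cloud/vector-map mapping relationship is not modeled.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]
Matcher = Callable[[Sequence[Point], Sequence[Point]], Optional[float]]


def toy_match(features: Sequence[Point], candidates: Sequence[Point],
              tol: float = 0.5, min_ratio: float = 0.8) -> Optional[float]:
    """Hypothetical matcher: fraction of features with a candidate point
    within `tol`; returns None (failure) below `min_ratio`."""
    if not features or not candidates:
        return None
    hits = sum(
        1 for f in features
        if any(math.dist(f, c) <= tol for c in candidates)
    )
    ratio = hits / len(features)
    return ratio if ratio >= min_ratio else None


def match_local_then_full(features: Sequence[Point], basemap: Sequence[Point],
                          position: Point, range_m: float,
                          match: Matcher = toy_match
                          ) -> Tuple[str, Optional[float]]:
    """Match against the part of the basemap within `range_m` of the current
    position first; on failure, fall back to all basemap points."""
    local_part = [p for p in basemap if math.dist(p, position) <= range_m]
    result = match(features, local_part)
    if result is not None:
        return "local", result
    return "full", match(features, basemap)
```

The returned label records which branch produced the result. In a fuller pipeline, either result would then be interpreted through the point-cloud/vector-map mapping relationship to obtain the position within the vector map.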

Regarding Claim 15,
The combination of Luo, Flint, Zhao, and Farshidi establishes the device of Claim 14, which is the basis for Claim 15.

Luo renders obvious:
wherein
See at least: “In another embodiment, before computing matching score, the method further comprises: constructing at least one 3D submap; and constructing a global map.” ([0012]).
Rationale: Luo expressly discloses an ordered matching workflow operating on a submap and a global map. Thus, the “wherein” clause properly introduces the more specific matching details now recited.
performing a matching of the plurality of feature points
See at least: “computing, in response to features from a 3D submap and features from a global map, matching score...” ([0010]) and “the features extracted from the 3D submap are matched against the features extracted from the global map...” ([0071]).
Rationale: Luo expressly discloses matching camera-derived submap features against global-map features. In the parent-claim combination, those camera-derived features correspond to the claimed plurality of feature points.
with point cloud data in the point cloud basemap
See at least: “constructing a city-scale 3D map based on the data from the LiDAR, using LiDAR mapping.” ([0014]) and “the unstructured features include sparse 3D points.” ([0017]).
Rationale: Luo expressly discloses a LiDAR-based global map and sparse 3D point features. Accordingly, Luo teaches matching against point-cloud-type data in the basemap.
to determine the position of the vehicle within the vector map
See at least: “Subsequently, in operation 17, location of the 3D submap is iteratively estimated until a distance between corresponding features is minimized.” ([0072]).
Rationale: Luo expressly discloses determining the submap location from the matching result. In the already established parent-claim combination, the point-cloud basemap and vector-map framework is present; therefore, Luo’s location estimation corresponds to determining vehicle position within that mapped framework.
according to the result of the matching
See at least: “computing... matching score...” ([0010]) and “the features extracted from the 3D submap are matched against the features extracted from the global map...” ([0071]).
Rationale: Luo expressly teaches that location estimation follows from the feature-matching process. Thus, Luo teaches determining position according to the result of the matching.
comprises:
See at least: “before computing matching score, the method further comprises...” ([0011]) and “In another embodiment, before computing matching score, the method further comprises...” ([0012]).
Rationale: Luo expressly discloses an open-ended matching workflow with constituent sub-steps. Thus, the transitional phrase is supported.

Luo teaches the core matching-and-position-determination framework of Claim 15. Luo does not expressly disclose the more specific local-part restriction to current-position road/intersection subsets within a predetermined range, nor the explicit fallback from local-part matching to all point-cloud data. Those limitations remain outstanding after Luo.

Zhao discloses:
the part of the point cloud data includes point cloud data subsets
See at least: “the sparse extraction results can be enhanced which generate the local grid map (LGM) of road boundaries” and “we first propose a local vectorization method...” (page 6).
Rationale: Zhao expressly discloses local road-boundary map portions generated from fused local data. One of ordinary skill in the art would have understood these local fused road-structure portions to constitute subsets of the overall point-cloud/map data.
for the roads and intersections
See at least: “the global structure of the map was not taken into consideration, which is crucial for the vectorization of complex intersections” and “A probe-based sampling method is then employed to sample the fused road boundaries.” (pages 8–9).
Rationale: Zhao expressly discloses road boundaries and expressly emphasizes complex intersections in the map construction workflow. Thus, Zhao directly supports the roads-and-intersections content of the local subsets.

Zhao supplies the road/intersection subset context for the local-part matching recited in Claim 15. Zhao does not expressly disclose current-position-based range restriction or fallback to matching against all point-cloud data. Those limitations remain outstanding after Luo, Flint, and Zhao.

Lin discloses:
performing a matching of the plurality of feature points
See at least: “the current visual point cloud frame is matched with the candidate sub-map by NDT.” (page 1 / introduction summary) and “With reliable initial pose estimation, current visual point cloud frames obtained from a stereo camera is matched with the candidate sub-map...” (page 7).
Rationale: Lin expressly discloses matching current camera-derived frame data against a candidate sub-map. In view of the parent-claim combination, this corresponds to matching the claimed plurality of feature points.
with a part of the point cloud data in the point cloud basemap
See at least: “The main idea was to segment the sub-map from a large point cloud...” (page 5) and “the current visual point cloud frame is matched with the candidate sub-map...” (page 7).
Rationale: Lin expressly discloses segmenting a candidate sub-map from a large prior point cloud and matching against that candidate sub-map. Thus, Lin teaches matching against a part of the point-cloud basemap rather than the whole map.
corresponding to a current position of the vehicle,
See at least: “where Lvechicle is the position of the vehicle...” and “Bcenter is the new center point coordinates of the sub-map.” (page 5).
Rationale: Lin expressly discloses that the sub-map is centered and updated based on vehicle position. Thus, Lin teaches that the matched sub-map corresponds to the current position of the vehicle.
wherein
See at least: “The principle of segmentation is as follows...” followed by the position-based sub-map update rule (page 5).
Rationale: Lin expressly discloses a conditional segmentation rule tied to vehicle position. Thus, the clause properly introduces the more specific position/range limitation.
within a predetermined range
See at least: “τ is the segmentation threshold, we set it to be 80 m in this paper.” (page 5) and “In this paper, the threshold is set to 80 m.” (page 16).
Rationale: Lin expressly discloses a predetermined threshold distance used to define the local sub-map extent. One of ordinary skill in the art would have understood this threshold-based sub-map extent to correspond to a predetermined range around the current position.
of the current position of the vehicle,
See at least: “where Lvechicle is the position of the vehicle, Bedge is edge of the sub-map, τ is the segmentation threshold...” (page 5).
Rationale: Lin expressly ties the sub-map definition to the current vehicle position. Thus, the predetermined range is expressly anchored to the current position of the vehicle.
wherein
See at least: “If |Lvechicle − Bedge| > τ, continue else, Bcenter = Lvechicle” (page 5).
Rationale: Lin expressly discloses a conditional decision rule for the local sub-map. Thus, the clause properly introduces the succeeding/failing local-matching logic in Claim 15.
when the matching of the plurality of feature points with the part of the point cloud data succeeds,
See at least: “If the transformation parameters can make the two scans match well, the probability density of the transformed points in the candidate sub-map will be large.” (page 7).
Rationale: Lin expressly discloses the condition in which matching with the candidate sub-map succeeds, i.e., when the scans match well under the NDT metric.
determining the position of the vehicle within the vector map
See at least: “Finally, the matching result was used to update pose prediction based on the last frame for accurate localization.” (page 1 / introduction summary) and “the current visual point cloud frame is matched with the candidate sub-map ... to update pose prediction based on the last frame for accurate localization.” (page 7).
Rationale: Lin expressly discloses using the successful candidate-submap matching result to update pose prediction for accurate localization. Thus, Lin teaches determining the vehicle position based on successful local-part matching.
according to the result of the matching
See at least: “Then, the current visual point cloud frame is matched with the candidate sub-map by NDT. Finally, the matching result was used to update pose prediction...” (page 1 / page 7).
Rationale: Lin expressly teaches that pose refinement follows from the matching result. Thus, this limitation is expressly taught.
and when the matching of the plurality of feature points with the part of the point cloud data fails,
See at least: “It is time-consuming to match the visual point cloud generated by a stereo camera with the whole prior visual point cloud map.” (page 5) and “the current visual point cloud frame is matched with the candidate sub-map...” (page 7).
Rationale: Lin expressly contrasts whole-map matching with candidate-submap matching. Although Lin does not expressly recite a failure branch in those exact words, one of ordinary skill in the art would have understood that if matching against the candidate sub-map does not succeed, a predictable fallback is to broaden the search from the local part to the whole prior point-cloud map, especially because Lin expressly identifies whole-map matching as the broader baseline from which sub-map matching is an efficiency improvement.
performing a matching of the plurality of feature points
See at least: “It is time-consuming to match the visual point cloud generated by a stereo camera with the whole prior visual point cloud map.” (page 5).
Rationale: Lin expressly teaches matching against the whole prior visual point-cloud map as the broader matching operation. Thus, Lin supports the fallback full-map matching concept.
with all the point cloud data in the point cloud basemap,
See at least: “match the visual point cloud generated by a stereo camera with the whole prior visual point cloud map.” (page 5).
Rationale: Lin expressly discloses whole-map matching. One of ordinary skill in the art would have understood the whole prior point-cloud map to correspond to all the point-cloud data in the basemap.

Lin teaches the current-position candidate-submap matching, the predetermined-range restriction, successful local matching followed by localization update, and a supportable PHOSITA-obvious fallback from local-part matching to whole-map matching if local matching does not succeed.
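
For illustration only, the local-first matching with whole-map fallback discussed above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method of Lin or of the claims: the function names (`select_local_part`, `match_score`, `localize`), the toy nearest-neighbor score, the 0.5 m tolerance, and the 0.8 success threshold are all hypothetical. Only the 80 m range is taken from the Lin passage quoted above, and Lin's NDT matching is not reproduced here.

```python
import math


def select_local_part(basemap, position, radius):
    """Return the basemap points lying within `radius` of `position`."""
    px, py = position
    return [(x, y) for (x, y) in basemap if math.hypot(x - px, y - py) <= radius]


def match_score(features, candidates, tol=0.5):
    """Toy score (assumption): fraction of feature points that have at
    least one candidate point within `tol`. Stands in for a real matcher."""
    if not features or not candidates:
        return 0.0
    hits = 0
    for fx, fy in features:
        if any(math.hypot(fx - cx, fy - cy) <= tol for cx, cy in candidates):
            hits += 1
    return hits / len(features)


def localize(features, basemap, position, radius=80.0, min_score=0.8):
    """Try the local part first; fall back to the whole basemap on failure.

    Returns a (mode, score) pair, where mode is "local" or "global".
    """
    local_part = select_local_part(basemap, position, radius)
    score = match_score(features, local_part)
    if score >= min_score:
        return "local", score
    # Local matching did not succeed: broaden the search to all points.
    return "global", match_score(features, basemap)
```

With a basemap of `[(0, 0), (10, 0), (200, 0), (210, 0)]` and an assumed current position of `(5, 0)`, features near the first two points are matched locally, while features near the last two points fall outside the 80 m range and are only matched after the fallback.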

Liang discloses:
and a mapping relationship between the point cloud basemap and the vector map,
See at least: “we fuse LIDAR point clouds and large scale vector maps to reconstruct the 3-D road models” (page 1) and “The roadsides can be paired. The centerlines can be determined...” / “a node is generated by the intersection of two centerlines for the crossroads...” (pages 2–4).
Rationale: Liang expressly discloses a fused relationship between LiDAR point clouds and vector-map road geometry, including road and intersection structure. Thus, Liang teaches the claimed mapping relationship between the point-cloud basemap and the vector map.
determining the position of the vehicle within the vector map
See at least: “the complete set of the planimetric road networks” and “Finally, the road models are reconstructed by combining with two flat quadrilaterals from the paired roadsides and presented as ribbons.” (pages 4–6).
Rationale: Liang expressly discloses a coherent vectorized road-network representation tied to fused point-cloud content. In view of the already established Luo/Lin localization workflow, one of ordinary skill in the art would have used the match result within that point-cloud/vector-map relationship to determine position in the vector map.
and determining the position of the vehicle within the vector map
See at least: “we fuse LIDAR point clouds and large scale vector maps to reconstruct the 3-D road models” (page 1).
Rationale: Liang expressly teaches that the vector-map model is organized from the point-cloud/vector fusion itself. Thus, the matched point-cloud result can be interpreted in the vector-map domain through the disclosed fusion relationship.
according to the result of the matching of the plurality of feature points with all of the point cloud data.
See at least: Lin’s whole-map matching disclosure, “match the visual point cloud generated by a stereo camera with the whole prior visual point cloud map” (page 5), together with Liang’s point-cloud/vector-map fusion, “we fuse LIDAR point clouds and large scale vector maps...” (page 1).
Rationale: Lin teaches the full-map matching fallback, and Liang teaches the point-cloud/vector-map relationship needed to interpret the result within the vector map. Accordingly, the combined teachings render obvious determining position in the vector map according to the result of matching with all point-cloud data.



Motivation to Combine Luo, Flint, Zhao, Farshidi, Lin, and Liang
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, Zhao, Farshidi, Lin, and Liang before them, to modify the Luo/Flint/Zhao/Farshidi combination used for Claims 9, 11, and 14 so that the matching operation in the device is first performed against a local part of the point-cloud basemap corresponding to the current vehicle position and containing road/intersection subsets within a predetermined range, and so that, if that local matching succeeds, the device determines the vehicle position in the vector map using the matching result and the point-cloud/vector-map relationship, but if the local matching does not succeed, the device broadens the matching to the whole point-cloud basemap, because Luo already teaches the underlying feature-to-map matching and localization framework, Flint and Farshidi remain part of the parent Claim 14 combination, Zhao teaches road-structure subset organization, Lin teaches current-position-based candidate-submap segmentation and matching with the candidate sub-map rather than the whole map, and Liang teaches the fused relationship between LiDAR point clouds and vector-map road geometry for roads and intersections. One of ordinary skill in the art would have made such modification to improve efficiency by attempting localized matching first while preserving robustness through whole-map fallback if localized matching is unsuccessful, with the predictable benefit of faster matching, more stable localization, and meaningful position determination within the fused point-cloud/vector-map representation. Such a modification would merely combine known local-submap matching, known whole-map matching, and known point-cloud/vector-map fusion techniques in a predictable way and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. 
Therefore, after combining the teachings of Luo, Flint, Zhao, Farshidi, Lin, and Liang, Claim 15 is rendered obvious by the combination of Luo in view of Flint, Zhao, Farshidi, Lin, and Liang. In particular, Luo teaches the underlying feature-to-point-cloud matching and position-determination framework in the device context, Flint and Farshidi remain part of the parent Claim 14 combination, Zhao teaches road/intersection subset context, Lin teaches current-position candidate-submap matching within a predetermined range and supports the full-map fallback, and Liang teaches the mapping relationship between point-cloud data and vector-map geometry. The remaining aspects concerning the explicit failure-triggered fallback from local-part matching to all point-cloud data would have been obvious to one of ordinary skill in the art in view of Lin’s express contrast between candidate-submap matching and whole-map matching and the predictable need to broaden the search when localized matching is unsuccessful.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Luo, in view of Flint, and in view of Huang (Vision-based Semantic Mapping and Localization for Autonomous Indoor Parking).

Regarding Claim 8,
The combination of Luo and Flint establishes the method of Claim 1, which is the basis for Claim 8.

After Luo and Flint, the following limitations are not explicitly disclosed: the scenario includes an indoor parking lot; the method further comprises; presenting the vector map; and the position of the vehicle within the vector map; to a driver of the vehicle.

Huang teaches:
the scenario includes an indoor parking lot,
See at least: “In this paper, we proposed a novel and practical solution for the real-time indoor localization of autonomous driving in parking lots.” (page 1) and “However, parking in a large indoor parking lot without human intervention is still an unsolved problem.” (page 1).
Rationale: Huang expressly discloses that the localization scenario is an indoor parking lot. Thus, Huang directly teaches this limitation.
the method further comprises
See at least: the pipeline on page 3 showing an “online part” that outputs “Stable Parking Map” and “Vehicle Position,” together with the accompanying description that the method performs online localization in the parking lot. (page 3).
Rationale: Huang expressly discloses that its online localization workflow produces additional outputs beyond localization computation itself. Thus, Huang teaches that the method further comprises additional output operations.
presenting the vector map
See at least: “Finally, a two-dimensional map of parking slots can be robustly established which is distinguished from the traditional feature-based or point-cloud map for its stability, re-usability, lightweight and human readable.” (page 1) and the pipeline on page 3 showing “Stable Parking Map.”
Rationale: Huang expressly discloses a human-readable two-dimensional parking map and, in the page-3 pipeline, a stable parking map as an online system output. One of ordinary skill in the art would have understood this human-readable stable parking map to be a presented map representation corresponding to the claimed vector map.
and the position of the vehicle within the vector map
See at least: the page-3 pipeline showing “Stable Parking Map” and “Vehicle Position” as online outputs, and “Then the vehicle drives automatically ... according to the real-time localization.” (pages 5–6).
Rationale: Huang expressly discloses vehicle position as an online system output and expressly discloses real-time localization in the indoor parking lot. In view of Huang’s human-readable map output, one of ordinary skill in the art would have understood the vehicle position to be shown within that map representation.
to a driver of the vehicle.
See at least: “Finally, a two-dimensional map of parking slots can be robustly established ... human readable.” (page 1), the page-3 pipeline showing “Stable Parking Map” and “Vehicle Position,” and “During the online experiment, the vehicle is first operated by a human driver to initialize the parking map.” (pages 5–6).
Rationale: Huang does not expressly recite, in those exact words, that the map and vehicle position are presented “to a driver of the vehicle.” However, Huang expressly teaches a human-readable parking map, expressly outputs stable parking map and vehicle position in the online pipeline, and expressly operates in a vehicle context involving a human driver. One of ordinary skill in the art would have understood that a human-readable parking map and vehicle-position output in an autonomous indoor-parking system would predictably be presented to the driver through a vehicle display or HMI so the driver can monitor localization and parking status. This is a straightforward and predictable use of Huang’s disclosed outputs.

Motivation to Combine Luo, Flint and Huang
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint and Huang before them, to adapt Luo’s autonomous-vehicle localization method so that, in an indoor parking lot scenario, the method outputs a human-readable map together with the vehicle’s current position and presents that information to the driver, because Luo already teaches the baseline vehicle-localization method using camera- and LiDAR-derived map information, while Huang teaches autonomous indoor-parking localization using a human-readable parking map together with vehicle-position output. One of ordinary skill in the art would have made this modification to improve driver awareness, usability, and monitoring of vehicle localization during indoor parking, with the predictable benefit of giving the driver a clear view of the vehicle’s location within the mapped parking environment. Such a modification would merely apply Huang’s known indoor-parking map/position output technique to Luo’s known localization framework and would not alter the principle of operation of either reference or render either reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo, Flint and Huang, in view of the already established Claim 1 combination including Flint, Claim 8 is rendered obvious by the combination of Luo in view of Flint, and in view of Huang. In particular, Luo teaches the underlying vehicle-localization framework, Flint remains part of the parent-claim combination for Claim 1, and Huang teaches the indoor parking lot scenario together with the online output of a human-readable parking map and vehicle position.
The remaining aspect concerning presentation of that information to a driver of the vehicle would have been obvious to one of ordinary skill in the art in view of Huang’s express teaching that the map is human readable and produced as part of the online localization workflow.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Luo, in view of Flint, in view of Zhao, in view of Farshidi, in view of Lin, in view of Liang, and in view of Huang.

Regarding Claim 16,
The combination of Luo, Flint, Zhao, Farshidi, Lin, and Liang establishes the device of Claim 15, which is the basis for Claim 16.

After the combination of Luo, Flint, Zhao, Farshidi, Lin, and Liang, the following limitations are not explicitly disclosed: the scenario includes an indoor parking lot; the method further comprises; presenting the vector map; and the position of the vehicle within the vector map; to a driver of the vehicle.



Huang discloses:
the scenario includes an indoor parking lot,
See at least: “we proposed a novel and practical solution for the real-time indoor localization of autonomous driving in parking lots” (page 1) and “However, parking in a large indoor parking lot without human intervention is still an unsolved problem.” (page 1).Rationale: Huang expressly discloses that the localization scenario is an indoor parking lot. Thus, Huang directly teaches this limitation.
the method further comprises
See at least: “Our system is implemented on an autonomous driving vehicle and tested in real parking lots.” (page 1) and “Finally, a two-dimensional map of parking slots can be robustly established...” (page 1).
Rationale: Huang expressly discloses an operational localization workflow that produces additional outputs beyond the underlying localization computation. Thus, Huang teaches that the method further comprises additional output operations.
presenting the vector map
See at least: “Finally, a two-dimensional map of parking slots can be robustly established which is distinguished from the traditional feature-based or point-cloud map for its stability, re-usability, lightweight and human readable.” (page 1).
Rationale: Huang expressly discloses a human-readable two-dimensional parking map. One of ordinary skill in the art would have understood this human-readable parking map to be a presented map representation corresponding to the claimed vector map.
and the position of the vehicle within the vector map
See at least: “we proposed a novel and practical solution for the real-time indoor localization of autonomous driving in parking lots” (page 1) and “Our system is implemented on an autonomous driving vehicle and tested in real parking lots.” (page 1).
Rationale: Huang expressly discloses real-time indoor localization for autonomous driving in parking lots and a human-readable parking map. In view of these teachings, one of ordinary skill in the art would have understood the vehicle position to be shown within that map representation so the localization result is meaningful to the user.
to a driver of the vehicle.
See at least: “Finally, a two-dimensional map of parking slots can be robustly established ... human readable.” (page 1) and “Our system is implemented on an autonomous driving vehicle and tested in real parking lots.” (page 1).
Rationale: Huang expressly teaches a human-readable parking map and an autonomous-driving vehicle context. One of ordinary skill in the art would have understood that a human-readable parking map and localization result in a vehicle-based indoor-parking system would predictably be presented to the driver through a vehicle display or HMI so the driver can monitor localization and parking status. This is a straightforward and predictable use of Huang’s disclosed outputs.

Motivation to Combine Luo, Flint, Zhao, Farshidi, Lin, Liang, and Huang
Therefore, given the teachings as a whole, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, having Luo, Flint, Zhao, Farshidi, Lin, Liang, and Huang before them, to modify the Luo/Flint/Zhao/Farshidi/Lin/Liang combination used for Claim 15 so that, in an indoor parking lot scenario, the device further outputs a human-readable map together with the vehicle’s current position and presents that information to the driver, because Luo teaches the underlying localization device framework, Flint, Zhao, Farshidi, Lin, and Liang remain part of the parent-claim combination for Claim 15, and Huang teaches real-time indoor-parking localization using a human-readable parking map in an autonomous-driving vehicle context. One of ordinary skill in the art would have made this modification to improve driver awareness, usability, and monitoring of vehicle localization during indoor parking, with the predictable benefit of giving the driver a clear view of the vehicle’s location within the mapped parking environment. Such a modification would merely apply Huang’s known indoor-parking map/presentation output technique to the already-established Claim 15 localization framework and would not alter the principle of operation of any reference or render any reference unsatisfactory for its intended purpose. Therefore, after combining the teachings of Luo, Flint, Zhao, Farshidi, Lin, Liang, and Huang, Claim 16 is rendered obvious by the combination of Luo in view of Flint, Zhao, Farshidi, Lin, Liang, and Huang. In particular, Luo teaches the underlying device-based localization framework, Flint, Zhao, Farshidi, Lin, and Liang remain part of the parent-claim combination for Claim 15, and Huang teaches the indoor parking lot scenario together with a human-readable parking map in an autonomous-driving vehicle context. 
The remaining aspect concerning presentation of the vector map and vehicle position to a driver of the vehicle would have been obvious to one of ordinary skill in the art in view of Huang’s express teaching that the map is human readable and used in a vehicle-based indoor-parking localization system.

Response to Arguments
101 Rejection – WITHDRAWN
Applicant’s claims 1–20 are directed to patent-eligible subject matter under 35 U.S.C. 101.
The claims recite statutory subject matter, namely a process, a machine, and a non-transitory computer-readable storage medium. Claims 1, 9, and 17, and their dependent claims, are directed to a specific technological solution for vehicle positioning using a previously constructed fused map, camera-captured image frames, extracted feature points, point-cloud matching, and inertial-measurement-based position updating. As a whole, the claims are directed to a concrete vehicle-localization technique implemented using particular map data structures and vehicle-mounted sensing components.
The claims do not recite an abstract idea. They are not directed to a method of organizing human activity. They are also not directed to a mental process because the claimed operations, including obtaining and storing a fused map from a cloud network, capturing image frames through a camera unit, extracting feature points, matching the feature points with point cloud data, and updating vehicle position using an inertial measurement unit, cannot practically be performed in the human mind.
Further, even if certain sub-operations such as “matching,” “determining,” or “calculating confidences” were viewed in isolation as involving analysis, the claims as a whole integrate any such operation into a practical application. The claimed subject matter uses specific technological components and data structures to improve vehicle positioning in a real-world driving scenario, including roads, intersections, and indoor parking environments. The claims therefore recite significantly more than merely collecting, analyzing, and displaying information.
Accordingly, claims 1–20 are not directed to a judicial exception and are eligible under 35 U.S.C. 101.
The rejection under 35 U.S.C. 101 is hereby withdrawn.

103 Rejection - MAINTAINED
The Applicant’s arguments filed on 01/20/2026 have been fully considered but are not persuasive. The 35 U.S.C. 103 rejections of Claims 1–20 are maintained. A detailed response to the Applicant’s remarks is provided below.
I. Response to Arguments Regarding Independent Claims 1, 9, and 17
The Applicant argues that Luo fails to disclose or suggest a vehicle obtaining a "previously-constructed fused map" from a "cloud network" "before the vehicle enters the scenario." This argument is not persuasive for several reasons.
1. The "Cloud Network" and "Internet Server" under BRI: Under the broadest reasonable interpretation (BRI), a person having ordinary skill in the art (PHOSITA) would understand that Luo’s disclosure of an "internet server" (62) configured to communicate with "client devices" (the vehicle) via a "wireless network" (68) is the technical equivalent of a "cloud network" as recited in the claims. A networked server providing high-fidelity data to a mobile client through a wireless interface constitutes a cloud-based distribution system under current technological definitions.
2. Temporal Requirement ("Before Entering") and Predictable Results: The Applicant emphasizes the "before entering" limitation. However, Luo explicitly teaches that the "global map" and "submaps" are constructed and collected in an "operation 13" that occurs prior to the "matching" and "position estimation" phases.
The Examiner is entitled to rely on inferences which one skilled in the art would reasonably be expected to draw from the prior art (MPEP 2144.01). A PHOSITA seeking to implement Luo’s centimeter-level localization would find it a routine design choice to download the pre-constructed map data from the server and store it in the vehicle’s local memory (69) before entering the navigation scenario. This ensures the localization system functions in real-time without the latency or connectivity risks associated with streaming high-density 3D map data over a wireless network during active driving. Such a modification represents the "predictable use of known server-client data distribution and local caching techniques" consistent with KSR v. Teleflex.
3. The "Fused Map" Architecture and Coextensiveness:
The Applicant contends that Luo does not obtain a "fused map" consisting of a point cloud and a vector map, and that the submaps and global maps are not "coextensive."
Regarding the architecture, Luo discloses "structured features" (planes, lines) and "unstructured features" (sparse 3D points) within the same global map framework. In the context of autonomous driving, structured geometric primitives are recognized as the core components of vector maps, while sparse 3D points constitute a point cloud basemap. Jointly using these layers for a common localization goal constitutes a "fused map".
Furthermore, the Applicant’s argument regarding "coextensiveness" is misplaced. The claims do not require the submap and global map to be coextensive; rather, the global map serves as the pre-constructed fused reference for the entire scenario, against which real-time, localized submap features are matched. The fact that one is a global reference and the other is a local sensor input is a functional requirement of the matching process itself, not a deficiency in the prior art's disclosure of the mapping framework.
II. Response to Arguments Regarding Claims 2, 10, and 18 (Frequency Relationship)
The Applicant asserts that the references do not remedy the alleged deficiencies of Claim 1. However, the specific operational frequencies of Claim 2 are explicitly supported. Luo discloses that camera images are captured at ~30 Hz and LiDAR scans at ~20 Hz, while vehicle poses are collected by the inertial navigation module at ~50 Hz. Because the matching-based position determination is bounded by the frequency of the image/LiDAR data (~30 Hz and ~20 Hz, respectively), this "first frequency" is necessarily lower than the IMU-driven "second frequency" (~50 Hz). Flint reinforces this by teaching that inertial data is used for "dead reckoning" to update position between visual keyframes.
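
The frequency relationship relied on above can be checked with simple arithmetic. The following Python sketch is illustrative only: the ~20 Hz and ~50 Hz rates are those this Office action attributes to Luo, while the function name `count_updates`, the millisecond tick scheduling, and the assumption that each update fires on a fixed period are hypothetical and are not drawn from any reference.

```python
def count_updates(duration_ms, match_hz=20, imu_hz=50):
    """Count matching-based and IMU-based position updates in a time window.

    Assumes (hypothetically) that each source fires on a fixed period
    measured in whole milliseconds, starting after one full period.
    """
    match_period_ms = 1000 // match_hz  # 50 ms at 20 Hz
    imu_period_ms = 1000 // imu_hz      # 20 ms at 50 Hz
    match_updates = duration_ms // match_period_ms
    imu_updates = duration_ms // imu_period_ms
    return match_updates, imu_updates


def imu_updates_per_match(match_hz=20, imu_hz=50):
    """Average number of IMU updates available per matching update."""
    return imu_hz / match_hz
```

Under these assumptions, one second of driving yields 20 matching-based updates and 50 IMU-based updates, i.e. on average 2.5 inertial updates per matching update, which is the interval in which dead reckoning would carry the position estimate.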
III. Response to Arguments Regarding Dependent Claims 3–8, 11–16, and 19–20
As the Applicant’s arguments for the dependent claims relied on the same grounds as the independent claims, those rejections are also maintained:
Data Management (Claims 3–5): Zhao and Liang provide the missing road/intersection semantic context. Zhao’s use of the Ramer-Douglas-Peucker algorithm to simplify dense data by 94% provides the "dense-to-sparse" conversion. Zhou provides the dual-threshold logic used to manage point cloud reduction.
Multi-Camera (Claims 6 & 14): Farshidi discloses multi-camera arrangements where "confidences" are calculated to select the most informative view. Huang teaches the specific surround-view arrangement (4 fisheye cameras) on a vehicle.
Local Matching/Fallback (Claims 7 & 15): Lin teaches "adaptive prior map segmentation" with a specific 80m range threshold to match feature points against a "local part" for efficiency. The fallback to a global search upon failure is a predictable recovery mechanism in the art.
Indoor Parking (Claims 8 & 16): Huang explicitly discloses the "indoor parking lot" scenario and the generation of a "human-readable" semantic map for navigation.
The rejection is maintained.
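
For reference, the Ramer-Douglas-Peucker simplification mentioned above in connection with Zhao can be sketched as follows. This is a generic textbook form of the algorithm written in Python for illustration; it is not Zhao's implementation, and the function names (`perpendicular_distance`, `rdp`) and the tolerance values are assumptions. No claim is made that this sketch reproduces the 94% reduction figure.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (2-D)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide: fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def rdp(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from
    the chord joining the endpoints of each recursive segment."""
    if len(points) < 3:
        return list(points)
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > epsilon:
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        # Drop the duplicated split point when joining the halves.
        return left[:-1] + right
    return [points[0], points[-1]]
```

For example, a straight run of collinear boundary samples collapses to its two endpoints, while a sample that deviates from the chord by more than `epsilon` is retained as a vertex.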








Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWABUSAYO ADEBANJO AWORUNSE whose telephone number is (571)272-4311. The examiner can normally be reached M - F (8:30AM - 5PM).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jelani Smith can be reached at (571) 270-3969. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/OLUWABUSAYO ADEBANJO AWORUNSE/Examiner, Art Unit 3662
/JELANI A SMITH/Supervisory Patent Examiner, Art Unit 3662
Prosecution Timeline

Oct 31, 2023
Application Filed
Jun 16, 2025
Non-Final Rejection mailed — §101, §103
Aug 27, 2025
Response Filed
Nov 07, 2025
Final Rejection mailed — §101, §103
Dec 12, 2025
Response after Non-Final Action
Jan 20, 2026
Request for Continued Examination
Feb 17, 2026
Response after Non-Final Action
Apr 01, 2026
Non-Final Rejection mailed — §101, §103 (current)
Prosecution Projections

3-4
Expected OA Rounds
Grant Probability
With Interview (+0.0%)
3y 1m (~6m remaining)
Median Time to Grant
High
PTA Risk
Based on 4 resolved cases by this examiner. Grant probability derived from career allowance rate.