Prosecution Insights
Last updated: April 19, 2026
Application No. 18/509,045

3D LANE AND ROAD BOUNDARY ESTIMATION VIA ROW-WISE CLASSIFICATION

Office Action: Non-Final (§103)
Filed: Nov 14, 2023
Examiner: PICON-FELICIANO, RUBEN
Art Unit: 3747
Tech Center: 3700 — Mechanical Engineering & Manufacturing
Assignee: Qualcomm Incorporated
OA Round: 3 (Non-Final)

Grant Probability: 68% (Favorable)
OA Rounds: 3-4
Time to Grant: 3y 1m
With Interview: 82%

Examiner Intelligence

Career Allowance Rate: 68% (483 granted / 708 resolved; -1.8% vs. TC average, but above average overall)
Interview Lift: +13.3% (moderate), comparing resolved cases with an interview against those without
Typical Timeline: 3y 1m average prosecution; 61 applications currently pending
Career History: 769 total applications across all art units

Statute-Specific Performance

§101: 1.0% (-39.0% vs. TC average)
§102: 37.2% (-2.8% vs. TC average)
§103: 46.3% (+6.3% vs. TC average)
§112: 13.0% (-27.0% vs. TC average)

Note: Tech Center averages are estimates. Figures are based on career data from 708 resolved cases.
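To check the arithmetic behind these figures, the sketch below recomputes them from the counts shown on this page (plain Python; the additive interview adjustment is an illustrative approximation, not the analytics model itself).

```python
# Minimal sketch reproducing the headline figures from the counts above.
granted, resolved = 483, 708

career_allow_rate = granted / resolved        # ~0.682, rendered above as 68%
print(f"Career allowance rate: {career_allow_rate:.1%}")

# The page reports a +13.3% allowance lift for cases with an examiner
# interview. Treating the lift as additive over the 68% base estimate gives
# roughly the 82% "with interview" figure (the tool's own model may differ).
base_estimate = 0.68
interview_lift = 0.133
print(f"With interview (additive approximation): {base_estimate + interview_lift:.0%}")
```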

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on November 24, 2025 has been entered.

Response to Arguments

Applicant's amendments/arguments filed November 24, 2025, with respect to the rejections of claims 1-30, have been fully considered and are persuasive. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Yoo and Paek, as explained in the 35 U.S.C. 103 section below.

Disposition of Claims

Claims 1-30 are pending in this application. Claims 1-30 are rejected.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

Claims 1-30 are rejected under 35 U.S.C. 103 as being unpatentable over Urtasun (US 2020/0160559 A1), in view of KIM (US 2023/0071437 A1), further in view of Yoo ("End-to-End Lane Marker Detection via Row-Wise Classification," CVPR Workshops 2020), and further in view of Paek ("Row-wise LiDAR Lane Detection Network with Lane Correlation Refinement").

Regarding claim 1, Urtasun discloses:

A method for image processing (object detection architecture and ROI feature fusion process, including neural networks for autonomous vehicle perception and control: Figs. 2-6 and 8), comprising:

extracting point features from light detection and ranging (LiDAR) data ([0023]: "The machine-learned LIDAR backbone model can be configured to receive a bird's eye view (BEV) representation of the LIDAR point cloud for the environment surrounding the autonomous vehicle. The machine-learned LIDAR backbone model can be configured to process the BEV representation of the LIDAR point cloud to generate a LIDAR feature map. The machine-learned image backbone model can be configured to receive the image(s) and to process the image(s) to generate an image feature map. The machine-learned refinement model can be configured to receive respective region of interest (ROI) feature crops from each of the LIDAR feature map and the image feature map, to perform ROI-wise fusion to fuse respective pairs of ROI feature crops to generate fused ROI feature crops, and to generate one or more three-dimensional object detections based on the fused ROI feature crops. Each of the one or more three-dimensional object detections can indicate a location of a detected object within the environment surrounding the autonomous vehicle. For example, the object detection(s) can be provided in the form of a three-dimensional bounding shape (e.g., bounding box). Thus, example implementations of the present disclosure can perform multi-sensor fusion at the ROI level");

partitioning the point features (this step is illustrated in Figs. 2-3 and paragraphs [0023, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0101-0103, 0111, 0138, 0150]: "In particular, a multi-task loss can be employed to train the multi-sensor detector end-to-end. In some implementations, the full model ensemble outputs object classification, 3D box estimation, 2D and 3D box refinement, ground estimation and dense depth. During training, detection labels and dense depth labels may be available, while ground estimation can be optimized implicitly by the 3D localization loss. There are two paths of gradient transmission for ground estimation. One is from the 3D box output where ground height is added back to predicted Z term. The other goes through the LIDAR backbone model 202 to the LIDAR voxelization layer where ground height is subtracted from the Z coordinate of each LIDAR point" and "At 716, method 700 can include detecting three-dimensional objects of interest based on the fused ROI crops generated at 714. In some implementations, detecting objects of interest at 716 can include providing the feature map generated at 714 as input to a machine-learned refinement model. In response to receiving the feature map, the machine-learned refinement model can be trained to generate as output a plurality of detections corresponding to identified objects of interest within the feature map. In some implementations, detecting objects of interest at 716 can include determining a plurality of object classifications and/or bounding shapes corresponding to the detected objects of interest. For example, in one implementation, the plurality of objects detected at 716 can include a plurality of bounding shapes at locations within the feature map(s) having a confidence score associated with an object likelihood that is above a threshold value. In some implementations, detecting objects of interest at 716 can include determining one or more of a classification indicative of a likelihood that each of the one or more objects of interest comprises a class of object from a predefined group of object classes (e.g., vehicle, bicycle, pedestrian, etc.) and a bounding shape representative of a size, a location, and an orientation of each the one or more objects of interest");

performing BEV-feature pooling based on the partitioned point features (Figs. 2-4 and [0096, 0099]: "FIG. 4 depicts precise rotated ROI feature extraction that takes orientation cycle into account. In particular, FIG. 4 illustrates, at (1), the rotational periodicity causes reverse of order in feature extraction and at (2), an ROI refine module with two orientation anchors. In some implementations, an ROI can be assigned to 0 or 90 degrees. They share most refining layers except for the output. At (3), FIG. 4 depicts the regression target of relative offsets are re-parametrized with respect to the object orientation axes and at (4) a n×n sized feature is extracted using bilinear interpolation (an example is shown with n=2)" and "For oriented BEV ROI feature extraction, however, two new issues (as shown in FIG. 4) are observed and resolved. First, the periodicity of the ROI orientation causes the reverse of feature order around the cycle boundary. To solve this issue, the present disclosure proposes an oriented ROI feature extraction module with anchors. Given an oriented ROI, it can first be assigned to one of the two orientation anchors, 0 or 90 degrees. Each anchor has a consistent feature extraction order. The two anchors share the refinement net except for the output layer"); and

determining lane-boundary heads based on the BEV-feature pooling ([0054]: "In addition to the autonomy sensor data 116, the autonomy computing system 120 can retrieve or otherwise obtain data including the map data 122. The map data 122 can provide detailed information about the surrounding environment of the vehicle 102. For example, the map data 122 can provide information regarding: the identity and location of different roadways, road segments, buildings, or other items or objects (e.g., lampposts, crosswalks and/or curb); the location and directions of traffic lanes (e.g., the location and direction of a parking lane, a turning lane, a bicycle lane, or other lanes within a particular roadway or other travel way and/or one or more boundary markings associated therewith); traffic control data (e.g., the location and instructions of signage, traffic lights, or other traffic control devices); and/or any other map data that provides information that assists the vehicle computing system 112 in processing, analyzing, and perceiving its surrounding environment and its relationship thereto").
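As context for the "partitioning the point features" and "BEV-feature pooling" steps mapped above, the following is a minimal sketch of the generic technique: LiDAR point features are binned into a BEV grid and max-pooled per cell. It is PyTorch-style Python; all names and parameters are hypothetical, and Urtasun's rotated-ROI extraction (Fig. 4), which additionally crops oriented ROIs with bilinear interpolation, is deliberately omitted.

```python
import torch

def bev_partition_and_pool(points, feats, x_range=(0.0, 80.0),
                           y_range=(-40.0, 40.0), cell=0.5):
    """Bin LiDAR point features into a BEV grid and max-pool per cell.

    points: (N, 3) x, y, z coordinates; feats: (N, C) per-point features.
    Returns a (C, H, W) BEV feature map. Illustrative only; ranges and
    cell size are assumptions, not values from the cited references.
    """
    H = int((y_range[1] - y_range[0]) / cell)
    W = int((x_range[1] - x_range[0]) / cell)
    C = feats.shape[1]

    # Partition: assign each point to one BEV grid cell.
    ix = ((points[:, 0] - x_range[0]) / cell).long().clamp(0, W - 1)
    iy = ((points[:, 1] - y_range[0]) / cell).long().clamp(0, H - 1)
    flat = iy * W + ix                                  # (N,) cell index

    # Pool: per-cell max over the features of the points in that cell.
    bev = torch.full((H * W, C), float("-inf"))
    bev.scatter_reduce_(0, flat[:, None].expand(-1, C), feats, reduce="amax")
    bev[bev == float("-inf")] = 0.0                     # empty cells -> zeros
    return bev.reshape(H, W, C).permute(2, 0, 1)        # (C, H, W)
```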
But Urtasun does not explicitly and/or specifically meet the following limitations:

(A) wherein the determining comprises row-wise classification with at least one of: offset correction regression processing; or vertex-wise height regression processing; and

(B) a first classification for a first row of the plurality of rows, and a second classification for a second row of the plurality of rows, wherein each of the first classification and the second classification is performed with at least one of: offset correction regression processing; or vertex-wise height regression processing.

However, regarding limitation (A) above, KIM (Figs. 1, 3-4 and 6) discloses/teaches the following:

An apparatus for single-stage three-dimensional (3D) multi-object detection by using a LiDAR sensor to detect 3D multiple objects, comprising: a data input module configured to receive raw point cloud data from the LiDAR sensor; a BEV image generating module configured to generate bird's eye view (BEV) images from the raw point cloud data; a learning module configured to perform a deep learning algorithm-based learning task to extract a fine-grained feature image from the BEV images; and a localization module configured to perform a regression operation and a localization operation to find 3D candidate boxes and classes corresponding to the 3D candidate boxes for detecting 3D objects from the fine-grained feature image (KIM Abstract).

The data input module 110 receives raw point cloud data from a LiDAR sensor (KIM [0036]). The BEV image generation module 120 generates a BEV image from the raw point cloud data (KIM [0037]). The learning module 130 performs deep learning algorithm-based learning to extract a fine-grained or subdivided feature image from the BEV image (KIM [0038]). In at least one embodiment of the present disclosure, the learning module 130 performs Convolutional Neural Network (CNN)-based learning (KIM [0039]). The localization module 140 performs a regression operation and a localization operation to find, in the subdivided feature image, 3D candidate boxes and their corresponding classes for detecting a 3D object (KIM [0040]). The BEV image generation module 120 may generate a BEV image by projecting and discretizing the raw 3D point cloud data into 2D pseudo-images (KIM [0041]). The BEV image generation module 120 may encode the raw 3D point cloud data to generate four feature map images based on a height feature map, a density feature map, an intensity feature map, and a distance feature map (KIM [0042]). The learning process may employ a center regression, offset regression, orientation regression, Z-axis location regression, and size regression (KIM [0088]).

It is noted that KIM discloses a similar "3-Dimension Multi-Object Detecting Apparatus and Method for Autonomous Driving," like Urtasun above. Still further, regarding the motivation for combining elements under 35 U.S.C. 103, the Examiner respectfully notes that the above combination is a two-way linear superposition of prior art elements: one skilled in the art could incorporate the teachings of KIM into Urtasun, or conversely incorporate the teachings of Urtasun into KIM, and either way the result would meet the claimed limitations. Accordingly, one skilled in the art would have been motivated to incorporate the teachings of KIM into Urtasun to improve the accuracy of a 3D object detection task while maintaining a very fast inference speed, and also to minimize imbalance in the training set and improve the stability of the model.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the image-processing method and identification system of Urtasun, incorporating additional controller programming instructions as taught by KIM, to improve the accuracy of a 3D object detection task while maintaining a very fast inference speed, and also to minimize imbalance in the training set and improve the stability of the model.
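The four-channel BEV pseudo-image that KIM describes in [0041]-[0042] (height, density, intensity, and distance maps) can be sketched directly. The version below is illustrative only: the grid ranges, cell size, and density normalization are assumptions, not KIM's disclosed values.

```python
import numpy as np

def encode_bev_pseudo_image(points, x_range=(0.0, 80.0),
                            y_range=(-40.0, 40.0), cell=0.5):
    """Project raw LiDAR points (N, 4: x, y, z, intensity) into a 4-channel
    BEV pseudo-image: max height, point density, max intensity, and range.
    Channel choices follow KIM [0042]; the binning details are assumptions.
    """
    H = int((y_range[1] - y_range[0]) / cell)
    W = int((x_range[1] - x_range[0]) / cell)
    height = np.zeros((H, W))
    density = np.zeros((H, W))
    intensity = np.zeros((H, W))
    distance = np.zeros((H, W))

    ix = np.clip(((points[:, 0] - x_range[0]) / cell).astype(int), 0, W - 1)
    iy = np.clip(((points[:, 1] - y_range[0]) / cell).astype(int), 0, H - 1)
    rng = np.linalg.norm(points[:, :2], axis=1)        # planar range per point

    for px, py, z, i, r in zip(ix, iy, points[:, 2], points[:, 3], rng):
        height[py, px] = max(height[py, px], z)        # tallest return per cell
        density[py, px] += 1.0                         # point count per cell
        intensity[py, px] = max(intensity[py, px], i)  # strongest return
        distance[py, px] = max(distance[py, px], r)    # farthest return

    # Log-normalize density (a common choice, not necessarily KIM's).
    density = np.minimum(1.0, np.log1p(density) / np.log(64))
    return np.stack([height, density, intensity, distance])   # (4, H, W)
```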
Further on, regarding limitation (B) above, Yoo (Figure 2) discloses/teaches the following:

In the second stage, we successively squeeze the horizontal dimension of the shared representation using HRMs without changing the vertical dimension. With this squeeze operation, we can obtain the row-wise representation in a more natural way. After running shared HRMs, we squeeze the remaining width of representation by lane marker-wise HRMs to make single vector representation for each row. We found that it is required to assign dedicated HRMs on each lane marker after the shared HRMs for increasing accuracy numbers, since each lane marker has different innate spatial and shape characteristics. For computational efficiency, however, only the first few HRMs are shared across lane markers, followed by lane marker-wise HRMs. With more shared layers we can save computational cost but each lane marker accuracy might be degraded. Specifically, in the skip connection, we add a horizontal average pooling layer with a 1×1 convolution to down-sample horizontal components. Although pooling operations let the deeper layers gather more spatial context (to improve classification) and reduce computational complexity, they still have the drawback of reducing the pixel precision. Therefore, to effectively keep and enhance the horizontal representation, inspired by the pixel shuffle layer of [32, 24], we propose to rearrange the elements of C × H × W input tensor to make a tensor of shape rC × H × W/r in the residual branch, which is somewhat a reverse operation of the original pixel shuffle block in [32], the so-called horizontal pixel unshuffle layer. By rearranging the representation, we can efficiently move spatial information to channel.

[Yoo, Figure 2(a)-(b): network architecture with shared and lane marker-wise HRMs]
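The horizontal pixel unshuffle that Yoo describes, rearranging a C × H × W tensor into rC × H × W/r, is compact in code. The sketch below (PyTorch) is one realization; the ordering of the interleaved sub-columns is an assumption the quoted passage does not fix.

```python
import torch

def horizontal_pixel_unshuffle(x: torch.Tensor, r: int) -> torch.Tensor:
    """Rearrange (B, C, H, W) into (B, r*C, H, W//r), moving horizontal
    spatial detail into channels: a 1-D analogue of reverse pixel shuffle,
    as in Yoo's residual branch. W must be divisible by r.
    """
    B, C, H, W = x.shape
    assert W % r == 0, "width must be divisible by the squeeze factor r"
    x = x.reshape(B, C, H, W // r, r)    # split width into (W/r, r) groups
    x = x.permute(0, 4, 1, 2, 3)         # lift the r sub-columns to channels
    return x.reshape(B, r * C, H, W // r)

# e.g. a 64-channel 36x100 feature map with r=2 becomes 128 channels, 36x50:
feat = torch.randn(1, 64, 36, 100)
print(horizontal_pixel_unshuffle(feat, 2).shape)  # torch.Size([1, 128, 36, 50])
```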
Still further, regarding limitation (B) above, Paek (Figures 1-2) discloses/teaches the following:

The feature extractor is responsible for encoding the point cloud raw data into an output feature map that is used by the detection head to make the final predictions. It is composed of two parts: the BEV encoder and the global feature correlator (GFC) backbone. Given a point cloud P = {p1, p2, ..., pn}, where pi ∈ R^(3+C) is a point in the 3D space with C additional features such as intensity and reflectivity, the feature extractor first encodes the raw point cloud data into a pseudo-BEV image of size C_BEV × H_BEV × W_BEV, where C_BEV is the number of feature channels, H_BEV is the number of rows, and W_BEV is the number of columns. After obtaining the pseudo-BEV image, the backbone then learns important features through the global feature correlator. This results in the final output feature map of size C_head × H_BEV × W_BEV. We utilize the same feature extractor as seen in previous state-of-the-art network [9] as our focus in this work is on the detection head with the two-stage row-wise formulation.

The row-wise detection head uses the final output feature map as an input and produces two predictions: the row-wise lane existence and the row-wise lane location probability. To do so, we leverage the fact that lane lines from a LiDAR scan have almost no shape distortion along the BEV map rows; thus, it is suitable to utilize shared MLPs. As shown in Fig. 2, the MLPs are shared along the rows of the feature map, that is, each row in the feature map is considered as an individual feature vector (colorized as purple in Fig. 2) to be processed by the same MLPs.

[Paek, Figure 1: feature extractor (BEV encoder and GFC backbone); Figure 2: row-wise detection head with MLPs shared across rows]

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the image-processing method and identification system of Urtasun in view of KIM, further incorporating additional controller programming instructions as taught by Yoo and Paek, to improve the accuracy of a 3D object detection task and improve the stability of the model.
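Paek's row-wise detection head, with one MLP shared across all H_BEV rows and two outputs (row-wise lane existence and row-wise lane location probability), can be sketched as follows. Treating each row's C_head × W_BEV slice as a single flattened feature vector is an interpretive assumption, and the layer widths are illustrative.

```python
import torch
import torch.nn as nn

class RowWiseLaneHead(nn.Module):
    """Sketch of a row-wise head in the style Paek describes: the same MLP
    is applied to every row of the (C_head, H, W) feature map, producing a
    per-row existence probability and a per-row distribution over columns.
    """

    def __init__(self, c_head: int, w_bev: int, hidden: int = 256):
        super().__init__()
        self.shared_mlp = nn.Sequential(
            nn.Linear(c_head * w_bev, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.existence = nn.Linear(hidden, 1)      # row-wise lane existence
        self.location = nn.Linear(hidden, w_bev)   # row-wise column logits

    def forward(self, fmap: torch.Tensor):
        # fmap: (B, C_head, H, W); treat each row as one feature vector.
        B, C, H, W = fmap.shape
        rows = fmap.permute(0, 2, 1, 3).reshape(B, H, C * W)
        z = self.shared_mlp(rows)                        # shared along rows
        exist = torch.sigmoid(self.existence(z))         # (B, H, 1)
        loc = torch.softmax(self.location(z), dim=-1)    # (B, H, W)
        return exist, loc
```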
Regarding claim 10, Urtasun discloses:

An apparatus (object detection architecture and ROI feature fusion process, including neural networks for autonomous vehicle perception and control: Figs. 2-6 and 8), comprising:

a memory storing processor-readable code (vehicle computing system 112 can include one or more tangible, non-transitory, computer readable media {e.g., memory devices}: Fig. 1 and [0051]); and at least one processor (vehicle computing system 112 can include one or more processors: Fig. 1 and [0051]) coupled to the memory (non-transitory, computer readable media {e.g., memory devices}: Fig. 1 and [0051]), the at least one processor (vehicle computing system 112 can include one or more processors: Fig. 1 and [0051]) configured to execute the processor-readable code (non-transitory, computer readable media {e.g., memory devices}: Fig. 1 and [0051]) to cause the at least one processor (vehicle computing system 112 can include one or more processors: Fig. 1 and [0051]) to perform operations including:

extracting point features from light detection and ranging (LiDAR) data ([0023]: "The machine-learned LIDAR backbone model can be configured to receive a bird's eye view (BEV) representation of the LIDAR point cloud for the environment surrounding the autonomous vehicle. The machine-learned LIDAR backbone model can be configured to process the BEV representation of the LIDAR point cloud to generate a LIDAR feature map. The machine-learned image backbone model can be configured to receive the image(s) and to process the image(s) to generate an image feature map. The machine-learned refinement model can be configured to receive respective region of interest (ROI) feature crops from each of the LIDAR feature map and the image feature map, to perform ROI-wise fusion to fuse respective pairs of ROI feature crops to generate fused ROI feature crops, and to generate one or more three-dimensional object detections based on the fused ROI feature crops. Each of the one or more three-dimensional object detections can indicate a location of a detected object within the environment surrounding the autonomous vehicle. For example, the object detection(s) can be provided in the form of a three-dimensional bounding shape (e.g., bounding box). Thus, example implementations of the present disclosure can perform multi-sensor fusion at the ROI level");

partitioning the point features (this step is illustrated in Figs. 2-3 and paragraphs [0023, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0101-0103, 0111, 0138, 0150]: "In particular, a multi-task loss can be employed to train the multi-sensor detector end-to-end. In some implementations, the full model ensemble outputs object classification, 3D box estimation, 2D and 3D box refinement, ground estimation and dense depth. During training, detection labels and dense depth labels may be available, while ground estimation can be optimized implicitly by the 3D localization loss. There are two paths of gradient transmission for ground estimation. One is from the 3D box output where ground height is added back to predicted Z term. The other goes through the LIDAR backbone model 202 to the LIDAR voxelization layer where ground height is subtracted from the Z coordinate of each LIDAR point" and "At 716, method 700 can include detecting three-dimensional objects of interest based on the fused ROI crops generated at 714. In some implementations, detecting objects of interest at 716 can include providing the feature map generated at 714 as input to a machine-learned refinement model. In response to receiving the feature map, the machine-learned refinement model can be trained to generate as output a plurality of detections corresponding to identified objects of interest within the feature map. In some implementations, detecting objects of interest at 716 can include determining a plurality of object classifications and/or bounding shapes corresponding to the detected objects of interest. For example, in one implementation, the plurality of objects detected at 716 can include a plurality of bounding shapes at locations within the feature map(s) having a confidence score associated with an object likelihood that is above a threshold value. In some implementations, detecting objects of interest at 716 can include determining one or more of a classification indicative of a likelihood that each of the one or more objects of interest comprises a class of object from a predefined group of object classes (e.g., vehicle, bicycle, pedestrian, etc.) and a bounding shape representative of a size, a location, and an orientation of each the one or more objects of interest");

performing BEV-feature pooling based on the partitioned point features (Figs. 2-4 and [0096, 0099]: "FIG. 4 depicts precise rotated ROI feature extraction that takes orientation cycle into account. In particular, FIG. 4 illustrates, at (1), the rotational periodicity causes reverse of order in feature extraction and at (2), an ROI refine module with two orientation anchors. In some implementations, an ROI can be assigned to 0 or 90 degrees. They share most refining layers except for the output. At (3), FIG. 4 depicts the regression target of relative offsets are re-parametrized with respect to the object orientation axes and at (4) a n×n sized feature is extracted using bilinear interpolation (an example is shown with n=2)" and "For oriented BEV ROI feature extraction, however, two new issues (as shown in FIG. 4) are observed and resolved. First, the periodicity of the ROI orientation causes the reverse of feature order around the cycle boundary. To solve this issue, the present disclosure proposes an oriented ROI feature extraction module with anchors. Given an oriented ROI, it can first be assigned to one of the two orientation anchors, 0 or 90 degrees. Each anchor has a consistent feature extraction order. The two anchors share the refinement net except for the output layer"); and

determining lane-boundary heads based on the BEV-feature pooling ([0054]: "In addition to the autonomy sensor data 116, the autonomy computing system 120 can retrieve or otherwise obtain data including the map data 122. The map data 122 can provide detailed information about the surrounding environment of the vehicle 102. For example, the map data 122 can provide information regarding: the identity and location of different roadways, road segments, buildings, or other items or objects (e.g., lampposts, crosswalks and/or curb); the location and directions of traffic lanes (e.g., the location and direction of a parking lane, a turning lane, a bicycle lane, or other lanes within a particular roadway or other travel way and/or one or more boundary markings associated therewith); traffic control data (e.g., the location and instructions of signage, traffic lights, or other traffic control devices); and/or any other map data that provides information that assists the vehicle computing system 112 in processing, analyzing, and perceiving its surrounding environment and its relationship thereto").

But Urtasun does not explicitly and/or specifically meet the following limitations:

(A) wherein the determining comprises row-wise classification with at least one of: offset correction regression processing; or vertex-wise height regression processing; and

(B) a first classification for a first row of the plurality of rows, and a second classification for a second row of the plurality of rows, wherein each of the first classification and the second classification is performed with at least one of: offset correction regression processing; or vertex-wise height regression processing.

However, regarding limitation (A) above, KIM (Figs. 1, 3-4 and 6) discloses/teaches the following:

An apparatus for single-stage three-dimensional (3D) multi-object detection by using a LiDAR sensor to detect 3D multiple objects, comprising: a data input module configured to receive raw point cloud data from the LiDAR sensor; a BEV image generating module configured to generate bird's eye view (BEV) images from the raw point cloud data; a learning module configured to perform a deep learning algorithm-based learning task to extract a fine-grained feature image from the BEV images; and a localization module configured to perform a regression operation and a localization operation to find 3D candidate boxes and classes corresponding to the 3D candidate boxes for detecting 3D objects from the fine-grained feature image (KIM Abstract).

The data input module 110 receives raw point cloud data from a LiDAR sensor (KIM [0036]). The BEV image generation module 120 generates a BEV image from the raw point cloud data (KIM [0037]). The learning module 130 performs deep learning algorithm-based learning to extract a fine-grained or subdivided feature image from the BEV image (KIM [0038]). In at least one embodiment of the present disclosure, the learning module 130 performs Convolutional Neural Network (CNN)-based learning (KIM [0039]). The localization module 140 performs a regression operation and a localization operation to find, in the subdivided feature image, 3D candidate boxes and their corresponding classes for detecting a 3D object (KIM [0040]). The BEV image generation module 120 may generate a BEV image by projecting and discretizing the raw 3D point cloud data into 2D pseudo-images (KIM [0041]). The BEV image generation module 120 may encode the raw 3D point cloud data to generate four feature map images based on a height feature map, a density feature map, an intensity feature map, and a distance feature map (KIM [0042]). The learning process may employ a center regression, offset regression, orientation regression, Z-axis location regression, and size regression (KIM [0088]).

It is noted that KIM discloses a similar "3-Dimension Multi-Object Detecting Apparatus and Method for Autonomous Driving," like Urtasun above. Still further, regarding the motivation for combining elements under 35 U.S.C. 103, the Examiner respectfully notes that the above combination is a two-way linear superposition of prior art elements: one skilled in the art could incorporate the teachings of KIM into Urtasun, or conversely incorporate the teachings of Urtasun into KIM, and either way the result would meet the claimed limitations. Accordingly, one skilled in the art would have been motivated to incorporate the teachings of KIM into Urtasun to improve the accuracy of a 3D object detection task while maintaining a very fast inference speed, and also to minimize imbalance in the training set and improve the stability of the model.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the image-processing method and identification system of Urtasun, incorporating additional controller programming instructions as taught by KIM, to improve the accuracy of a 3D object detection task while maintaining a very fast inference speed, and also to minimize imbalance in the training set and improve the stability of the model.

Further on, regarding limitation (B) above, Yoo (Figure 2) discloses/teaches the following:

In the second stage, we successively squeeze the horizontal dimension of the shared representation using HRMs without changing the vertical dimension. With this squeeze operation, we can obtain the row-wise representation in a more natural way. After running shared HRMs, we squeeze the remaining width of representation by lane marker-wise HRMs to make single vector representation for each row. We found that it is required to assign dedicated HRMs on each lane marker after the shared HRMs for increasing accuracy numbers, since each lane marker has different innate spatial and shape characteristics. For computational efficiency, however, only the first few HRMs are shared across lane markers, followed by lane marker-wise HRMs. With more shared layers we can save computational cost but each lane marker accuracy might be degraded. Specifically, in the skip connection, we add a horizontal average pooling layer with a 1×1 convolution to down-sample horizontal components. Although pooling operations let the deeper layers gather more spatial context (to improve classification) and reduce computational complexity, they still have the drawback of reducing the pixel precision. Therefore, to effectively keep and enhance the horizontal representation, inspired by the pixel shuffle layer of [32, 24], we propose to rearrange the elements of C × H × W input tensor to make a tensor of shape rC × H × W/r in the residual branch, which is somewhat a reverse operation of the original pixel shuffle block in [32], the so-called horizontal pixel unshuffle layer. By rearranging the representation, we can efficiently move spatial information to channel.

[Yoo, Figure 2(a)-(b): network architecture with shared and lane marker-wise HRMs]

Still further, regarding limitation (B) above, Paek (Figures 1-2) discloses/teaches the following:

The feature extractor is responsible for encoding the point cloud raw data into an output feature map that is used by the detection head to make the final predictions. It is composed of two parts: the BEV encoder and the global feature correlator (GFC) backbone. Given a point cloud P = {p1, p2, ..., pn}, where pi ∈ R^(3+C) is a point in the 3D space with C additional features such as intensity and reflectivity, the feature extractor first encodes the raw point cloud data into a pseudo-BEV image of size C_BEV × H_BEV × W_BEV, where C_BEV is the number of feature channels, H_BEV is the number of rows, and W_BEV is the number of columns. After obtaining the pseudo-BEV image, the backbone then learns important features through the global feature correlator. This results in the final output feature map of size C_head × H_BEV × W_BEV. We utilize the same feature extractor as seen in previous state-of-the-art network [9] as our focus in this work is on the detection head with the two-stage row-wise formulation.

The row-wise detection head uses the final output feature map as an input and produces two predictions: the row-wise lane existence and the row-wise lane location probability. To do so, we leverage the fact that lane lines from a LiDAR scan have almost no shape distortion along the BEV map rows; thus, it is suitable to utilize shared MLPs. As shown in Fig. 2, the MLPs are shared along the rows of the feature map, that is, each row in the feature map is considered as an individual feature vector (colorized as purple in Fig. 2) to be processed by the same MLPs.

[Paek, Figure 1: feature extractor (BEV encoder and GFC backbone); Figure 2: row-wise detection head with MLPs shared across rows]

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the image-processing method and identification system of Urtasun in view of KIM, further incorporating additional controller programming instructions as taught by Yoo and Paek, to improve the accuracy of a 3D object detection task and improve the stability of the model.

Regarding claim 19, Urtasun discloses:

A non-transitory computer-readable medium storing instructions that, when executed by a processor (please see the same limitations analysis in claims 1 and 10 above), cause the processor to perform operations comprising: extracting point features from light detection and ranging (LiDAR) data (please see the same limitations analysis in claims 1 and 10 above); partitioning the point features (please see the same limitations analysis in claims 1 and 10 above); performing BEV-feature pooling based on the partitioned point features (please see the same limitations analysis in claims 1 and 10 above); and determining lane-boundary heads based on the BEV-feature pooling (please see the same limitations analysis in claims 1 and 10 above).

But Urtasun does not explicitly and/or specifically meet the following limitations:

(A) wherein the determining comprises row-wise classification with at least one of: offset correction regression processing; or vertex-wise height regression processing; and

(B) a first classification for a first row of the plurality of rows, and a second classification for a second row of the plurality of rows, wherein each of the first classification and the second classification is performed with at least one of: offset correction regression processing; or vertex-wise height regression processing.

However, regarding limitation (A) above, KIM (Figs. 1, 3-4 and 6) discloses/teaches the following:

An apparatus for single-stage three-dimensional (3D) multi-object detection by using a LiDAR sensor to detect 3D multiple objects, comprising: a data input module configured to receive raw point cloud data from the LiDAR sensor; a BEV image generating module configured to generate bird's eye view (BEV) images from the raw point cloud data; a learning module configured to perform a deep learning algorithm-based learning task to extract a fine-grained feature image from the BEV images; and a localization module configured to perform a regression operation and a localization operation to find 3D candidate boxes and classes corresponding to the 3D candidate boxes for detecting 3D objects from the fine-grained feature image (KIM Abstract).

The data input module 110 receives raw point cloud data from a LiDAR sensor (KIM [0036]). The BEV image generation module 120 generates a BEV image from the raw point cloud data (KIM [0037]). The learning module 130 performs deep learning algorithm-based learning to extract a fine-grained or subdivided feature image from the BEV image (KIM [0038]). In at least one embodiment of the present disclosure, the learning module 130 performs Convolutional Neural Network (CNN)-based learning (KIM [0039]). The localization module 140 performs a regression operation and a localization operation to find, in the subdivided feature image, 3D candidate boxes and their corresponding classes for detecting a 3D object (KIM [0040]). The BEV image generation module 120 may generate a BEV image by projecting and discretizing the raw 3D point cloud data into 2D pseudo-images (KIM [0041]). The BEV image generation module 120 may encode the raw 3D point cloud data to generate four feature map images based on a height feature map, a density feature map, an intensity feature map, and a distance feature map (KIM [0042]). The learning process may employ a center regression, offset regression, orientation regression, Z-axis location regression, and size regression (KIM [0088]).

It is noted that KIM discloses a similar "3-Dimension Multi-Object Detecting Apparatus and Method for Autonomous Driving," like Urtasun above. Still further, regarding the motivation for combining elements under 35 U.S.C. 103, the Examiner respectfully notes that the above combination is a two-way linear superposition of prior art elements: one skilled in the art could incorporate the teachings of KIM into Urtasun, or conversely incorporate the teachings of Urtasun into KIM, and either way the result would meet the claimed limitations. Accordingly, one skilled in the art would have been motivated to incorporate the teachings of KIM into Urtasun to improve the accuracy of a 3D object detection task while maintaining a very fast inference speed, and also to minimize imbalance in the training set and improve the stability of the model.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the image-processing method and identification system of Urtasun, incorporating additional controller programming instructions as taught by KIM, to improve the accuracy of a 3D object detection task while maintaining a very fast inference speed, and also to minimize imbalance in the training set and improve the stability of the model.

Further on, regarding limitation (B) above, Yoo (Figure 2) discloses/teaches the following:

In the second stage, we successively squeeze the horizontal dimension of the shared representation using HRMs without changing the vertical dimension. With this squeeze operation, we can obtain the row-wise representation in a more natural way. After running shared HRMs, we squeeze the remaining width of representation by lane marker-wise HRMs to make single vector representation for each row. We found that it is required to assign dedicated HRMs on each lane marker after the shared HRMs for increasing accuracy numbers, since each lane marker has different innate spatial and shape characteristics. For computational efficiency, however, only the first few HRMs are shared across lane markers, followed by lane marker-wise HRMs. With more shared layers we can save computational cost but each lane marker accuracy might be degraded. Specifically, in the skip connection, we add a horizontal average pooling layer with a 1×1 convolution to down-sample horizontal components. Although pooling operations let the deeper layers gather more spatial context (to improve classification) and reduce computational complexity, they still have the drawback of reducing the pixel precision. Therefore, to effectively keep and enhance the horizontal representation, inspired by the pixel shuffle layer of [32, 24], we propose to rearrange the elements of C × H × W input tensor to make a tensor of shape rC × H × W/r in the residual branch, which is somewhat a reverse operation of the original pixel shuffle block in [32], the so-called horizontal pixel unshuffle layer. By rearranging the representation, we can efficiently move spatial information to channel.

[Yoo, Figure 2(a)-(b): network architecture with shared and lane marker-wise HRMs]

Still further, regarding limitation (B) above, Paek (Figures 1-2) discloses/teaches the following:

The feature extractor is responsible for encoding the point cloud raw data into an output feature map that is used by the detection head to make the final predictions. It is composed of two parts: the BEV encoder and the global feature correlator (GFC) backbone. Given a point cloud P = {p1, p2, ..., pn}, where pi ∈ R^(3+C) is a point in the 3D space with C additional features such as intensity and reflectivity, the feature extractor first encodes the raw point cloud data into a pseudo-BEV image of size C_BEV × H_BEV × W_BEV, where C_BEV is the number of feature channels, H_BEV is the number of rows, and W_BEV is the number of columns. After obtaining the pseudo-BEV image, the backbone then learns important features through the global feature correlator. This results in the final output feature map of size C_head × H_BEV × W_BEV. We utilize the same feature extractor as seen in previous state-of-the-art network [9] as our focus in this work is on the detection head with the two-stage row-wise formulation.

The row-wise detection head uses the final output feature map as an input and produces two predictions: the row-wise lane existence and the row-wise lane location probability. To do so, we leverage the fact that lane lines from a LiDAR scan have almost no shape distortion along the BEV map rows; thus, it is suitable to utilize shared MLPs. As shown in Fig. 2, the MLPs are shared along the rows of the feature map, that is, each row in the feature map is considered as an individual feature vector (colorized as purple in Fig. 2) to be processed by the same MLPs.

[Paek, Figure 1: feature extractor (BEV encoder and GFC backbone); Figure 2: row-wise detection head with MLPs shared across rows]

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the image-processing method and identification system of Urtasun in view of KIM, further incorporating additional controller programming instructions as taught by Yoo and Paek, to improve the accuracy of a 3D object detection task and improve the stability of the model.
Regarding claim 24, Urtasun discloses:

A vehicle (vehicle 102: Fig. 1 and [0051]), comprising: a steering system (steering system: [0040, 0062]); a light detection and ranging (LiDAR) imaging system (light detection and ranging {LIDAR} system: [0006, 0019, 0020, 0022, 0023]); a memory storing processor-readable code (please see the same limitations analysis in claims 1 and 10 above); and at least one processor coupled to the memory (please see the same limitations analysis in claims 1 and 10 above), to the LiDAR imaging system, and to the steering system, the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations including: extracting point features from light detection and ranging (LiDAR) data (please see the same limitations analysis in claims 1 and 10 above); partitioning the point features (please see the same limitations analysis in claims 1 and 10 above); performing BEV-feature pooling based on the partitioned point features (please see the same limitations analysis in claims 1 and 10 above); and determining lane-boundary heads based on the BEV-feature pooling (please see the same limitations analysis in claims 1 and 10 above).

But Urtasun does not explicitly and/or specifically meet the following limitations:

(A) wherein the determining comprises row-wise classification with at least one of: offset correction regression processing; or vertex-wise height regression processing; and

(B) a first classification for a first row of the plurality of rows, and a second classification for a second row of the plurality of rows, wherein each of the first classification and the second classification is performed with at least one of: offset correction regression processing; or vertex-wise height regression processing.

However, regarding limitation (A) above, KIM (Figs. 1, 3-4 and 6) discloses/teaches the following:

An apparatus for single-stage three-dimensional (3D) multi-object detection by using a LiDAR sensor to detect 3D multiple objects, comprising: a data input module configured to receive raw point cloud data from the LiDAR sensor; a BEV image generating module configured to generate bird's eye view (BEV) images from the raw point cloud data; a learning module configured to perform a deep learning algorithm-based learning task to extract a fine-grained feature image from the BEV images; and a localization module configured to perform a regression operation and a localization operation to find 3D candidate boxes and classes corresponding to the 3D candidate boxes for detecting 3D objects from the fine-grained feature image (KIM Abstract).

The data input module 110 receives raw point cloud data from a LiDAR sensor (KIM [0036]). The BEV image generation module 120 generates a BEV image from the raw point cloud data (KIM [0037]). The learning module 130 performs deep learning algorithm-based learning to extract a fine-grained or subdivided feature image from the BEV image (KIM [0038]). In at least one embodiment of the present disclosure, the learning module 130 performs Convolutional Neural Network (CNN)-based learning (KIM [0039]). The localization module 140 performs a regression operation and a localization operation to find, in the subdivided feature image, 3D candidate boxes and their corresponding classes for detecting a 3D object (KIM [0040]). The BEV image generation module 120 may generate a BEV image by projecting and discretizing the raw 3D point cloud data into 2D pseudo-images (KIM [0041]). The BEV image generation module 120 may encode the raw 3D point cloud data to generate four feature map images based on a height feature map, a density feature map, an intensity feature map, and a distance feature map (KIM [0042]). The learning process may employ a center regression, offset regression, orientation regression, Z-axis location regression, and size regression (KIM [0088]).

It is noted that KIM discloses a similar "3-Dimension Multi-Object Detecting Apparatus and Method for Autonomous Driving," like Urtasun above. Still further, regarding the motivation for combining elements under 35 U.S.C. 103, the Examiner respectfully notes that the above combination is a two-way linear superposition of prior art elements: one skilled in the art could incorporate the teachings of KIM into Urtasun, or conversely incorporate the teachings of Urtasun into KIM, and either way the result would meet the claimed limitations. Accordingly, one skilled in the art would have been motivated to incorporate the teachings of KIM into Urtasun to improve the accuracy of a 3D object detection task while maintaining a very fast inference speed, and also to minimize imbalance in the training set and improve the stability of the model.

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the image-processing method and identification system of Urtasun, incorporating additional controller programming instructions as taught by KIM, to improve the accuracy of a 3D object detection task while maintaining a very fast inference speed, and also to minimize imbalance in the training set and improve the stability of the model.

Further on, regarding limitation (B) above, Yoo (Figure 2) discloses/teaches the following:

In the second stage, we successively squeeze the horizontal dimension of the shared representation using HRMs without changing the vertical dimension. With this squeeze operation, we can obtain the row-wise representation in a more natural way. After running shared HRMs, we squeeze the remaining width of representation by lane marker-wise HRMs to make single vector representation for each row. We found that it is required to assign dedicated HRMs on each lane marker after the shared HRMs for increasing accuracy numbers, since each lane marker has different innate spatial and shape characteristics. For computational efficiency, however, only the first few HRMs are shared across lane markers, followed by lane marker-wise HRMs. With more shared layers we can save computational cost but each lane marker accuracy might be degraded. Specifically, in the skip connection, we add a horizontal average pooling layer with a 1×1 convolution to down-sample horizontal components. Although pooling operations let the deeper layers gather more spatial context (to improve classification) and reduce computational complexity, they still have the drawback of reducing the pixel precision. Therefore, to effectively keep and enhance the horizontal representation, inspired by the pixel shuffle layer of [32, 24], we propose to rearrange the elements of C × H × W input tensor to make a tensor of shape rC × H × W/r in the residual branch, which is somewhat a reverse operation of the original pixel shuffle block in [32], the so-called horizontal pixel unshuffle layer. By rearranging the representation, we can efficiently move spatial information to channel.

[Yoo, Figure 2(a)-(b): network architecture with shared and lane marker-wise HRMs]

Still further, regarding limitation (B) above, Paek (Figures 1-2) discloses/teaches the following:

The feature extractor is responsible for encoding the point cloud raw data into an output feature map that is used by the detection head to make the final predictions. It is composed of two parts: the BEV encoder and the global feature correlator (GFC) backbone. Given a point cloud P = {p1, p2, ..., pn}, where pi ∈ R^(3+C) is a point in the 3D space with C additional features such as intensity and reflectivity, the feature extractor first encodes the raw point cloud data into a pseudo-BEV image of size C_BEV × H_BEV × W_BEV, where C_BEV is the number of feature channels, H_BEV is the number of rows, and W_BEV is the number of columns. After obtaining the pseudo-BEV image, the backbone then learns important features through the global feature correlator. This results in the final output feature map of size C_head × H_BEV × W_BEV. We utilize the same feature extractor as seen in previous state-of-the-art network [9] as our focus in this work is on the detection head with the two-stage row-wise formulation.

The row-wise detection head uses the final output feature map as an input and produces two predictions: the row-wise lane existence and the row-wise lane location probability. To do so, we leverage the fact that lane lines from a LiDAR scan have almost no shape distortion along the BEV map rows; thus, it is suitable to utilize shared MLPs. As shown in Fig. 2, the MLPs are shared along the rows of the feature map, that is, each row in the feature map is considered as an individual feature vector (colorized as purple in Fig. 2) to be processed by the same MLPs.

[Paek, Figure 1: feature extractor (BEV encoder and GFC backbone); Figure 2: row-wise detection head with MLPs shared across rows]

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have modified the image-processing method and identification system of Urtasun in view of KIM, further incorporating additional controller programming instructions as taught by Yoo and Paek, to improve the accuracy of a 3D object detection task and improve the stability of the model.

Regarding claim 2, Urtasun as combined above discloses the method according to claim 1, and further discloses: wherein offset correction regression processing comprises regressing a planar distance of a lane vertex location from a center of a BEV grid cell (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).
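Claim 2 defines offset correction regression as regressing a planar distance of the lane vertex from the center of a BEV grid cell, which implies a simple decode step. The sketch below illustrates it; the cell size and names are assumptions.

```python
# Sketch of the offset-correction decode step recited in claim 2: a row-wise
# classifier picks a BEV grid cell per row, and a regression branch refines
# the lane-vertex location as a planar offset from that cell's center.
def decode_vertex(row: int, col: int, offset_xy, cell: float = 0.5):
    """Return the (x, y) lane-vertex location for one BEV row.

    (row, col): the grid cell chosen by row-wise classification.
    offset_xy: regressed planar offset (dx, dy) from the cell center.
    """
    cx = (col + 0.5) * cell          # cell-center x
    cy = (row + 0.5) * cell          # cell-center y
    return cx + offset_xy[0], cy + offset_xy[1]

# e.g. cell (12, 40) with a regressed offset of (-0.08 m, +0.17 m):
print(decode_vertex(12, 40, (-0.08, 0.17)))   # -> (20.17, 6.42)
```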
Regarding claim 3, Urtasun as combined above discloses the method according to claim 1, and further discloses: wherein partitioning of the point features comprises partitioning with a grid cell size that is larger than road boundary vertices in the point features (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 4, Urtasun as combined above discloses the method according to claim 1, and further discloses: wherein the vertex-wise height regression processing comprises vertex height regression for each row of the plurality of rows of the BEV grid, including the first row and the second row of the plurality of rows (Urtasun as combined above [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 5, Urtasun as combined above discloses the method according to claim 4, and further discloses: wherein the vertex-wise height regression processing comprises determining a third-dimension for each planar distance of a lane vertex location (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).
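Claims 4-5 (and their apparatus, medium, and vehicle counterparts) add vertex-wise height regression: a third coordinate regressed for each row's planar vertex, lifting the boundary to 3D. A minimal sketch of that per-row pairing, with assumed names:

```python
# Sketch of the vertex-wise height regression in claims 4-5: after row-wise
# classification and planar offset correction yield an (x, y) vertex per BEV
# row, a parallel regression head predicts a z value for each row's vertex.
def lift_boundary_to_3d(vertices_xy, heights_z):
    """Pair each row's planar vertex with its regressed height."""
    assert len(vertices_xy) == len(heights_z), "one height per row vertex"
    return [(x, y, z) for (x, y), z in zip(vertices_xy, heights_z)]

# Two consecutive BEV rows: planar vertices plus per-row regressed heights.
print(lift_boundary_to_3d([(20.17, 6.42), (20.31, 6.92)], [-0.12, -0.10]))
```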
Regarding claim 6, Urtasun as combined above discloses the method according to claim 1, and further discloses: extracting features from the point features that are partitioned, wherein the BEV-feature pooling is performed based on the features that are extracted (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 7, Urtasun as combined above discloses the method according to claim 1, and further discloses: wherein determining the lane-boundary heads comprises row-wise classification with both offset correction regression processing and vertex-wise height regression processing (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 8, Urtasun as combined above discloses the method according to claim 1, and further discloses: receiving the LiDAR data from a LiDAR imaging system of a vehicle, wherein the receiving the LiDAR data and the determining lane-boundary heads based on the LiDAR data are performed during operation of the vehicle (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 9, Urtasun as combined above discloses the method according to claim 6, and further discloses: assisting a driver in steering the vehicle based on the lane-boundary heads (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 11, Urtasun as combined above discloses the apparatus according to claim 10, and further discloses: wherein offset correction regression processing comprises regressing a planar distance of a lane vertex location from a center of a BEV grid cell (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 12, Urtasun as combined above discloses the apparatus according to claim 10, and further discloses: wherein partitioning of the point features comprises partitioning with a grid cell size that is larger than road boundary vertices in the point features (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 13, Urtasun as combined above discloses the apparatus according to claim 10, and further discloses: wherein the vertex-wise height regression processing comprises vertex height regression for each row of the plurality of rows of the BEV grid, including the first row and the second row of the plurality of rows (Urtasun as combined above [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 14, Urtasun as combined above discloses the apparatus according to claim 13, and further discloses: wherein the vertex-wise height regression processing comprises determining a third-dimension for each planar distance of a lane vertex location (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).
Regarding claim 20, Urtasun as combined above discloses the non-transitory computer-readable medium according to claim 19, and further discloses: wherein offset correction regression processing comprises regressing a planar distance of a lane vertex location from a center of a BEV grid cell (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 21, Urtasun as combined above discloses the non-transitory computer-readable medium according to claim 19, and further discloses: wherein partitioning of the point features comprises partitioning with a grid cell size that is larger than road boundary vertices in the point features (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 22, Urtasun as combined above discloses the non-transitory computer-readable medium according to claim 19, and further discloses: wherein the vertex-wise height regression processing comprises: vertex height regression for each row of the plurality of rows of the BEV grid, including the first row and the second row of the plurality of rows (Urtasun as combined above [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]); and determining a third dimension for each planar distance of a lane vertex location (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 23, Urtasun as combined above discloses the non-transitory computer-readable medium according to claim 19, and further discloses: wherein the operations further include extracting features from the point features that are partitioned, wherein the BEV-feature pooling is performed based on the features that are extracted (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 25, Urtasun as combined above discloses the vehicle according to claim 24, and further discloses: wherein offset correction regression processing comprises regressing a planar distance of a lane vertex location from a center of a BEV grid cell (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 26, Urtasun as combined above discloses the vehicle according to claim 24, and further discloses: wherein partitioning of the point features comprises partitioning with a grid cell size that is larger than road boundary vertices in the point features (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).
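Claims 22 and 27 recite pairing each planar vertex location with a regressed third dimension. Purely as an illustration of how the three head outputs from the earlier sketch could be decoded into 3D lane vertices, here is a minimal decoder; decode_lane, the per-boundary tensor layout, and the grid geometry are assumptions of this sketch, not the references' method.

```python
import torch

def decode_lane(logits: torch.Tensor, offsets: torch.Tensor,
                heights: torch.Tensor, cell_size: float = 0.5) -> torch.Tensor:
    """logits/offsets/heights: (num_cols, num_rows) maps for one lane boundary."""
    num_rows = logits.shape[1]
    cols = logits.argmax(dim=0)                    # winning cell per row (row-wise classification)
    rows = torch.arange(num_rows)
    dx = offsets[cols, rows]                       # offset correction within the cell, in cell units
    x = (cols.float() + 0.5 + dx) * cell_size      # planar x measured from the grid origin
    y = (rows.float() + 0.5) * cell_size           # each row's center line gives planar y
    z = heights[cols, rows]                        # regressed height supplies the third dimension
    return torch.stack([x, y, z], dim=1)           # (num_rows, 3) lane vertices
```

Each row thus yields one (x, y, z) vertex: the classified column gives the coarse planar location, the offset refines it within the cell, and the height regression attaches z.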
Regarding claim 27, Urtasun as combined above discloses the vehicle according to claim 24, and further discloses: wherein the vertex-wise height regression processing comprises: vertex height regression for each row of the plurality of rows of the BEV grid, including the first row and the second row of the plurality of rows (Urtasun as combined above [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]); and determining a third dimension for each planar distance of a lane vertex location (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 28, Urtasun as combined above discloses the vehicle according to claim 24, and further discloses: wherein the operations further include extracting features from the point features that are partitioned, wherein the BEV-feature pooling is performed based on the features that are extracted (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 29, Urtasun as combined above discloses the vehicle according to claim 24, and further discloses: wherein the LiDAR data is received from the LiDAR imaging system, and wherein the receiving the LiDAR data and the determining lane-boundary heads based on the LiDAR data are performed during operation of the vehicle (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Regarding claim 30, Urtasun as combined above discloses the vehicle according to claim 24, and further discloses: wherein the operations further include controlling the steering system based on the lane-boundary heads (Urtasun [0023, 0054, 0068-0070, 0072, 0075, 0077, 0085, 0087-0089, 0092, 0096, 0099, 0101-0103, 0111, 0138, 0150] and KIM [Abstract, 0036-0042, 0088]).

Pertinent Prior Art

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: "Lane_Structure_Fitting_via_Row-wise_Grid_Classification" (please see PTO-892), Liu et al.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ruben Picon-Feliciano, whose telephone number is (571) 272-4938. The examiner can normally be reached Monday-Thursday, 11:30 am-7:30 pm ET. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Lindsay M. Low, can be reached at (571) 272-1196. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RUBEN PICON-FELICIANO/
Examiner, Art Unit 3747

/GRANT MOUBRY/
Primary Examiner, Art Unit 3747

Prosecution Timeline

Nov 14, 2023: Application Filed
Apr 05, 2025: Non-Final Rejection (§103)
Jun 25, 2025: Response Filed
Oct 03, 2025: Final Rejection (§103)
Nov 24, 2025: Request for Continued Examination
Dec 04, 2025: Response after Non-Final Action
Feb 20, 2026: Non-Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology (5 most recent grants):

Patent 12601670: CONTROLLING A VISCOSITY OF FUEL IN A FUEL CONTROL SYSTEM WITH A VIBRATORY METER (granted Apr 14, 2026; 2y 5m to grant)
Patent 12594915: BRAKE FORCE DISTRIBUTION DEVICE FOR VEHICLE AND METHOD THEREOF (granted Apr 07, 2026; 2y 5m to grant)
Patent 12583384: SYSTEM AND METHOD FOR CONTROLLING A VEHICLE CONDITION CHECK LIGHT USING A DWL MODE (granted Mar 24, 2026; 2y 5m to grant)
Patent 12583423: METHOD FOR DRIVE CONTROL (granted Mar 24, 2026; 2y 5m to grant)
Patent 12576901: SYSTEM AND METHOD FOR HAPTIC CALIBRATION (granted Mar 17, 2026; 2y 5m to grant)

Study what changed in these cases to get past this examiner.


Prosecution Projections

3-4
Expected OA Rounds
68%
Grant Probability
82%
With Interview (+13.3%)
3y 1m
Median Time to Grant
High
PTA Risk
Based on 708 resolved cases by this examiner. Grant probability derived from career allow rate.
