Prosecution Insights
Last updated: April 19, 2026
Application No. 18/607,036

MULTIMODAL SENSOR AGNOSTIC LOCALIZATION USING ONE OR MORE ADAPTIVE FEATURE GRAPHS

Status: Non-Final Office Action (§103)
Filed: Mar 15, 2024
Examiner: BARNES JR, CARL E
Art Unit: 2178
Tech Center: 2100 — Computer Architecture & Software
Assignee: Qualcomm Incorporated
OA Round: 1 (Non-Final)
Grant Probability: 32% (At Risk)
Expected OA Rounds: 1-2
Estimated Time to Grant: 4y 4m
Grant Probability With Interview: 57%

Examiner Intelligence

Career Allow Rate: 32% (65 granted / 202 resolved; -22.8% vs Tech Center average)
Interview Lift: strong, +25.2% across resolved cases with an interview
Typical Timeline: 4y 4m average prosecution; 32 applications currently pending
Career History: 234 total applications across all art units

Statute-Specific Performance

§101: 14.3% (-25.7% vs TC avg)
§103: 62.6% (+22.6% vs TC avg)
§102: 9.0% (-31.0% vs TC avg)
§112: 8.7% (-31.3% vs TC avg)
Tech Center average figures are estimates • Based on career data from 202 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/13/2025 was filed. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Examiner note: A query scene graph is a query graph that represents relationships of objects in a scene, which can be matched from images (bird's eye view). A query graph is a data structure. The claimed normalization is read as bringing the bird's eye view image features into a calibration, scale, and/or coordinate system.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Park (US 20240020953 A1, File Date: Jul. 17, 2023) in view of Sangwoong Yoon (Image-to-Image Retrieval by Learning Similarity between Scene Graphs, Dec 2020, hereinafter "Yoon").

Regarding independent claim 1, Park teaches:
An apparatus for processing image data, the apparatus comprising: (Park − [0027] computer device 800 of Fig. 8; [0034] The sensor data 120 may include image data representing an image(s).)
a sensor; (Park − [0033-0034] In embodiments, any number of sensors may be used to incorporate multiple fields of view (e.g., the fields of view of the long-range cameras 798, the forward-facing stereo camera 768, and/or the forward facing wide-view camera 770 of FIG. 7B) and/or sensory fields (e.g., of a LIDAR sensor 764, a RADAR sensor 760, etc.).)
at least one memory; (Park − [0027] computer device 800 of Fig. 8; [0034] The sensor data 120 may include image data representing an image(s). For instance, various functions may be carried out by a processor executing instructions stored in memory.)
and at least one processor coupled to the at least one memory, the at least one processor being configured to: (Park − [0027] computer device 800 of Fig. 8; [0034] The sensor data 120 may include image data representing an image(s). For instance, various functions may be carried out by a processor executing instructions stored in memory.)
obtain a first set of image features from one or more images of an environment captured by a camera; (Park − [0029] The image encoder 102 (or more generally, sensor data encoder 102) may be configured to receive sensor data corresponding to a plurality of sensors and/or views (e.g., perspective views) of an environment, such as sensor data 120. [0031] The sensor data 120 may be generated using one or more sensors; [0032] wide-view camera(s) 770 (e.g., fisheye cameras), infrared camera(s)); receiving data from camera 770.
transform the first set of image features to generate a first set of bird's eye view (BEV) image features; (Park − Fig. 2A, Fig. 5, [0040] FIG. 2A is a perspective view 200A illustrating examples of a set of transformed features 224 corresponding to a field of view of a sensor scattered onto a Bird's-Eye-View (BEV) plane 210; [0064] The method 500, at block B502, includes transforming first feature values corresponding to a first view into one or more first BEV feature values.)
obtain a second set of features, the second set of features generated based on a representation of the environment obtained using the sensor having a different sensor type than the camera; (Park − [0029] The image encoder 102 (or more generally, sensor data encoder 102) may be configured to receive sensor data corresponding to a plurality of sensors and/or views (e.g., perspective views) of an environment, such as sensor data 120. [0031] The sensor data 120 may be generated using one or more sensors; [0032] LIDAR sensor(s) 764); receiving data from LIDAR sensor(s) 764.
transform the second set of features to generate a second set of BEV features; (Park − Fig. 2A, Fig. 5, [0040] FIG. 2A is a perspective view 200A illustrating examples of a set of transformed features 224 corresponding to a field of view of a sensor scattered onto a Bird's-Eye-View (BEV) plane 210; [0065] At block B504, the method 500 includes transforming second feature values corresponding to a second view into one or more second BEV feature values.)
normalize the first set of BEV image features based on camera configuration information associated with the one or more images; (Park − [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes), and converts the projections to polar coordinates, resulting in a list of BEV points)
normalize the second set of BEV features based on sensor configuration information of the sensor; (Park − [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes), and converts the projections to polar coordinates, resulting in a list of BEV points; Examiner note: sensor calibration can be applied to other BEV features such as a LIDAR sensor)
Park does not explicitly teach: generate a query graph. However, Yoon teaches:
and generate a query graph based on the normalized first set of BEV image features and the normalized second set of BEV features. (Yoon − [pdf page 2-3] Given a query image, IRSGS first generates a query scene graph from the image and then retrieves images with a scene graph highly similar to the query scene graph. The similarity between scene graphs is computed through a graph neural network trained)
[Two greyscale images (media_image1.png, media_image2.png) reproduced in the Office Action are omitted here.]
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Park and Yoon, as each invention is in the same field of computer vision for identifying, detecting, and tracking objects. Adding the teaching of Yoon provides Park with an improvement in the transformation process of features by using the query scene graph approach. One of ordinary skill in the art would have been motivated to make these modifications to improve labeling imagery data using a neural network for identifying and detecting complex objects.
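For readers mapping the claim 1 language onto an implementation, the following Python/NumPy sketch illustrates the general shape of such a pipeline: features tied to camera pixels and features tied to ranging-sensor returns are each scattered onto a common bird's eye view (BEV) grid and then normalized to a shared scale. This is a hedged illustration only, not Park's disclosed system or the application's implementation; every name in it (scatter_to_bev, normalize, the grid parameters) is hypothetical, and the random inputs stand in for features already calibrated into a shared ego frame.

    import numpy as np

    def scatter_to_bev(points_xy, features, grid=64, cell_m=0.5):
        # Scatter per-point features onto a square BEV grid centered on the
        # ego frame (mean-pooled per cell). points_xy: (N, 2) in meters,
        # already expressed in a shared ego frame; features: (N, C).
        c = features.shape[1]
        bev = np.zeros((grid, grid, c))
        cnt = np.zeros((grid, grid, 1))
        half = grid * cell_m / 2.0
        idx = np.floor((points_xy + half) / cell_m).astype(int)
        ok = np.all((idx >= 0) & (idx < grid), axis=1)
        for (i, j), f in zip(idx[ok], features[ok]):
            bev[i, j] += f
            cnt[i, j] += 1
        return bev / np.maximum(cnt, 1)

    def normalize(bev):
        # Bring a BEV feature map to zero mean / unit variance so features
        # from different sensor types live on a comparable scale.
        return (bev - bev.mean()) / (bev.std() + 1e-6)

    rng = np.random.default_rng(0)
    # Stand-ins for calibrated outputs: camera features lifted into the ego
    # frame using camera calibration, and LIDAR features transformed using
    # the LIDAR mounting calibration (both hypothetical here).
    cam_xy, cam_feat = rng.uniform(-15, 15, (500, 2)), rng.normal(size=(500, 32))
    lidar_xy, lidar_feat = rng.uniform(-15, 15, (800, 2)), rng.normal(size=(800, 32))

    cam_bev = normalize(scatter_to_bev(cam_xy, cam_feat))
    lidar_bev = normalize(scatter_to_bev(lidar_xy, lidar_feat))
    fused_bev = np.concatenate([cam_bev, lidar_bev], axis=-1)  # shared BEV map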
Regarding dependent claim 2, which depends on claim 1, Park teaches: wherein the sensor comprises one of a light detection and ranging (LIDAR) sensor, a radar sensor, or a sonar sensor. (Park − FIGS. 7A-7C, the sensor data 120 may include data generated by or using, without limitation, global navigation satellite systems (GNSS) sensor(s) 758 (e.g., Global Positioning System sensor(s), differential GPS (DGPS), etc.), RADAR sensor(s) 760, ultrasonic sensor(s) 762, LIDAR sensor(s) 764, inertial measurement unit (IMU) sensor(s) 766 (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.), microphone(s) 796, stereo camera(s) 768, wide-view camera(s) 770 (e.g., fisheye cameras), infrared camera(s) 772, surround camera(s) 774 (e.g., 360 degree cameras), long-range and/or mid-range camera(s) 798, speed sensor(s) 744 (e.g., for measuring the speed of the vehicle 700 and/or distance traveled), and/or other sensor types.)

Regarding dependent claim 3, which depends on claim 1, Park teaches: wherein the camera configuration information comprises calibration information for the camera, and wherein the sensor configuration information comprises calibration information for the sensor. (Park – [0034] at least some of the sensor data 120 may undergo pre-processing (e.g., noise balancing, demosaicing, scaling, cropping, augmentation, white balancing, tone curve adjustment, etc., such as using a sensor data pre-processor (not shown)). [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes))

Regarding dependent claim 4, which depends on claim 3, Park teaches: wherein the calibration information for the camera includes information associated with at least one of a field of view (FOV) of the camera, principal point of the camera, and lens distortion information. (Park – [0033] any number of sensors may be used to incorporate multiple fields of view (e.g., the fields of view of the long-range cameras 798, [0034] at least some of the sensor data 120 may undergo pre-processing (e.g., noise balancing, demosaicing, scaling, cropping, augmentation, white balancing, tone curve adjustment, etc., such as using a sensor data pre-processor (not shown)). [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes))

Regarding dependent claim 5, which depends on claim 3, Park teaches: wherein the calibration information for the sensor includes information associated with at least one of a mounting height, tilt angle, FOV, and range of the sensor. (Park – [0033] any number of sensors may be used to incorporate multiple fields of view (e.g., the fields of view of the long-range cameras 798, [0034] at least some of the sensor data 120 may undergo pre-processing (e.g., noise balancing, demosaicing, scaling, cropping, augmentation, white balancing, tone curve adjustment, etc., such as using a sensor data pre-processor (not shown)). [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes) [0150] Front-mounted LIDAR sensor(s) 764 may be configured for a horizontal field of view between 45 degrees and 135 degrees.)

Regarding dependent claim 6, which depends on claim 3, Park teaches: wherein at least one of the calibration information for the camera or calibration information for the sensor is refined over time. (Park – [0033-0034] [0045] [0150] The real-time ray-tracing hardware accelerator may be used to quickly and efficiently determine the positions and extents of objects (e.g., within a world model), to generate real-time visualization simulations, for RADAR signal interpretation, for sound propagation synthesis and/or analysis, for simulation of SONAR systems. Examiner Note: Ray-tracing is a form of calibration information for the camera, used in real time to determine the positions and extents of objects.)

Regarding dependent claim 7, which depends on claim 3, Park teaches: wherein the at least one processor is configured to adapt the camera configuration information or sensor configuration information based on estimates of the environment. (Park – [0033-0034] [0045] [0051] Non-limiting examples of perception tasks that may be performed using the fused features 126 include one or more of object detection,… object orientation estimation, [0150] The real-time ray-tracing hardware accelerator may be used to quickly and efficiently determine the positions and extents of objects (e.g., within a world model), to generate real-time visualization simulations, for RADAR signal interpretation, for sound propagation synthesis and/or analysis, for simulation of SONAR systems. Examiner Note: object orientation estimation.)

Regarding dependent claim 8, which depends on claim 1, Park teaches: wherein at least one of the camera configuration information or sensor configuration information is determined by a machine learning model. (Park – [0037] In at least one embodiment, the image encoder 102 may be implemented using one or more machine learning models (MLMs). For example and without limitation, any of the various MLMs described herein may include one or more of any type(s) of machine learning model(s), such as a machine learning model using linear regression,… generative, etc. neural networks), and/or other types of machine learning model.)

Regarding dependent claim 9, which depends on claim 8, Park teaches: wherein data obtained by the sensor is used by the machine learning model to determine the camera configuration information. (Park – [0035] Each camera and/or view may provide one or more images for input to the image encoder 102. [0036] As described herein, the image encoder(s) 102 (e.g., one or more 2D image encoders) may be configured to encode, using the sensor data 120, the image features 122 (e.g., image feature maps). [0037] In at least one embodiment, the image encoder 102 may be implemented using one or more machine learning models (MLMs).)
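Claims 3 through 5 above enumerate what the camera and sensor calibration information contains. Purely as a hedged sketch of how such configuration records might be organized, the Python snippet below groups the fields named in the claim language; the structures and example values are hypothetical, not from Park or the application.

    from dataclasses import dataclass

    @dataclass
    class CameraCalib:
        # Fields named in claim 4 for the camera calibration information.
        fov_deg: float          # field of view
        principal_point: tuple  # (cx, cy) in pixels
        lens_distortion: tuple  # e.g., radial distortion coefficients

    @dataclass
    class RangeSensorCalib:
        # Fields named in claim 5 for the ranging-sensor calibration information.
        mounting_height_m: float
        tilt_angle_deg: float
        fov_deg: float
        range_m: float

    # Placeholder example values only.
    cam_calib = CameraCalib(fov_deg=120.0, principal_point=(960.0, 540.0),
                            lens_distortion=(-0.32, 0.11))
    lidar_calib = RangeSensorCalib(mounting_height_m=1.8, tilt_angle_deg=0.0,
                                   fov_deg=90.0, range_m=120.0)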
Regarding dependent claim 10, which depends on claim 1, Park teaches:
wherein the normalized first set of BEV image features and the normalized second set of BEV features comprise a normalized BEV feature map, (Park − [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes), and converts the projections to polar coordinates, resulting in a list of BEV points)
and wherein the at least one processor is configured to: divide the normalized BEV feature map into a grid of cells; (Park − [0041] the BEV plane into regions and the transformed features 124 may be associated with one or more particular regions of the BEV plane. For example, the BEV plane 210 is shown as being discretized into a grid. [0042] As indicated in FIG. 2A, transformed features in the set of transformed features 224 are associated with (e.g., assigned to) particular grid cells of the BEV plane 210. In at least one embodiment, the perspective transformer 104 performs assignments of transformed features to grid cells for the perspective transformations based at least on the geometric relationship between row and column positions in an image plane 240 corresponding to an image feature map of the image features 122 and radial and angular positions in the BEV plane 210.)
wherein features of the normalized BEV feature map in a cell of the grid of cells are aggregated (Park − [0046] The feature fuser 106 may aggregate into the global BEV feature map)
Park does not explicitly teach: generate a BEV feature graph. However, Yoon teaches:
and generate a BEV feature graph based on the divided normalized BEV feature map, (Yoon − [pdf page 2-3] Given a query image, IRSGS first generates a query scene graph from the image and then retrieves images with a scene graph highly similar to the query scene graph. The similarity between scene graphs is computed through a graph neural network trained)
into a node of the BEV feature graph. (Yoon − [pdf page 3] All objects, attributes, and relations are treated as nodes)
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Park and Yoon, as each invention is in the same field of computer vision for identifying, detecting, and tracking objects. Adding the teaching of Yoon provides Park with an improvement in the transformation process of features by using the query scene graph approach. One of ordinary skill in the art would have been motivated to make these modifications to improve labeling imagery data using a neural network for identifying and detecting complex objects.
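Claim 10 recites dividing the normalized BEV feature map into a grid of cells and aggregating the features in each cell into a node of a BEV feature graph. The minimal Python sketch below illustrates that step, assuming mean-pooling per cell and a simple 4-neighbor edge scheme; both choices are assumptions for illustration, and nothing here is quoted from Park or Yoon.

    import numpy as np

    def bev_map_to_graph(bev, cell=8):
        # Divide a (H, W, C) normalized BEV feature map into a grid of cells
        # and aggregate the features inside each cell into one graph node.
        # Returns node features (num_cells, C) and edges connecting
        # neighboring cells (4-neighborhood; the edge scheme is an assumption).
        h, w, c = bev.shape
        gh, gw = h // cell, w // cell
        nodes = bev[:gh * cell, :gw * cell].reshape(gh, cell, gw, cell, c).mean(axis=(1, 3))
        nodes = nodes.reshape(gh * gw, c)
        edges = []
        for r in range(gh):
            for col in range(gw):
                n = r * gw + col
                if col + 1 < gw:
                    edges.append((n, n + 1))
                if r + 1 < gh:
                    edges.append((n, n + gw))
        return nodes, np.array(edges)

    bev = np.random.default_rng(1).normal(size=(64, 64, 64))  # dummy BEV map
    nodes, edges = bev_map_to_graph(bev)
    print(nodes.shape, edges.shape)  # 64 nodes x 64 channels, (num_edges, 2)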
Regarding dependent claim 11, which depends on claim 10, Park does not explicitly teach: to generate the query graph. However, Yoon teaches:
wherein, to generate the query graph, the at least one processor is configured to aggregate the BEV feature graph with another BEV feature graph within a time window. (Yoon − [pdf page 6] Graph Matching Networks (GMN): Final node representations are aggregated by summation, resulting in a 128-dimensional vector which is then fed to a multilayer perceptron to produce final scalar output.)
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Park and Yoon, as each invention is in the same field of computer vision for identifying, detecting, and tracking objects. Adding the teaching of Yoon provides Park with an improvement in the transformation process of features by using the query scene graph approach. One of ordinary skill in the art would have been motivated to make these modifications to improve labeling imagery data using a neural network for identifying and detecting complex objects.

Regarding dependent claim 12, which depends on claim 11, Park does not explicitly teach: generate a query graph feature. However, Yoon teaches:
wherein the at least one processor is configured to generate a query graph feature descriptor based on features of the query graph. (Yoon − [pdf page 6] Graph Matching Networks (GMN): Final node representations are aggregated by summation, resulting in a 128-dimensional vector which is then fed to a multilayer perceptron to produce final scalar output. The resulting 128-dimensional vector is the graph feature descriptor.)
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Park and Yoon, as each invention is in the same field of computer vision for identifying, detecting, and tracking objects. Adding the teaching of Yoon provides Park with an improvement in the transformation process of features by using the query scene graph approach. One of ordinary skill in the art would have been motivated to make these modifications to improve labeling imagery data using a neural network for identifying and detecting complex objects.

Regarding dependent claim 13, which depends on claim 12, Park does not explicitly teach: generate a query graph feature. However, Yoon teaches:
wherein the query graph feature descriptor is generated by a graph neural network. (Yoon − [pdf page 6] Graph Matching Networks (GMN): Final node representations are aggregated by summation, resulting in a 128-dimensional vector which is then fed to a multilayer perceptron to produce final scalar output. The Graph Matching Network is a graph neural network.)
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Park and Yoon, as each invention is in the same field of computer vision for identifying, detecting, and tracking objects. Adding the teaching of Yoon provides Park with an improvement in the transformation process of features by using the query scene graph approach. One of ordinary skill in the art would have been motivated to make these modifications to improve labeling imagery data using a neural network for identifying and detecting complex objects.

Regarding dependent claim 14, which depends on claim 12, Park does not explicitly teach: compare the query graph feature descriptor to a scene graph feature descriptor. However, Yoon teaches:
wherein the at least one processor is configured to compare the query graph feature descriptor to a scene graph feature descriptor to identify a portion of a scene graph that matches the query graph. (Yoon − [pdf page 2-3] Given a query image, IRSGS first generates a query scene graph from the image and then retrieves images with a scene graph highly similar to the query scene graph. The similarity between scene graphs is computed through a graph neural network trained)
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Park and Yoon, as each invention is in the same field of computer vision for identifying, detecting, and tracking objects. Adding the teaching of Yoon provides Park with an improvement in the transformation process of features by using the query scene graph approach. One of ordinary skill in the art would have been motivated to make these modifications to improve labeling imagery data using a neural network for identifying and detecting complex objects.
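Claims 12 through 14 turn the query graph into a fixed-length feature descriptor using a graph neural network and compare it against stored scene graph descriptors. The sketch below uses a toy stand-in for the network (a random projection followed by summation, echoing the summation-based aggregation the Office Action quotes from Yoon) and cosine similarity in place of Yoon's learned scalar score; everything here is an illustrative assumption, not Yoon's trained model.

    import numpy as np

    def graph_descriptor(node_feats, dim=128, seed=0):
        # Toy stand-in for a graph neural network: project node features and
        # aggregate by summation into one fixed-length graph descriptor.
        rng = np.random.default_rng(seed)
        w = rng.normal(size=(node_feats.shape[1], dim)) / np.sqrt(node_feats.shape[1])
        return np.tanh(node_feats @ w).sum(axis=0)

    def cosine(a, b):
        # Simple similarity measure between two descriptors.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    rng = np.random.default_rng(2)
    query_nodes = rng.normal(size=(64, 64))    # query graph node features
    scene_nodes = rng.normal(size=(200, 64))   # stored scene graph node features

    q = graph_descriptor(query_nodes)
    s = graph_descriptor(scene_nodes)
    print("similarity:", cosine(q, s))         # higher score = better match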
Regarding dependent claim 15, which depends on claim 1, Park teaches: wherein the apparatus further includes the camera and the sensor. (Park − FIGS. 7A-7C, the sensor data 120 may include data generated by or using, without limitation, global navigation satellite systems (GNSS) sensor(s) 758 (e.g., Global Positioning System sensor(s), differential GPS (DGPS), etc.), RADAR sensor(s) 760, ultrasonic sensor(s) 762, LIDAR sensor(s) 764, inertial measurement unit (IMU) sensor(s) 766 (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.), microphone(s) 796, stereo camera(s) 768, wide-view camera(s) 770 (e.g., fisheye cameras), infrared camera(s) 772, surround camera(s) 774 (e.g., 360 degree cameras), long-range and/or mid-range camera(s) 798, speed sensor(s) 744 (e.g., for measuring the speed of the vehicle 700 and/or distance traveled), and/or other sensor types.)

Regarding independent claim 16, claim 16 has similar/same technical features/limitations as claim 1 above and is rejected under the same rationale.

Regarding dependent claim 17, which depends on claim 16, Park teaches: wherein the sensor comprises one of a light detection and ranging (LIDAR) sensor, a radar sensor, or a sonar sensor. (Park − FIGS. 7A-7C, the sensor data 120 may include data generated by or using, without limitation, global navigation satellite systems (GNSS) sensor(s) 758 (e.g., Global Positioning System sensor(s), differential GPS (DGPS), etc.), RADAR sensor(s) 760, ultrasonic sensor(s) 762, LIDAR sensor(s) 764, inertial measurement unit (IMU) sensor(s) 766 (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.), microphone(s) 796, stereo camera(s) 768, wide-view camera(s) 770 (e.g., fisheye cameras), infrared camera(s) 772, surround camera(s) 774 (e.g., 360 degree cameras), long-range and/or mid-range camera(s) 798, speed sensor(s) 744 (e.g., for measuring the speed of the vehicle 700 and/or distance traveled), and/or other sensor types.)

Regarding dependent claim 18, which depends on claim 16, Park teaches:
wherein: the camera configuration information comprises calibration information for the camera, (Park – [0034] at least some of the sensor data 120 may undergo pre-processing (e.g., noise balancing, demosaicing, scaling, cropping, augmentation, white balancing, tone curve adjustment, etc., such as using a sensor data pre-processor (not shown)). [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes))
the calibration information for the camera including information associated with at least one of a field of view (FOV) of the camera, principal point of the camera, and lens distortion information; (Park – [0033] any number of sensors may be used to incorporate multiple fields of view (e.g., the fields of view of the long-range cameras 798, [0034] at least some of the sensor data 120 may undergo pre-processing (e.g., noise balancing, demosaicing, scaling, cropping, augmentation, white balancing, tone curve adjustment, etc., such as using a sensor data pre-processor (not shown)). [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes))
and the sensor configuration information comprises calibration information for the sensor, the calibration information for the sensor including information associated with at least one of a mounting height, tilt angle, FOV, and range of the sensor. (Park – [0033] any number of sensors may be used to incorporate multiple fields of view (e.g., the fields of view of the long-range cameras 798, [0034] at least some of the sensor data 120 may undergo pre-processing (e.g., noise balancing, demosaicing, scaling, cropping, augmentation, white balancing, tone curve adjustment, etc., such as using a sensor data pre-processor (not shown)). [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes) [0150] Front-mounted LIDAR sensor(s) 764 may be configured for a horizontal field of view between 45 degrees and 135 degrees.)

Regarding dependent claim 19, which depends on claim 16, Park teaches:
wherein the normalized first set of BEV image features and the normalized second set of BEV features comprise a normalized BEV feature map, (Park − [0045] the BEV plane 210 using sensor calibrations 140 (corresponding to camera intrinsic attributes), and converts the projections to polar coordinates, resulting in a list of BEV points)
and further comprising: dividing the normalized BEV feature map into a grid of cells; (Park − [0041] the BEV plane into regions and the transformed features 124 may be associated with one or more particular regions of the BEV plane. For example, the BEV plane 210 is shown as being discretized into a grid. [0042] As indicated in FIG. 2A, transformed features in the set of transformed features 224 are associated with (e.g., assigned to) particular grid cells of the BEV plane 210. In at least one embodiment, the perspective transformer 104 performs assignments of transformed features to grid cells for the perspective transformations based at least on the geometric relationship between row and column positions in an image plane 240 corresponding to an image feature map of the image features 122 and radial and angular positions in the BEV plane 210.)
wherein features of the normalized BEV feature map in a cell of the grid of cells are aggregated (Park − [0046] The feature fuser 106 may aggregate into the global BEV feature map)
Park does not explicitly teach: and generating a BEV feature graph. However, Yoon teaches:
and generating a BEV feature graph based on the divided normalized BEV feature map, (Yoon − [pdf page 2-3] Given a query image, IRSGS first generates a query scene graph from the image and then retrieves images with a scene graph highly similar to the query scene graph. The similarity between scene graphs is computed through a graph neural network trained)
into a node of the BEV feature graph. (Yoon − [pdf page 3] All objects, attributes, and relations are treated as nodes)
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Park and Yoon, as each invention is in the same field of computer vision for identifying, detecting, and tracking objects. Adding the teaching of Yoon provides Park with an improvement in the transformation process of features by using the query scene graph approach. One of ordinary skill in the art would have been motivated to make these modifications to improve labeling imagery data using a neural network for identifying and detecting complex objects.

Regarding dependent claim 20, which depends on claim 19, Park does not explicitly teach: wherein generating the query graph. However, Yoon teaches:
wherein generating the query graph comprises aggregating the BEV feature graph with another BEV feature graph within a time window. (Yoon − [pdf page 6] Graph Matching Networks (GMN): Final node representations are aggregated by summation, resulting in a 128-dimensional vector which is then fed to a multilayer perceptron to produce final scalar output.)
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Park and Yoon, as each invention is in the same field of computer vision for identifying, detecting, and tracking objects. Adding the teaching of Yoon provides Park with an improvement in the transformation process of features by using the query scene graph approach. One of ordinary skill in the art would have been motivated to make these modifications to improve labeling imagery data using a neural network for identifying and detecting complex objects.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Z. Li, BEVFormer: Learning Bird's-Eye-View from multi-camera images.
S. Sharma, BEVSeg2GTA: Vehicle Segmentation and Graph Neural Networks.
L. Saini, Graph Query Networks for Object Detection with Automotive Radars.
Y. Zhang, GraphAD: Interaction Scene Graph for End-to-End Autonomous Driving.
Z. Song, GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection.
D. Wu, HV-BEV: Decoupling Horizontal and Vertical Feature Sampling for Multi-View 3D Object Detection.
S. Mohapatra, LiDAR-BEVMTN: Real-time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving.
D. Unger, Multi-camera Bird's Eye View Perception for Autonomous Driving.
Z. Qi, OCBEV: Object-Centric BEV Transformation for Multi-View 3D Object Detection.
D. KUM, US 20250139986 A1, Perceiving Traffic Road Environment based on Graph Representation for Autonomous Driving.
Y. Owechko, US 7672911 B2, Graph-based Object Group Recognition in 3D space.
Lakshmi Narayanan, US 20200086879 A1, Scene Classification Prediction.
N. Smolyanskiy, US 20210150230 A1, Multi-view Deep Neural Network for LiDAR Perception.
H. Deng, US 20210241026 A1, Detection using Deep Fusion of camera, radar and LiDAR point cloud.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARL E BARNES JR, whose telephone number is (571) 270-3395. The examiner can normally be reached Monday-Friday, 9am-6pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Stephen Hong, can be reached at (571) 272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CARL E BARNES JR/
Examiner, Art Unit 2178

/STEPHEN S HONG/
Supervisory Patent Examiner, Art Unit 2178

Prosecution Timeline

Mar 15, 2024: Application Filed
Mar 17, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12584932: SLIDE IMAGING APPARATUS AND A METHOD FOR IMAGING A SLIDE (granted Mar 24, 2026; 2y 5m to grant)
Patent 12541640: COMPUTING DEVICE FOR MULTIPLE CELL LINKING (granted Feb 03, 2026; 2y 5m to grant)
Patent 12536464: SYSTEM FOR CONSTRUCTING EFFECTIVE MACHINE-LEARNING PIPELINES WITH OPTIMIZED OUTCOMES (granted Jan 27, 2026; 2y 5m to grant)
Patent 12530765: SYSTEMS AND METHODS FOR CALCIUM-FREE COMPUTED TOMOGRAPHY ANGIOGRAPHY (granted Jan 20, 2026; 2y 5m to grant)
Patent 12530523: METHOD, APPARATUS, SYSTEM, AND COMPUTER PROGRAM FOR CORRECTING TABLE COORDINATE INFORMATION (granted Jan 20, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 32%
Grant Probability With Interview: 57% (+25.2%)
Median Time to Grant: 4y 4m
PTA Risk: Low
Based on 202 resolved cases by this examiner. Grant probability derived from career allow rate.
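As a sanity check on these projections, the with-interview figure appears to be the base grant probability plus the examiner's interview lift: 32% + 25.2 points = 57.2%, shown rounded to 57%.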
