Prosecution Insights
Last updated: April 19, 2026
Application No. 18/332,394

ONLINE ADAPTIVE MULTI-SENSOR FUSION

Status: Non-Final OA (§103)
Filed: Jun 09, 2023
Examiner: POTTS, RYAN PATRICK
Art Unit: 2672
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 1 (Non-Final)

Grant Probability: 80% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 2m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 80% (grants above average): 189 granted / 235 resolved, +18.4% vs TC avg
Interview Lift: +36.8% (strong), based on resolved cases with vs. without interview
Typical Timeline: 3y 2m avg prosecution; 29 applications currently pending
Career History: 264 total applications across all art units

Statute-Specific Performance

§101: 9.8% (-30.2% vs TC avg)
§103: 39.2% (-0.8% vs TC avg)
§102: 20.6% (-19.4% vs TC avg)
§112: 27.9% (-12.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 235 resolved cases.

Office Action

§103
DETAILED ACTION Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Specification The abstract of the disclosure is objected to because “The disclosure provides” and “Other aspects and features are also claimed and described” can be implied. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. The language of the abstract should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc. See MPEP 608.01(b). Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. Claims 1, 2, 4, 6, 9, 10, 12, 14, 17-19, 22-26, 29, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation (published 26 May 2022) to Liu et al. (hereinafter “Liu”) in view of U.S. Pat. App. No. 11,537,819 to Das et al. (hereinafter “Das”). Regarding claim 1, Liu teaches a method for multi-sensor fusion, comprising: receiving first information indicative of a first set of BEV features of image data (Liu, Figure 2, receiving image data indicates corresponding camera BEV features will be derived.) 
captured by an image sensor (Liu, Figure 2, “Camera Feat. (in BEV)”; The camera used to acquire the RGB images is an image sensor.); receiving second information indicative of a second set of BEV features of first non-image sensor data (Liu, Figure 2, receiving LiDAR point cloud data indicates corresponding LiDAR BEV features will be derived.) captured by a first non-image sensor (Liu, Figure 2, “LiDAR Feat. (in BEV)”; The LiDAR sensor used to acquire the LiDAR point cloud is a first non-image sensor.); and determining fused data (Liu, Figure 2, “Fused BEV Features”) that combines the image data and the first non-image sensor data based on the first information and the second information (Liu, Figure 1, “BEVFusion unifies camera and LiDAR features in a shared BEV space instead of mapping one modality to the other” (original emphasis)), but does not explicitly teach that which is taught by Das. Das teaches first information (Das, col. 16, ll. 53-61, “cameras disposed at various locations about the exterior and/or interior of the vehicle 402”), second information (Das, col. 16, ll. 44-61, “sensor(s) 406 may include lidar sensors, radar sensors … etc. …”) and determining third information indicative of differences (Updating the covariance matrix is indicative of differences between sensor features before and after the covariance is updated.) between features of training data and a first set of features, a second set of features, or both (Das, col. 3, l. 63 – col. 4, l. 5, “the covariance model may operate based on the output of a plurality of perception pipelines and the observation covariance matrix values may be utilized by a Kalman filter in fusing the observations that are associated with a track into values for the track, which may then be output to the planning component and/or prediction component.”; col. 5, ll. 8-22, “in examples that utilize multiple types of sensors, the covariance model may provide observation covariance values trained for an associated sensor type or perception pipeline. For example, the covariance model may output different observation covariance values for observation data based on lidar sensor data that for the same values of observation data based on image sensor data or radar sensor data.”; col. 20, ll. 36-43, “The Kalman filter 436 may operate to fuse the observation data of new object detections with track data of existing tracks based at least in part on an observation covariance matrix populated with observation covariance values output by the covariance model 434.”), the features of the training data comprise a third set of features associated with the image sensor and a fourth set of features associated with the first non-image sensor (Das, FIG. 1, “Training Data 106”; col. 5, ll. 8-22, “in examples that utilize multiple types of sensors, the covariance model may provide observation covariance values trained for an associated sensor type or perception pipeline. For example, the covariance model may output different observation covariance values for observation data based on lidar sensor data that for the same values of observation data based on image sensor data or radar sensor data.”; col. 7, l. 66 – col. 8, l. 3, “FIG. 1 illustrates a pictorial flow diagram of an example process for training a covariance model and using the covariance model to generate an observation covariance matrix that may be used by a Kalman filter”; The “covariance model 120” is trained on LiDAR (fourth set of features), Vision (third set of features), and RADAR features.).
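
For orientation, the fusion mechanism the rejection attributes to Das is a Kalman measurement update in which the observation covariance comes from a learned, per-sensor covariance model. The sketch below is a minimal NumPy illustration under that assumption; the state, observation model, sensor names, and covariance values are hypothetical and are not taken from Das or from the application.

    # Hypothetical sketch: fusing one new detection into an existing track with a
    # Kalman update, where the observation covariance R is supplied per sensor
    # (the role Das assigns to the covariance model). Values are illustrative only.
    import numpy as np

    def kalman_update(x, P, z, R, H):
        """Standard Kalman measurement update.
        x: state estimate (n,), P: state covariance (n, n)
        z: observation (m,), R: observation covariance (m, m), H: (m, n)
        """
        y = z - H @ x                       # innovation
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x_new = x + K @ y
        P_new = (np.eye(len(x)) - K @ H) @ P
        return x_new, P_new

    # Toy 2D-position track with an identity observation model.
    x = np.array([10.0, 4.0])           # current track position estimate
    P = np.eye(2) * 1.0                 # current track uncertainty
    H = np.eye(2)

    # A per-sensor covariance model would output different R for lidar- vs.
    # camera-derived detections (numbers invented for illustration).
    R_by_sensor = {"lidar": np.eye(2) * 0.05, "camera": np.eye(2) * 0.50}

    z_lidar = np.array([10.3, 4.1])     # new lidar detection of the tracked object
    x, P = kalman_update(x, P, z_lidar, R_by_sensor["lidar"], H)
    print(x, np.diag(P))                # lidar's low R pulls the track strongly
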
Liu discloses a multi-sensor fusion machine learning (ML) model of LiDAR and camera features, that “cameras capture rich semantic information, LiDARs provide accurate spatial information, while radars offer instant velocity estimation” (section I), that “LiDAR/radar features are typically in the 3D/bird’s-eye-view” (section 3.1), and that their “framework can be easily extended to support other types of sensors (such as radars and event-based cameras) and other 3D perception tasks (such as 3D object tracking and motion forecasting)” (section 4). Thus, Liu shows that it was known in the art before the effective filing date of the claimed invention to perform a fusion of camera, LiDAR, and radar features using a shared bird’s eye view (BEV) feature space to perform an image analysis task in a BEV representation of the area surrounding an autonomous vehicle, which is analogous to the claimed invention in that it is pertinent to the problem being solved by the claimed invention, increasing the sensor accuracy of a multi-sensor autonomous vehicle system. Das discloses a multi-sensor fusion ML model of camera, LiDAR, and radar features that updates covariances of the model to track objects in the area surrounding an autonomous vehicle based on new sensor observation data, where the tracked objects correspond to BEV regions of interest (ROIs) (See Das at col. 6, ll. 11-31). Thus, Das shows that it was known in the art before the effective filing date of the claimed invention to perform a fusion of camera, LiDAR, and RADAR features using a shared feature space and update covariance matrices trained on the multi-sensor data to track objects around a vehicle, which is analogous to the claimed invention in that it is pertinent to the problem being solved by the claimed invention, increasing the sensor accuracy of a multi-sensor autonomous vehicle system. A person of ordinary skill in the art would have been motivated to modify the multi-sensor fusion model of Liu by adding a radar sensor as a third sensor and training a covariance model on camera, LiDAR, and radar features as described by Das to thereby fuse camera, LiDAR, and radar features in the BEV space and track objects by updating the covariance matrix calculation of the Kalman filter many times (e.g., hundreds) as the vehicle drives in an environment, the covariance matric indicating differences between observation data and training data. Based on the foregoing, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have made such modification according to known methods to yield the predictable results to have the benefit of detecting objects that camera and LiDAR features struggle to detect by using radar data to detect objects in adverse weather conditions (e.g., heavy fog) with radio waves (as opposed to light). Regarding claim 2, Liu in view of Das teaches the method of claim 1, wherein the first non-image sensor is a LiDAR sensor (Liu, Figure 2, “LiDAR Feat. (in BEV)”) and the first non-image sensor data is point cloud data (Liu, Figure 2, “LiDAR Point Cloud”). Regarding claim 4, Liu in view of Das teaches the method of claim 1, further comprising (Das, col. 5, ll. 
21-50, “a type of sensor or format of data associated with the observation data 116 (e.g., lidar sensors, radar sensors, image sensors, etc.”): receiving fourth information indicative of a fifth set of BEV features of second non-image sensor data captured by a second non-image sensor (Receiving observation data from the radar of Liu in view of Das), wherein: the third information is indicative of differences between the BEV features of the training data and the first set of BEV features, the second set of BEV features, and the fifth set of BEV features, the BEV features of the training data further comprise a sixth set of BEV features associated with the second non-image sensor (The trained covariance model is based on all three sensor types), and the fused data combines the image data, the first non-image sensor data, and the second non-image sensor data based on the first information, the second information, the third information, and the fourth information (The data from all three sensors is fused in the BEV space as disclosed by Liu in view of Das). The rationale for obviousness is the same as provided for claim 1. Regarding claim 6, Liu in view of Das teaches the method of claim 1, wherein the third information comprises a first indication of a difference between a first matrix of values (A first covariance matrix of the Kalman filter generated according to FIG. 3 of Das as an object is tracked over a period of time) and a second matrix of values (A second, updated covariance matrix of the Kalman filter generated according to FIG. 3 of Das), the first matrix of values representing a covariance of the first set of BEV features and the second set of BEV features (Das, col. 20, ll. 36-43, “The Kalman filter 436 may operate to fuse the observation data of new object detections with track data of existing tracks based at least in part on an observation covariance matrix populated with observation covariance values output by the covariance model 434.”; Once the model is trained, new observation/sensor data is acquired to update the covariance matrix.), the second matrix of values representing a covariance of the third set of BEV features and the fourth set of BEV features of the training data (Das, FIG.3, “Receive Observation Covariances from ML model 308”, “Use Observation Covariances as Observation Covariance Matrix Values of Kalman Filter to Determine Output 310”; col. 16, ll. 17-21, “At 312, the example process 300 may comprise providing the output of the Kalman filter to the prediction component and planning component of an autonomous vehicle system. Then, the example process 300 may return to 306 to select and process the next object detection or track.”; The covariance model of Liu in view of Das is trained on all three sensors and outputs covariance values used for the Kalman filter’s covariance matrix at each new acquisition of multi-sensor data.). The rationale for obviousness is the same as provided for claim 1. Claims 9, 10, 12, and 14 substantially correspond to claims 1, 2, 4, and 6 by reciting an apparatus, comprising: a memory storing processor-readable code (Liu, Figure 4 shows results of BEVFusion and Table 5 describes 10 epochs of training. Training an ML model requires a memory to retain data between epochs and execute the algorithm(s) of the model.); and at least one processor (Liu, section 3.2, “RTX 3090 GPU”) coupled to the memory (graphics processing units (GPUs) are coupled to memory via a motherboard. 
GPUs are a type of processor.), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations corresponding to the steps of the methods of claims 1, 2, 4, and 6. The rationale for obviousness is the same as provided for claim 1. Regarding claim 17, Liu teaches a method of training a model for multi-sensor fusion, comprising: receiving first information indicative of (Liu, Figure 2, receiving image data indicates corresponding camera BEV features will be derived.) a first set of BEV features of image data (Liu, Figure 2, camera BEV features.); receiving second information indicative of (Liu, Figure 2, receiving LiDAR data indicates corresponding LiDAR BEV features will be derived.) a second set of BEV features of non-image sensor data (Liu, Figure 2, LiDAR BEV features.); and determining fused data (Liu, Figure 2, “Fused BEV Features”) that combines the image data and the non-image sensor data based on the first information and the second information (Liu, Figure 1, “BEVFusion unifies camera and LiDAR features in a shared BEV space instead of mapping one modality to the other.”), but does not explicitly teach that which is taught by Das. Das teaches determining third information (how to update a first covariance matrix) indicative of differences (Updating the covariance matrix is indictive of differences between sensor features before and after the covariance is updated.) between a third set of features (Camera and LiDAR/radar training features for a first covariance matrix) and a fourth set of features (Camera and LiDAR/radar training features for a second covariance matrix as the update of the first covariance matrix), the third set of features comprises a first plurality of features of the first set of features and a second plurality of features of the second set of features (Camera and LiDAR/radar observation data generate the first covariance matrix), the fourth set of features comprises a third plurality of features of the first set of features (Camera observation data to generate the second covariance matrix) and a fourth plurality of features of the second set of features (Camera and LiDAR/Radar observation data to generate the second covariance matrix). The rationale for obviousness is the same as provided for claim 1. Regarding claim 18, Liu in view of Das teaches the method of claim 17, wherein the non-image sensor data is captured by either a LiDAR sensor (Liu, Figure 2, “LiDAR Feat. (in BEV)”) or a RADAR sensor (Das, col. 5, ll. 21-50, “a type of sensor or format of data associated with the observation data 116 (e.g., lidar sensors, radar sensors, image sensors, etc.”). The rationale for obviousness is the same as provided for claim 1. Regarding claim 19, Liu in view of Das teaches the method of claim 17, wherein determining the third information (Das - Update to the covariance matrix) comprises determining a matrix of values representing a covariance of a plurality of pairs of BEV features (A first covariance matrix of the Kalman filter is generated according to FIG. 3 of Das; Das, col. 14, ll. 57-65, “process 300 may comprise receiving sensor data from one or more sensors. 
Sensor data may include lidar data, radar data, depth sensor data (time of flight, structured light, etc.), image data (e.g., still images, video images, etc.)”; Camera features and LiDAR features form pairs of features), wherein each pair of the plurality of pairs includes a first BEV feature of the first set of BEV features and a second BEV feature of the second set of BEV features (Das - The covariance matrix is generated for camera/vision, LiDAR, and radar sensors, as shown in FIG. 1, which includes pairs of features (e.g., camera/vision and LiDAR) corresponding to overlapping regions in the surrounding area of the vehicle.), and wherein the third information is based on the matrix of values (Das - The observation covariance matrix is generated to determine what information needs to be updated by the Kalman filter). The rationale for obviousness is the same as provided for claim 1. Regarding claim 22, Liu in view of Das teaches the method of claim 17, further comprising: receiving fourth information indicative of (Das, FIG.1, receiving radar data indicates radar features will be derived) a fifth set of BEV features of second non-image sensor data (Das, FIG. 1, radar data received from one of the sensors 132), wherein: the third set of BEV features (Camera and LiDAR/radar training features for the first covariance matrix) comprises the first plurality of BEV features of the first set of BEV features (Camera observation data), the second plurality of BEV features of the second set of BEV features (LiDAR observation data), and a fifth plurality of BEV features of the fifth set of BEV features (Das, FIG. 1 - Radar observation data; The covariance model is trained on observation data from each sensor. See col. 10, ll. 16-65), the fourth set of BEV features (Camera and LiDAR/radar training features for the second covariance matrix as the update of the first covariance matrix) comprises the third plurality of BEV features of the first set of BEV features (Camera observation data to generate the second covariance matrix), the fourth plurality of BEV features of the second set of BEV features (LiDAR observation data to generate the second covariance matrix), and a sixth plurality of BEV features of the fifth set of BEV features (Das, FIG. 1 - Radar observation data to generate the second covariance matrix), and the fused data combines the image data (Liu - camera data), the non-image sensor data (Liu - LiDAR data), and the second non-image sensor data (Das - radar data) based on the first information, the second information, the third information, and the fourth information (Liu, Figure 2, “BEVFusion extracts features from multi-modal inputs and converts them into a shared bird’s-eye view (BEV) space efficiently using view transformations. It fuses the unified BEV features with a fully-convolutional BEV encoder and supports different tasks with task-specific heads.”; pg. 4, “we train the entire model in an end-to-end manner.”; The fusion combines the multi-sensor features based on the model trained on each type of sensor.). The rationale for obviousness is the same as provided for claim 1. Regarding claim 23, Liu in view of Das teaches the method of claim 22, wherein the image data is captured by an image sensor of a camera (Liu, Figure 2, “Camera”), the non-image sensor data is captured by a LiDAR sensor (Liu, Figure 2, “LiDAR”), and the second non-image sensor data is captured by a RADAR sensor (Das, FIG .1, “RADAR”). The rationale for obviousness is the same as provided for claim 1. 
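
For context on the shared-BEV fusion the rejection attributes to Liu, in which per-sensor BEV feature maps are combined in one BEV grid, encoded by a fully convolutional BEV encoder, and fed to task-specific heads, a minimal PyTorch-style sketch follows. Channel counts, grid size, and the single segmentation head are illustrative assumptions, not details from Liu or from the claims.

    # Minimal sketch, assuming PyTorch, of concatenate-then-encode BEV fusion.
    import torch
    import torch.nn as nn

    class TinyBEVFusion(nn.Module):
        def __init__(self, cam_ch=80, lidar_ch=80, radar_ch=16, fused_ch=128, n_classes=10):
            super().__init__()
            in_ch = cam_ch + lidar_ch + radar_ch
            self.bev_encoder = nn.Sequential(
                nn.Conv2d(in_ch, fused_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(fused_ch, fused_ch, 3, padding=1), nn.ReLU(),
            )
            self.seg_head = nn.Conv2d(fused_ch, n_classes, 1)   # one task-specific head

        def forward(self, cam_bev, lidar_bev, radar_bev):
            # All inputs already live on the same BEV grid; fuse by concatenation.
            fused = torch.cat([cam_bev, lidar_bev, radar_bev], dim=1)
            fused = self.bev_encoder(fused)
            return self.seg_head(fused)

    # Three modalities on a shared 200x200 BEV grid (batch of 1).
    cam, lidar, radar = (torch.randn(1, c, 200, 200) for c in (80, 80, 16))
    out = TinyBEVFusion()(cam, lidar, radar)
    print(out.shape)   # torch.Size([1, 10, 200, 200])
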
Claims 24-26, 29, and 30 substantially correspond to claims 17-19, 22, and 23 by reciting an apparatus, comprising: a memory storing processor-readable code (Liu, Figure 4 shows results of BEVFusion and Table 5 describes 10 epochs of training. Training an ML model requires a memory to retain data between epochs and execute the algorithm(s) of the model.); and at least one processor (Liu, section 3.2, “RTX 3090 GPU”) coupled to the memory (GPUs are coupled to memory via a motherboard.), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations corresponding to the steps of the methods of claims 17-19, 22, and 23. The rationale for obviousness is the same as provided for claim 1. Claims 3, 5, 11, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Das and in further in view of U.S. Pat. Appl. Pub. No. 20220366176 to Rotker et al. (hereinafter “Rotker”). Regarding claim 3, Liu in view of Das teaches the method of claim 1, wherein the first non-image sensor is a RADAR sensor (Das, col. 5, ll. 21-50, “a type of sensor or format of data associated with the observation data 116 (e.g., lidar sensors, radar sensors, image sensors, etc.”). The rationale for obviousness is the same as provided for claim 1. Liu in view of Das does not explicitly teach that which is taught by Rotker. Rotker teaches the first non-image sensor data is point cloud data (Rotker, par. [0029], “a sensor fusion-based top-view 3D stixel representation for general obstacle detection in a vehicle. Features are extracted from the data that is obtained with two or more sensors (e.g., camera, radar system, lidar system), and then transformed to a top-view representation (i.e., bird's eye view). The transformed features are then fused together to represent the aggregated features from all sensors in a top-view representation. The transformation of information from each sensor to the top-view facilitates a sensor agnostic approach and allows any number of sensors to be fused.”; par. [0033], “the second sensor data may be a lidar point cloud … an additional sensor may be the radar system 130 and the … additional sensor data may be a radar point cloud.”; par. [0036], “fusing the top-view feature representations refers to performing a fusion of the first top-view feature representation (from block 215) and the second top-view feature representation (from block 220). Additional top-view feature representations may also be fused if available.”). Liu in view of Das is analogous to the claimed invention for the same reasons provided above. Rotker discloses a multi-sensor fusion system for autonomous vehicles that transforms and then fuses image data from a camera, a point cloud from a LiDAR sensor, and a point cloud from a radar sensor into a BEV feature space. Thus, Rotker shows that it was known in the art before the effective filing date of the claimed invention to obtain radar data as point cloud data in a multi-sensor fusion of camera, LiDAR, and radar features, which is analogous to the claimed invention in that it is pertinent to the problem being solved by the claimed invention, increasing the sensor accuracy of a multi-sensor autonomous vehicle system. 
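
To make the transform-then-fuse idea attributed to Rotker concrete before the motivation discussion continues, the rough sketch below rasterizes a point cloud into a top-view (bird's-eye-view) grid, the per-sensor step that precedes fusion. The grid extents, resolution, and point-count encoding are invented for illustration and are not taken from Rotker.

    # Illustrative-only rasterization of a point cloud into a BEV occupancy grid.
    import numpy as np

    def points_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), res=0.5):
        """points: (N, 3) array of x, y, z in the vehicle frame."""
        nx = int((x_range[1] - x_range[0]) / res)
        ny = int((y_range[1] - y_range[0]) / res)
        grid = np.zeros((ny, nx), dtype=np.float32)
        xi = ((points[:, 0] - x_range[0]) / res).astype(int)
        yi = ((points[:, 1] - y_range[0]) / res).astype(int)
        keep = (xi >= 0) & (xi < nx) & (yi >= 0) & (yi < ny)
        np.add.at(grid, (yi[keep], xi[keep]), 1.0)   # accumulate point count per BEV cell
        return grid

    radar_points = np.random.uniform(-60, 60, size=(500, 3))   # fake point cloud
    bev = points_to_bev(radar_points)
    print(bev.shape, bev.sum())   # (200, 200) grid; points outside the range are dropped
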
A person of ordinary skill in the art would have been motivated to modify the radar sensor of the multi-sensor fusion model of Liu in view of Das to generate point cloud data as disclosed by Rotker, to thereby transform the point cloud data into a BEV feature space and fuse the BEV radar features with the camera and LiDAR BEV features. Based on the foregoing, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have made such modification according to known methods to yield the predictable results to have the benefit of detecting and localizing complex objects at high accuracy. Regarding claim 5, Liu in view of Das teaches the method of claim 4, wherein: the first non-image sensor is a LiDAR sensor (Liu, Figure 2, “LiDAR Point Cloud”); the second non-image sensor is a RADAR sensor (Das, col. 5, ll. 21-50, “a type of sensor or format of data associated with the observation data 116 (e.g., lidar sensors, radar sensors, image sensors, etc.”); and the first non-image sensor data is first point cloud data (Liu, Figure 2, “LiDAR Point Cloud”), but does not explicitly teach that which is taught by Rotker. Rotker teaches the second non-image sensor data is second point cloud data (Rotker, pars. [0029], [0036], and [0033], which discloses “… additional sensor data may be a radar point cloud.”). The rationale for obviousness is the same as provided for claim 3. Claims 11 and 13 substantially correspond to claims 3 and 5 by reciting an apparatus, comprising: a memory storing processor-readable code (Liu, Figure 4 shows results of BEVFusion and Table 5 describes 10 epochs of training. Training an ML model requires a memory to retain data between epochs and execute the algorithm(s) of the model.); and at least one processor (Liu, section 3.2, “RTX 3090 GPU”) coupled to the memory (GPUs are coupled to memory via a motherboard.), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations corresponding to the steps of the methods of claims 3 and 5. The rationale for obviousness is the same as provided for claim 3. Claims 7, 15, 20, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Das and in further in view of DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars to Drews et al. (hereinafter “Drews”). Regarding claim 7, Liu in view of Das teaches the method of claim 6, wherein the third information comprises: a second indication of a difference (Das – A second covariance matrix of the Kalman filter generated according to FIG. 3 of Das as an object is tracked over a period of time); and a third indication of a difference (Das – A third covariance matrix of the Kalman filter generated according to FIG. 3 of Das as an object is tracked over a period of time). The rationale for obviousness is the same as provided for claim 1. Liu in view of Das does not explicitly teach that which is taught by Drews. Drews teaches wherein the training data comprises a first domain descriptor value associated with the image sensor (Drews, section IV, “When training the fusion model, we freeze the FPNs, and train all other parts of the architecture …We use the same training loss for sensor-specific detectors and the fusion network.”; The training data includes camera domain detection results. Each head has two descriptor values as shown in Fig. 1 (box and class). 
The output of the Camera Head includes a first domain descriptor value.), a second domain descriptor value associated with the first non-image sensor (The output of the LiDAR Head includes a second domain descriptor value.), a third domain descriptor value associated with a first BEV feature of the first set of BEV features (The output of the Camera Head changes over multiple sensor acquisitions.), and a fourth domain descriptor value associated with a second BEV feature of the second set of BEV features (The output of the LiDAR Head changes over multiple sensor acquisitions.). Liu in view of Das is analogous to the claimed invention for the same reasons provided above. Drews discloses a multi-sensor fusion system for autonomous vehicles that transforms and then fuses data from a camera, a LiDAR sensor, and a radar sensor into a shared/unified BEV feature space/representation. Drews further discloses an Architecture in FIG. 1 that uses sensor-specific heads and a fusion head that operates on the fused features to detect objects based on data from each sensor. Thus, Drews shows that it was known in the art before the effective filing date of the claimed invention to used sensor(domain)-specific heads in combination with a head that operates on the fused BEV features, which is analogous to the claimed invention in that it is pertinent to the problem being solved by the claimed invention, increasing the sensor accuracy of a multi-sensor autonomous vehicle system. A person of ordinary skill in the art would have been motivated to modify the architecture of Liu in view of Das by adding sensor(domain)-specific heads to the output of each sensor’s feature extractor as disclosed by Drews, to thereby train the covariance model using sensor-specific detections and fused feature detections, where the covariance matrix is updated repeatedly as the vehicle encounters new features in its environment, the updates to the covariance model being indicative of differences between the outputs of the heads and the corresponding data used to train the model. Based on the foregoing, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have made such modification according to known methods to yield the predictable results to have the benefit of training each sensor according to its strengths to better inform its contribution to the fused feature detection. Claim 15 substantially corresponds to claim 7 by reciting an apparatus, comprising: a memory storing processor-readable code (Liu, Figure 4 shows results of BEVFusion and Table 5 describes 10 epochs of training. Training an ML model requires a memory to retain data between epochs and execute the algorithm(s) of the model.); and at least one processor (Liu, section 3.2, “RTX 3090 GPU”) coupled to the memory (GPUs are coupled to memory via a motherboard.), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations corresponding to the steps of the method of claim 7. The rationale for obviousness is the same as provided for claim 7. 
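
As a rough picture of the head layout the rejection draws from Drews, a detection head per sensor branch plus a head on the fused features, each producing class and box outputs, the following sketch uses assumed channel sizes and a 1x1-convolution head design; none of these specifics come from Drews or from the application.

    # Assumed-layout sketch: per-sensor heads and a fused head, each emitting
    # class scores and box parameters per BEV cell.
    import torch
    import torch.nn as nn

    def det_head(in_ch, n_classes=10, box_dims=7):
        # 1x1 convs producing per-cell class scores ("class") and box parameters ("box")
        return nn.ModuleDict({
            "cls": nn.Conv2d(in_ch, n_classes, 1),
            "box": nn.Conv2d(in_ch, box_dims, 1),
        })

    heads = nn.ModuleDict({
        "camera": det_head(80),
        "lidar": det_head(80),
        "fused": det_head(160),
    })

    cam_feat, lidar_feat = torch.randn(1, 80, 100, 100), torch.randn(1, 80, 100, 100)
    fused_feat = torch.cat([cam_feat, lidar_feat], dim=1)
    for name, feat in [("camera", cam_feat), ("lidar", lidar_feat), ("fused", fused_feat)]:
        cls, box = heads[name]["cls"](feat), heads[name]["box"](feat)
        print(name, cls.shape, box.shape)   # two descriptor outputs per head
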
Regarding claim 20, Liu in view of Das teaches the method of claim 19, wherein determining the third information comprises: determining indications of differences between camera observation features associated with the third set of BEV features (Das - Camera and LiDAR/radar training features for the first covariance matrix) and LiDAR observation features associated with the fourth set of BEV features, wherein the third information is based on the indications of differences (Das - Camera and LiDAR/radar training features for the second covariance matrix as the update of the first covariance matrix). The rationale for obviousness is the same as provided for claim 1. Liu in view of Das does not explicitly teach that which is taught by Drews. Drews teaches determining a first plurality of domain descriptors associated with the third set of features (Drews, Fig. 1 – the output of the “Camera Head”); and determining a second plurality of domain descriptors associated with the fourth set of BEV features (Drews, Fig. 1 – the output of the “Lidar Head”). The rationale for obviousness is the same as provided for claim 7. Claim 27 substantially corresponds to claim 20 by reciting an apparatus, comprising: a memory storing processor-readable code (Liu, Figure 4 shows results of BEVFusion and Table 5 describes 10 epochs of training. Training an ML model requires a memory to retain data between epochs and execute the algorithm(s) of the model.); and at least one processor (Liu, section 3.2, “RTX 3090 GPU”) coupled to the memory (GPUs are coupled to memory via a motherboard.), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations corresponding to the steps of the method of claim 20. The rationale for obviousness is the same as provided for claim 7. Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Das, in view of Drews, and in further view of U.S. Pat. No. 11,636,348 to Tang et al. (hereinafter “Tang”). Regarding claim 8, Liu in view of Das and in further view of Drews teaches the method of claim 7, but does not explicitly teach that which is taught by Tang. Tang teaches wherein differences between the features of training data and a first set of features and a second set of features are indicative of covariate shift between the first set of features and a third set of features, between the second set of features and a fourth set of features, or both (Tang, col. 10, ll. 35-57, “A model trained initially at a centralized training resource and associated parameters may be received at the autonomous vehicle in the depicted embodiment, and adapted to a local environment within which the vehicle operates, using some combination of components 110a, 111a and 112a, together with sensor data 113b collected from sensors.”; col. 10, ll. 58-61, “A wide variety of sensors may be employed in the depicted embodiment, including video cameras, radar devices, LIDAR (light detection and ranging) devices and the like.”; col. 10, l. 61 – col. 11, l. 30, “All of these sensors may capture and provide raw sensor data to respective sensor data processing pipelines implemented by the on-vehicle computing devices to make perception decisions, such as detecting, classifying, or tracking road objects. Such data may be used for local adaptive training of neural network models in at least some embodiments.”). 
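
For readers unfamiliar with the covariate-shift concept recited in claim 8, the sketch below flags a sensor whose observed feature statistics drift from the statistics recorded at training time. It is a generic illustration of covariate-shift detection, not Tang's adaptation method; the threshold, feature dimensions, and sensor names are arbitrary.

    # Hedged illustration: compare online feature statistics against training-time
    # statistics per sensor and flag large standardized differences as covariate shift.
    import numpy as np

    def shift_score(train_mean, train_std, obs_features):
        """Mean absolute z-score of observed feature means under training statistics."""
        obs_mean = obs_features.mean(axis=0)
        return float(np.mean(np.abs((obs_mean - train_mean) / (train_std + 1e-8))))

    rng = np.random.default_rng(0)
    train_mean, train_std = np.zeros(64), np.ones(64)     # statistics saved at training time
    camera_obs = rng.normal(0.0, 1.0, size=(256, 64))     # matches the training distribution
    radar_obs = rng.normal(1.5, 1.0, size=(256, 64))      # shifted distribution

    for name, feats in [("camera", camera_obs), ("radar", radar_obs)]:
        s = shift_score(train_mean, train_std, feats)
        print(name, round(s, 2), "shifted" if s > 0.5 else "ok")
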
Liu in view of Das and in further view of Drews is analogous to the claimed invention for the same reasons provided above. Tang discloses local adaptation of a machine learning model deployed in an autonomous vehicle including camera, LiDAR, and radar sensors, where for each sensor pipeline, a shift in distribution (difference) of a sensor’s input/observation data in comparison to the data used to train the model indicates a covariate shift and a need to modify the sensor data and/or dynamically gate the sensor data using an auxiliary network. Thus, Tang shows that it was known in the art before the effective filing date of the claimed invention to determine covariate shift of sensors in a multi-sensor system and locally adapt a multi-sensor machine learning model to mitigate the covariate shift, which is analogous to the claimed invention in that it is pertinent to the problem being solved by the claimed invention, increasing the sensor accuracy of a multi-sensor autonomous vehicle system. A person of ordinary skill in the art would have been motivated to modify the architecture of Liu in view of Das and in further view of Drews to incorporate a neural network implemented at the autonomous vehicle to locally adapt the features of each sensor as disclosed by Tang, to thereby “change model parameters depending on the statistics of the environment” (col. 17, ll. 1-15) when the feature distribution of a sensor is shifted from a distribution of corresponding features used to train the model, thereby indicating a covariate shift between observation sensor data and training sensor data. Based on the foregoing, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have made such modification according to known methods to yield the predictable results to have the benefit of improving the model’s performance by adapting the deployed model to its local environment. Claim 16 substantially corresponds to claim 8 by reciting an apparatus, comprising: a memory storing processor-readable code (Liu, Figure 4 shows results of BEVFusion and Table 5 describes 10 epochs of training. Training an ML model requires a memory to retain data between epochs and execute the algorithm(s) of the model.); and at least one processor (Liu, section 3.2, “RTX 3090 GPU”) coupled to the memory (GPUs are coupled to memory via a motherboard.), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations corresponding to the steps of the method of claim 8. The rationale for obviousness is the same as provided for claim 8. Allowable Subject Matter Claims 21 and 28 are each objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: FUTR3D: A Unified Sensor Fusion Framework for 3D Detection to Chen et al. is pertinent because it discloses a “Modality-Agnostic Feature Sampler” in Figure 2 that fuses Camera, LiDAR, and Radar features. The arrangement of the sampler and the feature extraction of each sensor is similar to the architecture in applicant’s FIG. 4B. Dynamic Task Prioritization for Multitask Learning to Guo et al.
discloses training a multitask model that adjust the amount of learning based on how easy or hard the given task is, which is pertinent to the thresholds described in paragraph 91 of applicant’s disclosure because those thresholds are also used to adjust learning based on task difficulty. U.S. Pat. Appl. Pub. No. 20230294687 (filed 14 Feb. 2022) to Philbin et al. is pertinent to applicant’s disclosure because FIG. 2 shows an architecture with a shared BEV space of camera, Lidar, and radar features and a plurality of heads, which are similar components of applicant’s architecture in applicant’s FIG. 4B. X-Align++: Cross-Modal Cross-View Alignment for Bird’s-Eye-View Segmentation (published 6 Jun., 2023) to Borse et al. is considered pertinent to applicant’s disclosure because its authors include two inventors of the instant applicant and it describes BEV segmentation using a fusion of camera and LiDAR features. X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection (published 3 Mar. 2023) to Klingner et al. is considered pertinent because its authors include two inventors of the instant applicant and it describes fusion of LiDAR BEV features derived from a LiDAR point cloud and camera BEV features in Figure 2 and generalizing to radar in section 4.5. Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN P POTTS whose telephone number is (571)272-6351. The examiner can normally be reached M-F, 9am-5pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached at 571-272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /RYAN P POTTS/Examiner, Art Unit 2672 /SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672

Prosecution Timeline

Jun 09, 2023: Application Filed
Oct 17, 2025: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591966: METHOD AND APPARATUS FOR ANALYZING BLOOD VESSEL BASED ON MACHINE LEARNING MODEL
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12560734: METHOD AND SYSTEM FOR PROCESSING SEISMIC IMAGES TO OBTAIN A REFERENCE RGT SURFACE OF A GEOLOGICAL FORMATION
Granted Feb 24, 2026 (2y 5m to grant)
Patent 12555259: PRODUCT IDENTIFICATION APPARATUS, PRODUCT IDENTIFICATION METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM
Granted Feb 17, 2026 (2y 5m to grant)
Patent 12548658: Systems and Methods for Scalable Mapping of Brain Dynamics
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12538743: WARPAGE AMOUNT ESTIMATION APPARATUS AND WARPAGE AMOUNT ESTIMATION METHOD
Granted Jan 27, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 80%
With Interview: 99% (+36.8%)
Median Time to Grant: 3y 2m
PTA Risk: Low
Based on 235 resolved cases by this examiner. Grant probability derived from career allow rate.
