Prosecution Insights
Last updated: April 19, 2026
Application No. 18/416,641

EARLY FUSION OF CAMERA AND RADAR FRAMES

Final Rejection — §103, §DP
Filed: Jan 18, 2024
Examiner: AN, IG TAI
Art Unit: 3662
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Qualcomm Incorporated
OA Round: 2 (Final)
Grant Probability: 56% (Moderate)
Expected OA Rounds: 3-4
To Grant: 3y 8m
With Interview: 82%

Examiner Intelligence

Career Allow Rate: 56% (grants 56% of resolved cases: 292 granted / 523 resolved; +3.8% vs TC avg)
Interview Lift: +26.1% (strong lift; allow rate of resolved cases with an interview vs. without)
Avg Prosecution: 3y 8m typical timeline (32 currently pending)
Total Applications: 555 (career history across all art units)
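
As a point of reference, the two headline metrics above follow directly from resolved-case records. The sketch below (Python; the record fields and comments are hypothetical illustrations, not this tool's actual pipeline) shows one common way to define career allow rate and interview lift:

```python
from dataclasses import dataclass

@dataclass
class ResolvedCase:
    granted: bool        # case ended in allowance
    had_interview: bool  # at least one examiner interview of record

def allow_rate(cases):
    """Share of resolved cases that ended in a grant."""
    return sum(c.granted for c in cases) / len(cases) if cases else 0.0

def interview_lift(cases):
    """Allow rate for cases with an interview minus allow rate without one."""
    with_iv = [c for c in cases if c.had_interview]
    without_iv = [c for c in cases if not c.had_interview]
    return allow_rate(with_iv) - allow_rate(without_iv)

# Hypothetical reading of the dashboard figures: 292 grants out of 523 resolved
# cases gives roughly a 56% career allow rate; a +26.1% lift means cases with an
# interview were allowed about 26 percentage points more often than those without.
```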

Statute-Specific Performance

§101: 19.3% (-20.7% vs TC avg)
§103: 49.8% (+9.8% vs TC avg)
§102: 19.0% (-21.0% vs TC avg)
§112: 10.2% (-29.8% vs TC avg)
Tech Center average is an estimate • Based on career data from 523 resolved cases

Office Action

§103, §DP
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Summary

The Amendment filed on 25 September 2025 has been acknowledged. Claims 1, 12, 16, and 27 are amended. Currently, claims 1-30 are pending and considered as set forth.

Terminal Disclaimer

The terminal disclaimer filed on 25 September 2025 was approved on 2 October 2025 and it is hereby acknowledged. The Examiner notes that the terminal disclaimer with amendment overcomes the statutory double patenting rejection from the previous office action mailed on 8 July 2025.

Examiner's Note

The Examiner notes that due to the amendment, claims 1-30 are no longer allowed and are rejected as below.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-4, 8-9, 16-19, and 23-24 are rejected under 35 U.S.C. 103 as being obvious over Steinmeyer et al. (US 2021/0158544, based on foreign priority to DE 10 2018 205 879.2, filed 18 Apr 18), herein “Steinmeyer”, in view of Ozdemir et al. (US 2019/0122073, filed 23 Oct 17), herein “Ozdemir”. Claims 12-13 and 27-28 are rejected under 35 U.S.C. 103 as being obvious over Steinmeyer in view of Official Notice (as evidenced by one or more of (a) Wang et al. (NPL: “A Folded Neural Network Autoencoder for Dimensionality Reduction”, published in 2012), herein “Wang”, (b) Badrinarayanan et al. (NPL: “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”, published December 2017), herein “Badrinarayanan”, and/or (c) obvious-to-try design choice). Additionally/alternatively, Claims 12-13 and 27-28 are rejected under 35 U.S.C. 103 as being obvious over Steinmeyer in view of Official Notice, further in view of one or more of (a) Wang, (b) Badrinarayanan, and/or (c) obvious-to-try design choice.
Regarding Claims 1, 12, 16, and 27 (all independent) and Claims 2 and 17 (dependent), Steinmeyer discloses: a method of performing early fusion of camera and radar frames to perform object detection in one or more spatial domains performed by an on-board computer of a host vehicle, comprising: (per Claims 1 and 12) / An on-board computer of a host vehicle, comprising: at least one processor configured to: (per Claims 16 and 27) (“the method according to the preceding discussion or a device according to the preceding discussion is used in a vehicle, in particular a motor vehicle”, Paragraph 54, “motor vehicle 60 has a camera 61 for capturing camera images and a radar sensor 62 for capturing 3D-checkpoints. Furthermore, the motor vehicle 60 has a device 40 for processing sensor data, by means of which the camera images are merged with the 3D-checkpoints to form data of a virtual sensor”, Paragraph 67, “Input variables for the sensor fusion by a data fusion circuit 43 are 3D-checkpoints of a 3D-sensor (radar 62) and camera images from a camera 61”, Paragraph 70) receive/-ing, from a camera sensor of the host vehicle, a plurality of camera frames (per Claims 1, 12, 16, and 27) (“camera images are recorded 20”, Paragraph 59, “camera images from a camera 61”, Paragraph 70); receive/-ing, from a radar sensor of the host vehicle, a plurality of radar frames (per Claims 1, 12, 16, and 27) (“3D-checkpoints are also recorded by at least one 3D-sensor 21”, Paragraph 59, “3D-checkpoints of a 3D-sensor (radar 62)”, Paragraph 70); perform/-ing a camera feature extraction process on a first camera frame of the plurality of camera frames to generate a first camera feature map (“camera 61 can already process the camera images, for example to determine the optical flow, to classify image points as part of a segmentation or to extract points from the camera images using SfM algorithms (SfM: Structure from Motion). This processing of the camera images can, however, also be carried out by the data fusion circuit 43”, Paragraph 70, “The 3D-checkpoint usually has an angular uncertainty, e.g. by beam expansion. Therefore, all pixels in the vicinity of the uncertainty are for example taken into account in order to add attributes from the image to the 3D-checkpoint. The attributes can be, for example, the averaged optical flow o (o.sub.x, o.sub.y) or the position in the image space p (p.sub.x, p.sub.y”, Paragraph 80, “optical flow o and pixel position p are specified in the image space, while the velocities v.sub.x,y,z are determined in the camera coordinate system. In addition to the measurement attributes, a camera constant “K” is required that takes into account the image distance “b” (in m) and the resolution “D” (pixels per m) of the imaging system”, Paragraphs 87-88, “those pixels are identified which, according to the pixel shift, associate with the available 3D-measurement data, for example from…radar measurements. Since there are beam expansions during the measurements, several pixels are usually affected here. The associated pixels are expanded by additional dimensions and the measurement attributes are entered accordingly. Possible attributes are, for example: in addition to the distance measurement from…radar…the Doppler speed of the radar, the reflectivity or the radar cross-section or even confidence. 
The synchronized camera image that has been expanded to include measurement attributes…”, Paragraphs 92-93); (per Claims 1 and 16); perform/-ing a radar feature extraction process on a first radar frame of the plurality of radar frames to generate a first radar feature map (“detecting 3D-checkpoints by at least one 3D-sensor”, Paragraphs 26/30), wherein the first radar frame corresponds in time to the first camera frame; (per Claims 1 and 16) (“the image is used, which regarding the instant of time of recording t is closest to the instant of time of measurement of the 3D-sensor”, Paragraph 73) convert/-ing the first camera feature map, the first radar feature map, or both to a common spatial domain (per Claims 1 and 16); concatenate/-ing the first radar feature map and the first camera feature map to generate a first concatenated feature map in the common spatial domain; and (per Claims 1 and 16) / apply/-ing an encoder-decoder network (“at least one of the camera images can be segmented 22, e.g. through a neural network”, Paragraph 59, “The device 40 also…has a segmenting device 42 for segmenting at least one camera image and, respectively, a camera image I1, I2 enriched with further measurements, e.g. by means of a neural network”, Paragraph 61, “The segmentation is for example carried out by a neural network”, Paragraph 52, “Due to recent advances in image processing using “[Deep] Convolutional Neural Networks (CNN)” ([deep] folded neural networks), pixel-accurate segmentation of images is possible with the appropriate computing power. If at least one of the camera images is segmented by such a neural network, the 3D-checkpoints can also be expanded to include the class(es) resulting from the segmentation and the associated identifier”, Paragraph 81, “extended 3D-checkpoints or clusters from the virtual sensor and, respectively, the clusters are then transferred to an accumulating sensor data fusion, which enables filtering over time. With some current neural networks it is possible that these form so-called instances. As an example, there is a row of parking lots with stationary vehicles that are recorded angularly by the camera. Newer methods in such a case can separate the different vehicles despite the overlap in the image. If the neural network forms instances, these can of course be used as cluster information in the accumulating sensor data fusion”, Paragraph 83, “suitable algorithms can be used to also determine the changes in the individual image segments over time, which in particular can be implemented efficiently”, Paragraph 84, “FIG. 8 schematically shows the concept of a virtual sensor with a classifier. The concept largely corresponds to the concept known from FIG. 7. At present, folded neural networks are often used for image classification. If possible, these require locally associable data that is naturally present in an image. 
Neighboring pixels often belong to the same object and describe the neighborhood in the polar image space”, Paragraph 90, “The synchronized camera image that has been expanded to include measurement attributes is now classified with a classifier or segmenting device 42, for example with a folded neural network”, Paragraph 93; also see third obviousness discussion below) on the first camera frame to generate a first camera feature map in a spatial domain (see citations below) of the first radar frame (see first obviousness discussion below) (per Claims 12 and 27); combine/-ing the first radar frame and the first camera feature map to generate a first combined feature map in the spatial domain (“By using the calculated optical flow, the 3D-checkpoints are synchronized with the camera images”, Paragraph 39, “With help of the optical flow, the entire camera image can be converted into the instant of time of the measurement of the 3D-sensor. Subsequently, 3D-checkpoints can be projected from the depth-measuring sensor into the camera image. For this purpose, the pixels can be treated, for example, as infinitely long rays that intersect with the 3D-checkpoints”, Paragraph 43, “The coordinate systems of the camera, 3D-sensor, image and ego vehicle are closely linked. Since the ego vehicle moves relative to the world coordinate system, the following four transformations are defined between the coordinate systems: T.sub.V←W(t) is the transformation that transforms a 3D-point in the world coordinate system into the 3D-coordinate system of the ego vehicle. This transformation depends on the time t, since the ego vehicle moves over time. T.sub.S←V is the transformation that transforms a 3D-point in the 3D-coordinate system of the ego vehicle into the 3D-coordinate system of the 3D-sensor. T.sub.C←V is the transformation that transforms a 3D-point in the 3D-coordinate system of the ego vehicle into the 3D-coordinate system of the camera. P.sub.I←C is the transformation that projects a 3D-point in the 3D-coordinate system of the camera into the 2D image coordinate system”, Paragraphs 101-105, “Equation (16) establishes a relationship between the measurements from the camera and the measurements from the 3D-sensor. If the world point is well-defined in world coordinates, the times to, t.sub.1 and t.sub.2 as well as the image coordinates in two camera images and the measurement of the 3D-sensor are known, then equation (16) establishes a complete relationship, i.e. 
there are no unknown quantities”, Paragraph 122) of a first radar frame (see first obviousness discussion below), wherein the first radar frame corresponds in time to the first camera frame; and (per Claims 12 and 27) (“the image is used, which regarding the instant of time of recording t is closest to the instant of time of measurement of the 3D-sensor”, Paragraph 73) perform/-ing object detection on the first concatenated feature map to detect one or more objects in the first concatenated feature map without performing object detection/performance of object detection on the first camera feature map or the first radar feature map (per Claims 1 and 16) / perform/-ing object detection on the first combined feature map to detect one or more objects in the first combined feature map without performing object detection/performance of object detection on the first camera feature map or the first radar frame (per Claims 12 and 27) (“During the subsequent object tracking the resulting data from the virtual sensor may be clustered into object hypotheses with high quality, as they contain extensive information in order to separate different classes. The solution according to the present teachings prevents the object hypotheses of different sensors with systematic errors per time from being merged in a common model, with association errors easily occurring. This enables a robust perception of the surroundings, which allows highly automated and autonomous driving functions”, Paragraph 35, “an algorithm for object tracking is applied to the data of the virtual sensor. This algorithm e.g. performs an accumulating sensor data fusion. The accumulating sensor data fusion enables filtering of the data over time and therefore reliable object tracking”, Paragraph 53, “3D-checkpoints can be expanded to include attributes from at least one of the camera images. The resulting data are finally output 24 for further processing. During further processing, for example, an algorithm for object tracking can be applied to the data of the virtual sensor. The algorithm can, e.g., perform an accumulating sensor data fusion. In addition, the data from the virtual sensor can be segmented”, Paragraph 59, “object tracker 44 can carry out object tracking on basis of the data VS from the virtual sensor. The object tracker 44 can e.g. perform an accumulating sensor data fusion. However, this can also be carried out outside the device 40. The data VS of the virtual sensor or the results of the object tracking or segmentation are output via an output 47 of the device 40 for further processing”, Paragraph 62), and estimate/-ing a width, length, or both of the one or more objects based on a bounding box in the first camera frame encapsulating each of the one or more objects (per Claims 1 and 16) (“different points of an object are detected by the sensors then, which may be far away from one another but are assigned to the same object”, Paragraph 10, “During the subsequent object tracking the resulting data from the virtual sensor may be clustered into object hypotheses with high quality, as they contain extensive information in order to separate different classes…This enables a robust perception of the surroundings, which allows highly automated and autonomous driving functions”, Paragraph 35, “before segmentation measurement points of the 3D-sensor are projected precisely into the image by means of the optical flow and their measurement attributes are stored in further dimensions. 
This enables cross-sensor segmentation”, Paragraph 51, “all optical flow vectors can be rendered using line algorithms in such a way that the bounding box of the vector is specified in each pixel. If several flow vectors overlap in a pixel, the bounding box is enlarged accordingly so that both vectors are contained in the box. The subsequent search algorithm now only has to take into account that bounding box in which the searched pixel must be contained”, Paragraph 78, “points from the virtual sensor can be clustered into object hypotheses with high quality, as they contain extensive information in order to separate classes. In particular, these are the class information and the identifier from the segmentation, as well as the Cartesian velocity vector, which e.g. is useful with overlapping objects of the same class”, Paragraph 82; also see second obviousness discussion below) / wherein the common spatial domain is a spatial domain of the radar sensor (per Claims 12 and 27 and dependent Claims 2 and 17, dependent upon independent Claims 1 and 16, respectively) (see first obviousness discussion below). Firstly, Steinmeyer remains silent in that the common static frame of reference (i.e. common spatial domain) is specifically (a) a spatial domain of the radar sensor, as opposed to (b) a spatial domain of the camera sensor, or (c) a common spatial domain different from both a spatial domain of the radar sensor and a spatial domain of the camera sensor. Relating to (c), since the sensor fusion system may have other sensors besides just the camera sensor and the radar sensor, it may thus be preferable to use a common spatial domain that is a spatial domain of another sensor (as obvious examples, one of ordinary skill in the art at the time of filing may prefer to try using a spatial domain of whichever sensor is closest to the center/middle of all the sensors, or may prefer using a spatial domain of whichever sensor collects the most amount of data, for the sake of trying to minimize processing requirements to localize all the sensory data together). Additionally relating to (c), it may alternatively be preferable to use a common spatial domain that is based on a world spatial domain, the vehicle spatial domain, or perhaps some sort of average of all the sensors utilized in the fusion but not one in particular (although this may increase processing efforts because all sensor data would need to be transformed rather than eliminating one sensor’s data from needing to be transformed by choosing that sensor’s spatial domain as the common spatial domain). Therefore, it appears to only be an obvious matter of design choice (and/or an obvious-to-try option out of a finite number of reasonable possibilities) as to which common spatial domain to utilize, and if it coincides with one of the sensor’s spatial domain, which sensor to utilize for that purpose. In support of this obviousness rationale, the Applicant’s specification does not appear to particularly point out any unexpected result or particular advantage by specifically having the common spatial domain be a spatial domain of the radar sensor (as opposed to all other obvious-to-try options out of a finite number of reasonable alternatives for the common spatial domain, based on all of the sensors involved in the sensor fusion process and how processing-intensive the data transformations may be for each one). 
As such, it would have been obvious to one of ordinary skill in the art at the time of filing to have modified the disclosure of Steinmeyer to specifically use a spatial domain of the radar sensor as the common spatial domain, as is merely one obvious-to-try option out of a finite number of reasonable options (and merely a matter of obvious design choice), in order to eliminate at least one set of sensory data (in this case, the radar sensor’s data) from having to be transformed in order to merge with the other sensory data (in this case, the camera sensor’s data). Secondly, and relating to only independent Claims 1 and 16 (and dependent Claims 2-5, 7-11, 17-20, and 22-26), which now require “estimate/-ing a width, length, or both of the one or more objects based on a bounding box in the first camera frame encapsulating each of the one or more objects”; while Steinmeyer discusses clustering, classifying, grouping, and labeling objects during segmentation, assigning measurement/dimensional attributes to these objects, as well as the utilization of bounding boxes, each as already cited to above, it is not entirely clear from Steinmeyer if the bounding boxes mentioned actually encapsulate each of the one or more objects. However, specifically using bounding boxes to encapsulate classified objects as part of the segmentation process is taught by Ozdemir (“These steps include:…(b) segmentation for differentiation of different structures in the image…(c) structure/ROI (Region of Interest) analysis, in which a detected region is analyzed individually for special characteristics, which can include compactness, form, size and location”, Paragraph 5, “An example of segmentation results is shown in FIGS. 6-8. FIG. 6 demonstrates an example of segmentation on a test image, showing the output of a traditional network where the final thresholded binary image (640 below) has one true positive and one false positive nodule candidate. As shown, the first, leftmost image frame 610 shows a preprocessed (step 310 in FIG. 3), 2D representation of a slice in which an organ (e.g. the patient's lung) 612. The interior wall of the lung 612 includes a small inward protuberance, highlighted by a box 614…In this example, there is one true positive enclosed with a box 642 and one false positive enclosed with a dashed box 644”, Paragraph 50, “According to FIGS. 7 and 8, the same test image (610 in FIG. 6) is passed through the processes of the illustrative Bayesian neural network herein, and the mean and variance segmentation outputs are presented. FIG. 7 particularly shows an image 700 characterizing the segmentation probability mean. A true positive of the nodule described above is enclosed with a box 710. From this image 700, it is clear that the false positive 644 (of FIG. 6) is no longer visible”, Paragraph 52). It would have been obvious to one of ordinary skill in the art at the time of filing to have further modified the disclosure of Steinmeyer to include the estimation of the size of objects using bounding boxes that enclose them, as taught by Ozdemir, so that they may be further analyzed in further processing steps such as evaluation/classification steps that may make classification decisions based on size comparisons of the detected object and known sizes of candidate object classifications. 
Thirdly, and relating to only independent Claims 12 and 27 (and dependent Claims 13-15 and 28-30), which mention the term “encoder-decoder network”, it should be noted that this term under Broadest Reasonable Interpretation (BRI) could be encompassing many different types of neural networks, and Office takes Official Notice to the following indisputably old and well known in the art details about neural networks that are relevant to this term: (a) an encoder-decoder is synonymous with the term autoencoder; (b) folded neural networks include folded autoencoders; (c) an encoder followed by a decoder may also be known as a semantic segmentation architecture; (d) in semantic segmentation the encoder is generally a pre-trained classification network while the decoder generally projects features learned by the encoder onto a higher resolution pixel space to get dense classification; (e) many semantic segmentation models are convolutional neural networks (meaning there are no fully connected layers); and (f) the SegNet model used for semantic segmentation is a common convolutional encoder-decoder neural network. Steinmeyer never specifically utilizes the term “encoder-decoder network” while more generally discussing the use of various types of neural networks and both suitable algorithms and neural networks in general based on which tasks are to be accomplished with them, but Steinmeyer (in the citations provided above with regards to the encoder-decoder network limitation of Claims 12 and 27) clearly covers the use of convolutional neural networks, folded neural networks, pixel-accurate segmentation of image data, and classifying/clustering/applying identifiers/using instances. Based on at least the above provided citations present within Steinmeyer and the Official Notice elements described above, it would have been obvious to one of ordinary skill in the art at the time of filing to achieve the functionality of the more generally described convolutional/folded neural network terms in Steinmeyer by specifically using an “encoder-decoder network”, as Office takes Official Notice that this kind of neural network has been commonly and effectively used for image processing in the past. As merely evidence to the above Official Notice, (a) evidencing reference Wang describes a folded neural network “autoencoder” for image reconstruction (“A Folded Neural Network Autoencoder for Dimensionality Reduction”, Title, “The implementation-level framework of folded autoencoder is illustrated in Fig.5. The black solid line represents the direction that data propagates and red dashed line represents the direction that error propagates. When autoencoder works in “encoding” mode, original image data propagates from input layer to code layer, where original image is reduced to low-dimensional codes. Whereas in “decoding” mode data “flows” in the reverse direction: low-dimensional codes are expanded layer by layer and eventually mapped to a high-dimensional space with the same dimensionality of original data, where reconstructed image is obtained”, Paragraph 3 of 3. Folded autoencoder; also see Abstract), (b) evidencing reference Badrinarayanan describes a deep convolutional encoder-decoder neural network architecture for image segmentation (“SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”, Title, “Semantic segmentation has a wide array of applications ranging from scene understanding, inferring support relationships among objects to autonomous driving. 
Early methods that relied on low-level vision cues have fast been superseded by popular machine learning algorithms. In particular, deep learning has seen huge success lately in handwritten digit recognition, speech, categorising whole images and detecting objects in images. Now there is an active interest for semantic pixel-wise labelling. Our motivation to design SegNet arises from this need to map low resolution features to input resolution for pixel-wise classification. This mapping must produce features which are useful for accurate boundary localization”, Introduction Paragraph 1; also see Abstract), and (c) there are only a finite number of currently known and effective neural network architectures for image processing so one of ordinary skill in the art would consider the specific encoder-decoder network for this purpose as obvious-to-try, particularly in view of this type of encoder-decoder network only being mentioned for some, but not all, of the claims presented herein, leading one of ordinary skill in the art to understand that the specific type of neural architecture used for each function is not actually critical to the invention itself. Finally, in the case Applicant traverses the Official Notice presented above as evidenced by Wang, Badrinarayanan, and/or an obvious-to-try obviousness-based rationale, it would also be possible to consider these claims rejected using one or more of these as modifying instead of evidencing in an additional/alternative 35 USC 103 rejection that actually modifies the neural network/-s of Steinmeyer to specifically be an encoder-decoder network for the same reasons already presented above. Regarding Claims 3 and 18, Steinmeyer as modified by Ozdemir renders obvious the method of Claim 1 and the on-board computer of Claim 16, respectively, and Steinmeyer further discloses that converting the first camera feature map, the first radar feature map, or both to the common spatial domain comprises converting the first camera feature map to the common spatial domain, and converting the first camera feature map to the common spatial domain comprises performing an explicit inverse perspective mapping transformation on the first camera feature map (“By using the calculated optical flow, the 3D-checkpoints are synchronized with the camera images”, Paragraph 39, “With help of the optical flow, the entire camera image can be converted into the instant of time of the measurement of the 3D-sensor. Subsequently, 3D-checkpoints can be projected from the depth-measuring sensor into the camera image. For this purpose, the pixels can be treated, for example, as infinitely long rays that intersect with the 3D-checkpoints”, Paragraph 43, “The coordinate systems of the camera, 3D-sensor, image and ego vehicle are closely linked. Since the ego vehicle moves relative to the world coordinate system, the following four transformations are defined between the coordinate systems: T.sub.V←W(t) is the transformation that transforms a 3D-point in the world coordinate system into the 3D-coordinate system of the ego vehicle. This transformation depends on the time t, since the ego vehicle moves over time. T.sub.S←V is the transformation that transforms a 3D-point in the 3D-coordinate system of the ego vehicle into the 3D-coordinate system of the 3D-sensor. T.sub.C←V is the transformation that transforms a 3D-point in the 3D-coordinate system of the ego vehicle into the 3D-coordinate system of the camera. 
P.sub.I←C is the transformation that projects a 3D-point in the 3D-coordinate system of the camera into the 2D image coordinate system”, Paragraphs 101-105, “Equation (16) establishes a relationship between the measurements from the camera and the measurements from the 3D-sensor. If the world point is well-defined in world coordinates, the times to, t.sub.1 and t.sub.2 as well as the image coordinates in two camera images and the measurement of the 3D-sensor are known, then equation (16) establishes a complete relationship, i.e. there are no unknown quantities”, Paragraph 122). Regarding Claims 4 and 19, Steinmeyer as modified by Ozdemir renders obvious the method of Claim 1 and the on-board computer of Claim 16, respectively, and Steinmeyer further discloses that converting the first camera feature map, the first radar feature map, or both to the common spatial domain comprises converting the first camera feature map to the common spatial domain, and converting the first camera feature map to the common spatial domain occurs during performing the camera feature extraction process (“In a first step, camera images are recorded 20. 3D-checkpoints are also recorded by at least one 3D-sensor 21…The camera images are then fused 23 with the 3D-checkpoints by a data fusion circuit to form data of a virtual sensor. Here, an optical flow is determined which is used to synchronize image points and 3D-checkpoints. The 3D-checkpoints can be expanded to include attributes from at least one of the camera images”, Paragraph 59, also see the method of Fig. 2 (specifically, step 23)). Regarding Claims 8 and 23, Steinmeyer as modified by Ozdemir renders obvious the method of Claim 1 and the on-board computer of Claim 16, respectively, but Steinmeyer generally remains silent regarding specifically perform/-ing the camera feature extraction process on a second camera frame of the plurality of camera frames to generate a second camera feature map; perform/-ing the radar feature extraction process on a second radar frame of the plurality of radar frames to generate a second radar feature map; convert/-ing the second camera feature map to the common spatial domain to generate a converted camera feature map, the second radar feature map to the common spatial domain to generate a converted radar feature map, or both; and concatenate/-ing the converted second radar feature map, the converted second camera feature map, or both to generate a second concatenated feature map, wherein detecting the one or more objects is further based on the second concatenated feature map. However, this is merely the exact duplication of the previously cited to steps of (per the prior art rejection of independent Claims 1 and 16 shown above): extracting features from a first camera frame of the plurality of camera frames to generate a first camera feature map, extracting features from a first radar frame of the plurality of radar frames to generate a first radar feature map, convert/-ing the first camera feature map and/or the first radar feature map to a common spatial domain; and concatenate/-ing the converted first radar feature map and the converted first camera feature map to generate a first concatenated feature map, wherein detecting the one or more objects is based on the first concatenated feature map. 
As such, it is merely a matter of obvious design choice to duplicate an already known process (for example, repeating the process over time) in order to improve the accuracy of the feature extraction processes, especially when trying to detect one or more objects that may be moving over time or may be present in one or more feature maps at one point in time but missing in one or more feature maps at another point in time. Regarding Claims 9 and 24, Steinmeyer as modified by Ozdemir renders obvious the method of Claim 1 and the on-board computer of Claim 16, respectively, and Steinmeyer further discloses/renders obvious that the radar sensor and the camera sensor are collocated in a shared housing in the host vehicle (i.e. the shared housing may be considered the body of the host vehicle itself as shown in Fig. 6, which clearly includes at least camera 61 and radar 62; regardless, collocating multiple sensors, even of different types, is undeniably known in the art and it would be merely a matter of obvious design choice to do so, as Applicant’s own specification describes collocation and non-collocation as equal options and has not provided any reasoning for one option being advantageous of the other (Although FIG. 1 illustrates an example in which the radar component and the camera component are collocated components in a shared housing, as will be appreciated, they may be separately housed in different locations within the vehicle 100”, Paragraph 29 of the Specification)). Regarding Claims 13 and 28, Steinmeyer as modified by Official Notice (and as evidenced by or further modified by one or more of (a) Wang, (b) Badrinarayanan, and/or (c) obvious-to-try design choice) renders obvious the method of Claim 12 and the on-board computer of Claim 27, respectively, and Steinmeyer further discloses provide/-ing the first combined feature map to a neural network (“at least one of the camera images can be segmented 22, e.g. through a neural network”, Paragraph 59, “The device 40 also…has a segmenting device 42 for segmenting at least one camera image and, respectively, a camera image I1, I2 enriched with further measurements, e.g. by means of a neural network”, Paragraph 61, “Due to recent advances in image processing using “[Deep] Convolutional Neural Networks (CNN)” ([deep] folded neural networks), pixel-accurate segmentation of images is possible with the appropriate computing power. If at least one of the camera images is segmented by such a neural network, the 3D-checkpoints can also be expanded to include the class(es) resulting from the segmentation and the associated identifier”, Paragraph 81, “synchronized camera image that has been expanded to include measurement attributes is now classified with a classifier or segmenting device 42, for example with a folded neural network”, Paragraph 93). Claims 5, 7, 10-11, 20, 22, 25-26, and 29-30 are rejected under 35 U.S.C. 103 as being obvious over Steinmeyer in view of Ozdemir, further in view of Rust (US 2018/0341263, filed 25 May 17). Claims 14-15 and 29-30 are rejected under 35 U.S.C. 103 as being obvious over Steinmeyer in view of Official Notice (as evidenced by one or more of (a) Wang, (b) Badrinarayanan, and/or (c) obvious-to-try design choice), further in view of Rust. Additionally/alternatively, Claims 14-15 and 29-30 are rejected under 35 U.S.C. 
103 as being obvious over Steinmeyer in view of Official Notice, further in view of one or more of (a) Wang, (b) Badrinarayanan, and/or (c) obvious-to-try design choice, and further in view of Rust. Regarding Claims 5 and 20, Steinmeyer as modified by Ozdemir renders obvious the method of Claim 1 and the on-board computer of Claim 16, respectively, and while Steinmeyer discusses the optical flow calculations between two camera images (“calculating an optical flow from at least a first camera image and a second camera image; and determining in at least one of the camera images and on the basis of the optical flow pixels to be assigned to one of the 3D-checkpoints at an instant of time of the measurement”, Paragraphs 37-38), Steinmeyer generally remains silent regarding, but Rust teaches hash/-ing a plurality of blocks of the first camera frame to identify one or more blocks that have not changed between a previous camera frame of the plurality of camera frames and the first camera frame; and copy/-ing feature map values of a second camera feature map of the previous camera frame to corresponding feature map values of the first feature map (“the static scene alignment module 414 is configured to register the point clouds 416 based on visual odometry methods, in which two frames are compared and the difference between them is minimized. Such methods are able to remove error from inertial sensors, as well as to build a high resolution local map”, Paragraph 67, “Clusters of moving data points 413a,b that correspond to the same object in real space are identified in the matching step 508. A registration or matching algorithm is run in step 508 to derive a spatial transformation from a reference cluster of moving data points 413a to a target cluster of moving data points 413b. The registration or matching algorithm may be an iterative closest point algorithm or a mesh matching algorithm in exemplary embodiments. The matching step 508 is carried out through the object matching module 418 and produces transformation data 420. The method 500 includes a step 510 of determining distance moved d.sub.1 . . . d.sub.n of each cluster identified as being corresponding in step 508. In particular, the transformation data 420 provides a spatial relationship between clusters of moving data points 413a,413b that have moved in the position aligned static scenes 416 constituting a static frame of reference. Such a spatial relationship allows a distance parameter d.sub.1 . . . d.sub.n to be derived in scalar or vector form. The step 510 of determining distance moved d.sub.1 . . . d.sub.n is carried out through the distance module 422”, Paragraphs 85-86). It would have been obvious to one of ordinary skill in the art at the time of filing to have further modified the method and onboard computer of Steinmeyer to hash a plurality of blocks of the first camera frame to identify one or more blocks that have not changed between a previous camera frame of the plurality of camera frames and the first camera frame and to copy feature map values of a second camera feature map of the previous camera frame to corresponding feature map values of the first feature map, as taught by Rust, in order to reduce the processing required for determining significant features in the images taken by the camera, as would occur when certain matching features between two sequential images show zero relative change/movement (i.e. zero optical flow). 
Regarding Claims 7 and 22, Steinmeyer as modified by Ozdemir renders obvious the method of Claim 1 and the on-board computer of Claim 16, respectively, and while Steinmeyer generally remains silent regarding, Rust teaches that the width, length, or both of the one or more objects is estimated based at least in part on a make, model, or both of the one or more objects (“Grouping can be based on similar shapes between recent point clouds 416. Grouping can be based on shapes in a predetermined obstacle set. For example, a member of this obstacle set could be a particular vehicle model which tend to look the same no matter where and when you see them”, Paragraph 69). It would have been obvious to one of ordinary skill in the art at the time of filing to have further modified the method and onboard computer of Steinmeyer so that the width, length, or both of the one or more objects is estimated based at least in part on a make, model, or both of the one or more objects, as taught by Rust, in order to enable object classification besides just object tracking, and to potentially reduce processing needed if a make and/or model of an object can be identified, since the size and shape of that object can be looked up in a table if the make and/or model are already known regardless of the quality of the entire data received by a camera and/or a radar. Regarding Claims 10-11, 14-15, 25-26, and 29-30, Steinmeyer as modified by Ozdemir renders obvious the method of Claim 1 (per Claim 10) and the on-board computer of Claim 16 (per Claim 25), and Steinmeyer as modified by Official Notice (and as evidenced by or further modified by one or more of (a) Wang, (b) Badrinarayanan, and/or (c) obvious-to-try design choice) renders obvious the method of Claim 12 (per Claim 14) and the on-board computer of Claim 27 (per Claim 29), and while Steinmeyer discusses the use of virtual sensor data for autonomous driving operations (“The concept of a virtual sensor is introduced as part of a preprocessing step for the evaluation of sensor data, in particular in the context of object tracking. This merges the measurement data of the camera and 3D-sensors on an earlier measurement point level and thus abstracts the individual sensors. During the subsequent object tracking the resulting data from the virtual sensor may be clustered into object hypotheses with high quality, as they contain extensive information in order to separate different classes. The solution according to the present teachings prevents the object hypotheses of different sensors with systematic errors per time from being merged in a common model, with association errors easily occurring. This enables a robust perception of the surroundings, which allows highly automated and autonomous driving functions”, Paragraph 35), Steinmeyer generally remains silent regarding, but Rust teaches perform/-ing an autonomous driving operation based on detecting the one or more objects (per Claims 10 and 14) / trigger an autonomous driving operation based on detecting the one or more objects (per Claims 25 and 29) (“Exemplary uses of the velocity parameters by the autonomous driving system 70 include inference of the future motion of identified objects. Such inference may involve use of a Kalman filter that assumed a pre-determined movement model, or a generative model that has been trained on how similar looking obstacles have moved in the past e.g. pedestrians on this crosswalk tend to ignore the light etc. 
Based on the inferred future motion, the autonomous driving system 70 can generate one or more autonomous driving commands taking into account probable future motion of identified object”, Paragraph 78), wherein the autonomous driving operation is one or more of braking, accelerating, steering, adjusting a cruise control setting, or signaling (per Claims 11, 15, 26, and 30, dependent on Claims 10, 14, 25, and 29, respectively) (“The actuator system 30 includes one or more actuator devices 42a-42n that control one or more vehicle features such as, but not limited to, the propulsion system 20, the transmission system 22, the steering system 24, and the brake system 26”, Paragraph 39). It would have been obvious to one of ordinary skill in the art at the time of filing to have further modified the method and onboard computer of Steinmeyer to trigger and/or perform an autonomous driving operation based on detecting the one or more objects, wherein the autonomous driving operation is one or more of braking, accelerating, steering, adjusting a cruise control setting, or signaling, as taught by Rust, in order to ensure the safety of the occupants in one or both of the host vehicle and/or the one or more objects, since it is possible that the host vehicle may otherwise collide with the detected one or more objects without one or more of these types of old and well known in the art autonomous interventions (that each relate to collision avoidance). It should be further noted that regarding Claims 25 and 29, “triggering” an autonomous operation could merely be describing an intended use, as these claims do not have any limitations anywhere describing any one or more actuators that may later potentially execute said autonomous operation (assuming these one or more actuators actually successfully receive what was merely used to provide the “trigger”), and even if they did, these one or more actuators would certainly not be part of the claimed on-board computers of Claims 16 and 27. As such, these particular claims merely mentioning an intended use of the detected one or more objects (i.e. to “trigger” an autonomous operation) do not appear to contain significant patentable weight as currently claimed. Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to IG T AN whose telephone number is (571)270-5110. The examiner can normally be reached M - F: 10:00AM- 4:00PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. 
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aniss Chad can be reached at (571) 270-3832. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. IG T AN Primary Examiner Art Unit 3662 /IG T AN/Primary Examiner, Art Unit 3662
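
For orientation, here is a minimal sketch (in PyTorch) of the early-fusion flow recited in independent Claims 1 and 16: per-sensor feature extraction, conversion of the camera features toward a common spatial domain, concatenation, and a single detection pass on the fused map. The layer choices, tensor shapes, and the plain resampling used as a stand-in for the claimed spatial-domain conversion are hypothetical illustrations of the claim language, not Applicant's or Steinmeyer's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyFusionDetector(nn.Module):
    """Illustrative only: fuse camera and radar feature maps before any detection runs."""

    def __init__(self, cam_ch=3, radar_ch=2, feat_ch=32, num_classes=4):
        super().__init__()
        # Per-sensor feature extraction (the claimed camera and radar feature maps)
        self.cam_backbone = nn.Sequential(nn.Conv2d(cam_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.radar_backbone = nn.Sequential(nn.Conv2d(radar_ch, feat_ch, 3, padding=1), nn.ReLU())
        # One detection head, applied only to the concatenated map
        self.head = nn.Conv2d(2 * feat_ch, num_classes, 1)

    def forward(self, cam_frame, radar_frame):
        cam_feat = self.cam_backbone(cam_frame)        # first camera feature map
        radar_feat = self.radar_backbone(radar_frame)  # first radar feature map
        # "Convert to a common spatial domain": here, simply resample the camera
        # features onto the radar grid as a placeholder for the claimed projection.
        cam_feat = F.interpolate(cam_feat, size=radar_feat.shape[-2:],
                                 mode="bilinear", align_corners=False)
        fused = torch.cat([cam_feat, radar_feat], dim=1)  # first concatenated feature map
        return self.head(fused)  # object detection runs once, on the fused map only

# Hypothetical shapes: one RGB camera frame and one 2-channel radar frame per time step.
model = EarlyFusionDetector()
scores = model(torch.randn(1, 3, 64, 64), torch.randn(1, 2, 32, 32))
```

A real system would replace the plain resize with calibrated coordinate transforms of the kind Steinmeyer describes (T.sub.V←W(t), T.sub.S←V, T.sub.C←V, P.sub.I←C), for example an inverse perspective mapping that places the camera features in the radar sensor's spatial domain as recited in Claims 2, 3, 17, and 18.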

Prosecution Timeline

Jan 18, 2024
Application Filed
Jul 03, 2025
Non-Final Rejection — §103, §DP
Sep 25, 2025
Response Filed
Oct 22, 2025
Final Rejection — §103, §DP (current)

Precedent Cases

Applications with similar technology granted by this same examiner

Patent 12594902 • VEHICLE WITH CONTROLLED HOOD MOVEMENT • Granted Apr 07, 2026 (2y 5m to grant)
Patent 12592171 • VEHICULAR DRIVING ASSIST SYSTEM WITH HEAD UP DISPLAY • Granted Mar 31, 2026 (2y 5m to grant)
Patent 12592067 • EARLY WARNING METHOD FOR ANTI-COLLISION, VEHICLE MOUNTED DEVICE AND STORAGE MEDIUM • Granted Mar 31, 2026 (2y 5m to grant)
Patent 12584745 • DYNAMIC EASYROUTING UTILIZING ONBOARD SENSORS • Granted Mar 24, 2026 (2y 5m to grant)
Patent 12572144 • GENERATING ENVIRONMENTAL PARAMETERS BASED ON SENSOR DATA USING MACHINE LEARNING • Granted Mar 10, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the five most recent grants above.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 56%
With Interview: 82% (+26.1%)
Median Time to Grant: 3y 8m
PTA Risk: Moderate
Based on 523 resolved cases by this examiner. Grant probability derived from career allow rate.
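
As a rough worked example of how these projections can be read together (the additive combination below is an assumption, not the tool's documented model), the 82% figure is consistent with adding the +26.1% interview lift to the 56% career allow rate:

```python
career_allow_rate = 292 / 523                     # ≈ 0.558, shown as 56%
interview_lift = 0.261                            # observed lift for this examiner
with_interview = min(career_allow_rate + interview_lift, 1.0)
print(f"{career_allow_rate:.0%} base, {with_interview:.0%} with interview")
# -> "56% base, 82% with interview"
```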
