DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Receipt is acknowledged of the claim amendments and associated arguments/remarks received January 20, 2026. Claims 1-5, 7-13, 15-21, and 23-24 are pending, of which claims 1, 7, 9, 15, 17, and 23 were amended. Claims 6, 14, and 22 were cancelled.
Response to Arguments
Applicant’s arguments, see Remarks, p. 11, filed 01/20/2026, with respect to the rejection of claims 1-4, 8-12, 16-20, and 24 under 35 U.S.C. § 101 have been fully considered and, in light of the associated amendment, are persuasive. Therefore, the rejection has been withdrawn.
Applicant’s arguments, see Remarks, pp. 11-13, filed 01/20/2026, with respect to the rejections of claims 1, 3, 6, 9, 11, 14, 17, 19, and 22 under 35 U.S.C. have been fully considered and, based on the amendment that changed the scope and interpretation of the claims, are persuasive. Therefore, the rejections have been withdrawn. However, upon further consideration, a new ground of rejection is made under 35 U.S.C. § 103 over Bongio Karrman et al (US 2021/0027207, hereinafter “Karrman et al”) in view of Liang et al (US 2020/0025931). Regarding independent claims 1, 9, and 17, the applicant’s argument that Liang et al does not teach the second modality imaging system or determining intermediate camera features and second intermediate 3D point cloud features is not persuasive. In particular, Liang et al states: “Our overall architecture includes two streams, with one stream extracting image features and another one extracting features from LIDAR BEV. The continuous fusion layer can be configured to bridge multiple intermediate layers on both sides in order to perform multi-sensor fusion at multiple scales. This architecture facilitates generation of the final detection results in BEV space, as desired by many autonomous driving application domains.” (Liang et al, Figure 9 and ¶ [0140]). The structure of Liang et al Fig. 9 demonstrates that there are multiple intermediate layers for both sensor data streams, and the applicant has not explained how the claimed intermediate features differ from those of the prior art. Respectfully, the argument is not persuasive.
Applicant’s arguments, see Remarks, pp. 13-16, filed 01/20/2026, with respect to the rejections of dependent claims 2-5, 7-8, 10-13, 15-16, 18-21, and 23-24 under 35 U.S.C. §§ 102, 103 have been fully considered; these arguments depend on the arguments for claims 1, 9, and 17 discussed above. Respectfully, the arguments are not persuasive for the same reasons.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 7, 9, 11, 15, 17, 19, 23 are rejected under 35 U.S.C. 103 as being unpatentable over Bongio Karrman et al (US 2021/0027207, cited in Non-Final 10/22/2025, hereinafter “Karrman et al”) in view of Liang et al (US 2020/0025931, cited in Non-Final 10/22/2025).
Regarding Claim 1, Karrman et al teach a method (method 300 of using system 100 to train ML models for image processing; Fig 1, 3 and ¶ [0023]-[0037], [0041]), comprising:
training a first modality imaging system (a first image processing ML model 106 is trained based on a common loss function; Fig 1, 3 and ¶ [0037]);
receiving first input data samples and second input data samples from the first modality imaging system and a second modality imaging system, respectively (a first image 102 and a second image 104, of a plurality of corresponding images, are received and can be of different modalities (the first image 102 may be a LiDAR image and the second image 104 may be an IR or visual spectrum (camera) image); Fig 1, 3 and ¶ [0024], [0042]), the first input data samples and the second input data samples being time-synchronized (the first image 102 is time aligned by immediately preceding the second image 104; Fig 1, 3 and ¶ [0043]),
wherein the first modality imaging system comprises a LIDAR-based system and the second modality imaging system comprises a camera-based system (the first image 102 may be generated from a LiDAR sensor and the second image 104 may be generated from a visual spectrum (camera) sensor; Fig 1 and ¶ [0015], [0017]);
processing the first input data samples in the first modality imaging system to generate a first output (the first image 102 is processed by depth estimation ML model 106 to generate estimated depth 116; Fig 1 and ¶ [0030]);
processing the second input data samples in the second modality imaging system to generate a second output (the second image frame 104 is processed by sensor pose difference ML model 108 to generate sensor pose difference 118; Fig 1 and ¶ [0031]), and
training the second modality imaging system based on the first output and the second output (the loss 126 generated by the difference between the second image frame 104 and the estimated second image frame 124 is used to train the ML model 108; Fig 1 and ¶ [0037]).
Karrman et al does not teach wherein processing the second input data samples in the second modality imaging system comprises determining intermediate camera features and second intermediate 3D point cloud features based on the second input data samples.
Liang et al is analogous art pertinent to the technological problem addressed in this application and teaches processing the second input data samples in the second modality imaging system (the camera stream image data is analyzed using a convolutional network; Fig 9,10 and ¶ [0140], [0142]) comprises determining intermediate camera features and second intermediate 3D point cloud features based on the second input data samples (the camera view image data is converted to corresponding image features of the 3D points of the 3D LiDAR point cloud data (¶ [0008]) based on a fusing process and both of the camera stream and the LiDAR BEV stream are performed over multiple intermediate layers on both sides to perform multi-sensor fusion at multiple scales; Fig 9 and ¶ [0140]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Karrman et al with Liang et al, including wherein processing the second input data samples in the second modality imaging system comprises determining intermediate camera features and second intermediate 3D point cloud features based on the second input data samples. By using fusion data between the camera and LiDAR, the autonomous vehicle CNN is trained with both data types to improve analysis using both modalities, thereby improving AV driving applications, as recognized by Liang et al (¶ [0015], [0140]).
Regarding Claim 3, Karrman et al in view of Liang et al teach the method of claim 1 (as described above), further comprising: receiving third input data samples from the second modality imaging system (Karrman et al, there may be additional N images to each model, and it is understood that training a model requires multiple input image data pairs, such that additional (third) input data is used for training the second ML model 108; ¶ [0041], [0044]); and processing the third input data samples in the second modality imaging system to generate third output (Karrman et al, the second image frame 104 (understood to be a different image frame than the first image, ¶ [0041], [0044]) is processed by sensor pose difference ML model 108 to generate sensor pose difference 118; Fig 1 and ¶ [0031]) based on a model for the second modality imaging system that is trained based on the first output of the first modality imaging system (Karrman et al, the ML model 108 is trained with loss 126, generated by comparing the second image frame 104 to the estimated second image frame 124, in which the estimated second image frame 124 is based on a comparison between the first image frame 102 and the second image frame 104; Fig 1 and ¶ [0033]-[0037]).
Regarding Claim 7, Karrman et al in view of Liang et al teach the method of claim 1 (as described above), wherein: processing the first input data samples in the first modality imaging system comprises determining first intermediate 3D point cloud features based on the first input data samples (Liang et al, LiDAR points are determined based on the LiDAR sensor data, and intermediate layers are used to determine intermediate features based on the previous layer input; Fig 9 and ¶ [0140]),
the method further comprising: training the second modality imaging system based on the first intermediate 3D point cloud features and the second intermediate 3D point cloud features (Liang et al, the CNNs can be trained based on the relationship data identified between the LiDAR and camera sensor data for interpolation and to scale the data; ¶ [0140]), including performing a comparison between the first intermediate 3D point cloud features and the second intermediate 3D point cloud features (Liang et al, the continuous fusion layer is configured to bridge multiple intermediate layers for the sensor data including the 3D object detection data from the LiDAR; ¶ [0140]).
Regarding Claim 9, Karrman et al teach a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations (memory 404 with instructions 424, executed on processor 402 in system 400; Fig 4 and ¶ [0068]) comprising: steps identical to claim 1 (as described above).
Regarding Claim 11, Karrman et al in view of Liang et al teach the non-transitory computer-readable medium of claim 9, wherein the operations further include: steps identical to claim 3 (as discussed above).
Regarding Claim 15, Karrman et al in view of Liang et al teach the non-transitory computer-readable medium of claim 9 (as discussed above), wherein the operations further include: steps identical to claim 7 (as discussed above).
Regarding Claim 17, Karrman et al teach an image capture device (computer system 400, which includes a cellular telephone (recognized to include image sensor(s) 421); Fig 4 and ¶ [0067]-[0068]), comprising: an image sensor (image sensor(s) 421; Fig 4 and ¶ [0068]); a memory storing processor-readable code (memory 404 with instructions 424, executed on processor 402; Fig 4 and ¶ [0068]); and at least one processor coupled to the memory and to the image sensor (processor 402 that is interlinked 408 with memory 404 and image sensor(s) 421; Fig 4 and ¶ [0068]), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations (memory 404 with instructions 424, executed on processor 402 in system 400; Fig 4 and ¶ [0068]) including: steps identical to claim 1 (as described above).
Regarding Claim 19, Karrman et al in view of Liang et al teach the image capture device of claim 17 (as discussed above), wherein the operations further include: steps identical to claim 3 (as discussed above).
Regarding Claim 23, Karrman et al in view of Liang et al teach the image capture device of claim 17 (as discussed above), wherein the operations further include: steps identical to claim 7 (as discussed above).
Claims 2, 4, 5, 10, 12, 13, 18, 20, 21 are rejected under 35 U.S.C. 103 as being unpatentable over Bongio Karrman et al (US 2021/0027207, hereinafter “Karrman et al”) in view of Liang et al (US 2020/0025931) and Caesar (US 2020/0272854, cited in Non-Final 10/22/2025).
Regarding Claim 2, Karrman et al in view of Liang et al teach the method of claim 1 (as described above), including wherein training the first modality imaging system (Karrman et al, a first image processing ML model 106 is trained based on a common loss function; Fig 1, 3 and ¶ [0037]) comprises: receiving third input data samples from the first modality imaging system (Karrman et al, there may be additional N images to each model, and it is understood that training a model requires multiple input image data pairs, such that additional (third) input data is used for training the first ML model 106; ¶ [0041], [0044]).
Karrman et al in view of Liang et al does not explicitly teach determining a model for the first modality imaging system based on the first input data samples and a first ground truth corresponding to the first input data sample (examiner notes Liang et al does teach bounding shape detection and ground truth data ¶ [0220] but not in the configuration described for the cited intermediate layers ¶ [0140]).
Caesar is analogous art pertinent to the technological problem addressed in this application and teaches determining a model for the first modality imaging system based on the first input data samples and a first ground truth corresponding to the first input data sample (a machine learning model is selected for a given function of the input image data to generate a predicted labeled bounding boxes where the predicted labeled bounding box is based on a ground truth for comparison; Fig 1, 13, 18 and ¶ [0013], [0148]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Karrman et al in view of Liang et al with Caesar, including determining a model for the first modality imaging system based on the first input data samples and a first ground truth corresponding to the first input data sample. By using multiple active machine learning models, an autonomous vehicle may quickly and efficiently implement image analysis and predict a trajectory with confidence, thereby improving the predictions, as recognized by Caesar (¶ [0004]-[0006]).
Regarding Claim 4, Karrman et al in view of Liang et al teach the method of claim 3 (as described above).
Karrman et al in view of Liang et al does not teach wherein the third output comprises at least one bounding box corresponding to objects detected in the third input data samples.
Caesar is analogous art pertinent to the technological problem addressed in this application and teaches wherein the third output comprises at least one bounding box corresponding to objects detected in the third input data samples (in the perception module 402, an object detector CNN is used to output an image or point cloud that includes bounding boxes surrounding the detected objects; Fig 1, 4 and ¶ [0100]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Karrman et al in view of Liang et al with Caesar, including wherein the third output comprises at least one bounding box corresponding to objects detected in the third input data samples. By using bounding boxes with labels and confidence scores regarding object detection, the prediction confidence may be quickly determined, thereby improving the multiple machine learning models during training, resulting in improved model accuracy, as recognized by Caesar (¶ [0004]-[0006]).
Regarding Claim 5, Karrman et al in view of Liang et al and Caesar teach the method of claim 4 (as described above), further comprising operating a vehicle based on the at least one bounding box (Caesar, the bounding box data of the detected object, identified by the perception module 402, is used, along with additional data from database module 410, by the planning module 404 to determine the AV position 418 and calculate a path for the AV; Fig 1, 4 and ¶ [0100]-[0101]).
Regarding Claim 10, Karrman et al in view of Liang et al teach the non-transitory computer-readable medium of claim 9 (as discussed above), wherein the operations further include: steps identical to claim 2 (as discussed above).
Regarding Claim 12, Karrman et al in view of Liang et al teach the non-transitory computer-readable medium of claim 11 (as discussed above), wherein the operations further include: steps identical to claim 4 (as discussed above).
Regarding Claim 13, Karrman et al in view of Liang et al teach the non-transitory computer-readable medium of claim 12 (as discussed above), wherein the operations further include: steps identical to claim 5 (as discussed above).
Regarding Claim 18, Karrman et al in view of Liang et al teach the image capture device of claim 17 (as discussed above), wherein the operations further include: steps identical to claim 2 (as discussed above).
Regarding Claim 20, Karrman et al in view of Liang et al teach the image capture device of claim 19 (as discussed above), wherein the operations further include: steps identical to claim 4 (as discussed above).
Regarding Claim 21, Karrman et al in view of Liang et al teach the image capture device of claim 20 (as discussed above), wherein the operations further include: steps identical to claim 5 (as discussed above).
Claims 8, 16, 24 are rejected under 35 U.S.C. 103 as being unpatentable over Bongio Karrman et al (US 2021/0027207, hereinafter “Karrman et al”) in view of Liang et al (US 2020/0025931) and Wekel et al (US 2021/0063578, cited in Non-Final 10/22/2025).
Regarding Claim 8, Karrman et al in view of Liang et al teach the method of claim 1 (as described above).
Karrman et al in view of Liang et al does not teach wherein: the first output comprises a first plurality of bounding boxes corresponding to first objects in a scene, and the second output comprises a second plurality of bounding boxes corresponding to second objects in a scene, and training the second modality imaging system comprises training the second modality imaging system with a subset of the first and second objects, the objects of the subset being in both the first plurality of bounding boxes and the second plurality of bounding boxes (examiner notes Liang et al does teach bounding shape detection and ground truth data ¶ [0220] but not in the configuration described for the cited intermediate layers ¶ [0140]).
Wekel et al is analogous art pertinent to the technological problem addressed in this application and teaches wherein: the first output comprises a first plurality of bounding boxes corresponding to first objects in a scene (bounding boxes are used to identify objects in the LiDAR image data; Fig 1-3 and ¶ [0045]), and the second output comprises a second plurality of bounding boxes corresponding to second objects in a scene (bounding boxes are used to identify objects in camera image data; Fig 1-3 and ¶ [0045]), and training the second modality imaging system comprises training the second modality imaging system with a subset of the first and second objects, the objects of the subset being in both the first plurality of bounding boxes and the second plurality of bounding boxes (training the DNN with image data may be based on multiple sensor data (i.e., camera sensor and LiDAR sensor data) and filtering may be performed, which may change the number, shape, and location of the bounding boxes used as the ground truth data for training; Fig 1-3 and ¶ [0045], [0048]-[0049], [0061]-[0066]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Karrman et al in view of Liang et al with Wekel et al, including wherein: the first output comprises a first plurality of bounding boxes corresponding to first objects in a scene, and the second output comprises a second plurality of bounding boxes corresponding to second objects in a scene, and training the second modality imaging system comprises training the second modality imaging system with a subset of the first and second objects, the objects of the subset being in both the first plurality of bounding boxes and the second plurality of bounding boxes. By training a DNN with a combined point cloud segmentation and bounding box regression network that may adjust the data used for training, the training data set may be modified to provide improved data for the DNN, including reducing and repurposing ground truth data, thereby improving a DNN that may be used to enable safe planning and control of an autonomous vehicle, as recognized by Wekel et al (¶ [0004]-[0005]).
Regarding Claim 16, Karrman et al in view of Liang et al teach the non-transitory computer-readable medium of claim 9 (as discussed above), wherein the operations further include: steps identical to claim 8 (as discussed above).
Regarding Claim 24, Karrman et al in view of Liang et al teach the image capture device of claim 17 (as discussed above), wherein the operations further include: steps identical to claim 8 (as discussed above).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sakic et al (Camera-LiDAR Object Detection and Distance Estimation with Application in Collision Avoidance System, cited in Non-Final 10/22/2025) teaches the use of LiDAR and camera fusion to generate a plurality of bounding boxes representing objects in the scene and fusing the LiDAR and camera data with the plurality of bounding boxes.
Bhanushali et al (LiDAR-Camera Fusion for 3D Object Detection, cited in Non-Final 10/22/2025) also teaches the use of LiDAR and camera fusion to generate a plurality of bounding boxes representing objects in the scene and fusing the LiDAR and camera data with the plurality of bounding boxes.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHLEEN M BROUGHTON whose telephone number is (571)270-7380. The examiner can normally be reached Monday-Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Villecco can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KATHLEEN M BROUGHTON/Primary Examiner, Art Unit 2661