DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on April 27, 2026 has been entered.
Response to Amendment
Receipt is acknowledged of the claim amendments with associated arguments/remarks, received April 27, 2026. Claims 1-5, 7-13, 15-21, and 23-24 are pending, all of which were amended. Claims 6, 14, and 22 were cancelled.
Response to Arguments
Applicant’s arguments, see Remarks, pg 12-15, filed 04/27/2026, with respect to the rejections of claims 1, 3, 7, 9, 11, 15, 17, 19, 23 under 35 U.S.C. § 103 have been fully considered but are not persuasive. In particular, with regard to the independent claims, the applicant amended the claim limitations that required reliance on the secondary reference Liang et al pertaining to the 3D point cloud features. Applicant’s arguments with respect to independent claims 1, 9, 17 have been considered but are moot because the new ground of rejection does not rely on the Liang reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant’s arguments, see Remarks, pg 15-16, filed 04/27/2026, with respect to the rejections of dependent claims 3, 7, 11, 15, 19, 23 under 35 U.S.C. § 103 have been fully considered and depend upon the arguments directed to the secondary reference Liang. Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on the Liang reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant’s arguments, see Remarks, pg 16, filed 04/27/2026, with respect to the rejections of dependent claims 2, 4-5, 8, 10, 12-13, 16, 18, 20-21, 23-24 under 35 U.S.C. § 103 have been fully considered and depend upon the arguments for claims 1, 9, 17 as discussed above. Respectfully, the arguments are not persuasive for the same reasons.
All arguments were addressed.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 3, 7, 9, 11, 15, 17, 19, 23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bongio Karrman et al (US 2021/0027207, hereinafter “Karrman et al”, previously cited in Final Rejection – 02/25/2026).
Regarding Claim 1, Karrman et al teach a method (method 300 of using system 100 to train ML models for image processing; Fig 1, 3 and ¶ [0023]-[0037], [0041]), comprising:
training a first modality imaging system (a first image processing ML model 106 is trained based on a common loss function; Fig 1, 3 and ¶ [0037]);
receiving first input data samples and second input data samples from the first modality imaging system and a second modality imaging system, respectively (a first image 102 and a second image 104, of a plurality of corresponding images, are received and can be of different modalities (the first image 102 may be a LiDAR image and the second image 104 may be an IR or visual spectrum (camera) image); Fig 1, 3 and ¶ [0024], [0042]), the first input data samples and the second input data samples being time-synchronized (the first image 102 is time aligned by immediately preceding the second image 104; Fig 1, 3 and ¶ [0043]),
wherein the first modality imaging system comprises a LIDAR-based system and the second modality imaging system comprises a camera-based system (the first image 102 may be generated from a LiDAR sensor and the second image 104 may be generated from a visual spectrum (camera) sensor; Fig 1 and ¶ [0015], [0017]);
processing the first input data samples in the first modality imaging system to generate a first output (the first image 102 is processed by depth estimation ML model 106 to generate estimated depth 116; Fig 1 and ¶ [0030]),
wherein processing the first input data samples in the first modality imaging system to generate the first output comprises determining first intermediate LIDAR features (applicant describes the intermediate features based on data as it is encoded, specification ¶ [0090]) based on the first input data samples (the first image 102 is input to a first neural network (CNN) machine learning model 106 with an autoencoder that includes an encoder to encode the data (determining first intermediate LiDAR features) based on convolutional features; Fig 1 and ¶ [0024]-[0030]);
processing the second input data samples in the second modality imaging system to generate a second output (the second image frame 104 is processed by sensor pose difference ML model 108 to generate sensor pose difference 118; Fig 1 and ¶ [0031]),
wherein processing the second input data samples in the second modality imaging system to generate the second output comprises determining intermediate camera features and second intermediate LIDAR features (applicant describes the intermediate features based on data as it is encoded, specification ¶ [0090]-[0091]) based on the second input data samples (the second image 104 and the first image 102 are input to a second neural network (CNN) machine learning model 108 with an autoencoder that includes an encoder to encode the data (determining first intermediate LiDAR features); Fig 1 and ¶ [0024]-[0029], [0031]-[0033]),
wherein the second intermediate LIDAR features are different than (applicant describes and claims the intermediate features being different broadly, specification ¶ [0065]-[0067]) the intermediate camera features (the sensor for the first image and the sensor for the second image may be from different perspectives based on intrinsic and extrinsic parameters; ¶ [0015]); and
training the second modality imaging system based on the first output, the second output, the first intermediate LIDAR features, and the second intermediate LIDAR features (the loss 126, generated from the difference between the second image frame 104 and the estimated second image frame 124, is used to train the ML model 108, where the estimated second image frame 124 is based on the first image frame 102, the estimated depth 116 (based on the intermediate feature data of ML 106), and the sensor pose difference 118 (based on the intermediate feature data of ML 108); Fig 1 and ¶ [0035]-[0037]).
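For illustration only, and not as a characterization of the actual implementation in Karrman et al or in the present application, the general idea of pairing time-aligned samples from two modalities and scoring an estimated frame against an observed frame can be sketched as follows. The function names, the (timestamp, frame) sample layout, and the max_gap tolerance are hypothetical, and a mean absolute pixel difference is assumed as the loss.

```python
def pair_time_aligned(first_samples, second_samples, max_gap):
    """Pair each second-modality sample with the nearest-in-time first-modality sample.

    Samples are (timestamp, frame) tuples (hypothetical layout). Pairs whose
    timestamps differ by more than max_gap are dropped.
    """
    pairs = []
    for t2, frame2 in second_samples:
        # Nearest first-modality sample by absolute time difference.
        t1, frame1 = min(first_samples, key=lambda s: abs(s[0] - t2))
        if abs(t1 - t2) <= max_gap:
            pairs.append((frame1, frame2))
    return pairs


def photometric_loss(observed, estimated):
    """Mean absolute per-pixel difference between two equally shaped 2D frames."""
    if len(observed) != len(estimated) or any(
        len(a) != len(b) for a, b in zip(observed, estimated)
    ):
        raise ValueError("frames must have the same shape")
    total = 0.0
    count = 0
    for row_obs, row_est in zip(observed, estimated):
        for o, e in zip(row_obs, row_est):
            total += abs(o - e)
            count += 1
    return total / count
```

In a training loop of this kind, the loss value would be fed back to update the model that produced the estimate; that update step is omitted from this sketch.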
Regarding Claim 3, Karrman et al teach the method of claim 1 (as described above), further comprising: receiving third input data samples from the second modality imaging system (image data may be part of respective video stream data, including for second image 104 data (thereby it is understood third data samples are input to CNN 108); Fig 1 and ¶ [0045]); and processing the third input data samples in the second modality imaging system to generate a third output (the second image frame 104 (a different image frame than the previous first image and part of a video stream ¶ [0045]) is processed by sensor pose difference ML model 108 to generate sensor pose difference 118; Fig 1 and ¶ [0031]) based on a model for the second modality imaging system that is trained based on the first output of the first modality imaging system (the ML model 108 is trained with loss 126, generated by comparing second image frame 104 to estimated second image frame 124, in which the estimated second image frame 124 is based on comparison between the first image frame 102 and the second image frame 104; Fig 1 and ¶ [0033]-[0037]).
Regarding Claim 7, Karrman et al teach the method of claim 1 (as described above), wherein the training includes performing a comparison between the first intermediate LIDAR features and the second intermediate LIDAR features (during training to optimize the neural network (pertaining to the LiDAR depth estimation in ML 106), a gradient descent technique is performed for function optimization between layer parameters (comparison between the layers of the first and second intermediate LiDAR features); ¶ [0013], [0048]).
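As a generic illustration of training by comparing two sets of features with gradient descent, and not as a description of the optimization actually used in Karrman et al, the following minimal sketch fits a single scale parameter so that one feature vector matches another under a mean squared error. The names feature_mse and fit_scale, the learning rate, and the step count are all hypothetical.

```python
def feature_mse(a, b):
    """Mean squared difference between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_scale(inputs, target_features, lr=0.1, steps=200):
    """Gradient descent on one parameter w so that w * inputs approaches target_features."""
    w = 0.0
    n = len(inputs)
    for _ in range(steps):
        # Derivative of mean((w*x - t)^2) with respect to w.
        grad = sum(2 * (w * x - t) * x for x, t in zip(inputs, target_features)) / n
        w -= lr * grad
    return w
```

A real network would update many layer parameters at once, but the comparison driving the update has the same form: a difference between two feature sets is reduced step by step.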
Regarding Claim 9, Karrman et al teach a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations (memory 404 with instructions 424, executed on processor 402 in system 400; Fig 4 and ¶ [0068]) comprising: steps identical to claim 1 (as described above).
Regarding Claim 11, Karrman et al teach the non-transitory computer-readable medium of claim 9 (as discussed above), wherein the operations further include: steps identical to claim 3 (as discussed above).
Regarding Claim 15, Karrman et al teach the non-transitory computer-readable medium of claim 9 (as discussed above), wherein the operations further include: steps identical to claim 7 (as discussed above).
Regarding Claim 17, Karrman et al teach an image capture device (computer system 400, includes cellular telephone (recognized to include image sensors 421); Fig 4 and ¶ [0067]-[0068]), comprising: an image sensor (image sensor(s) 421; Fig 4 and ¶ [0068]); a memory storing processor-readable code (memory 404 with instructions 424, executed on processor 402; Fig 4 and ¶ [0068]); and at least one processor coupled to the memory and to the image sensor (processor 402 that is interlinked 408 with memory 404 with image sensor(s) 421; Fig 4 and ¶ [0068]), the at least one processor configured to execute the processor-readable code to cause the at least one processor to perform operations (memory 404 with instructions 424, executed on processor 402 in system 400; Fig 4 and ¶ [0068]) including: steps identical to claim 1 (as described above).
Regarding Claim 19, Karrman et al teach the image capture device of claim 17 (as discussed above), wherein the operations further include: steps identical to claim 3 (as discussed above).
Regarding Claim 23, Karrman et al teach the image capture device of claim 17 (as discussed above), wherein the operations further include: steps identical to claim 7 (as discussed above).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2, 10, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bongio Karrman et al (US 2021/0027207, hereinafter “Karrman et al”, previously cited in Final Rejection – 02/25/2026) in view of Caesar (US 2020/0272854, cited in Final 02/25/2026).
Regarding Claim 2, Karrman et al teach the method of claim 1 (as described above), including wherein training the first modality imaging system (a first image processing ML model 106 is trained based on a common loss function; Fig 1, 3 and ¶ [0037]) comprises: receiving third input data samples from the first modality imaging system (there may be additional N images to each model and it is understood that training a model requires multiple input image data pairs, thereby additional (third) input data is used for training the first ML model 106; ¶ [0041], [0044]).
Karrman et al do not explicitly teach determining a model for the first modality imaging system based on the first input data samples and a first ground truth corresponding to the first input data sample.
Caesar is analogous art pertinent to the technological problem addressed in this application and teaches determining a model for the first modality imaging system based on the first input data samples and a first ground truth corresponding to the first input data sample (a machine learning model is selected for a given function of the input image data to generate a predicted labeled bounding boxes where the predicted labeled bounding box is based on a ground truth for comparison; Fig 1, 13, 18 and ¶ [0013], [0148]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Karrman et al with Caesar including determining a model for the first modality imaging system based on the first input data samples and a first ground truth corresponding to the first input data sample. By using multiple active machine learning models, an autonomous vehicle may quickly and efficiently implement image analysis and predict a trajectory with confidence, thereby improving the predictions, as recognized by Caesar (¶ [0004]-[0006]).
Regarding Claim 10, Karrman et al teach the non-transitory computer-readable medium of claim 9 (as discussed above), wherein the operations further include: steps identical to claim 2 (as discussed above).
Regarding Claim 18, Karrman et al teach the image capture device of claim 17 (as discussed above), wherein the operations further include: steps identical to claim 2 (as discussed above).
Claims 4, 5, 8, 12, 13, 16, 20, 21, 24 are rejected under 35 U.S.C. 103 as being unpatentable over Bongio Karrman et al (US 2021/0027207, hereinafter “Karrman et al”, cited in Final 02/25/2026) in view of Wekel et al (US 2021/0063578, cited in Final 02/25/2026).
Regarding Claim 4, Karrman et al teach the method of claim 3 (as described above).
Karrman et al do not teach wherein the third output comprises at least one three-dimensional (3D) bounding box corresponding to objects detected in the third input data samples.
Wekel et al is analogous art pertinent to the technological problem addressed in this application and teaches wherein the third output comprises at least one three-dimensional (3D) bounding box corresponding to objects detected in the third input data samples (LiDAR point cloud data to generate the range image is used to generate corresponding 3D bounding boxes for detected objects; ¶ [0026]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Karrman et al with Wekel et al including wherein the third output comprises at least one three-dimensional (3D) bounding box corresponding to objects detected in the third input data samples. By performing 3D understandings of a world-space environment based on LiDAR data, an autonomous vehicle is provided data to enable safe planning and control for operation, as recognized by Wekel et al (¶ [0004]-[0005]).
Regarding Claim 5, Karrman et al in view of Wekel et al teach the method of claim 4 (as described above), further comprising operating a vehicle based on the at least one 3D bounding box (Wekel et al, the autonomous vehicle may utilize the 3D bounding box data for tracking actor instances, path planning and control of operation of the vehicle; ¶ [0026], [0036]).
Regarding Claim 8, Karrman et al teach the method of claim 1 (as described above).
Karrman et al do not teach wherein: the first output comprises a first plurality of bounding boxes corresponding to first objects in a scene, and the second output comprises a second plurality of three-dimensional (3D) bounding boxes corresponding to second objects in a scene, and training the second modality imaging system comprises training the second modality imaging system with a subset of the first and second objects, the objects of the subset being in both the first plurality of 3D bounding boxes and the second plurality of 3D bounding boxes.
Wekel et al is analogous art pertinent to the technological problem addressed in this application and teaches wherein: the first output comprises a first plurality of bounding boxes corresponding to first objects in a scene (bounding boxes are used to identify objects in the LiDAR image data; Fig 1-3 and ¶ [0045]), and the second output comprises a second plurality of three-dimensional (3D) bounding boxes corresponding to second objects in a scene (bounding boxes are used to identify objects in camera image data; Fig 1-3 and ¶ [0045]), and training the second modality imaging system comprises training the second modality imaging system with a subset of the first and second objects, the objects of the subset being in both the first plurality of 3D bounding boxes and the second plurality of 3D bounding boxes (training the DNN with image data may be based on multiple sensor data (thereby camera sensor and LiDAR sensor data) and filtering may be performed, which may change the number and shape of the bounding box shape locations used for the ground truth data used for training; Fig 1-3 and ¶ [0045], [0048]-[0049], [0061]-[0066]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Karrman et al with Wekel et al including wherein: the first output comprises a first plurality of three-dimensional (3D) bounding boxes corresponding to first objects in a scene, and the second output comprises a second plurality of bounding boxes corresponding to second objects in a scene, and training the second modality imaging system comprises training the second modality imaging system with a subset of the first and second objects, the objects of the subset being in both the first plurality of 3D bounding boxes and the second plurality of 3D bounding boxes. By training a DNN with a combined point cloud segmentation and bounding box regression network that may adjust the data used for training, the training data set may be modified and allow for improved data for the DNN, including reducing and repurposing ground truth data, thereby improving a DNN that may be used to enable safe planning and control of an autonomous vehicle, as recognized by Wekel et al (¶ [0004]-[0005]).
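For illustration only, one common way to identify objects that appear in both of two sets of 3D bounding boxes is to match boxes by intersection over union (IoU). The sketch below is not taken from Wekel et al, Karrman et al, or the present application. It assumes axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax), a greedy one-to-one match, and a hypothetical threshold of 0.5.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # No overlap along this axis.
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)


def common_objects(first_boxes, second_boxes, threshold=0.5):
    """Greedily match boxes across the two sets; return (first_index, second_index) pairs."""
    matches = []
    used = set()
    for i, a in enumerate(first_boxes):
        best_j, best_iou = None, threshold
        for j, b in enumerate(second_boxes):
            if j in used:
                continue
            score = iou_3d(a, b)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches
```

Only the matched pairs would then be kept as the subset of objects used for training; unmatched boxes from either set are ignored in this sketch.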
Regarding Claim 12, Karrman et al teach the non-transitory computer-readable medium of claim 11 (as discussed above), wherein the operations further include: steps identical to claim 4 (as discussed above).
Regarding Claim 13, Karrman et al teach the non-transitory computer-readable medium of claim 12 (as discussed above), wherein the operations further include: steps identical to claim 5 (as discussed above).
Regarding Claim 16, Karrman et al teach the non-transitory computer-readable medium of claim 9 (as discussed above), wherein the operations further include: steps identical to claim 8 (as discussed above).
Regarding Claim 20, Karrman et al teach the image capture device of claim 19 (as discussed above), wherein the operations further include: steps identical to claim 4 (as discussed above).
Regarding Claim 21, Karrman et al teach the image capture device of claim 20 (as discussed above), wherein the operations further include: steps identical to claim 5 (as discussed above).
Regarding Claim 24, Karrman et al teach the image capture device of claim 17 (as discussed above), wherein the operations further include: steps identical to claim 8 (as discussed above).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Smolyanskiy et al (US 2021/0342609) teach a deep neural network system and method for 3D object detection, including use of 3D bounding boxes and classification labels for detected objects.
Sheu et al (US 2022/0172495) teach a system and method for using 3D point cloud data from LiDAR to create 3D bounding box data for tracking dynamic objects in an ego vehicle field of view.
Wen et al (Fast and Accurate 3D Object Detection for Lidar-Camera-Based Autonomous Vehicles using One Shared Voxel-Based Backbone) teach LiDAR-camera 3D detectors for identifying objects in 3D and using a point transform module to fuse the data allowing for 3D bounding box detection of the objects and used for autonomous vehicle operation.
Sakic et al (Camera-LiDAR Object Detection and Distance Estimation with Application in Collision Avoidance System, first cited in Non-Final 10/22/2025) teach the use of LiDAR and camera fusion to generate a plurality of bounding boxes representing objects in the scene and fusing the LiDAR and camera data with the plurality of bounding boxes.
Bhanushali et al (LiDAR-Camera Fusion for 3D Object Detection, first cited in Non-Final 10/22/2025) also teach the use of LiDAR and camera fusion to generate a plurality of bounding boxes representing objects in the scene and fusing the LiDAR and camera data with the plurality of bounding boxes.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHLEEN M BROUGHTON whose telephone number is (571)270-7380. The examiner can normally be reached Monday-Friday 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Villecco, can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KATHLEEN M BROUGHTON/ Primary Examiner, Art Unit 2661