DETAILED ACTION
Claims 28-36, 38-51, and 54-62 are currently pending and have been examined in this application. Claims 1-27, 37, 52, and 53 have been canceled.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is made FINAL in response to the amendment and remarks filed 10/03/2025.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 28-36, 38-39, 44, 48-51, 54-55, and 57-62 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (US20200082180) in view of Tong (US20190064841).
Claim 28:
Wang explicitly teaches:
A navigation system for a host vehicle, the system comprising:
(Wang) – “The vehicle 105 may include various vehicle subsystems such as a vehicle drive subsystem 142, vehicle sensor subsystem 144, vehicle control subsystem 146, and occupant interface subsystem 148.” (Para 0032)
“The vehicle control system 146 may be configured to control operation of the vehicle 105 and its components. Accordingly, the vehicle control system 146 may include various elements such as a steering unit, a throttle, a brake unit, a navigation unit, and an autonomous control unit.” (Para 0037)
at least one processor comprising circuitry and a memory, wherein the memory includes instructions that when executed by the circuitry cause the at least one processor to:
(Wang) – “The in-vehicle control system 150 can be configured to include a data processor 171 to execute the 3D image processing module 200 for processing image data received from one or more of the vehicle subsystems 140. The data processor 171 can be combined with a data storage device 172 as part of a computing system 170 in the in-vehicle control system 150. The data storage device 172 can be used to store data, processing parameters, camera parameters, terrain data, and data processing instructions. A processing module interface 165 can be provided to facilitate data communications between the data processor 171 and the 3D image processing module 200. In various example embodiments, a plurality of processing modules, configured similarly to 3D image processing module 200, can be provided for execution by data processor 171.” (Para 0026)
Examiner Note: A control system of this kind necessarily contains circuitry.
receive a plurality of captured images acquired by a camera onboard the host vehicle, the plurality of captured images being representative of an environment of the host vehicle;
(Wang) – “The example system and method for 3D object detection can include a 3D image processing system configured to receive image data from at least one camera associated with an autonomous vehicle. An example embodiment can be configured to output the location of a 2D bounding box around a detected object, and the location of the eight corners that depict the size and direction (heading) of the object. This is an improvement over conventional systems that do not provide real-world 3D information. With geological information related to a particular environment (e.g., road or terrain information) and camera calibration matrices, the example embodiment can accurately calculate the exact size and location of the object imaged by the camera in 3D coordinates.” (Para 0005)
“The 3D image processing module, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object.” (Para 0006)
“The in-vehicle control system 150 can be configured to include a data processor 171 to execute the 3D image processing module 200 for processing image data received from one or more of the vehicle subsystems 140. The data processor 171 can be combined with a data storage device 172 as part of a computing system 170 in the in-vehicle control system 150. The data storage device 172 can be used to store data, processing parameters, camera parameters, terrain data, and data processing instructions. A processing module interface 165 can be provided to facilitate data communications between the data processor 171 and the 3D image processing module 200. In various example embodiments, a plurality of processing modules, configured similarly to 3D image processing module 200, can be provided for execution by data processor 171.” (Para 0026)
Examiner Note: Because the camera is associated with the autonomous vehicle, the images are representative of an environment of the autonomous vehicle.
provide each of the plurality of captured images to at least one trained model configured to generate an output for each of the plurality of captured images, the at least one trained model having been trained using training data comprising a plurality of training images and real-world dimensions of reference objects represented in the plurality of training images, and
(Wang) –“During inference training of the deep learning module 212, sets of training images can be input to the network (e.g., a neural network) of the deep learning module 212, and all x and y coordinates for the 2D and 3D bounding boxes of every object in the images can be obtained… The deep learning module 212 can run in real-time at 40fps (frames per second) for a single image, which satisfies the requirements of an autonomous driving system.” (Para 0056)
“The example embodiment can be configured to: receive image data from at least one camera associated with an autonomous vehicle, the image data representing at least one image frame (processing block 1010); use a trained deep learning module to determine pixel coordinates of a two-dimensional (2D) bounding box around an object detected in the image frame (processing block 1020); use the trained deep learning module to determine vertices of a three-dimensional (3D) bounding box around the object (processing block 1030); use a fitting module to obtain geological information related to a particular environment associated with the image frame and to obtain camera calibration information associated with the at least one camera (processing block 1040); and use the fitting module to determine 3D attributes of the object using the 3D bounding box, the geological information, and the camera calibration information (processing block 1050).” (Para 0072)
Examiner Note: The system of Wang is taught as operating in “real-time” and performing a frame-by-frame analysis.
wherein the output generated by the at least one trained model for each of the plurality of captured images includes at least: a first value [represented in real-world dimensions to represent physical real-world] height of a target object identified in a particular one of the plurality of captured images and a second value [represented in real-world dimensions to represent physical real-world] width or a length of the target object;
(Wang) – “The 3D image processing module, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object.” (Para 0006)
“The network resources 122 can also host network cloud services, which can support the functionality used to compute or assist in processing image input or image input analysis.” (Para 0028)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
“During inference training of the deep learning module 212, sets of training images can be input to the network (e.g., a neural network) of the deep learning module 212, and all x and y coordinates for the 2D and 3D bounding boxes of every object in the images can be obtained… The deep learning module 212 can run in real-time at 40fps (frames per second) for a single image, which satisfies the requirements of an autonomous driving system.” (Para 0056)
“The example embodiment can be configured to: receive image data from at least one camera associated with an autonomous vehicle, the image data representing at least one image frame (processing block 1010); use a trained deep learning module to determine pixel coordinates of a two-dimensional (2D) bounding box around an object detected in the image frame (processing block 1020); use the trained deep learning module to determine vertices of a three-dimensional (3D) bounding box around the object (processing block 1030); use a fitting module to obtain geological information related to a particular environment associated with the image frame and to obtain camera calibration information associated with the at least one camera (processing block 1040); and use the fitting module to determine 3D attributes of the object using the 3D bounding box, the geological information, and the camera calibration information (processing block 1050).” (Para 0072)
“The example system and method for 3D object detection can include a 3D image processing system configured to receive image data from at least one camera associated with an autonomous vehicle. An example embodiment can be configured to output the location of a 2D bounding box around a detected object, and the location of the eight corners that depict the size and direction (heading) of the object. This is an improvement over conventional systems that do not provide real-world 3D information. With geological information related to a particular environment (e.g., road or terrain information) and camera calibration matrices, the example embodiment can accurately calculate the exact size and location of the object imaged by the camera in 3D coordinates.” (Para 0005)
Examiner Note: The bracketed text is not explicitly taught by the primary reference but is taught by the non-primary reference later in this rejection. The output of the deep learning model is a bounding box with dimensions in pixels; the fitting module then converts these pixel dimensions to real-world dimensions (meters). While this process is very similar to outputting values represented in real-world dimensions of the target object, it is not exactly the same, and the limitation is further taught by the non-primary reference.
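For illustration only, and not part of Wang's disclosure: a minimal sketch, using assumed intrinsic and extrinsic values, of the coordinate transformations the quoted passage describes, in which an extrinsic matrix maps 3D world coordinates to 3D camera coordinates and an intrinsic matrix maps 3D camera coordinates to 2D image coordinates.

    import numpy as np

    # Assumed intrinsic matrix (3D camera coordinates -> 2D image coordinates).
    K = np.array([[1000.0,    0.0, 640.0],
                  [   0.0, 1000.0, 360.0],
                  [   0.0,    0.0,   1.0]])
    # Assumed extrinsic parameters (3D world coordinates -> 3D camera coordinates).
    R = np.eye(3)                        # rotation
    t = np.array([0.0, 1.5, 0.0])        # translation (meters)

    X_world = np.array([2.0, 0.0, 20.0])   # a point 20 m ahead, 2 m to the side
    X_cam = R @ X_world + t                # extrinsic transformation
    u, v, w = K @ X_cam                    # intrinsic transformation
    print(f"pixel coordinates: ({u / w:.1f}, {v / w:.1f})")   # -> (740.0, 435.0)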
receive the output generated by the at least one trained model, including the first and second values, for each of the plurality of captured images; and
(Wang) – “the computing system 170 may use input from the vehicle control system 146 in order to control the steering unit to avoid an obstacle detected by the vehicle sensor subsystem 144 and the 3D image processing module 200, move in a controlled manner, or follow a path or trajectory based on output generated by the 3D image processing module 200.” (Para 0045)
“The 3D attributes of an object can be provided as 3D object detection data 220 as output from the 3D image processing system 210, and the 3D image processing module 200 therein.” (Para 0071)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
“The example embodiment can be configured to: receive image data from at least one camera associated with an autonomous vehicle, the image data representing at least one image frame (processing block 1010); use a trained deep learning module to determine pixel coordinates of a two-dimensional (2D) bounding box around an object detected in the image frame (processing block 1020); use the trained deep learning module to determine vertices of a three-dimensional (3D) bounding box around the object (processing block 1030); use a fitting module to obtain geological information related to a particular environment associated with the image frame and to obtain camera calibration information associated with the at least one camera (processing block 1040); and use the fitting module to determine 3D attributes of the object using the 3D bounding box, the geological information, and the camera calibration information (processing block 1050).” (Para 0072)
cause at least one navigational action by the host vehicle based on the first and second values associated with at least one of the plurality of captured images.
(Wang) – “the autonomous control unit may be configured to incorporate data from the 3D image processing module 200, the GPS transceiver, the RADAR, the LIDAR, the cameras, and other vehicle subsystems to determine the driving path or trajectory for the vehicle 105.” (Para 0038)
“the computing system 170 may use input from the vehicle control system 146 in order to control the steering unit to avoid an obstacle detected by the vehicle sensor subsystem 144 and the 3D image processing module 200, move in a controlled manner, or follow a path or trajectory based on output generated by the 3D image processing module 200.” (Para 0045)
“The 3D attributes of an object can be provided as 3D object detection data 220 as output from the 3D image processing system 210, and the 3D image processing module 200 therein.” (Para 0071)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
Wang does not explicitly teach:
represented in real-world dimensions to represent physical real-world… represented in real-world dimensions to represent physical real-world
Tong, in the same field of endeavor of determining object dimensions, teaches:
represented in real-world dimensions to represent physical real-world… represented in real-world dimensions to represent physical real-world
(Tong) – “FIG. 8 illustrates a method by which the neural network learns to determine a proposal region for a target vehicle. The ground truth image patch 702 is compared to an image bounded by the context region 802. The context region 802 has a height H and a width W and includes only a portion of the target vehicle. A detection region 804 is located at a center 806 of the context region 802. Upon comparison, the neural network is able to predict a proposal region, i.e. a bounding box 810 that more closely approximates the dimensions of the target vehicle. The bounding box 810 has height hBB and width wBB and center 816 that is located on the target vehicle. The area of the bounding box 810 can extend outside of the area of the image patch 802. The height hBB, width wBB and center 816 of the bounding box 810 can be determined using a regression analysis.” (Para 0038)
“FIG. 9 illustrates a method by which the neural network learns to create a bounding box for the target object within a proposal region of interest 900. The proposal region of interest 900 may have an area that is greater than a bounding box of the target vehicle. Similarly to the learning process using the image patches, the neural network determines a center 902 of the proposal region of interest 900, determines a bounding box 910 for the target object having a height hBB and width wBB and a center 904. The height hBB, width wBB and center 904 of the bounding box 910 can be determined using a regression analysis.” (Para 0039)
[Image: media_image1.png (greyscale, 583 x 475), reproducing Tong's proposal-region and bounding-box figures]
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the object detection system of Wang with the autonomous vehicle driving method of Tong. One of ordinary skill in the art would have been motivated to make these modifications with a reasonable expectation of success because “it is desirable to provide a system for an autonomous vehicle to classify target objects in an urban environment.” (Tong Para 0002)
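For illustration only, and not taken from Tong: a minimal sketch of the regression-based bounding box determination described in the quoted passages, assuming a common offset parameterization (center shift as a fraction of the context region size, log-scale width and height); the names wBB and hBB follow Tong's notation, everything else is hypothetical.

    import math

    def decode_box(ctx_cx, ctx_cy, W, H, dx, dy, dw, dh):
        # Decode regression outputs (dx, dy, dw, dh) relative to a context
        # region of width W and height H centered at (ctx_cx, ctx_cy).
        cx = ctx_cx + dx * W          # shift the center by a fraction of W
        cy = ctx_cy + dy * H          # shift the center by a fraction of H
        w_bb = W * math.exp(dw)       # scale the width multiplicatively
        h_bb = H * math.exp(dh)       # scale the height multiplicatively
        return cx, cy, w_bb, h_bb

    # Context region centered at (400, 300), 128 x 128 pixels; assumed outputs:
    print(decode_box(400, 300, 128, 128, 0.1, -0.05, 0.3, 0.2))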
Claim 29:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein the
(Wang) – “The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
“During inference training of the deep learning module 212, sets of training images can be input to the network (e.g., a neural network) of the deep learning module 212, and all x and y coordinates for the 2D and 3D bounding boxes of every object in the images can be obtained. Non-maximum suppression (NMS) is also applied after the training of the deep learning module 212 to refine the bounding boxes and improve the prediction quality. The deep learning module 212 can run in real-time at 40fps (frames per second) for a single image, which satisfies the requirements of an autonomous driving system.” (Para 0056)
Examiner Note: Per BRI, the x & y coordinates correspond to depth information. The deep learning module is pre-trained.
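For illustration of this reading, a minimal sketch, assuming a flat ground plane and a known camera height (values hypothetical, not from Wang), showing how 2D pixel coordinates can carry depth information: the pixel at an object's ground contact point is back-projected to a ray, and the ray is intersected with the ground plane to recover distance in meters.

    import numpy as np

    # Assumed intrinsic matrix and camera height above the ground plane.
    K = np.array([[1000.0,    0.0, 640.0],
                  [   0.0, 1000.0, 360.0],
                  [   0.0,    0.0,   1.0]])
    cam_height = 1.5   # meters (camera Y axis assumed to point downward)

    u, v = 700.0, 420.0                              # pixel at the object's base
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project to a camera ray
    scale = cam_height / ray[1]                      # stretch the ray to the ground
    point = scale * ray                              # 3D point in camera coordinates
    print(f"depth Z = {point[2]:.1f} m, lateral X = {point[0]:.1f} m")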
Claim 30:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein the at least one trained model includes a neural network.
(Wang) – “During inference training of the deep learning module 212, sets of training images can be input to the network (e.g., a neural network) of the deep learning module 212, and all x and y coordinates for the 2D and 3D bounding boxes of every object in the images can be obtained. Non-maximum suppression (NMS) is also applied after the training of the deep learning module 212 to refine the bounding boxes and improve the prediction quality. The deep learning module 212 can run in real-time at 40fps (frames per second) for a single image, which satisfies the requirements of an autonomous driving system.” (Para 0056)
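For illustration of the non-maximum suppression (NMS) step mentioned in the quoted passage, a minimal generic sketch (not Wang's implementation): the highest-scoring box is kept, and any box whose intersection-over-union (IoU) with it exceeds a threshold is discarded.

    def iou(a, b):
        # Boxes given as (x1, y1, x2, y2); returns intersection-over-union.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    def nms(boxes, scores, thresh=0.5):
        # Keep the best box, drop boxes overlapping it too much, repeat.
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
        return keep

    boxes = [(100, 100, 200, 200), (110, 105, 205, 210), (300, 300, 400, 380)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))   # -> [0, 2]: the redundant second box is dropped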
Claim 31:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein the real-world dimensions of the reference objects
(Wang) – “During inference training of the deep learning module 212, sets of training images can be input to the network (e.g., a neural network) of the deep learning module 212, and all x and y coordinates for the 2D and 3D bounding boxes of every object in the images can be obtained. Non-maximum suppression (NMS) is also applied after the training of the deep learning module 212 to refine the bounding boxes and improve the prediction quality. The deep learning module 212 can run in real-time at 40fps (frames per second) for a single image, which satisfies the requirements of an autonomous driving system.” (Para 0056)
“The remaining values are just the 3D properties of the bounding box, including its height (h), width (w), length (l), location in the 3D world relative to the camera (X, Y, Z), and the heading orientation of the bounding box (9).” (Para 0053)
Examiner Note: Per BRI, because a bounding box is a box whose sides correspond to the dimensions of the object being bound (see Figs. 2-20), the x & y coordinates of the 2D & 3D bounding boxes correspond to at least one of the height, depth, or width dimensions.
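For illustration of this reading, a minimal sketch (hypothetical values, not from Wang) showing that the coordinates of a bounding box's sides directly yield its dimensions; the 3D case assumes an axis-aligned box specified by two opposite corners.

    def box2d_dims(x1, y1, x2, y2):
        # Width and height (in pixels) from opposite corners of a 2D box.
        return abs(x2 - x1), abs(y2 - y1)

    def box3d_dims(corner_min, corner_max):
        # Width, height, and length from opposite corners of a 3D box,
        # in whatever units the corners are expressed in (e.g., meters).
        (x1, y1, z1), (x2, y2, z2) = corner_min, corner_max
        return abs(x2 - x1), abs(y2 - y1), abs(z2 - z1)

    print(box2d_dims(120, 80, 320, 230))                  # -> (200, 150)
    print(box3d_dims((0.0, 0.0, 0.0), (1.8, 1.5, 4.3)))   # -> a car-sized box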
Claim 32:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein the at least one trained model is configured to output the first value and the second value for a particular one of the plurality of captured images where at least one surface of the target object is at least partially obscured in the particular one of the plurality of captured images.
(Wang) – “The 3D image processing module 200 can also effectively handle situations such as significant occlusion (see FIGS. 3 and 4) and partial observation (see FIGS. 3, 4, 7 and 8).” (Para 0064)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
Examiner Note: Figs. 3 & 4 show a situation in which multiple surfaces of a target vehicle are at least partially obscured.
[Image: media_image2.png (greyscale, 464 x 753), reproducing Wang's Figs. 3-4 occlusion examples]
Claim 33:
Wang in combination with the references relied upon in Claim 32 teach those respective limitations. Wang further teaches:
wherein the target object is a target vehicle in the environment of the host vehicle, and the at least one surface is associated with a rear of the target vehicle.
(Wang) – “The 3D image processing module 200 can also effectively handle situations such as significant occlusion (see FIGS. 3 and 4) and partial observation (see FIGS. 3, 4, 7 and 8).” (Para 0064)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
Examiner Note: Figs. 3 & 4 (see above) show a situation in which multiple surfaces of a target vehicle are at least partially obscured, including the rear.
Claim 34:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein the at least one trained model is configured to output the first value, the second value, and a third value for a particular one of the plurality of captured images where at least two surfaces of the target object are at least partially obscured in the particular one of the plurality of captured images, and wherein the second value is indicative of a width of the target object, and the third value is indicative of a length of the target object.
(Wang) – “The 3D image processing module 200 can also effectively handle situations such as significant occlusion (see FIGS. 3 and 4) and partial observation (see FIGS. 3, 4, 7 and 8).” (Para 0064)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
Examiner Note: Figs. 3 & 4 (see above) show a situation in which multiple surfaces of a target vehicle are at least partially obscured.
Claim 35:
Wang in combination with the references relied upon in Claim 34 teach those respective limitations. Wang further teaches:
wherein the target object is a target vehicle in the environment of the host vehicle, and the at least two surfaces are associated with a rear of the target vehicle and a side of the target vehicle.
(Wang) – “The 3D image processing module 200 can also effectively handle situations such as significant occlusion (see FIGS. 3 and 4) and partial observation (see FIGS. 3, 4, 7 and 8).” (Para 0064)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
Examiner Note: Figs. 3 & 4 (see above) show a situation in which multiple surfaces of a target vehicle are at least partially obscured, including the rear and side.
Claim 36:
Wang in combination with the references relied upon in Claim 34 teach those respective limitations. Wang further teaches:
wherein the
(Wang) – “The 3D image processing module 200 can also effectively handle situations such as significant occlusion (see FIGS. 3 and 4) and partial observation (see FIGS. 3, 4, 7 and 8).” (Para 0064)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
Examiner Note: The 3D attributes of the object correspond to the real-world dimensions of the target object.
Claim 37: Cancelled
Claim 38:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein the target object includes a vehicle in the environment of the host vehicle.
(Wang) –“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
“In FIGS. 4, 6, and 8, the fitting results are obtained by projecting the calculated 3D properties back to the 2D image plane. For each bounding box, the red text describes the calculated 3D object properties in the following order: vehicle height, width, length, distance (in z axis), and the orientation.” (Para 0064)
Examiner Note: Figs. 3 & 4 (see above) show a vehicle as the target object.
Claim 39:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein each of the plurality of captured images includes a representation of the target object.
(Wang) –“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
“The example system and method for 3D object detection can include a 3D image processing system configured to receive image data from at least one camera associated with an autonomous vehicle. An example embodiment can be configured to output the location of a 2D bounding box around a detected object, and the location of the eight corners that depict the size and direction (heading) of the object. This is an improvement over conventional systems that do not provide real-world 3D information. With geological information related to a particular environment (e.g., road or terrain information) and camera calibration matrices, the example embodiment can accurately calculate the exact size and location of the object imaged by the camera in 3D coordinates.” (Para 0005)
Examiner Note: A representation of a target object is recited with a high level of generality and, therefore, may correspond to any image or representation showing the object.
Claim 44:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein one or more of the plurality of captured images includes an occlusion that at least partially occludes the target object.
(Wang) – “The 3D image processing module 200 can also effectively handle situations such as significant occlusion (see FIGS. 3 and 4) and partial observation (see FIGS. 3, 4, 7 and 8).” (Para 0064)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
Claim 49:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein the navigational action includes at least one of accelerating, braking, or turning the host vehicle.
(Wang) – “the autonomous control unit may be configured to incorporate data from the 3D image processing module 200, the GPS transceiver, the RADAR, the LIDAR, the cameras, and other vehicle subsystems to determine the driving path or trajectory for the vehicle 105.” (Para 0038)
“the computing system 170 may use input from the vehicle control system 146 in order to control the steering unit to avoid an obstacle detected by the vehicle sensor subsystem 144 and the 3D image processing module 200, move in a controlled manner, or follow a path or trajectory based on output generated by the 3D image processing module 200.” (Para 0045)
Claim 50:
Wang in combination with the references relied upon in Claim 28 teach those respective limitations. Wang further teaches:
wherein the output generated by the at least one trained model for each of the plurality of captured images includes a bounding box associated with the target object.
(Wang) – “The 3D image processing module, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object.” (Para 0006)
“The network resources 122 can also host network cloud services, which can support the functionality used to compute or assist in processing image input or image input analysis.” (Para 0028)
“The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
“The example system and method for 3D object detection can include a 3D image processing system configured to receive image data from at least one camera associated with an autonomous vehicle. An example embodiment can be configured to output the location of a 2D bounding box around a detected object, and the location of the eight corners that depict the size and direction (heading) of the object. This is an improvement over conventional systems that do not provide real-world 3D information. With geological information related to a particular environment (e.g., road or terrain information) and camera calibration matrices, the example embodiment can accurately calculate the exact size and location of the object imaged by the camera in 3D coordinates.” (Para 0005)
Claim 51:
Wang in combination with the references relied upon in Claim 29 teach those respective limitations. Wang further teaches:
wherein the execution of the instructions included in the memory further cause the at least one processor to
(Wang) – “The in-vehicle control system 150 can be configured to include a data processor 171 to execute the 3D image processing module 200 for processing image data received from one or more of the vehicle subsystems 140. The data processor 171 can be combined with a data storage device 172 as part of a computing system 170 in the in-vehicle control system 150. The data storage device 172 can be used to store data, processing parameters, camera parameters, terrain data, and data processing instructions. A processing module interface 165 can be provided to facilitate data communications between the data processor 171 and the 3D image processing module 200. In various example embodiments, a plurality of processing modules, configured similarly to 3D image processing module 200, can be provided for execution by data processor 171.” (Para 0026)
bounding boxes associated with the target object generated by the target object analysis module based on analysis of two or more of the plurality of captured images.
(Wang) – “The 3D image processing module 200, as described herein, can be used to obtain the 3D attributes of an object, including its length, height, width, 3D spatial location (all in meters) in the camera coordinate space, and the moving direction (heading) of the object. In an example embodiment, the 3D image processing module 200 can include two submodules, namely; 1) a deep learning module 212 that learns the pixel coordinates of the 2D bounding box and all vertices of the 3D bounding box in the image plane; and 2) a fitting module 214 that solves the 3D attributes using geological information from a terrain map and camera information including camera calibration matrices with camera extrinsic and intrinsic matrices. A camera extrinsic matrix denotes the coordinate system transformations from 3D world coordinates to 3D camera coordinates. A camera intrinsic matrix denotes the coordinate system transformations from 3D camera coordinates to 2D image coordinates. The deep learning module 212 and the fitting module 214 are described in more detail below and in connection with FIG. 21. The 3D image processing module 200 can run in real-time across multiple cameras and can significantly contribute to the perception pipeline and improve the robustness and the safety level of an autonomous driving system. Details of the various example embodiments are provided below.” (Para 0050)
“The example system and method for 3D object detection can include a 3D image processing system configured to receive image data from at least one camera associated with an autonomous vehicle. An example embodiment can be configured to output the location of a 2D bounding box around a detected object, and the location of the eight corners that depict the size and direction (heading) of the object. This is an improvement over conventional systems that do not provide real-world 3D information. With geological information related to a particular environment (e.g., road or terrain information) and camera calibration matrices, the example embodiment can accurately calculate the exact size and location of the object imaged by the camera in 3D coordinates.” (Para 0005)
Wang does not explicitly teach:
determine a velocity of the target object based on
Tong, in the same field of endeavor of object detection, teaches:
determine a velocity of the target object based on
(Tong) – “The processor is further configured to track movement of a plurality of temporally spaced bounding boxes to determine a movement of the target object. The processor is further configured to determine a velocity of the target object across a line of sight of the vehicle.” (Para 0008)
“FIG. 6 shows an image 600 that includes bounding boxes 610, 612, 614, 616 and 618 corresponding to the regions of interest 510, 512, 514, 516 and 518 of FIG. 5, respectively. A bounding box is drawn around each target vehicle. The bounding box indicates a region taken up by the target object. The autonomous vehicle 102 tracks motion of the bounding box in order to know the location of the target object. Once the bounding box is determined, its motion can be tracked as the associated target object moves within the field of view of the vehicle 102. Additionally, the target object can be classified and various parameters of motion of the target object, such as distance, azimuthal location, velocity, etc., can be determined.” (Para 0035)
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the object detection system of Wang with the target object tracking and velocity determination of Tong. One of ordinary skill in the art would have been motivated to make these modifications with a reasonable expectation of success because “it is desirable to provide a system for an autonomous vehicle to classify target objects in an urban environment.” (Tong Para 0002)
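For illustration of the velocity determination Tong describes, a minimal sketch (names, timestamps, and the pixel-to-meter factor are hypothetical): the center of a target's bounding box is tracked across temporally spaced frames, and the displacement is divided by the elapsed time.

    def box_center(box):
        # Center of a box given as (x1, y1, x2, y2).
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    def lateral_velocity(box_t0, box_t1, t0, t1, meters_per_pixel):
        # Velocity across the line of sight, in meters per second.
        (cx0, _), (cx1, _) = box_center(box_t0), box_center(box_t1)
        return (cx1 - cx0) * meters_per_pixel / (t1 - t0)

    # Two detections of the same target 0.1 s apart (assumed values):
    v = lateral_velocity((100, 200, 180, 260), (130, 200, 210, 260),
                         t0=0.0, t1=0.1, meters_per_pixel=0.05)
    print(f"lateral velocity: {v:.1f} m/s")   # -> 15.0 m/s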