Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4 and 11-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 11 and 17 recite “generate a plurality of ground truth depth values for sample pixels of a scene captured in a camera image by a camera, one or more objects of the scene reflecting light projected onto the scene by a light projector in an illumination pattern” which is indefinite. It is unclear what the limitation “one or more objects of the scene reflecting light projected onto the scene by a light projector in an illumination pattern” is intended to modify. For example, it is unclear if the limitation is meant to describe the scene, the sample pixels, the camera image, or the generation of the ground truth depth values. Thus, a person of ordinary skill in the art would not be able to ascertain the scope of the claim. For examination purposes, the limitation will be interpreted to require that any object in the scene reflects light projected as an illumination pattern.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101.
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a mathematical concept of calculating depth values, without significantly more.
The claim recites:
“A method comprising: projecting, by a light projector, an illumination pattern onto a scene; capturing, by a camera, a camera image of the scene; generating, by a computer device, a plurality of ground truth depth values for sample pixels of the camera image based at least in part on the illumination pattern; and estimating a depth map for the scene based at least in part on the camera image and the ground truth depth values for sample pixels.”
The limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover mathematical calculations. In particular, the step of generating ground truth depth values for sample pixels of the camera image, under the broadest reasonable interpretation of the claim in light of the specification, encompasses a calculation of depth by performing triangulation operations (see paragraph 0052 of the specification).
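For illustration only, the following is a minimal sketch (in Python) of the kind of triangulation calculation this interpretation encompasses, assuming a rectified projector-camera pair with a known focal length and baseline; the function name and values are hypothetical and are not drawn from the application's paragraph 0052.

```python
# Minimal, illustrative sketch of structured-light depth by triangulation.
# Assumes a rectified projector-camera pair (pinhole model); names and values
# are hypothetical, not taken from the application.
def triangulate_depth(focal_length_px: float, baseline_m: float,
                      disparity_px: float) -> float:
    """Depth from triangulation: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px

# Example: f = 800 px, baseline = 0.1 m, observed pattern disparity = 20 px
# gives Z = 800 * 0.1 / 20 = 4.0 m.
print(triangulate_depth(800.0, 0.1, 20.0))  # 4.0
```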
The judicial exception is not integrated into a practical application. For example, the claim recites the additional elements (1) “projecting, by a light projector, an illumination pattern onto a scene;”, (2) “capturing, by a camera, a camera image of the scene;”, and (3) “estimating a depth map for the scene based at least in part on the camera image and the ground truth depth values for sample pixels.” Additional elements (1) and (2) can reasonably be interpreted as mere data gathering steps of the method. An illumination pattern is projected onto a scene and captured by a camera for depth calculations; therefore, these additional elements do not add a meaningful limitation to the method because they are insignificant extra-solution activity. Additional element (3) is recited at a high level of generality such that it amounts to estimating depth maps based on a mathematical calculation. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, additional elements (1) and (2) can reasonably be interpreted as well-understood, routine, and conventional in the field. Therefore, these limitations remain insignificant extra-solution activity even upon reconsideration and do not amount to significantly more. Further, as discussed above, additional element (3) is recited at a high level of generality; it does not integrate the judicial exception into a practical application and does not amount to significantly more than the judicial exception. This claim is not patent eligible.
Claim 2 is rejected under 35 U.S.C. 101 because the claim recites additional elements at a high level of generality such that they amount to merely estimating depth maps using a generic machine learning model. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. This claim is not patent eligible.
Claim 3 is rejected under 35 U.S.C. 101 because the claim recites additional elements at a high level of generality such that they amount to merely training a machine learning model using depth values. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. This claim is not patent eligible.
Claim 4 is rejected under 35 U.S.C. 101 because the claim recites additional elements at a high level of generality such that they amount to merely training a machine learning model using depth values, including depth ground truth values generated from LIDAR sensor data. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. This claim is not patent eligible.
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a further limitation of additional element (1) discussed above. Thus, the additional element does not add a meaningful limitation to the method because it is insignificant extra-solution activity and can reasonably be interpreted as well-understood, routine, and conventional in the field. This claim is not patent eligible.
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a further limitation of additional element (1) discussed above. Thus, the additional element does not add a meaningful limitation to the method because it is insignificant extra-solution activity and can reasonably be interpreted as well-understood, routine, and conventional in the field. This claim is not patent eligible.
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a further limitation of additional element (1) discussed above. Thus, the additional element does not add a meaningful limitation to the method because it is insignificant extra-solution activity and can reasonably be interpreted as well-understood, routine, and conventional in the field. This claim is not patent eligible.
Claim 8 is rejected under 35 U.S.C. 101 because the claim recites additional elements at a high level of generality such that they amount to merely detecting objects based on the depth maps. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. This claim is not patent eligible.
Claim 9 is rejected under 35 U.S.C. 101 because the claim recites additional elements at a high level of generality such that they amount to merely resolving a scale of objects based on the depth maps. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. This claim is not patent eligible.
Claim 10 is rejected under 35 U.S.C. 101 because the claim recites additional elements at a high level of generality such that they amount to merely performing monocular depth estimation for estimating the depth map. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. This claim is not patent eligible.
Claims 11 and 17 contain elements analogous to those of claim 1, with the addition of “a memory that stores instructions; and processing circuitry that executes the instructions” and “non-transitory computer-readable storage media”. The additional elements can reasonably be interpreted as merely using a generic computer as a tool to implement the abstract idea. Implementing an abstract idea on a generic computer does not integrate a judicial exception into a practical application. Thus, claims 11 and 17 are similarly rejected under 35 U.S.C. 101.
Claims 12-16 contain elements analogous to those of claims 2-6, respectively. Claims 18-20 contain elements analogous to those of claims 8-10, respectively. Accordingly, claims 12-16 and 18-20 are similarly rejected under 35 U.S.C. 101.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8, 10-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (US 20200210726 A1) (hereinafter Yang) in view of Potter et al. (US 10891745 B1) (hereinafter Potter).
Regarding claim 1, Yang teaches a method comprising:
capturing, by a camera, a camera image of the scene (Yang, “Now referring to FIG. 1, FIG. 1 is a data flow diagram for a process 100 of training a machine learning model(s) to predict distances to objects and/or obstacles in an environment, in accordance with some embodiments of the present disclosure. The process 100 may include generating and/or receiving sensor data 102 from one or more sensors of the vehicle 1400… In some examples, the sensor data 102 may include the sensor data generated by one or more forward-facing sensors, side-view sensors, and/or rear-view sensors. This sensor data 102 may be useful for identifying, detecting, classifying, and/or tracking movement of objects around the vehicle 1400 within the environment. In embodiments, any number of sensors may be used to incorporate multiple fields of view (e.g., the fields of view of the long-range cameras 1498, the forward-facing stereo camera 1468, and/or the forward-facing wide-view camera 1470 of FIG. 14B) and/or sensory fields (e.g., of a LIDAR sensor 1464, a RADAR sensor 1460, etc.).”, paragraphs 0039-0040, see Figs. 1 and 14B);
generating, by a computer device, a plurality of ground truth depth values for sample pixels of the camera image (Yang, “In some embodiments, a machine learning model(s) 104 may be trained to predict distance(s) 106 and/or object detection(s) 116 using image data alone. For example, the process 100 may be used to train the machine learning model(s) 104 to predict the distance(s) 106--or a depth map that may be converted to distances--of one or more objects and/or obstacles in the environment using images alone as input data… In order to more effectively train the machine learning model(s) 104, however, additional data from the sensor data 102--such as LIDAR data, RADAR data, SONAR data, and/or the like--may be used to generate ground truth data corresponding to the images (e.g., via ground truth encoding 110).”, paragraph 0041, “LIDAR data and/or RADAR data, as non-limiting examples--may be used to determine distances to the objects or obstacles corresponding to the respective bounding shapes. As such, where a distance to an object or obstacle may be difficult to ascertain accurately using image data alone--or another two-dimensional representation--this additional sensor data 102 may be used to increase the accuracy of the predictions with respect to the distances to objects or obstacles within the images.”, paragraph 0043, lines 17-25, see Fig. 5A, ground truth depth maps 501A-502C, Ground truth depth maps are generated for vehicle camera images using depth measurements from LIDAR and/or RADAR. These depth maps are used in training a machine learning model to predict object depth from camera images alone.); and
estimating a depth map for the scene based at least in part on the camera image and the ground truth depth values for sample pixels (Yang, “FIG. 9 is a flow diagram showing a method 900 for
predicting distances to objects and/or obstacles in an environment using a machine learning model(s), in accordance with some embodiments of the present disclosure. The method 900, at block B902, includes applying first data representative of an image of a field of view of an image sensor to a neural network, the neural network trained based at least in part on second data representative of ground truth information generated using at least one of a LIDAR sensor or a RADAR sensor… The method 900, at block B904, includes computing, using the neural network and based at least in part on the first data, third data representative of depth values corresponding to an image. For example, the machine learning model(s) 104 may compute the distance(s) 106 (or a depth map representative thereof).”, pg. 11, paragraphs 0108-0109, see Fig. 5A, predicted depth maps 504A-504C, and Fig. 9, The model is trained to predict depth maps for vehicle camera images at run time for object detection.).
Yang does not teach projecting, by a light projector, an illumination pattern onto a scene; and generating, by a computer device, a plurality of ground truth depth values for sample pixels of the camera image based at least in part on the illumination pattern.
However, Potter teaches projecting, by a light projector, an illumination pattern onto a scene; and generating, by a computer device, a plurality of ground truth depth values for sample pixels of the camera image based at least in part on the illumination pattern (Potter, “The present disclosure provides a hybrid system with a structured-light stereo device (also referred to as a device that provides laser triangulation or structured-light pattern projection) and a time of flight device for real-time depth sensing that can determine more accurate range and reflectance measurements. The hybrid system provides high resolution, real-time depth sensing that can be used by automated systems for a wide range of applications including but not limited to autonomous vehicles, robotics, and industrial manufacturing.”, column 2, lines 58-67, “The method 800 includes determining depth measurements of a scene using information received by the structured-light stereo device, via operation 802; determining time of flight measurements of the scene using information received by the time of flight device, via operation 804; generating a depth map using the depth measurements and generating calibration points using the time of flight measurements, via operation 806; and updating the depth map using the calibration points, via operation 808.”, column 11, lines 44-52, see Fig. 8, Depth maps are generated using structured light projected onto a scene and captured by a vehicle's camera. These depth maps are then updated using calibration points from LIDAR sensors.).
Yang teaches training a machine learning model using ground-truth depth maps generated based on LIDAR and/or RADAR sensor data (Yang, paragraph 0041, paragraph 0043, lines 17-25, see Fig. 5A, ground truth depth maps 501A-502C). Potter teaches generating depth maps for vehicle camera images based on structured light in combination with LIDAR calibration points (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the machine learning model of Yang by replacing the LIDAR-generated ground-truth depth maps with depth maps generated using structured light, as taught by Potter (Potter, column 2, lines 58-67, column 11, lines 44-52, see Fig. 8). The motivation for doing so would have been to provide higher-resolution coverage for ground-truth depth map generation (as suggested by Potter, “The hybrid system also provides high spatial resolution coverage that time of flight devices or LIDAR systems alone cannot provide.”, column 3, lines 51-53), thereby improving model training. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Yang with Potter to obtain the invention as specified in claim 1.
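To make the articulated combination concrete, the following is a minimal, hypothetical sketch (Python/PyTorch) of supervising a monocular depth network with sparse ground-truth depth values at sample pixels, as the combination of Yang and Potter would provide; the tiny network, masked loss, and tensor shapes are illustrative assumptions, not the implementations of either reference.

```python
import torch
import torch.nn as nn

# Illustrative only: a placeholder depth network supervised at sparse sample
# pixels where structured-light triangulation yielded a ground-truth depth.
class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Softplus(),  # positive depths
        )

    def forward(self, x):
        return self.net(x)

def sparse_depth_loss(pred, gt_depth, mask):
    # L1 penalty applied only at the sample pixels that have ground truth
    return (mask * (pred - gt_depth).abs()).sum() / mask.sum().clamp(min=1)

model = TinyDepthNet()
image = torch.rand(1, 3, 64, 64)                   # camera image
gt = torch.rand(1, 1, 64, 64) * 10.0               # triangulated depths (m)
mask = (torch.rand(1, 1, 64, 64) < 0.05).float()   # ~5% sample pixels
loss = sparse_depth_loss(model(image), gt, mask)
loss.backward()                                    # one supervised step
```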
Regarding claim 2, Yang in view of Potter teaches the method of claim 1, wherein estimating the depth map comprises estimating the depth map using a machine learning depth model (Yang, “The method 900, at block B904, includes computing, using the neural network and based at least in part on the first data, third data representative of depth values corresponding to an image. For example, the machine learning model(s) 104 may compute the distance(s) 106 (or a depth map representative thereof).”, pg. 11, paragraph 0109, see Fig. 5A, predicted depth maps 504A-504C, and Fig. 9).
Regarding claim 3, Yang in view of Potter teaches the method of claim 2, further comprising training the machine learning depth model using the ground truth depth values for sample pixels (Yang, “…additional data from the sensor data 102--such as LIDAR data, RADAR data, SONAR data, and/or the like--may be used to generate ground truth data corresponding to the images (e.g., via ground truth encoding 110). In return, the ground truth data may be used to increase the accuracy of the machine learning model(s) 104…”, pg. 3, paragraph 0041, lines 15-20, Generated ground truth depth maps are used to train the machine learning model.).
Regarding claim 4, Yang in view of Potter teaches the method of claim 2, further comprising training the machine learning depth model using the ground truth depth values for sample pixels and estimated depth ground truth values for the camera image generated from Light Detection and Ranging (LIDAR) sensor data (Potter, “The method 800 includes determining depth measurements of a scene using information received by the structured-light stereo device, via operation 802; determining time of flight measurements of the scene using information received by the time of flight device, via operation 804; generating a depth map using the depth measurements and generating calibration points using the time of flight measurements, via operation 806; and updating the depth map using the calibration points, via operation 808.”, column 11, lines 44-52, see Fig. 8, The hybrid system generates ground truth depth maps based on depth data of both a structured-light device and LIDAR sensor.).
Regarding claim 5, Yang in view of Potter teaches the method of claim 1, wherein the light projector, the camera, and the computing device are disposed on a vehicle (Yang, “In contrast to conventional systems, such as those described above, a DNN may be trained-using one or more depth sensors, such as LIDAR sensors, RADAR sensors, SONAR sensors, and/or the like-to predict distances to objects or obstacles in the environment using image data generated by one or more cameras of a vehicle.”, pg. 1, paragraph 0008, lines 1-6, see Figs. 14A-14B).
Regarding claim 6, Yang in view of Potter teaches the method of claim 5, further comprising projecting, by the light projector, the illumination pattern onto the scene in front of the vehicle (Potter, “The structured-light stereo device and the time of flight device components or sub-systems can be
housed within the same headlight space of the vehicle, can be housed in another shared location (e.g., on the top of the vehicle, on the front of the vehicle near the license plate housing, etc.), or can be housed within the vehicle using a different configuration.”, column 9, lines 42-47, see Fig. ).
Regarding claim 7, Yang in view of Potter teaches the method of claim 1, wherein the light projector projects the illumination pattern using near infrared light (Potter, “The light source 400 includes a light emitter that comprises a vertical-cavity surface-emitting laser (VCSEL) array 402, a phase modulator 404 in series with the VCSEL array 402, a projection image plane 406, and projection optics 408”, column 6, lines 57-61, The hybrid system uses a VCSEL array to project illumination patterns. This array provides structured light projections at near-infrared wavelengths.1).
Regarding claim 8, Yang in view of Potter teaches the method of claim 1, further comprising detecting at least one object in the scene based at least in part on the depth map (Yang, “A decoder 706 may use the output(s) 114 and/or the outputs of the object detector 708 to determine a correlation between the depth values from the distance(s) 106 (e.g., from the predicted depth map) and the bounding shape(s) corresponding to the objects. For example, where a single bounding shape is computed for an object, the distance values corresponding to pixels of the image within the bounding shape of the object may be used by the decoder 706 to determine the distance to the object.”, pg. 11, paragraph 0102, lines 1-9, The depth maps estimated by the machine learning model allows the system to detect distances of objects from the vehicle.).
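For context, a minimal sketch (Python) of the decoding step quoted above, in which the distance to an object is read from the predicted depth map using the pixels inside its bounding shape; the median reduction and the box format are assumptions for illustration, not Yang's decoder 706.

```python
import numpy as np

def object_distance(depth_map: np.ndarray, box: tuple) -> float:
    # box = (x0, y0, x1, y1); the format is an assumption for illustration
    x0, y0, x1, y1 = box
    region = depth_map[y0:y1, x0:x1]     # depth values within the bounding box
    return float(np.median(region))      # robust single distance estimate

depth = np.full((48, 64), 30.0)          # background at 30 m
depth[10:20, 20:32] = 12.5               # an object at 12.5 m
print(object_distance(depth, (20, 10, 32, 20)))  # 12.5
```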
Regarding claim 10, Yang in view of Potter teaches wherein estimating the depth map for the scene comprises monocular depth estimation (Yang, “The sensor data 702 may include similar sensor data to that described herein at least with respect to FIGS. 1 and 2. However, in some embodiments, the sensor data 702 applied to the machine learning model(s) 104 in deployment may be image data only. For example, using the process 100 of FIG. 1 the machine learning model(s) 104 may be trained to accurately predict the distance(s) 106 and/or the object detection(s) 116 using image data alone. In such embodiments, the image data may be generated by one or more cameras (e.g., a single monocular camera, in embodiments, such as a wide view camera 1470 of FIG. 14B, multiple camera(s), etc.).”, pg. 10, paragraph 0096, lines 5-16).
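As a further illustration of the deployment scenario cited above, where the trained network consumes a single monocular camera frame with image data alone as input, the following hypothetical sketch uses a one-layer stand-in model rather than Yang's DNN.

```python
import torch
import torch.nn as nn

# Stand-in model for illustration only; not Yang's architecture.
model = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Softplus()).eval()
with torch.no_grad():
    frame = torch.rand(1, 3, 64, 64)  # one monocular camera frame, image only
    depth_map = model(frame)          # dense per-pixel depth prediction
print(depth_map.shape)                # torch.Size([1, 1, 64, 64])
```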
Regarding claim 11, Yang teaches an apparatus comprising:
a memory that stores instructions; and processing circuitry (Yang, “FIG. 15 is a block diagram of an example computing device 1500 suitable for use in implementing some embodiments of the present disclosure. Computing device 1500 may include a bus 1502 that directly or indirectly couples the following devices: memory 1504, one or more central processing units (CPUs)…”, pg. 26, paragraph 0236, lines 1-6, see Fig. 15) that executes the instructions to:
generate a plurality of ground truth depth values for sample pixels of a scene captured in a camera image by a camera (Yang, “In some embodiments, a machine learning model(s) 104 may be trained to predict distance(s) 106 and/or object detection(s) 116 using image data alone. For example, the process 100 may be used to train the machine learning model(s) 104 to predict the distance(s) 106--or a depth map that may be converted to distances--of one or more objects and/or obstacles in the environment using images alone as input data… In order to more effectively train the machine learning model(s) 104, however, additional data from the sensor data 102--such as LIDAR data, RADAR data, SONAR data, and/or the like--may be used to generate ground truth data corresponding to the images (e.g., via ground truth encoding 110).”, paragraph 0041, “LIDAR data and/or RADAR data, as non-limiting examples--may be used to determine distances to the objects or obstacles corresponding to the respective bounding shapes. As such, where a distance to an object or obstacle may be difficult to ascertain accurately using image data alone--or another two-dimensional representation--this additional sensor data 102 may be used to increase the accuracy of the predictions with respect to the distances to objects or obstacles within the images.”, paragraph 0043, lines 17-25, see Fig. 5A, ground truth depth maps 501A-502C, Ground truth depth maps are generated for vehicle camera images using depth measurements from LIDAR and/or RADAR. These depth maps are used in training a machine learning model to predict object depth from camera images alone.); and
estimate a depth map for the scene based at least in part on the camera image and the ground truth depth values for sample pixels (Yang, “FIG. 9 is a flow diagram showing a method 900 for
predicting distances to objects and/or obstacles in an environment using a machine learning model(s), in accordance with some embodiments of the present disclosure. The method 900, at block B902, includes applying first data representative of an image of a field of view of an image sensor to a neural network, the neural network trained based at least in part on second data representative of ground truth information generated using at least one of a LIDAR sensor or a RADAR sensor… The method 900, at block B904, includes computing, using the neural network and based at least in part on the first data, third data representative of depth values corresponding to an image. For example, the machine learning model(s) 104 may compute the distance(s) 106 (or a depth map representative thereof).”, pg. 11, paragraphs 0108-0109, see Fig. 5A, predicted depth maps 504A-504C, and Fig. 9, The model is trained to predict depth maps for vehicle camera images at run time for object detection.).
Yang does not teach one or more objects of the scene reflecting light projected onto the scene by a light projector in an illumination pattern.
However, Potter teaches one or more objects of the scene reflecting light projected onto the scene by a light projector in an illumination pattern (Potter, “The present disclosure provides a hybrid system with a structured-light stereo device (also referred to as a device that provides laser triangulation or structured-light pattern projection) and a time of flight device for real-time depth sensing that can determine more accurate range and reflectance measurements. The hybrid system provides high resolution, real-time depth sensing that can be used by automated systems for a wide range of applications including but not limited to autonomous vehicles, robotics, and industrial manufacturing.”, column 2, lines 58-67, “The method 800 includes determining depth measurements of a scene using information received by the structured-light stereo device, via operation 802; determining time of flight measurements of the scene using information received by the time of flight device, via operation 804; generating a depth map using the depth measurements and generating calibration points using the time of flight measurements, via operation 806; and updating the depth map using the calibration points, via operation 808.”, column 11, lines 44-52, see Fig. 8, Depth maps are generated using structured light projected onto a scene containing objects, captured by a vehicle's camera. These depth maps are then updated using calibration points from LIDAR sensors.).
Yang teaches training a machine learning model using ground-truth depth maps generated based on LIDAR and/or RADAR sensor data (Yang, paragraph 0041, paragraph 0043, lines 17-25, see Fig. 5A, ground truth depth maps 501A-502C). Potter teaches generating depth maps for vehicle camera images based on structured light in combination with LIDAR calibration points (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the machine learning model of Yang by replacing the LIDAR-generated ground-truth depth maps with depth maps generated using structured light, as taught by Potter (Potter, column 2, lines 58-67, column 11, lines 44-52, see Fig. 8). The motivation for doing so would have been to provide higher-resolution coverage for ground-truth depth map generation (as suggested by Potter, “The hybrid system also provides high spatial resolution coverage that time of flight devices or LIDAR systems alone cannot provide.”, column 3, lines 51-53), thereby improving model training. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Yang with Potter to obtain the invention as specified in claim 11.
Regarding claim 12, Yang in view of Potter teaches the apparatus of claim 11, wherein to estimate the depth map, the processing circuitry executes instructions to estimate the depth map using a machine learning depth model (Yang, “The method 900, at block B904, includes computing, using the neural network and based at least in part on the first data, third data representative of depth values corresponding to an image. For example, the machine learning model(s) 104 may compute the distance(s) 106 (or a depth map representative thereof).”, pg. 11, paragraph 0109, see Fig. 5A, predicted depth maps 504A-504C, and Fig. 9).
Regarding claim 13, Yang in view of Potter teaches the apparatus of claim 12, further comprising the processing circuitry to execute instructions to train the machine learning depth model using the ground truth depth values for sample pixels (Yang, “…additional data from the sensor data 102--such as LIDAR data, RADAR data, SONAR data, and/or the like--may be used to generate ground truth data corresponding to the images (e.g., via ground truth encoding 110). In return, the ground truth data may be used to increase the accuracy of the machine learning model(s) 104…”, pg. 3, paragraph 0041, lines 15-20, Generated ground truth depth maps are used to train the machine learning model.).
Regarding claim 14, Yang in view of Potter teaches the apparatus of claim 12, further comprising the processing circuitry to execute instructions to train the machine learning depth model using the ground truth depth values for sample pixels and estimated depth ground truth values for the camera image generated from Light Detection and Ranging (LIDAR) sensor data (Potter, “The method 800 includes determining depth measurements of a scene using information received by the structured-light stereo device, via operation 802; determining time of flight measurements of the scene using information received by the time of flight device, via operation 804; generating a depth map using the depth measurements and generating calibration points using the time of flight measurements, via operation 806; and updating the depth map using the calibration points, via operation 808.”, column 11, lines 44-52, see Fig. 8, The hybrid system generates ground truth depth maps based on depth data of both a structured-light device and LIDAR sensor.).
Regarding claim 15, Yang in view of Potter teaches the apparatus of claim 11, wherein the light projector, the camera, the memory and the processing circuitry are disposed on a vehicle (Yang, “In contrast to conventional systems, such as those described above, a DNN may be trained-using one or more depth sensors, such as LIDAR sensors, RADAR sensors, SONAR sensors, and/or the like-to predict distances to objects or obstacles in the environment using image data generated by one or more cameras of a vehicle.”, pg. 1, paragraph 0008, lines 1-6, see Figs. 14A-14B).
Regarding claim 16, Yang in view of Potter teaches the apparatus of claim 15, wherein the light projector is to project the illumination pattern onto the scene in front of the vehicle (Potter, “The structured-light stereo device and the time of flight device components or sub-systems can be housed within the same headlight space of the vehicle, can be housed in another shared location (e.g., on the top of the vehicle, on the front of the vehicle near the license plate housing, etc.), or can be housed within the vehicle using a different configuration.”, column 9, lines 42-47, see Fig. ).
Claim 17 corresponds to claim 11, with the addition of a non-transitory computer-readable storage media comprising instructions, that when executed by processing circuitry, cause the processing circuitry to perform the functions according to claim 11. Yang in view of Potter teaches the addition of a non-transitory computer-readable storage media comprising instructions (Yang, “The memory 1504 may include any of a variety of computer-readable media.”, pg. 26, paragraph 0239, lines 1-2), that when executed by processing circuitry, cause the processing circuitry to perform the functions according to claim 11. As indicated in the analysis of claim 11, Yang in view of Potter teaches all the limitations of claim 11. Therefore, claim 17 is rejected for the same reasons of obviousness as claim 11.
Regarding claim 18, Yang in view of Potter teaches the non-transitory computer-readable storage media of claim 17, comprising instructions, that when executed by processing circuitry, cause the processing circuitry to: detect the one or more objects in the scene based at least in part on the depth map (Yang, “A decoder 706 may use the output(s) 114 and/or the outputs of the object detector 708 to determine a correlation between the depth values from the distance(s) 106 (e.g., from the predicted depth map) and the bounding shape(s) corresponding to the objects. For example, where a single bounding shape is computed for an object, the distance values corresponding to pixels of the image within the bounding shape of the object may be used by the decoder 706 to determine the distance to the object.”, pg. 11, paragraph 0102, lines 1-9, The depth maps estimated by the machine learning model allow the system to detect distances of objects from the vehicle.).
Regarding claim 20, Yang in view of Potter teaches the non-transitory computer-readable storage media of claim 18, comprising instructions, that when executed by processing circuitry, cause the processing circuitry to: estimate the depth map for the scene by monocular depth estimation (Yang, “The sensor data 702 may include similar sensor data to that described herein at least with respect to FIGS. 1 and 2. However, in some embodiments, the sensor data 702 applied to the machine learning model(s) 104 in deployment may be image data only. For example, using the process 100 of FIG. 1 the machine learning model(s) 104 may be trained to accurately predict the distance(s) 106 and/or the object detection(s) 116 using image data alone. In such embodiments, the image data may be generated by one or more cameras ( e.g., a single monocular camera, in embodiments, such as a wide view camera 1470 of FIG. 14B, multiple camera(s), etc.).”, pg. 10, paragraph 0096, lines 5-16).
Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (US 20200210726 A1) in view of Potter et al. (US 10891745 B1) and further in view of Busam et al. (US 12260575 B2) (hereinafter Busam).
Regarding claim 9, Yang in view of Potter teaches the method of claim 8. Yang in view of Potter does not teach further comprising resolving a scale of the at least one object in the scene based at least in part on the depth map.
However, Busam teaches resolving a scale of the at least one object in the scene based at least in part on the depth map (Busam, “The described neural network can perform both stereo-based depth and scale estimation simultaneously. The method for training the network involves entangling the two. The problem of scale ambiguity is addressed as a multi-task problem consisting of the two tasks (1) scale-ambiguous depth estimation and (2) scale parameter estimation. The two tasks are based on the same data and entangle their results such that each task profits from the other. Task (1) is to estimate a depth map from a temporal monocular input (i.e. images at different time instances) with the scale "s0". Task (2) is to estimate a scaling parameter "st" based on the same data and scale the output of task (1) accordingly. Thus the two tasks can be fused in an end-to-end image processing
pipeline. At runtime, the depth estimation branch may be dropped, thereby resolving the scale ambiguity.”, column 6, lines 30-50, see Figs. 7 and 10, Neural networks are trained to estimate both a depth map and a scaling parameter for a scene. This scaling parameter is applied to the depth map to resolve the scale of objects.).
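A schematic sketch (Python) of the scale-resolution step Busam describes, in which a scale-ambiguous (relative) depth map from task (1) is multiplied by the scale parameter estimated in task (2) to recover metric depth; the array values and the scale value are hypothetical.

```python
import numpy as np

def resolve_scale(relative_depth: np.ndarray, scale: float) -> np.ndarray:
    # Apply the estimated scaling parameter (Busam's "st") to the
    # scale-ambiguous depth map to obtain metric depths.
    return relative_depth * scale

relative = np.array([[0.50, 0.60],
                     [0.55, 0.65]])        # unitless relative depths
metric = resolve_scale(relative, 8.0)      # meters after applying the scale
print(metric)  # [[4.  4.8] [4.4 5.2]]
```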
Yang in view of Potter teaches estimating depth maps of a scene using a machine learning model (Yang, pg. 11, paragraph 0109, see Fig. 5A, predicted depth maps 504A-504C, and Fig. 9). Busam teaches training a machine learning model to resolve a scale ambiguity for objects in depth maps of a scene (see above). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the machine learning model of Yang in view of Potter to include scale correction for the depth maps, as taught by Busam. The motivation for doing so would have been to obtain both relative and absolute sizes of objects, thereby resolving scale ambiguity and improving depth accuracy (as suggested by Busam, “If the scale ambiguity is resolved, this means that the relative and/or absolute sizes of objects within a scene being captured will be known by using their relative depths within the image or their actual depths (i.e. distance from the camera).”, column 6, lines 44-48). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Yang in view of Potter with Busam to obtain the invention as specified in claim 9.
Regarding claim 19, Yang in view of Potter teaches the non-transitory computer-readable storage media of claim 18.
Yang in view of Potter does not teach resolving a scale of the at least one object in the scene based at least in part on the depth map.
However, as indicated in the analysis of claim 9 above, the combination of Yang in view of Potter and further in view of Busam teaches resolving a scale of the at least one object in the scene based at least in part on the depth map (see analysis of claim 9). Therefore, claim 19 is rejected for the same reasons of obviousness as claim 9.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CONNOR LEVI HANSEN whose telephone number is (703)756-5533. The examiner can normally be reached Monday-Friday 9:00-5:00 (ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CONNOR L HANSEN/Examiner, Art Unit 2672
/SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672
1 “This NIR illuminator is available in various wavelengths in the IR including standard wavelengths at 850 and 860 nm, but is also available at other wavelength options, including 808, 885, 940, and 975 nm.” Vertical Cavity Surface Emitting Laser (VCSEL) Array Technology | RPMC, www.rpmclasers.com/blog/vertical-cavity-surface-emitting-laser-vcsel-array-technology/. Accessed 4 Nov. 2025.