DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see Remarks page 10, filed 09/15/2025, with respect to the rejections of claims 1-6, 11-16, and 21 under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejections of claims 1-6, 11-16, and 21 have been withdrawn.
Applicant’s arguments, see Remarks pages 10-13, filed 09/15/2025, with respect to the rejections of claims 1-6, 8-16, and 21 under 35 U.S.C. 101, specifically with regard to the arguments directed to the claims’ integration into a practical application on pages 11-13, have been fully considered and are persuasive. The rejections of claims 1-6, 8-16, and 21 have been withdrawn.
Applicant’s arguments, see Remarks pages 13-15, filed on 09/15/2025, with respect to the rejection of amended claim(s) 1, 8, 11, and 21 under 35 U.S.C. 102(a)(1) have been fully considered and are moot in view of the new grounds of rejection (detailed in the rejections below) necessitated by Applicant’s amendment to the claim(s).
Claim Objections
Claims 1, 8, 11, and 21 are objected to because of the following informalities:
Regarding claims 1, 8, 11 and 21, a duplicate instance of “of the” in the claim limitation “the position of the target object in the world coordinate system into positional information of the target object in the radar image indicating a position of the of the target object in the radar image,” needs to be deleted.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-5, 8-15, and 21 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding claim 1, the claim recites the limitation(s): “determining that an incorrect label is likely to be generated for a target object in a radar image based on a shape or a pose of the target object in a radar image due to the shape or the pose of the target object being unclear in the radar image; based on determining that incorrect label is likely to be generated, determining, using a first picture image acquired by a first camera, a position of the target object in the first picture image”.
However, the Specification recites for the operation of the second embodiment, with reference to Figure 4, in paragraphs 0065-0067: “First, synchronization processing (S201) is an operation of the synchronization unit 201 in Fig. 3 and outputs a synchronization signal to the first camera measurement unit 202, the radar measurement unit 208, and the second camera measurement unit 210. Camera measurement processing (S202) is an operation of the first camera measurement unit 202 in Fig. 3…Target object position determination processing (S203) is an operation of the target object position determination unit 203 in Fig. 3,” wherein the recited operation steps of the embodiment do not disclose determining whether an incorrect label for a radar image is likely prior to the determination of a target object’s position in an image. Therefore, the limitations directed to “determining that an incorrect label is likely to be generated for a target object in a radar image” constitute new matter.
Regarding claims 2-5, they are rejected under 35 U.S.C. 112(a) for inheriting and failing to cure the deficiencies of parent claim 1.
As per claim(s) 8, 11, and 21, arguments made in rejecting claim(s) 1 are analogous.
Regarding claims 9-10 and 12-15, they are rejected under 35 U.S.C. 112(a) for inheriting and failing to cure the deficiencies of parent claims 8 and 11, respectively.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-5, 8-15, and 21 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 1, the limitations: “acquiring depth information associated with an imaging area of the first camera from the first picture image based on a second picture image acquired by a second camera or a measurement result of a radar” and “transforming, based on the position of the first camera in the world coordinate system with respect to a radar position as the origin of the world coordinate system and radar imaging information used when the radar image is generated from a measurement result of the radar,” are indefinite because it is unclear whether the radar corresponding to the radar image is the same as the radar corresponding to the depth information, or the same as the radar corresponding to the origin of the world coordinate system. For the purposes of examination, the radars recited in claim 1 are interpreted as referencing the same radar, which corresponds to the radar image, the depth information, and the radar position as the origin of the world coordinate system.
Regarding claims 2-5, they are rejected under 35 U.S.C. 112(b) for inheriting and failing to cure the deficiencies of parent claim 1.
As per claim(s) 8, 11, and 21, arguments made in rejecting claim(s) 1 are analogous.
Regarding claims 9-10 and 12-15, they are rejected under 35 U.S.C. 112(b) for inheriting and failing to cure the deficiencies of parent claims 8 and 11, respectively.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-3, 11-13, and 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Madhow et al. (US10451712B1) hereinafter referenced as Madhow, in view of Li et al. (3D triangulation based extrinsic calibration between a stereo vision system and a LIDAR) hereinafter referenced as Li.
Regarding claim 1, Madhow discloses: A data processing apparatus comprising: at least one memory configured to store instructions; and at least one processor configured to execute the instructions to perform operations (Madhow: 0026: “According to some aspects of the technology described herein, a system includes processing circuitry and memory…The processing circuitry determines, based on the common geographic region and the common time period, that one or more labeled objects in the one or more images map to one or more radar tracks.”), the operations comprising:
determining that an incorrect label is likely to be generated for a target object in a radar image based on a shape or a pose of the target object in a radar image due to the shape or the pose of the target object being unclear in the radar image (Madhow: 0029-0030: “radar units output radar tracks that include range, Doppler, and/or micro-Doppler measurements. However, identifying objects represented by the Doppler measurements may be challenging. Some aspects of the technology described herein provide techniques for identifying objects represented by the Doppler measurements. According to some schemes, representations of some common items (e.g., car, motorcycle, etc.), in terms of radar data, are collected and are stored in the memory of a computing machine. The computing machine then recognizes one of the common items by determining that an input radar data representation is similar to the stored representation. However, manually collecting and labeling many such stored representations may be tedious and expensive (in terms of human effort), and may not provide a diverse enough representation for reliable inference.”);
based on determining that incorrect label is likely to be generated, determining, using a first picture image acquired by a first camera, a position of the target object in the first picture image (Madhow: Figure 14; 0107: “As shown in FIG. 14, camera outputs 1405 are provided to a computer vision module 1410. The computer vision module 1410 produces labels and bounding boxes 1415 in image(s).”; 0109: “a computing machine identifies two adjacent bounding boxes 1505 and 1510 that represent humans.”; Wherein the computer vision module acquires the position of people in images.);
acquiring depth information associated with an imaging area of the first camera from the first picture image based on a second picture image acquired by a second camera or a measurement result of a radar; extracting a depth distance from the first camera to the target object based on the depth information associated with the imaging information of the first camera (Madhow: 0121: “The computer vision system, which may be based on a DNN, detects objects, draws bounding boxes around them, and classifies the detected objects. These bounding boxes are typically 2D and based on the image sensor, but can have depth information from stereo image sensors.”; Wherein the depth distance from the camera to the target object is acquired through a stereo image from the camera);
transforming, based on the depth distance, the position of the target object in the first picture image into a position of the target object in a world coordinate system (Madhow: Figures 13 & 14; 0115: “As shown in FIG. 20, the CVD detects an object at range r and angle θ with respect to its own frame of reference at time t. This is converted to an approximate object geographic location in a global reference frame.”; 0121: “These bounding boxes are typically 2D and based on the image sensor, but can have depth information from stereo image sensors…The radar system uses these labels for the radar data corresponding to the geometric region indicated by the vision system.”; 0130: “the geometric registration between the radar data and the labels and bounding boxes output by the vision system may indicate that multiple objects that are not separable by the radar system correspond to a given portion of the radar data.”; Wherein the positions of detected object in an image are converted to a global reference frame based on geometric registration between a radar and camera.);
transforming, based on the position of the first camera in the world coordinate system (Madhow: 0156: “the disclosed system for gathering and labeling data comprises one or more radar units, along with one or more computer vision devices, whose location and orientation with respect to a common coordinate frame are known or can be estimated.”) and radar imaging information used when the radar image is generated from a measurement result of the radar (Madhow: 0108: “radar data 1425 is provided to localization and tracking module 1430. The localization and tracking module 1430 provides targets to isolate data for track module 1435 and provides tracks to the geometric reconfiguration module 1420.”), the position of the target object in the world coordinate system into positional information of the target object in the radar image indicating a position of the target object in the radar image; and
determining a label of the target object in the radar image based on the positional information (Madhow: Figure 14; 0108: “The data for the tracks from the isolate data for track module 1435 and the geometric reconfiguration from the geometric reconfiguration module 1420 are provided to the label radar data for track module 1440. The label radar data for track module 1440 generates labeled radar data 1445”;
0115: “the CVD detects an object at range r and angle θ with respect to its own frame of reference at time t. This is converted to an approximate object geographic location in a global reference frame. The computing machine assigns the label “human” to the detected object 2020 . The radar system takes labels generated by the vision system and assigns those labels to radar tracks at approximately the same geographic location and time. The radar track(s) corresponding to the trajectory that includes the geographic location and time detected from the vision system are saved, and the label “human” is assigned to those radar track(s).”; Wherein the positions of detected objects in an image are converted to a world coordinate system and then assigned to objects in radar data in order to generate labeled radar data.).
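For illustration of the label-transfer scheme described in the passages of Madhow cited above, in which a label produced by the vision system is assigned to radar tracks at approximately the same geographic location and time, the following is a minimal sketch; the data structures, threshold values, and function names are assumptions made for illustration and are not drawn from Madhow.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VisionDetection:
    x: float          # object position in a global reference frame (m)
    y: float
    t: float          # detection time (s)
    label: str        # e.g., "human"

@dataclass
class RadarTrack:
    x: float          # track position in the same global frame (m)
    y: float
    t: float
    label: Optional[str] = None

def assign_labels(detections: List[VisionDetection],
                  tracks: List[RadarTrack],
                  max_dist: float = 2.0,
                  max_dt: float = 0.5) -> List[RadarTrack]:
    """Assign each vision-generated label to radar tracks at approximately
    the same geographic location and time (cf. Madhow at 0115)."""
    for det in detections:
        for trk in tracks:
            same_place = ((det.x - trk.x) ** 2 + (det.y - trk.y) ** 2) ** 0.5 <= max_dist
            same_time = abs(det.t - trk.t) <= max_dt
            if same_place and same_time:
                trk.label = det.label
    return tracks
```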
Madhow does not disclose expressly: transforming, based on the depth distance, the position of the target object in the first picture image into a position of the target object in a world coordinate system with a position of the first camera as an origin of the world coordinate system; transforming, based on the position of the first camera in the world coordinate system with respect to a radar position as the origin of the world coordinate system and radar imaging information used when the radar image is generated from a measurement result of the radar, the position of the target object in the world coordinate system into positional information of the target object in the radar image indicating a position of the target object in the radar image.
Li discloses: transforming, based on a depth distance, the position of an object in an image into a position of the object in a world coordinate system with a position of a camera as an origin of the world coordinate system (Li: II. B. Coordinate Systems and Sensor Models: “1) Coordinate Systems: To analyze such a multi-sensor system, we set several coordinate systems with respect to the different sensors…since the stereoscopic system is composed of two cameras, we assume the left camera frame as the reference of the stereoscopic system coordinate system, R³stereo = R³left.
2) Sensor Models: The left camera and right camera are modeled by the classical pinhole model. Suppose a point P in the left camera frame: Pl = (Xl, Yl, Zl)^T. Then, with zero skew, its corresponding point coordinates pl = [xl, yl] in the image frame, are given by:
[Equation image: media_image1.png, showing the projection of Pl into the image frame via the intrinsic matrix KL]
… where KL denotes the intrinsic matrix of the left camera”; Wherein the conversion from the image coordinate system to the stereoscopic coordinate system constitutes a conversion to a world coordinate system.);
transforming, based on the position of the camera in the world coordinate system with respect to a radar position as the origin of the world coordinate system and radar imaging information used when a radar image is generated from a measurement result of the radar (Li: II. B. Coordinate Systems and Sensor Models: “For the LIDAR coordinate system, we specify the origin as the point which emits laser rays. The directions of X, Y, Z are set as rightward, upward and forward from the sensor respectively”; Wherein the lidar coordinate system, with an origin at the lidar sensor, constitutes a world coordinate system.), the position of the object in the world coordinate system into positional information of the object in the radar image indicating a position of the object in the radar image (Li: II. B. Coordinate Systems and Sensor Models: “We denote (Φl, ∆l) and (Φr, ∆r) as the 3D rigid transformation from LIDAR coordinate system R³lidar to the left and right camera coordinate systems R³left, R³right respectively, where Φl, Φr are the orthogonal rotation matrices, ∆l and ∆r are translation vectors. Suppose a fixed point P observed by both the LIDAR and the stereoscopic system, denoted by Plidar in the LIDAR’s coordinate system, and Pl, Pr in the left and right camera coordinate systems. The three coordinates are connected by:
[Equation image: media_image2.png, showing the 3D rigid transformations relating Plidar to Pl and Pr via (Φl, ∆l) and (Φr, ∆r)]
”; Wherein the calibration between Stereo Vision and Lidar systems allowing for conversion of Lidar coordinate points to camera coordinate points allows for inversely converting camera coordinate points to Lidar coordinate points.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the calibration algorithms between the binocular stereo vision system and Lidar taught by Li for the registration of the camera and radar systems disclosed by Madhow. The suggestion/motivation for doing so would have been “Based on 3D reconstruction and non-linear optimization algorithm, rigid transformation parameters are calculated conveniently by making a common calibration chessboard detected by all the sensors with several different poses. Moreover, the real data experiments show that our method results in less system errors than Zhang’s method, which is widely used. The proposed method can be applied in any multisensor fusion system which consists of multiple cameras and multiple LIDARs” (Li: V. CONCLUSION AND FUTURE WORK). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Madhow with Li to obtain the invention as specified in claim 1.
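To illustrate the chain of transformations at issue in claim 1 as combined above (back-projecting an image position with a depth distance into a camera-origin coordinate system, re-expressing that point with the radar position as origin, and mapping it into radar-image coordinates using the radar imaging information), the following is a minimal Python sketch; the matrices, parameter names, and numeric values are assumptions for illustration only and are not teachings of Madhow or Li.

```python
import numpy as np

def pixel_to_camera_frame(u, v, depth, K):
    """Back-project image point (u, v) with known depth into the camera
    coordinate system using the pinhole model (cf. Li, Sec. II.B)."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])   # normalized viewing ray
    return ray * depth                               # 3D point, camera as origin

def camera_to_radar_frame(P_cam, R_cam_to_radar, t_cam_in_radar):
    """Re-express a camera-frame point in a world coordinate system having
    the radar position as its origin."""
    return R_cam_to_radar @ P_cam + t_cam_in_radar

def radar_frame_to_image(P_radar, area_start, voxel_size):
    """Map a radar-frame point to radar-image voxel indices using the radar
    imaging information: the starting point of the imaged area and the
    length per voxel (cf. claim 2)."""
    return np.floor((P_radar - area_start) / voxel_size).astype(int)

# Assumed example values (illustrative only)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])      # camera intrinsic matrix
R = np.eye(3)                              # camera orientation in the radar frame
t = np.array([0.2, 0.0, 0.0])              # camera position in the radar frame (m)

P_cam   = pixel_to_camera_frame(u=350, v=260, depth=5.0, K=K)
P_radar = camera_to_radar_frame(P_cam, R, t)
voxel   = radar_frame_to_image(P_radar,
                               area_start=np.array([-10.0, -10.0, 0.0]),
                               voxel_size=0.1)
```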
Regarding claim 2, Madhow in view of Li discloses: The data processing apparatus according to claim 1, wherein the radar imaging information includes a starting point of an area being a target of the radar image in the world coordinate system (Madhow: 0156: “the disclosed system for gathering and labeling data comprises one or more radar units, along with one or more computer vision devices, whose location and orientation with respect to a common coordinate frame are known or can be estimated.”) and a length per voxel in the world coordinate system in the radar image (Madhow: Figures 13 & 18; 0124: “this received data has three dimensions: ADC samples per chirp (“fast time”), chirp index within a frame (“slow time”), and receive antenna element. As shown in FIG. 18, this can then be converted to range-Doppler-angle space (block 1810) by taking a 3D FFT…Some aspects operate in such a transform domain, since it provides a geometric interpretation of the collected data.”; Wherein the transformed radar data domain provides lengths in 3 directions according to a common coordinate frame).
Regarding claim 3, Madhow in view of Li discloses: The data processing apparatus according to claim 1, wherein the operations further comprise extracting the depth distance by further using, as the depth information associated with the imaging range, the second picture image being generated by the second camera capable of measuring depth and including the target object (Madhow: 0121: “The computer vision system, which may be based on a DNN, detects objects, draws bounding boxes around them, and classifies the detected objects. These bounding boxes are typically 2D and based on the image sensor, but can have depth information from stereo image sensors.”; Wherein the inclusion of stereo image sensors indicates the inclusion of a second camera being used to extract the depth distance of the target object.).
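For context on the stereo-derived depth relied upon above, the following is a minimal sketch of recovering depth from a rectified stereo pair via disparity (Z = f·B/d); the function name and numeric values are assumptions for illustration and are not drawn from Madhow.

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d.
    focal_px: focal length in pixels; baseline_m: separation between the two
    cameras in meters; disparity_px: horizontal pixel offset of the point
    between the left and right images."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Assumed example: f = 800 px, baseline = 0.12 m, disparity = 19.2 px -> 5.0 m
z = stereo_depth(800.0, 0.12, 19.2)
```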
As per claim(s) 11, arguments made in rejecting claim(s) 1 are analogous.
As per claim(s) 12, arguments made in rejecting claim(s) 2 are analogous.
As per claim(s) 13, arguments made in rejecting claim(s) 3 are analogous.
As per claim(s) 21, arguments made in rejecting claim(s) 1 are analogous. In addition, Paragraph 0186 of Madhow discloses a non-transitory computer-readable medium storing a program for causing a computer to perform operations.
Claim(s) 4-5 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Madhow in view of Li, and further in view of Yamazaki et al. (US20200211219A1) hereinafter referenced as Yamazaki.
Regarding claim 4, Madhow in view of Li discloses: The data processing apparatus according to claim 1.
Madhow in view of Li does not disclose expressly: wherein the operations further comprise determining the position of the target object by determining a position of a marker mounted on the target object.
Yamazaki discloses: determining the position of a target object by determining a position of a marker mounted on the target object (Yamazaki: Figure 10; 0097: “the open surface of the quadrangular pyramid of the target 12 faces the direction of the millimeter wave radar 52 of the sensor section 31, and a marker 72 that includes a sheet of paper with a pattern printed on the open surface is affixed to the target 12. A QR (Quick Response) code, for example, may be used as this pattern.”; 0181: “the stereo camera image target detection section 293 recognizes the coordinate position (x,y) of the target 12 by recognizing the marker 72 in the stereo camera image.”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to affix a target disclosed in Madhow in view of Li with the marker taught in Yamazaki. The suggestion/motivation for doing so would have been “the marker 72 is affixed that includes a sheet of paper with a pattern printed thereon for higher distance measurement accuracy of the stereo camera 51” (Yamazaki: 0100; Wherein the marker increases the accuracy of the object detection). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Madhow in view of Li with Yamazaki to obtain the invention as specified in claim 4.
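For illustration of locating a printed marker such as the QR-code marker of Yamazaki in a picture image, the following is a minimal sketch using OpenCV’s QR-code detector; the function name and the use of OpenCV are assumptions for illustration and are not drawn from Yamazaki.

```python
import cv2
import numpy as np

def marker_center_in_image(image_bgr: np.ndarray):
    """Locate a QR-code marker in a picture image and return the pixel
    coordinates (u, v) of its center, or None if no marker is found."""
    detector = cv2.QRCodeDetector()
    _data, points, _ = detector.detectAndDecode(image_bgr)
    if points is None:
        return None
    corners = points.reshape(-1, 2)   # the four marker corners, in pixels
    return corners.mean(axis=0)       # center of the marker in the image
```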
Regarding claim 5, Madhow in view of Li and Yamazaki discloses: The data processing apparatus according to claim 4, wherein the operations further comprise computing the position of the marker by using a size of the marker in the first picture image (Yamazaki: 0181: “the stereo camera image target detection section 293 recognizes the coordinate position (x,y) of the target 12 by recognizing the marker 72 in the stereo camera image. The marker 72 may be, for example, a QR (Quick Response) code.”; Wherein the computation of the marker position through the visual detection of the marker constitutes using a size of the marker) and extracting the depth distance from the first camera to the target object, based on the position of the marker (Madhow: 0121: “The computer vision system, which may be based on a DNN, detects objects, draws bounding boxes around them, and classifies the detected objects. These bounding boxes are typically 2D and based on the image sensor, but can have depth information from stereo image sensors.”; Wherein the depth distance from the camera to the target object/marker is acquired through a stereo image from the camera).
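For illustration of claim 5’s use of the marker’s size in the picture image, the following is a minimal sketch estimating the depth distance from the camera to a planar marker of known physical size under the pinhole approximation (Z = f·W/w); the parameter names and values are assumptions for illustration.

```python
def depth_from_marker_size(focal_px: float,
                           marker_width_m: float,
                           marker_width_px: float) -> float:
    """Depth to a planar marker of known physical width, estimated from its
    apparent width in the image: Z = f * W / w (pinhole approximation)."""
    return focal_px * marker_width_m / marker_width_px

# Assumed example: f = 800 px, marker 0.30 m wide, 48 px wide in the image -> 5.0 m
z = depth_from_marker_size(800.0, 0.30, 48.0)
```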
As per claim(s) 14, arguments made in rejecting claim(s) 4 are analogous.
As per claim(s) 15, arguments made in rejecting claim(s) 5 are analogous.
Claim(s) 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Madhow in view of Yamazaki and Li.
Regarding claim 8, Madhow discloses: A data processing apparatus comprising: at least one memory configured to store instructions; and at least one processor configured to execute the instructions to perform operations (Madhow: 0026: “According to some aspects of the technology described herein, a system includes processing circuitry and memory…The processing circuitry determines, based on the common geographic region and the common time period, that one or more labeled objects in the one or more images map to one or more radar tracks.”), the operations comprising:
determining that an incorrect label is likely to be generated for a target object in a radar image based on a shape or a pose of the target object in a radar image due to the shape or the pose of the target object being unclear in the radar image (Madhow: 0029-0030: “radar units output radar tracks that include range, Doppler, and/or micro-Doppler measurements. However, identifying objects represented by the Doppler measurements may be challenging. Some aspects of the technology described herein provide techniques for identifying objects represented by the Doppler measurements. According to some schemes, representations of some common items (e.g., car, motorcycle, etc.), in terms of radar data, are collected and are stored in the memory of a computing machine. The computing machine then recognizes one of the common items by determining that an input radar data representation is similar to the stored representation. However, manually collecting and labeling many such stored representations may be tedious and expensive (in terms of human effort), and may not provide a diverse enough representation for reliable inference.”);
based on determining that incorrect label is likely to be generated, determining, based on a first picture image acquired by a first camera, a position of the target object in the picture image (Madhow: Figure 14; 0107: “As shown in FIG. 14, camera outputs 1405 are provided to a computer vision module 1410. The computer vision module 1410 produces labels and bounding boxes 1415 in image(s).”; 0109: “a computing machine identifies two adjacent bounding boxes 1505 and 1510 that represent humans.”; Wherein the computer vision module acquires the position of people in images.);
extracting a depth distance from the first camera to the target object by using the radar image generated based on a radar signal generated by a radar (Madhow: 0121: “The computer vision system, which may be based on a DNN, detects objects, draws bounding boxes around them, and classifies the detected objects. These bounding boxes are typically 2D and based on the image sensor”;
0031: “The data from the one or more radar units comprises radar tracks, each radar track comprising one or more Doppler measurements, one or more range measurements, and one or more angle measurements.”;
0107-0108: “The computer vision module 1410 produces labels and bounding boxes 1415 in image(s). These labels and bounding boxes 1415 are provided to a geometric reconfiguration module 1420…radar data 1425 is provided to localization and tracking module 1430. The localization and tracking module 1430 provides targets to isolate data for track module 1435 and provides tracks to the geometric reconfiguration module 1420.”;
0156: “the disclosed system for gathering and labeling data comprises one or more radar units, along with one or more computer vision devices, whose location and orientation with respect to a common coordinate frame are known or can be estimated.”; Wherein the radar data is combined with the 2d bounding box data detected by the images for estimating the geometric reconfiguration from 2d image to 3d radar data);
transforming, based on the depth distance, the position of the target object in the picture image into a position of the target object in a world coordinate system (Madhow: Figures 13 & 14; 0107-0108: “The computer vision module 1410 produces labels and bounding boxes 1415 in image(s). These labels and bounding boxes 1415 are provided to a geometric reconfiguration module 1420…radar data 1425 is provided to localization and tracking module 1430. The localization and tracking module 1430 provides targets to isolate data for track module 1435 and provides tracks to the geometric reconfiguration module 1420.”; 0130: “the geometric registration between the radar data and the labels and bounding boxes output by the vision system may indicate that multiple objects that are not separable by the radar system correspond to a given portion of the radar data.”);
transforming, based on the position of the first camera in the world coordinate system (Madhow: 0156: “the disclosed system for gathering and labeling data comprises one or more radar units, along with one or more computer vision devices, whose location and orientation with respect to a common coordinate frame are known or can be estimated.”) and radar imaging information used when the radar image is generated from a measurement result of the radar (Madhow: 0108: “radar data 1425 is provided to localization and tracking module 1430. The localization and tracking module 1430 provides targets to isolate data for track module 1435 and provides tracks to the geometric reconfiguration module 1420.”), the position of the target object in the world coordinate system into positional information of the target object in the radar image indicating a position of the target object in the radar image; and
determining a label of the target object in the radar image based on the positional information (Madhow: Figure 14; 0108: “The data for the tracks from the isolate data for track module 1435 and the geometric reconfiguration from the geometric reconfiguration module 1420 are provided to the label radar data for track module 1440. The label radar data for track module 1440 generates labeled radar data 1445”; 0115: “the CVD detects an object at range r and angle θ with respect to its own frame of reference at time t. This is converted to an approximate object geographic location in a global reference frame. The computing machine assigns the label “human” to the detected object 2020 . The radar system takes labels generated by the vision system and assigns those labels to radar tracks at approximately the same geographic location and time. The radar track(s) corresponding to the trajectory that includes the geographic location and time detected from the vision system are saved, and the label “human” is assigned to those radar track(s).”; Wherein the positions of detected objects in an image are converted to a world coordinate system and then assigned to objects in radar data in order to generate labeled radar data.).
Madhow does not disclose expressly: determining, based on a first picture image acquired by a first camera, a position of a marker mounted on a target object in the picture image as a position of the target object in the picture image.
Yamazaki discloses: determining, based on a picture image acquired by a camera, a position of a marker mounted on a target object in the picture image as a position of a target object in the picture image (Yamazaki: Figure 10; 0097: “the open surface of the quadrangular pyramid of the target 12 faces the direction of the millimeter wave radar 52 of the sensor section 31, and a marker 72 that includes a sheet of paper with a pattern printed on the open surface is affixed to the target 12. A QR (Quick Response) code, for example, may be used as this pattern.”; 0181: “the stereo camera image target detection section 293 recognizes the coordinate position (x,y) of the target 12 by recognizing the marker 72 in the stereo camera image.”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to affix a target disclosed in Madhow with the marker taught in Yamazaki. The suggestion/motivation for doing so would have been “the marker 72 is affixed that includes a sheet of paper with a pattern printed thereon for higher distance measurement accuracy of the stereo camera 51” (Yamazaki: 0100; Wherein the marker increases the accuracy of the object detection). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Madhow in view of Yamazaki does not disclose expressly: transforming, based on the depth distance, the position of the target object in the first picture image into a position of the target object in a world coordinate system with a position of the first camera as an origin of the world coordinate system; transforming, based on the position of the first camera in the world coordinate system with respect to a radar position as the origin of the world coordinate system and radar imaging information used when the radar image is generated from a measurement result of the radar, the position of the target object in the world coordinate system into positional information of the target object in the radar image indicating a position of the target object in the radar image.
Li discloses: transforming, based on a depth distance, the position of an object in an image into a position of the object in a world coordinate system with a position of a camera as an origin of the world coordinate system (Li: II. B. Coordinate Systems and Sensor Models: “1) Coordinate Systems: To analyze such a multi-sensor system, we set several coordinate systems with respect to the different sensors…since the stereoscopic system is composed of two cameras, we assume the left camera frame as the reference of the stereoscopic system coordinate system, R³stereo = R³left.
2) Sensor Models: The left camera and right camera are modeled by the classical pinhole model. Suppose a point P in the left camera frame: Pl = (Xl, Yl, Zl)^T. Then, with zero skew, its corresponding point coordinates pl = [xl, yl] in the image frame, are given by:
[Equation image: media_image1.png, showing the projection of Pl into the image frame via the intrinsic matrix KL]
…where KL denotes the intrinsic matrix of the left camera”; Wherein the conversion from the image coordinate system to the stereoscopic coordinate system constitutes a conversion to a world coordinate system.); and transforming, based on the position of the camera in the world coordinate system with respect to a radar position as the origin of the world coordinate system and radar imaging information used when the radar image is generated from a measurement result of the radar (Li: II. B. Coordinate Systems and Sensor Models: “For the LIDAR coordinate system, we specify the origin as the point which emits laser rays. The directions of X, Y, Z are set as rightward, upward and forward from the sensor respectively”; Wherein the lidar coordinate system, with an origin at the lidar sensor, constitutes a world coordinate system.), the position of the object in the world coordinate system into positional information of the object in the radar image indicating a position of the object in the radar image (Li: II. B. Coordinate Systems and Sensor Models: “We denote (Φl, ∆l) and (Φr, ∆r) as the 3D rigid transformation from LIDAR coordinate system R³lidar to the left and right camera coordinate systems R³left, R³right respectively, where Φl, Φr are the orthogonal rotation matrices, ∆l and ∆r are translation vectors. Suppose a fixed point P observed by both the LIDAR and the stereoscopic system, denoted by Plidar in the LIDAR’s coordinate system, and Pl, Pr in the left and right camera coordinate systems. The three coordinates are connected by:
[Equation image: media_image2.png, showing the 3D rigid transformations relating Plidar to Pl and Pr via (Φl, ∆l) and (Φr, ∆r)]
”; Wherein the calibration between Stereo Vision and Lidar systems allowing for conversion of Lidar coordinate points to camera coordinate points allows for inversely converting camera coordinate points to Lidar coordinate points.).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement the calibration algorithms between the binocular stereo vision system and Lidar taught by Li for the registration of the camera and radar systems disclosed by Madhow in view of Yamazaki. The suggestion/motivation for doing so would have been “Based on 3D reconstruction and non-linear optimization algorithm, rigid transformation parameters are calculated conveniently by making a common calibration chessboard detected by all the sensors with several different poses. Moreover, the real data experiments show that our method results in less system errors than Zhang’s method, which is widely used. The proposed method can be applied in any multisensor fusion system which consists of multiple cameras and multiple LIDARs” (Li: V. CONCLUSION AND FUTURE WORK). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Madhow in view of Yamazaki with Li to obtain the invention as specified in claim 8.
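To illustrate the point relied upon above, that the rigid transformation of Li from the LIDAR frame to a camera frame can be inverted to map camera-frame points back to the LIDAR frame, the following is a minimal sketch; the rotation, translation, and point values are assumptions for illustration only.

```python
import numpy as np

def lidar_to_camera(P_lidar, Phi, Delta):
    """Rigid transformation from the LIDAR frame to a camera frame:
    P_cam = Phi @ P_lidar + Delta (cf. Li, Sec. II.B)."""
    return Phi @ P_lidar + Delta

def camera_to_lidar(P_cam, Phi, Delta):
    """Inverse transformation: Phi is an orthogonal rotation matrix, so its
    inverse is its transpose and P_lidar = Phi.T @ (P_cam - Delta)."""
    return Phi.T @ (P_cam - Delta)

# Assumed example: camera rotated 90 degrees about Z and offset from the LIDAR
Phi = np.array([[0.0, -1.0, 0.0],
                [1.0,  0.0, 0.0],
                [0.0,  0.0, 1.0]])
Delta = np.array([0.1, -0.2, 0.0])
P_lidar = np.array([3.0, 1.0, 0.5])
P_cam = lidar_to_camera(P_lidar, Phi, Delta)
assert np.allclose(camera_to_lidar(P_cam, Phi, Delta), P_lidar)
```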
Regarding claim 9, Madhow in view of Yamazaki and Li discloses: The data processing apparatus according to claim 8, wherein the marker can be visually recognized by the first camera (Yamazaki: 0019: “The predetermined target can include a radar reflector whose reflectance of radar waves of the millimeter wave radar is higher than a predetermined value and a marker recognizable with the stereo camera image.”) and cannot be visually recognized by the radar image (Yamazaki: Figure 4: radar reflector 71 and marker 72; 0095: “it is preferable that the target 12 should be an object that includes a metal and does not reflect light to such an extent that distance measurement of the stereo camera remains unaffected by light reflection and that the target 12 should, for example, be matt-finished or have a piece of paper affixed thereto.”; Wherein the paper marker is not recognized by the radar nor does it impede the performance of the radar reflector).
Regarding claim 10, Madhow in view of Yamazaki and Li discloses: The data processing apparatus according to claim 9, wherein the marker is formed by using at least one item out of paper, wood, cloth, or plastic (Yamazaki: 0097: “and a marker 72 that includes a sheet of paper with a pattern printed on the open surface is affixed to the target 12.”).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY J RODRIGUEZ whose telephone number is (703)756-5821. The examiner can normally be reached Monday-Friday 10am-7pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANTHONY J RODRIGUEZ/Examiner, Art Unit 2672
/SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672