DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 6-7 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 6 recites “first features and second features” in line 3. “First features and second features” have not been introduced in the preceding claims. Further, there is a lack of a clear definition of the terms “first features and second features” in the specification. Therefore, it is unclear what the generation of the aligned features based on the first features and second features entails. For the purpose of examination, the examiner interprets “first features and second features” as “source features and target features”. Claim 7 is dependent on claim 6 and is rejected for failing to correct the ambiguity of claim 6.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Florea et al. ("Enhanced Perception for Autonomous Driving Using Semantic and Geometric Data Fusion", full reference on PTO-892; hereafter, Florea) in view of Chen et al. ("3D Point Cloud Processing and Learning for Autonomous Driving: Impacting Map Creation, Localization, and Perception", full reference on PTO-892; hereafter, Chen).
Regarding claim 1, Florea discloses:
An apparatus for processing data, the apparatus comprising: at least one memory (pg. 6 para. 2, the method is carried out by a processor. A person of ordinary skill in the art would understand that a processor for performing this method and the associated GPU would have an associated memory);
and at least one processor coupled to the at least one memory (pg. 6 para. 2, the method is carried out by a processor) and configured to:
obtain source features (pg. 8 para. 3, images undergo image undistortion which is understood as obtaining source features. It is noted that the phrase “obtain source features” is extremely broad. The examiner is interpreting “source features” as processed image data at least one step removed from the raw image data, see the applicant’s specification [0094] stating that “source features” may be features extracted by a network or fused top-down features or fused perspective view features, which are features of an image after some processing. The processing of the image to obtain an undistorted image can be broadly interpreted as “generating source features” because the processing of the image transforms the image into a different form to enable further processing of features in the image) generated based on first sensor data captured using a first set of sensors (pg. 8 para. 3, images are obtained by at least one camera);
obtain source semantic attributes related to the source features (pg. 9 para. 1, images are segmented semantically which is understood as semantic attributes);
obtain target features (pg. 8 para. 1, the LiDAR raw data goes through the LiDAR Grabber to output point clouds. The point cloud is understood as target features. It is noted that the phrase “obtain target features” is extremely broad. The examiner is interpreting “target features” as processed image data at least one step removed from the raw image data, see the applicant’s specification [0097] stating that “target features” may be features extracted by a network or fused top-down features or fused perspective view features or map-based features, which are features of an image after some processing. The processing of the raw data into a point cloud is understood by the examiner as generating a map-based view, as it maps the data into a 3D space. The examiner understands this as providing the features for the rest of the process) generated based on second sensor data captured using a second set of sensors (pg. 8 para. 1, the point cloud is from the raw measurements of the LiDAR sensors);
obtain target semantic attributes (pg. 10 para. 1, the 3D point cloud is segmented into road and obstacles which is understood as semantic attributes);
align the target features (pg. 12 para. 5, the 3D points are projected onto the image which is understood as aligning the features) with a set of the source features (pg. 12 para. 5, the aligning of features is performed "once 2D segmentation results become available". The examiner is interpreting this as a subset of the source features as it is a limited access to the source features),
based on the source semantic attributes (pg. 12 para. 5, the projecting, i.e. aligning, is onto the segmented image, i.e. the source semantic attribute)
and the target semantic attributes (pg. 13 para. 3, the 3D measurements are projected, i.e. aligned, onto each image. Fig. 4, point cloud segmentation feeds into the geometric and semantic fusion, which the examiner understands to be based, at least in part, on the point cloud segmentation, i.e. target semantic attributes),
to generate aligned target features (pg. 12 para. 5, spatio-temporal and appearance-based representations are generated, which is understood as an alignment of the point cloud and image data. It is noted that the phrase “aligned target features” is extremely broad. The examiner is interpreting the phrase to be an aligned version of the previous source features and target features. The examiner understands the spatio-temporal and appearance-based representations as aligned target features because the separate features, i.e. the undistorted image understood as source features and the point cloud understood as target features, are now aligned; an illustrative sketch of this kind of projection-based alignment appears after this claim mapping);
and process the aligned target features to generate an output (pg. 14 para. 5, object classification is performed and output to label each object, which is understood as an output).
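For illustration only, the following minimal sketch (not taken from Florea; all function names, calibration values, and data are hypothetical) shows the kind of projection-based alignment the cited passages describe: 3D LiDAR points are transformed into the camera frame and projected onto a semantically segmented image, so that each point inherits a 2D semantic label.

```python
import numpy as np

def project_points_to_image(points_lidar, T_cam_lidar, K, seg_image):
    """Project LiDAR points into a segmented camera image; return, for
    each point landing inside the image, its pixel and semantic label."""
    # Homogeneous transform from the LiDAR frame to the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0.0]

    # Pinhole projection with intrinsic matrix K.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    h, w = seg_image.shape
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)

    # Each surviving 3D point inherits the semantic class of its pixel.
    return uv[valid], seg_image[v[valid], u[valid]]

# Hypothetical calibration and data, for demonstration only.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
T_cam_lidar = np.eye(4)                                   # identity extrinsics
points = np.random.uniform([-5, -2, 2], [5, 2, 30], size=(100, 3))
segmentation = np.random.randint(0, 3, size=(480, 640))   # fake class labels

pixels, labels = project_points_to_image(points, T_cam_lidar, K, segmentation)
print(pixels.shape, labels[:5])
```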
Florea does not disclose expressly to obtain map information, to obtain location information, and to obtain target semantic attributes from the map information and the location information.
Chen discloses:
obtain map information (pg. 76 col. 1 last para., an offline map is used in the processing);
obtain location information of a device comprising the second set of sensors (pg. 76 col. 2 para. 2, the pose of the vehicle, i.e. the device comprising the sensors, is localized);
obtain target semantic attributes (pg. 78 col. 2 para. 3, semantic features are extracted)
from the map information based on the location information (pg. 78 col. 2 last para., the output semantic labels are projected onto the point cloud map to generate a traffic rule-related feature map. The examiner interprets the rule-related feature map to incorporate the location of the vehicle as relevant to the applicable traffic rules, see pg. 76 col. 2 para. 5; an illustrative sketch of this kind of map-based lookup appears after this mapping);
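Purely as an illustrative sketch of the map-based limitation (the data structure and names below are hypothetical, not Chen's implementation), target semantic attributes can be obtained by querying an offline map with the localized pose of the device carrying the second set of sensors:

```python
from math import hypot

# Hypothetical offline HD map: semantic attributes keyed by position.
HD_MAP = [
    {"xy": (10.0, 4.0), "attribute": "lane_boundary"},
    {"xy": (12.5, 4.2), "attribute": "speed_limit_50"},
    {"xy": (40.0, -3.0), "attribute": "traffic_light"},
]

def semantic_attributes_near(pose_xy, radius=5.0):
    """Return map semantic attributes within `radius` meters of the pose."""
    return [e["attribute"] for e in HD_MAP
            if hypot(e["xy"][0] - pose_xy[0], e["xy"][1] - pose_xy[1]) <= radius]

# Location information for the device comprising the second set of sensors.
vehicle_pose = (11.0, 4.0)
print(semantic_attributes_near(vehicle_pose))  # ['lane_boundary', 'speed_limit_50']
```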
Florea and Chen are combinable because they are from the same field of endeavor of perception systems for autonomous driving (Florea, pg. 2 para. 2; Chen, pg. 1 para. 1).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the map information of Chen with the invention of Florea.
The motivation for doing so would have been "The main reason for creating an offline HD map is that understanding traffic rules in real time is too challenging" (Chen, pg. 76 col. 1 last para.).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the location information of Chen with the invention of Florea.
The motivation for doing so would have been "These priors [including location] are used to register real-time lidar sweeps to the point cloud map, such that one can obtain the real-time high-precision ego motion of an AV [autonomous vehicle]" (Chen, pg. 76 col. 1 para. 2).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the semantic attributes based on map and location of Chen with the invention of Florea.
The motivation for doing so would have been "In an HD map, traffic rule-related semantic features such as lane geometries and connectivities, traffic lights, traffic signs and the speed limit of lanes are indispensable priors for the planning module" (Chen, pg. 76 col. 2 para. 5).
Therefore, it would have been obvious to combine Chen with Florea to obtain the invention as specified in claim 1.
Regarding claim 2, Florea in view of Chen discloses the subject matter of claim 1.
Florea further discloses:
The apparatus of claim 1, wherein at least one of extrinsic parameters or intrinsic parameters are different between the first set of sensors and the second set of sensors (pg. 6 section 3.1. and Fig. 3, the first set of sensors is understood as the cameras and the second set of sensors is understood as the LiDAR scanners. The camera and LiDAR have separate extrinsic parameters, as shown by their divergent means of detecting information).
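For context (hypothetical values, not Florea's calibration): "extrinsic" parameters describe a sensor's mounting pose while "intrinsic" parameters describe its internal projection model, so a camera and a LiDAR mounted at different positions necessarily differ in at least their extrinsics.

```python
import numpy as np

def mounting_extrinsics(translation_xyz):
    """Rigid-body pose (extrinsics) of a sensor in the vehicle frame;
    rotation omitted for brevity."""
    T = np.eye(4)
    T[:3, 3] = translation_xyz
    return T

# Hypothetical mounting positions: a windshield camera vs. a roof LiDAR.
T_vehicle_camera = mounting_extrinsics([1.5, 0.0, 1.2])
T_vehicle_lidar = mounting_extrinsics([0.0, 0.0, 1.9])

# Different mounting poses mean the extrinsic parameters differ, even
# before considering intrinsics (e.g., a camera focal length has no
# LiDAR counterpart).
print(np.allclose(T_vehicle_camera, T_vehicle_lidar))  # False
```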
Regarding claim 3, Florea in view of Chen discloses the subject matter of claim 1.
Florea further discloses:
The apparatus of claim 1, wherein at least one of: a count of the first set of sensors is different than a count of the second set of sensors (as this claim is directed to “at least one of” the listed limitations, the claim is taught by the below limitations rather than this one);
a type of the first set of sensors is different than a type of the second set of sensors (pg. 6 section 3.1., the first set of sensors is understood as the cameras and the second set of sensors is understood as the LiDAR scanners);
or relative positions of the first set of sensors are different than relative positions of the second set of sensors (Fig. 3, the positions of the sets of sensors are different).
Regarding claim 20, Florea discloses:
A method for processing data, the method comprising:
obtaining source features (pg. 8 para. 3, images undergo image undistortion which is understood as obtaining source features. It is noted that the phrase “obtain source features” is extremely broad. The examiner is interpreting “source features” as processed image data at least one step removed from the raw image data, see the applicant’s specification [0094] stating that “source features” may be features extracted by a network or fused top-down features or fused perspective view features, which are features of an image after some processing. The processing of the image to obtain an undistorted image can be broadly interpreted as “generating source features” because the processing of the image transforms the image into a different form to enable further processing of features in the image) generated based on first sensor data captured using a first set of sensors (pg. 8 para. 3, images are obtained by at least one camera);
obtaining source semantic attributes related to the source features (pg. 9 para. 1, images are segmented semantically which is understood as semantic attributes);
obtaining target features (pg. 8 para. 1, the LiDAR raw data goes through the LiDAR Grabber to output point clouds. The point cloud is understood as target features. It is noted that the phrase “obtain target features” is extremely broad. The examiner is interpreting “target features” as processed image data at least one step removed from the raw image data, see the applicant’s specification [0097] stating that “target features” may be features extracted by a network or fused top-down features or fused perspective view features or map-based features, which are features of an image after some processing. The processing of the raw data into a point cloud is understood by the examiner as generating a map-based view, as it maps the data into a 3D space. The examiner understands this as providing the features for the rest of the process) generated based on second sensor data captured using a second set of sensors (pg. 8 para. 1, the point cloud is from the raw measurements of the LiDAR sensors);
obtaining target semantic attributes (pg. 10 para. 1, the 3D point cloud is segmented into road and obstacles which is understood as semantic attributes);
aligning the target features (pg. 12 para. 5, the 3D points are projected onto the image which is understood as aligning the features) with a set of the source features (pg. 12 para. 5, the aligning of features is performed "once 2D segmentation results become available". The examiner is interpreting this as a subset of the source features as it is a limited access to the source features),
based on the source semantic attributes (pg. 12 para. 5, the projecting, i.e. aligning, is onto the segmented image, i.e. the source semantic attribute)
and the target semantic attributes (pg. 13 para. 3, the 3D measurements are projected, i.e. aligned, onto each image. Fig. 4, point cloud segmentation feeds into the geometric and semantic fusion, which the examiner understands to be based, at least in part, on the point cloud segmentation, i.e. target semantic attributes),
to generate aligned target features (pg. 12 para. 5, spatio-temporal and appearance-based representations are generated, which is understood as an alignment of the point cloud and image data. It is noted that the phrase “aligned target features” is extremely broad. The examiner is interpreting the phrase to be an aligned version of the previous source features and target features. The examiner understands the spatio-temporal and appearance-based representations as aligned target features because the separate features, i.e. the undistorted image understood as source features and the point cloud understood as target features, are now aligned);
and processing the aligned target features to generate an output (pg. 14 para. 5, object classification is performed and output to label each object, which is understood as an output).
Florea does not disclose expressly to obtain map information, to obtain location information, and to obtain target semantic attributes from the map information and the location information.
Chen discloses:
obtaining map information (pg. 76 col. 1 last para., an offline map is used in the processing);
obtaining location information of a device comprising the second set of sensors (pg. 76 col. 2 para. 2, the pose of the vehicle, i.e. the device comprising the sensors, is localized);
obtaining target semantic attributes (pg. 78 col. 2 para. 3, semantic features are extracted)
from the map information based on the location information (pg. 78 col. 2 last para., the output semantic labels are projected onto the point cloud map to generate a traffic rule-related feature map. The examiner interprets the rule-related feature map to incorporate the location of the vehicle as relevant to the applicable traffic rules, see pg. 76 col. 2 para. 5);
Florea and Chen are combinable because they are from the same field of endeavor of perception systems for autonomous driving (Florea, pg. 2 para. 2; Chen, pg. 1 para. 1).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the map information of Chen with the invention of Florea.
The motivation for doing so would have been "The main reason for creating an offline HD map is that understanding traffic rules in real time is too challenging" (Chen, pg. 76 col. 1 last para.).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the location information of Chen with the invention of Florea.
The motivation for doing so would have been "These priors [including location] are used to register real-time lidar sweeps to the point cloud map, such that one can obtain the real-time high-precision ego motion of an AV [autonomous vehicle]" (Chen, pg. 76 col. 1 para. 2).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the semantic attributes based on map and location of Chen with the invention of Florea.
The motivation for doing so would have been "In an HD map, traffic rule-related semantic features such as lane geometries and connectivities, traffic lights, traffic signs and the speed limit of lanes are indispensable priors for the planning module" (Chen, pg. 76 col. 2 para. 5).
Therefore, it would have been obvious to combine Chen with Florea to obtain the invention as specified in claim 20.
Regarding claim 21, Florea in view of Chen discloses the subject matter of claim 20.
Florea further discloses:
The method of claim 20, wherein at least one of extrinsic parameters or intrinsic parameters are different between the first set of sensors and the second set of sensors (pg. 6 section 3.1. and Fig. 3, the first set of sensors is understood as the cameras and the second set of sensors is understood as the LiDAR scanners. The camera and LiDAR have separate extrinsic parameters, as shown by their divergent means of detecting information).
Regarding claim 22, Florea in view of Chen discloses the subject matter of claim 20.
Florea further discloses:
The method of claim 20, wherein at least one of: a count of the first set of sensors is different than a count of the second set of sensors (as this claim is directed to “at least one of” the listed limitations, the claim is taught by the below limitations rather than this one);
a type of the first set of sensors is different than a type of the second set of sensors (pg. 6 section 3.1., the first set of sensors is understood as the cameras and the second set of sensors is understood as the LiDAR scanners);
or relative positions of the first set of sensors are different than relative positions of the second set of sensors (Fig. 3, the positions of the sets of sensors are different).
Claims 4-5 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Florea et al. ("Enhanced Perception for Autonomous Driving Using Semantic and Geometric Data Fusion", full reference on PTO-892; hereafter, Florea) in view of Chen et al. ("3D Point Cloud Processing and Learning for Autonomous Driving: Impacting Map Creation, Localization, and Perception", full reference on PTO-892; hereafter, Chen) in further view of Chen et al. (US 20170039436 A1; hereafter, Zang).
Regarding claim 4, Florea in view of Chen discloses the subject matter of claim 1.
Florea in view of Chen does not disclose expressly to process aligned target features by a machine learning model trained using training source features.
Zang discloses:
The apparatus of claim 1, wherein the aligned target features are processed using a machine-learning model ([0041] registered, i.e. aligned, features are processed by a "classifier". [0035] "Training 908 of the CNN receives the image and marking mask 906 and results in a classifier 910", therefore the classifier may be understood as a machine learning model) trained using training source features based on training source data ([0031] the classifier is trained based on image patches from the sources which is understood as training source data).
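The following is a minimal, hypothetical sketch (loosely patterned on the kind of CNN classifier the cited Zang passages describe, not Zang's actual network; all layer sizes and data are invented) of training a classifier on image patches drawn from training source data:

```python
import torch
from torch import nn

# Hypothetical patch classifier: 2 classes (e.g., marking / background).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fake "training source data": 32x32 RGB patches with binary labels.
patches = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 2, (64,))

for _ in range(3):  # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(model(patches), labels)
    loss.backward()
    optimizer.step()
```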
Zang is combinable with Florea in view of Chen because it is in the same field of endeavor of identifying road markings and fusing image and point cloud data (Zang, [0001]).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the machine learning model processing of Zang with the invention of Florea in view of Chen.
The motivation for doing so would have been "using deep learning methods to determine lane markings and classification" (Zang, [0020]). In other words, determining lane markings is a motivation for using the machine learning model to assist autonomous driving applications.
Therefore, it would have been obvious to combine Zang with Florea in view of Chen to obtain the invention as specified in claim 4.
Regarding claim 5, Florea in view of Chen in further view of Zang discloses the subject matter of claim 4.
Florea in view of Chen does not disclose that the training source data comprises the first sensor data.
Zang discloses:
The apparatus of claim 4, wherein the training source data comprises the first sensor data ([0030] the image patches are from registered images which comprise first sensor data).
Regarding claim 23, Florea in view of Chen discloses the subject matter of claim 20.
Florea in view of Chen does not disclose expressly to process aligned target features by a machine learning model trained using training source features.
Zang discloses:
The method of claim 20, wherein the aligned target features are processed using a machine-learning model ([0041] registered, i.e. aligned, features are processed by a "classifier". [0035] "Training 908 of the CNN receives the image and marking mask 906 and results in a classifier 910", therefore the classifier may be understood as a machine learning model) trained using training source features based on training source data ([0031] the classifier is trained based on image patches from the sources which is understood as training source data).
Zang is combinable with Florea in view of Chen because it is in the same field of endeavor of identifying road markings and fusing image and point cloud data (Zang, [0001]).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the machine learning model processing of Zang with the invention of Florea in view of Chen.
The motivation for doing so would have been "using deep learning methods to determine lane markings and classification" (Zang, [0020]). In other words, determining lane markings is a motivation for using the machine learning model to assist autonomous driving applications.
Therefore, it would have been obvious to combine Zang with Florea in view of Chen to obtain the invention as specified in claim 23.
Regarding claim 24, Florea in view of Chen in further view of Zang discloses the subject matter of claim 23.
Florea in view of Chen does not disclose that the training source data comprises the first sensor data.
Zang discloses:
The method of claim 23, wherein the training source data comprises the first sensor data ([0030] the image patches are from registered images which comprise first sensor data).
Claims 6, 13, 16, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Florea et al. ("Enhanced Perception for Autonomous Driving Using Semantic and Geometric Data Fusion", full reference on PTO-892; hereafter, Florea) in view of Chen et al. ("3D Point Cloud Processing and Learning for Autonomous Driving: Impacting Map Creation, Localization, and Perception", full reference on PTO-892; hereafter, Chen) in further view of Cohen et al. (US 11062454 B1; hereafter, Cohen).
Regarding claim 6, Florea in view of Chen discloses the subject matter of claim 1.
Florea in view of Chen does not disclose expressly to process the target features and the set of source features using a machine learning model trained based on first features and second features.
Cohen discloses:
The apparatus of claim 1, wherein, to align the target features, the at least one processor is configured to process the target features and the set of the source features using a machine-learning model (Col. 21 line 40-41, an ROI generated from the merged global features of step 418 is generated from a machine learning model) trained to generate aligned features based on first features and second features (Col. 15 line 3-7, the model is trained to generate the ROI based at least in part on the feature map 220 of Fig. 2).
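Purely as a hypothetical sketch of the claimed arrangement (not Cohen's model; the class and dimensions are invented), a learned alignment can be a network that consumes first (source) and second (target) features and emits a merged, aligned feature tensor:

```python
import torch
from torch import nn

class FeatureAligner(nn.Module):
    """Toy model: merges 'first' and 'second' feature vectors into
    aligned features; trained end to end like any other network."""
    def __init__(self, dim=64):
        super().__init__()
        self.merge = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, source_feats, target_feats):
        # Concatenate the two feature sets, then learn the merged output.
        return self.merge(torch.cat([source_feats, target_feats], dim=-1))

aligner = FeatureAligner()
src = torch.randn(10, 64)   # e.g., camera-derived features
tgt = torch.randn(10, 64)   # e.g., LiDAR-derived features
aligned = aligner(src, tgt)
print(aligned.shape)        # torch.Size([10, 64])
```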
Cohen is combinable with Florea in view of Chen because it is from the same field of endeavor of object recognition in multiple types of sensors (Cohen, Col. 1 line 63 through col. 2 line 1).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the machine learning of Cohen with the invention of Florea in view of Chen.
The motivation for doing so would have been "to use sensor data from multiple types of sensors, thereby increasing the accuracy of the point cloud/object detection associations relative to techniques that rely on one modality of sensor, such as lidar data, for example" (Cohen col. 1 last line through col. 2 line 4).
Therefore it would have been obvious to combine Cohen with Florea in view of Chen to obtain the invention as specified in claim 6.
Regarding claim 13, Florea in view of Chen discloses the subject matter of claim 1.
Florea in view of Chen does not disclose expressly to generate the target features by a feature-extractor network trained to generate features based on data.
Cohen discloses:
The apparatus of claim 1, wherein, to obtain the target features, the at least one processor is configured to generate the target features by processing the second sensor data using a feature-extractor network (Col. 20 line 14-18, target features, understood as the second local features, are generated by fully connected layers, which are understood as a feature-extraction network. By the wording of a "feature extractor network" the examiner interprets that it may not be a complete machine learning model but rather a network or subcomponent within a model) trained to generate features based on data (Col. 19 line 49-51, the step above may be accomplished by model 200. Col. 17 line 27-29, the model 200 as a whole is trained. Therefore, the examiner understands the network to be trained as well, as a subcomponent of the model).
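As a hedged illustration of a "feature-extractor network" that is a subcomponent of a larger trained model (hypothetical, not Cohen's architecture), fully connected layers can lift raw per-point measurements to local features:

```python
import torch
from torch import nn

# Hypothetical per-point feature extractor: a stack of fully connected
# layers mapping raw (x, y, z) measurements to a local feature vector.
feature_extractor = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(),
    nn.Linear(32, 64), nn.ReLU(),
)

# When embedded in a larger model, its weights are trained jointly with
# the rest of the network rather than as a standalone model.
points = torch.randn(1024, 3)          # second sensor data (e.g., LiDAR)
target_features = feature_extractor(points)
print(target_features.shape)           # torch.Size([1024, 64])
```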
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the feature extraction network of Cohen with the invention of Florea in view of Chen.
The motivation for doing so would have been "it is understood that the feature maps may not be described in humanly-comprehensible terms, as the feature maps may comprise an output that may be a computer and/or neural network transformation of the input thereto" (Cohen, col. 19 line 16-20). In other words, important features may not be human comprehensible so therefore a feature extraction network is needed.
Therefore, it would have been obvious to combine Cohen with Florea in view of Chen to obtain the invention as specified in claim 13.
Regarding claim 16, Florea in view of Chen discloses the subject matter of claim 1.
Florea in view of Chen does not disclose expressly to generate the source features by a feature-extractor network trained to generate features based on data.
Cohen discloses:
The apparatus of claim 1, wherein the source features comprise features generated by processing the first sensor data using a feature-extractor network (Col. 20 line 14-18, source features, understood as the first local features, are generated by fully connected layers, which are understood as a feature-extraction network. By the wording of a "feature extractor network" the examiner interprets that it may not be a complete machine learning model but rather a network or subcomponent within a model) trained to generate features based on data (Col. 19 line 49-51, the step above may be accomplished by model 200. Col. 17 line 27-29, the model 200 as a whole is trained. Therefore, the examiner understands the network to be trained as well, as a subcomponent of the model).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the feature extraction network of Cohen with the invention of Florea in view of Chen.
The motivation for doing so would have been "it is understood that the feature maps may not be described in humanly-comprehensible terms, as the feature maps may comprise an output that may be a computer and/or neural network transformation of the input thereto" (Cohen, col. 19 line 16-20). In other words, important features may not be human comprehensible so therefore a feature extraction network is needed.
Therefore, it would have been obvious to combine Cohen with Florea in view of Chen to obtain the invention as specified in claim 16.
Regarding claim 25, Florea in view of Chen discloses the subject matter of claim 20.
Florea in view of Chen does not disclose expressly to process the target features and the set of source features using a machine learning model trained based on first features and second features.
Cohen discloses:
The method of claim 20, wherein aligning the target features comprises processing the target features and the set of the source features using a machine-learning model (Col. 21 line 40-41, an ROI generated from the merged global features of step 418 is generated from a machine learning model) trained to generate aligned features based on first features and second features (Col. 15 line 3-7, the model is trained to generate the ROI based at least in part on the feature map 220 of Fig. 2).
Cohen is combinable with Florea in view of Chen because it is from the same field of endeavor of object recognition in multiple types of sensors (Cohen, Col. 1 line 63 through col. 2 line 1).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the machine learning of Cohen with the invention of Florea in view of Chen.
The motivation for doing so would have been "to use sensor data from multiple types of sensors, thereby increasing the accuracy of the point cloud/object detection associations relative to techniques that rely on one modality of sensor, such as lidar data, for example" (Cohen col. 1 last line through col. 2 line 4).
Therefore it would have been obvious to combine Cohen with Florea in view of Chen to obtain the invention as specified in claim 25.
Claims 7, 14-15, 17-18, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Florea et al. ("Enhanced Perception for Autonomous Driving Using Semantic and Geometric Data Fusion", full reference on PTO-892; hereafter, Florea) in view of Chen et al. ("3D Point Cloud Processing and Learning for Autonomous Driving: Impacting Map Creation, Localization, and Perception", full reference on PTO-892; hereafter, Chen) in further view of Cohen et al. (US 11062454 B1; hereafter, Cohen) and Chen et al. (US 20170039436 A1; hereafter, Zang).
Regarding claim 7, Florea in view of Chen in further view of Cohen discloses the subject matter of claim 6.
Florea in view of Chen does not disclose using a machine learning model in association with aligning features.
Cohen discloses:
using the machine-learning model (Col. 21 line 40-41, an ROI generated from the merged global features of step 418 is generated from a machine learning model. As the ROI is generated from merged features the examiner understands it as aligned features).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the use of a machine learning model in aligning features of Cohen with the invention of Florea in view of Chen.
The motivation for doing so would have been "to use sensor data from multiple types of sensors, thereby increasing the accuracy of the point cloud/object detection associations relative to techniques that rely on one modality of sensor, such as lidar data, for example" (Cohen col. 1 last line through col. 2 line 4).
Therefore it would have been obvious to combine Cohen with Florea in view of Chen.
Florea in view of Chen in further view of Cohen does not disclose expressly that, to align the target features, the source sensor parameters and target sensor parameters are processed.
Zang discloses:
The apparatus of claim 6, wherein, to align the target features, the at least one processor is further configured to process source sensor parameters related to the source features ([0026] camera images to be aligned have their geolocation and pose data collected, which is understood as sensor parameters as it describes the location and pose of the sensor) and target sensor parameters related to the target features ([0026] point cloud data to be aligned have their geolocation and pose data collected, which is understood as sensor parameters as it describes the location and pose of the sensor. The geolocation and pose information is used in aligning the image and point cloud data).
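A minimal sketch of this kind of parameter-driven alignment (hypothetical poses and helpers, not Zang's pipeline): per-sensor geolocation/pose parameters bring both data sets into a common frame before matching, which permits aligning data collected at different times.

```python
import numpy as np

def pose_to_transform(x, y, yaw):
    """Build a 2D rigid transform from simple pose parameters."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

def to_world(points_xy, pose_T):
    # Apply the sensor's pose to move local points into the world frame.
    pts_h = np.hstack([points_xy, np.ones((points_xy.shape[0], 1))])
    return (pose_T @ pts_h.T).T[:, :2]

# Sensor parameters (pose at capture time) for each data set.
T_camera = pose_to_transform(100.0, 50.0, 0.10)   # source sensor pose
T_lidar = pose_to_transform(100.2, 50.1, 0.11)    # target sensor pose

camera_pts = np.random.rand(5, 2)
lidar_pts = np.random.rand(5, 2)

# Both data sets now live in one world frame and can be matched.
print(to_world(camera_pts, T_camera))
print(to_world(lidar_pts, T_lidar))
```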
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the sensor parameters of Zang with the invention of Florea in view of Chen in further view of Cohen.
The motivation for doing so would have been "in order to facilitate matching point cloud data and image data that have been collected at different times" (Zang, [0026]).
Therefore, it would have been obvious to combine Zang with Florea in view of Chen in further view of Cohen to obtain the invention as specified in claim 7.
Regarding claim 14, Florea in view of Chen in further view of Cohen discloses the subject matter of claim 13.
Florea further discloses:
and wherein the target features comprise top-down features (pg. 11 para. 3, the voxel representation identifying obstacles may also be in a bird's-eye-view orientation, which is understood as top-down) and perspective-view features (pg. 11 para. 2 and Fig. 7, an initial representation of the LiDAR data looks out onto the street, which is understood as a perspective view).
Florea in view of Chen in further view of Cohen does not disclose expressly that the second sensor data comprises LIDAR and image data.
Zang discloses:
The apparatus of claim 13, wherein the second sensor data comprises light detection and ranging (LIDAR) based point-cloud data ([0040] point cloud data is received, which is understood as LiDAR data, see [0035]) and image data ([0040] 2D image data is generated from the point cloud data, therefore the sensor data comprises point cloud data and image data).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the LIDAR and image data of Zang with Florea in view of Chen in further view of Cohen.
The motivation for doing so would have been so that the data can be registered with 2D color images (Zang, [0040]).
Therefore, it would have been obvious to combine Zang with Florea in view of Chen in further view of Cohen to obtain the invention as specified in claim 14.
Regarding claim 15, Florea in view of Chen in further view of Cohen and Zang discloses the subject matter of claim 14.
Florea further discloses:
The apparatus of claim 14, wherein the target features further comprise map-based features (Fig. 9, the target features may be represented as a map).
Regarding claim 17, Florea in view of Chen in further view of Cohen discloses the subject matter of claim 16.
Florea further discloses:
and wherein the target features comprise top-down features (pg. 11 para. 3, the voxel representation identifying obstacles may also be in a bird's-eye-view orientation, which is understood as top-down) and perspective-view features (pg. 11 para. 2 and Fig. 7, an initial representation of the LiDAR data looks out onto the street, which is understood as a perspective view).
Florea in view of Chen in further view of Cohen does not disclose expressly that the second sensor data comprises LIDAR and image data.
Zang discloses:
The apparatus of claim 16, wherein the second sensor data comprises light detection and ranging (LIDAR) based point-cloud data ([0040] point cloud data is received, which is understood as LiDAR data, see [0035]) and image data ([0040] 2D image data is generated from the point cloud data, therefore the sensor data comprises point cloud data and image data).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the LIDAR and image data of Zang with Florea in view of Chen in further view of Cohen.
The motivation for doing so would have been so that the data can be registered with 2D color images (Zang, [0040]).
Therefore, it would have been obvious to combine Zang with Florea in view of Chen in further view of Cohen to obtain the invention as specified in claim 17.
Regarding claim 18, Florea in view of Chen in further view of Cohen and Zang discloses the subject matter of claim 17.
Florea further discloses:
The apparatus of claim 17, wherein the target features further comprise map-based features (Fig. 9, the target features may be represented as a map).
Regarding claim 26, Florea in view of Chen in further view of Cohen discloses the subject matter of claim 25.
Florea in view of Chen does not disclose using a machine learning model in association with aligning features.
Cohen discloses:
using the machine-learning model (Col. 21 line 40-41, an ROI generated from the merged global features of step 418 is generated from a machine learning model. As the ROI is generated from merged features the examiner understands it as aligned features).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the use of a machine learning model in aligning features of Cohen with the invention of Florea in view of Chen.
The motivation for doing so would have been "to use sensor data from multiple types of sensors, thereby increasing the accuracy of the point cloud/object detection associations relative to techniques that rely on one modality of sensor, such as lidar data, for example" (Cohen col. 1 last line through col. 2 line 4).
Therefore it would have been obvious to combine Cohen with Florea in view of Chen.
Florea in view of Chen in further view of Cohen does not disclose expressly that, to align the target features, the source sensor parameters and target sensor parameters are processed.
Zang discloses:
The method of claim 25, wherein aligning the target features further comprises processing source sensor parameters related to the source features ([0026] camera images to be aligned have their geolocation and pose data collected, which is understood as sensor parameters as it describes the location and pose of the sensor) and target sensor parameters related to the target features ([0026] point cloud data to be aligned have their geolocation and pose data collected, which is understood as sensor parameters as it describes the location and pose of the sensor. The geolocation and pose information is used in aligning the image and point cloud data).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the sensor parameters of Zang with the invention of Florea in view of Chen in further view of Cohen.
The motivation for doing so would have been "in order to facilitate matching point cloud data and image data that have been collected at different times" (Zang, [0026]).
Therefore, it would have been obvious to combine Zang with Florea in view of Chen in further view of Cohen to obtain the invention as specified in claim 26.
Claims 11-12, 19, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Florea et al. ("Enhanced Perception for Autonomous Driving Using Semantic and Geometric Data Fusion", full reference on PTO-892; hereafter, Florea) in view of Chen et al. ("3D Point Cloud Processing and Learning for Autonomous Driving: Impacting Map Creation, Localization, and Perception", full reference on PTO-892; hereafter, Chen) in further view of Khadem et al. (US 20240125899 A1; hereafter, Khadem).
Regarding claim 11, Florea in view of Chen discloses the subject matter of claim 1.
Florea in view of Chen does not disclose expressly to obtain source object trajectory information, to obtain target object trajectory information, and to select the set of source features based on the trajectory information.
Khadem discloses:
The apparatus of claim 1, wherein the at least one processor is further configured to: obtain source object trajectory information ([0025] the system identifies transient elements), wherein the source object trajectory information is indicative of first objects moving relative to the first set of sensors ([0025] the system considers data gathered from a vehicle, see Fig. 1 and Fig. 2. Therefore, the examiner understands the identification of object trajectories to be relative to the physical sensors);
obtain target object trajectory information, wherein the target object trajectory information is indicative of second objects moving relative to the second set of sensors ([0048] the system may repeat the above steps for another imaging modality such as image data or radar data. The examiner interprets one of the modalities as the first set of sensors and the source features and another modality as the second set of sensors and the target features);
and select the set of the source features ([0025] the data is filtered, which is understood as selecting a set of source features) based on the source object trajectory information and the target object trajectory information ([0025] the filtering may be based on the moving objects, i.e. the moving objects are removed).
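Illustrative only (hypothetical data layout, not Khadem's implementation): features associated with tracked moving objects are filtered out, so the selected set corresponds to time-invariant elements of the environment.

```python
# Hypothetical source features, each tagged with the ID of the tracked
# object it belongs to (None means static background).
source_features = [
    {"feat": [0.1, 0.9], "object_id": None},
    {"feat": [0.4, 0.2], "object_id": 7},
    {"feat": [0.8, 0.5], "object_id": None},
]

# Trajectory information: object IDs observed to move relative to the
# sensors in either modality.
moving_ids = {7, 12}

# Select only features from objects not known to be moving.
selected = [f for f in source_features if f["object_id"] not in moving_ids]
print(len(selected))  # 2 -- only features from time-invariant elements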
Khadem is combinable with Florea in view of Chen because it is from the related field of endeavor of generating and updating semantically labeled roadway maps for autonomous vehicle navigation (Khadem, [0008]).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the trajectory based selection of Khadem with the invention of Florea in view of Chen.
The motivation for doing so would have been "so that the filtered LiDAR data 120 primarily correspond to time-invariant elements of the environment" (Khadem, [0025]).
Therefore, it would have been obvious to combine Khadem with Florea in view of Chen to obtain the invention as specified in claim 11.
Regarding claim 12, Florea in view of Chen in further view of Khadem discloses the subject matter of claim 11.
Florea does not disclose expressly to obtain sensor data representative of one or more other objects and track the one or more other objects.
Chen discloses:
The apparatus of claim 11, wherein the at least one processor is further configured to: obtain sensor data representative of one or more other objects (pg. 76 col. 2 para. 3, the sensor can obtain data of objects such as road, trees, and foreground objects);
and track the one or more other objects based on the sensor data to generate trajectory data for the one or more other objects (pg. 76 col. 2 para. 4, the system can track and predict trajectories of objects in the scene).
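As a hedged sketch of tracking objects to generate trajectory data (a toy nearest-neighbor associator, not Chen's tracker; data and threshold are invented):

```python
import numpy as np

def track(frames, max_dist=2.0):
    """Greedy nearest-neighbor tracking: chains detections across frames
    into per-object trajectories."""
    trajectories = [[p] for p in frames[0]]
    for detections in frames[1:]:
        for traj in trajectories:
            last = traj[-1]
            # Associate each trajectory with its nearest new detection.
            dists = [np.linalg.norm(np.subtract(d, last)) for d in detections]
            if dists and min(dists) <= max_dist:
                traj.append(detections[int(np.argmin(dists))])
    return trajectories

# Detections of one object drifting right over three frames.
frames = [[(0.0, 0.0)], [(0.8, 0.1)], [(1.7, 0.2)]]
print(track(frames))  # one trajectory with three points
```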
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the object tracking of Chen with the invention of Florea.
The motivation for doing so would have been "to guide the predicted trajectories of objects" (Chen, pg. 76 col. 2 para. 4).
Therefore, it would have been obvious to combine Chen with Florea to obtain the invention as specified in claim 12.
Regarding claim 19, Florea in view of Chen discloses the subject matter of claim 1.
Florea further discloses:
and wherein the output relates to at least one of: a three-dimensional lane detection; a three-dimensional object detection; a two-dimensional lane detection; or a two-dimensional object detection (pg. 14 para. 5, three dimensional object detection is output).
Florea in view of Chen does not disclose expressly to provide the aligned target features to a machine-learning model trained to generate output based on features.
Khadem discloses:
The apparatus of claim 1, wherein, to process the aligned target features, the at least one processor is configured to provide the aligned target features to a machine-learning model ([0058] the 2D image 320, which is understood as aligned features because it depends on the voxel spaces 306 that are aggregated, i.e. aligned, in step 310, see Fig. 3, is input into a machine learning model to determine a class label, which is understood as an output) trained to generate outputs based on features ([0058] the machine learning model is trained to determine a class label based on the inputs, which are features).
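A minimal, hypothetical fully convolutional network (FCN) sketch (not Khadem's model; layer sizes are invented) showing why per-pixel class labels remain localized: convolutional layers preserve the spatial grid, so output pixel (x, y) corresponds to input pixel (x, y):

```python
import torch
from torch import nn

# Fully convolutional network: no flattening, so spatial layout survives.
fcn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 4, kernel_size=1),   # 4 hypothetical class channels
)

image = torch.randn(1, 3, 64, 64)      # aligned 2D input
logits = fcn(image)                    # shape (1, 4, 64, 64)
class_map = logits.argmax(dim=1)       # per-pixel class label
print(class_map.shape)                 # torch.Size([1, 64, 64])
```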
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention combine the machine learning model to generate the outputs of Khadem with the invention of Florea in view of Chen.
The motivation for doing so would have been that "the output image produced by an FCN provides localization of the class labels, since individual pixels which are located at known (x, y) coordinates within the output image, which corresponds to the same (x, y) coordinates within the input image" (Khadem, [0058]).
Therefore, it would have been obvious to combine Khadem with Florea in view of Chen to obtain the invention as specified in claim 19.
Regarding claim 30, Florea in view of Chen discloses the subject matter of claim 20.
Florea in view of Chen does not disclose expressly to ob