Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8, 10-11, and 19-23 are rejected under 35 U.S.C. 103 as being unpatentable over Keilaf (US 10698114 B2) in view of Vineet (US 12283119 B2).
Regarding Claim 1, Keilaf teaches a computer-implemented method of locating and modelling a 3D object captured in multiple time-series of sensor data of multiple sensor modalities (col. 36, lines 12-28: “In one embodiment, LIDAR system 100 may be operable to generate depth maps of one or more different types, such as any one or more of the following types: point cloud model, polygon mesh, depth image (holding depth information for each pixel of an image or of a 2D array), or any other type of 3D model of a scene. The sequence of depth maps may be a temporal sequence, in which different depth maps are generated at a different time. Each depth map of the sequence associated with a scanning cycle (interchangeably “frame”) may be generated within the duration of a corresponding subsequent frame-time. In one example, a typical frame-time may last less than a second. In some embodiments, LIDAR system 100 may have a fixed frame rate (e.g. 10 frames per second, 25 frames per second, 50 frames per second) or the frame rate may be dynamic. In other embodiments, the frame-times of different frames may not be identical across the sequence.”), the method comprising:
optimizing a cost function applied to the multiple time-series of sensor data, wherein the cost function aggregates over time, and is defined over a set of variables (col. 121, lines 10-18: “According to some embodiments, the scene signal may be assessed and calculated with or without additional feedback signals such as a photonic steering assembly feedback PTX feedback, PRX feedback and host feedback and information stored in memory 2902 in a weighted means of local and global cost functions that determine a scanning/work plan such as a work plan signal for scanning unit 104 (such as: which pixels in the FOV are scanned, at which laser parameters budget, at which detector parameters budget).”), the set of variables comprising:
one or more shape parameters of a 3D object model (col. 114, lines 17-23: “Additional examples of environmental conditions upon which optical budget (or computational budget) apportionment may be based may include… detected characteristics of objects in space (e.g. shape, reflectivity, characteristics affecting SNR)”), and
whereby the 3D object is located at multiple time instants and modelled by tuning each pose and the shape parameters with the objective of optimizing the cost function (col. 83, line 66 – col. 84, line 2: “The generated depth maps may include a temporal characteristic. For example, the depth maps may be generated in a temporal sequence, in which different depth maps are generated at different times.”).
Keilaf fails to teach wherein the cost function aggregates over multiple sensor modalities;
wherein the cost function is defined over a time sequence of poses of the 3D object model, each pose comprising a 3D object location and 3D object orientation;
wherein the cost function penalizes inconsistency between the multiple time-series of sensor data and the set of variables, or
wherein the object belongs to a known object class, and the 3D object model or the cost function encodes expected 3D shape information associated with the known object class.
Vineet teaches wherein the cost function aggregates over multiple sensor modalities (col. 4, lines 18-27: “Note, this exploits information about known relationships between components parts, or known relationships between instances of the same object at different times or from different sensor units. This is different from conventional training, where the cost function simply penalizes differences between a predicted 3D object and some ground truth information (sometimes referred to as an annotation or label), but does not take into account information about how those predicted 3D objects are expected to relate to each other.”);
wherein the cost function is defined over a time sequence of poses of the 3D object model, each pose comprising a 3D object location and 3D object orientation (col. 3, lines 8-12: “A predicted 3D object may be represented as one or more of: a 3D location/position, a 3D orientation/pose, and a size (extend in 3D space). A 3D bounding box or other 3D boundary object may encode all three (position, orientation and size).”);
wherein the cost function penalizes inconsistency between the multiple time-series of sensor data and the set of variables (col. 3, lines 30-45: “The 3D images of a given training input can for example be images in a temporal sequence of images as captured by an image capture device. This could for example be an image capture device of a travelling vehicle that captures a sequence of 3D images of the vehicle's surroundings as it travels (e.g. a view in front of or behind the vehicle). An object tracker may be used to track changes in the position and/or orientation of the common object over time in 3D space. In this case, any differences in the position and/or orientation of the predicted 3D objects as determined for a given training input should match the corresponding changes in position and/or orientation of the common object as determined by the object tracker. In that case, the cost function may penalize deviations from the differences in position and/or orientation as determined using the object tracker.”), or
wherein the object belongs to a known object class, and the 3D object model or the cost function encodes expected 3D shape information associated with the known object class (col. 11, lines 23-25: “the size constraint may only be applied for certain classes of object that can be assumed to be rigid (such as card, buses etc.).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the cost function of Vineet into Keilaf’s LIDAR system, as both references are directed to the same field of endeavor of 3D object perception. Vineet’s cost function would allow Keilaf’s system to more accurately predict the position, orientation, and size of an object, enabling more effective object modeling.
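For illustration only, the claimed arrangement may be sketched as a cost function that aggregates residuals over time and over multiple sensor modalities, and that is defined over a time sequence of poses and a set of shape parameters. All identifiers below are hypothetical and are drawn from neither reference.

import numpy as np

def total_cost(poses, shape_params, observations, residual_fns):
    """poses: (T, 6) array, each row an (x, y, z, roll, pitch, yaw) pose;
    shape_params: (S,) shape coefficients of the 3D object model;
    observations: dict mapping modality name -> list of T per-frame measurements;
    residual_fns: dict mapping modality name -> residual function."""
    cost = 0.0
    for modality, frames in observations.items():   # aggregate over modalities
        residual = residual_fns[modality]
        for t, measurement in enumerate(frames):    # aggregate over time
            r = residual(poses[t], shape_params, measurement)
            cost += float(np.dot(r, r))             # penalize inconsistency
    return cost

Tuning each pose and the shape parameters with the objective of optimizing this cost then amounts to minimizing total_cost over the stacked variables, for example with a general-purpose optimizer such as scipy.optimize.minimize.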
Regarding claim 2, Keilaf and Vineet teach the method of claim 1. Vineet further teaches wherein the variables of the cost function comprise one or more motion parameters of a motion model for the 3D object (col. 3, lines 35-42: “An object tracker may be used to track changes in the position and/or orientation of the common object over time in 3D space. In this case, any differences in the position and/or orientation of the predicted 3D objects as determined for a given training input should match the corresponding changes in position and/or orientation of the common object as determined by the object tracker.”), wherein the cost function also penalizes inconsistency between the time sequence of poses and the motion model (col. 3, lines 42-45: “In that case, the cost function may penalize deviations from the differences in position and/or orientation as determined using the object tracker.”), whereby the object is located and modelled, and motion of the object is modelled, by tuning each pose, the shape parameters and the motion parameters with the objective of optimizing the cost function (col. 14, lines 47-52: “the presented consistency and geometric losses can be incorporated along with losses defined on object center position, orientation and sizes. In recent time, several methods have been proposed that uses RGB-LiDAR data to estimate these values by directly optimizing for them in supervised learning setting.”).
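For illustration only, a minimal sketch of the claim 2 arrangement follows, assuming a constant-velocity motion model (an assumption of this sketch, not of either reference); all names are hypothetical.

import numpy as np

def motion_cost(poses, velocity, dt):
    """poses: (T, 6) pose sequence; velocity: (3,) motion parameters of the
    motion model; dt: frame period in seconds."""
    t = np.arange(len(poses))[:, None] * dt
    predicted_xyz = poses[0, :3] + t * velocity   # constant-velocity prediction
    deviation = poses[:, :3] - predicted_xyz
    return float(np.sum(deviation ** 2))          # penalize pose/model inconsistency

A term of this form would be added to the aggregate cost, so that the poses, shape parameters, and motion parameters are tuned jointly.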
Regarding claim 3, Keilaf and Vineet teach the method of claim 2. Keilaf further teaches wherein at least one of the multiple time-series of sensor data comprises a piece of sensor data which is not aligned in time with any pose of the time sequence of poses (col. 129, lines 58-64: “a processor (e.g., processor 118, CPU 3234, etc.) may be further configured to collect data indicative of an inclination of a vehicle (e.g., FIG. 33). Information indicative of the inclination of the vehicle may be provide as an output of one or more accelerometers, one or more three-dimensional accelerometers, an inertial measurement unit (IMU), etc.”), the method comprising:
using the motion model to compute, from the time sequence of poses, an interpolated pose that coincides in time with the piece of sensor data (col. 85, lines 22-25: “some items of the depth map (pixel, PC point, polygon or part thereof) may be based on interpolation or averaging of detection-based values determined for illuminated parts of the FOV.”).
Vineet further teaches wherein the cost function penalizes inconsistency between the piece of sensor data and the interpolated pose (col. 3, lines 25-29: “the cost function can penalize deviations from the known differences. In this case, the 3D structure detector, as a minimum, predicts one or both of 3D object location and 3D object orientation.”).
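For illustration only, a hypothetical sketch of computing an interpolated pose that coincides in time with a piece of sensor data falling between frames; linear interpolation of the orientation components assumes small inter-frame rotations.

import numpy as np

def interpolate_pose(poses, frame_times, query_time):
    """poses: (T, 6); frame_times: (T,) ascending timestamps; returns the pose
    linearly interpolated at query_time between the two bracketing frames."""
    i = int(np.searchsorted(frame_times, query_time)) - 1
    i = max(0, min(i, len(poses) - 2))            # clamp to a valid bracket
    alpha = (query_time - frame_times[i]) / (frame_times[i + 1] - frame_times[i])
    return (1.0 - alpha) * poses[i] + alpha * poses[i + 1]

The cost function can then penalize inconsistency between the piece of sensor data and interpolate_pose(poses, frame_times, t_sensor).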
Regarding claim 4, Keilaf and Vineet teach the method of claim 3. Keilaf further teaches wherein the at least one time-series of sensor data comprises a time-series of images, and the piece of sensor data is an image (col. 44, lines 60-62: “The plurality of sensors 116 may thus generate signals associated with images of differing light beam-spots.”).
Regarding claim 5, Keilaf and Vineet teach the method of claim 3. Keilaf further teaches wherein the at least one time-series of sensor data comprises a time-series of lidar or radar data, the piece of sensor data is an individual lidar or radar return, and the interpolated pose coincides with a return time of the lidar or radar return (col. 16, lines 38-46: “In some embodiments, the light deflector may be moved such that during a scanning cycle of the LIDAR FOV the light deflector is located at a plurality of different instantaneous positions. In other words, during the period of time in which a scanning cycle occurs, the deflector may be moved through a series of different instantaneous positions/orientations, and the deflector may reach each different instantaneous position/orientation at a different time during the scanning cycle.”).
Regarding Claim 6, Keilaf and Vineet teach the method of claim 1. Keilaf further teaches wherein:
the variables additionally comprise one or more object dimensions for scaling the 3D object model, the shape parameters being independent of the object dimensions (col. 42, lines 28-31: “processor 118 may alternatively or concurrently vary spatial dimensions (e.g., length or width or otherwise alter a cross-sectional area) of light pulses emitted from the light source.”); or
the shape parameters of the 3D object model encode both 3D object shape and object dimensions (col. 111, lines 12-18: “Some examples of how a computation budget may be apportioned include, for example: …classification of objects/object type; tracking of objects (e.g., between frames): determining object characteristics (e.g., size, direction, velocity, reflectivity, etc.).”).
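For illustration only, a hypothetical sketch of the first alternative of claim 6, in which the shape parameters deform a unit-scale template while separate dimension variables scale the result; the linear deformation basis is an assumption of this sketch.

import numpy as np

def object_surface_points(shape_params, dims, template, basis):
    """template: (N, 3) vertices of a unit-scale mean shape; basis: (S, N, 3)
    linear deformation basis; shape_params: (S,); dims: (3,) length/width/height
    in meters, independent of the shape coefficients."""
    deformed = template + np.tensordot(shape_params, basis, axes=1)
    return deformed * dims   # dimensions rescale the model without altering shape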
Regarding Claim 7, Keilaf and Vineet teach the method of claim 1. Vineet further teaches wherein the cost function additionally penalizes each pose to the extent the pose violates an environmental constraint (col. 17, lines 34-37: “the cost function penalizes deviation from an expected geometric relationship between the set of predicted 3D boundary objects determined for each training input.”).
Regarding Claim 8, Keilaf and Vineet teach the method of claim 7. Keilaf further teaches wherein the environmental constraint is defined relative to a known 3D road surface (col. 59, lines 39-42: “LIDAR system may identify various foreground objects, such as a surface of a road 1414, a curb 1416, and/or a surface of a sidewalk 1418.”).
Regarding Claim 10, Keilaf and Vineet teach the method of claim 1. Keilaf further teaches wherein the multiple sensor modalities comprise two or more of: an image modality, a lidar modality, and a radar modality (col. 20, lines 64-67: “host 210 may integrate, synchronize or otherwise use together the outputs of LIDAR system 100 with outputs of other sensing systems (e.g. cameras, microphones, radar systems).”).
Regarding Claim 11, Keilaf and Vineet teach the method of claim 1. Keilaf further teaches wherein at least one of the sensor modalities is such that the poses and the shape parameters are not uniquely derivable from that sensor modality alone (col. 36, lines 39-41: “The number of pulses may vary between 0 to 32 pulses (e.g., 1, 5, 12, 28, or more pulses) and may be based on information derived from previous emissions.”).
Regarding Claim 19, Keilaf and Vineet teach the method of claim 1. Keilaf further teaches using an object classifier to determine the known class of the object from multiple available object classes, the multiple object classes associated with respective expected 3D shape information (col. 14, lines 21-23: “any other type of reconstructed three-dimensional model may store additional information for some or all of its objects.”).
Regarding Claim 20, Keilaf and Vineet teach the method of claim 1. Keilaf further teaches wherein the same shape parameters are applied to each pose of the time sequence of poses for modelling a rigid object (col. 83, lines 60-65: “As previously described, the LIDAR may be operable to generate depth maps of one or more different types, such as any one or more of the following types: point cloud model (PC), polygon mesh, depth image (holding depth information for each pixel of an image or of a 2D array), or any other type of 3D model of a scene.”).
Regarding Claim 21, Keilaf and Vineet teach the method of claim 1. Keilaf further teaches wherein the 3D object model is a deformable model, with at least one of the shape parameters varied across frames (col. 46, lines 35-40: “method 700 may include generating output data (e.g., a 3D model) in which the differing measurements are associated with different directions with respect to the LIDAR. In such an example, processor 118 may create a 3D-model frame (or the like) from the information of different light beams and many pixels from different angles of the FOV.”).
Claim 22 recites limitations substantially identical to those of claim 1, differing only in that it is directed to a computer system rather than a computer-implemented method. It is therefore rejected on the same basis as claim 1.
Claim 23 recites limitations substantially identical to those of claim 1, differing only in that it is directed to a non-transitory medium embodying computer-readable instructions rather than a computer-implemented method. It is therefore rejected on the same basis as claim 1.
Claims 9, 12, 14, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Keilaf (US 10698114 B2) and Vineet (US 12283119 B2) as applied to claims 1 and 8 above, and further in view of Crouch (US 11802965 B2).
Regarding Claim 9, Keilaf and Vineet teach the method of claim 8, but fail to teach wherein each pose is used to locate the 3D object model relative to the road surface, and the environmental constraint penalizes each pose to the extent the 3D object model does not lie on the known 3D road surface.
Crouch teaches wherein each pose is used to locate the 3D object model relative to the road surface, and the environmental constraint penalizes each pose to the extent the 3D object model does not lie on the known 3D road surface (col. 22, lines 50-56: “Filtering the point cloud data based on Doppler has the effect of identifying and removing vegetation that may be moving in the breeze. Hard targets, man-made targets, or dense targets are then better revealed by the filtering process. This can be advantageous in defense and surveillance scenarios. In the vehicle scenario—the Doppler can be used to segment targets (i.e. road surface versus moving vehicle).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Crouch’s method of Doppler detection with Keilaf’s modeling method, as both are in the same field of endeavor of object detection for autonomous vehicles.
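For illustration only, a hypothetical sketch of the claim 9 constraint, assuming the known 3D road surface is given as a height field and neglecting object rotation.

import numpy as np

def ground_contact_cost(pose, model_points, road_height_fn):
    """pose: (6,) = (x, y, z, roll, pitch, yaw); model_points: (N, 3) object
    surface points in the object frame; road_height_fn: (x, y) -> road z."""
    world = model_points + pose[:3]             # toy posing: translation only
    lowest = world[np.argmin(world[:, 2])]      # lowest point of the posed model
    gap = lowest[2] - road_height_fn(lowest[0], lowest[1])
    return gap ** 2                             # zero only when resting on the road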
Regarding Claim 12, Keilaf and Vineet teach the method of claim 1. Crouch teaches wherein:
one of the multiple time-series of sensor data is a time-series of radar data encoding measured Doppler velocities, wherein the time sequence of poses and the 3D object model are used to compute expected Doppler velocities, and the cost function penalizes discrepancy between the measured Doppler velocities and the expected Doppler velocities (col. 21, lines 56-59: “Note that the imaging system could itself be used to estimate relative velocities. The cost matrix includes a cost for every pair of ranges in the two sets, one up-chirp range and one down-chirp range.”); or
one of the multiple time-series of sensor data is a time-series of images, and the cost function penalizes an aggregate reprojection error between (i) the images and (ii) the time sequence of poses and the 3D object model (col. 23, lines 29-35: “Another test was run with a fast frame rate, narrow field of view imager that produces 10,000 data-point point clouds per frame at a 10 Hz frame rate. A test person ran back and forth in the field of view of the sensor. Each image of the person is cut from a time series of several hundred 3D imaging frames (cut from the same 3D orientation perspective).”); or
one of the multiple time-series of sensor data is a time-series of lidar data, wherein the cost function is based on a point-to-surface distance between lidar points and a 3D surface defined by the parameters of the 3D object model, wherein the point-to-surface distance is aggregated across all points of the lidar data (as above).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Crouch’s method of Doppler detection with Keilaf’s modeling method, as both are in the same field of endeavor of object detection for autonomous vehicles.
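For illustration only, a hypothetical sketch of the Doppler alternative of claim 12, comparing expected radial velocities derived from the pose sequence against the measured Doppler velocities; object rotation is neglected for brevity.

import numpy as np

def doppler_cost(point_xyz, object_velocity, measured_doppler):
    """point_xyz: (N, 3) radar returns on the object, sensor at the origin;
    object_velocity: (3,) velocity estimated from successive poses;
    measured_doppler: (N,) measured radial velocities."""
    line_of_sight = point_xyz / np.linalg.norm(point_xyz, axis=1, keepdims=True)
    expected = line_of_sight @ object_velocity   # expected radial component
    return float(np.sum((expected - measured_doppler) ** 2))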
Regarding Claim 14, Keilaf, Vineet, and Crouch teach the method of claim 12. Keilaf further teaches wherein a semantic keypoint detector is applied to each image, and the reprojection error is defined on semantic keypoints of the object (col. 3, lines 41-60: “Consistent with a disclosed embodiment, a LIDAR system may include at least one processor configured to: control light emission of a light source; scan a field of view by repeatedly moving at least one light deflector located in an outbound path of the light source, wherein during a single scanning cycle of the field of view, the at least one light deflector is instantaneously located in a plurality of positions; while the at least one deflector is in a particular instantaneous position, receive via the at least one deflector, reflections of a single light beam spot along a return path to a sensor; receive from the sensor on a beam-spot-by-beam-spot basis, signals associated with an image of each light beam-spot, wherein the sensor includes a plurality of detectors and wherein a size of each detector is smaller than the image of each light beam-spot, such that on a beam-spot-by-beam-spot basis, the image of each light beam-spot impinges on a plurality of detectors; and determine, from signals resulting from the impingement on the plurality of detectors, at least two differing range measurements associated with the image of the single light beam-spot.”).
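For illustration only, a hypothetical sketch of a reprojection error defined on semantic keypoints, assuming a pinhole camera with known intrinsics and keypoints already transformed into the camera frame by the current pose.

import numpy as np

def keypoint_reprojection_cost(keypoints_3d, keypoints_2d, K):
    """keypoints_3d: (N, 3) model keypoints in camera coordinates;
    keypoints_2d: (N, 2) detected semantic keypoints in pixels;
    K: (3, 3) camera intrinsics matrix."""
    proj = (K @ keypoints_3d.T).T              # project to homogeneous pixels
    proj = proj[:, :2] / proj[:, 2:3]          # perspective divide
    return float(np.sum((proj - keypoints_2d) ** 2))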
Regarding Claim 16, Keilaf, Vineet, and Crouch teach the method of claim 12. Keilaf further teaches wherein the 3D object model is encoded as a distance field (col. 46, lines 34-40: “Method 700 may include additional steps. For example, method 700 may include generating output data (e.g., a 3D model) in which the differing measurements are associated with different directions with respect to the LIDAR. In such an example, processor 118 may create a 3D-model frame (or the like) from the information of different light beams and many pixels from different angles of the FOV.”).
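For illustration only, a hypothetical sketch of a distance-field encoding of the object model, here an axis-aligned box signed distance function; the same sketch also illustrates the point-to-surface lidar alternative of claim 12.

import numpy as np

def box_sdf(points, half_extents):
    """Signed distance from points (N, 3) to an origin-centered box with the
    given (3,) half extents; negative inside, positive outside."""
    q = np.abs(points) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    inside = np.minimum(np.max(q, axis=1), 0.0)
    return outside + inside

def lidar_surface_cost(lidar_points_obj, half_extents):
    """Aggregate squared point-to-surface distance over all lidar points, each
    already transformed into the object frame by the current pose."""
    return float(np.sum(box_sdf(lidar_points_obj, half_extents) ** 2))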
Regarding Claim 17, Keilaf and Vineet teach the method of claim 1. Keilaf further teaches wherein:
the expected 3D shape information is encoded in the 3D object model, the 3D object model learned from a set of training data comprising example objects of the known object class (col. 114, lines 17-40: “ Additional examples of environmental conditions upon which optical budget (or computational budget) apportionment may be based may include weather conditions, positions or distribution of detected objects in space (e.g., relative to LIDAR system 100 and/or a host vehicle), detected characteristics of objects in space (e.g. shape, reflectivity, characteristics affecting SNR), type/class of objects (e.g., pedestrian, building, vehicle, light post), a relative position of the sun or other light sources, a state of traffic (e.g., jammed vs. open highway), a state of other host vehicle systems (e.g., driving related or other sensors—in some cases LIDAR system 100 may compensate for a malfunctioning camera 2920), conditions of the road itself (e.g., bumpiness, roughness, going up/down, curving, its reflectivity), map/GPS based data (e.g., road location and orientation in the scene, building location and orientation in scene—(a region of lower interest may be established relative to a building or other obstacle, as LIDAR may not expect to receive reflections from objects on a far side of a building), ambient temperature around LIDAR system 100, ambient temperature of a host vehicle environment, data analysis from previous collected FOV frames (e.g., point clouds, normal to surfaces, reflectivity, confidence levels, etc.).”); or
the expected 3D shape information is encoded in a regularization term of the cost function, which penalizes discrepancy between the 3D object model and a 3D shape prior for the known object class (col. 121, lines 10-18: “According to some embodiments, the scene signal may be assessed and calculated with or without additional feedback signals such as a photonic steering assembly feedback PTX feedback, PRX feedback and host feedback and information stored in memory 2902 in a weighted means of local and global cost functions that determine a scanning/work plan such as a work plan signal for scanning unit 104 (such as: which pixels in the FOV are scanned, at which laser parameters budget, at which detector parameters budget).”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Crouch’s method of Doppler detection with Keilaf’s modeling method, as both are in the same field of endeavor of object detection for autonomous vehicles.
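For illustration only, a hypothetical sketch of the regularization alternative of claim 17, assuming a Gaussian 3D shape prior (class mean and covariance) learned from training examples of the known object class.

import numpy as np

def shape_prior_cost(shape_params, class_mean, class_cov_inv):
    """Mahalanobis-style penalty on discrepancy between the tuned shape
    parameters and the shape prior for the known object class."""
    d = shape_params - class_mean
    return float(d @ class_cov_inv @ d)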
Response to Arguments
Applicant’s arguments with respect to the rejection of claims 1, 22, and 23 under Keilaf (US 10698114 B2), see Applicant’s Response filed 1/26/2026, have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made over Keilaf (US 10698114 B2) in view of Vineet (US 12283119 B2).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN A BARHAM, whose telephone number is (571) 272-4338. The examiner can normally be reached Mon-Fri, 8:30am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RYAN ALLEN BARHAM/Examiner, Art Unit 2613
/XIAO M WU/Supervisory Patent Examiner, Art Unit 2613