DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
Claims 1-20 are pending in the application filed 11/18/2025 and are examined herein. Claims 1, 10-12, and 18 have been amended.
Response to Arguments and Amendments
Applicant’s arguments with respect to independent claims 1, 12, and 18 have been considered but are moot because the new ground of rejection, necessitated by Applicant's amendments, does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 10 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 10, as amended, recites the limitation "wherein determining the relative pose of the first camera with respect to the second camera comprises accounting for changes in the relative pose of the first camera and a velocity vector of the vehicle over time using the trained machine learning algorithm." Claim 1, from which claim 10 depends, as amended, recites “determining, by the processor using a trained machine learning algorithm, a relative pose of the first camera based on temporal motion data of the vehicle”. Because amended claim 1 no longer recites a relative pose with respect to the second camera, the limitation "the relative pose of the first camera with respect to the second camera" in claim 10 lacks sufficient antecedent basis.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (US20170341583A1) in view of Ferencz (US20230382422A1) and Brizzi (US20210397854A1).
Regarding claim 1, Zhang teaches a method ([0002] The present disclosure generally relates to vehicles and more particularly relates to systems and methods for a towing vehicle and a trailer with surround view imaging devices) comprising: obtaining, by a processor (processor 64), first data from a first camera mounted on an object configured to be towed by a vehicle and second data from a second camera mounted on the vehicle ([0005] a method includes: receiving, from a first imaging device coupled to the trailer, a first image stream having a plurality of first images; receiving, from a second imaging device coupled to the towing vehicle, a second image stream having a plurality of second images);
determining, by the processor using a trained machine learning algorithm, a relative pose of the first camera ([0045] The calibration manager module 304 calibrates the cameras 44 on the trailer 8 to the vehicle 10. In this regard, the cameras 44 associated with the vehicle 10 are pre-calibrated relative to the vehicle coordinate system of the vehicle 10 such that the position and pose of the respective camera 44 and the field of view of that camera 44 is known to the calibration manager module 304. Generally, however, the position and field of view of the cameras 44 associated with the trailer 8 are not known to the calibration manager module 304. [0051] Generally, the calibration manager module 304 uses triangulation to estimate six extrinsic parameters for each camera 44, such as location (x, y, z) and pose (pitch, roll, yaw). At least three matching feature point pairs from the triangulation generates six equations between the image pixel (u,v) pairs of the matching feature points in the known camera 44 on the vehicle 10 and the camera 44 on trailer 8. The calibration manager module 304 solves these six equations to determine the unknown extrinsic parameters of location (x, y, z) and pose (pitch, roll, yaw) of the camera 44 on trailer 8. [0063] It should be noted that the use of foreground/background segmentation is just one example method for determining the pivot angle. Other computer vision techniques may be employed to determine the pivot angle. For example, deep learning based synthetic image generation, such as the Generative Adversarial Nets (GAN) image generation or the DLNN based image synthesis/enhancement technology, may be used);
and stitching, by the processor, the first adjusted data with the second adjusted data to generate a stitched image having a combined field of view based on the determined relative pose ([0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines. [0074] The view rendering module 312 stitches the image streams together at the boundary lines and blends the images of the image streams along the boundary line based on the blending coefficient 346 to generate a uniform, seamless image. It should be noted that the view rendering module 312 may also blend the images of the image streams based on the overlapping of the field of views of the cameras 44. The seamless image is the full view 348 of the rear 46 of the vehicle 10, as observed by the cameras 44, in which the trailer 8 is devoid in the view. The view rendering module 312 sets the seamless image as the full view 348 for the UI control module 314).
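For context on the kind of computation Zhang's triangulation-based calibration in [0051] involves, the following is a minimal illustrative sketch, not Zhang's disclosed algorithm: it recovers the relative pose of one camera with respect to another from matched pixel pairs via essential-matrix decomposition. All function and variable names are hypothetical, and OpenCV's two-view geometry routines stand in for Zhang's six-equation triangulation.

```python
# Illustrative sketch (not Zhang's exact triangulation): recover the relative
# pose of a trailer camera with respect to a vehicle camera from matched
# feature point pairs, using essential-matrix decomposition.
import cv2
import numpy as np

def estimate_relative_pose(pts_vehicle, pts_trailer, K):
    """pts_vehicle, pts_trailer: Nx2 float arrays of matched pixel coordinates
    (analogous to the (u, v) pairs in Zhang [0051]); K: 3x3 intrinsic matrix."""
    E, inliers = cv2.findEssentialMat(
        pts_vehicle, pts_trailer, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Decompose into a rotation (pitch, roll, yaw) and a translation direction.
    _, R, t, _ = cv2.recoverPose(E, pts_vehicle, pts_trailer, K, mask=inliers)
    return R, t

# Hypothetical usage:
# K = np.array([[800.0, 0, 640], [0, 800.0, 360], [0, 0, 1]])
# R, t = estimate_relative_pose(pts_a, pts_b, K)
```

Note that, unlike Zhang's triangulation against a camera with known extrinsics, a two-view decomposition recovers translation only up to scale.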
Zhang does not teach stitching using the trained machine learning algorithm.
Ferencz, in the same field of endeavor of vehicle camera systems, teaches stitching using the trained machine learning algorithm ([0237] At each frame or image, a short range model for the desired path is generated by the vehicle in a reference frame that is attached to the camera. The short range models may be stitched together to obtain a three dimensional model of the road in some coordinate frame. [0238] The second module is an end-to-end deep neural network, which may be trained to predict the correct short range path from an input image. In both modules, the road model may be detected in the image coordinate frame and transformed to a three dimensional space that may be virtually attached to the camera).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang with the teachings of Ferencz to perform stitching using machine learning because "The accumulated error may be small enough over some local range scale, such as of the order of 100 meters. All this may be completed in a single drive over a particular road segment" [Ferencz 0241].
Zhang does not teach determining, by the processor using a trained machine learning algorithm, a relative pose of the first camera based on temporal motion data of the vehicle; adjusting the first data and the second data based on the determined relative pose, wherein the trained machine learning algorithm compensates for temporal misalignment between the first data and the second data.
Brizzi, in the same field of endeavor of vehicle camera systems, teaches determining, by the processor using a trained machine learning algorithm, a relative pose of the first camera based on temporal motion data of the vehicle ([0049] However, due to differences in the functional designs and/or physical placements of different sensor systems on a vehicle, the trajectory representations collected using different sensor systems are often based on different temporal and/or spatial reference frames. [0050] For instance, in terms of the temporal reference frames, the trajectory representations derived from sensor data captured by the different sensor systems may be based on different origin times (e.g., the point in time that a given sensor system considers to be “zero” for purposes of capturing the sensor data used to derive a trajectory). [0054] Pose #3 in the first sequence has a timestamp of 165 ms while Pose #3 in the second sequence as a timestamp of 135 ms, and so on, which is due to the fact that the origin time used by camera-based sensor system 102 during capture was 30 ms earlier in time than the origin time used by LiDAR-based sensor system 103 during capture (e.g., camera-based sensor system 102 was initialized and/or began capturing sensor data 30 ms earlier than LiDAR-based sensor system 103). Because of this difference in origin times, it is not possible to use the pose values' timestamps to match up the pose values included in the first sequence with their corresponding pose values in the second sequence. [0056] Further yet, FIG. 1 shows that the two different sequences of pose values are based on different global reference frames, which is due to the fact that camera-based sensor system 102 and LiDAR-based sensor system 103 may represent positions in the world in relation to different points of origin and/or global axes directions (which may be defined by the maps used when processing the sensor data to derive the trajectories). For example, as shown, the pose values included in the first sequence are represented according to a first global reference frame comprising a first point of origin G.sub.1 and a first set of axes directions, whereas the pose values included in the second sequence are represented according to a second reference frame comprising a second point of origin G.sub.2 and a second set of global axes directions);
adjusting the first data and the second data based on the determined relative pose, wherein the trained machine learning algorithm compensates for temporal misalignment between the first data and the second data ([0089] For instance, as shown in FIG. 3B, the pose values included in the first sequence are represented according to a first global reference frame comprising a first global point of origin G.sub.1 and a first set of global axes directions, whereas the pose values included in the second sequence are represented according to a second global reference frame comprising a second global point of origin G.sub.2 and a second set of global axes directions. In accordance with the disclosed technique, one possible way to align these different global reference frames is by using an optimization algorithm that iteratively adjusts the position of the pose values included in the first sequence (e.g., by translating the global point of origin and/or rotating the global axes according to which the first sequence of pose values are represented) until it identifies the adjustment that achieves the best match in the geometric shapes defined by the first and second sequences of pose values, and the pose values included in the first sequence may then be transformed in accordance with this identified adjustment. [0090] To address this time misalignment between the first and second representations of the given agent's trajectory (which are now based on a common global reference frame), the sequence of alignment functions may further involve a time alignment of the pose values included in the first and second sequences. In line with the discussion above, one possible way to align the pose values included the first and second sequences is by using an optimization algorithm that iteratively adjusts the timestamps of the pose values included in the first sequence until it identifies an “optimal” time offset that minimizes the positional error between the pose values included in the first and second sequences, and the pose values included in the first sequence may then be adjusted by this optimal time offset. [0108] such machine learning models can then be used by a vehicle's on-board computing system to further inform the perception, prediction, and/or planning operations for the vehicle).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang with the teachings of Brizzi to determine a relative pose of a camera based on the temporal motion of the vehicle and adjust the data based on the pose because "In this way, the disclosed technique may enable these different representations of the agent's real-world trajectory to be compared to one another without the need for any prior design modifications or physical calibration of different sensor systems, which may provide various advantages—including but not limited to the ability to evaluate and validate new technology for deriving trajectories of agents from sensor data captured by a lower-fidelity sensor systems in a manner that is less costly, time consuming, or error prone than an approach that requires design modifications and physical calibration of different sensor systems" [Brizzi 0059].
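As a concrete illustration of the iterative time-offset optimization Brizzi describes in [0090], the following sketch searches for the offset that minimizes positional error between two pose sequences. Linear interpolation of the second trajectory is an assumption made for illustration, and all names are hypothetical.

```python
# Minimal sketch of Brizzi-style time alignment ([0090]): find the time offset
# that minimizes positional error between two trajectory representations.
import numpy as np
from scipy.optimize import minimize_scalar

def positional_error(offset, t1, xy1, t2, xy2):
    # Resample trajectory 2 at trajectory 1's timestamps shifted by `offset`,
    # then return the mean squared positional error between the trajectories.
    x2 = np.interp(t1 + offset, t2, xy2[:, 0])
    y2 = np.interp(t1 + offset, t2, xy2[:, 1])
    return np.mean((xy1[:, 0] - x2) ** 2 + (xy1[:, 1] - y2) ** 2)

def optimal_time_offset(t1, xy1, t2, xy2, max_offset=0.5):
    # Bounded scalar search mirroring the iterative timestamp adjustment in
    # Brizzi [0090]; would recover, e.g., the 30 ms origin-time difference
    # described in Brizzi [0054].
    res = minimize_scalar(positional_error, bounds=(-max_offset, max_offset),
                          args=(t1, xy1, t2, xy2), method="bounded")
    return res.x
```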
Regarding claim 2, Zhang, Ferencz, and Brizzi teach the method of claim 1. Zhang further teaches wherein the first data comprises an image representation of a scene being observed in a first field of view of an object configured to be towed by a vehicle and the second data comprises an image representation of the scene being observed in a second field of view of the vehicle ([0011] FIG. 5A illustrates exemplary field of views for cameras coupled to the vehicle and the trailer of FIG. 1, which illustrates areas of overlap between the field of views).
Regarding claim 3, Zhang, Ferencz, and Brizzi teach the method of claim 1. Zhang further teaches wherein the stitching comprises performing a pixel extrapolation ([0051] Based on the estimated homography matrix, the calibration manager module 304 determines a distance from the respective camera 44 on the vehicle 10 to the respective common feature and a distance from the respective camera 44 on the trailer 8 to the respective common feature…At least three matching feature point pairs from the triangulation generates six equations between the image pixel (u,v) pairs of the matching feature points in the known camera 44 on the vehicle 10 and the camera 44 on trailer 8).
Zhang does not teach performing a sub-pixel extrapolation using the machine learning algorithm.
Ferencz teaches performing a sub-pixel extrapolation using the machine learning algorithm ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy. For example, each pixel may be analyzed to determine if it represents a road feature, such as a painted lane mark on the surface of a road. Using a trained model, or other algorithm, an estimated start point of the lane mark and end point of the lane mark may be identified. For example, for each pixel representing a dash along a dashed line between lanes, an estimated start point of the dash and end point of the dash may be identified. This process may be repeated for each pixel associated with the lane mark. As a result, a cloud of points representing the start point of the lane mark, and a cloud of points representing the end of the lane mark may be determined. These clouds of points may then be averaged to provide a start and end location of the lane mark with sub-pixel precision. This may be particularly effective when used in combination with the image warping techniques described above. [0367] In some embodiments, a trained machine learning model may be used to predict the start and end points of lane mark 3110. For further clarification: [0351] In some embodiments, a representation of a feature in the warped image may include more image pixels than the representation of the feature in the original captured image, which may make the feature easier to detect. For example, as discussed above, pixels 2804 and 2806 may be spaced further apart in warped image 2820 than in original image 2810. In some embodiments, as part of generating warped image 2820, additional pixels, such as pixel 2805, may be added to account for the increased spacing between pixels 2804 and 2806).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang with the teachings of Ferencz to perform sub-pixel extrapolation using machine learning because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
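In its simplest reading, the point-cloud averaging Ferencz describes in [0364] reduces to averaging per-pixel endpoint estimates. A minimal sketch follows; the input format is an assumption made for illustration.

```python
# Hedged sketch of the point-cloud averaging in Ferencz [0364]: per-pixel
# estimates of a lane mark's start and end points are averaged to yield
# locations with sub-pixel precision.
import numpy as np

def subpixel_endpoints(start_estimates, end_estimates):
    """start_estimates, end_estimates: Nx2 arrays of (x, y) estimates produced
    per pixel by a trained model or other algorithm (Ferencz [0364])."""
    start = np.asarray(start_estimates, dtype=float).mean(axis=0)
    end = np.asarray(end_estimates, dtype=float).mean(axis=0)
    return start, end
```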
Regarding claim 4, Zhang, Ferencz, and Brizzi teach the method of claim 3. Zhang further teaches determining, by the processor, a set of pixel shift values that represent relative positions of images in the first data and the second data ([0055] Generally, the calibration manager module 304 samples at least one of the cameras 44 on the front 45 of the vehicle 10. The calibration manager module 304 compares the image data from the calibration image data 318 to the image acquired from the vehicle camera image data 332 to determine whether one or more pixels have shifted relative to the horizon between the two images);
aligning, by the processor, the images based on the set of pixel shift values ([0056] Based on the position, the calibration manager module 304 determines whether the one or more pixels have shifted relative to the horizon between the calibration image data 318 and the image data. Based on a determined horizon position change, the calibration manager module 304 queries the tables datastore 302 and retrieves the pitch angle 322 associated with the horizon position change. [0100] If the pitch angle 322 is less than the pitch angle threshold, at 720, the method updates the calibration data 320 in the calibration datastore 300 based on the pitch angle 322. In one example, the method updates the known coordinate location for the cameras 44 coupled to the vehicle 10 based on the pitch angle 322. [0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines);
and combining the aligned images to produce the stitched image having the combined field of view ([0074] The view rendering module 312 stitches the image streams together at the boundary lines and blends the images of the image streams along the boundary line based on the blending coefficient 346 to generate a uniform, seamless image. It should be noted that the view rendering module 312 may also blend the images of the image streams based on the overlapping of the field of views of the cameras 44. The seamless image is the full view 348 of the rear 46 of the vehicle 10, as observed by the cameras 44).
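As an illustration of the claim 4 steps (determining pixel shift values, aligning, and combining), the following generic sketch uses FFT phase correlation rather than Zhang's horizon-based comparison in [0055]-[0056], and a simple alpha blend stands in loosely for Zhang's blending coefficient 346. Names are hypothetical.

```python
# Generic sketch of pixel-shift alignment and blending for two overlapping
# grayscale images; not Zhang's disclosed horizon-based method.
import numpy as np

def pixel_shift(img_a, img_b):
    # Phase correlation: peak of the normalized cross-power spectrum gives the
    # (dy, dx) shift to apply to img_b so that it aligns with img_a.
    F = np.fft.fft2(img_a) * np.conj(np.fft.fft2(img_b))
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-9)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map shifts past half the image size to negative offsets.
    if dy > img_a.shape[0] // 2:
        dy -= img_a.shape[0]
    if dx > img_a.shape[1] // 2:
        dx -= img_a.shape[1]
    return dy, dx

def align_and_blend(img_a, img_b, alpha=0.5):
    # Align img_b to img_a using the estimated shift values, then alpha-blend
    # (a loose stand-in for Zhang's blending coefficient 346).
    dy, dx = pixel_shift(img_a, img_b)
    aligned = np.roll(np.roll(img_b, dy, axis=0), dx, axis=1)
    return alpha * img_a + (1 - alpha) * aligned
```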
Zhang does not teach performing the sub-pixel extrapolation using the trained machine learning algorithm.
Ferencz teaches performing the sub-pixel extrapolation using the machine learning algorithm ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy. For example, each pixel may be analyzed to determine if it represents a road feature, such as a painted lane mark on the surface of a road. Using a trained model, or other algorithm, an estimated start point of the lane mark and end point of the lane mark may be identified. For example, for each pixel representing a dash along a dashed line between lanes, an estimated start point of the dash and end point of the dash may be identified. This process may be repeated for each pixel associated with the lane mark. As a result, a cloud of points representing the start point of the lane mark, and a cloud of points representing the end of the lane mark may be determined. These clouds of points may then be averaged to provide a start and end location of the lane mark with sub-pixel precision. This may be particularly effective when used in combination with the image warping techniques described above. [0367] In some embodiments, a trained machine learning model may be used to predict the start and end points of lane mark 3110. For further clarification: [0351] In some embodiments, a representation of a feature in the warped image may include more image pixels than the representation of the feature in the original captured image, which may make the feature easier to detect. For example, as discussed above, pixels 2804 and 2806 may be spaced further apart in warped image 2820 than in original image 2810. In some embodiments, as part of generating warped image 2820, additional pixels, such as pixel 2805, may be added to account for the increased spacing between pixels 2804 and 2806).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang with the teachings of Ferencz to perform sub-pixel extrapolation using machine learning because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Regarding claim 5, Zhang, Ferencz, and Brizzi teach the method of claim 4. Zhang further teaches wherein the determining the set of pixel shift values comprises determining an amount of overlap between the images that is less than an overlap threshold ([0057] The calibration manager module 304 compares the retrieved pitch angle 322 with a pitch angle threshold. The pitch angle threshold is a default or factory set maximum pitch angle for the vehicle 10 towing the trailer 8).
Zhang does not teach sub-pixels.
Ferencz teaches sub-pixels ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang with the teachings of Ferencz to use sub-pixel accuracy because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Regarding claim 6, Zhang, Ferencz, and Brizzi teach the method of claim 4. Zhang further teaches wherein the determining the set of pixel shift values comprises: determining a geometric transformation estimate between the images; and determining a camera pose of the first camera based on the geometric transformation estimate ([0051] With three distances determined for each camera 44 on the trailer 8, for each camera 44 on the trailer 8, the calibration manager module 304 uses triangulation to estimate a three-dimensional coordinate location and pose of the respective camera 44 on the trailer 8 in the vehicle coordinate system), wherein the aligning is based on the camera pose of the first camera ([0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines).
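For illustration of the geometric-transformation route mapped above, the following hedged sketch estimates a homography from matched points and decomposes it into candidate rotations and translations. This is a generic technique in the spirit of Zhang's homography estimation in [0051], not Zhang's specific solution of the six triangulation equations; names are hypothetical.

```python
# Sketch: estimate a geometric transformation (homography) between matched
# image points and decompose it into candidate camera poses.
import cv2
import numpy as np

def pose_from_homography(pts_src, pts_dst, K):
    """pts_src, pts_dst: Nx2 matched pixel coordinates; K: 3x3 intrinsics.
    Returns candidate (rotation, translation) pairs; a real system would
    disambiguate them with additional constraints, such as requiring scene
    points to lie in front of both cameras."""
    H, _ = cv2.findHomography(pts_src, pts_dst, cv2.RANSAC, 3.0)
    _, rotations, translations, _ = cv2.decomposeHomographyMat(H, K)
    return list(zip(rotations, translations))
```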
Zhang does not teach sub-pixels.
Ferencz teaches sub-pixels ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang with the teachings of Ferencz to use sub-pixel accuracy because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Regarding claim 7, Zhang, Ferencz, and Brizzi teach the method of claim 1. Zhang further teaches wherein the obtaining the first data comprises receiving the first data from the first camera over a wireless network ([0040] Each of the cameras 44 on the trailer 8 are in communication with the controller 40 wirelessly, or through a wired connection to a communication architecture that facilitates the transfer of data, power, commands, etc., such as NTSC, LVDS, or Ethernet cables), and wherein the obtaining the second data comprises receiving the second data from the second camera over the wireless network ([0034] Each of the cameras 44 on the vehicle 10 are in communication with the controller 40 wirelessly, via the communication system 42, or through a wired connection to a communication architecture that facilitates the transfer of data, power, commands, etc.).
Regarding claim 8, Zhang, Ferencz, and Brizzi teach the method of claim 1. Zhang further teaches wherein the obtaining the first data comprises receiving the first data from the first camera over a wireless network ([0040] Each of the cameras 44 on the trailer 8 are in communication with the controller 40 wirelessly, or through a wired connection to a communication architecture that facilitates the transfer of data, power, commands, etc., such as NTSC, LVDS, or Ethernet cables), and wherein the obtaining the second data comprises receiving the second data from the second camera over a wired communication link between the second camera and the processor ([0034] Each of the cameras 44 on the vehicle 10 are in communication with the controller 40 wirelessly, via the communication system 42, or through a wired connection to a communication architecture that facilitates the transfer of data, power, commands, etc.).
Regarding claim 9, Zhang, Ferencz, and Brizzi teach the method of claim 1. Zhang further teaches providing, on a display, the stitched image ([0080] The view rendering module 312 stitches the images of the image streams together at the adjusted boundary lines and blends the images of the image streams along the adjusted boundary line based on the blending coefficient 346 to generate the uniform, seamless image. [0069] The view rendering module 312 generates a full view 348 for rendering on the display 54 that is unobstructed by the trailer 8 or devoid of the trailer 80).
Regarding claim 10, Zhang, Ferencz, and Brizzi teach the method of claim 1. Brizzi teaches wherein determining the relative pose of the first camera with respect to the second camera comprises accounting for changes in the relative pose of the first camera and a velocity vector of the vehicle over time using the trained machine learning algorithm ([0119] These differences in the temporal and/or spatial reference frames of the camera-based and LiDAR-based sets of trajectory representations may present several problems when attempting to compare datasets characterizing an instance of the cut-in scenario type. For instance, one such problem relates to the identification of the specific cut-in time for an instance of the cut-in scenario type, which is used to extract the particular trajectory information that forms the basis for deriving the characterizing data (e.g., the position, orientation, and velocity of each of vehicles 401 and 501 at the cut-in time). In line with the discussion above, this function may generally involve an evaluation of the trajectory representations for vehicle 401 and vehicle 501 (along with lane information) to identify the particular point in time when vehicle 501 crossed into the lane of vehicle 401. However, when there are two different sets of trajectory representations for vehicle 401 and vehicle 501, this function becomes more complicated, as the cut-in time for an instance of the cut-in scenario type occurs needs to be identified according to the temporal reference frame of both the camera-based set of trajectory representations and the LiDAR-based set of trajectory representations. [0124] Accordingly, before the first dataset characterizing instances of a scenario type that is collected using a vehicle's first sensor system can be evaluated against the second dataset characterizing such instances of the scenario type that is collected using the vehicle's second sensor system, the differences between the temporal and/or spatial reference frames of any trajectory representations used to derived such data may need to be reconciled. In this respect, as discussed above, aspects of the disclosed technique for aligning different representations of an agent's real-world trajectory that are based on different temporal and/or spatial reference frames (e.g., representations of an agent's real-world trajectory that are derived from different source data) may be used to satisfy this need. [0149] 2D sensor(s) 601a may have an arrangement that is capable of capturing 2D sensor data representing a 360° view of the vehicle's surrounding environment, one example of which may take the form of an array of 6-7 cameras. [0108] such machine learning models can then be used by a vehicle's on-board computing system to further inform the perception, prediction, and/or planning operations for the vehicle).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang with the teachings of Brizzi to account for changes in the relative pose of the first camera and a velocity vector because "when there are two different sets of trajectory representations for vehicle 401 and vehicle 501, this function becomes more complicated, as the cut-in time for an instance of the cut-in scenario type occurs needs to be identified according to the temporal reference frame of both the camera-based set of trajectory representations and the LiDAR-based set of trajectory representations" [Brizzi 0119].
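A minimal sketch of what accounting for a velocity vector over time could look like follows, assuming constant velocity over a small capture-time offset (cf. the 30 ms origin-time difference in Brizzi [0054]). This is illustrative only and is not disclosed by either reference in this form.

```python
# Illustrative sketch: propagate a camera position in the vehicle frame by
# the vehicle's velocity vector over a capture-time offset.
import numpy as np

def compensate_pose(camera_position, vehicle_velocity, dt):
    """camera_position: (x, y, z) of the trailer camera in the vehicle frame;
    vehicle_velocity: (vx, vy, vz) in m/s; dt: capture-time offset in seconds.
    A constant-velocity assumption is used purely for illustration."""
    return (np.asarray(camera_position, dtype=float)
            + np.asarray(vehicle_velocity, dtype=float) * dt)

# E.g., a 30 ms offset at 20 m/s forward motion displaces the effective
# camera position by 0.6 m along the direction of travel:
# compensate_pose((0.0, -8.5, 1.2), (20.0, 0.0, 0.0), 0.030)
```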
Regarding claim 11, Zhang, Ferencz, and Brizzi teach the method of claim 1. Ferencz teaches further comprising receiving, by the processor, a location signal that is output from one or more of the first camera or the second camera, the location signal indicating location information associated with the vehicle ([0329] The location identifiers may be generated by a vehicle, such as vehicles 1205, 1210, 1215, 1220, and 1225, based on images captured by the vehicle. For example, the identifiers may be determined based on acquisition, from a camera associated with a host vehicle, of at least one image representative of an environment of the host vehicle, analysis of the at least one image to detect the lane mark in the environment of the host vehicle, and analysis of the at least one image to determine a position of the detected lane mark relative to a location associated with the host vehicle. [0331] Server 1230 may update the model based on the various methods or processes described above with respect to FIG. 24E. In some embodiments, updating the autonomous vehicle road navigation model may include storing one or more indicators of position in real world coordinates of the detected lane mark. The autonomous vehicle road navigation model may also include a at least one target trajectory for a vehicle to follow along the corresponding road segment, as shown in FIG. 24E).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhang with the teachings of Ferencz to receive a location signal because doing so is useful "for mapping a lane mark for use in autonomous vehicle navigation" [Ferencz 0329].
Regarding claim 12, Zhang teaches a system ([0002] The present disclosure generally relates to vehicles and more particularly relates to systems and methods for a towing vehicle and a trailer with surround view imaging devices), comprising: memory; and at least one processor coupled to the memory ([0031] The controller 40 includes at least one processor 64 and a computer readable storage device or media 66…The computer readable storage device or media 66 may include volatile and nonvolatile storage in read-only memory (ROM)) and configured to: obtain first data from at least one camera of an object configured to be towed by a vehicle and second data from at least one camera of the vehicle ([0005] a method includes: receiving, from a first imaging device coupled to the trailer, a first image stream having a plurality of first images; receiving, from a second imaging device coupled to the towing vehicle, a second image stream having a plurality of second images);
determine, using a trained machine learning algorithm, a relative pose of the at least one camera of the object ([0045] The calibration manager module 304 calibrates the cameras 44 on the trailer 8 to the vehicle 10. In this regard, the cameras 44 associated with the vehicle 10 are pre-calibrated relative to the vehicle coordinate system of the vehicle 10 such that the position and pose of the respective camera 44 and the field of view of that camera 44 is known to the calibration manager module 304. Generally, however, the position and field of view of the cameras 44 associated with the trailer 8 are not known to the calibration manager module 304. [0051] Generally, the calibration manager module 304 uses triangulation to estimate six extrinsic parameters for each camera 44, such as location (x, y, z) and pose (pitch, roll, yaw). At least three matching feature point pairs from the triangulation generates six equations between the image pixel (u,v) pairs of the matching feature points in the known camera 44 on the vehicle 10 and the camera 44 on trailer 8. The calibration manager module 304 solves these six equations to determine the unknown extrinsic parameters of location (x, y, z) and pose (pitch, roll, yaw) of the camera 44 on trailer 8. [0063] It should be noted that the use of foreground/background segmentation is just one example method for determining the pivot angle. Other computer vision techniques may be employed to determine the pivot angle. For example, deep learning based synthetic image generation, such as the Generative Adversarial Nets (GAN) image generation or the DLNN based image synthesis/enhancement technology, may be used);
align images in the adjusted first data and the adjusted second data based on the determined relative pose ([0056] Based on the position, the calibration manager module 304 determines whether the one or more pixels have shifted relative to the horizon between the calibration image data 318 and the image data. Based on a determined horizon position change, the calibration manager module 304 queries the tables datastore 302 and retrieves the pitch angle 322 associated with the horizon position change. [0100] If the pitch angle 322 is less than the pitch angle threshold, at 720, the method updates the calibration data 320 in the calibration datastore 300 based on the pitch angle 322. In one example, the method updates the known coordinate location for the cameras 44 coupled to the vehicle 10 based on the pitch angle 322. [0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines);
and combine the aligned images to generate a stitched image having a combined field of view ([0074] The view rendering module 312 stitches the image streams together at the boundary lines and blends the images of the image streams along the boundary line based on the blending coefficient 346 to generate a uniform, seamless image. It should be noted that the view rendering module 312 may also blend the images of the image streams based on the overlapping of the field of views of the cameras 44. The seamless image is the full view 348 of the rear 46 of the vehicle 10, as observed by the cameras 44).
Zhang does not teach align using the trained machine learning algorithm.
Ferencz, in the same field of endeavor of vehicle camera systems, teaches align using the trained machine learning algorithm ([0237] At each frame or image, a short range model for the desired path is generated by the vehicle in a reference frame that is attached to the camera. The short range models may be stitched together to obtain a three dimensional model of the road in some coordinate frame. [0238] The second module is an end-to-end deep neural network, which may be trained to predict the correct short range path from an input image. In both modules, the road model may be detected in the image coordinate frame and transformed to a three dimensional space that may be virtually attached to the camera).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Zhang with the teachings of Ferencz to align images using machine learning because "The accumulated error may be small enough over some local range scale, such as of the order of 100 meters. All this may be completed in a single drive over a particular road segment" [Ferencz 0241].
Zhang does not teach determine, using a trained machine learning algorithm, a relative pose of the first camera based on temporal motion data of the vehicle; adjust the first data and the second data based on the determined relative pose, wherein the trained machine learning algorithm compensates for temporal misalignment between the first data and the second data.
Brizzi, in the same field of endeavor of vehicle camera systems, teaches determine, using a trained machine learning algorithm, a relative pose of the first camera based on temporal motion data of the vehicle ([0049] However, due to differences in the functional designs and/or physical placements of different sensor systems on a vehicle, the trajectory representations collected using different sensor systems are often based on different temporal and/or spatial reference frames. [0050] For instance, in terms of the temporal reference frames, the trajectory representations derived from sensor data captured by the different sensor systems may be based on different origin times (e.g., the point in time that a given sensor system considers to be “zero” for purposes of capturing the sensor data used to derive a trajectory). [0054] Pose #3 in the first sequence has a timestamp of 165 ms while Pose #3 in the second sequence as a timestamp of 135 ms, and so on, which is due to the fact that the origin time used by camera-based sensor system 102 during capture was 30 ms earlier in time than the origin time used by LiDAR-based sensor system 103 during capture (e.g., camera-based sensor system 102 was initialized and/or began capturing sensor data 30 ms earlier than LiDAR-based sensor system 103). Because of this difference in origin times, it is not possible to use the pose values' timestamps to match up the pose values included in the first sequence with their corresponding pose values in the second sequence. [0056] Further yet, FIG. 1 shows that the two different sequences of pose values are based on different global reference frames, which is due to the fact that camera-based sensor system 102 and LiDAR-based sensor system 103 may represent positions in the world in relation to different points of origin and/or global axes directions (which may be defined by the maps used when processing the sensor data to derive the trajectories). For example, as shown, the pose values included in the first sequence are represented according to a first global reference frame comprising a first point of origin G.sub.1 and a first set of axes directions, whereas the pose values included in the second sequence are represented according to a second reference frame comprising a second point of origin G.sub.2 and a second set of global axes directions);
adjust the first data and the second data based on the determined relative pose, wherein the trained machine learning algorithm compensates for temporal misalignment between the first data and the second data ([0089] For instance, as shown in FIG. 3B, the pose values included in the first sequence are represented according to a first global reference frame comprising a first global point of origin G.sub.1 and a first set of global axes directions, whereas the pose values included in the second sequence are represented according to a second global reference frame comprising a second global point of origin G.sub.2 and a second set of global axes directions. In accordance with the disclosed technique, one possible way to align these different global reference frames is by using an optimization algorithm that iteratively adjusts the position of the pose values included in the first sequence (e.g., by translating the global point of origin and/or rotating the global axes according to which the first sequence of pose values are represented) until it identifies the adjustment that achieves the best match in the geometric shapes defined by the first and second sequences of pose values, and the pose values included in the first sequence may then be transformed in accordance with this identified adjustment. [0090] To address this time misalignment between the first and second representations of the given agent's trajectory (which are now based on a common global reference frame), the sequence of alignment functions may further involve a time alignment of the pose values included in the first and second sequences. In line with the discussion above, one possible way to align the pose values included the first and second sequences is by using an optimization algorithm that iteratively adjusts the timestamps of the pose values included in the first sequence until it identifies an “optimal” time offset that minimizes the positional error between the pose values included in the first and second sequences, and the pose values included in the first sequence may then be adjusted by this optimal time offset. [0108] such machine learning models can then be used by a vehicle's on-board computing system to further inform the perception, prediction, and/or planning operations for the vehicle).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Zhang with the teachings of Brizzi to determine a relative pose of a camera based on the temporal motion of the vehicle and adjust the data based on the pose because "In this way, the disclosed technique may enable these different representations of the agent's real-world trajectory to be compared to one another without the need for any prior design modifications or physical calibration of different sensor systems, which may provide various advantages—including but not limited to the ability to evaluate and validate new technology for deriving trajectories of agents from sensor data captured by a lower-fidelity sensor systems in a manner that is less costly, time consuming, or error prone than an approach that requires design modifications and physical calibration of different sensor systems" [Brizzi 0059].
Regarding claim 13, Zhang, Ferencz, and Brizzi teach the system of claim 12. Zhang further teaches wherein the at least one processor configured to align and combine the images is further configured to perform a pixel extrapolation ([0051] Based on the estimated homography matrix, the calibration manager module 304 determines a distance from the respective camera 44 on the vehicle 10 to the respective common feature and a distance from the respective camera 44 on the trailer 8 to the respective common feature…At least three matching feature point pairs from the triangulation generates six equations between the image pixel (u,v) pairs of the matching feature points in the known camera 44 on the vehicle 10 and the camera 44 on trailer 8).
Zhang does not teach performing a sub-pixel extrapolation using the machine learning algorithm.
Ferencz teaches performing a sub-pixel extrapolation using the machine learning algorithm ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy. For example, each pixel may be analyzed to determine if it represents a road feature, such as a painted lane mark on the surface of a road. Using a trained model, or other algorithm, an estimated start point of the lane mark and end point of the lane mark may be identified. For example, for each pixel representing a dash along a dashed line between lanes, an estimated start point of the dash and end point of the dash may be identified. This process may be repeated for each pixel associated with the lane mark. As a result, a cloud of points representing the start point of the lane mark, and a cloud of points representing the end of the lane mark may be determined. These clouds of points may then be averaged to provide a start and end location of the lane mark with sub-pixel precision. This may be particularly effective when used in combination with the image warping techniques described above. [0367] In some embodiments, a trained machine learning model may be used to predict the start and end points of lane mark 3110. For further clarification: [0351] In some embodiments, a representation of a feature in the warped image may include more image pixels than the representation of the feature in the original captured image, which may make the feature easier to detect. For example, as discussed above, pixels 2804 and 2806 may be spaced further apart in warped image 2820 than in original image 2810. In some embodiments, as part of generating warped image 2820, additional pixels, such as pixel 2805, may be added to account for the increased spacing between pixels 2804 and 2806).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Zhang with the teachings of Ferencz to perform sub-pixel extrapolation using machine learning because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Regarding claim 14, Zhang, Ferencz, and Brizzi teach the system of claim 13. Zhang further teaches determining a set of pixel shift values that represent relative positions of images in the first data and the second data ([0055] Generally, the calibration manager module 304 samples at least one of the cameras 44 on the front 45 of the vehicle 10. The calibration manager module 304 compares the image data from the calibration image data 318 to the image acquired from the vehicle camera image data 332 to determine whether one or more pixels have shifted relative to the horizon between the two images);
aligning the images based on the set of pixel shift values ([0056] Based on the position, the calibration manager module 304 determines whether the one or more pixels have shifted relative to the horizon between the calibration image data 318 and the image data. Based on a determined horizon position change, the calibration manager module 304 queries the tables datastore 302 and retrieves the pitch angle 322 associated with the horizon position change. [0100] If the pitch angle 322 is less than the pitch angle threshold, at 720, the method updates the calibration data 320 in the calibration datastore 300 based on the pitch angle 322. In one example, the method updates the known coordinate location for the cameras 44 coupled to the vehicle 10 based on the pitch angle 322. [0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines);
and combining the aligned images to produce the stitched image having the combined field of view ([0074] The view rendering module 312 stitches the image streams together at the boundary lines and blends the images of the image streams along the boundary line based on the blending coefficient 346 to generate a uniform, seamless image. It should be noted that the view rendering module 312 may also blend the images of the image streams based on the overlapping of the field of views of the cameras 44. The seamless image is the full view 348 of the rear 46 of the vehicle 10, as observed by the cameras 44).
Zhang does not teach performing the sub-pixel extrapolation using the trained machine learning algorithm.
Ferencz teaches performing the sub-pixel extrapolation using the trained machine learning algorithm ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy. For example, each pixel may be analyzed to determine if it represents a road feature, such as a painted lane mark on the surface of a road. Using a trained model, or other algorithm, an estimated start point of the lane mark and end point of the lane mark may be identified. For example, for each pixel representing a dash along a dashed line between lanes, an estimated start point of the dash and end point of the dash may be identified. This process may be repeated for each pixel associated with the lane mark. As a result, a cloud of points representing the start point of the lane mark, and a cloud of points representing the end of the lane mark may be determined. These clouds of points may then be averaged to provide a start and end location of the lane mark with sub-pixel precision. This may be particularly effective when used in combination with the image warping techniques described above. [0367] In some embodiments, a trained machine learning model may be used to predict the start and end points of lane mark 3110. For further clarification: [0351] In some embodiments, a representation of a feature in the warped image may include more image pixels than the representation of the feature in the original captured image, which may make the feature easier to detect. For example, as discussed above, pixels 2804 and 2806 may be spaced further apart in warped image 2820 than in original image 2810. In some embodiments, as part of generating warped image 2820, additional pixels, such as pixel 2805, may be added to account for the increased spacing between pixels 2804 and 2806).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Zhang with the teachings of Ferencz to perform sub-pixel extrapolation using machine learning because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Regarding claim 15, Zhang, Ferencz, and Brizzi teach the system of claim 14. Zhang further teaches wherein the at least one processor configured to perform the pixel extrapolation is further configured to: determine a set of pixel shift values that represent relative positions of images in the first data and the second data ([0055] Generally, the calibration manager module 304 samples at least one of the cameras 44 on the front 45 of the vehicle 10. The calibration manager module 304 compares the image data from the calibration image data 318 to the image acquired from the vehicle camera image data 332 to determine whether one or more pixels have shifted relative to the horizon between the two images);
align the images based on the set of pixel shift values ([0056] Based on the position, the calibration manager module 304 determines whether the one or more pixels have shifted relative to the horizon between the calibration image data 318 and the image data. Based on a determined horizon position change, the calibration manager module 304 queries the tables datastore 302 and retrieves the pitch angle 322 associated with the horizon position change. [0100] If the pitch angle 322 is less than the pitch angle threshold, at 720, the method updates the calibration data 320 in the calibration datastore 300 based on the pitch angle 322. In one example, the method updates the known coordinate location for the cameras 44 coupled to the vehicle 10 based on the pitch angle 322. [0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines);
and combine the aligned images to produce the stitched image having the combined field of view ([0074] The view rendering module 312 stitches the image streams together at the boundary lines and blends the images of the image streams along the boundary line based on the blending coefficient 346 to generate a uniform, seamless image. It should be noted that the view rendering module 312 may also blend the images of the image streams based on the overlapping of the field of views of the cameras 44. The seamless image is the full view 348 of the rear 46 of the vehicle 10, as observed by the cameras 44).
Zhang does not teach performing the sub-pixel extrapolation using the trained machine learning algorithm.
Ferencz teaches performing the sub-pixel extrapolation using the machine learning algorithm ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy. For example, each pixel may be analyzed to determine if it represents a road feature, such as a painted lane mark on the surface of a road. Using a trained model, or other algorithm, an estimated start point of the lane mark and end point of the lane mark may be identified. For example, for each pixel representing a dash along a dashed line between lanes, an estimated start point of the dash and end point of the dash may be identified. This process may be repeated for each pixel associated with the lane mark. As a result, a cloud of points representing the start point of the lane mark, and a cloud of points representing the end of the lane mark may be determined. These clouds of points may then be averaged to provide a start and end location of the lane mark with sub-pixel precision. [0367] In some embodiments, a trained machine learning model may be used to predict the start and end points of lane mark 3110. For further clarification: [0351] In some embodiments, a representation of a feature in the warped image may include more image pixels than the representation of the feature in the original captured image, which may make the feature easier to detect. For example, as discussed above, pixels 2804 and 2806 may be spaced further apart in warped image 2820 than in original image 2810. In some embodiments, as part of generating warped image 2820, additional pixels, such as pixel 2805, may be added to account for the increased spacing between pixels 2804 and 2806).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Zhang with the teachings of Ferencz to perform sub-pixel extrapolation using machine learning because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
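For clarification of the point-cloud averaging relied upon above, a minimal sketch follows; the per-pixel endpoint estimates are fabricated placeholders standing in for the output of the trained model of Ferencz [0364] and are not data from either reference:

```python
import numpy as np

# Fabricated per-pixel estimates of a dash's start and end points, as a
# trained model per Ferencz [0364] might emit one estimate per pixel
# associated with the lane mark.
start_estimates = np.array([[10.2, 5.1], [9.8, 4.9], [10.1, 5.0]])
end_estimates = np.array([[10.3, 8.0], [9.9, 8.2], [10.0, 7.9]])

# Averaging each cloud of points yields endpoint locations with
# sub-pixel precision.
start_subpixel = start_estimates.mean(axis=0)  # approx. [10.03, 5.00]
end_subpixel = end_estimates.mean(axis=0)      # approx. [10.07, 8.03]
```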
Regarding claim 16, Zhang, Ferencz, and Brizzi teach the system of claim 15. Zhang further teaches wherein the at least one processor configured to determine the set of pixel shift values is further configured to determine an amount of overlap between the images that is less than an overlap threshold ([0057] The calibration manager module 304 compares the retrieved pitch angle 322 with a pitch angle threshold. The pitch angle threshold is a default or factory set maximum pitch angle for the vehicle 10 towing the trailer 8).
Zhang does not teach sub-pixels.
Ferencz teaches sub-pixels ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Zhang with the teachings of Ferencz to use sub-pixel accuracy because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Regarding claim 17, Zhang, Ferencz, and Brizzi teach the system of claim 15. Zhang further teaches wherein the at least one processor configured to determine the set of pixel shift values is further configured to: determine a geometric transformation estimate between the images; and determine a camera pose of the at least one camera of the object based on the geometric transformation estimate ([0051] With three distances determined for each camera 44 on the trailer 8, for each camera 44 on the trailer 8, the calibration manager module 304 uses triangulation to estimate a three-dimensional coordinate location and pose of the respective camera 44 on the trailer 8 in the vehicle coordinate system), wherein the aligning is based on the camera pose of the at least one camera of the object ([0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines).
Zhang does not teach sub-pixels.
Ferencz teaches sub-pixels ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Zhang with the teachings of Ferencz to use sub-pixel accuracy because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
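For context regarding the triangulation of Zhang [0051] cited in the rejection of claim 17, the estimation of the extrinsic parameters from matched feature point pairs is structurally similar to a perspective-n-point (PnP) solve. The sketch below therefore uses OpenCV's solvePnP as a stand-in rather than Zhang's actual equations; the feature coordinates and camera intrinsics are invented for the example and form no part of the record:

```python
import cv2
import numpy as np

# Invented feature locations in the vehicle coordinate system and their
# matched pixel (u, v) observations in the trailer camera image.
object_pts = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 5.5], [0.0, 1.0, 6.0],
                       [1.0, 1.0, 6.5], [0.5, 0.5, 7.0], [1.5, 0.5, 7.5]])
image_pts = np.array([[320.0, 240.0], [400.0, 238.0], [318.0, 170.0],
                      [402.0, 168.0], [360.0, 205.0], [430.0, 204.0]])

# Assumed pinhole intrinsics for the trailer camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Solve for the six extrinsic parameters -- translation (x, y, z) and a
# rotation convertible to (pitch, roll, yaw) -- analogous to solving
# Zhang's six equations from the matched feature point pairs.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
rotation_matrix, _ = cv2.Rodrigues(rvec)
```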
Regarding claim 18, Zhang teaches a vehicle ([0002] The present disclosure generally relates to vehicles and more particularly relates to systems and methods for a towing vehicle and a trailer with surround view imaging devices), comprising: a first set of cameras; and a processor (processor 64) configured to: receive first data from at least one camera of a second set of cameras of an object configured to be towed by the vehicle and second data from at least one camera of the first set of cameras of the vehicle ([0005] a method includes: receiving, from a first imaging device coupled to the trailer, a first image stream having a plurality of first images; receiving, from a second imaging device coupled to the towing vehicle, a second image stream having a plurality of second images);
determine, using a trained machine learning algorithm, a relative pose of the at least one camera of the second set of cameras ([0045] The calibration manager module 304 calibrates the cameras 44 on the trailer 8 to the vehicle 10. In this regard, the cameras 44 associated with the vehicle 10 are pre-calibrated relative to the vehicle coordinate system of the vehicle 10 such that the position and pose of the respective camera 44 and the field of view of that camera 44 is known to the calibration manager module 304. Generally, however, the position and field of view of the cameras 44 associated with the trailer 8 are not known to the calibration manager module 304. [0051] Generally, the calibration manager module 304 uses triangulation to estimate six extrinsic parameters for each camera 44, such as location (x, y, z) and pose (pitch, roll, yaw). At least three matching feature point pairs from the triangulation generates six equations between the image pixel (u,v) pairs of the matching feature points in the known camera 44 on the vehicle 10 and the camera 44 on trailer 8. The calibration manager module 304 solves these six equations to determine the unknown extrinsic parameters of location (x, y, z) and pose (pitch, roll, yaw) of the camera 44 on trailer 8. [0063] It should be noted that the use of foreground/background segmentation is just one example method for determining the pivot angle. Other computer vision techniques may be employed to determine the pivot angle. For example, deep learning based synthetic image generation, such as the Generative Adversarial Nets (GAN) image generation or the DLNN based image synthesis/enhancement technology, may be used);
determine a set of pixel shift values that represent relative positions of images in the adjusted first data and the adjusted second data based on the determined relative pose ([0050] The calibration manager module 304 processes the images identified to overlap based on feature point or pattern detection to determine features, such as corner points, object textures or patterns, colors, etc. in each of the acquired images. The calibration manager module 304 determines which features are common features between the trailer camera image data 330 and the vehicle camera image data 332 via matching and tracking. [0051] The calibration manager module 304 stores the determined coordinate locations for each of the cameras 44 on the trailer 8 as the calibration data 320 in the calibration datastore 300. [0055] The calibration manager module 304 compares the image data from the calibration image data 318 to the image acquired from the vehicle camera image data 332 to determine whether one or more pixels have shifted relative to the horizon between the two images);
align the images based on the set of pixel shift values ([0056] Based on the position, the calibration manager module 304 determines whether the one or more pixels have shifted relative to the horizon between the calibration image data 318 and the image data. Based on a determined horizon position change, the calibration manager module 304 queries the tables datastore 302 and retrieves the pitch angle 322 associated with the horizon position change. [0100] If the pitch angle 322 is less than the pitch angle threshold, at 720, the method updates the calibration data 320 in the calibration datastore 300 based on the pitch angle 322. In one example, the method updates the known coordinate location for the cameras 44 coupled to the vehicle 10 based on the pitch angle 322. [0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines);
and combine the aligned images to produce a stitched image having a combined field of view ([0074] The view rendering module 312 stitches the image streams together at the boundary lines and blends the images of the image streams along the boundary line based on the blending coefficient 346 to generate a uniform, seamless image. It should be noted that the view rendering module 312 may also blend the images of the image streams based on the overlapping of the field of views of the cameras 44. The seamless image is the full view 348 of the rear 46 of the vehicle 10, as observed by the cameras 44).
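As a brief illustration of the boundary blending of Zhang [0074] relied upon above, and forming no part of the rejection, a sketch follows; the strip dimensions, pixel values, and ramp weights are assumptions introduced for the example:

```python
import numpy as np

def blend_overlap(vehicle_strip, trailer_strip, alpha):
    """Blend overlapping strips from the two image streams; alpha plays
    the role of the blending coefficient and may be a scalar or a
    per-column ramp (in the spirit of Zhang [0074])."""
    return alpha * vehicle_strip + (1.0 - alpha) * trailer_strip

# Placeholder 8-pixel-wide overlap strips (240 rows, RGB).
veh = np.full((240, 8, 3), 200.0)  # vehicle-camera pixels
trl = np.full((240, 8, 3), 100.0)  # trailer-camera pixels

# A horizontal ramp weights the vehicle image fully at one side of the
# boundary and the trailer image fully at the other, giving a smooth
# transition across the stitching boundary line.
ramp = np.linspace(1.0, 0.0, 8).reshape(1, 8, 1)
seam = blend_overlap(veh, trl, ramp)
```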
Zhang does not teach determine, using the trained machine learning algorithm, a set of sub-pixel shift values.
Ferencz teaches determine, using the trained machine learning algorithm, a set of sub-pixel shift values ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy. For example, each pixel may be analyzed to determine if it represents a road feature, such as a painted lane mark on the surface of a road. Using a trained model, or other algorithm, an estimated start point of the lane mark and end point of the lane mark may be identified. For example, for each pixel representing a dash along a dashed line between lanes, an estimated start point of the dash and end point of the dash may be identified. This process may be repeated for each pixel associated with the lane mark. As a result, a cloud of points representing the start point of the lane mark, and a cloud of points representing the end of the lane mark may be determined. These clouds of points may then be averaged to provide a start and end location of the lane mark with sub-pixel precision. This may be particularly effective when used in combination with the image warping techniques described above. [0367] In some embodiments, a trained machine learning model may be used to predict the start and end points of lane mark 3110. For further clarification: [0351] In some embodiments, a representation of a feature in the warped image may include more image pixels than the representation of the feature in the original captured image, which may make the feature easier to detect. For example, as discussed above, pixels 2804 and 2806 may be spaced further apart in warped image 2820 than in original image 2810. In some embodiments, as part of generating warped image 2820, additional pixels, such as pixel 2805, may be added to account for the increased spacing between pixels 2804 and 2806).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the vehicle of Zhang with the teachings of Ferencz to determine sub-pixel shift values using machine learning because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Zhang does not teach determine, using a trained machine learning algorithm, a relative pose of the at least one camera of the second set of cameras based on temporal motion data of the vehicle; adjust the first data and the second data based on the determined relative pose, wherein the trained machine learning algorithm compensates for temporal misalignment between the first data and the second data.
Brizzi, in the same field of endeavor of vehicle camera systems, teaches determine, using a trained machine learning algorithm, a relative pose of the at least one camera of the second set of cameras based on temporal motion data of the vehicle ([0049] However, due to differences in the functional designs and/or physical placements of different sensor systems on a vehicle, the trajectory representations collected using different sensor systems are often based on different temporal and/or spatial reference frames. [0050] For instance, in terms of the temporal reference frames, the trajectory representations derived from sensor data captured by the different sensor systems may be based on different origin times (e.g., the point in time that a given sensor system considers to be “zero” for purposes of capturing the sensor data used to derive a trajectory). [0054] Pose #3 in the first sequence has a timestamp of 165 ms while Pose #3 in the second sequence has a timestamp of 135 ms, and so on, which is due to the fact that the origin time used by camera-based sensor system 102 during capture was 30 ms earlier in time than the origin time used by LiDAR-based sensor system 103 during capture (e.g., camera-based sensor system 102 was initialized and/or began capturing sensor data 30 ms earlier than LiDAR-based sensor system 103). Because of this difference in origin times, it is not possible to use the pose values' timestamps to match up the pose values included in the first sequence with their corresponding pose values in the second sequence. [0056] Further yet, FIG. 1 shows that the two different sequences of pose values are based on different global reference frames, which is due to the fact that camera-based sensor system 102 and LiDAR-based sensor system 103 may represent positions in the world in relation to different points of origin and/or global axes directions (which may be defined by the maps used when processing the sensor data to derive the trajectories). For example, as shown, the pose values included in the first sequence are represented according to a first global reference frame comprising a first point of origin G₁ and a first set of axes directions, whereas the pose values included in the second sequence are represented according to a second reference frame comprising a second point of origin G₂ and a second set of global axes directions);
adjust the first data and the second data based on the determined relative pose, wherein the trained machine learning algorithm compensates for temporal misalignment between the first data and the second data ([0089] For instance, as shown in FIG. 3B, the pose values included in the first sequence are represented according to a first global reference frame comprising a first global point of origin G₁ and a first set of global axes directions, whereas the pose values included in the second sequence are represented according to a second global reference frame comprising a second global point of origin G₂ and a second set of global axes directions. In accordance with the disclosed technique, one possible way to align these different global reference frames is by using an optimization algorithm that iteratively adjusts the position of the pose values included in the first sequence (e.g., by translating the global point of origin and/or rotating the global axes according to which the first sequence of pose values are represented) until it identifies the adjustment that achieves the best match in the geometric shapes defined by the first and second sequences of pose values, and the pose values included in the first sequence may then be transformed in accordance with this identified adjustment. [0090] To address this time misalignment between the first and second representations of the given agent's trajectory (which are now based on a common global reference frame), the sequence of alignment functions may further involve a time alignment of the pose values included in the first and second sequences. In line with the discussion above, one possible way to align the pose values included in the first and second sequences is by using an optimization algorithm that iteratively adjusts the timestamps of the pose values included in the first sequence until it identifies an “optimal” time offset that minimizes the positional error between the pose values included in the first and second sequences, and the pose values included in the first sequence may then be adjusted by this optimal time offset. [0108] such machine learning models can then be used by a vehicle's on-board computing system to further inform the perception, prediction, and/or planning operations for the vehicle);
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the vehicle of Zhang with the teachings of Brizzi to determine a relative pose of a camera based on the temporal motion of the vehicle and adjust the data based on the pose because "In this way, the disclosed technique may enable these different representations of the agent's real-world trajectory to be compared to one another without the need for any prior design modifications or physical calibration of different sensor systems, which may provide various advantages—including but not limited to the ability to evaluate and validate new technology for deriving trajectories of agents from sensor data captured by a lower-fidelity sensor systems in a manner that is less costly, time consuming, or error prone than an approach that requires design modifications and physical calibration of different sensor systems" [Brizzi 0059].
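For clarification of the time-alignment optimization of Brizzi [0090] relied upon above, a minimal sketch follows; the timestamps mirror the 30 ms origin-time difference noted in Brizzi [0054], while the positions, search grid, and function name are invented for the example:

```python
import numpy as np

def best_time_offset(t1, p1, t2, p2, offsets):
    """Grid-search the time offset that minimizes positional error
    between two pose sequences, echoing the iterative timestamp
    adjustment Brizzi describes in [0090]. Each sequence is given as
    timestamps and (x, y) positions."""
    best, best_err = None, np.inf
    for dt in offsets:
        # Resample sequence 1 onto sequence 2's timeline after shifting.
        x = np.interp(t2, t1 + dt, p1[:, 0])
        y = np.interp(t2, t1 + dt, p1[:, 1])
        err = np.mean((x - p2[:, 0]) ** 2 + (y - p2[:, 1]) ** 2)
        if err < best_err:
            best, best_err = dt, err
    return best

t1 = np.array([45.0, 105.0, 165.0])   # camera-based timestamps (ms)
t2 = np.array([15.0, 75.0, 135.0])    # LiDAR-based timestamps (ms)
p = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])  # shared path
dt = best_time_offset(t1, p, t2, p, offsets=np.arange(-60.0, 61.0, 5.0))
# dt recovers the -30 ms origin-time difference noted in Brizzi [0054].
```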
Regarding claim 19, Zhang, Ferencz, and Brizzi teach the vehicle of claim 18. Zhang further teaches wherein the processor configured to determine the set of pixel shift values is further configured to determine an amount of overlap between the images that is less than an overlap threshold ([0057] The calibration manager module 304 compares the retrieved pitch angle 322 with a pitch angle threshold. The pitch angle threshold is a default or factory set maximum pitch angle for the vehicle 10 towing the trailer 8).
Zhang does not teach sub-pixels.
Ferencz teaches sub-pixels ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the vehicle of Zhang with the teachings of Ferencz to use sub-pixel accuracy because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Regarding claim 20, Zhang, Ferencz, and Brizzi teach the vehicle of claim 18. Zhang further teaches wherein the processor configured to determine the set of pixel shift values is further configured to: determine a geometric transformation estimate between the images; and determine a camera pose of the at least one camera of the object based on the geometric transformation estimate ([0051] With three distances determined for each camera 44 on the trailer 8, for each camera 44 on the trailer 8, the calibration manager module 304 uses triangulation to estimate a three-dimensional coordinate location and pose of the respective camera 44 on the trailer 8 in the vehicle coordinate system), wherein the aligning is based on the camera pose of the at least one camera of the first set of cameras ([0071] The view rendering module 312 receives as input the calibration data 320. The view rendering module 312 processes the vehicle camera image data 332 and the trailer camera image data 330 based on the calibration data 320 and the known locations of the cameras 44 on the vehicle 10 to determine the areas of overlap in the images of the image streams. Based on the overlap, the view rendering module 312 defines stitching boundary lines).
Zhang does not teach sub-pixels.
Ferencz teaches sub-pixels ([0364] In some embodiments, one or more pixels of a road surface may be analyzed to identify lane marks or other road features with sub-pixel accuracy).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the vehicle of Zhang with the teachings of Ferencz to use sub-pixel accuracy because "road features at greater distances may be identified at greater precision using the point cloud averaging techniques" [Ferencz 0364].
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jacqueline R Zak whose telephone number is (571)272-4077. The examiner can normally be reached M-F 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell, can be reached at (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JACQUELINE R ZAK/Examiner, Art Unit 2666
/EMILY C TERRELL/Supervisory Patent Examiner, Art Unit 2666