DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is responsive to remarks filed on 05/15/2024. Claims 1-14 are pending in the instant application. Claims 1 and 8 are independent. An Office Action on the merits follows.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1 and 2 are rejected under 35 U.S.C. 103 as being unpatentable over Dasgupta (US 20180158197 A1) in combination with Savvides (US 20190057588 A1).
Regarding Claim 1: Dasgupta discloses a method for extracting kinematic data for vehicles using an unmanned aerial vehicle (Refer to para [021]; “FIG. 1A shows an example configuration of an unmanned aerial vehicle (UAV) 100 within which certain techniques described herein may be applied.”) comprising: defining a region of interest on the ground (Refer to para [040]; “The depth computation can look specifically at pixels that are labeled to be part of an object of interest (e.g., a subject 102).”) providing a background image of the region of interest (Refer to para [050]; “This segmentation process output may allow a tracking system 140 to distinguish the objects represented in an image and the rest of the image (i.e., a background).”) capturing, by a camera, a series of images over time of the region of interest from a perspective above the region of interest (Refer to para [021]; “The example UAV 100 includes propulsion and control actuators 110 (e.g., powered rotors or aerodynamic control surfaces) for maintaining controlled flight, various sensors for automated navigation and flight control 112, and one or more image capture devices 114 and 115 for capturing images (including video) of the surrounding physical environment while in flight.”) from the series of images, detecting, by a computer processor, at least one vehicle moving in the region of interest (Refer to para [034]; “While the introduced techniques for object tracking are described in the context of an aerial vehicle such as the UAV 100 depicted in FIG. 1A, such techniques are not limited to this context. 
The described techniques may similarly be applied to detect, identify, and track objects using image capture devices mounted to other types of vehicles (e.g., fixed-wing aircraft, automobiles, watercraft, etc.), hand-held image capture devices (e.g., mobile devices with integrated cameras), or to stationary image capture devices (e.g., building mounted security cameras).”) for each image in the series of images, fitting, by the computer processor, a bounding box to the at least one moving vehicle (Refer to para [070]; “In some embodiments, the tracking system 140 may be configured to implement an algorithm that bounds the growth of uncertainty in the tracked objects location given this concept. In other words, when visual contact with a tracked object is lost at a particular position, the tracking system 140 can bound the uncertainty in the object's position to the last observed position and one or more possible escape paths given a last observed trajectory. A possible implementation of this concept may include generating, by the tracking system 140, an occupancy map that is carved out by stereo and the segmentations with a particle filter on possible escape paths.”) where the bounding box surrounds the at least one moving vehicle and the size of the bounding box is same across the series of images (Refer to para [071]; “In some embodiments, information regarding objects in the physical environment gathered and/or generated by a tracking system 140 can be utilized to generate and display “augmentations” to tracked objects, for example, via associated display devices. Devices configured for augmented reality (AR devices) can deliver to a user a direct or indirect view of a physical environment which includes objects that are augmented (or supplemented) by computer-generated sensory outputs such as sound, video, graphics, or any other data that may augment (or supplement) a user's perception of the physical environment. 
For example, data gathered or generated by a tracking system 140 regarding a tracked object in the physical environment can be displayed to a user in the form of graphical overlays via an AR device while the UAV 100 is in flight through the physical environment and actively tracking the object and/or as an augmentation to video recorded by the UAV 100 after the flight has completed.”) and determining, by the computer processor, kinematic data for the at least one moving vehicle using the bounding boxes (Refer to para [037]; “As images are received, the tracking system 140 may extract semantic information regarding certain objects captured in the images based on an analysis of the pixels in the images. Semantic information regarding a captured object can include information such as an object's category (i.e., class), location, shape, size, scale, pixel segmentation, orientation, inter-class appearance, activity, and pose. In an example embodiment, the tracking system 140 may identify general locations and categories of objects based on captured images and then determine or infer additional more detailed information about individual instances of objects based on further processing.”) where the kinematic data includes yaw angle for the at least one moving vehicle (Refer to para [026]; “For example, a mechanical gimbal mechanism may handle adjustments in the pitch of the image capture device 115, while adjustments in the roll and yaw are accomplished digitally by transforming (e.g., rotating, panning, etc.) the captured images so as to effectively provide at least three degrees of freedom in the motion of the image capture device 115 relative to the UAV 100.”).
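For illustration only, the limitation "determining ... kinematic data for the at least one moving vehicle using the bounding boxes" can be sketched as follows. This is a minimal sketch and is not taken from Dasgupta or from the application: it assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixels, a fixed frame interval, and a known ground scale, and it approximates a heading angle from the displacement of the box centre. The names `box_center`, `kinematics_from_boxes`, `frame_interval_s`, and `metres_per_pixel` are hypothetical.

```python
import math


def box_center(box):
    """Return the centre (x, y) of an axis-aligned box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def kinematics_from_boxes(boxes, frame_interval_s, metres_per_pixel):
    """Estimate (speed in m/s, heading in degrees) between consecutive boxes.

    Assumption: the heading is taken from the direction of travel of the box
    centre, measured in the image frame (y grows downward in most images).
    """
    estimates = []
    for previous, current in zip(boxes, boxes[1:]):
        x0, y0 = box_center(previous)
        x1, y1 = box_center(current)
        dx = (x1 - x0) * metres_per_pixel
        dy = (y1 - y0) * metres_per_pixel
        speed = math.hypot(dx, dy) / frame_interval_s
        heading = math.degrees(math.atan2(dy, dx))
        estimates.append((speed, heading))
    return estimates


# Hypothetical track: the box centre moves 3 px right and 4 px down in 0.5 s.
track = [(0, 0, 4, 2), (3, 4, 7, 6)]
result = kinematics_from_boxes(track, frame_interval_s=0.5, metres_per_pixel=0.5)
```

A heading derived this way is undefined for a stationary vehicle, which is one reason an approach based on the orientation of the box itself may be preferred.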
While Dasgupta discloses a “tracking system … configured to implement an algorithm that bounds the growth of uncertainty in the tracked objects location…”, Dasgupta does not expressly disclose a “bounding box surrounds at least one moving vehicle.”
Savvides teaches “a video monitoring method or system includes modules capable of determining motion changes in a set of video frames to find potential objects and define one or more bounding boxes around the potential objects.” More specifically, Savvides teaches “an identified region of interest in a bounding box has its contained potential object classified or identified using machine learning. This can include use of convolutional or recurrent neural networks. Bounding boxes 214 are created to surround potential objects. Instead of immediately classifying objects in the bounding boxes 214, a bounding box filtering module 216 is used to eliminate bounding boxes unlikely to surround objects of interest. The remaining bounding boxes can then have contained objects classified and/or identified in a filtered detection step 218.” (at para [009 and 023], Savvides).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify Dasgupta by adding an image processor for calculating “motion changes [that] can be determined using frame subtraction and/or morphological processing” as taught by Savvides.
The suggestion/motivation for combining the teachings of Dasgupta and Savvides would have been in order to provide “…a change detection module takes in a raw frame and produces bounding boxes corresponding to recent changes in the scene. These changes correspond to both valid moving objects and false detections or noise. In one embodiment, an object of interest segmentation algorithm can use a background differentiation approach in order to estimate new objects that have entered the scene. Such an algorithm utilizes the difference between consecutive frames to identify moving objects in the scene. This difference image is then thresholded to determine bounding boxes for potential objects. Since the algorithm does not need to model the background directly, it responds quickly to changes. The bounding box filtering module 216 performs filtering based on the bounding box properties to remove false detections and keep valid detections.” (at para [024 and 025], Savvides).
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Dasgupta and Savvides in order to obtain the specified claimed elements of Claim 1. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to the claim in question.
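For illustration only, the frame-differencing approach quoted from Savvides above (difference between consecutive frames, thresholding, then bounding boxes for potential objects) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Savvides's implementation: frames are assumed to be grayscale images stored as lists of rows, changed pixels are grouped by 4-connectivity, and the function names and the threshold value are hypothetical. The bounding box filtering step described in Savvides is not shown.

```python
from collections import deque


def changed_mask(previous, current, threshold):
    """Mark pixels whose absolute frame-to-frame difference exceeds the threshold."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(previous, current)
    ]


def bounding_boxes(mask):
    """Return one (x_min, y_min, x_max, y_max) box per 4-connected changed region."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over the changed region containing (x, y).
            seen[y][x] = True
            queue = deque([(x, y)])
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cx, cy = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height:
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Hypothetical 5x4 frames: two regions change between consecutive frames.
frame_a = [[0] * 5 for _ in range(4)]
frame_b = [row[:] for row in frame_a]
frame_b[1][1] = frame_b[1][2] = 200
frame_b[3][4] = 200
boxes = bounding_boxes(changed_mask(frame_a, frame_b, threshold=50))
```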
Regarding Claim 2: Dasgupta discloses capturing a series of images using an unmanned aerial vehicle, where the unmanned aerial vehicle is equipped with the camera (Refer to para [110]; “UAV system 1300 may also include one or more image capture devices 1334. Image capture devices 1334 may be the same as the image capture device 114/115 of UAV 100 described with respect to FIG. 1A. FIG. 13 shows an image capture device 1334 coupled to an image capture controller 1332 in I/O subsystem 1360. The image capture device 1334 may include one or more optical sensors. For example, image capture device 1334 may include a charge-coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) phototransistors. The optical sensors of image capture devices 1334 receive light from the environment, projected through one or more lens (the combination of an optical sensor and lens can be referred to as a “camera”) and converts the light to data representing an image. In conjunction with an imaging module located in memory 1316, the image capture device 1334 may capture images (including still images and/or video). In some embodiments, an image capture device 1334 may include a single fixed camera. In other embodiments, an image capture device 1340 may include a single adjustable camera (adjustable using a gimbal mechanism with one or more axes of motion). In some embodiments, an image capture device 1334 may include a camera with a wide-angle lens providing a wider FOV. In some embodiments, an image capture device 1334 may include an array of multiple cameras providing up to a full 360 degree view in all directions. In some embodiments, an image capture device 1334 may include two or more cameras (of any type as described herein) placed next to each other in order to provide stereoscopic vision. In some embodiments, an image capture device 1334 may include multiple cameras of any combination as described above. 
In some embodiments, the cameras of an image capture device 1334 may be arranged such that at least two cameras are provided with overlapping FOV at multiple angles around the UAV 100, thereby allowing for stereoscopic (i.e., 3D) image/video capture and depth recovery (e.g., through computer vision algorithms) at multiple angles around UAV 100. For example, UAV 100 may include four sets of two cameras each positioned so as to provide a stereoscopic view at multiple angles around the UAV 100. In some embodiments, a UAV 100 may include some cameras dedicated for image capture of a subject and other cameras dedicated for image capture for visual navigation (e.g., through visual inertial odometry).”).
Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Symington (US 20230195970 A1) in combination with Savvides (US 20190057588 A1).
Regarding Claim 8: Symington discloses a method for detecting a vehicle passing through a region of interest (Refer to para [032]; “Perception stack 412 can detect and classify objects and determine their current and predicted locations, speeds, directions, and the like..”) comprising: capturing, by a top view camera, a first set of images of a region of interest on the ground from a perspective above the region of interest (Refer to para [033]; “Mapping and localization stack 414 can determine the AV's position and orientation (pose) using different methods from multiple systems (e.g., GPS, IMUs, cameras, LIDAR, RADAR, ultrasonic sensors, the HD geospatial database 422, etc.). For example, in some embodiments, the AV 402 can compare sensor data captured in real-time by the sensor systems 404-408 to data in the HD geospatial database 422 to determine its precise (e.g., accurate to the order of a few centimeters or less) position and orientation.”) from the first set of images (Refer to para [038]; “In some embodiments, the raw AV data can include HD LIDAR point cloud data, image data, RADAR data, GPS data, and other sensor data that the data center 450 can use for creating or updating AV geospatial data as discussed further below with respect to FIG. 2 and elsewhere in the present disclosure.”) creating a plurality of bounding boxes for each vehicle moving in the region of interest (Refer to para [037]; “In some embodiments, the HD maps and related data can comprise multiple layers, such as an areas layer, a lanes and boundaries layer, an intersections layer, a traffic controls layer, and so forth. The areas layer can include geospatial information indicating geographic areas that are drivable (e.g., roads, parking areas, shoulders, etc.) 
or not drivable (e.g., medians, sidewalks, buildings, etc.), drivable areas that constitute links or connections (e.g., drivable areas that form the same road) versus intersections (e.g., drivable areas where two or more roads intersect), and so on. The lanes and boundaries layer can include geospatial information of road lanes (e.g., lane centerline, lane boundaries, type of lane boundaries, etc.) and related attributes (e.g., direction of travel, speed limit, lane type, etc.).”) and extracting kinematic data for each vehicle moving in the region of interest using the plurality of bounding boxes (Refer to para [026]; “Because localization and control system measurements are typically more accurate than kinematic predictions, the predicted kinematics of the remote vehicle can be validated using the one or more ground-truth kinematic characteristics of the remote vehicle (step 308). In some implementations, validating the predicted kinematic characteristics can include comparing the predicted kinematic characteristics to the ground-truth kinematic characteristics, e.g., to calculate an error associated with the one or more predicted kinematic characteristics of the remote vehicle, and updating the kinematics model based on the error associated with the one or more predicted kinematic characteristics. By way of example, the error (e.g., the difference between the predicted and ground-truth kinematic characteristics) may be back propagated to one or more layers and/or weights of a machine-learning model. 
In this manner, the kinematics model can be updated to continually learn and thereby improve kinematic estimation accuracy.”) capturing, by a side view camera, a second set of images of the region of interest from a perspective on side of the region of interest (Refer to para [046]; “The data management platform 452 can receive LIDAR point cloud data, image data (e.g., still image, video, etc.), RADAR data, GPS data, and other sensor data (e.g., raw data) from one or more AVs 402, UAVs, satellites, third-party mapping services, and other sources of geospatially referenced data. The raw data can be processed, and map management system platform 462 can render base representations (e.g., tiles (2D), bounding volumes (3D), etc.) of the AV geospatial data to enable users to view, query, label, edit, and otherwise interact with the data.”) projecting the plurality of bounding boxes to viewpoint of the side view camera (Refer to para [032]; “Perception stack 412 can enable the AV 402 to “see” (e.g., via cameras, LIDAR sensors, infrared sensors, etc.), “hear” (e.g., via microphones, ultrasonic sensors, RADAR, etc.), and “feel” (e.g., pressure sensors, force sensors, impact sensors, etc.) its environment using information from the sensor systems 404-408, the mapping and localization stack 414, the HD geospatial database 422, other components of the AV, and other data sources (e.g., the data center 450, the client computing device 470, third-party data sources, etc.).”) and training a machine learning algorithm to detect moving vehicles in images captured by the side view camera (Refer to para [042]; “The AI/ML platform 454 can provide the infrastructure for training and evaluating machine learning algorithms for operating the AV 402, the simulation platform 456, the remote assistance platform 458, the ridesharing platform 460, the map management system platform 462, and other platforms and systems. 
Using the AI/ML platform 454, data scientists can prepare data sets from the data management platform 452; select, design, and train machine learning models; evaluate, refine, and deploy the models; maintain, monitor, and retrain the models; and so on.”) where the machine learning algorithm is trained using the second set of images and a ground truth, and the kinematic data and the plurality of bounding boxes projected to the viewpoint of the side view camera serve as the ground truth (Refer to para [026]; “In some implementations, validating the predicted kinematic characteristics can include comparing the predicted kinematic characteristics to the ground-truth kinematic characteristics, e.g., to calculate an error associated with the one or more predicted kinematic characteristics of the remote vehicle, and updating the kinematics model based on the error associated with the one or more predicted kinematic characteristics. By way of example, the error (e.g., the difference between the predicted and ground-truth kinematic characteristics) may be back propagated to one or more layers and/or weights of a machine-learning model. In this manner, the kinematics model can be updated to continually learn and thereby improve kinematic estimation accuracy.”).
While Symington discloses “… maps and related data can comprise multiple layers, such as an areas layer, a lanes and boundaries layer, an intersections layer, a traffic controls layer, and so forth. The areas layer can include geospatial information indicating geographic areas that are drivable…”, Symington does not expressly disclose a “bounding box surrounds at least one moving vehicle.”
Savvides teaches “a video monitoring method or system includes modules capable of determining motion changes in a set of video frames to find potential objects and define one or more bounding boxes around the potential objects.” More specifically, Savvides teaches “an identified region of interest in a bounding box has its contained potential object classified or identified using machine learning. This can include use of convolutional or recurrent neural networks. Bounding boxes 214 are created to surround potential objects. Instead of immediately classifying objects in the bounding boxes 214, a bounding box filtering module 216 is used to eliminate bounding boxes unlikely to surround objects of interest. The remaining bounding boxes can then have contained objects classified and/or identified in a filtered detection step 218.” (at para [009 and 023], Savvides).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify Symington by adding an image processor for calculating “motion changes [that] can be determined using frame subtraction and/or morphological processing” as taught by Savvides.
The suggestion/motivation for combining the teachings of Symington and Savvides would have been in order to provide “…a change detection module takes in a raw frame and produces bounding boxes corresponding to recent changes in the scene. These changes correspond to both valid moving objects and false detections or noise. In one embodiment, an object of interest segmentation algorithm can use a background differentiation approach in order to estimate new objects that have entered the scene. Such an algorithm utilizes the difference between consecutive frames to identify moving objects in the scene. This difference image is then thresholded to determine bounding boxes for potential objects. Since the algorithm does not need to model the background directly, it responds quickly to changes. The bounding box filtering module 216 performs filtering based on the bounding box properties to remove false detections and keep valid detections.” (at para [024 and 025], Savvides).
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Symington and Savvides in order to obtain the specified claimed elements of Claim 8. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to the claim in question.
Regarding Claim 9: Symington discloses capturing the first set of images using an unmanned aerial vehicle, where the unmanned aerial vehicle is equipped with the top view camera (Refer to para [039]; “In some embodiments, use in either a manually operated or autonomous vehicle is possible. This can include, but is not limited to, use in conjunction with automobiles, commercial trucks, ships, airplanes, or aerial drones. Use with teleoperated or autonomous robots is also possible.”).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Symington (US 20230195970 A1) in combination with Savvides (US 20190057588 A1) and further in view of Fasola (US 20240017731 A1).
Regarding Claim 14: Symington in combination with Savvides discloses all of the claimed elements as set forth in the rejection above. Symington in combination with Savvides does not expressly disclose homography localization calculations.
Fasola teaches “… Environment 100 includes an autonomous vehicle (AV) 105 (in different positions, 105A, 105B, 105C, and 105D) that can collect sensor data, some of which is based on various targets in environment 100.”
Fasola teaches “The data management platform can receive LIDAR point cloud data, image data (e.g., still image, video, etc.), RADAR data, GPS data, and other sensor data (e.g., raw data) from one or more AVs 402, Unmanned Aerial Vehicles (UAVs), satellites, third-party mapping services, and other sources of geospatially referenced data. The raw data can be processed, and map management platform 462 can render base representations (e.g., tiles (2D), bounding volumes (3D), etc.) of the AV geospatial data to enable users to view, query, label, edit, and otherwise interact with the data…” for projecting the plurality of bounding boxes to the viewpoint of the side view camera using homography (Refer to para [025]; “In some examples, the camera signals checker 215E of signal cameras is initiated at the straight segment of the path of the calibration check process. The signal cameras may comprise a hardware filter specific for viewing traffic light bulbs. A dual traffic light target set may be detected a plurality of times during a straight drive portion. Homography may be computed between dual traffic light targets each of the times, such that a homography matrix is established to check relative pose error”).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to modify Symington and Savvides by adding an image processor capable of providing “the training techniques and safety score prediction model for an autonomous vehicle of the present technology solves at least these problems and provide other benefits as will be apparent from the figures and description provided herein…” as taught by Fasola.
The suggestion/motivation for combining the teachings of Symington, Savvides and Fasola would have been in order to “detect and classify objects and determine their current and predicted locations, speeds, directions, and the like.” (at para [035], Fasola).
Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Symington, Savvides and Fasola in order to obtain the specified claimed elements of Claim 14. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to the claim in question.
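For illustration only, the limitation "projecting the plurality of bounding boxes to the viewpoint of the side view camera using homography" can be sketched as follows. This is a minimal sketch and is not taken from Fasola or from the application: it assumes a 3x3 homography matrix `H` is already known (how it is estimated is not shown) and applies it to the four corners of an axis-aligned box in homogeneous coordinates. A homography relates two views only for points lying on a common plane, such as the ground. The function names and the example matrix are hypothetical.

```python
def project_point(H, point):
    """Map an (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to image coordinates.
    return (u / w, v / w)


def project_box(H, box):
    """Project the four corners of (x_min, y_min, x_max, y_max) into the other view."""
    x_min, y_min, x_max, y_max = box
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    return [project_point(H, corner) for corner in corners]


# Hypothetical homography: scale by 2, then shift by (10, 20).
H_example = [[2, 0, 10], [0, 2, 20], [0, 0, 1]]
projected = project_box(H_example, (1, 1, 3, 2))
```

The projected corners generally form a quadrilateral rather than an axis-aligned rectangle, so a side-view box would typically be taken as the extent of the four projected corners.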
Allowable Subject Matter
Claims 3-7 and 10-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The prior art either singly or in combination does not teach, disclose or suggest at least the following claim limitation(s): “…creating a plurality of bounding boxes further comprises overlaying a pre-defined image of a vehicle on a given detected vehicle; changing orientation of the pre-defined image in relation to the given detected vehicle; for each orientation, determining a correlation metric between the pre-defined image and the periphery of the given detected vehicle; and drawing a bounding box around the given detected vehicle based on the pre-defined image having the correlation metric with highest value.”
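For illustration only, the orientation search recited in the quoted limitation can be sketched in simplified form as follows. This is a minimal sketch under stated assumptions and is not the applicant's method: the "pre-defined image" is reduced to a rectangle of assumed length and width centred on the detected pixels, and the correlation metric is replaced by a plain count of detected pixels that fall inside the rotated rectangle. All names and values are hypothetical.

```python
import math


def inside_rotated_rect(center, length, width, angle_deg, point):
    """Test whether a point lies inside a rectangle rotated by angle_deg about its centre."""
    angle = math.radians(angle_deg)
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Express the point in the rectangle's own (rotated) coordinate frame.
    along = dx * math.cos(angle) + dy * math.sin(angle)
    across = -dx * math.sin(angle) + dy * math.cos(angle)
    return abs(along) <= length / 2.0 and abs(across) <= width / 2.0


def best_orientation(pixels, length, width, candidate_angles):
    """Return the candidate angle whose rectangle covers the most detected pixels."""
    center = (
        sum(x for x, _ in pixels) / len(pixels),
        sum(y for _, y in pixels) / len(pixels),
    )

    def overlap(angle_deg):
        return sum(
            inside_rotated_rect(center, length, width, angle_deg, p) for p in pixels
        )

    # Ties resolve to the first candidate angle with the highest score.
    return max(candidate_angles, key=overlap)


# Hypothetical detection: nine pixels in a horizontal line.
detected = [(x, 0) for x in range(9)]
angle = best_orientation(detected, length=9, width=1, candidate_angles=[0, 45, 90])
```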
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Emmons (US 20230057509 A1)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MIA M THOMAS whose telephone number is (571)270-1583. The examiner can normally be reached M-Th 8:30am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen (Steve) Koziol, can be reached at (408) 918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
MIA M. THOMAS
Primary Examiner
Art Unit 2665