DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 5, 8, and 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sundaresan et al. (US 20190114804 A1) in view of Gowda et al. (US 20170254906 A1).
Regarding claims 1, 8, and 9, Sundaresan et al. disclose an object detection system to which frames are input continuously (“The neural network detection system 104, the strong object tracker 106, and the lightweight object tracker 108 can be applied at different frame rates in order to provide a robust and real-time tracking system. As used herein, the term “real-time” refers to tracking objects in a sequence of images as the sequence of images is being captured and output”, [0058], A video sequence, [0059]), comprising: a memory storing instructions ([0042]); and a processor ([0043]) configured to execute the instructions to implement, an object detection method applied to a computer ([0047]) and a non-transitory computer-readable recording medium in which an object detection program is recorded, wherein the object detection program is to be installed in a computer ([0042]) to execute: a first detection unit that identifies labels of objects reflected in an input frame and locations of bounding boxes of the objects (In some cases, the features of a face are extracted from an image and compared with features stored in a database in an attempt to recognize the face. In some cases, the extracted features are fed to a classifier and the classifier will give the identity of the input features, [0004], a bounding region for an object can be used as a region of interest for performing processing of the object, people, vehicle, or other object counting and classification can be greatly simplified based on the results of object detection and tracking, [0049], For a classification network, the deep learning system can classify an object in an image or video frame using the determined high-level features. The output can be a single class or category, a probability of classes that best describes the object, or other suitable output. 
For example, the output can include probability values indicating probabilities (or confidence levels or confidence values) that the object includes one or more classes of objects (e.g., a probability the object is a person, a probability the object is a dog, a probability the object is a cat, or the like), [0053], Locations of the tracker bounding boxes associated with the tracked objects are adjusted from one frame to another frame by the lightweight object tracker 108 and by the strong object tracker 106 based on the estimated movement of the objects, [0061], Any of the above-described tracking techniques, or other tracking techniques, can be used to determine the updated location of a bounding box for lag compensation, [0075], trained using training data that includes both images and labels, [0114]) [identity of the face, class output, probability of a person vs dog, interpreted as labels of objects]; a history information generation unit that assigns the same ID to the bounding boxes that share the same object (In some examples, the lightweight object tracker 108 can generate and output trackers for each object that is being tracked. Each tracker output by the lightweight object tracker 108 can be represented by a tracker bounding region and can be assigned a tracker identifier (ID), [0050], objects 1 and 2, [0059], Once initialized, the lightweight object tracker 108 tracks objects across frames of the video sequence by estimating locations of the objects from one frame to another. The strong object tracker 106 tracks objects in a frame of the video sequence when the object detection results are available from the neural network detection system 104. 
Locations of the tracker bounding boxes associated with the tracked objects are adjusted from one frame to another frame by the lightweight object tracker 108 and by the strong object tracker 106 based on the estimated movement of the objects, [0061], the lightweight object tracker 108 can track object 1 and object 2 in frame 8., when frame 8 is obtained for processing by the object detection and tracking system 102, the lightweight object tracker 108 predicts where object 1 and object 2 are in frame 8 based on the bounding boxes of objects 1 and 2 from frame 7, [0082]) and generates history information that is information indicating a history of combination of a frame number and a location of a bounding box for each ID (In some examples, a bounding box of a tracker for an object in a current frame can be based on the bounding box of the tracker for the object in a previous frame. For instance, when the tracker is updated in the previous frame, updated information for the tracker can include the tracking information for the previous frame and also a prediction of a location of the tracker in the next frame (which is the current frame in this example). The prediction of the location of the tracker in the current frame can be determined based on the location of the tracker in the previous frame and various techniques. A history can be maintained for a tracker. For example, the history maintained for a tracker can include positions of the bounding boxes that the tracker has been tracking, their reliability scores (e.g., correlation or other reliability metric), confidence values of the detection associated with each bounding box, feature points for each box, and/or any other suitable information, [0051], “In one illustrative example, the neural network detection thread 203 may take 15 frames (approximately 0.5 seconds in a 30 fps system) to complete object detection for a frame number 1 of a video sequence. 
In such an example, the detection results will not be available for frame number 1 until frame number 15 or 16 of the video sequence,” [0069], the first image may include a frame 10 of a video sequence, and the second image may include a frame 20 of the video sequence, [0140]) [frame number 15, 20, interpreted as frame number]; a prediction unit that predicts regions of the bounding boxes in the latest frame, based on the history information, according to a delay that is a time required for the first detection unit to identify the labels and the locations of the bounding boxes in the input frame (As depicted in FIG. 3B, if the bounding box 302 result were used for tracking the subject's face in frame 91 (without performing lag compensation), a bounding box 304 is produced for tracking the face. As shown, the bounding box 302 is offset from the subject's actual face due to movement of the face between frame 88 and frame 91, which leads to a poor tracking result. By performing lag compensation to generate the bounding box 306, the bounding box 306 accurately tracks the subject's face. For example, lag compensation is performed to predict that the subject's face has moved in an upward and slightly leftward direction. 
Any of the above-described tracking techniques, or other tracking techniques, can be used to determine the updated location of a bounding box for lag compensation, [0075], classification can include a class identifying the type of object (e.g., a person, a dog, a cat, or other object) and the localization can include a bounding box indicating the location of the object, [0110], predicted output is the same as the training label, [0118]); and a second detection unit that identifies labels of reflected objects and locations of bounding boxes, in predicted regions of the bounding boxes in the latest frame (“In some examples, a bounding box of a tracker for an object in a current frame can be based on the bounding box of the tracker for the object in a previous frame. For instance, when the tracker is updated in the previous frame, updated information for the tracker can include the tracking information for the previous frame and also a prediction of a location of the tracker in the next frame (which is the current frame in this example). The prediction of the location of the tracker in the current frame can be determined based on the location of the tracker in the previous frame and various techniques”, [0051], “Continuing with the above example, once the detection results of object 1 and object 2 are available at frame 7, the detection results are passed to the strong object tracker 106. The strong object tracker 106 shifts the two bounding boxes for objects 1 and 2 to the estimated positions of the objects 1 and 2 in frame 7. For example, frame 1 and frame 7 are compared using a tracking technique (e.g., template matching, KCF, camshift, Kalman filter, or any other type of tracking technique) to determine a predicted position in frame 7 for the bounding boxes associated with object 1 and object 2. The shifted results are then used to initialize the lightweight object tracker 108. 
Once initialized, the lightweight object tracker 108 has two objects to track from frame 7 onward, until new object detection results are available. In some cases, when the lightweight object tracker 108 is initialized, the thumbnails of the shifted bounding boxes in frame 7 are captured and saved (e.g., stored in memory). A thumbnail captured for an object in a frame for which object detection is performed is referred to herein as a “detection thumbnail.” As described in more detail below, the detection thumbnails can be used in the tracking reliability evaluation performed at block 408. For example, a thumbnail captured for an object in subsequent frame in which the object is tracked (using the lightweight object tracker 108) can be compared to a detection thumbnail captured for the object in a frame for which object detection was performed”, [0080], the strong object tracker 106 can predict where a bounding box for object 3 detected in the previous frame 8 should be located in the current frame 15, [0089]).
Sundaresan et al. do not disclose that processing time for the second detection unit to identify the labels and the locations of the bounding boxes for one frame is shorter than processing time for the first detection unit to identify the labels and the locations of the bounding boxes for the one frame.
Gowda et al. teach that processing time for the second detection unit to identify the labels and the locations of the bounding boxes for one frame is shorter than processing time for the first detection unit to identify the labels and the locations of the bounding boxes for the one frame (“Known rotation can be used to “predict” the location of the feature in the second image from the first, whereas the rotation corresponds to a visual re-projection of the original image. Any feature detected in the first image may be searched for in the second image, using the re-projected coordinate as an origin of the search. Instead of searching arbitrarily throughout the second image, we may first look at the predicted location, expanding the search boundary from there (e.g., in concentric circles or progressively-larger bounding boxes). In this way, the search should complete substantially faster than arbitrary matching, where the feature might appear anywhere in the second image. It will also be faster than searching from the same 2D location in the second location as it appears in the first, and rotation might substantially move the correct location in the frame”, [0036]).
Sundaresan et al. and Gowda et al. are in the same art of object detection and tracking (Sundaresan et al., abstract; Gowda et al., [0035], [0036]). The combination of Gowda et al. with Sundaresan et al. shows how the smaller search region can lead to a faster detection time. It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to combine the identification process of Gowda et al. with the invention of Sundaresan et al. because the technique was known at the time of filing, the combination would have predictable results, and Gowda et al. indicate “The camera pose/inter-frame rotation is used as input to fundamental computer vision heuristics. By reducing the computer vision search space, the computational complexity is reduced substantially” ([0035]) and “In this way, the search should complete substantially faster than arbitrary matching, where the feature might appear anywhere in the second image. It will also be faster than searching from the same 2D location in the second location as it appears in the first, and rotation might substantially move the correct location in the frame” ([0036]), thereby demonstrating an improvement to the computational efficiency of the invention of Sundaresan et al. that would improve applicability in fields such as drone deployment where hardware weight is an issue (Gowda et al., [0003], [0004]).
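Examiner's note: purely as an illustration of the efficiency rationale above (this sketch is the examiner's own, is not taken from either cited reference, and all function names are hypothetical), a search that expands outward from a predicted location in progressively larger rings can terminate after far fewer position comparisons than an arbitrary full-image search:

```python
import numpy as np

def match_cost(image, template, y, x):
    """Sum of absolute differences between the template and the image
    patch whose top-left corner is (y, x); inf if out of bounds."""
    h, w = template.shape
    patch = image[y:y + h, x:x + w]
    if patch.shape != template.shape:
        return float("inf")
    return float(np.abs(patch - template).sum())

def search_from_prediction(image, template, pred_yx, max_radius=20):
    """Search outward from a predicted location in progressively larger
    rings (bounding boxes), counting how many positions are compared."""
    py, px = pred_yx
    comparisons = 0
    best_loc, best_cost = None, float("inf")
    for r in range(max_radius + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if max(abs(dy), abs(dx)) != r:   # visit only the ring at radius r
                    continue
                y, x = py + dy, px + dx
                if y < 0 or x < 0:
                    continue
                comparisons += 1
                cost = match_cost(image, template, y, x)
                if cost < best_cost:
                    best_loc, best_cost = (y, x), cost
        if best_cost == 0.0:                     # exact match found; stop early
            break
    return best_loc, comparisons
```

When the predicted location is close to the true location, the search terminates after examining only a small neighborhood rather than every candidate position in the frame, consistent with the reduced search space discussed in Gowda et al.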
Regarding claim 2, Sundaresan et al. and Gowda et al. disclose the object detection system according to claim 1. Sundaresan et al. further indicate the processor is further configured to execute the instructions to implement: a delay measurement unit that measures magnitude of the delay (“The neural network detection thread 203 may take a different amount of time (corresponding to a certain number of frames) for different images based on how many objects are in the image and how complex the scene depicted in the image is. In one illustrative example, the neural network detection thread 203 may take 15 frames (approximately 0.5 seconds in a 30 fps system) to complete object detection for a frame number 1 of a video sequence. In such an example, the detection results will not be available for frame number 1 until frame number 15 or 16 of the video sequence. Based on the 15 frame delay, the neural network detection thread 203 will not be available for frames 2-15 of the video sequence, and cannot be used again until frame 16. If the neural network detection thread 203 is not free for a current frame, at block 208, the camera thread 201 sends the current frame for tracking to the lightweight tracker thread 205. The camera thread stops at block 210 after sending a current frame to the lightweight tracker thread 205”, [0069], At block 230, the strong tracker thread 207 begins the tracker update to update the trackers that will be used by the lightweight tracker thread 205. At block 232, the strong object tracker 106 is applied to a current frame (e.g., frame 15 from the example above). The strong object tracker 106 can perform lag compensation to compensate for the movement of an object from a first frame (for which the neural network based object detection is applied) to a second frame (at which the results of the neural network detection are available). 
During the period of detection delay, the one or more objects detected for a given frame may move positions by the time the object detection results are available., [0072], As noted above, there is a delay from when the neural network detection system 104 starts performing object detection for a current frame to when the results of the object detection for the current frame are available. In one illustrative example that will be used to describe the process 400A, a first frame of a video sequence includes three objects, denoted herein as object 1, object 2, and object 3. When run on the first frame, the neural network detection system 104 detects two of the three objects. For example, objects 1 and 2 two are detected, but not object 3. As previously described, various conditions can prevent the neural network detection system 104 from detecting object 3, such as occlusion, lighting conditions, movement of the object, the size of the object, a combination thereof, and/or other suitable conditions. Based on the complexity of the neural network detection system 104, the object detection results for frame 1 are not available until frame 7. One of ordinary skill will appreciate that the neural network detection system 104 may take a different amount of time to perform detection for different images based on how many objects are in the image and the complexity of the scenes in the images (e.g., lighting, object movement, shadows, occlusions, among other conditions), [0079], For example, due to the lag in performing object detection by the trained network, the results of object tracking on the first image may not be available until the second image is obtained. In one illustrative example, the first image may include a frame 10 of a video sequence, and the second image may include a frame 20 of the video sequence, [0140]).
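Examiner's note: for illustration only (the following sketch is the examiner's own and not code from Sundaresan et al.; the function names are hypothetical), the delay magnitudes described above reduce to simple arithmetic, e.g., 0.5 s of detection time in a 30 fps system corresponds to a 15-frame delay, and a detection started at frame 10 whose results arrive at frame 20 has a measured delay of 10 frames:

```python
import math

def detection_delay_frames(processing_time_s, fps):
    """Frames that elapse while object detection runs on one input frame;
    results for frame k first become available around frame k + delay."""
    return math.ceil(processing_time_s * fps)

def measure_delay(input_frame_no, latest_frame_no):
    """Delay magnitude measured directly as a frame-number difference."""
    return latest_frame_no - input_frame_no
```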
Regarding claim 5, Sundaresan et al. and Gowda et al. disclose the object detection system according to claim 1. Sundaresan et al. further indicate the prediction unit predicts the regions of the bounding boxes in the latest frame by Kalman filter based on the history information (“A Kalman filter based object tracker uses signal processing to predict the location of a moving object based on prior motion information. For example, the location of a tracker in a current frame can be predicted based on information from a previous frame. In some cases, the Kalman filter can measure a tracker's trajectory as well as predict its future location(s). For example, the Kalman filter framework can include two steps. The first step is to predict a tracker's state, and the second step is to use measurements to correct or update the state. In this case, the tracker from the last frame can predict its location in the current frame. When the current frame is received, the tracker can use the measurement of the object in the current frame to correct its location in the current frame, and then can predict its location in the next frame. The Kalman filter can rely on the measurement of the associated object(s) to correct the motion model for the object tracker and to predict the location of the tracker in the next frame”, [0066], “The strong object tracker 106 can take the bounding box 302 for the face detected in the previous frame 88 and can predict where that bounding box should be in the current frame 91. The strong object tracker 106 can use any suitable object tracking technique to determine an amount by which to shift the bounding box 302 based on the previous frame 88 for which detection was performed and the current frame 91 at which detection results are available. 
For example, the prediction can be performed using optical flow, KCF, camshift, template matching, Kalman filter, or any other type of tracking technique,” [0074], “Continuing with the above example, once the detection results of object 1 and object 2 are available at frame 7, the detection results are passed to the strong object tracker 106. The strong object tracker 106 shifts the two bounding boxes for objects 1 and 2 to the estimated positions of the objects 1 and 2 in frame 7. For example, frame 1 and frame 7 are compared using a tracking technique (e.g., template matching, KCF, camshift, Kalman filter, or any other type of tracking technique) to determine a predicted position in frame 7 for the bounding boxes associated with object 1 and object 2. The shifted results are then used to initialize the lightweight object tracker 108”, [0080]).
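Examiner's note: the Kalman-filter prediction described above can be illustrated with a minimal constant-velocity sketch (the examiner's own illustration, not code from Sundaresan et al.); each bounding-box coordinate is filtered independently with a [position, velocity] state, and calling predict() once per frame of delay yields a lag-compensated location:

```python
import numpy as np

class CVKalman1D:
    """Constant-velocity Kalman filter for one coordinate of a
    bounding-box center: state = [position, velocity]."""
    def __init__(self, pos, q=1e-2, r=1.0):
        self.x = np.array([pos, 0.0])          # state estimate
        self.P = np.eye(2)                     # state covariance
        self.F = np.array([[1.0, 1.0],         # motion model (dt = 1 frame)
                           [0.0, 1.0]])
        self.Q = q * np.eye(2)                 # process noise
        self.H = np.array([[1.0, 0.0]])        # only position is observed
        self.R = np.array([[r]])               # measurement noise

    def predict(self):
        """Predict the state one frame ahead; returns predicted position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]

    def update(self, z):
        """Correct the state with a measured position z."""
        y = z - self.H @ self.x                       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]
```

Feeding the filter a tracker's position history (one predict/update cycle per frame) and then predicting forward by the measured delay produces the shifted bounding-box location used for lag compensation.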
Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sundaresan et al. (US 20190114804 A1) and Gowda et al. (US 20170254906 A1) as applied to claim 2 above, further in view of Hong et al. (US 20230333557 A1).
Regarding claim 3, Sundaresan et al. and Gowda et al. disclose the object detection system according to claim 2. Sundaresan et al. further indicate the delay measurement unit measures a difference between the frame number of the latest frame and the frame number of the input frame in which the first detection unit identifies the labels of the objects and the locations of the bounding boxes, as the magnitude of the delay, and the prediction unit predicts the regions of the bounding boxes in the latest frame according to the difference ( “In one illustrative example, the neural network detection thread 203 may take 15 frames (approximately 0.5 seconds in a 30 fps system) to complete object detection for a frame number 1 of a video sequence. In such an example, the detection results will not be available for frame number 1 until frame number 15 or 16 of the video sequence,” [0069], classification can include a class identifying the type of object e.g., a person, a dog, a cat, and the localization can include a bounding box indicating the location of the object, [0110], the first image may include a frame 10 of a video sequence, and the second image may include a frame 20 of the video sequence, [0140]).
To the extent Sundaresan et al. do not explicitly use the word “difference,” a secondary reference, Hong et al., is cited below.
Hong et al. teach the delay measurement unit measures a difference between the frame number of the latest frame and the frame number of the input frame in which the first detection unit identifies the labels of the objects and the locations of the bounding boxes, as the magnitude of the delay, and the prediction unit predicts the regions of the bounding boxes in the latest frame according to the difference (The type of object may indicate whether the object is one of candidates including a car, a bicycle, a person, a dog, or a cat, [0059], [0080], Meanwhile, a pairing procedure may be performed in consideration of transmission delay or signaling latency of the pairing information. For example, object information and/or object map (feature map) may be modified/updated in consideration of communication delay and/or signal processing delay between the first autonomous driving device and the second autonomous driving device. In this case, the object information and/or the object map may be modified/updated based on at least one of the delay time and the position, shape (size), speed and/or acceleration of the object detected in the second object information. For example, there is a case where a dt (ex. 1 ms to 50 ms) time delay occurs for transmission and/or processing of pairing information for a specific point in time t, and information on a specific object is included in the second object information. In this case, the position/shape of the specific object at the time t+dt may be calculated based on the position, velocity and dt of the specific object, and based on the position/shape of the specific object at the time t+dt, the modified/updated object information for the first autonomous driving device may be derived. The dt may correspond to a latency to be described later, [0135], Referring to FIG. 16 and FIG. 17, for example, the communication latency is L (ex. 
5 ms to 50 ms), and the autonomous driving device may obtain a surrounding image in frame units of frame/sec, such as M (ex. 30, 60, 120, 240). For example, when a surrounding image is acquired/processed at 30 frames/sec, a time difference per frame (tpf) may be 33.3 ms. In this case, based on the variable L, if L is less than or equal to th1, the delay processing time may be calculated as a difference of 1 frame, and if L is greater than th1 and less than or equal to th2, the delay processing time may be calculated as a difference of 2 frames, [0140], “In this case, the location of the specific object after a certain point in time can be derived/predicted based on the location, speed (and acceleration) of the specific object and the delay processing time. The delay processing time may be described as ntpf, for example. The delay processing time may simply be referred to as time delay dt”, [0142]).
Sundaresan et al., Gowda et al., and Hong et al. are in the same art of object detection and tracking (Sundaresan et al., abstract; Gowda et al., [0035], [0036]; Hong et al., abstract). The combination of Hong et al. with Sundaresan et al. and Gowda et al. incorporates a frame difference. It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to combine the frame difference of Hong et al. with the invention of Sundaresan et al. and Gowda et al. because the technique was known at the time of filing, the combination would have predictable results, and Hong et al. indicate “Based on this, it is possible to increase the accuracy of object identification in the blind area” (abstract) and “Another technical object of the present disclosure is to provide a method and apparatus for further increasing object identification accuracy while maintaining or reducing complexity” ([0010]), thereby providing an accuracy improvement to the combination of inventions.
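Examiner's note: for illustration only (this sketch is the examiner's own; it assumes the thresholds th1, th2 of Hong et al. are consecutive multiples of the time per frame tpf, which Hong et al. do not expressly state, and the function names are hypothetical), the latency-to-frame-difference mapping of [0140] and the velocity-based compensation of [0142] reduce to:

```python
import math

def delay_in_frames(latency_ms, fps):
    """Map a communication/processing latency to a whole number of frames
    (the 'delay processing time' n*tpf), assuming th1, th2, ... fall at
    consecutive multiples of the time per frame."""
    tpf_ms = 1000.0 / fps                     # time per frame in ms
    return max(1, math.ceil(latency_ms / tpf_ms))

def compensate(pos, vel_per_ms, latency_ms, fps):
    """Predict an object's position after the delay from its velocity."""
    n = delay_in_frames(latency_ms, fps)
    dt_ms = n * (1000.0 / fps)
    return pos + vel_per_ms * dt_ms, n
```

For example, at 30 frames/sec (tpf of 33.3 ms), a 20 ms latency yields a 1-frame difference, while a 50 ms latency yields a 2-frame difference, consistent with Hong et al. at [0140].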
Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sundaresan et al. (US 20190114804 A1) and Gowda et al. (US 20170254906 A1) as applied to claim 1 above, further in view of Banerjee et al. (US 20220067417 A1).
Regarding claim 4, Sundaresan et al. and Gowda et al. disclose the object detection system according to claim 1. Sundaresan et al. partly indicate the prediction unit predicts the regions of the bounding boxes in the latest frame by linear prediction based on the history information (“A Kalman filter based object tracker uses signal processing to predict the location of a moving object based on prior motion information. For example, the location of a tracker in a current frame can be predicted based on information from a previous frame. In some cases, the Kalman filter can measure a tracker's trajectory as well as predict its future location(s). For example, the Kalman filter framework can include two steps. The first step is to predict a tracker's state, and the second step is to use measurements to correct or update the state. In this case, the tracker from the last frame can predict its location in the current frame. When the current frame is received, the tracker can use the measurement of the object in the current frame to correct its location in the current frame, and then can predict its location in the next frame. The Kalman filter can rely on the measurement of the associated object(s) to correct the motion model for the object tracker and to predict the location of the tracker in the next frame”, [0066], “The strong object tracker 106 can take the bounding box 302 for the face detected in the previous frame 88 and can predict where that bounding box should be in the current frame 91. The strong object tracker 106 can use any suitable object tracking technique to determine an amount by which to shift the bounding box 302 based on the previous frame 88 for which detection was performed and the current frame 91 at which detection results are available. 
For example, the prediction can be performed using optical flow, KCF, camshift, template matching, Kalman filter, or any other type of tracking technique”, [0074]); however, Sundaresan et al. do not use the word “linear,” and therefore another reference is cited below.
Banerjee et al. teach the prediction unit predicts the regions of the bounding boxes in the latest frame by linear prediction based on the history information (“The event tracker uses a linear motion model to predict the bounding box location in the event frame e.sub.t.sup.l based on the state at time t−1+(N−1)/N. The observations are the bounding boxes detected by the event object detector at time t with the association of the observed and predicted bounding boxes done as described herein. The Kalman filter predicts the location of the bounding boxes, custom-character.sub.t+1.sup.e at time t+1/N. At time t+1/N, the observations (bounding boxes) are available from the event object detector to update the state of the event tracker. This operation of predict and update is repeated for N−1 times in between time t and t+1 before finally predicting the bounding boxes custom-character.sub.t+1.sup.e at time t+1. FIG. 7 shows predict/update operations performed by an object tracker to predict bounding boxes in accordance with an illustrative embodiment”, [0084], “The RoIs are fed into a Kalman Filter-based object tracker as an observation, which updates the state of the filter. The Kalman Filter then predicts the locations of the next RoIs for the next frame at time t+1, based on a linear motion model, denoted as bb.sub.t+1. Here, {custom-characterb.sub.t, bb.sub.t+1}∈.sup.4×P (P is the number of bounding boxes detected). These predicted RoIs for frame at t+1 are sent back to the chip. A copy of the distorted reconstructed frame {circumflex over (f)}.sub.t is kept in the host for creating the reconstructed frame f.sub.t+1 at time t+1”, [0133], “The object tracker uses a linear motion model to predict the bounding box locations in the next frame f.sub.t+1. It then associates the identities using linear assignment between the new detections from Faster R-CNN and the most recently predicted bounding boxes”, [0135]).
Sundaresan et al., Gowda et al., and Banerjee et al. are in the same art of object detection and tracking (Sundaresan et al., abstract; Gowda et al., [0035], [0036]; Banerjee et al., abstract). The combination of Banerjee et al. with Sundaresan et al. and Gowda et al. incorporates linear prediction. It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to combine the linear prediction of Banerjee et al. with the invention of Sundaresan et al. and Gowda et al. because the technique was known at the time of filing, the combination would have predictable results, and Banerjee et al. indicate “Compared to traditional cameras, event sensing provides several benefits such as low latency operation of individual pixels, high dynamic range, reduced redundant capturing of static scenes, and low power consumption” ([0003]) and “Regardless, the Kalman Filter predicts RoIs for the next frame based on these imperfect detections. The chip then acquires the next frame based on these imperfections and sends them to the host. Portions of the object inside the RoI will be less distorted and portions outside the RoI will be highly distorted as per the weight ratio” ([0146]), demonstrating that the combination of inventions should maintain high performance even with distorted input data while having low power requirements, indicating a commercial benefit.
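Examiner's note: purely as an illustration of linear (constant-velocity) prediction from history information (the examiner's own sketch, not code from Banerjee et al.; the function name is hypothetical), a bounding box can be extrapolated from the last two (frame number, bounding box) entries of a tracker's history:

```python
def predict_bbox_linear(history, steps=1):
    """Extrapolate a bounding box with a linear motion model from the
    last two entries of its history.

    history: list of (frame_number, (x, y, w, h)) pairs for one tracker ID.
    steps: number of frames ahead to predict.
    """
    (f0, b0), (f1, b1) = history[-2], history[-1]
    dt = f1 - f0
    # per-frame velocity of each bounding-box component
    vel = tuple((c1 - c0) / dt for c0, c1 in zip(b0, b1))
    # extrapolate each component forward by the requested number of frames
    return tuple(c + v * steps for c, v in zip(b1, vel))
```

For instance, a box moving 2 px/frame horizontally and 1 px/frame vertically between frames 5 and 7 is placed 6 px right and 3 px down when predicted 3 frames past frame 7.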
Claim(s) 6-7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Sundaresan et al. (US 20190114804 A1) and Gowda et al. (US 20170254906 A1) as applied to claim 1 above, further in view of Miller et al. (US 20210046927 A1).
Regarding claim 6, Sundaresan et al. and Gowda et al. disclose the object detection system according to claim 1. Sundaresan et al. and Gowda et al. do not disclose that the processor is further configured to execute the instructions to implement: an ordering unit that performs ordering on the regions of the bounding boxes predicted by the prediction unit, and wherein the second detection unit selects a region of a bounding box in order determined by the ordering unit, and identifies a label of an object reflected and a location of the object in the region.
Miller et al. teach an ordering unit that performs ordering on the regions of the bounding boxes predicted by the prediction unit, and wherein the second detection unit selects a region of a bounding box in order determined by the ordering unit (“Computation time and resources for mitigating collisions can be costly (i.e., can consume significant processor time and/or memory) for vehicle computers. Prioritizing target vehicles that are mostly likely to collide with a host vehicle can reduce overall computational costs for the host vehicle computer. Using dimensions of the target vehicles, such as a length and a width of the target vehicles, allows for the host vehicle to specify distance thresholds around the target vehicles to determine whether to perform threat assessments. The distance thresholds form a bounding box around the target vehicle, and the host vehicle computer can select the target vehicle for threat assessment when a portion of the host vehicle is within the bounding box. Using a relative heading angle between the host vehicle and the target vehicles, the host vehicle computer can determine respective lateral and longitudinal distances between a coordinate system of the host vehicle and a coordinate system of the target vehicle. The host vehicle computer can use these lateral and longitudinal distances to determine whether to select the target vehicle for threat assessment. By predicting the distances based on predicted heading angles, the host vehicle computer can determine whether, for a specified future time, the computer should select the target vehicle for threat assessment”, [0029]), and identifies a label of an object reflected and a location of the object in the region (“Next, in a block 415, the computer 105 identifies one or more points P on the host vehicle 101. 
As described above, when the distance from the point P of the host vehicle 101 is within a distance threshold of the target vehicle 200, the host vehicle 101 may collide with the target vehicle 200 and the computer 105 can perform a threat assessment for the target vehicle 200. The point P can be, e.g., a corner point, a center point of a front bumper, etc. The computer 105 can identify a plurality of points P, as described above and shown in FIG. 2”, [0057]).
Sundaresan et al. and Gowda et al. and Miller et al. are in the same art of object detection and tracking (Sundaresan et al., abstract; Gowda et al., [0035], [0036]; Miller et al., [0038]). The combination of Miller et al. with Sundaresan et al. and Gowda et al. incorporates ordering. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the ordering of Miller et al. with the invention of Sundaresan et al. and Gowda et al. as this was known at the time of filing, the combination would have predictable results, and as Miller et al. indicate “Computation time and resources for mitigating collisions can be costly (i.e., can consume significant processor time and/or memory) for vehicle computers. Prioritizing target vehicles that are mostly likely to collide with a host vehicle can reduce overall computational costs for the host vehicle computer” ([0029]) and “The computer 105 can perform a collision avoidance action based on the threat number. In this context, a “collision avoidance action” is an action including actuation of one or more components 120 to avoid and/or mitigate a collision with the target vehicle 200. The collision avoidance action can include, e.g., providing a forward collision warning on a display screen in an interior of the vehicle 101, actuating a brake 120 to stop the host vehicle 101, actuating a propulsion 120 to avoid the target vehicle 200, actuating a steering component 120 to avoid the target vehicle 200, etc., as described above” ([0053]) suggesting a computational efficiency and user safety benefit to the combination of inventions.
Regarding claim 7, Sundaresan et al. and Gowda et al. and Miller et al. disclose the object detection system according to claim 1. Miller et al. further teach the ordering unit determines order of the regions of the bounding boxes that meet a condition that distance between the predicted regions of two bounding boxes is equal to or less than a predetermined threshold and direction of movement of the two bounding boxes is facing each other, is earlier than order of the regions of the bounding boxes that do not meet the condition (“Computation time and resources for mitigating collisions can be costly (i.e., can consume significant processor time and/or memory) for vehicle computers. Prioritizing target vehicles that are mostly likely to collide with a host vehicle can reduce overall computational costs for the host vehicle computer. Using dimensions of the target vehicles, such as a length and a width of the target vehicles, allows for the host vehicle to specify distance thresholds around the target vehicles to determine whether to perform threat assessments. The distance thresholds form a bounding box around the target vehicle, and the host vehicle computer can select the target vehicle for threat assessment when a portion of the host vehicle is within the bounding box. Using a relative heading angle between the host vehicle and the target vehicles, the host vehicle computer can determine respective lateral and longitudinal distances between a coordinate system of the host vehicle and a coordinate system of the target vehicle. The host vehicle computer can use these lateral and longitudinal distances to determine whether to select the target vehicle for threat assessment. By predicting the distances based on predicted heading angles, the host vehicle computer can determine whether, for a specified future time, the computer should select the target vehicle for threat assessment”, [0029]).
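The claim 7 condition (pairs of predicted boxes that are within a distance threshold and moving toward each other are ordered before pairs that are not) can be sketched as follows. This is an illustrative sketch only; the function names, the center-distance test, and the use of a relative-velocity dot product as the "facing each other" test are assumptions, not language from any of the cited references:

```python
import math

def approaching(c1, v1, c2, v2):
    # Two boxes are moving toward each other when the relative velocity
    # points opposite to the separation vector (negative dot product),
    # i.e., the distance between their centers is closing.
    dx, dy = c2[0] - c1[0], c2[1] - c1[1]
    rvx, rvy = v2[0] - v1[0], v2[1] - v1[1]
    return dx * rvx + dy * rvy < 0

def order_boxes(boxes, threshold):
    """boxes: list of (center, velocity) pairs for predicted regions.

    Returns box indices with every box that participates in a pair
    meeting the proximity-and-approach condition ordered first.
    """
    priority = set()
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            (c1, v1), (c2, v2) = boxes[i], boxes[j]
            dist = math.hypot(c2[0] - c1[0], c2[1] - c1[1])
            if dist <= threshold and approaching(c1, v1, c2, v2):
                priority.update((i, j))
    return ([i for i in range(len(boxes)) if i in priority]
            + [i for i in range(len(boxes)) if i not in priority])

# Example: boxes 1 and 2 are 10 px apart and closing; box 0 is distant.
order = order_boxes(
    [((100, 100), (0, 0)), ((0, 0), (1, 0)), ((10, 0), (-1, 0))],
    threshold=20,
)
```

In the example, boxes 1 and 2 satisfy both prongs of the condition and are placed ahead of box 0 in the processing order, mirroring the prioritization rationale quoted from Miller et al. at [0029].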
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent M Rudolph can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHELLE M ENTEZARI HAUSMANN/Primary Examiner, Art Unit 2671