DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is sent in response to Applicant's Communication received on July 09, 2025.
Response to Arguments
Applicant’s arguments/remarks filed July 09, 2025, with respect to the rejections of claims 9-20 under 35 U.S.C. 112(b) have been fully considered and are persuasive. Accordingly, the rejections of claims 9-20 under 35 U.S.C. 112(b) have been withdrawn.
Applicant’s arguments/remarks filed July 09, 2025, with respect to the rejections of claims 2, 4-7, 11-12 and 16-19 under 35 U.S.C. 102(a)(1) as being anticipated by Zhang (US 2022/0237414 A1) have been fully considered and are persuasive. Accordingly, the rejections of claims 2, 4-7, 11-12 and 16-19 under 35 U.S.C. 102(a)(1) over Zhang have been withdrawn.
Further, claims 2, 4-7, 11-12 and 16-19 are indicated below as containing allowable subject matter.
Applicant’s arguments/remarks filed July 09, 2025, with respect to the rejections of claims 1, 3, 8-10, 13-15 and 20 under 35 U.S.C. 102(a)(1) as being anticipated by Zhang (US 2022/0237414 A1) have been fully considered, but they are not persuasive, as explained below.
Applicant respectfully asserts that the cited prior art fails to meet the limitations “…in response to the confidence value exceeding a predetermined threshold, estimate a location of a second bounding box in the second image frame based on the location of the first bounding box and non-zero movement of the vehicle; update the object detection model based on the estimated location of the second bounding box in the second image frame; predict a location of a third bounding box in the third image frame using the updated object detection model…” as recited in at least independent claims 1, 9 and 13.
The Examiner respectfully submits the following:
Zhang discloses applying a confidence value to the predicted location of the first bounding box ([0058, 0206]: “In at least one embodiment, a system assigns confidence values to an output of an image segmentation task in which confidence is computed as part of inferencing. In at least one embodiment, during inferencing, numerical values for each of multiple possible classes are computed. In at least one embodiment, numerical values are then used to indicate confidence in classifications. In at least one embodiment, systems such as object detection and collision in an autonomous vehicle can then use an image and confidence values together to determine more confidently objects in an image” and “a neural network that outputs a measure of confidence for each object detection. In at least one embodiment, confidence may be represented or interpreted as a probability, or as providing a relative “weight” of each detection compared to other detections. In at least one embodiment, a confidence measure enables a system to make further decisions regarding which detections should be considered as true positive detections rather than false positive detections. In at least one embodiment, a system may set a threshold value for confidence and consider only detections exceeding threshold value as true positive detections”) and, in response to the confidence value exceeding a predetermined threshold ([0099]: “In at least one embodiment, one or more systems define a threshold that indicates whether a confidence value is acceptable, in which values lower than said threshold are unacceptable, and values higher than or equal to said threshold are acceptable. In at least one embodiment, for example, a threshold is defined as a value of 0.5, in which a confidence value of 0.6 is determined to be acceptable, and a confidence value of 0.4 is determined to be unacceptable”), estimating a location of a second bounding box in the second image frame based on the location of the first bounding box and non-zero movement of the vehicle (autonomous vehicle 1100: Fig. 11A and [0057, 0161, 0204]) {[0087, 0167, 0213, 0237]: “In at least one embodiment, stereo disparity algorithms refer to algorithms that determine apparent pixel differences or motions between stereo images, in which confidence information is usable to determine a reliability or quality of determined apparent pixel differences or motions” and “For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weights of information provided by adjacent frames. In at least one embodiment, where an image or portion of an image does not include motion, temporal noise reduction performed by video image compositor may use information from a previous image to reduce noise in a current image” and “In at least one embodiment, outputs may include information such as vehicle velocity, speed, time, map data (e.g., a High Definition map (not shown in FIG. 11A)), location data (e.g., vehicle's 1100 location, such as on a map), direction, location of other vehicles (e.g., an occupancy grid), information about objects and status of objects as perceived by controller(s) 1136, etc.
For example, in at least one embodiment, HMI display 1134 may display information about presence of one or more objects (e.g., a street sign, caution sign, traffic light changing, etc.), and/or information about driving maneuvers vehicle has made, is making, or will make (e.g., changing lanes now, taking exit 34B in two miles, etc.)” and “For example, according to at least one embodiment of technology, a PVA is used to perform computer stereo vision. In at least one embodiment, a semi-global matching-based algorithm may be used in some examples, although this is not intended to be limiting. In at least one embodiment, applications for Level 3-5 autonomous driving use motion estimation/stereo matching on-the-fly (e.g., structure from motion, pedestrian recognition, lane detection, etc.). In at least one embodiment, a PVA may perform computer stereo vision functions on inputs from two monocular cameras” and “In at least one embodiment, IMU sensor(s) 1166 may be implemented as a miniature, high performance GPS-Aided Inertial Navigation System (“GPS/INS”) that combines micro-electro-mechanical systems (“MEMS”) inertial sensors, a high-sensitivity GPS receiver, and advanced Kalman filtering algorithms to provide estimates of position, velocity, and attitude. In at least one embodiment, IMU sensor(s) 1166 may enable vehicle 1100 to estimate its heading without requiring input from a magnetic sensor by directly observing and correlating changes in velocity from a GPS to IMU sensor(s) 1166. In at least one embodiment, IMU sensor(s) 1166 and GNSS sensor(s) 1158 may be combined in a single integrated unit”}.
Further, Zhang discloses that, in at least one embodiment, a neural network may take as its input at least some subset of parameters, such as bounding box dimensions, ground plane estimate obtained (e.g., from another subsystem), output from IMU sensor(s) 1166 that correlates with vehicle 1100 orientation, distance, 3D location estimates of object obtained from neural network and/or other sensors (e.g., LIDAR sensor(s) 1164 or RADAR sensor(s) 1160), among others. In at least one embodiment, confidence values 116 are visualized in connection with an input image 102. In at least one embodiment, confidence values 116 comprise a confidence value for each pixel of an input image 102, in which confidence values of confidence values 116 are associated with a color gradient (e.g., lower values correspond to darker colors, higher values correspond to lighter colors, and/or variations thereof), in which confidence values 116 are visualized through a confidence map which visualizes each pixel of an input image 102 with its corresponding confidence value color on said color gradient. In at least one embodiment, confidence values 116 are visualized through a confidence map using numerical values of confidence values 116. Further information regarding a confidence map can be found in description of FIG. 2.
Still further, it is noted that Zhang Fig. 2 illustrates an example 200 of segmentation values and confidence values, according to at least one embodiment. In at least one embodiment, an input image 202, a system for segmentation and confidence value determination 204, segmentation values 206, and confidence values 208 are in accordance with those discussed in connection with FIG. 1. An input image 202 is an image captured from one or more image and/or video capturing devices that depicts one or more objects. In at least one embodiment, an input image 202 is at least an RGB image, BW image, grayscale image, and/or variations thereof. In at least one embodiment, an input image 202 is an image from a view of a vehicle, such as an autonomous vehicle, semi-autonomous vehicle, manual vehicle, and/or variations thereof. In at least one embodiment, an input image 202 is an image from one or more medical imaging devices. In at least one embodiment, an input image 202 is an image, captured by a drone, of an aerial or overhead view of terrain. In at least one embodiment, an input image 202 is an image, captured by a robotic device, of crops in one or more agriculture environments, such as farming or other crop cultivation environments. In at least one embodiment, referring to FIG. 2, an input image 202 depicts a car object, a sky object, and a road object.
Accordingly, the segmentation pixel values for the “car” object in Zhang Fig. 2 are interpreted as reading on the claimed “second bounding box,” together with the associated confidence values and the non-zero movement of the vehicle (i.e., the car), as illustrated in the sketch below.
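For illustration only, the disputed limitation as the Examiner reads it onto Zhang (a confidence value tested against a predetermined threshold, and a prior-frame location carried forward according to non-zero ego-vehicle motion, per the passages cited above) may be expressed as the following minimal sketch. The function and variable names and the pixel-shift representation of vehicle motion are assumptions for illustration; only the 0.5 threshold and the 0.6 example confidence value are taken from the quoted passage of Zhang [0099].

    # Minimal illustrative sketch; names and the pixel-shift motion model are hypothetical.
    # The 0.5 threshold and the 0.6 confidence value come from Zhang [0099] as quoted above.
    def is_confidence_acceptable(confidence: float, threshold: float = 0.5) -> bool:
        # Per the quoted passage: values >= threshold are acceptable, lower values are unacceptable.
        return confidence >= threshold

    def estimate_second_box(first_box, motion_shift_px):
        # Shift a first-frame box (x, y, w, h) by the image-plane displacement
        # attributable to the vehicle's non-zero movement between frames.
        x, y, w, h = first_box
        dx, dy = motion_shift_px
        return (x + dx, y + dy, w, h)

    first_box, confidence = (100.0, 80.0, 40.0, 30.0), 0.6
    if is_confidence_acceptable(confidence):
        second_box = estimate_second_box(first_box, (-3.0, 1.5))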
Disposition of Claims
Claims 1-20 are pending in this application.
Claims 2, 4-7, 11-12 and 16-19 are objected to as containing allowable subject matter.
Claims 1, 3, 8-10, 13-15 and 20 are rejected.
Allowable Subject Matter
Claims 2, 4-7, 11-12 and 16-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 3, 8-10, 13-15 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhang (US 2022/0237414 A1).
Regarding claim 1, Zhang discloses:
An object detection system (neural network 104, autonomous vehicle 1100, stereo cameras 1168 and central processing units CPUs 1106, among other components and features: [0067]: “In at least one embodiment, a neural network 104 comprises a region-based semantic segmentation network that performs semantic segmentation based on object detection results. In at least one embodiment, a neural network 104 comprises a region-based semantic segmentation network that utilizes selective search to extract object proposals, compute features for said object proposals, and classify regions using class-specific linear SVMs. In at least one embodiment, a neural network 104 comprises a fully convolutional network-based semantic segmentation network that learns a mapping from pixels to pixel, and may or may not extract region proposals”) comprising:
a vehicle (autonomous vehicle 1100: Fig. 11A and [0057, 0161]);
a camera (Any one of stereo cameras 1168, wide-view cameras 1170 {e.g., fisheye cameras}, infrared cameras 1172, surround cameras 1174 {e.g., 360-degree cameras}, long-range cameras {not shown in FIG. 11A}, mid-range cameras {not shown in FIG. 11A}: Fig. 11A and [0166]) carried by the vehicle (1100) to output a stream of image frames ([0106, 0213, 0214, 0220, 0342, 0377]: “In at least one embodiment, example 400 depicts an image 402 captured from a view of a vehicle. In at least one embodiment, an image 402 is an image captured from one or more cameras of a vehicle, such as an autonomous vehicle, semi-autonomous vehicle, manual vehicle, and/or variations thereof. In at least one embodiment, an image 402 depicts a view from a vehicle that is travelling through an environment”) including
a first image frame (first stream image frame: [0106, 0213, 0214, 0220, 0342, 0377]),
a second image frame (second stream image frame: [0106, 0213, 0214, 0220, 0342, 0377]), and
a third image frame (third stream image frame: [0106, 0213, 0214, 0220, 0342, 0377]);
a processor (central processing units CPUs 1106, graphics processing units GPUs 1108, processors 1110: [0183-0185]); and
a non-transitory computer-readable medium (computer-readable storage medium is a non-transitory computer-readable medium: [0095]) containing instructions to direct the processor (central processing units CPUs 1106, graphics processing units GPUs 1108, processors 1110: [0183-0185]) to:
apply an object detection model to the first image frame to predict a location of a first bounding box of an object in the first image frame ([0192, 0206]: “In at least one embodiment, a CNN may include a region-based or regional convolutional neural networks (“RCNNs”) and Fast RCNNs (e.g., as used for object detection) or other type of CNN” and “In at least one embodiment, neural network may take as its input at least some subset of parameters, such as bounding box dimensions, ground plane estimate obtained (e.g., from another subsystem), output from IMU sensor(s) 1166 that correlates with vehicle 1100 orientation, distance, 3D location estimates of object obtained from neural network and/or other sensors (e.g., LIDAR sensor(s) 1164 or RADAR sensor(s) 1160), among others”);
apply a confidence value to the predicted location of first bounding box ([0058, 0206]: “In at least one embodiment, a system assigns confidence values to an output of an image segmentation task in which confidence is computed as part of inferencing. In at least one embodiment, during inferencing, numerical values for each of multiple possible classes are computed. In at least one embodiment, numerical values are then used to indicate confidence in classifications. In at least one embodiment, systems such as object detection and collision in an autonomous vehicle can then use an image and confidence values together to determine more confidently objects in an image” and “a neural network that outputs a measure of confidence for each object detection. In at least one embodiment, confidence may be represented or interpreted as a probability, or as providing a relative “weight” of each detection compared to other detections. In at least one embodiment, a confidence measure enables a system to make further decisions regarding which detections should be considered as true positive detections rather than false positive detections. In at least one embodiment, a system may set a threshold value for confidence and consider only detections exceeding threshold value as true positive detections”);
in response to the confidence level exceeding a predetermined threshold ([0099]: “In at least one embodiment, one or more systems define a threshold that indicates whether a confidence value is acceptable, in which values lower than said threshold are unacceptable, and values higher than or equal to said threshold are acceptable. In at least one embodiment, for example, a threshold is defined as a value of 0.5, in which a confidence value of 0.6 is determined to be acceptable, and a confidence value of 0.4 is determined to be unacceptable”), estimate a location of a second bounding box in the second image frame based on the location of the first bounding box and non-zero movement of the vehicle (autonomous vehicle 1100: Fig. 11A and [0057, 0161, 0204]) {[0087, 0167, 0213, 0237]: “In at least one embodiment, stereo disparity algorithms refer to algorithms that determine apparent pixel differences or motions between stereo images, in which confidence information is usable to determine a reliability or quality of determined apparent pixel differences or motions” and “For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weights of information provided by adjacent frames. In at least one embodiment, where an image or portion of an image does not include motion, temporal noise reduction performed by video image compositor may use information from a previous image to reduce noise in a current image” and “In at least one embodiment, outputs may include information such as vehicle velocity, speed, time, map data (e.g., a High Definition map (not shown in FIG. 11A)), location data (e.g., vehicle's 1100 location, such as on a map), direction, location of other vehicles (e.g., an occupancy grid), information about objects and status of objects as perceived by controller(s) 1136, etc. For example, in at least one embodiment, HMI display 1134 may display information about presence of one or more objects (e.g., a street sign, caution sign, traffic light changing, etc.), and/or information about driving maneuvers vehicle has made, is making, or will make (e.g., changing lanes now, taking exit 34B in two miles, etc.)” and “For example, according to at least one embodiment of technology, a PVA is used to perform computer stereo vision. In at least one embodiment, a semi-global matching-based algorithm may be used in some examples, although this is not intended to be limiting. In at least one embodiment, applications for Level 3-5 autonomous driving use motion estimation/stereo matching on-the-fly (e.g., structure from motion, pedestrian recognition, lane detection, etc.). In at least one embodiment, a PVA may perform computer stereo vision functions on inputs from two monocular cameras” and “In at least one embodiment, IMU sensor(s) 1166 may be implemented as a miniature, high performance GPS-Aided Inertial Navigation System (“GPS/INS”) that combines micro-electro-mechanical systems (“MEMS”) inertial sensors, a high-sensitivity GPS receiver, and advanced Kalman filtering algorithms to provide estimates of position, velocity, and attitude. In at least one embodiment, IMU sensor(s) 1166 may enable vehicle 1100 to estimate its heading without requiring input from a magnetic sensor by directly observing and correlating changes in velocity from a GPS to IMU sensor(s) 1166. In at least one embodiment, IMU sensor(s) 1166 and GNSS sensor(s) 1158 may be combined in a single integrated unit”};
update the object detection model based on the estimated location of the second bounding box in the second image frame ([0101, 0213, 0545]: “For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weights of information provided by adjacent frames. In at least one embodiment, where an image or portion of an image does not include motion, temporal noise reduction performed by video image compositor may use information from a previous image to reduce noise in a current image”);
predict a location of a third bounding box in the third image frame using the updated object detection model ([0101, 0213, 0545]: “In at least one embodiment, AI-assisted annotation 3710 may be used to aid in generating annotations corresponding to imaging data 3708 to be used as ground truth data for retraining or updating a machine learning model”); and
control an operation of the vehicle (autonomous vehicle 1100: Fig. 11A and [0057, 0161]) based on the predicted location of the third bounding box ([0101, 0213, 0545]: “In at least one embodiment, a system of a transportation environment, such as a system of a vehicle, performs remedial actions such as activating one or more acceleration, braking, or warning vehicle systems, transmitting one or more notifications to other various systems, refining determined segmentation values and confidence values, re-determining determined segmentation values and confidence values, and/or variations thereof, based on unacceptable determined confidence values. In at least one embodiment, for example, a system of a vehicle that is travelling in a transportation environment (e.g., a road) obtains or otherwise receives an image captured from a view of said vehicle (e.g., through a camera mounted on said vehicle), processes said image to determine segmentation values and confidence values of pixels of said image, determines that a region of pixels of said image corresponding to an object that has been predicted to be a road object has a corresponding unacceptable confidence value, and performs a remedial action of activating one or more braking systems of said vehicle. Further information regarding a system of a transportation environment can be found in description of FIG. 4”).
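For clarity of the mapping above, the overall sequence recited in claim 1 (predict a first bounding box with a confidence value, compare the confidence against the predetermined threshold, estimate a second box from the first box and vehicle motion, update the detection model with that estimate, predict a third box with the updated model, and control the vehicle) is summarized in the self-contained sketch below. The toy detector, the pseudo-label style update rule, and all names are assumptions for illustration only and do not represent Zhang's actual implementation or the claimed system.

    # Self-contained toy sketch of the claimed sequence; all names and the update rule are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class Detection:
        box: tuple          # (x, y, w, h) in pixels
        confidence: float   # value in [0, 1]

    class ToyDetector:
        # Stand-in for an object detection model that can be updated between frames.
        def __init__(self):
            self.x_offset = 0.0
        def detect(self, frame) -> Detection:
            return Detection(box=(100.0 + self.x_offset, 80.0, 40.0, 30.0), confidence=0.6)
        def update(self, frame, estimated_box) -> None:
            # Nudge the model toward the estimated (pseudo-label) box location.
            self.x_offset = estimated_box[0] - 100.0

    def estimate_from_motion(box, shift_px):
        x, y, w, h = box
        return (x + shift_px[0], y + shift_px[1], w, h)

    THRESHOLD = 0.5
    model = ToyDetector()
    frame1, frame2, frame3 = "frame1", "frame2", "frame3"   # placeholders for image frames
    d1 = model.detect(frame1)                                # predict first box and its confidence
    if d1.confidence > THRESHOLD:                            # confidence exceeds predetermined threshold
        box2 = estimate_from_motion(d1.box, (-3.0, 1.5))     # estimate second box from first box and vehicle motion
        model.update(frame2, box2)                           # update the detection model with the estimate
        d3 = model.detect(frame3)                            # predict third box using the updated model
        print("Control vehicle based on third box:", d3.box)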
Regarding claim 9, Zhang discloses:
A non-transitory computer-readable medium (computer-readable storage medium is a non-transitory computer-readable medium: [0095]) containing instructions to direct a processor (central processing units CPUs 1106, graphics processing units GPUs 1108, processors 1110: [0183-0185]) to:
apply an object detection model to a first image frame to predict a location of a first bounding box of an object in the first image frame ([0192, 0206]: “In at least one embodiment, a CNN may include a region-based or regional convolutional neural networks (“RCNNs”) and Fast RCNNs (e.g., as used for object detection) or other type of CNN” and “In at least one embodiment, neural network may take as its input at least some subset of parameters, such as bounding box dimensions, ground plane estimate obtained (e.g., from another subsystem), output from IMU sensor(s) 1166 that correlates with vehicle 1100 orientation, distance, 3D location estimates of object obtained from neural network and/or other sensors (e.g., LIDAR sensor(s) 1164 or RADAR sensor(s) 1160), among others”);
apply a confidence value to the predicted location of first bounding box ([0058, 0206]: “In at least one embodiment, a system assigns confidence values to an output of an image segmentation task in which confidence is computed as part of inferencing. In at least one embodiment, during inferencing, numerical values for each of multiple possible classes are computed. In at least one embodiment, numerical values are then used to indicate confidence in classifications. In at least one embodiment, systems such as object detection and collision in an autonomous vehicle can then use an image and confidence values together to determine more confidently objects in an image” and “a neural network that outputs a measure of confidence for each object detection. In at least one embodiment, confidence may be represented or interpreted as a probability, or as providing a relative “weight” of each detection compared to other detections. In at least one embodiment, a confidence measure enables a system to make further decisions regarding which detections should be considered as true positive detections rather than false positive detections. In at least one embodiment, a system may set a threshold value for confidence and consider only detections exceeding threshold value as true positive detections”);
in response to the confidence level exceeding a predetermined threshold ([0099]: “In at least one embodiment, one or more systems define a threshold that indicates whether a confidence value is acceptable, in which values lower than said threshold are unacceptable, and values higher than or equal to said threshold are acceptable. In at least one embodiment, for example, a threshold is defined as a value of 0.5, in which a confidence value of 0.6 is determined to be acceptable, and a confidence value of 0.4 is determined to be unacceptable”), estimate a location of a second bounding box of the object in a second image frame based on the location of the first bounding box and non-zero movement of a vehicle (autonomous vehicle 1100: Fig. 11A and [0057, 0161, 0204]) {[0087, 0167, 0213, 0237]: “In at least one embodiment, stereo disparity algorithms refer to algorithms that determine apparent pixel differences or motions between stereo images, in which confidence information is usable to determine a reliability or quality of determined apparent pixel differences or motions” and “For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weights of information provided by adjacent frames. In at least one embodiment, where an image or portion of an image does not include motion, temporal noise reduction performed by video image compositor may use information from a previous image to reduce noise in a current image” and “In at least one embodiment, outputs may include information such as vehicle velocity, speed, time, map data (e.g., a High Definition map (not shown in FIG. 11A)), location data (e.g., vehicle's 1100 location, such as on a map), direction, location of other vehicles (e.g., an occupancy grid), information about objects and status of objects as perceived by controller(s) 1136, etc. For example, in at least one embodiment, HMI display 1134 may display information about presence of one or more objects (e.g., a street sign, caution sign, traffic light changing, etc.), and/or information about driving maneuvers vehicle has made, is making, or will make (e.g., changing lanes now, taking exit 34B in two miles, etc.)” and “For example, according to at least one embodiment of technology, a PVA is used to perform computer stereo vision. In at least one embodiment, a semi-global matching-based algorithm may be used in some examples, although this is not intended to be limiting. In at least one embodiment, applications for Level 3-5 autonomous driving use motion estimation/stereo matching on-the-fly (e.g., structure from motion, pedestrian recognition, lane detection, etc.). In at least one embodiment, a PVA may perform computer stereo vision functions on inputs from two monocular cameras” and “In at least one embodiment, IMU sensor(s) 1166 may be implemented as a miniature, high performance GPS-Aided Inertial Navigation System (“GPS/INS”) that combines micro-electro-mechanical systems (“MEMS”) inertial sensors, a high-sensitivity GPS receiver, and advanced Kalman filtering algorithms to provide estimates of position, velocity, and attitude. In at least one embodiment, IMU sensor(s) 1166 may enable vehicle 1100 to estimate its heading without requiring input from a magnetic sensor by directly observing and correlating changes in velocity from a GPS to IMU sensor(s) 1166. In at least one embodiment, IMU sensor(s) 1166 and GNSS sensor(s) 1158 may be combined in a single integrated unit”}; and
update the object detection model based on the estimated location of the second bounding box in the second image frame ([0101, 0213, 0545]: “For example, in at least one embodiment, where motion occurs in a video, noise reduction weights spatial information appropriately, decreasing weights of information provided by adjacent frames. In at least one embodiment, where an image or portion of an image does not include motion, temporal noise reduction performed by video image compositor may use information from a previous image to reduce noise in a current image”);
predict a location of a third bounding box of the object in a third image frame using the updated object detection model ([0101, 0213, 0545]: “In at least one embodiment, AI-assisted annotation 3710 may be used to aid in generating annotations corresponding to imaging data 3708 to be used as ground truth data for retraining or updating a machine learning model”); and
control an operation of the vehicle (autonomous vehicle 1100: Fig. 11A and [0057, 0161]) based on the predicted location of the third bounding box ([0101, 0213, 0545]: “In at least one embodiment, a system of a transportation environment, such as a system of a vehicle, performs remedial actions such as activating one or more acceleration, braking, or warning vehicle systems, transmitting one or more notifications to other various systems, refining determined segmentation values and confidence values, re-determining determined segmentation values and confidence values, and/or variations thereof, based on unacceptable determined confidence values. In at least one embodiment, for example, a system of a vehicle that is travelling in a transportation environment (e.g., a road) obtains or otherwise receives an image captured from a view of said vehicle (e.g., through a camera mounted on said vehicle), processes said image to determine segmentation values and confidence values of pixels of said image, determines that a region of pixels of said image corresponding to an object that has been predicted to be a road object has a corresponding unacceptable confidence value, and performs a remedial action of activating one or more braking systems of said vehicle. Further information regarding a system of a transportation environment can be found in description of FIG. 4”).
Regarding claim 13, Zhang discloses:
An image segmentation system comprising:
a vehicle (autonomous vehicle 1100: Fig. 11A and [0057, 0161]);
a camera (Any one of stereo cameras 1168, wide-view cameras 1170 {e.g., fisheye cameras}, infrared cameras 1172, surround cameras 1174 {e.g., 360-degree cameras}, long-range cameras {not shown in FIG. 11A}, mid-range cameras {not shown in FIG. 11A}: Fig. 11A and [0166]) carried by the vehicle (autonomous vehicle 1100: Fig. 11A and [0057, 0161]) to output an image;
a sensor to output a point cloud corresponding to the image ([0235]: “In at least one embodiment, LIDAR technologies, such as 3D flash LIDAR, may also be used. In at least one embodiment, 3D flash LIDAR uses a flash of a laser as a transmission source, to illuminate surroundings of vehicle 1100 up to approximately 200 m. In at least one embodiment, a flash LIDAR unit includes, without limitation, a receptor, which records laser pulse transit time and reflected light on each pixel, which in turn corresponds to a range from vehicle 1100 to objects. In at least one embodiment, flash LIDAR may allow for highly accurate and distortion-free images of surroundings to be generated with every laser flash. In at least one embodiment, four flash LIDAR sensors may be deployed, one at each side of vehicle 1100. In at least one embodiment, 3D flash LIDAR systems include, without limitation, a solid-state 3D staring array LIDAR camera with no moving parts other than a fan (e.g., a non-scanning LIDAR device). In at least one embodiment, flash LIDAR device may use a 5-nanosecond class I (eye-safe) laser pulse per frame and may capture reflected laser light as a 3D range point cloud and co-registered intensity data”);
a processor (central processing units CPUs 1106, graphics processing units GPUs 1108, processors 1110: [0183-0185]); and
a non-transitory computer-readable medium (computer-readable storage medium is a non-transitory computer-readable medium: [0095]) containing instructions to direct the processor (central processing units CPUs 1106, graphics processing units GPUs 1108, processors 1110: [0183-0185]) to:
apply a segmentation model to the image to output a first predicted segmentation map including pixel labels ([0059]: “In at least one embodiment, a system for segmentation and confidence value determination is applicable to any system that obtains images and performs pixel labeling. In at least one embodiment, a system for image segmentation and confidence value determination is applicable to systems of various environments, such as transportation environments, medical environments, geo-sensing environments, and/or agriculture environments”);
fuse the first predicted segmentation map and the point cloud (Fig. 2 and [0079]: “In at least one embodiment, segmentation values 112 comprise a segmentation value for each pixel of an input image 102, in which each segmentation value is associated with a color, in which segmentation values 112 are visualized through a segmentation map which visualizes each pixel of an input image 102 with its associated segmentation value color. In at least one embodiment, a segmentation value of a pixel indicates a prediction of a class that an object corresponding to said pixel belongs to. Further information regarding a segmentation map can be found in description of FIG. 2”);
label pixels in the point cloud ([0079, 0091]);
relabel the pixel labels of the predicted segmentation map based on the labeled pixels in the point cloud to produce a second predicted segmentation map ([0091]: “In at least one embodiment, segmentation values 206 comprise a segmentation value for each pixel of an input image 202. In at least one embodiment, referring to FIG. 2, a system for segmentation and confidence value determination 204 is implemented with segmentation classes including at least a car object, a sky object, and a road object, in which segmentation values 206 comprise segmentation values indicating which pixels of an input image 202 correspond to a car object, a sky object, and a road object. In at least one embodiment, referring to FIG. 2, segmentation values 206 are visualized as a segmentation map. In at least one embodiment, referring to FIG. 2, a segmentation map visualization of segmentation values 206 depicts pixels of an input image 202 that correspond to a car object as solid color, pixels of input image 202 that correspond to a sky object as a first patterned color, and pixels of input image 202 that correspond to a road object as a second patterned color”);
compute a first quantity objectness score for an object in the first predicted segmentation map ([0061, 0066]: “FIG. 1 illustrates an example 100 of a system for segmentation and confidence value determination, according to at least one embodiment. In at least one embodiment, a system for segmentation and confidence value determination comprises at least a neural network 104, a score distribution transformation 106, a score normalization 108, a segmentation determination 110, and a confidence determination 114. In at least one embodiment, a system for segmentation and confidence value determination receives or otherwise obtains an input image 102, and determines segmentation values 112 and confidence values 116”);
compute a second quantity objectness score for an object in a depth refined segmentation map ([0061, 0066]: “In at least one embodiment, a neural network 104 comprises one or more image segmentation neural networks. In at least one embodiment, an image segmentation neural network refers to a neural network that partitions regions of an image into multiple segments (e.g., sets of pixels). In at least one embodiment, an image segmentation neural network processes an image to locate objects of said image. In at least one embodiment, an image segmentation neural network processes an image to assign labels to every pixel in said image such that pixels with a same label share various characteristics. In at least one embodiment, a neural network 104 (e.g., through one or more image segmentation neural networks) processes pixels of an image to determine scores for classes for each pixel, in which a score for a particular class for a particular pixel indicates a probability that said particular pixel corresponds to an object that belongs to said particular class. In at least one embodiment, a class, also referred to as an object class or segmentation class, refers to a classification or identifier of an object. In at least one embodiment, a neural network 104 identifies or otherwise classifies one or more objects of an input image 102 and generates confidence values (e.g., via determined scores) associated with said one or more objects”);
use the depth refined segmentation map to adjust the segmentation model (Fig. 9 and [0070, 0082]: “In at least one embodiment, a neural network 104 is trained by one or more systems and/or training frameworks that cause neural network 104 to process training images, compare training labels with values determined by neural network 104 from training images through one or more loss functions to calculate loss, and updating neural network 104 to minimize calculated loss. In at least one embodiment, a neural network 104 is trained by calculating loss and adjusting one or more weights, functions, processes, structural components, and/or variations thereof of neural network 104 such that loss is minimized. In at least one embodiment, a neural network 104 is trained with processes described in connection with FIG. 9”);
adjust the segmentation model with additional constraints in a loss function to output a prediction that mimics the depth refined segmentation map (Fig. 9 and [0070, 0082]: “In at least one embodiment, a neural network 104 is trained by one or more systems and/or training frameworks that cause neural network 104 to process training images, compare training labels with values determined by neural network 104 from training images through one or more loss functions to calculate loss, and updating neural network 104 to minimize calculated loss. In at least one embodiment, a neural network 104 is trained by calculating loss and adjusting one or more weights, functions, processes, structural components, and/or variations thereof of neural network 104 such that loss is minimized. In at least one embodiment, a neural network 104 is trained with processes described in connection with FIG. 9” and “In at least one embodiment, a confidence determination 114 determines confidence values 116, which is a collection of final confidence values determined for pixels for an input image 102. In at least one embodiment, confidence values 116 are unmodified maximum probability values. In at least one embodiment, confidence values 116 are probability values passed through a monotonic linear function, denoted by F, such that output probabilities satisfy various defined constraints. In at least one embodiment, confidence is represented mathematically by a following formula, although any variation thereof can be utilized: Confidence=F(max(SoftMax)), in which F denotes a monotonic linear function. In at least one embodiment, each value of confidence values 116 corresponds to a value of segmentation values 112. In at least one embodiment, each pixel of an input image 102 is associated with a particular value of segmentation values 112 and a corresponding value of confidence values 116. In at least one embodiment, for a particular pixel, associated segmentation value, and corresponding confidence value, said associated segmentation value indicates a prediction of a class that an object corresponding to said particular pixel belongs to, and said corresponding confidence value indicates a probability determined by one or more neural networks that said prediction is correct”);
segment a second image using the updated segmentation map ([0101]: “In at least one embodiment, a system of a transportation environment, such as a system of a vehicle, performs remedial actions such as activating one or more acceleration, braking, or warning vehicle systems, transmitting one or more notifications to other various systems, refining determined segmentation values and confidence values, re-determining determined segmentation values and confidence values, and/or variations thereof, based on unacceptable determined confidence values. In at least one embodiment, for example, a system of a vehicle that is travelling in a transportation environment (e.g., a road) obtains or otherwise receives an image captured from a view of said vehicle (e.g., through a camera mounted on said vehicle), processes said image to determine segmentation values and confidence values of pixels of said image, determines that a region of pixels of said image corresponding to an object that has been predicted to be a road object has a corresponding unacceptable confidence value, and performs a remedial action of activating one or more braking systems of said vehicle. Further information regarding a system of a transportation environment can be found in description of FIG. 4”); and
control an operation of the vehicle based on the segmenting of the second image ([0101]: “In at least one embodiment, a system of a transportation environment, such as a system of a vehicle, performs remedial actions such as activating one or more acceleration, braking, or warning vehicle systems, transmitting one or more notifications to other various systems, refining determined segmentation values and confidence values, re-determining determined segmentation values and confidence values, and/or variations thereof, based on unacceptable determined confidence values. In at least one embodiment, for example, a system of a vehicle that is travelling in a transportation environment (e.g., a road) obtains or otherwise receives an image captured from a view of said vehicle (e.g., through a camera mounted on said vehicle), processes said image to determine segmentation values and confidence values of pixels of said image, determines that a region of pixels of said image corresponding to an object that has been predicted to be a road object has a corresponding unacceptable confidence value, and performs a remedial action of activating one or more braking systems of said vehicle. Further information regarding a system of a transportation environment can be found in description of FIG. 4”).
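For illustration of the per-pixel confidence formula quoted above from Zhang (Confidence = F(max(SoftMax)), with F a monotonic linear function), a minimal sketch is provided below. The array shapes, the random class scores, and the choice of F as the identity function are assumptions for illustration and are not part of Zhang's disclosure.

    # Minimal sketch of Confidence = F(max(SoftMax)) computed per pixel, as quoted above.
    # The (H, W, num_classes) score layout, the random scores, and F = identity are assumptions.
    import numpy as np

    def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
        e = np.exp(scores - scores.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def per_pixel_confidence(class_scores: np.ndarray, F=lambda p: p) -> np.ndarray:
        # class_scores holds per-pixel raw scores for each segmentation class.
        # Returns an (H, W) confidence map: F applied to the maximum softmax probability per pixel.
        probs = softmax(class_scores, axis=-1)
        return F(probs.max(axis=-1))

    scores = np.random.default_rng(0).normal(size=(2, 2, 3))   # 2x2 image, 3 classes (e.g., car, sky, road)
    segmentation_values = scores.argmax(axis=-1)               # predicted class per pixel
    confidence_values = per_pixel_confidence(scores)           # confidence value per pixel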
Regarding claim 3, Zhang discloses the system according to claim 1, and Zhang further discloses:
wherein the updating of the object detection model is based on a plurality of estimated locations of bounding boxes in a plurality of respective image frames (Zhang [0176, 0544]).
Regarding claim 8, Zhang discloses the system according to claim 1, and Zhang further discloses:
wherein the vehicle comprises at least one of a global positioning satellite (GPS) system and
wherein the instructions are configured to direct the processor to
(1) determine and store geographic coordinates of an obstacle based upon signals from GPS system and the predicted location of the third bounding box (Zhang [0166, 0167]).
Regarding claim 10, Zhang discloses the non-transitory computer-readable medium according to claim 9, and Zhang further discloses:
wherein the updating of the object detection model is based on a plurality of estimated locations of bounding boxes in a plurality of respective image frames (Zhang [0163, 0164, 0176, 0182, 0244, 0544]).
Regarding claim 14, Zhang discloses the system according to claim 13, and Zhang further discloses:
wherein the sensor comprises a second camera, the camera and the second camera forming a stereo camera (Zhang [0163, 0164, 0176, 0182, 0244, 0544]).
Regarding claim 15, Zhang discloses the system according to claim 13, and Zhang further discloses:
wherein the sensor comprises a LIDAR sensor (Zhang [0163, 0164, 0176, 0182, 0244, 0544]).
Regarding claim 20, Zhang discloses the system according to claim 13, and Zhang further discloses:
wherein the vehicle comprises at least one of a global positioning satellite (GPS) system and
wherein the instructions are configured to direct the processor to
(1) determine and store geographic coordinates of an obstacle based on the segmenting of the second image (Zhang [0163, 0164, 0176, 0182, 0244, 0544]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ruben Picon-Feliciano, whose telephone number is (571) 272-4938. The examiner can normally be reached Monday-Thursday, 11:30 am-7:30 pm ET.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Lindsay M. Low, can be reached at (571) 272-1196. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RUBEN PICON-FELICIANO/Examiner, Art Unit 3747
/LINDSAY M LOW/Supervisory Patent Examiner, Art Unit 3747