DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed June 24, 2025, has been entered. Claims 1-20 remain pending in the application. Applicant’s amendments to the Specification and Drawings have overcome each and every objection previously set forth in the Non-Final Office Action mailed March 3, 2025.
Response to Arguments
Applicant’s arguments, see page 15, filed June 24, 2025, with respect to the rejection(s) of claims 1, 11, and 20 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Xia et al. (Semantic Segmentation without Annotating Segments).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Tran (US 10928830 B1) in view of Xia et al. (Semantic Segmentation without Annotating Segments), hereinafter Xia.
Regarding claim 1, Tran teaches a method (Col. 1 lines 62-67) comprising:
generating a bounding area around an object (Col. 14 lines 45-52 – “For example, to identify traffic lights, a traffic light detector is applied to the camera images. Visual SLAM is used to process multiple camera images to get a coarse location of the traffic light in 3D. Lidar points in the local neighborhood of this location are matched and processed to produce the bounding box and orientation of the traffic light and its sub-components”) in an image captured by one or more sensors of a vehicle (Col. 1 lines 62-67, Col. 4 lines 59-67 – “capturing roadway images using a plurality of cameras…The sensor can be a multi-focal camera and a radar on a front of the car”);
performing semantic segmentation to differentiate between the object and a traversable space (Col. 14 lines 24-43 – “Segmentation algorithms identify 3D points in the point cloud for building a model of the ground, defined as the driveable surface part of the map. These ground points are used to build a parametric model of the ground in small sections. The ground map is key for aligning the subsequent layers of the map, such as the semantic map. The semantic map layer builds on the geometric map layer by adding semantic objects such as traffic 2D and 3D objects, lane boundaries, intersections, crosswalks, parking spots, stop signs, traffic lights, etc. that are used for driving”);
and generating a three-dimensional model of an environment comprised of the object and the traversable space based on the semantic segmentation (Col. 5 lines 2-3, Col. 14 lines 19-30 – “The sensors can generate a 3D model of an environment. The 3D model can be a high definition map…The voxelized geometric map is produced by segmenting the point cloud into voxels that are as small as 5 cm×5 cm×5 cm. During real-time operation, the geometric map is the most efficient way to access point cloud information. Segmentation algorithms identify 3D points in the point cloud for building a model of the ground”; Note: the geometric map is a layer that is part of the high definition map),
wherein the three-dimensional model is used for one or more of processing or transmitting instructions useable by one or more driver assistance features of the vehicle (Col. 45 lines 21-44 – “After the adjusting, aggregating, by a processor, the plurality of 3D models to generate a comprehensive 3D model; combining the comprehensive 3D model with detailed map information; and using the combined comprehensive 3D model with detailed map information to maneuver the vehicle”).
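For illustration only, the voxelized geometric map that Tran describes can be sketched as follows. This is a minimal sketch under assumed inputs (an Nx3 point array in meters); the function name and data layout are hypothetical and are not taken from Tran's disclosure:

```python
# Hedged sketch: bucket a point cloud into a 5 cm voxel grid, the kind of
# geometric-map layer Tran describes. All names here are illustrative.
import numpy as np

def voxelize(points_xyz, voxel_size=0.05):
    """points_xyz: Nx3 float array (meters) -> {voxel index: [point ids]}."""
    idx = np.floor(points_xyz / voxel_size).astype(np.int64)
    voxels = {}
    for i, key in enumerate(map(tuple, idx)):
        voxels.setdefault(key, []).append(i)
    return voxels
```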
Tran does not teach a two-dimensional image captured by one or more sensors of a vehicle nor performing semantic segmentation of a two-dimensional image based on the bounding area. However, Xia teaches a two-dimensional image (Fig. 2 - see modified screenshot of Fig. 2 below, which shows that semantic segmentation is performed on a 2D image) and performing semantic segmentation of a two-dimensional image based on the bounding area (Fig. 2 Caption on Page 2, Paragraph 3 in 1st Col. of Page 2 – “First, the object bounding boxes with detection scores are extracted from the test image. Then, a voting based scheme is applied to estimate object shape guidance. By making use of the shape guidance, a graph-cut-based figure ground segmentation provides a mask for each bounding box. Finally, these masks are merged and post-processed to obtain the final result…we propose an efficient, learning-free design for semantic segmentation when the object bounding boxes are available”; Note: see modified screenshot of Fig. 2 below).
[media_image1.png, greyscale: Modified screenshot of Fig. 2 (taken from Xia)]
Since a two-dimensional image can be captured by a multi-focal camera or radar and since semantic segmentation can be performed on a two-dimensional image, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Xia to perform semantic segmentation of a 2D image based on bounding areas because “Numerous semantic segmentation methods utilize the object bounding box as a prior. The bounding boxes are provided by either user interaction or object detectors. These methods tend to exploit the provided bounding box merely to exclude its exterior from segmentation” (Xia: Paragraph 3 in 2nd Col. of Page 2). In other words, using bounding areas is a common practice in semantic segmentation that provides benefits such as indicating focus areas or indicating unimportant areas. Therefore, the bounding boxes generated by Tran could be used to perform semantic segmentation on a 2D camera image, as disclosed in Xia, making it easier to identify objects.
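As a non-authoritative illustration of this combination, bounding-box-guided segmentation of a 2D image can be sketched with OpenCV's GrabCut standing in for Xia's graph-cut figure-ground step, with the per-box masks merged as in Xia's Fig. 2 pipeline. The function name, label scheme, and iteration count are assumptions, not Xia's implementation:

```python
# Hedged sketch: per-box graph-cut figure-ground segmentation (OpenCV's
# GrabCut as a stand-in), with the per-box masks merged into one result.
import cv2
import numpy as np

def segment_with_boxes(image_bgr, boxes_xywh, iterations=5):
    """image_bgr: HxWx3 uint8 image; boxes_xywh: list of (x, y, w, h)."""
    h, w = image_bgr.shape[:2]
    merged = np.zeros((h, w), dtype=np.uint8)  # 0 = background/traversable
    for label, (x, y, bw, bh) in enumerate(boxes_xywh, start=1):
        mask = np.zeros((h, w), dtype=np.uint8)
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        # The box exterior is treated as definite background, mirroring how
        # box priors "exclude the exterior from segmentation."
        cv2.grabCut(image_bgr, mask, (x, y, bw, bh), bgd, fgd,
                    iterations, cv2.GC_INIT_WITH_RECT)
        fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
        merged[fg] = label  # later boxes overwrite earlier ones on overlap
    return merged
```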
Regarding claim 5, Tran in view of Xia teaches the method of claim 1. Tran further teaches wherein the three-dimensional model comprises a three-dimensional bounding area around one or more of the object or the traversable space (Col. 22 lines 34-39 – “The frustum is a volumetric construct in the 3D map which helps filter out points in the 3D space not close to the bike lane thus would not correspond to the bike lane. The frustum is constructed so as to match the shape of the bounding box, e.g., a square frustum for a square bounding box or a circular frustum for a circular bounding box”; Note: The frustum is equivalent to a 3D bounding box).
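A minimal sketch of the frustum-based filtering quoted above, assuming a pinhole camera model with intrinsics (fx, fy, cx, cy); these parameters and the function name are illustrative assumptions rather than Tran's disclosed code:

```python
# Hedged sketch: keep only 3D points whose projections fall inside a 2D
# bounding box, i.e. inside the square frustum swept out by that box.
import numpy as np

def points_in_frustum(points_xyz, box_xyxy, fx, fy, cx, cy):
    x1, y1, x2, y2 = box_xyxy
    front = points_xyz[points_xyz[:, 2] > 0]  # only points ahead of the camera
    u = fx * front[:, 0] / front[:, 2] + cx   # pinhole projection to pixels
    v = fy * front[:, 1] / front[:, 2] + cy
    keep = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return front[keep]
```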
Claims 2, 11-12, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia and Wang (CN 110910453 B), hereinafter Wang.
Regarding claim 2, Tran in view of Xia teaches the method of claim 1. Tran does not teach wherein the two-dimensional image is captured by a monocular camera. However, Wang teaches wherein the two-dimensional image is captured by a monocular camera (Paragraph 0016 – “synchronously collecting environmental images by using multiple vehicle-mounted monocular cameras”; Note: monocular cameras can only capture 2D images). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the monocular cameras from Wang because monocular cameras are cheaper and require less processing power than other types of cameras, such as binocular ones (Wang: Paragraph 0011). Thus, using monocular cameras would make the method more efficient and less costly.
Regarding claim 11, Tran teaches a system (Fig. 2A, Col. 7 line 52) comprising: processing circuitry, communicatively coupled to the camera (Col. 5 lines 11-17 – “a processor coupled to the lidar, radar, multi-focal camera and thermal imagers”), configured to:
generate a bounding area around an object identified (Col. 14 lines 45-52 – “For example, to identify traffic lights, a traffic light detector is applied to the camera images. Visual SLAM is used to process multiple camera images to get a coarse location of the traffic light in 3D. Lidar points in the local neighborhood of this location are matched and processed to produce the bounding box and orientation of the traffic light and its sub-components”) in an image captured by one or more sensors of a vehicle (Col. 1 lines 62-67, Col. 4 lines 59-67 – “capturing roadway images using a plurality of cameras…The sensor can be a multi-focal camera and a radar on a front of the car”);
perform semantic segmentation to differentiate between the object and a traversable space (Col. 14 lines 24-43 – “Segmentation algorithms identify 3D points in the point cloud for building a model of the ground, defined as the driveable surface part of the map. These ground points are used to build a parametric model of the ground in small sections. The ground map is key for aligning the subsequent layers of the map, such as the semantic map. The semantic map layer builds on the geometric map layer by adding semantic objects such as traffic 2D and 3D objects, lane boundaries, intersections, crosswalks, parking spots, stop signs, traffic lights, etc. that are used for driving”);
and generate a three-dimensional model of an environment comprised of the object and the traversable space based on the semantic segmentation (Col. 5 lines 2-3, Col. 14 lines 19-30 – “The sensors can generate a 3D model of an environment. The 3D model can be a high definition map…The voxelized geometric map is produced by segmenting the point cloud into voxels that are as small as 5 cm×5 cm×5 cm. During real-time operation, the geometric map is the most efficient way to access point cloud information. Segmentation algorithms identify 3D points in the point cloud for building a model of the ground”; Note: the geometric map is a layer that is part of the high definition map),
wherein the three-dimensional model is used for one or more of processing or transmitting instructions useable by one or more driver assistance features of the vehicle (Col. 45 lines 21-44 – “After the adjusting, aggregating, by a processor, the plurality of 3D models to generate a comprehensive 3D model; combining the comprehensive 3D model with detailed map information; and using the combined comprehensive 3D model with detailed map information to maneuver the vehicle”).
Tran does not teach a monocular camera coupled to processing circuitry nor a two-dimensional image captured by one or more sensors of a vehicle. However, Wang teaches a monocular camera and a two-dimensional image (Paragraph 0016 – “synchronously collecting environmental images by using multiple vehicle-mounted monocular cameras”; Note: monocular cameras can only capture two-dimensional images). Since monocular cameras are a type of camera, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the monocular camera of Wang, coupled to the processor of Tran, because monocular cameras are cheaper and require less processing power than other types of cameras, such as binocular ones (Wang: Paragraph 0011). Thus, using monocular cameras would make the method more efficient and less costly. It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the two-dimensional image in Wang because two-dimensional images can be captured by monocular cameras or other types of sensors of a vehicle.

Furthermore, Tran modified by Wang still does not teach performing semantic segmentation of a two-dimensional image based on the bounding area. However, Xia teaches performing semantic segmentation of a two-dimensional image based on the bounding area (Fig. 2 Caption on Page 2, Paragraph 3 in 1st Col. of Page 2 – “First, the object bounding boxes with detection scores are extracted from the test image. Then, a voting based scheme is applied to estimate object shape guidance. By making use of the shape guidance, a graph-cut-based figure ground segmentation provides a mask for each bounding box. Finally, these masks are merged and post-processed to obtain the final result…we propose an efficient, learning-free design for semantic segmentation when the object bounding boxes are available”; Note: see modified screenshot of Fig. 2 above). Since semantic segmentation can be performed on a two-dimensional image, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Xia to perform semantic segmentation of a 2D image based on bounding areas because “Numerous semantic segmentation methods utilize the object bounding box as a prior. The bounding boxes are provided by either user interaction or object detectors. These methods tend to exploit the provided bounding box merely to exclude its exterior from segmentation” (Xia: Paragraph 3 in 2nd Col. of Page 2). In other words, using bounding areas is a common practice in semantic segmentation that provides benefits such as indicating focus areas or indicating unimportant areas. Therefore, the bounding boxes generated by Tran could be used to perform semantic segmentation on a 2D camera image, as disclosed in Xia, making it easier to identify objects.
Regarding claim 12, Tran in view of Xia and Wang teaches the system of claim 11. Tran does not teach wherein the two-dimensional image is captured by a monocular camera. However, Wang teaches wherein the two-dimensional image is captured by a monocular camera (Paragraph 0016 – “synchronously collecting environmental images by using multiple vehicle-mounted monocular cameras”; Note: monocular cameras can only capture 2D images). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the monocular cameras from Wang because monocular cameras are cheaper and require less processing power than other types of cameras, such as binocular ones (Wang: Paragraph 0011). Thus, using monocular cameras would make the method more efficient and less costly.
Regarding claim 15, Tran in view of Xia and Wang teaches the system of claim 11. Tran further teaches wherein the processing circuitry configured to generate the three-dimensional model is further configured to generate a three-dimensional bounding area around one or more of the object or the traversable space (Col. 22 lines 34-39 – “The frustum is a volumetric construct in the 3D map which helps filter out points in the 3D space not close to the bike lane thus would not correspond to the bike lane. The frustum is constructed so as to match the shape of the bounding box, e.g., a square frustum for a square bounding box or a circular frustum for a circular bounding box”; Note: The frustum is equivalent to a 3D bounding box).
Claims 3 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia and Farooqi et al. (US 10289925 B2), hereinafter Farooqi.
Regarding claim 3, Tran in view of Xia teaches the method of claim 1. Tran does not teach modifying the two-dimensional image to differentiate between the object and the traversable space by incorporating one or more of a change in a color of pixels comprising one or more of the object or the traversable space or a label corresponding to a predefined classification of pixels comprising one or more of the object or the traversable space; and assigning values to pixels corresponding to the object, wherein the values correspond to one or more of a heading, a depth within a three-dimensional space, or a regression value. However, Farooqi teaches modifying the two-dimensional image to differentiate between the object and the traversable space by incorporating one or more of a change in a color of pixels comprising one or more of the object or the traversable space or a label corresponding to a predefined classification of pixels comprising one or more of the object or the traversable space (Col. 5 lines 60-67, Col. 6 lines 1-34 – “Color segmentation is provided herein as an example type of localization/image cropping and, unless otherwise specified, is intended to be one of numerous techniques that can be utilized. Initially, an original image 405 (which can be a two dimensional RGB image or a multi-dimensional image, etc.) is provided that is subsequently cropped 410. This cropping can be based on an object identification or other techniques to reduce the amount of pixels/data that is separate from the object (i.e., portions of the image that are clearly not part of the object are removed, etc.)…This color thresholding groups together pixels having colors within pre-defined bands so that the overall number of colors are reduced…areas surrounded on at least two sides by pixels of a common band can be modified to be common with the nearest pixel grouping”; Note: the pixels of an object are modified to have a specific color); and assigning values to pixels corresponding to the object, wherein the values correspond to one or more of a heading, a depth within a three-dimensional space, or a regression value (Col. 5 lines 34-53 –“This segmentation is performed, for example, by grouping all pixels having similar depth values (i.e., depth values within a pre-defined range of values relative to one another) into one of two or more groups”; Note: values are based on depth). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Farooqi to modify the two-dimensional image and assign values to pixels for the benefit of highlighting the object of interest and removing parts of the image that are not part of the object (Farooqi: Col. 5 lines 60-67, Col. 6 lines 1-5). Additionally, incorporating the teachings of Farooqi would help distinguish between the background and foreground of the image (Farooqi: Col. 5 lines 42-49).
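The two Farooqi operations relied on above, band-based color thresholding and depth-range grouping, can be illustrated by the following minimal sketch; the band width, tolerance, and median split are assumptions chosen for illustration, not Farooqi's disclosed values:

```python
# Hedged sketch of band-based color thresholding and depth grouping.
import numpy as np

def quantize_colors(image_rgb, band=32):
    """Snap each channel of an HxWx3 uint8 image to pre-defined color bands."""
    return (image_rgb // band) * band + band // 2

def group_by_depth(depth_map, tolerance=0.5):
    """Split pixels into near (1) and far (0) groups around the median depth."""
    threshold = np.median(depth_map)
    return np.where(depth_map <= threshold + tolerance, 1, 0)
```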
Regarding claim 20, Tran teaches that the processing circuitry (Col. 5 lines 11-17)
generates a bounding area around an object (Col. 14 lines 45-52 – “For example, to identify traffic lights, a traffic light detector is applied to the camera images. Visual SLAM is used to process multiple camera images to get a coarse location of the traffic light in 3D. Lidar points in the local neighborhood of this location are matched and processed to produce the bounding box and orientation of the traffic light and its sub-components”) in an image captured by one or more sensors of a vehicle (Col. 1 lines 62-67, Col. 4 lines 59-67 – “capturing roadway images using a plurality of cameras…The sensor can be a multi-focal camera and a radar on a front of the car”);
performs semantic segmentation to differentiate between the object and a traversable space (Col. 14 lines 24-43 – “Segmentation algorithms identify 3D points in the point cloud for building a model of the ground, defined as the driveable surface part of the map. These ground points are used to build a parametric model of the ground in small sections. The ground map is key for aligning the subsequent layers of the map, such as the semantic map. The semantic map layer builds on the geometric map layer by adding semantic objects such as traffic 2D and 3D objects, lane boundaries, intersections, crosswalks, parking spots, stop signs, traffic lights, etc. that are used for driving”);
and generates a three-dimensional model of an environment comprised of the object and the traversable space based on the semantic segmentation (Col. 5 lines 2-3, Col. 14 lines 19-30 – “The sensors can generate a 3D model of an environment. The 3D model can be a high definition map…The voxelized geometric map is produced by segmenting the point cloud into voxels that are as small as 5 cm×5 cm×5 cm. During real-time operation, the geometric map is the most efficient way to access point cloud information. Segmentation algorithms identify 3D points in the point cloud for building a model of the ground”; Note: the geometric map is a layer that is part of the high definition map),
wherein the three-dimensional model is used for one or more of processing or transmitting instructions useable by one or more driver assistance features of the vehicle (Col. 45 lines 21-44 – “After the adjusting, aggregating, by a processor, the plurality of 3D models to generate a comprehensive 3D model; combining the comprehensive 3D model with detailed map information; and using the combined comprehensive 3D model with detailed map information to maneuver the vehicle”).
Tran does not teach a non-transitory computer readable medium comprising computer readable instructions nor a two-dimensional image captured by one or more sensors of a vehicle. However, Farooqi teaches a non-transitory computer readable medium comprising computer readable instructions (Col. 10 lines 62-67, Col. 11 lines 1-3) and a two-dimensional image (Col. 5 lines 61-63 – “an original image 405 (which can be a two dimensional RGB image or a multi-dimensional image, etc.) is provided”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the non-transitory computer readable medium of Farooqi for the benefit of having a persistent and reliable way to store and access instructions. It also would have been obvious to incorporate two-dimensional images since two-dimensional images can be captured by the multi-focal camera and radar of Tran.

Furthermore, Tran modified by Farooqi still does not teach performing semantic segmentation of a two-dimensional image based on the bounding area. However, Xia teaches performing semantic segmentation of a two-dimensional image based on the bounding area (Fig. 2 Caption on Page 2, Paragraph 3 in 1st Col. of Page 2 – “First, the object bounding boxes with detection scores are extracted from the test image. Then, a voting based scheme is applied to estimate object shape guidance. By making use of the shape guidance, a graph-cut-based figure ground segmentation provides a mask for each bounding box. Finally, these masks are merged and post-processed to obtain the final result…we propose an efficient, learning-free design for semantic segmentation when the object bounding boxes are available”; Note: see modified screenshot of Fig. 2 above). Since semantic segmentation can be performed on a two-dimensional image, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Xia to perform semantic segmentation of a 2D image based on bounding areas because “Numerous semantic segmentation methods utilize the object bounding box as a prior. The bounding boxes are provided by either user interaction or object detectors. These methods tend to exploit the provided bounding box merely to exclude its exterior from segmentation” (Xia: Paragraph 3 in 2nd Col. of Page 2). In other words, using bounding areas is a common practice in semantic segmentation that provides benefits such as indicating focus areas or indicating unimportant areas. Therefore, the bounding boxes generated by Tran could be used to perform semantic segmentation on a 2D camera image, as disclosed in Xia, making it easier to identify objects.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia and Kobayashi (US 20200258299 A1), hereinafter Kobayashi.
Regarding claim 4, Tran in view of Xia teaches the method of claim 1. Tran does not teach generating for display the three-dimensional model. However, Kobayashi teaches generating for display the three-dimensional model (Paragraph 0050 – “The 3D model(s) of one or more objects out of a large number of objects present in the shooting space is(are) transmitted in response to a request from the reproduction side, and is(/are) reproduced and displayed on the reproduction side”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate Kobayashi’s teaching of generating for display the three-dimensional model so that the user can easily view the three-dimensional model.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia and Dougherty et al. (US 20200312021 A1), hereinafter Dougherty.
Regarding claim 6, Tran in view of Xia teaches the method of claim 5. Tran does not teach wherein the three-dimensional bounding area modifies a display of one or more of the object or the traversable space to include one or more of a color-based demarcation or a text label. However, Dougherty teaches wherein the three-dimensional bounding area modifies a display of one or more of the object or the traversable space to include one or more of a color-based demarcation or a text label (Fig. 10, Paragraph 0069, 0071 – “The machine learning model may further output a bounding box, segmentation mask, or other means for denoting the location of the object within the image frame…the computing device determines the size of the bounding box and compares the determined size to the threshold size for a bounding box to determine whether to associate the image frame with the found amenity or whether to log the found amenity”; Note: An object can be logged due to its bounding box. If it is logged, it will be displayed with a text label as shown in the modified screenshot of Fig. 10 below). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Dougherty to include one or more of a color-based demarcation or a text label in the display for the benefit of indicating to the user what objects have been detected and which objects may be relevant to the user (Dougherty: Paragraph 0013).
[media_image2.png, greyscale: Modified screenshot of Fig. 10 (taken from Dougherty)]
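The bounding-box size gate described above reduces to a one-line comparison; the threshold fraction below is an assumed placeholder, since the quoted passage does not tie Dougherty's comparison to a specific value:

```python
# Hedged sketch: log (and later label) a detection only when its bounding
# box occupies at least a threshold fraction of the image frame.
def should_log(box_wh, frame_wh, min_fraction=0.02):
    box_w, box_h = box_wh
    frame_w, frame_h = frame_wh
    return (box_w * box_h) / float(frame_w * frame_h) >= min_fraction
```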
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia and Kehl et al. (US 20200160033 A1), hereinafter Kehl.
Regarding claim 7, Tran in view of Xia teaches the method of claim 1. Tran does not teach wherein the bounding area is generated in response to identifying a predefined object in the two-dimensional image. However, Kehl teaches wherein the bounding area is generated in response to identifying a predefined object in the two-dimensional image (Paragraph 0052-0054 – “Each feature map 285 may identify objects in the image 280 …The region-of-interest module 225 may determine the one or more regions-of-interest 287 in the image 280 based on the associated feature maps 285 … At 430, the lifting module 227 generates a 3D representation 291 for each region-of-interest 287 … Each 3D representation 291 may be an 8 point box and may have been generated by the lifting module 227 by calculating one or more of a height and width for the associated region-of-interest 287, …”; Note: A bounding box is generated after identifying an object using a predefined feature map). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate Kehl’s teaching of generating the bounding area in response to identifying a predefined object for the benefit of using the bounding area for detecting and avoiding objects that are known to be hazards and for route planning (Kehl: Paragraph 0007, 0022).
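Kehl's lifting of a region-of-interest into an 8-point 3D box can be sketched as back-projecting the box corners at near and far depths; the pinhole intrinsics and depth estimates are assumed inputs, and the function below is illustrative rather than Kehl's actual lifting module:

```python
# Hedged sketch: lift a 2D box (x, y, w, h) to an 8-point 3D box by
# back-projecting its four corners at two assumed depths.
import numpy as np

def lift_box(x, y, w, h, z_near, z_far, fx, fy, cx, cy):
    corners = []
    for z in (z_near, z_far):
        for u, v in ((x, y), (x + w, y), (x + w, y + h), (x, y + h)):
            corners.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.array(corners)  # shape (8, 3): one 3D point per box corner
```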
Regarding claim 8, Tran in view of Xia and Kehl teaches the method of claim 7. Tran does not teach wherein the predefined object is one of a vehicle, a pedestrian, a structure, a driving lane indicator, or a solid object impeding travel along a trajectory from a current vehicle position. However, Kehl teaches wherein the predefined object is a vehicle (Paragraph 0031 – “The region-of-interest module 225 may determine the one or more regions-of-interest 287 using the pixels of the image 280 and one or more of the feature maps 285 generated by the feature module 223. The regions-of-interest 287 may be regions of the image 280 that depict vehicles. Other types of objects may be depicted in each region-of-interest 287”; Note: Objects are predefined by feature maps, and in a region of interest, those objects may be vehicles). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Kehl, where the predefined object is a vehicle, for the benefit of making it quicker and easier to detect and avoid known objects on the road, especially hazardous ones (Kehl: Paragraph 0007, 0022).
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia and Hirano et al. (CN 112319466 A), hereinafter Hirano.
Regarding claim 9, Tran in view of Xia teaches the method of claim 1. Tran does not teach wherein the three-dimensional model comprises a characterization of movement of the object relative to the vehicle and the traversable space based on one or more values assigned to pixels corresponding to the object in the two-dimensional image, wherein the one or more values correspond to one or more of a heading, a depth within a three-dimensional space around the vehicle, or a regression value. However, Hirano teaches wherein the three-dimensional model comprises a characterization of movement of the object relative to the vehicle and the traversable space based on one or more values assigned to pixels corresponding to the object in the two-dimensional image (Fig. 36B-36C, Paragraph 0100 – “FIG. 36C provides two sets of three-dimensional shapes with varying colors and gradients to represent predicted trajectories”; Note: color is a type of value assigned to pixels. Additionally, trajectory represents movement), wherein the one or more values correspond to one or more of a heading, a depth within a three-dimensional space around the vehicle, or a regression value (Paragraph 0099 – “Three-dimensional shapes can similarly be designed with different gradients or colors to better indicate direction of travel, speed”; Note: Direction is the equivalent to heading. The assigned color values of pixels are dependent on direction). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Hirano, wherein the three-dimensional model comprises a characterization of movement, for the benefit of showing the user the possible trajectory of the object in order to help the user or vehicle prevent collisions with the object (Hirano: Paragraph 0006).
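One way to picture Hirano's gradient cue is a mapping from heading and speed to a pixel color; the HSV scheme below is purely an assumed illustration, not Hirano's disclosed rendering:

```python
# Hedged sketch: encode heading as hue and speed as brightness.
import colorsys
import math

def trajectory_color(heading_rad, speed, max_speed=30.0):
    hue = (heading_rad % (2 * math.pi)) / (2 * math.pi)  # direction -> hue
    value = min(speed / max_speed, 1.0)                  # faster -> brighter
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return int(r * 255), int(g * 255), int(b * 255)
```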
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia, Shin et al. (US 20230386043 A1), hereinafter Shin, and Wang.
Regarding claim 10, Tran in view of Xia teaches the method of claim 1. Tran does not teach wherein the bounding area is a second bounding area, wherein the two-dimensional image is a second two-dimensional image, and wherein generating the second bounding area comprises: generating a first bounding area around an object for a first two-dimensional image captured by a first monocular camera; processing data corresponding to pixels within the first bounding area to generate object characterization data; and generating the second bounding area around an object identified in the second two-dimensional image captured by a second monocular camera based on the object characterization data. However, Shin teaches wherein the bounding area is a second bounding area, wherein the two-dimensional image is a second two-dimensional image (Paragraph 0052 – “the object detection apparatus may generate a virtual bounding box of the first object in the second frame image”; Note: The generated bounding box can be considered a second bounding area, as it is generated in the second frame image. Additionally, a frame image is a 2D image), and wherein generating the second bounding area comprises: generating a first bounding area around an object for a first two-dimensional image captured by a first camera (Paragraph 0047-0048 – “the object detection apparatus may acquire a plurality of frame images…the object detection apparatus may detect a first bounding box of a first object… from the first frame image”); processing data corresponding to pixels within the first bounding area to generate object characterization data (Paragraph 0051, 0075-0077 – “the object detection apparatus may assign a first identification value to the first bounding box…When bounding boxes corresponding to the detected objects are generated for each of the plurality of frame images as described above, … data 303 of the bounding box may be generated with respect to an identification value of each object”; Note: After generating a first bounding box, an identification value is obtained, which can be considered equivalent to characterization data); and generating the second bounding area around an object identified in the second two-dimensional image captured by a second camera based on the object characterization data (Paragraph 0052-0054 – “the object detection apparatus may generate a virtual bounding box of the first object in the second frame image…a third bounding box detected from the second frame image is mapped to the virtual bounding box… when the size of the area where the third bounding box overlaps the virtual bounding box is not less than a reference value, the object detection apparatus may assign the first identification value to the third bounding box”; Note: although the prior art refers to a “third” bounding box, it can be considered equivalent to a second bounding box since it corresponds to a bounding box in the second frame image. Additionally, the identification value can be considered equivalent to the characterization data of the object). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Shin to have a second bounding area for the benefit of increasing the accuracy of identifying an object (Shin: Paragraph 0003-0004). 
Furthermore, Tran modified by Shin still does not teach a first monocular camera capturing a first two-dimensional image nor a second monocular camera capturing a second two-dimensional image. However, Wang teaches capturing two-dimensional images by a first and second monocular camera (Paragraph 0016 – “synchronously collecting environmental images by using multiple vehicle-mounted monocular cameras”; Note: There are multiple monocular cameras so it is implied that there is at least a first and second monocular camera. Additionally, monocular cameras can only capture two-dimensional images). Since monocular cameras are a type of camera that can capture two-dimensional images, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the monocular cameras of Wang because monocular cameras are cheaper and require less processing power than other types of cameras, such as binocular ones (Wang: Paragraph 0011). Thus, using monocular cameras would make the method more efficient and less costly.
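Shin's identification-value propagation discussed above reduces to an overlap test between the virtual box and a detected box; the sketch below uses intersection-over-union with an assumed reference value of 0.5, an illustrative stand-in for Shin's unspecified threshold:

```python
# Hedged sketch: carry an object ID across frames when a detected box
# overlaps the predicted "virtual" box by at least a reference value.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def propagate_id(virtual_box, detected_box, obj_id, reference=0.5):
    return obj_id if iou(virtual_box, detected_box) >= reference else None
```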
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia, Wang, and Farooqi.
Regarding claim 13, Tran in view of Xia and Wang teaches the system of claim 11. Tran does not teach wherein the processing circuitry is further configured to: modify the two-dimensional image to visually differentiate between the object and the traversable space by incorporating one or more of a change in a color of pixels comprising one or more of the object or the traversable space or a label corresponding to a predefined classification of pixels comprising one or more of the object or the traversable space; and assigning values to pixels corresponding to the object, wherein the values correspond to one or more of a heading, a depth within a three-dimensional space, or a regression value. However, Farooqi teaches wherein the processing circuitry is further configured to: modify the two-dimensional image to visually differentiate between the object and the traversable space by incorporating one or more of a change in a color of pixels comprising one or more of the object or the traversable space or a label corresponding to a predefined classification of pixels comprising one or more of the object or the traversable space (Col. 5 lines 60-67, Col. 6 lines 1-34 – “Color segmentation is provided herein as an example type of localization/image cropping and, unless otherwise specified, is intended to be one of numerous techniques that can be utilized. Initially, an original image 405 (which can be a two dimensional RGB image or a multi-dimensional image, etc.) is provided that is subsequently cropped 410. This cropping can be based on an object identification or other techniques to reduce the amount of pixels/data that is separate from the object (i.e., portions of the image that are clearly not part of the object are removed, etc.)…This color thresholding groups together pixels having colors within pre-defined bands so that the overall number of colors are reduced…areas surrounded on at least two sides by pixels of a common band can be modified to be common with the nearest pixel grouping”; Note: the pixels of an object are modified to have a specific color); and assigning values to pixels corresponding to the object, wherein the values correspond to one or more of a heading, a depth within a three-dimensional space, or a regression value (Col. 5 lines 34-53 –“This segmentation is performed, for example, by grouping all pixels having similar depth values (i.e., depth values within a pre-defined range of values relative to one another) into one of two or more groups”; Note: values are based on depth). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Farooqi to modify the two-dimensional image and assign values to pixels for the benefit of highlighting the object of interest and removing parts of the image that are not part of the object (Farooqi: Col. 5 lines 60-67, Col. 6 lines 1-5). Additionally, incorporating the teachings of Farooqi would help distinguish between the background and foreground of the image (Farooqi: Col. 5 lines 42-49).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia, Wang, and Kobayashi.
Regarding claim 14, Tran in view of Xia and Wang teaches the system of claim 11. Tran does not teach a display, wherein the processing circuitry is further configured to modify an output of the display with one or more elements of the three-dimensional model. However, Kobayashi teaches a display, wherein the processing circuitry is further configured to modify an output of the display with one or more elements of the three- dimensional model (Paragraph 0050 – “The 3D model(s) of one or more objects out of a large number of objects present in the shooting space is(are) transmitted in response to a request from the reproduction side, and is(/are) reproduced and displayed on the reproduction side”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate Kobayashi’s teaching of generating for display the three-dimensional model so that the user can easily view the three-dimensional model.
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia, Wang, and Dougherty.
Regarding claim 16, Tran in view of Xia and Wang teaches the system of claim 15. Tran does not teach wherein the three-dimensional bounding area modifies a display of one or more of the object or the traversable space to include one or more of a color-based demarcation or a text label. However, Dougherty teaches wherein the three-dimensional bounding area modifies a display of one or more of the object or the traversable space to include one or more of a color-based demarcation or a text label (Fig. 10, Paragraph 0069, 0071 – “The machine learning model may further output a bounding box, segmentation mask, or other means for denoting the location of the object within the image frame…the computing device determines the size of the bounding box and compares the determined size to the threshold size for a bounding box to determine whether to associate the image frame with the found amenity or whether to log the found amenity”; Note: An object can be logged due to its bounding box. If it is logged, it will be displayed with a text label as shown in the modified screenshot above). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Dougherty to include one or more of a color-based demarcation or a text label in the display for the benefit of indicating to the user what objects have been detected and which objects may be relevant to the user (Dougherty: Paragraph 0013).
Claims 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia, Wang, Kirchner (US 11321591 B2), hereinafter Kirchner, and Kehl.
Regarding claim 17, Tran in view of Xia and Wang teaches the system of claim 11. Tran does not teach wherein the processing circuitry is further configured to: identify one or more objects in the two-dimensional image; compare the one or more objects to predefined objects stored in memory; identify the one or more objects as respective predefined objects; and in response to identifying the one or more objects as the respective predefined objects, generate one or more respective bounding areas around the respective predefined objects. However, Kirchner teaches wherein the processing circuitry is further configured to: identify one or more objects in the two-dimensional image (Col. 15 lines 35-48 – “the sensor 12 detects data to define at least one digital representation of the object 14… this may involve capturing video footage, which comprises many images (frames)”; Note: frame images are 2D); compare the one or more objects to predefined objects stored in memory (Col. 16 lines 11-43 – “the comparator derives a likelihood value (β) from a relative position, or other statistical relationship, of the compared feature vector (α) and the reference data. The likelihood value (β) indicates a likelihood that the compared feature vector (α) is the same as, or similar enough to, the signature which the technique 501 has learnt, from the training data 32, as being defined by the defined object … a likelihood value (β) derived by at least two different techniques 501, 502, 503 are combined to derive the composite value (θ)”); identify the one or more objects as respective predefined objects (Col. 16 lines 59-62 – “the system 10 decides, based on the composite value (θ), whether the object 14 detected by the at least one sensor and defined in the digital representation is the defined object”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Kirchner to identify objects as predefined objects for the benefit of reducing inaccuracies and having more consistent identification of objects, specifically hazards, which would help users avoid harm (Kirchner: Col. 28 lines 11-28). Furthermore, Tran modified by Kirchner still does not teach, in response to identifying the one or more objects as the respective predefined objects, generating one or more respective bounding areas around the respective predefined objects. However, Kehl teaches, in response to identifying the one or more objects as the respective predefined objects, generating one or more respective bounding areas around the respective predefined objects (Paragraph 0052-0054 – “Each feature map 285 may identify objects in the image 280…The region-of-interest module 225 may determine the one or more regions-of-interest 287 in the image 280 based on the associated feature maps 285 … At 430, the lifting module 227 generates a 3D representation 291 for each region-of-interest 287 … Each 3D representation 291 may be an 8 point box and may have been generated by the lifting module 227 by calculating one or more of a height and width for the associated region-of-interest 287 …”; Note: A bounding box is generated after identifying an object using a predefined feature map). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Kehl to generate bounding areas in response to identifying predefined objects for the benefit of using the bounding area for detecting and avoiding objects that are known to be hazards and for route planning (Kehl: Paragraph 0007, 0022).
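Kirchner's combination of per-technique likelihood values (β) into a composite value (θ) can be sketched as a weighted sum followed by a threshold decision; the uniform weights and threshold are assumptions, since the quoted passages do not specify how the values are combined:

```python
# Hedged sketch: combine likelihoods from several techniques into a
# composite value and decide whether the detected object is the defined one.
def composite_theta(betas, weights=None):
    weights = weights or [1.0 / len(betas)] * len(betas)
    return sum(b * w for b, w in zip(betas, weights))

def is_defined_object(betas, threshold=0.8):
    return composite_theta(betas) >= threshold
```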
Regarding claim 18, Tran in view of Xia, Wang, Kirchner, and Kehl teaches the system of claim 17. Tran does not teach wherein the predefined object is one of a vehicle, a pedestrian, a structure, a driving lane indicator, or a solid object impeding travel along a trajectory from a current vehicle position. However, Kehl teaches wherein the predefined object is a vehicle (Paragraph 0031 – “The region-of-interest module 225 may determine the one or more regions-of-interest 287 using the pixels of the image 280 and one or more of the feature maps 285 generated by the feature module 223. The regions-of-interest 287 may be regions of the image 280 that depict vehicles. Other types of objects may be depicted in each region-of-interest 287”; Note: Objects are predefined by feature maps, and in a region of interest, those objects may be vehicles). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Kehl, where the predefined object is a vehicle, for the benefit of making it quicker and easier to detect and avoid known objects on the road, especially hazardous ones (Kehl: Paragraph 0007, 0022).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Tran in view of Xia, Wang, and Hirano.
Regarding claim 19, Tran in view of Xia and Wang teaches the system of claim 11. Tran does not teach wherein the three-dimensional model comprises a characterization of movement of the object relative to the vehicle and the traversable space based on one or more values assigned to pixels corresponding to the object in the two-dimensional image, wherein the one or more values correspond to one or more of a heading, a depth within a three-dimensional space around the vehicle, or a regression value. However, Hirano teaches wherein the three-dimensional model comprises a characterization of movement of the object relative to the vehicle and the traversable space based on one or more values assigned to pixels corresponding to the object in the two-dimensional image (Fig. 36B-36C, Paragraph 0100 – “FIG. 36C provides two sets of three-dimensional shapes with varying colors and gradients to represent predicted trajectories”; Note: color is a type of value assigned to pixels. Additionally, trajectory represents movement), wherein the one or more values correspond to one or more of a heading, a depth within a three-dimensional space around the vehicle, or a regression value (Paragraph 0099 – “Three-dimensional shapes can similarly be designed with different gradients or colors to better indicate direction of travel, speed”; Note: Direction is the equivalent to heading. The assigned color values of pixels are dependent on direction). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tran to incorporate the teachings of Hirano, wherein the three-dimensional model comprises a characterization of movement, for the benefit of showing the user the possible trajectory of the object in order to help the user or vehicle prevent collisions with the object (Hirano: Paragraph 0006).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Rong et al. (US 20210150226 A1) teaches a method of generating bounding boxes in 3D scenes.