DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Benedek et al. (“Benedek”).
Regarding claim 1, Benedek teaches one or more processors comprising: processing circuitry to (see Benedek, paragraphs 0042-0044 teaching “a system for generating a three-dimensional model, said system comprising a scanning device adapted for generating a point set corresponding to a scene comprising at least one object shape, a point set dividing module adapted for dividing the point set corresponding to the scene into a foreground point set corresponding to the foreground of the scene, and comprising a subset corresponding to the at least one object shape of the point set corresponding to the scene, and into a background point set corresponding to the background of the scene, an object shape subset dividing module adapted for dividing the foreground point into each of at least one object shape subset corresponding to the at least one object shape, a background modelling module adapted for generating a background three-dimensional model on the basis of the background point set, an optical model-generating module adapted for generating from the optical recordings a three-dimensional model of at least one substituting object shape assignable to each of the at least one object shape, and a model combining module adapted for generating a combined three-dimensional model on the basis of the background three-dimensional model and the three-dimensional model of at least one substituting object shape substituting each of the at least one object shape subset, respectively” where the modules of such a system, which perform the processing on the data as set forth below, are processors comprising processing circuitry to carry out complex operations only possible using a processor with processing circuitry, and where for example paragraph 0150 teaches “combination of various formats of data coming from different sources is performed by software developed for this purpose, by which the data are brought to a common format” and for example “Displaying is preferably carried out by a programme based on a VTK Visualisation Kit” such that, since software has been developed to perform these functions, the modules of the system are processors with processing circuitry to perform the techniques as explained below):
compute, based at least on sensor data generated using one or more sensors of an ego-machine in an environment (see Benedek, paragraphs 0082-0086 teaching “generating a three-dimensional model, a so-called combined three-dimensional model. The combined three-dimensional model comprises parts reconstructed on the basis of a point set corresponding to a scene generated by a scanning device, and parts modelled by three-dimensional models generated on the basis of optical recordings” where “a point set corresponding to a scene is generated by means of a scanning device where the scene comprises at least one object shape” and “a combined three-dimensional model is generated on the basis of the background three-dimensional model and the three-dimensional model of at least one substituting object shape substituting each of the at least one object shape subset, respectively” such that here the computing is based on the point set generated by a scanning device sensor of an ego-machine in an environment as the camera may be for example attached to a moving ego-machine where “LIDAR device converts the distance data into a point set corresponding to a scene in a known manner, and the LIDAR device is located in the centre of the point set” such that this device functions egocentrically with respect to the scene, as the perspective is that of the device capturing the scene and not of outside cameras imaging the device and scene), and based at least on one or more three-dimensional (3D) representations of one or more detected dynamic objects in the environment, a 3D surface topology of the environment (note that a “detected dynamic object” comprises either an object that has dynamic characteristics or that is capable of being dynamic such that it is not required that an object necessarily is moving dynamically in the environment when detected; further note that “(3D) representations” of such objects in the environment comprises any type 
of 3D representation such as their natural 3D representation as existing in the real world or could be some other computed representation of such objects; see Benedek, paragraphs 0082-0086 teaching “generating a three-dimensional model, a so-called combined three-dimensional model. The combined three-dimensional model comprises parts reconstructed on the basis of a point set corresponding to a scene generated by a scanning device, and parts modelled by three-dimensional models generated on the basis of optical recordings” and “a point set corresponding to a scene is generated by means of a scanning device where the scene comprises at least one object shape. Then, in an operational step S120, the point set corresponding to a scene is divided into a foreground point set comprising a subset of at least one object shape corresponding to the foreground of the scene, and into a background point set corresponding to a scene background. In an operational step S130, from the foreground point set, at least one object shape subset corresponding to each of the at least one object shape, respectively, is separated from the foreground point set. In an operational step S140, the three-dimensional model of the scene background is generated on the basis of the background point set. Then, in an operational step S150, a three-dimensional model of at least one substituting object shape assignable to each of the at least one object shape, respectively, is generated from optical recordings. 
And finally, in an operational step S160, a combined three-dimensional model is generated on the basis of the background three-dimensional model and the three-dimensional model of at least one substituting object shape substituting each of the at least one object shape subset, respectively” such that here at least one object is represented in the 3D dimensions in the environment and its 3D shape is determined along with the 3D shape of the background environment such that the 3D model generated is a computed 3D surface topology of the environment and note for example as in paragraph 0012 “object of the invention is to provide a method and a system by which the three-dimensional model of a scene can be generated substantially, that is almost, in real time in a way that some object shapes, for example, moving people, cars or other significant shapes, significant from the aspect of the scene and located in the foreground of the scene, and the three-dimensional model of some static objects are processed on the basis of optical recordings” such that here the objects are dynamic objects; see also paragraphs 0016-0019 teaching “a topographic model is generated by modelling the topographic features of the scene, after the division of the point set corresponding to the scene into a foreground point set and a background point set” and “topographic model is made on the basis of an approximating plane fitted onto the topographic features of the scene” and “topographic model is made on the basis of a parameterised surface which is fitted onto the topographic features of the scene and follows the unevenness of the topographic features”); and
generate a visualization of the environment based at least on generating graphical content for the one or more 3D representations in the 3D surface topology using the sensor data (see Benedek, paragraphs 0082-0086 as explained above where the purpose as explained above is “generating a three-dimensional model” and this 3D model is displayed as a visualization of the environment based on generating graphical content for the one or more 3D representations in the 3D surface topology using the sensor data where “the three-dimensional model of the scene background is generated on the basis of the background point set. Then, in an operational step S150, a three-dimensional model of at least one substituting object shape assignable to each of the at least one object shape, respectively, is generated from optical recordings. And finally, in an operational step S160, a combined three-dimensional model is generated on the basis of the background three-dimensional model and the three-dimensional model of at least one substituting object shape substituting each of the at least one object shape subset, respectively” where this generating of the background and foreground 3D models is generating a visualization of the environment to display as for example paragraph 0148 teaches “combined three-dimensional model obtained by the method of the invention is preferably displayed” and paragraph 0151 teaches “Displaying is preferably carried out” and “requirement imposed on displaying is that it should support the combination of static and dynamic models, allowing their multiplication, while using the calculation capacity optimally. The combined three-dimensional model obtained by the method according to the invention may be displayed in a way that it can be seen, rotated and edited from any arbitrary point of view. In the display environment based on the VTK Visualisation Kit, user interactions with the model, like for example, shifting and scaling are permitted”).
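For clarity of the record, the pipeline quoted above (dividing the scanned point set into foreground and background, modelling each, and combining the models) may be illustrated by the following sketch. The function names and the ground-height threshold are hypothetical and do not appear in Benedek; they merely illustrate the kind of point-set division the reference discloses.

```python
# Illustrative sketch only (hypothetical names and threshold):
# dividing a scanned point set into foreground object-shape candidates
# and background, in the manner of Benedek's steps S110-S120.

def divide_point_set(points, ground_height=0.2):
    """Split (x, y, z) points into foreground and background subsets.

    Points near the ground plane are treated as background; points
    rising above it are treated as foreground object-shape candidates.
    The 0.2 m threshold is an assumption, not taken from the reference.
    """
    foreground = [p for p in points if p[2] > ground_height]
    background = [p for p in points if p[2] <= ground_height]
    return foreground, background

# A toy scene: two ground points and two points belonging to an object.
scene = [(0.0, 0.0, 0.05), (1.0, 2.0, 1.7), (1.1, 2.0, 1.6), (3.0, 1.0, 0.1)]
fg, bg = divide_point_set(scene)
```

In Benedek the background subset then drives the background model (step S140), while each foreground subset is substituted by a model generated from optical recordings (steps S150-S160).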
Regarding claim 2, Benedek teaches all that is required as applied to claim 1 above and further teaches the processing circuitry further to mask the one or more detected dynamic objects during a first pass of generating the visualization (note that to mask such objects is interpreted as any manner of setting such data apart, logically, functionally, or otherwise, from other data where the selection of such data from the full set may be considered a mask of such data when used and for example the masked object may be processed or the data in which the masked objects have been masked may be processed; see Benedek, paragraphs 0082-0086 teaching “in an operational step S120, the point set corresponding to a scene is divided into a foreground point set comprising a subset of at least one object shape corresponding to the foreground of the scene, and into a background point set corresponding to a scene background. In an operational step S130, from the foreground point set, at least one object shape subset corresponding to each of the at least one object shape, respectively, is separated from the foreground point set. In an operational step S140, the three-dimensional model of the scene background is generated on the basis of the background point set” such that here this dividing out of the foreground points of the scene functions to mask the detected dynamic objects which is during a first pass of generating the visualization where a pass is some attempt at processing relating to the method or is a passing of such data used in generating the visualization, and for example also “at least one object shape subset corresponding to each of the at least one object shape, respectively, is separated from the foreground point set” such that this also functions to mask the object through such separation giving the ability to process that masked object, such as for generating a visualization of the masked object or background).
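The sense of “masking” applied above, i.e., setting the dynamic-object data apart so that either the masked subset or the remainder can be processed, may be sketched as follows. All names are hypothetical; the sketch is not taken from the Benedek reference.

```python
# Illustrative sketch only (hypothetical names): masking detected
# dynamic objects by tagging each point with an object label (or None
# for background), so the masked subset can be processed separately
# from the remaining data during a first pass.

def mask_dynamic_objects(points, labels):
    """Return the boolean mask, the masked (object) points, and the rest."""
    mask = [lab is not None for lab in labels]
    masked = [p for p, m in zip(points, mask) if m]
    remainder = [p for p, m in zip(points, mask) if not m]
    return mask, masked, remainder

pts = [(0, 0, 0), (1, 2, 1), (3, 1, 0)]
labs = [None, "person-1", None]          # one detected dynamic object
mask, objects, background = mask_dynamic_objects(pts, labs)
```

Under the claim interpretation above, either the separated object points or the remaining background points constitute data "masked" for subsequent processing.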
Regarding claim 3, Benedek teaches all that is required as applied to claim 1 above and further teaches the processing circuitry further to: compute a first 3D surface topology of the environment representing a static portion of the environment (see Benedek, paragraphs 0082-0086 teaching “a point set corresponding to a scene is generated by means of a scanning device where the scene comprises at least one object shape. Then, in an operational step S120, the point set corresponding to a scene is divided into a foreground point set comprising a subset of at least one object shape corresponding to the foreground of the scene, and into a background point set corresponding to a scene background. In an operational step S130, from the foreground point set, at least one object shape subset corresponding to each of the at least one object shape, respectively, is separated from the foreground point set. In an operational step S140, the three-dimensional model of the scene background is generated on the basis of the background point set. Then, in an operational step S150, a three-dimensional model of at least one substituting object shape assignable to each of the at least one object shape, respectively, is generated from optical recordings. 
And finally, in an operational step S160, a combined three-dimensional model is generated on the basis of the background three-dimensional model and the three-dimensional model of at least one substituting object shape substituting each of the at least one object shape subset, respectively” such that a first 3D surface topology of the environment representing a static portion of the environment corresponds to the background point set which is modeled as a background three-dimensional model where such model represents the 3D surface topology of the environment; see also paragraphs 0016-0019 teaching “a topographic model is generated by modelling the topographic features of the scene, after the division of the point set corresponding to the scene into a foreground point set and a background point set” and “topographic model is made on the basis of an approximating plane fitted onto the topographic features of the scene” and “topographic model is made on the basis of a parameterised surface which is fitted onto the topographic features of the scene and follows the unevenness of the topographic features” such that the background is a static portion of the environment and its 3D surface topology is computed where this is part of the background 3D modeling as in paragraphs 0082-0086 explained above where “the three-dimensional model of the scene background is generated on the basis of the background point set”); and update the first 3D surface topology based at least on inserting the one or more 3D representations of the one or more detected dynamic objects into the first 3D surface topology (see Benedek, paragraphs 0016-0019 as explained above where “a projected foreground point set is generated by projecting the foreground point set to the topographic model, at least one projected object shape subset corresponding to each of the at least one object shape, respectively, is generated by dividing the projected foreground point set by means of shape filtering and/or 
dimensional fitting, and the at least one object shape subset is determined on the basis of the at least one projected object shape subset. In the present embodiment of the method, the object shape subsets are generated on the basis of projection to a topographic model” such that this projecting is such inserting of the 3D representations of the dynamic objects into the first 3D surface topology such that this creates the combined 3D representation as in paragraphs 0082-0086 teaching “in an operational step S150, a three-dimensional model of at least one substituting object shape assignable to each of the at least one object shape, respectively, is generated from optical recordings. And finally, in an operational step S160, a combined three-dimensional model is generated on the basis of the background three-dimensional model and the three-dimensional model of at least one substituting object shape substituting each of the at least one object shape subset, respectively” such that this inserts the 3D representation of the dynamic object into the first 3D surface topology).
Regarding claim 4, Benedek teaches all that is required as applied to claim 1 above and further teaches wherein the one or more detected dynamic objects includes at least one rigid object of one or more classes of rigid objects (note that this requires at least one rigid object of one class of rigid objects where such classes need not be generated nor objects actually classified so long as the object could be considered a rigid object which would put it in a class of rigid objects, and note that a rigid object is considered an object that is rigid in some way where rigid is taken to mean something is unable to bend or be forced out of shape or is not flexible, without action by some outside force where for example a vehicle or building or tree or other immovable objects are examples of rigid objects; see Benedek, paragraphs 0152-0156 teaching vehicles detected as dynamic objects which are rigid objects of a class of rigid objects where “the invention described below is covered in association with such a scene, where the object shapes in the foreground may be human figures and also vehicles. 
These foreground object shapes may be stationary and also mobile similarly to the discussion above”), wherein the processing circuitry is further to generate one or more 3D representations of the at least one rigid object based at least on warping one or more detected depth values corresponding to the at least one rigid object using one or more detected trajectories corresponding to the at least one rigid object (note that “warping” is considered any spatial transformation, distortion, shifting, re-projection, or the like of data elements from one coordinate state or coordinate system to another state or system, where such could comprise geometric transformations applied to data points to align them, correct for motion, or map them onto a different surface or reference frame and warping a depth value in the context of the claims could involve capturing depth data points for tracked objects at different times and locations and mathematically transforming and thus warping such points to align them into a single, coherent 3D representation; see Benedek, paragraphs 0152-0156 as explained above teaching vehicles detected as dynamic objects which are rigid objects of a class of rigid objects and paragraphs 0163-0178 teaching that to generate the 3D representation of such objects this involves a warping of detected depth values through registering such depth values of tracked objects over time in the “point set registration” where “the time series of the point set corresponding to a scene is generated by a scanning device, which has been in different places when recording the various point sets, i.e. the time series of the point set corresponding to a scene is generated in a way that the scanning device is moved. Such a situation can be conceived for example if the scanning device is fitted on top of a vehicle, and during the movement of the vehicle the members of the time series of the point set are recorded on an ongoing basis. 
In such cases, at least in one part of the members of the time series of the point set corresponding to a scene, a so-called point set registration is carried out, i.e. the point sets coming from various points are transformed to a common co-ordinate system, and through this a registered point set is established” and “After projection to the common co-ordinate system, a dense point set is obtained about the scene” and for example “by the registration of the point sets, the quality of information obtainable from each object shape has been substantially improved, and the registered point set subset corresponding to the static type of object shapes is much denser than the object shape subsets shown on the left hand side. This means that to the static object shapes, in such a way, a very high resolution combined object shape subset may be assigned in the point set” such that here as the trajectory of the points are detected as belonging to the same object these points may be warped in order to register them such that a 3D representation relating to the object and where it appears in the warped depth data can be determined).
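The “point set registration” relied upon above — transforming point sets recorded from different scanner poses into a common coordinate system — may be illustrated by the following sketch. The pose values and function names are hypothetical and do not appear in Benedek; a real system would estimate the poses rather than assume them.

```python
import math

# Illustrative sketch only: point set registration in the sense of
# Benedek's paragraphs 0163-0178 -- warping point sets recorded from
# different scanner positions into a common coordinate system by a
# rigid transform (here, a yaw rotation plus a translation).

def register(points, yaw, translation):
    """Warp (x, y, z) points into the common frame by a rigid transform."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]

# Two scans of the same object point, taken with the scanner rotated
# 90 degrees between recordings (hypothetical poses):
scan_a = [(1.0, 0.0, 0.5)]
scan_b = [(0.0, -1.0, 0.5)]
common = register(scan_a, 0.0, (0, 0, 0)) + register(scan_b, math.pi / 2, (0, 0, 0))
```

After registration both scans contribute points at the same location in the common frame, which is how the reference obtains the denser, higher-resolution combined object shape subsets it describes.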
Regarding claim 5, Benedek teaches all that is required as applied to claim 1 above and further teaches the processing circuitry further to fuse, into the 3D surface topology, at least a first 3D representation of at least a first detected dynamic object of the one or more detected dynamic objects generated based at least on (see Benedek, paragraph 0082 teaching “In an operational step S130, from the foreground point set, at least one object shape subset corresponding to each of the at least one object shape, respectively, is separated from the foreground point set. In an operational step S140, the three-dimensional model of the scene background is generated on the basis of the background point set. Then, in an operational step S150, a three-dimensional model of at least one substituting object shape assignable to each of the at least one object shape, respectively, is generated from optical recordings. And finally, in an operational step S160, a combined three-dimensional model is generated on the basis of the background three-dimensional model and the three-dimensional model of at least one substituting object shape substituting each of the at least one object shape subset, respectively” and see paragraph 0148 teaching “last step of the method according to the invention is the integration of the point set corresponding to the scene and the three-dimensional models obtained on the basis of the optical recordings, i.e. generating a combined three-dimensional model by means of the three-dimensional model of the background and the three-dimensional model of at least one substituting object shape. 
The combined three-dimensional model obtained by the method of the invention is preferably displayed” where the generating of the combined 3D model fuses into the surface topology a 3D version of the object that is substituted and fused into the representation where detected dynamic objects are treated as explained further below): tracking a trajectory of the first detected dynamic object (see Benedek, paragraphs 0103-0110 teaching “to track the object shapes, i.e. to determine their trajectories will be disclosed” where “It is attempted with the STA module to fit each actually detected object shape candidate to the considered trajectories on the basis of the positions of the weighted centres of the projected object shapes” and “On the basis of these two data, i.e. in accordance with the earlier movement of an object shape, the estimate given for the new position and according to the actually measured position, a distance matrix D is written” and “on the basis of the location points, consequently at least one trajectory is determined by performing the following steps cyclically. A next location point in sequence is assigned for the at least one object shape, the assigned location point is corrected after examination with a Kalman-filter, and finalising the corrected location point, and a proposal is made by means of a Kalman-filter, for the next location point in sequence, for the at least one object shape” such that “the input of the procedural steps described above is the time series of the point set corresponding to the scene, where each point is marked by a foreground or a background tag, i.e. each level of the point set corresponding to a scene time series is divided into a foreground point set and a background point set. 
As the output of this step, the object shape subsets of the foreground point set are obtained, and the object shape subsets corresponding to the same object shape preferably maintain the very same tag in the complete time series of the point set corresponding to a scene, i.e. in the time series of the point set corresponding to a scene, the object shape subsets corresponding to each object shape can be tracked” and “for each element of the time series of the point set corresponding to a scene, the projected object shape subsets corresponding to the object shapes can be obtained, i.e. again a time series of the projected object shape subsets are generated. On the basis of the at least one projected object shape subset, the location points series of at least one object shape on the topographic model is determined, and on the basis of the series of location points, at least one trajectory is defined for each of the at least one object shape. In the embodiments where the trajectory is determined, as the location point of at least one object shape—in the case of human figures principally the locations of the feet of the object shape on the ground plane—preferably the weighted centre of the projected object shape subset corresponding to the given object shape is selected” such that here this tracks a trajectory of the first and all relevant detected dynamic objects); identifying one or more detected depth values representing the first detected object in a previous time slice, and warping the one or more detected depth values using the trajectory (see Benedek, paragraphs 0103-0110 as explained above where the detected depth values correspond to the 3D point sets of the objects above and depth values from previous time slices are identified as where “on the basis of the location points, consequently at least one trajectory is determined by performing the following steps cyclically. 
A next location point in sequence is assigned for the at least one object shape, the assigned location point is corrected after examination with a Kalman-filter, and finalising the corrected location point, and a proposal is made by means of a Kalman-filter, for the next location point in sequence, for the at least one object shape” and as in paragraphs 0163-0178 teaching that to generate the 3D representation of such objects this involves a warping of detected depth values through registering such depth values of tracked objects over time in the “point set registration” where “the time series of the point set corresponding to a scene is generated by a scanning device, which has been in different places when recording the various point sets, i.e. the time series of the point set corresponding to a scene is generated in a way that the scanning device is moved. Such a situation can be conceived for example if the scanning device is fitted on top of a vehicle, and during the movement of the vehicle the members of the time series of the point set are recorded on an ongoing basis. In such cases, at least in one part of the members of the time series of the point set corresponding to a scene, a so-called point set registration is carried out, i.e. the point sets coming from various points are transformed to a common co-ordinate system, and through this a registered point set is established” and “After projection to the common co-ordinate system, a dense point set is obtained about the scene” and for example “by the registration of the point sets, the quality of information obtainable from each object shape has been substantially improved, and the registered point set subset corresponding to the static type of object shapes is much denser than the object shape subsets shown on the left hand side. 
This means that to the static object shapes, in such a way, a very high resolution combined object shape subset may be assigned in the point set” such that here as the trajectory of the points are detected as belonging to the same object these points may be warped in order to register them such that a 3D representation relating to the object and where it appears in the warped depth data can be determined).
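The trajectory tracking and warping mapped above may be illustrated by a simplified sketch: a constant-velocity predictor standing in for the Kalman-filter proposal step Benedek describes in paragraphs 0103-0110, followed by shifting depth points detected in a previous time slice along the predicted trajectory. All names and values are hypothetical and are not taken from the reference.

```python
# Illustrative sketch only (hypothetical names): a constant-velocity
# predictor (a simplification of Benedek's Kalman-filter proposal for
# the next location point), and a warp of previously detected depth
# points along the tracked trajectory.

def predict_next(trajectory):
    """Propose the next location from the last two tracked positions."""
    (x0, y0), (x1, y1) = trajectory[-2], trajectory[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

def warp_previous_points(points, old_pos, new_pos):
    """Shift depth points from the object's old position to its new one."""
    dx, dy = new_pos[0] - old_pos[0], new_pos[1] - old_pos[1]
    return [(x + dx, y + dy, z) for x, y, z in points]

track = [(0.0, 0.0), (1.0, 0.0)]      # object moving 1 m per frame in x
pred = predict_next(track)            # proposed next location: (2.0, 0.0)
prev_points = [(1.0, 0.1, 1.5)]       # depth points from the prior slice
warped = warp_previous_points(prev_points, track[-1], pred)
```

In the mapping above, such warping of prior-slice depth values along the trajectory corresponds to the registration of the object's point sets over time.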
Regarding claim 6, Benedek teaches all that is required as applied to claim 1 above and further teaches wherein the one or more detected dynamic objects includes at least one non-rigid object of one or more classes of non-rigid objects (see Benedek, paragraphs 0010-0012 teaching that detected dynamic objects can be non-rigid objects belonging to one or more classes of non-rigid objects such as “stationary or moving people” where people are considered non-rigid objects), and wherein the processing circuitry is further to generate one or more 3D representations of the at least one non-rigid object based at least on inserting, for the at least one non-rigid object, a 3D representation of a two-dimensional (2D) surface at a location in the 3D surface topology corresponding to a detected location of the at least one non-rigid object in the environment (note that a “3D representation of a two-dimensional (2D) surface” is extremely broad and would encompass the display of any 3D data on a 2D display surface as ultimately each 3D point is represented by a 2D pixel corresponding to the visible surface of the 3D point from whatever camera is responsible for capturing the scene, and for example would correspond to polygon facets of a 3D mesh model which are 2D triangle surfaces of a 3D representation for example and further note this would encompass a texture map applied to a surface representing a 3D object as well, and for example would also cover more explicitly defined or simple 2D shapes in a 3D environment such as an explicitly defined 2D billboard type object placed into a 3D space; see Benedek, paragraphs 0152-0156 teaching “dynamic three-dimensional models may be multiplied not only in space, but also in time” and “a walking person can be displayed” and “the object shapes in the foreground may be human figures and also vehicles. 
These foreground object shapes may be stationary and also mobile similarly to the discussion above” and as in paragraphs 0165-0174 a 3D representation of the non-rigid object may be substituted for the point sets corresponding to the non-rigid object where “a combined three-dimensional model of the scene can be generated, in which object shape subsets corresponding to vehicles and/or object shape subsets corresponding to human figures are substituted by substituting three-dimensional models made on the basis of optical recordings” and “on the basis of the time stamps, at least one, stationary shape associated, static combined object shape subset and/or at least one, moving shape associated, dynamic object shape subset is separated in the registered point set” and “On the different time levels, object shape subsets 100a, 100b, 100c are associated one by one with the given object shape” and “from the aspect of substituting the three-dimensional model of the substituting object shape, it does not have a significance how the trajectory serving as a basis for the fitting of the three-dimensional model was obtained, and it is only to be determined how the three-dimensional model should be fitted to the trajectory (with its point of contact on the ground or with the centre of its volume)” such that this 3D representation is of a 2D surface at a location in the 3D surface topology corresponding to a detected location of the object in the environment as the insertion or substituting is done at the locations of the detected dynamic object being tracked, and note that as in paragraphs 0141-0148 it is evidenced that the 3D representation may comprise a 3D representation of a 2D surface as the “triangular lattice” and “textured” triangular lattices of the 3D objects that are inserted are 3D representations of such 2D triangular textured lattices as appearing in the 3D space as generated).
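Under the broadest reasonable interpretation set out above, a “3D representation of a 2D surface” encompasses an upright billboard-type quad placed at the object's detected location in the 3D topology. The following sketch is illustrative only; the quad dimensions and names are hypothetical and do not appear in Benedek.

```python
# Illustrative sketch only (hypothetical names and dimensions):
# inserting a 2D "billboard" surface for a non-rigid object (e.g. a
# pedestrian) at its detected (x, y, z) location in the 3D topology.
# The quad is a flat 2D surface represented by four 3D corner vertices.

def billboard_at(location, width=0.6, height=1.8):
    """Return the four corners of an upright quad centred at `location`."""
    x, y, z = location
    hw = width / 2.0
    return [(x - hw, y, z), (x + hw, y, z),
            (x + hw, y, z + height), (x - hw, y, z + height)]

quad = billboard_at((2.0, 5.0, 0.0))   # detected pedestrian location
```

The textured triangular lattice facets that Benedek's substituting models comprise are, under the interpretation above, likewise 3D representations of 2D surfaces placed in the 3D space.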
Regarding claim 7, Benedek teaches all that is required as applied to claim 1 above and further teaches the processing circuitry further to fuse into the 3D surface topology at least a first 3D representation of a flat surface at a location corresponding to a detected centroid of a corresponding one of the one or more detected dynamic objects (note that a “3D representation of a flat surface” corresponds to a 3D representation of a 2D surface as a flat surface may be considered a 2D surface, though a flat surface may also be described in three dimensions as well if for example the surface points all belong to only two of the described dimensions of the coordinate system, and thus as explained above this would encompass the display of any 3D data on a 2D display surface as ultimately each 3D point is represented by a 2D pixel corresponding to the visible surface of the 3D point from whatever camera is responsible for capturing the scene, and for example would correspond to polygon facets of a 3D mesh model which are 2D triangle surfaces of a 3D representation, and further note this would encompass a texture map applied to a surface representing a 3D object as well, and for example would also cover more explicitly defined or simple 2D shapes in a 3D environment such as an explicitly defined 2D billboard type object placed into a 3D space; note that a centroid is considered any central point of a set of data where the centrality is not limited to any specific aspect but a point may be functionally a centroid if used as the center of some object or set of points or the like; see Benedek, paragraphs 0032-0033 teaching “[t]he time series of at least one object shape subset is generated on the basis of the time stamps, from the at least one dynamic object shape subset, and a trajectory to the time series of the at least one object shape in the time series of the at least one object shape subset is assigned on the basis of the weighted centres of the at least one object shape subset. In the present embodiment of the invention, the trajectory of a dynamic object shape can be determined in a way other than that of the embodiments above, and the three-dimensional model of the substituting object shape can be substituted to this trajectory” such that here the “centres” are detected centroids and these positions are used to fuse or substitute the 3D models of the substituting object shape into the 3D surface topology where such representation is of a flat surface at a location corresponding to the centroid as the 3D model comprises 3D representations of flat surfaces as in paragraphs 0141-0148, where it is evidenced that the 3D representation may comprise a 3D representation of a 2D or flat surface as the “triangular lattice” and “textured” triangular lattices of the 3D objects that are inserted are 3D representations of such 2D, flat, triangular textured lattices as appearing in the 3D space as generated).
Regarding claim 8, Benedek teaches all that is required as applied to claim 1 above and further teaches wherein the generating graphical content for the one or more 3D representations is based at least on a segmented set of the sensor data classified as corresponding to the one or more detected dynamic objects (see Benedek, paragraphs 0082-0086 teaching “a point set corresponding to a scene is generated by means of a scanning device where the scene comprises at least one object shape. Then, in an operational step S120, the point set corresponding to a scene is divided into a foreground point set comprising a subset of at least one object shape corresponding to the foreground of the scene, and into a background point set corresponding to a scene background. In an operational step S130, from the foreground point set, at least one object shape subset corresponding to each of the at least one object shape, respectively, is separated from the foreground point set. In an operational step S140, the three-dimensional model of the scene background is generated on the basis of the background point set. Then, in an operational step S150, a three-dimensional model of at least one substituting object shape assignable to each of the at least one object shape, respectively, is generated from optical recordings. And finally, in an operational step S160, a combined three-dimensional model is generated on the basis of the background three-dimensional model and the three-dimensional model of at least one substituting object shape substituting each of the at least one object shape subset, respectively” such that here this division into different point sets of the sensor data is segmenting of the sensor data which is classified as background objects or dynamic foreground objects which are then processed accordingly).
Regarding claim 9, Benedek teaches all that is required as applied to claim 1 above and further teaches wherein the one or more processors are comprised in at least one of (note that in each “system for” limitation below it is not required that the technique be actually used or being used in such a system so long as the system is capable of being used for such a purpose in any manner as such systems do not breathe any new life or meaning into the claim limitations; note that the method and processors can be considered to be comprised in numerous of the systems below, but as the claim is recited in the alternative only one limitation will be specifically addressed):
a control system for an autonomous or semi-autonomous machine;
a perception system for an autonomous or semi-autonomous machine;
a system for performing simulation operations;
a system for performing digital twin operations;
a system for performing light transport simulation;
a system for performing collaborative content creation for 3D assets;
a system for performing deep learning operations;
a system for performing real-time streaming;
a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content (see Benedek, paragraphs 0082-0086 as explained above where the “combined three-dimensional model” may be considered any or all of augmented reality content, virtual reality content, or mixed reality content as the real world data is used to generate and display virtual reality content and the virtual reality content is mixed with the real world data such that the output is augmented, virtual, and/or mixed reality content and is thus generated and displayed as well);
a system implemented using an edge device;
a system implemented using a robot;
a system for performing conversational AI operations;
a system for generating synthetic data;
a system for generating synthetic data using AI;
a system incorporating one or more virtual machines (VMs);
a system implemented at least partially in a data center; or
a system implemented at least partially using cloud computing resources.
Regarding claims 10-18, the instant claims correspond to a “system comprising one or more processors” where the system performs the same functions as recited with regard to the device of “One or more processors” as in claims 1-9, respectively. Such processors can already be considered a system for performing such functions. In light of this, the limitations of claims 10-18 correspond to the limitations of claims 1-9, respectively; thus they are rejected on the same grounds as claims 1-9, respectively.
Note that claim 11 contains a further recitation that the “first pass” is “a first pass of texturizing the detected 3D surface topology” where claim 2 only requires a “first pass of generating the visualization.” This limitation is also taught by Benedek as in the rejection of claim 2, where the first pass may be considered a first pass of texturizing the detected 3D surface topology (note that to mask such objects is interpreted as any manner of setting such data apart, logically, functionally, or otherwise, from other data, where the selection of such data from the full set may be considered a mask of such data when used, and for example the masked object may be processed or the data in which such objects have been masked may be processed; see Benedek, paragraphs 0082-0086 teaching “in an operational step S120, the point set corresponding to a scene is divided into a foreground point set comprising a subset of at least one object shape corresponding to the foreground of the scene, and into a background point set corresponding to a scene background. In an operational step S130, from the foreground point set, at least one object shape subset corresponding to each of the at least one object shape, respectively, is separated from the foreground point set. 
In an operational step S140, the three-dimensional model of the scene background is generated on the basis of the background point set” such that here this dividing out of the foreground points of the scene functions to mask the detected dynamic objects during a first pass of generating the visualization, where a pass is some attempt at processing relating to the method or is a passing of such data used in generating the visualization, and for example also “at least one object shape subset corresponding to each of the at least one object shape, respectively, is separated from the foreground point set” such that this also functions to mask the object through such separation, giving the ability to process that masked object, such as for generating a visualization of the masked object or background, and such first pass may be considered a first pass of texturizing the detected 3D surface topology as this data is first passed in order to mask such objects so that they may be substituted with a texturized version of the detected 3D surface topology).
Regarding claims 19-20, the instant claims recite a method comprising the same functions as performed by the device as in claims 1 and 9, respectively, such that a device performing such a method as explained with regard to claim 1, such as the device of Benedek, is also a device performing the methods of claims 19-20. In light of this, the limitations of claims 19-20 correspond to the limitations of claims 1 and 9, respectively; thus they are rejected on the same grounds as claims 1 and 9, respectively.
Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over at least claims 1 and 11 of copending Application No. 18670373 (reference application) in view of Benedek. Although the claims at issue are not identical, they are not patentably distinct from each other as explained below and with reference to the following table.
Conflicting Application No. 18670373, Claim 1:
One or more processors comprising:
processing circuitry to:
compute, based at least on sensor data generated using one or more sensors of an ego-machine in an environment, a three-dimensional (3D) surface topology of the environment; and
generate a two-dimensional visualization that applies the sensor data to the 3D surface topology.
Pending Application No. 18670416, Claim 1:
One or more processors comprising:
processing circuitry to:
compute, based at least on sensor data generated using one or more sensors of an ego-machine in an environment, and based at least on one or more three-dimensional (3D) representations of one or more detected dynamic objects in the environment, a 3D surface topology of the environment; and
generate a visualization of the environment based at least on generating graphical content for the one or more 3D representations in the 3D surface topology using the sensor data.
Thus, it can be seen that the instant claim 1 differs through no recitation of a “two-dimensional” visualization, and through the conflicting claim not requiring “based at least on one or more three-dimensional (3D) representations of one or more detected dynamic objects in the environment”. However, the visualization in the instant claim 1 must at least comprise a two-dimensional representation, and thus there is no distinction there. The remaining feature missing from conflicting claim 1 needed to make the claim scopes co-extensive is rendered obvious by the teachings of Benedek, which teach all of the limitations of claim 1 including generating a 2D representation based on 3D representations of detected dynamic objects. Thus, modifying conflicting claim 1 to arrive at the claimed invention, for each dependent claim, using the applicable techniques taught above by Benedek would have been obvious to one of ordinary skill in the art before the effective filing date of the invention, as adding such features is known as explained above, and doing so would yield predictable results and result in an improved system. Note that the abbreviated rationale above is provided in the interest of brevity and given that the claims are likely subject to further amendment. Note that the dependent claims also recite similar subject matter and thus are rejected on similar nonstatutory double patenting grounds as the parent claims, with Benedek modifying the conflicting independent claim to arrive at each dependent claim.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT E SONNERS whose telephone number is (571)270-7504. The examiner can normally be reached Monday-Friday, 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SCOTT E SONNERS/Examiner, Art Unit 2613
/XIAO M WU/Supervisory Patent Examiner, Art Unit 2613
1 US PGPUB No. 20160093101