Prosecution Insights
Last updated: April 19, 2026
Application No. 18/484,193

View Synthesis Pipeline for Rendering Passthrough Images

Non-Final OA (§103)
Filed: Oct 10, 2023
Examiner: BADER, ROBERT N.
Art Unit: 2611
Tech Center: 2600 — Communications
Assignee: Meta Platforms Technologies, LLC
OA Round: 1 (Non-Final)
Grant Probability: 44% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 1m
With Interview: 70%

Examiner Intelligence

Grants 44% of resolved cases.

Career Allow Rate: 44% (173 granted / 393 resolved; -18.0% vs TC avg)
Interview Lift: +26.4% for resolved cases with an interview (strong lift vs. without)
Typical Timeline: 3y 1m average prosecution; 32 applications currently pending
Career History: 425 total applications across all art units

Statute-Specific Performance

§101: 9.9% (-30.1% vs TC avg)
§103: 48.7% (+8.7% vs TC avg)
§102: 13.9% (-26.1% vs TC avg)
§112: 19.5% (-20.5% vs TC avg)
Comparisons are against Tech Center average estimates • Based on career data from 393 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 6, 8-14, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2021/023335 A1 (hereinafter Garcia) in view of U.S. Patent Application Publication 2021/0174570 A1 (hereinafter Bleyer).

Regarding claim 1, the limitations “A method comprising, by a computing system: accessing a depth map and a first image of a scene generated using one or more sensors an artificial reality device; generating, based on the first image, a plurality of segmentation masks respectively associated with a plurality of object types, wherein each segmentation mask identifies pixels in the first image that correspond to the object type associated with that segmentation mask” are taught by Garcia (Garcia, e.g. abstract, paragraphs 23-89, describes a system for rendering passthrough images for a viewer of an HMD, i.e. the claimed artificial reality device. Garcia, e.g. paragraphs 26, 27, 32, 33, 61, teaches that the system includes sensors, e.g. stereo cameras, capturing first image(s) and generating a depth map based on the first images. Further, Garcia, e.g. paragraphs 52-54, 62, teaches that the captured images are processed by a machine learning model to generate one or more segmentation masks for one or more object types, corresponding to the claimed segmentation masks identifying pixels in the first image(s) that correspond to a plurality of object types. Finally, it is noted with respect to claims 14 and 18 that Garcia, e.g. paragraphs 83-85, teaches that the processors may execute instructions stored on non-transitory media to perform the method.)

The limitations “segmenting, using the plurality of segmentation masks, the depth map into a plurality of segmented depth maps respectively associated with the plurality of object types; generating a plurality of meshes using, respectively, the plurality of segmented depth maps” are taught by Garcia (Garcia, e.g. paragraphs 27, 33, 41, 55, 62, teaches that 3D models are generated for the one or more objects identified by the plurality of segmentation masks, i.e. as in paragraphs 27, 33, the dense 3D point cloud identifies the distance of each correspondence point relative to the cameras, where the segmentation mask identifies depths for pixels in the mask using the corresponding points in the dense 3D point cloud as in paragraph 55, resulting in generating a separate 3D model for each identified object using the depths of points identified in the 3D point cloud by the segmentation mask, as in paragraphs 41, 55, 62. That is, as claimed, the depth map is segmented using the segmentation masks to generate a plurality of segmented depth maps, i.e. each of Garcia’s object models includes a segmented set of depth values, and each segmented depth map is used to generate a mesh, i.e. each object has a distinct 3D mesh/model generated from the segmented set of depth values, as in paragraphs 41, 55, 62.)

The limitations “for each eye of a user: … warping the plurality of meshes to generate a plurality of warped meshes for the eye; generating an eye-specific mesh for the eye by compositing the plurality of warped meshes for the eye according to the segmentation information of the [first] image; and rendering an output image for the eye using the [first] image and the eye-specific mesh” are taught by Garcia (Garcia, e.g. paragraphs 24-26, 33, 43-45, 63, teaches that the system renders passthrough images for each of the user’s eyes by deforming, i.e. warping, the 3D meshes/models of the detected objects according to the position of the respective eye, followed by rendering the deformed 3D mesh comprising the 3D object meshes/models using ray-based visibility testing to generate an output image for each eye, where the captured stereo images are used to texture the surfaces of the objects. That is, the deformation/re-projection technique is performed separately for each eye of the user, as in paragraphs 25, 43, using all of the 3D meshes/models of the detected objects, corresponding to the claimed generating a plurality of warped meshes for each eye, which are composited using the ray-based visibility testing and textured using the captured stereo image(s) to render output images for each eye, corresponding to the claimed eye-specific mesh used to render an output image using the first image and the eye-specific mesh.)

The limitations “for each eye of a user: capturing a second image of the scene; generating, based on the second image, segmentation information identifying pixels in the second image that correspond to the plurality of object types; warping the plurality of meshes to generate a plurality of warped meshes for the eye; generating an eye-specific mesh for the eye by compositing the plurality of warped meshes for the eye according to the segmentation information of the second image; and rendering an output image for the eye using the second image and the eye-specific mesh” are not explicitly taught by Garcia (As discussed above, Garcia, e.g. paragraphs 24-26, 33, 43-45, 63, teaches that the system renders passthrough images for each of the user’s eyes by warping the 3D meshes/models of the detected objects according to the position of the respective eye and rendering the deformed meshes using ray-based visibility testing, with the captured stereo images used as texture. While Garcia teaches that the segmentation information of the first image is used to avoid occlusion problems when compositing plural 3D object meshes/models for rendering the passthrough images, i.e. the claimed eye-specific mesh is composited based on the segmentation masks of the first image, e.g. Garcia, paragraphs 46-52, 58-60, 62-64, Garcia does not address using the plurality of 3D meshes from the depth map generated at the same time as the first image with segmentation information of a second image captured after the first image/depth map time, as claimed. It is additionally noted that Garcia, e.g. paragraphs 33-40, also notes that generating depth maps using optical flow requires higher computing performance, suggesting depth measurements may instead be performed using a video encoder depending on the available resources.)

However, this limitation is taught by Bleyer (Bleyer, e.g. paragraphs 29-108, describes an HMD-based passthrough image rendering system analogous to Garcia’s system, with sensors capturing texture and depth images of the environment, e.g. paragraphs 30-35, where the depth map may comprise a plurality of reconstructed objects as in paragraph 31, and the texture and depth maps are used to render passthrough images corresponding to the user’s viewpoint, e.g. paragraphs 67-78, i.e. the 3D meshes/models of the objects in the depth map are re-projected for the user’s viewpoint, analogous to Garcia’s re-projection technique used to generate passthrough images for each of the user’s eye positions. Further, Bleyer, e.g. paragraphs 45-78, teaches that the processing for generating depth maps may be computationally expensive, causing perceivable latency, where one solution is to perform depth processing as an asynchronous process relative to capturing the first image, i.e. a single generated depth map is re-projected for more than one captured first/texture image frame time, corresponding to the claimed capturing a second image of the scene, identifying the pixels thereof used to texture the corresponding 3D mesh/model of the object segmented from the depth map generated at the same time as the first image as part of a warping/re-projection process for rendering a passthrough image at the user’s viewpoint.)

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Garcia’s passthrough rendering system to use Bleyer’s passthrough rendering improvement of performing asynchronous depth map and texture image processing, i.e. either in addition, or as an alternative, to the video encoding technique disclosed by Garcia for generating depth maps corresponding to the first frame, in order to reduce perceivable latency in Garcia’s system related to generating the depth maps. Garcia’s system, so modified, corresponds to the claimed rendering for each eye of the user: as discussed above, Garcia renders passthrough images for each of the user’s eyes by warping the 3D meshes/models of the detected objects according to the position of the respective eye and rendering the deformed meshes using ray-based visibility testing, with the captured stereo images used as texture. If the same depth map and 3D meshes/models were used with second/subsequent texture images for rendering second/subsequent passthrough images, the segmentation of the second/subsequent texture image would be used for performing visibility testing while avoiding occlusion problems, analogous to Garcia, paragraphs 46-52, 58-60, 62-64, and by extension the eye-specific meshes would be generated by compositing the warped meshes using the segmentation masks of the second/subsequent texture image, and the rendered output images would be textured using the second/subsequent texture images and eye-specific meshes.
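Editor's note: the sketch below is a minimal, hypothetical illustration of the per-eye pipeline as the rejection characterizes the claim 1 steps; it is not Garcia's or Bleyer's implementation. Every helper passed in (segment_image, depth_to_mesh, warp_mesh, composite_meshes, rasterize) and all array shapes are placeholders.

```python
# Illustrative sketch (assumptions, not any reference's code): segmentation masks
# split a depth map into per-object meshes, which are then warped, composited,
# and rendered separately for each eye using a later "second image" for texture.
import numpy as np

def render_passthrough(first_image, depth_map, second_image, eye_poses,
                       segment_image, depth_to_mesh, warp_mesh,
                       composite_meshes, rasterize):
    # 1. Per-object-type segmentation masks from the first image ({type: H x W bool}).
    masks = segment_image(first_image)

    # 2. Split the depth map into per-object segmented depth maps.
    segmented_depths = {t: np.where(m, depth_map, np.nan) for t, m in masks.items()}

    # 3. One mesh per object type from its segmented depth map.
    meshes = {t: depth_to_mesh(d) for t, d in segmented_depths.items()}

    # 4. Segmentation information from the later second image; the meshes above
    #    are reused rather than recomputed (asynchronous depth processing).
    second_masks = segment_image(second_image)

    outputs = {}
    for eye, pose in eye_poses.items():          # e.g. {"left": ..., "right": ...}
        # 5. Warp every mesh toward this eye's viewpoint.
        warped = {t: warp_mesh(m, pose) for t, m in meshes.items()}
        # 6. Composite the warped meshes using the second image's segmentation
        #    to resolve occlusions, yielding an eye-specific mesh.
        eye_mesh = composite_meshes(warped, second_masks)
        # 7. Render this eye's output image, texturing with the second image.
        outputs[eye] = rasterize(eye_mesh, texture=second_image, viewpoint=pose)
    return outputs
```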
Regarding claim 6, the limitation “wherein the one or more sensors comprise a pair of stereo cameras” is taught by Garcia (Garcia, e.g. paragraphs 26, 27, 32, 33, 61, teaches that the system includes sensors, e.g. stereo cameras, capturing first image(s).)

Regarding claim 8, the limitation “wherein generating the plurality of meshes further comprises using one or more 3D models of the plurality of object types” is taught by Garcia (As discussed in the claim 1 rejection above, Garcia, e.g. paragraphs 27, 33, 41, 55, 62, teaches that 3D models are generated for the one or more objects identified by the plurality of segmentation masks, i.e. as in paragraphs 27, 33, the dense 3D point cloud identifies the distance of each correspondence point relative to the cameras, where the segmentation mask identifies depths for pixels in the mask using the corresponding points in the dense 3D point cloud as in paragraph 55, resulting in generating a separate 3D model for each identified object using the depths of points identified in the 3D point cloud by the segmentation mask, as in paragraphs 41, 55, 62. Further, Garcia, e.g. paragraphs 43, 55, teaches that the 3D models are defined using a 3D mesh of vertices and triangles. That is, for each instance of each object type identified in the first image, a corresponding 3D model is generated from the segmented portion of the depth map corresponding to one of the plurality of meshes, e.g. Garcia, paragraphs 41, 52-55, 62, 63, corresponding to the claim requirement that the plurality of meshes are generated using a 3D model for each of the plurality of object types identified in the image.)

Regarding claim 9, the limitations “wherein at least one mesh of the plurality of meshes is generated by: identifying an object type associated with the mesh, the object type being selected from the plurality of object types; generating one or more 3D models of the identified object type that fit observed features of one or more objects of the identified object type present in the scene; and using the one or more 3D models to refine the mesh generated from the associated segmented depth map” are taught by Garcia (As discussed in the claim 8 rejection above, Garcia teaches that for each instance of each object type identified in the first image, a corresponding 3D model is generated from the segmented portion of the depth map corresponding to one of the plurality of meshes, e.g. paragraphs 41, 52-55, 62, 63, corresponding to the claim requirement that the plurality of meshes are generated using a 3D model for each of the plurality of object types identified in the image. Garcia, paragraphs 52, 54, 62, teaches that the segmentation masks are generated for a predetermined set of objects of interest, e.g. humans, dogs, cats, etc., such that as claimed the object type associated with each of the plurality of meshes is selected from the plurality of object types. Further, the 3D object meshes/models for each identified instance of each object type are fit to the observed features thereof in the scene, e.g. Garcia, paragraphs 43, 55, 62, corresponding to the claimed generating the 3D models of the identified object type(s) that fit the observed features present in the scene. Finally, Garcia, e.g. paragraphs 46-52, 58-60, 62-64, teaches that the segmentation information of the texture image is used to avoid occlusion problems when compositing plural 3D object meshes/models for rendering the passthrough images, which includes generating a padded border using the segmentation masks for adjusting the set of pixels used for evaluating occlusion during visibility testing, e.g. paragraphs 58-60, 64, corresponding to the claimed using the 3D meshes/models to refine the mesh generated from the associated segmented depth map, i.e. as in the example of paragraph 58, the padded borders are used to improve the segmentation between depth values corresponding to separate objects.)

Regarding claim 10, “wherein the identified object type is at least one of planes people, or static objects in the scene observed over a period of time” is taught by Garcia (Garcia, e.g. paragraphs 52, 62, teaches that the particular objects of interest identified by the segmentation masks produced by the machine learning model include humans, i.e. the claimed identified object type being people.)

Regarding claim 11, the limitation “wherein the plurality of warped meshes, the eye-specific mesh, and the output image generated for a left eye of the user are different from the plurality of warped meshes, the eye-specific mesh, and the output image generated for a right eye of the user” is taught by Garcia (As discussed in the claim 1 rejection above, Garcia, e.g. paragraphs 24-26, 33, 43-45, 63, teaches that the system renders passthrough images for each of the user’s eyes by deforming, i.e. warping, the 3D meshes/models of the detected objects according to the position of the respective eye, followed by rendering the deformed 3D mesh comprising the 3D object meshes/models using ray-based visibility testing to generate an output image for each eye, where the captured stereo images are used to texture the surfaces of the objects. That is, the claimed plurality of warped meshes, composited eye-specific mesh, and output images are separately generated for the different respective locations of the left and right eyes of the user, resulting in the claimed different pluralities of warped meshes, eye-specific meshes, and output images.)

Regarding claim 12, the limitation “wherein the plurality of warped meshes for the eye is generated by warping the plurality of meshes based on a location of a camera of the artificial reality device used for capturing the second image” is taught by Garcia in view of Bleyer (Garcia, e.g. paragraphs 27, 33, teaches that the depths determined from the stereo images correspond to distances relative to the cameras of the HMD used to capture the stereo images, where the 3D positions are determined in 3D space based on knowing the positions of the cameras in the 3D space. Further, Garcia, e.g. paragraphs 25, 43, teaches that deforming/warping the plurality of 3D meshes/models for a given viewpoint is based on the position of the 3D points relative to the viewpoint, i.e. the warping/deformation is based on the location of the claimed camera of the artificial reality device relative to the viewpoint 310. Finally, Bleyer, e.g. paragraphs 54, 57, 67-78, teaches that the passthrough rendering improvement of performing asynchronous depth map and texture image processing includes accounting for the change in position of the camera at the time of capturing the second/subsequent image relative to the position of the camera at the time of capturing the first/depth image, i.e. Garcia’s modified system would perform the warping, compositing, and rendering for each eye for the second/subsequent frame time based on the segmentation masks and texture image of the second/subsequent image, as discussed in the claim 1 rejection above, and additionally the position of the camera at the second/subsequent frame time relative to the user’s eye position(s).)

Regarding claim 13, the limitation “wherein the plurality of warped meshes for the eye [are] generated by warping the plurality of meshes based on a updated pose of the artificial reality device” is taught by Garcia in view of Bleyer (As discussed in the claim 12 rejection above, Garcia, e.g. paragraphs 25, 43, teaches that deforming/warping the plurality of 3D meshes/models for a given viewpoint is based on the position of the 3D points relative to the viewpoint, i.e. the warping/deformation is based on the location of the claimed camera of the artificial reality device relative to the viewpoint 310. Further, as discussed in the claim 12 rejection above, Bleyer, e.g. paragraphs 54, 57, 67-78, teaches that the asynchronous depth map and texture image processing includes accounting for the change in position of the camera at the time of capturing the second/subsequent image relative to the position of the camera at the time of capturing the first/depth image, i.e. Garcia’s modified system would perform the warping, compositing, and rendering for each eye for the second/subsequent frame time based on the segmentation masks and texture image of the second/subsequent image, and additionally the position of the camera at the second/subsequent frame time relative to the user’s eye position(s). Finally, Bleyer, e.g. paragraphs 67-78, teaches that each passthrough image for each eye is generated using an updated or predicted pose of the HMD corresponding to the second/subsequent frame time, such that Garcia’s modified system would perform the deforming/warping of paragraphs 25, 43, using the updated or predicted pose of the HMD at the second/subsequent frame time.)

Regarding claims 14 and 18, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 1 above. Regarding claim 17, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 8 above.

Claims 2, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2021/023335 A1 (hereinafter Garcia) in view of U.S. Patent Application Publication 2021/0174570 A1 (hereinafter Bleyer) as applied to claims 1, 14, and 18 above, and further in view of U.S. Patent 11,410,385 B1 (hereinafter Alderman).

Regarding claim 2, the limitation “wherein the depth map is generated by temporally smoothing an original depth map output by the one or more sensors” is not explicitly taught by Garcia or Bleyer (Neither reference discusses temporally smoothing the depth maps generated from the stereo images.) However, this limitation is taught by Alderman (Alderman, e.g. abstract, cols 3-18, describes a passthrough rendering system analogous to Garcia’s system, i.e. in addition to having a common inventor and assignee, there are many identical paragraphs/columns in the references describing related passthrough rendering systems focused on different aspects of the system.
Alderman, e.g. col 5, line 8 – col 9, line 31, describes the passthrough rendering system using the same description as Garcia, paragraphs 30-40, 42-44, and further teaches, e.g. col 9, line 32 – col 10, line 2, that temporal smoothing can be performed on the depth map data structure by projecting depth data from a previous frame time to the current user’s viewpoint in the current frame time to supplement the depth map data, and in turn performing a smoothing technique on the combined dataset, corresponding to the claimed temporal smoothing of an original depth map output by the device sensors. It is further noted that Alderman’s temporal smoothing feature corresponds to the limitations of claim 7, i.e. the temporal smoothing fills in missing depth information in the depth map, where the depth map is smoothed prior to generating the 3D meshes/models using a Poisson smoothing technique, which one of ordinary skill in the art would recognize is based on applying a filter, e.g. “Poisson Surface Reconstruction”, by Michael Kazhdan, et al., abstract, sections 3, 4, describes Poisson surface reconstruction from points, i.e. the Poisson smoothing technique, which applies a smoothing filter F.)

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Garcia’s passthrough rendering system, using Bleyer’s passthrough rendering improvement of performing asynchronous depth map and texture image processing, to include Alderman’s temporal smoothing technique because Alderman’s technique is applied to the same base passthrough rendering system upon which Garcia’s system is based, i.e. a known improvement to the same system, and also because one of ordinary skill in the art would recognize the benefit of temporally smoothed depth maps in the passthrough rendering system, i.e. consistently representing the same objects in the scene with the same/similar shape over time reduces visual artifacts in the resulting output images in comparison with using depth maps which are not temporally smoothed.

Regarding claims 15 and 19, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 2 above.

Claims 3, 4, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2021/023335 A1 (hereinafter Garcia) in view of U.S. Patent Application Publication 2021/0174570 A1 (hereinafter Bleyer) in view of U.S. Patent 11,410,385 B1 (hereinafter Alderman) as applied to claims 2, 15, and 19 above, and further in view of “Temporal Filtering of Depth Images using Optical Flow” by Razmik Avetisyan, et al. (hereinafter Avetisyan).

Regarding claim 3, the limitations “wherein temporally smoothing the original depth map to generate the depth map comprises: generating optical flow data to represent motion between the first image and a previous image captured by the one or more sensors; generating a predicted depth map associated with a same time stance as the original depth map by applying the optical flow data to a previous depth map; and generating the depth map based on the original depth map and the predicted depth map” are partially taught by Garcia in view of Alderman (As discussed in the claim 2 rejection above, Alderman teaches, e.g. col 9, line 32 – col 10, line 2, that temporal smoothing can be performed on the depth map data structure by projecting depth data from a previous frame time to the current user’s viewpoint in the current frame time to supplement the depth map data, and in turn performing a smoothing technique on the combined dataset, corresponding to the claimed temporal smoothing of an original depth map output by the device sensors. Further, Alderman’s temporal smoothing corresponds to the claimed steps of generating a predicted depth map associated with the same time stance as the original depth map, and generating the depth map based on the original depth map and the predicted depth map, i.e. projecting the t-1 depth information to the current user’s viewpoint is predicting the depth map at the current time t from the user’s viewpoint at time t, and generating a combined depth map using the predicted depth map points and the depth map generated from optical flow between the stereo images. However, while both Garcia, e.g. paragraphs 33, 40, and Alderman, e.g. col 5, lines 47-50, col 6, lines 11-19, teach that the GPU can compute optical flow between the stereo images captured by the sensors, and one of ordinary skill in the art would have understood that optical flow between current and previous frame images could be used for Alderman’s step of projecting the t-1 depth information to the current viewpoint, Alderman does not explicitly indicate how the t-1 depth information is projected to the current viewpoint, i.e. whether the projection is performed by applying optical flow data representing motion between the first image and a previous frame image, as claimed.)

However, this limitation is taught by Avetisyan (Avetisyan, e.g. abstract, sections 1, 3-5, describes a system for enhancing depth images generated by an RGBD camera by performing temporal filtering of the depth images using the optical flow between the RGB/texture images. Avetisyan, e.g. section 3, teaches keeping a sequence of the latest n frames/pairs of RGB and depth images, where for each new frame, forward and backward motion fields are calculated for each pixel between the current frame RGB image and previous frame RGB image using optical flow, and the depth image pixels from prior frames in the sequence are forward projected to identify corresponding depth pixels of the current frame using the corresponding N motion fields, generating for each depth pixel of the current frame a sequence of historic depth values. Further, Avetisyan, section 3, indicates that the sequence of depth values for each pixel is temporally filtered by applying a weighted filter. That is, Avetisyan, section 3, teaches that previous depth image information can be projected into the current frame’s depth map by applying optical flow data calculated between the corresponding RGB/texture images to the depth image pixels, and the resulting projected depth map data and recorded depth map data are combined using a weighted average of the depth pixel sequence for each depth image pixel, corresponding to the claimed generating optical flow between the first and previous images, generating a predicted depth map by applying the optical flow data to a previous depth map, and generating the depth map based on the original and predicted depth map. Finally, it is noted that Avetisyan’s step of applying the motion fields to project the previous frame’s depth pixels to the current frame corresponds to the claimed predicted depth map, i.e. although Avetisyan does not indicate the structure, per se, of the forward projected pixel data, the forward projected depth pixels collectively correspond to the predicted depth map associated with the current frame time stance generated by applying optical flow data to the previous frame depth map.)

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Garcia’s passthrough rendering system, using Bleyer’s passthrough rendering improvement of performing asynchronous depth map and texture image processing, including Alderman’s temporal smoothing technique, to use Avetisyan’s multiple frame temporal smoothing technique because, as noted above, Alderman does not explicitly indicate how the t-1 depth information is projected to the current viewpoint, and Avetisyan describes how t-1 depth information can be projected to the current viewpoint by calculating optical flow between corresponding RGB/texture images of the current and previous frames, and further because Avetisyan’s technique allows for multiple historical depth frames to be used as part of the temporal smoothing process, instead of just the most recent previous frame. In Garcia’s modified passthrough rendering system including Alderman’s temporal smoothing technique using Avetisyan’s multiple frame temporal smoothing technique, one or more previous frames of depth data, e.g. t-1, t-2, etc., would be projected into the current time t viewpoint using optical flow data calculated between the corresponding current and previous frame RGB/texture images, corresponding to the claimed generating a predicted depth map associated with the same time stance as the original depth map by applying the optical flow data to a previous depth map. Further, the combined depth map output by the multiple frame temporal smoothing technique would be generated by applying the weighted averaging filter to the sequence of depth values determined for each pixel of the depth map of the current frame as in Avetisyan’s equation 2.

Regarding claim 4, the limitation “wherein the depth map is generated by averaging the original depth map and the predicted depth map” is taught by Garcia in view of Avetisyan (As discussed in the claim 3 rejection above, in Garcia’s modified passthrough rendering system including Alderman’s temporal smoothing technique using Avetisyan’s multiple frame temporal smoothing technique, the combined depth map output by the multiple frame temporal smoothing technique would be generated by applying the weighted averaging filter to the sequence of depth values determined for each pixel of the depth map of the current frame as in Avetisyan’s equation 2, i.e. as claimed by averaging the original and predicted depth maps.)

Regarding claims 16 and 20, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 3 above.
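Editor's note: the sketch below is a simplified, hypothetical illustration of the temporal smoothing idea discussed in the claim 2 and claim 3 rejections (warp the previous depth map into the current frame using optical flow, then blend with the newly measured depth). It uses backward sampling rather than Avetisyan's forward projection, a single previous frame, and placeholder names; it is not any reference's actual algorithm.

```python
# Minimal sketch under stated assumptions: `flow` is a dense H x W x 2 field
# giving per-pixel motion (in pixels) from the previous image to the current one,
# and invalid depth is encoded as NaN.
import numpy as np

def temporally_smooth_depth(prev_depth, curr_depth, flow, w_prev=0.5):
    h, w = curr_depth.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Approximate where each current pixel came from in the previous frame.
    src_x = np.clip(np.rint(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys - flow[..., 1]).astype(int), 0, h - 1)

    # "Predicted" depth map for the current time, built from the previous depth.
    predicted = prev_depth[src_y, src_x]

    # Weighted average where both values exist; otherwise keep whichever is valid,
    # which also fills holes in the current measurement from the previous frame.
    valid_pred = np.isfinite(predicted)
    valid_curr = np.isfinite(curr_depth)
    smoothed = np.where(valid_pred & valid_curr,
                        w_prev * predicted + (1.0 - w_prev) * curr_depth,
                        np.where(valid_curr, curr_depth, predicted))
    return smoothed
```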
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2021/023335 A1 (hereinafter Garcia) in view of U.S. Patent Application Publication 2021/0174570 A1 (hereinafter Bleyer) as applied to claims 1, 14, and 19 above, and further in view of U.S. Patent Application Publication 2016/0104031 A1 (hereinafter Shotton).

Regarding claim 5, the limitation “wherein the one or more sensors comprise a time-of-flight sensor, and the first image is an output of the time-of-flight sensor” is partially taught by Garcia in view of Bleyer (Garcia does not teach using a time-of-flight sensor, per se, although Bleyer, e.g. paragraphs 30-35, suggests that a passthrough rendering system may use depth maps generated using time-of-flight sensors. While one of ordinary skill in the art would therefore have been motivated to include a time-of-flight sensor in Garcia’s system for generating depth maps, neither Garcia nor Bleyer teaches that the time-of-flight sensor generates, in addition to a depth image, an intensity/texture image which can be segmented using object masks, corresponding to the claim requirement that the first image as mapped in the claim 1 rejection is output by the time-of-flight sensor. Therefore, in the interest of compact prosecution, Shotton is cited for teaching that time-of-flight sensors, in addition to generating depth images, generate intensity images which can be segmented to identify one or more objects therein.)

However, this limitation is taught by Bleyer in view of Shotton (As noted above, Bleyer, e.g. paragraphs 30-35, suggests that the passthrough rendering system may use depth maps generated by time-of-flight sensors. Further, Shotton, e.g. abstract, paragraphs 17-78, discloses a system for processing time-of-flight sensor data, wherein the raw intensity image data is used to identify candidate regions corresponding to objects in the scene for further processing corresponding regions of a depth map, e.g. paragraphs 17-22, 32, 33, 41-44. That is, one of ordinary skill in the art of computer graphics processing would understand that when including Bleyer’s suggested time-of-flight sensor for generating depth maps in Garcia’s modified passthrough rendering system, the intensity/brightness image generated by the time-of-flight sensor could be used as the claimed first image, i.e. identifying objects therein using segmentation masks generated for a plurality of object types using the machine learning model for performing the passthrough rendering.)

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Garcia’s passthrough rendering system, using Bleyer’s passthrough rendering improvement of performing asynchronous depth map and texture image processing, to include Bleyer’s suggested time-of-flight sensor for generating depth maps, and further to use the intensity/brightness image generated by the time-of-flight sensor as the claimed first image as taught by Shotton, because Bleyer suggests including a time-of-flight sensor for generating the depth maps, and one of ordinary skill in the art would have understood, as taught by Shotton, that the time-of-flight sensor also generates intensity/brightness images which can be segmented according to the objects depicted therein.
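Editor's note: a hypothetical sketch of the claim 5 mapping described above, assuming a time-of-flight frame carries both an intensity (brightness) image and a co-registered depth image, with the intensity image standing in for the claimed "first image". ToFFrame, segment_image, and split_depth_by_masks are placeholders, not APIs from any cited reference.

```python
# Sketch under stated assumptions: segment the ToF intensity image exactly as a
# camera image would be segmented, then reuse the masks on the depth image.
from dataclasses import dataclass
import numpy as np

@dataclass
class ToFFrame:
    intensity: np.ndarray   # H x W brightness image from the ToF sensor
    depth: np.ndarray       # H x W depth image from the same exposure

def meshes_from_tof(frame: ToFFrame, segment_image, split_depth_by_masks):
    masks = segment_image(frame.intensity)                 # {object_type: mask}
    return split_depth_by_masks(frame.depth, masks)        # per-object depth/meshes
```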
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2021/023335 A1 (hereinafter Garcia) in view of U.S. Patent Application Publication 2021/0174570 A1 (hereinafter Bleyer) as applied to claim 1 above, and further in view of U.S. Patent 11,410,385 B1 (hereinafter Alderman) in view of “Poisson Surface Reconstruction” by Michael Kazhdan, et al. (hereinafter Kazhdan).

Regarding claim 7, the limitations are similar to those treated in the above rejection(s) and are met by the references as discussed in claim 2 above, i.e. as noted in the claim 2 rejection, Alderman’s temporal smoothing feature performs the claimed filling in, where one of ordinary skill in the art would know, as taught by Kazhdan, that the Poisson smoothing technique is applied using a smoothing filter function, as in Kazhdan section 3. It is noted that this mapping relies on the same modification of Garcia’s system as in the claim 2 rejection, without further modification, as Alderman anticipates using the Poisson smoothing technique for temporal smoothing, and Kazhdan is only cited as evidence that one of ordinary skill in the art would understand the Poisson smoothing technique uses a filter.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT BADER whose telephone number is (571) 270-3335. The examiner can normally be reached 11-7 M-F.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy Goddard, can be reached at 571-272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ROBERT BADER/
Primary Examiner, Art Unit 2611

Prosecution Timeline

Oct 10, 2023
Application Filed
Dec 19, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586334
SYSTEMS AND METHODS FOR RECONSTRUCTING A THREE-DIMENSIONAL OBJECT FROM AN IMAGE
2y 5m to grant · Granted Mar 24, 2026
Patent 12586335
SYSTEMS AND METHODS FOR RECONSTRUCTING A THREE-DIMENSIONAL OBJECT FROM AN IMAGE
2y 5m to grant · Granted Mar 24, 2026
Patent 12541916
METHOD FOR ASSESSING THE PHYSICALLY BASED SIMULATION QUALITY OF A GLAZED OBJECT
2y 5m to grant · Granted Feb 03, 2026
Patent 12536728
SHADOW MAP BASED LATE STAGE REPROJECTION
2y 5m to grant · Granted Jan 27, 2026
Patent 12505615
GENERATING THREE-DIMENSIONAL MODELS USING MACHINE LEARNING MODELS
2y 5m to grant · Granted Dec 23, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 44%
With Interview: 70% (+26.4%)
Median Time to Grant: 3y 1m
PTA Risk: Low
Based on 393 resolved cases by this examiner. Grant probability derived from career allow rate.
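A worked check of the interview figure, assuming the lift is additive in percentage points: the 44% base grant probability plus the +26.4% interview lift gives roughly 70.4%, consistent with the 70% shown above.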
