Prosecution Insights
Last updated: April 19, 2026
Application No. 18/976,021

NEAR EYE DISPLAY SYSTEM WITH MACHINE LEARNING (ML) BASED STEREO VIEW SYNTHESIS OVER A WIDE FIELD OF VIEW

Status: Non-Final OA (§103)
Filed: Dec 10, 2024
Examiner: HODGES, SUSAN E
Art Unit: 2425
Tech Center: 2400 (Computer Networks)
Assignee: Meta Platforms Technologies, LLC
OA Round: 1 (Non-Final)

Grant Probability: 67% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 4m
Grant Probability with Interview: 81%

Examiner Intelligence

Career Allow Rate: 67% (250 granted / 375 resolved), above average (+8.7% vs Tech Center avg)
Interview Lift: +14.4% (moderate) across resolved cases with an interview
Typical Timeline: 2y 4m average prosecution; 31 applications currently pending
Career History: 406 total applications across all art units
Statute-Specific Performance

§101: 6.0% (-34.0% vs TC avg)
§103: 48.7% (+8.7% vs TC avg)
§102: 20.9% (-19.1% vs TC avg)
§112: 22.6% (-17.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 375 resolved cases.

Office Action

Non-Final Rejection under 35 U.S.C. §103, mailed Jan 09, 2026.

DETAILED ACTION

This office action is in response to the application filed on December 10, 2024. Claims 1-19 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for priority based on U.S. provisional application 63/303,371, filed on January 26, 2022.

Information Disclosure Statement

The information disclosure statement (IDS) was submitted on December 10, 2024. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the Examiner.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Fortin-Deschenes et al. (US 2023/0324684 A1), referred to as Fortin-Deschenes hereinafter, in view of ICHIKAWA et al. (US 2024/0340403 A1), referred to as ICHIKAWA hereinafter.

Regarding Claim 1, Fortin-Deschenes teaches a head-mounted display (HMD) display system (Abstract, FIG. 1 Head-Mounted Display system) having a pass-through configuration (Par. [0072], two RGB cameras (11, 12) for pass-through purposes; Par. [0075], The HMD provides visual data streams to allow the following capabilities: stereo images for the display system (which we call the pass-through stereo view)), comprising: a front face of the HMD (Fig. 1, Par. [0072], The HMD (7) is attached via a harness (4) onto the head of a user (1) and integrates multiple sensors (i.e. on the front face of the HMD, as illustrated in Fig. 1 and Fig. 2A), namely two RGB cameras (11, 12) for pass-through purposes, two infrared (IR) cameras (2, 9) for stereo vision analysis, an inertial measurement unit (IMU) (not shown in the figure) and a time-of-flight (ToF) camera (13) with its IR emitter (14) for dense depth sensing); a right exterior-facing color stereo camera disposed on the front face and substantially in front of a right eye of a user to collect images (Fig. 2A, Par. [0073], Two RGB (i.e. color) cameras (28, 34 (i.e. right exterior-facing)) located in front of the eyes (22, 38) (i.e. substantially) capture the environment (i.e. collect images) that the user's eyes would see if they were not occluded by the HMD (7)); a left exterior-facing color stereo camera disposed on the front face and substantially in front of a left eye of the user to collect images (Fig. 2A, Par. [0073], Two RGB (i.e. color) cameras (28 (i.e. left exterior-facing), 34) located in front of the eyes (22, 38) (i.e. substantially) capture the environment (i.e. collect images) that the user's eyes would see if they were not occluded by the HMD (7)), wherein the right and left exterior-facing color stereo cameras and the right eye and the left eye form a visual plane (Par. [0073], the baseline (39) of the cameras (28, 34) is 64 mm, the average human eye separation (note that the camera baseline can be something other than 64 mm), and the position of the cameras (28, 34) is advantageously aligned with the user's eyes (22, 38) (i.e. a visual plane) in order to minimize the incoherence of the user's visual perception. The field of view (29, 30, 32, 33) of the cameras (28, 34) must closely match the field of view (23, 25, 36, 37) of the eyes (22, 38) (i.e. visual plane)); a processor (Par. [0018], at least one processing unit (i.e. processor) operatively connected to the pair of RGB camera sensors; Par. [0072], the processing units (21) are part of the HMD (7)) to receive the collected images (Par. [0021], obtaining (i.e. receive) from the pair of RGB camera sensors pass-through stereo view images (i.e. collected images)) to perform stereo view synthesis (Par. [0026], performing image processing on the pass-through stereo view images and the stereo images; Par. [0029], mixing processed images (i.e. view synthesis) and the processed rendered graphics resulting in the graphic content; Par. [0030], providing the graphic content to the display; Par. [0076], The compositing module (130) mixes the rendered graphics and the camera images, the resulting images being displayed on the display (27)); and a memory storing instructions, which when executed by the processor (Par. [0020], at least one processing unit has an associated memory comprising instructions stored thereon, that when executed on the at least one processing unit perform the step), cause the processor to perform stereo view synthesis (Par. [0026], performing image processing on the pass-through stereo view images and the stereo images; Par. [0029], mixing processed images (i.e. synthesis) and the processed rendered graphics resulting in the graphic content; Par. [0030], providing the graphic content to the display; Par. [0076], The compositing module (130) mixes the rendered graphics and the camera images, the resulting images being displayed on the display (27)) by performing: disocclusion filtering to minimize a disocclusion region, wherein the disocclusion region is due to a viewpoint difference (Par. [0007], An occlusion mask (i.e. disocclusion filtering) can be extracted from the tracking information to avoid situations (i.e. minimize) where real objects may inadvertently be hidden (i.e. disocclusion region) by a virtual element that should be located further away or behind the object (i.e. viewpoint difference)).

Fortin-Deschenes does not specifically teach offsets in the visual plane. Therefore, Fortin-Deschenes fails to explicitly teach disocclusion filtering to minimize a disocclusion region, wherein the disocclusion region is due to a viewpoint difference caused by at least one of a right lateral offset in the visual plane between the right exterior-facing color stereo camera and the right eye or a left lateral offset in the visual plane between the left exterior-facing color stereo camera and the left eye.

However, ICHIKAWA teaches disocclusion filtering to minimize a disocclusion region (Par. [0104], in step S108, filling processing is performed using a color compensation filter (i.e. disocclusion filtering) or the like in order to compensate (i.e. minimize) for the residual occlusion region (i.e. disocclusion region) remaining in the left-eye display image), wherein the disocclusion region is due to a viewpoint difference (Fig. 4, Par. [0062], the left camera, the right camera, the left display, and the right display are disposed so that an interval between the left camera 101L and the right camera 101R is wider (i.e. viewpoint difference) than an interval (interocular distance) between the left display 108L and the right display 108R. The interval between the left camera 101L and the right camera 101R in rear view and top view is, for example, 130 mm. Furthermore, the interval (interocular distance) between the left display 108L and the right display 108R is, for example, 74 mm.), caused by at least one of a right lateral offset in the visual plane between the right exterior-facing color stereo camera and the right eye or a left lateral offset in the visual plane between the left exterior-facing color stereo camera and the left eye (Par. [0069], the shaded region of the rear object existing on the far side is not visible from the right camera viewpoint, but is visible from the right display viewpoint, that is, the right eye of the user. This region is an occlusion region by a front object (an occluding object) when an image captured by the right camera is displayed on the right display; Par. [0070], FIG. 6 illustrating generation of an occlusion region by the arrangement (i.e. lateral offset) of the left camera 101L, the right camera 101R, the left display 108L, and the right display 108R illustrated in FIG. 4).

References Fortin-Deschenes and ICHIKAWA are considered to be analogous art because they teach pass-through configurations for cameras on HMDs. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify a filtering step that minimizes a disocclusion region due to an offset, as suggested by ICHIKAWA, in the invention of Fortin-Deschenes in order to compensate for an occlusion region caused by an occluding object (See ICHIKAWA Par. [0047]).
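
The geometry underlying ICHIKAWA's offset argument can be made concrete. The following is a minimal sketch, not taken from any cited reference, assuming a pinhole camera model; the 28 mm per-side offset follows from ICHIKAWA's example spacing (130 mm camera interval vs. 74 mm interocular distance), while the focal length and depths are illustrative assumptions:

    # Illustrative only: width of the disocclusion band created when a camera
    # sits laterally offset from the eye it feeds (pinhole model). The 28 mm
    # offset follows from ICHIKAWA's example spacing; focal length and depths
    # are assumed values, not figures from the record.

    def disparity_px(depth_m: float, offset_m: float, focal_px: float) -> float:
        """Horizontal image shift, in pixels, of a point at depth_m when the
        viewpoint translates laterally by offset_m."""
        return focal_px * offset_m / depth_m

    def disocclusion_width_px(near_m, far_m, offset_m, focal_px):
        """Far-surface pixels visible to the eye but hidden from the camera:
        the difference between the occluder's and the background's parallax."""
        return (disparity_px(near_m, offset_m, focal_px)
                - disparity_px(far_m, offset_m, focal_px))

    offset_m = (0.130 - 0.074) / 2      # 28 mm per side, from ICHIKAWA's Fig. 4 spacing
    print(disocclusion_width_px(near_m=0.5, far_m=2.0, offset_m=offset_m, focal_px=700.0))
    # -> 29.4 px of background the camera never saw, which the filtering must fill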

Claims 2 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Fortin-Deschenes (US 2023/0324684 A1), in view of ICHIKAWA (US 2024/0340403 A1), and in further view of Pugh et al. (US 2021/0142497 A1), referred to as Pugh hereinafter.

Regarding Claim 2, Fortin-Deschenes in view of ICHIKAWA teaches Claim 1. Fortin-Deschenes further teaches wherein the disocclusion filtering comprises at least one of: removal of one or more partial disocclusion hole regions that occur in only one of a right image and a left image, or removal of one or more full disocclusion hole regions that occur in both of the right image and the left image (Par. [0087], Occlusion masks can be found by comparing the calculated depth of each pixel (i.e. full disocclusion hole regions) with that of the virtual object(s). The camera images are blended (i.e. holes are removed) per pixel channel. The alpha mask A needs to be different in each color channel [R,G,B], because each channel is remapped to correct color aberration of the eyepieces); wherein the left and right images (Fig. 9, Par. [0076], Stereo images (106) (i.e. left and right images) for tracking can also be captured by the RGB stereo camera (104)) are to be fed (Par. [0082], A classification algorithm (408), such as support vector machines, is utilized to teach a model to track and recognize those features) for final reconstruction of a target eye view for one of the left eye and the right eye (Par. [0087], FIG. 10 shows a flow diagram of an exemplary process to achieve graphics rendering and compositing (130); Par. [0076], The compositing module (130) mixes the rendered graphics and the camera images, the resulting images (i.e. final reconstruction) being displayed (i.e. target eye view) on the display (27), where the user (1) wearing the HMD (7) looks at a display (27) through wide angle eyepieces (26, 35) (i.e. left and right eye)).

ICHIKAWA also teaches wherein the disocclusion filtering comprises at least one of: removal of one or more partial disocclusion hole regions that occur in only one of a right image and a left image (Par. [0104], filling processing is performed using a color compensation filter or the like in order to compensate for the residual occlusion region (i.e. removal of partial disocclusion hole regions) remaining in the left-eye display image without being compensated in the process of step S106. Then, the left-eye display image subjected to the filling processing in step S108 is finally output as a left-eye display image to be displayed on the left display 108L); or removal of one or more full disocclusion hole regions that occur in both of the right image and the left image, wherein the left and right images are to be fed (Fig. 9, left camera image, right camera image; Par. [0047], The information processing apparatus 200 uses a color image captured by the color camera 101 and a depth image generated from depth information obtained by the distance measurement sensor 102 as inputs, and generates a left-eye display image and a right-eye display image in which an occlusion region caused by an occluding object is compensated) for final reconstruction of a target eye view for one of the left eye and the right eye (Par. [0047], The left-eye display image and the right-eye display image are supplied from the information processing apparatus 200 to the synthesis unit 107 (i.e. final reconstruction) and finally output, the left-eye display image is displayed on the left display 108L (i.e. target eye view), and the right-eye display image is displayed on the right display 108R (i.e. target eye view)).

Fortin-Deschenes in view of ICHIKAWA does not specifically teach neural networking. Therefore, Fortin-Deschenes in view of ICHIKAWA fails to explicitly teach wherein the images are to be fed to a neural network for final reconstruction of a target eye view. However, Pugh teaches wherein the left and right images are to be fed (Fig. 1A, Par. [0019], obtaining a set of images S100; binocular stereo, Par. [0021]) to a neural network for final reconstruction of a target eye view (Fig. 1A, Par. [0019], modifying at least one object in the rendered scene S700 (i.e. final reconstruction of target view); Fig. 11, Par. [0170]-[0171], removing pixels of a real object from a rendered scene includes changes to occlusion behavior such as disabling occlusion for removed pixels, removing depth information for pixels of the object from the 3D depth map of the scene, and/or replacing the depths of the removed pixels with new depth values).

References Fortin-Deschenes, ICHIKAWA and Pugh are considered to be analogous art because they teach occlusion of virtual objects in images of real objects. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify using a neural network for reconstruction, as suggested by Pugh, in the inventions of Fortin-Deschenes and ICHIKAWA in order to determine a depth map for the image by using neural networks (See Pugh Par. [0021]) and to enable dynamic occlusion (controllable obscuring of virtual objects by existing physical objects) and disocclusion (removal of existing foreground objects) using computer vision techniques (See Pugh Par. [0029]).

Regarding Claim 3, Fortin-Deschenes in combination with ICHIKAWA and Pugh teaches Claim 2. Fortin-Deschenes teaches the right image is a color image and the left image is a color image (Fig. 9, Par. [0082], color stereo images (106)). Pugh further teaches splatted color images (Par. [0019], adjusting and compositing the set of images into an image scene S300, where (Par. [0070]) S300 may use one or multiple pixel motion models including: homography warps, affine warps, rotational warps, translational warps, optical flow fields, depth-layered warps (i.e. splatted), novel-view synthesis, or any other suitable coarse-alignment technique; refining the maps/depth edges using RGB image information (i.e. color image)).

References Fortin-Deschenes, ICHIKAWA and Pugh are considered to be analogous art because they teach occlusion of virtual objects in images of real objects. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify using splatting or warping for reconstruction, as suggested by Pugh, in the inventions of Fortin-Deschenes and ICHIKAWA in order to adjust and composite the set of images into an image scene to generate a photorealistic wide-angle image, which can improve image visual quality, rectify images and stitch images together (See Pugh Par. [0060]).
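
As background to the filling processing relied on here (ICHIKAWA Par. [0104]): a partial hole, present in only one eye's image, can usually be sourced from the other view, while a full hole must be synthesized from surrounding pixels. The sketch below, an illustration rather than code from any cited reference, shows the simplest spatial fill, neighbor averaging over a hole mask:

    # Illustrative only: naive "filling processing" for a residual disocclusion
    # hole -- replace each hole pixel with the mean color of its already-valid
    # neighbors, sweeping inward until the hole is gone. A real system would
    # first try to source partial holes from the opposite eye's image.
    import numpy as np

    def fill_holes(image: np.ndarray, hole: np.ndarray) -> np.ndarray:
        """image: (H, W, 3) float RGB; hole: (H, W) bool, True where disoccluded."""
        out, hole = image.copy(), hole.copy()
        while hole.any():
            progressed = False
            for y, x in zip(*np.nonzero(hole)):
                y0, x0 = max(y - 1, 0), max(x - 1, 0)
                valid = ~hole[y0:y + 2, x0:x + 2]
                if valid.any():                      # at least one filled neighbor
                    out[y, x] = out[y0:y + 2, x0:x + 2][valid].mean(axis=0)
                    hole[y, x] = False
                    progressed = True
            if not progressed:                       # defensive: nothing to sample
                break
        return out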

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Fortin-Deschenes (US 2023/0324684 A1), in view of ICHIKAWA (US 2024/0340403 A1), and in further view of Rusanovskyy et al. (US 2013/0342644 A1), referred to as Rusanovskyy hereinafter.

Regarding Claim 15, Fortin-Deschenes teaches a head-mounted display (HMD) display system (Abstract, FIG. 1 Head-Mounted Display system) having a pass-through configuration (Par. [0072], two RGB cameras (11, 12) for pass-through purposes; Par. [0075], The HMD provides visual data streams to allow the following capabilities: stereo images for the display system (which we call the pass-through stereo view)), comprising: a front face of the HMD (Fig. 1, Par. [0072], The HMD (7) is attached via a harness (4) onto the head of a user (1) and integrates multiple sensors (i.e. on the front face of the HMD, as illustrated in Figs. 1 and 2A), namely two RGB cameras (11, 12) for pass-through purposes, two infrared (IR) cameras (2, 9) for stereo vision analysis, an inertial measurement unit (IMU) (not shown in the figure) and a time-of-flight (ToF) camera (13) with its IR emitter (14) for dense depth sensing); a plurality of exterior-facing color stereo cameras disposed on the front face in a visual plane of both of a user's eyes, wherein the plurality of exterior-facing color stereo cameras collects images (Fig. 2A, Par. [0073], Two RGB (i.e. color) cameras (28, 34 (i.e. plurality of exterior-facing stereo cameras)) located in front of the eyes (22, 38) (i.e. visual plane of both eyes) capture the environment (i.e. collect images) that the user's eyes would see if they were not occluded by the HMD (7). The field of view (29, 30, 32, 33) of the cameras (28, 34) must closely match the field of view (23, 25, 36, 37) of the eyes (22, 38) (i.e. visual plane)); a processor (Par. [0018], at least one processing unit (i.e. processor) operatively connected to the pair of RGB camera sensors; Par. [0072], the processing units (21) are part of the HMD (7)) to receive the collected images (Par. [0021], obtaining (i.e. receive) from the pair of RGB camera sensors pass-through stereo view images) to perform stereo view synthesis (Par. [0026], performing image processing on the pass-through stereo view images and the stereo images; Par. [0029], mixing processed images (i.e. view synthesis) and the processed rendered graphics resulting in the graphic content; Par. [0030], providing the graphic content to the display; Par. [0076], The compositing module (130) mixes the rendered graphics and the camera images, the resulting images being displayed on the display (27)); and a memory storing instructions, which when executed by the processor (Par. [0020], at least one processing unit has an associated memory comprising instructions stored thereon, that when executed on the at least one processing unit perform the step), cause the processor to perform stereo view synthesis (Par. [0026], performing image processing on the pass-through stereo view images and the stereo images; Par. [0029], mixing processed images (i.e. synthesis) and the processed rendered graphics resulting in the graphic content; Par. [0030], providing the graphic content to the display; Par. [0076], The compositing module (130) mixes the rendered graphics and the camera images, the resulting images being displayed on the display (27)) by performing: depth estimation (Par. [0069], the HMD is designed to support not only passive computer vision analysis, but also active computer vision analysis. They include, but are not limited to, feature tracking, object recognition and depth estimation); sharpening (Par. [0076], the camera image processing module (126) performs some tasks such as trigger control, debayering, automatic white balance, defective pixel replacement, flat field correction, filtering (noise reduction, edge enhancement), distortion and aberration correction (i.e. sharpening)); disocclusion filtering to minimize a disocclusion region, wherein the disocclusion region is due to a viewpoint difference (Par. [0007], An occlusion mask (i.e. disocclusion filtering) can be extracted from the tracking information to avoid situations (i.e. minimize) where real objects may inadvertently be hidden (i.e. disocclusion region) by a virtual element that should be located further away or behind the object (i.e. viewpoint difference)); and fusion (Par. [0007], An occlusion mask can be extracted from the tracking information to avoid situations where real objects may inadvertently be hidden by a virtual element that should be located further away or behind the object; Par. [0076], The compositing module (130) mixes (i.e. fusion) the rendered graphics and the camera images, the resulting images being displayed on the display (27); Par. [0083], To achieve real-time fusion of the stereo pass-through cameras (62, 64) and virtual image elements, the compositing is done on the HMD (7)).

Fortin-Deschenes does not specifically teach offsets in the visual plane. Therefore, Fortin-Deschenes fails to explicitly teach disocclusion filtering to minimize a disocclusion region, wherein the disocclusion region is due to a viewpoint difference caused by at least one lateral offset in the visual plane between one of the plurality of exterior-facing color stereo cameras and a corresponding one of the user's eyes.

However, ICHIKAWA teaches disocclusion filtering to minimize a disocclusion region (Par. [0104], in step S108, filling processing is performed using a color compensation filter (i.e. disocclusion filtering) or the like in order to compensate (i.e. minimize) for the residual occlusion region (i.e. disocclusion region) remaining in the left-eye display image), wherein the disocclusion region is due to a viewpoint difference (Fig. 4, Par. [0062], the left camera, the right camera, the left display, and the right display are disposed so that an interval between the left camera 101L and the right camera 101R is wider (i.e. viewpoint difference) than an interval (interocular distance) between the left display 108L and the right display 108R. The interval between the left camera 101L and the right camera 101R in rear view and top view is, for example, 130 mm. Furthermore, the interval (interocular distance) between the left display 108L and the right display 108R is, for example, 74 mm.), caused by at least one lateral offset in the visual plane between one of the plurality of exterior-facing color stereo cameras and a corresponding one of the user's eyes (Par. [0069], the shaded region of the rear object existing on the far side is not visible from the right camera viewpoint, but is visible from the right display viewpoint, that is, the right eye of the user (i.e. one of the user's eyes). This region is an occlusion region by a front object (an occluding object) when an image captured by the right camera is displayed on the right display; Par. [0070], FIG. 6 illustrating generation of an occlusion region by the arrangement (i.e. lateral offset) of the left camera 101L, the right camera 101R, the left display 108L, and the right display 108R illustrated in FIG. 4).

References Fortin-Deschenes and ICHIKAWA are considered to be analogous art because they teach pass-through configurations for cameras on HMDs. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify a filtering step that minimizes a disocclusion region due to an offset when performing view synthesis, as suggested by ICHIKAWA, in the invention of Fortin-Deschenes in order to compensate for an occlusion region caused by an occluding object (See ICHIKAWA Par. [0047]).

In addition, Fortin-Deschenes in view of ICHIKAWA does not specifically teach splatting. However, Rusanovskyy teaches splatting (Par. [0187], The view synthesis process may consist of two conceptual steps: forward warping and hole filling. In forward warping, each pixel of the reference image is mapped to a synthesized image; Par. [0189], The current pixel is mapped to the target synthesis image according to the depth-to-disparity mapping/warping equation above. Pixels around depth boundaries may use splatting, in which one pixel is mapped to two neighboring locations).

References Fortin-Deschenes, ICHIKAWA and Rusanovskyy are considered to be analogous art because they teach occlusion of virtual objects in images. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify a splatting technique, as suggested by Rusanovskyy, in the inventions of Fortin-Deschenes and ICHIKAWA in order to address warping artifacts, such as holes and/or occlusions, and to suppress those artifacts (See Rusanovskyy Par. [0199]).
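
Rusanovskyy's two conceptual steps, forward warping then hole filling, with splatting at depth boundaries (Par. [0187]-[0189]), can be sketched as follows. The depth-to-disparity mapping d = f * b / Z and every identifier here are assumptions for illustration, not the reference's code:

    # Illustrative only: forward warping with splatting. Each reference pixel
    # shifts by its disparity d = f * b / Z; it is written to the two
    # neighboring integer target columns ("splatting") so depth boundaries do
    # not leave one-pixel cracks, with z-buffering so nearer surfaces win.
    import math
    import numpy as np

    def forward_warp_splat(ref: np.ndarray, depth: np.ndarray,
                           focal_px: float, baseline_m: float) -> np.ndarray:
        """ref: (H, W, 3) float image; depth: (H, W) float metric depths."""
        h, w, _ = ref.shape
        out = np.full_like(ref, np.nan)           # NaN marks disocclusion holes
        zbuf = np.full((h, w), np.inf)
        for y in range(h):
            for x in range(w):
                d = focal_px * baseline_m / depth[y, x]   # depth-to-disparity mapping
                for xt in (math.floor(x - d), math.ceil(x - d)):  # splat: 2 columns
                    if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                        zbuf[y, xt] = depth[y, x]
                        out[y, xt] = ref[y, x]
        return out        # a hole-filling pass (e.g. the earlier sketch) runs next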

Claims 7-10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Fortin-Deschenes (US 2023/0324684 A1), in view of ICHIKAWA (US 2024/0340403 A1), in view of Rusanovskyy et al. (US 2013/0342644 A1), and in further view of Pugh (US 2021/0142497 A1).

Regarding Claim 7, Fortin-Deschenes in view of ICHIKAWA teaches Claim 1. Fortin-Deschenes further teaches wherein the processor (Par. [0018], at least one processing unit (i.e. processor) operatively connected to the pair of RGB camera sensors; Par. [0072], the processing units (21) are part of the HMD (7)) performs stereo view synthesis (Par. [0026], performing image processing on the pass-through stereo view images and the stereo images; Par. [0029], mixing processed images (i.e. view synthesis) and the processed rendered graphics resulting in the graphic content; Par. [0030], providing the graphic content to the display; Par. [0076], The compositing module (130) mixes the rendered graphics and the camera images, the resulting images being displayed on the display (27)) by further performing: disocclusion filtering (Par. [0007], An occlusion mask (i.e. disocclusion filtering) can be extracted from the tracking information to avoid situations where real objects may inadvertently be hidden (i.e. disocclusion region) by a virtual element that should be located further away or behind the object (i.e. viewpoint difference)); depth estimation (Par. [0069], the HMD is designed to support not only passive computer vision analysis, but also active computer vision analysis. They include, but are not limited to, feature tracking, object recognition and depth estimation); and image sharpening (Par. [0076], the camera image processing module (126) performs some tasks such as trigger control, debayering, automatic white balance, defective pixel replacement, flat field correction, filtering (noise reduction, edge enhancement), distortion and aberration correction (i.e. image sharpening)).

Fortin-Deschenes in view of ICHIKAWA does not specifically teach forward splatting. However, Rusanovskyy teaches forward splatting (Par. [0187], The view synthesis process may consist of two conceptual steps: forward warping and hole filling. In forward warping, each pixel of the reference image is mapped to a synthesized image; Par. [0189], The current pixel is mapped to the target synthesis image according to the depth-to-disparity mapping/warping equation above. Pixels around depth boundaries may use splatting, in which one pixel is mapped to two neighboring locations). References Fortin-Deschenes, ICHIKAWA and Rusanovskyy are considered to be analogous art because they teach occlusion of virtual objects in images. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify a splatting technique, as suggested by Rusanovskyy, in the inventions of Fortin-Deschenes and ICHIKAWA in order to address warping artifacts, such as holes and/or occlusions, and to suppress those artifacts (See Rusanovskyy Par. [0199]).

In addition, Fortin-Deschenes in combination with ICHIKAWA and Rusanovskyy does not specifically teach performing depth estimation, sharpening and splatting before disocclusion filtering. Therefore, Fortin-Deschenes in combination with ICHIKAWA and Rusanovskyy fails to explicitly teach performing stereo view synthesis by further performing, before the disocclusion filtering: depth estimation; image sharpening; and forward splatting. However, Pugh teaches performing stereo view synthesis (Par. [0021], FIG. 3 and FIG. 14; in examples, the method includes one or more of: obtaining an image that includes one or more objects, binocular stereo; Par. [0068], using novel view synthesis) by further performing depth estimation (Par. [0045], estimating depths of pixels and depth edges included in the image (S240)), image sharpening (Par. [0045], identifying object boundaries and object classes in the image by performing edge, contour, and segmentation estimation (S220)), and forward splatting (Par. [0070], S300 may use one or multiple pixel motion models including: homography warps, affine warps, rotational warps, translational warps, optical flow fields, depth-layered warps (i.e. forward splatting), novel-view synthesis, or any other suitable coarse-alignment technique) before the disocclusion filtering (Fig. 1A, Par. [0019], computing foreground occlusion masks and depths for the scene imagery S500, rendering scenes interactively with occlusion masks S600).

References Fortin-Deschenes, ICHIKAWA, Rusanovskyy and Pugh are considered to be analogous art because they teach occlusion of virtual objects in images. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify performing these steps before disocclusion filtering, as suggested by Pugh, in the inventions of Fortin-Deschenes, ICHIKAWA and Rusanovskyy in order to perform dynamic occlusion (controllable obscuring of virtual objects by existing physical objects) and disocclusion (removal of existing foreground objects) using computer vision techniques and a standard 3D graphics engine by developing custom shaders and transforming the visual information to a format compatible with the graphics engine (See Pugh Par. [0029]).
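
Read together with Claim 10 below (fusion after the disocclusion filtering), the claimed ordering amounts to a fixed pipeline. The stub sketch that follows only illustrates that ordering; every function body is a placeholder assumed for the illustration, not an implementation from the record:

    # Illustrative only: the stage ordering recited by Claims 7 and 10.
    # Each stage is an identity stub; the ordering is the point.

    def estimate_depth(left, right):            # 1. depth estimation
        return None
    def sharpen(img):                           # 2. image sharpening
        return img
    def forward_splat(img, depth, target_eye):  # 3. forward splatting
        return img
    def disocclusion_filter(img):               # 4. disocclusion filtering
        return img
    def fuse(passthrough, rendered):            # 5. fusion (after filtering, Claim 10)
        return passthrough

    def synthesize_view(left, right, rendered, target_eye):
        depth = estimate_depth(left, right)
        src = sharpen(left if target_eye == "left" else right)
        warped = forward_splat(src, depth, target_eye)   # steps 1-3 precede...
        filled = disocclusion_filter(warped)             # ...the disocclusion filtering
        return fuse(filled, rendered)                    # fusion comes last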

Regarding Claim 8, Fortin-Deschenes in combination with ICHIKAWA, Rusanovskyy and Pugh teaches Claim 7. Pugh further teaches the depth estimation is based on a depth map (Par. [0021], determining a depth map (e.g., depth estimates for a set of image pixels; etc.) for the image (e.g., by using neural networks based on the image, the photogrammetry point cloud, hardware depth sensors, and/or any other suitable information)) calculated at each stereo input view (Par. [0054], S200 includes determining a depth map (sparse depth map) based on the set of images. This can include: computing disparity across images of the set (e.g., based on camera pose estimates), and estimating semi-dense depth from the disparity (e.g., using binocular stereo camera methods)) by deep-learning-based (Par. [0048], Two-dimensional features and/or correspondences can be extracted using one or more: feature detectors (e.g., edge detectors, keypoint detectors, line detectors, convolutional feature detectors, etc.), feature matchers (e.g., descriptor search, template matching, optical flow, direct methods, etc.), neural networks (e.g., convolutional neural networks (CNN), deep neural networks (DNN), recurrent neural networks, generative neural networks, etc.), object detection (e.g., semantic segmentation, region-based segmentation, edge detection segmentation, cluster-based segmentation, etc.), and any other suitable method for extracting and matching features) disparity estimation using a neural network (Par. [0075], neural network based contour detection algorithms using disparity maps and/or depth maps).

References Fortin-Deschenes, ICHIKAWA, Rusanovskyy and Pugh are considered to be analogous art because they teach stereo cameras. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify disparity estimation by deep learning using neural networks, as suggested by Pugh, in the inventions of Fortin-Deschenes, ICHIKAWA and Rusanovskyy in order to identify regions likely to have a sudden change in depth (i.e., depth discontinuity), optionally refining the maps/depth edges using RGB image information (See Pugh Par. [0075]).

Regarding Claim 9, Fortin-Deschenes in combination with ICHIKAWA, Rusanovskyy and Pugh teaches Claim 8. Fortin-Deschenes teaches wherein the depth estimation uses input color pairs to be rectified at each frame in order to reduce the estimation from a 2D correspondence matching to a more efficient 1D matching solution (Par. [0089], Standard stereo depth map methods find for each pixel in the first image the best pixel match in the second image. Neighborhoods around pixels can also be considered instead of only single pixels. A match usually involves finding the lowest pixel intensity difference (or sum of differences when a neighborhood is used). As a preprocessing step, the images are rectified so that the search space for a match is a single horizontal line (i.e. 1D matching solution), calculating a depth map using stereo vision. The colors of the pass-through view can be mapped onto the depth map). Pugh further teaches deep-learning-based (Par. [0048], Two-dimensional features and/or correspondences can be extracted using one or more: feature detectors (e.g., edge detectors, keypoint detectors, line detectors, convolutional feature detectors, etc.), feature matchers (e.g., descriptor search, template matching, optical flow, direct methods, etc.), neural networks (e.g., convolutional neural networks (CNN), deep neural networks (DNN), recurrent neural networks, generative neural networks, etc.), object detection (e.g., semantic segmentation, region-based segmentation, edge detection segmentation, cluster-based segmentation, etc.), and any other suitable method for extracting and matching features) disparity estimation (Par. [0075], neural network based contour detection algorithms using disparity maps and/or depth maps).

References Fortin-Deschenes, ICHIKAWA, Rusanovskyy and Pugh are considered to be analogous art because they teach stereo cameras. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify disparity estimation by deep learning using neural networks, as suggested by Pugh, in the inventions of Fortin-Deschenes, ICHIKAWA and Rusanovskyy in order to identify regions likely to have a sudden change in depth (i.e., depth discontinuity), optionally refining the maps/depth edges using RGB image information (See Pugh Par. [0075]).
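
The rectification point from Fortin-Deschenes Par. [0089] is the classic block-matching setup: once the input color pair is rectified, the correspondence for a left-image pixel lies on the same row of the right image. A minimal sum-of-absolute-differences scanline search, as an illustration under assumed parameters (window size, disparity range, and all names are hypothetical):

    # Illustrative only: after rectification, the 2D correspondence search
    # collapses to a 1D scan along one row. Returns the disparity whose
    # (2*win+1)^2 neighborhood has the lowest SAD; assumes the window fits
    # inside both images at (y, x).
    import numpy as np

    def scanline_disparity(left, right, y, x, max_disp=64, win=3):
        """left, right: rectified (H, W) grayscale arrays."""
        patch = left[y - win:y + win + 1, x - win:x + win + 1].astype(float)
        best_d, best_cost = 0, np.inf
        for d in range(min(max_disp, x - win) + 1):       # 1D search, one row
            cand = right[y - win:y + win + 1,
                         x - d - win:x - d + win + 1].astype(float)
            cost = np.abs(patch - cand).sum()
            if cost < best_cost:
                best_d, best_cost = d, cost
        return best_d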

Regarding Claim 10, Fortin-Deschenes in combination with ICHIKAWA, Rusanovskyy and Pugh teaches Claim 7. Fortin-Deschenes teaches the processor performs stereo view synthesis by further performing, after the disocclusion filtering (Par. [0007], An occlusion mask can be extracted from the tracking information to avoid situations where real objects may inadvertently be hidden by a virtual element that should be located further away or behind the object, before fusion/compositing (130)): fusion (Par. [0076], The compositing module (130) mixes (i.e. fusion) the rendered graphics and the camera images, the resulting images being displayed on the display (27); Par. [0083], To achieve real-time fusion of the stereo pass-through cameras (62, 64) and virtual image elements, the compositing is done on the HMD (7)).

Regarding Claim 16, Fortin-Deschenes in combination with ICHIKAWA and Rusanovskyy teaches Claim 15. Fortin-Deschenes further teaches wherein the disocclusion filtering comprises at least one of: removal of one or more partial disocclusion hole regions that occur in only one of a right image and a left image, or removal of one or more full disocclusion hole regions that occur in both of the right image and the left image (Par. [0087], Occlusion masks can be found by comparing the calculated depth of each pixel (i.e. full disocclusion hole regions) with that of the virtual object(s). The camera images are blended (i.e. holes are removed) per pixel channel. The alpha mask A needs to be different in each color channel [R,G,B], because each channel is remapped to correct color aberration of the eyepieces); wherein the left and right images (Fig. 9, Par. [0076], Stereo images (106) (i.e. left and right images) for tracking can also be captured by the RGB stereo camera (104)) are to be fed (Par. [0082], A classification algorithm (408), such as support vector machines, is utilized to teach a model to track and recognize those features) for final reconstruction of a target eye view for one of the left eye and the right eye (Par. [0087], FIG. 10 shows a flow diagram of an exemplary process to achieve graphics rendering and compositing (130); Par. [0076], The compositing module (130) mixes the rendered graphics and the camera images, the resulting images (i.e. final reconstruction) being displayed (i.e. target eye view) on the display (27), where the user (1) wearing the HMD (7) looks at a display (27) through wide angle eyepieces (26, 35) (i.e. left and right eye)).

ICHIKAWA also teaches wherein the disocclusion filtering comprises at least one of: removal of one or more partial disocclusion hole regions that occur in only one of a right image and a left image (Par. [0104], filling processing is performed using a color compensation filter or the like in order to compensate for the residual occlusion region (i.e. removal of partial disocclusion hole regions) remaining in the left-eye display image without being compensated in the process of step S106. Then, the left-eye display image subjected to the filling processing in step S108 is finally output as a left-eye display image to be displayed on the left display 108L); or removal of one or more full disocclusion hole regions that occur in both of the right image and the left image, wherein the left and right images are to be fed (Fig. 9, left camera image, right camera image; Par. [0047], The information processing apparatus 200 uses a color image captured by the color camera 101 and a depth image generated from depth information obtained by the distance measurement sensor 102 as inputs, and generates a left-eye display image and a right-eye display image in which an occlusion region caused by an occluding object is compensated) for final reconstruction of a target eye view for one of the left eye and the right eye (Par. [0047], The left-eye display image and the right-eye display image are supplied from the information processing apparatus 200 to the synthesis unit 107 (i.e. final reconstruction) and finally output, the left-eye display image is displayed on the left display 108L (i.e. target eye view), and the right-eye display image is displayed on the right display 108R (i.e. target eye view)).

Fortin-Deschenes in combination with ICHIKAWA and Rusanovskyy does not specifically teach neural networking. Therefore, Fortin-Deschenes in combination with ICHIKAWA and Rusanovskyy fails to explicitly teach wherein the images are to be fed to a neural network for final reconstruction of a target eye view. However, Pugh teaches wherein the left and right images are to be fed (Fig. 1A, Par. [0019], obtaining a set of images S100; binocular stereo, Par. [0021]) to a neural network for final reconstruction of a target eye view (Fig. 1A, Par. [0019], modifying at least one object in the rendered scene S700 (i.e. final reconstruction of target view); Fig. 11, Par. [0170]-[0171], removing pixels of a real object from a rendered scene includes changes to occlusion behavior such as disabling occlusion for removed pixels, removing depth information for pixels of the object from the 3D depth map of the scene, and/or replacing the depths of the removed pixels with new depth values).

References Fortin-Deschenes, ICHIKAWA, Rusanovskyy and Pugh are considered to be analogous art because they teach occlusion of virtual objects in images of real objects. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to further specify using a neural network for reconstruction, as suggested by Pugh, in the inventions of Fortin-Deschenes, ICHIKAWA and Rusanovskyy in order to determine a depth map for the image by using neural networks (See Pugh Par. [0021]) and to enable dynamic occlusion (controllable obscuring of virtual objects by existing physical objects) and disocclusion (removal of existing foreground objects) using computer vision techniques (See Pugh Par. [0029]).

Allowable Subject Matter

Claims 4-6, 11-14 and 17-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: Claim 4 specifically defines that the removal of the one or more partial disocclusion hole regions may be performed by blending represented by an expression that is not readily taught or suggested by the prior art uncovered during search or made of record. Claims 11 and 17 specifically define a size β of the disocclusion region that is represented by an expression not readily taught or suggested by the prior art uncovered during search or made of record. Claims 5, 6, 12-14, 18 and 19 are allowable for the reasons above by virtue of their respective dependencies.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. IGNATOV et al. (US 2011/0304708 A1) teaches generating stereo-view and multi-view images for rendering perception of depth of a stereoscopic image. KOEPPEL et al. (US 2013/0127844 A1) teaches filling disocclusions in a virtual view. Alregib et al. (US 9,094,660 B2) teaches hierarchical hole-filling for depth-based view synthesis in FTV and 3D video. Xiao et al. (US 11,367,165 B2) teaches neural super-sampling for real-time rendering.

Any inquiry concerning this communication should be directed to SUSAN E HODGES, whose telephone number is (571) 270-0498. The Examiner can normally be reached Monday - Friday from 8:00 am (EST) to 4:00 pm (EST). If attempts to reach the Examiner by telephone are unsuccessful, the Examiner's supervisor, Brian T. Pendleton, can be reached on (571) . The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Susan E. Hodges/
Primary Examiner, Art Unit 2425

Prosecution Timeline

Dec 10, 2024: Application Filed
Jan 09, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner involving similar technology.

Patent 12603982: STEREOSCOPIC HIGH DYNAMIC RANGE VIDEO (granted Apr 14, 2026; 2y 5m to grant)
Patent 12604008: ADAPTIVE CLIPPING IN MODELS PARAMETERS DERIVATIONS METHODS FOR VIDEO COMPRESSION (granted Apr 14, 2026; 2y 5m to grant)
Patent 12574558: Method and Apparatus for Sign Coding of Transform Coefficients in Video Coding System (granted Mar 10, 2026; 2y 5m to grant)
Patent 12568212: ADAPTIVE LOOP FILTERING ON OUTPUT(S) FROM OFFLINE FIXED FILTERING (granted Mar 03, 2026; 2y 5m to grant)
Patent 12556671: THREE DIMENSIONAL STROBO-STEREOSCOPIC IMAGING SYSTEMS AND ASSOCIATED METHODS (granted Feb 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 67%
With Interview: 81% (+14.4%)
Median Time to Grant: 2y 4m
PTA Risk: Low

Based on 375 resolved cases by this examiner. Grant probability is derived from the career allow rate.
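
The headline numbers are internally consistent if the interview lift is read as additive percentage points, which is an assumption here rather than a stated methodology:

    # How the projections relate, assuming the +14.4% lift is additive
    # in percentage points (our reading, not a documented formula):
    career_allow_rate = 250 / 375                 # 0.667 -> the 67% figure
    with_interview = career_allow_rate + 0.144    # +14.4 pp -> 0.811 -> 81%
    print(f"{career_allow_rate:.0%} baseline, {with_interview:.0%} with interview")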
