Last updated: May 29, 2026
Application No. 18/787,523
ADAPTIVE FOVEATION PROCESSING AND RENDERING IN VIDEO SEE-THROUGH (VST) EXTENDED REALITY (XR)

Non-Final OA §103
Filed: Jul 29, 2024
Priority: Mar 20, 2024 (provisional 63/567,801)
Examiner: LE, MICHAEL
Art Unit: 2614
Tech Center: 2600 (Communications)
Assignee: Samsung Electronics Co., Ltd.
OA Round: 1 (Non-Final)
Interview: Optional
Examiner has a relatively high allowance rate (66%) and a +22.1% interview lift; a written response may suffice.
Based on 873 resolved cases, 2023–2026
Examiner Intelligence

LE, MICHAEL
Career allowance rate: 66% (575 granted / 873 resolved), above average; +3.9% vs TC avg
Interview lift: +22.1% (strong), comparing resolved cases with and without an interview
Typical timeline: 3y 3m average prosecution; 41 currently pending
Career history: 932 total applications across all art units
Statute-Specific Performance

§101: 1.5% (-38.5% vs TC avg)
§103: 87.4% (+47.4% vs TC avg)
§102: 5.9% (-34.1% vs TC avg)
§112: 1.8% (-38.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 873 resolved cases.
Office Action

DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Information Disclosure Statement
2.	The information disclosure statements (IDS) submitted on the following dates are in compliance with the provisions of 37 CFR 1.97 and are being considered by the Examiner: 07/29/2024; 04/17/2025.

Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

4.	Claims 1-2, 8-9 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Tokutake, (“Tokutake”) [US-2025/0371663-A1] in view of Wu et al., (“Wu”) [US-2022/0321858-A1].
Regarding claim 1, Tokutake discloses a method (Tokutake- ¶0001, at least discloses an information processing device, an information processing method, and a program, and more particularly to an information processing device and the others suitable for use in obtaining real space images used in, for example, video see-through AR devices, MR devices, and the like) comprising:
obtaining, using at least one processing device of a video see-through (VST) extended reality (XR) device (Tokutake- Fig. 3 and ¶0035, at least disclose a display system 10A for a real space image, which is installed in a head-mounted display and is used in a video see-through type of AR device, MR device, or the like […] This display system 10A includes a wide-angle camera 101, a normal camera 102, a GPU 103, and a display 104; Fig. 9 and ¶0093-0095, at least disclose a configuration example (functional configuration example) of an information processing device 200 corresponding to the display system 10E illustrated in (a) of FIG. 7. The information processing device 200 includes a wide-angle image capturing unit 201, a normal image capturing unit 202, a telephoto image capturing unit 203, a close-up image capturing unit 204, an image synthesis unit 205, a display unit 206, a subject distance measurement unit 207, an image capturing unit switching control unit 208, a gaze detection unit 209, and an imaging direction/image synthesis control unit 210), images of a scene captured using one or more imaging sensors of the VST XR device (Tokutake- ¶0011, at least discloses an image of a high-resolution region based on a first image captured at a first angle of view and an image of a peripheral region around the high-resolution region based on a second image captured at a second angle of view narrower than the first angle of view within the first angle of view are synthesized to generate a display image, which makes it possible to achieve both a high-resolution real-space image used in, for example, a video see-through type of AR device or MR device, and reduced system load; Fig. 3B and ¶0036-0041, at least disclose the wide-angle camera 101 [one or more imaging sensors] captures an image at an angle of view θ1 [images of a scene] and outputs the image with a resolution of 1K (1080×1080), and the normal camera 102 [one or more imaging sensors] captures an image at an angle of view θ2 [images of a scene] corresponding to ¼ of the imaging range of the wide-angle camera 101 […] The GPU 103 synthesizes an image of the focal region (high-resolution region) based on the image captured by the normal camera 102 and an image of the peripheral region (low-resolution region) around the focal region based on the image captured by the wide-angle camera 101, that is, performs foveated rendering, to generate a 4K resolution display image);
identifying, using the at least one processing device (As discussed above), a region of the scene on which a user of the VST XR device is focused (Tokutake- Fig. 4 and ¶0044-0047, at least disclose a display system 10B for a real space image, which is installed in a head-mounted display and is used in a video see-through type of AR device, MR device, or the like […] In (a) of FIG. 4 , the same reference numerals are used for the portions corresponding to those in (a) of FIG. 3 , and detailed descriptions thereof will be omitted as appropriate. In FIG. 4 , (b) is the same diagram as (b) of FIG. 3 […] the eye tracking system 105 analyzes in real time a face image of a user (person) captured by, for example, an infrared camera to acquire gaze information. On the basis of the gaze information, the eye tracking system 105 then controls the movement of the imaging direction of the normal camera 102 so that the imaging direction matches the gaze. On the basis of the gaze information, the eye tracking system 105 also controls the movement of the focal region (high-resolution region) so that the focal region matches the gaze [identifying a region of the scene on which a user is focused]);
rendering, using the at least one processing device (As discussed above), final views of the scene (Tokutake- at least discloses that foveated rendering is performed in which a focal region (high-resolution region) including a point of gaze (viewpoint) of a user and a peripheral region (low-resolution region) around the focal region are set and the display image is rendered. The display image generated by this foveated rendering is displayed on a display; Fig. 3 and ¶0035, at least disclose a display system 10A for a real space image, which is installed in a head-mounted display and is used in a video see-through type of AR device, MR device, or the like […] This display system 10A includes a wide-angle camera 101, a normal camera 102, a GPU 103, and a display 104).
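For illustration only, the synthesis Tokutake attributes to the GPU 103 (¶0036-0041) can be sketched in a few lines. The focal region comes from the normal camera and the peripheral region from the wide-angle camera. The sketch assumes the normal-camera image is already registered to the display at display pixel pitch. It uses nearest-neighbour upscaling for the peripheral region. The helper name and parameters are hypothetical and are not taken from Tokutake.

```python
import numpy as np


def synthesize_foveated(wide: np.ndarray, narrow: np.ndarray,
                        scale: int, center_yx: tuple[int, int]) -> np.ndarray:
    """Compose a display image from a low-resolution wide-angle image and a
    high-resolution narrow-angle image (hypothetical helper)."""
    # Peripheral (low-resolution) region: upscale the wide-angle image to the
    # display resolution by repeating each pixel `scale` times along each axis.
    display = np.repeat(np.repeat(wide, scale, axis=0), scale, axis=1)
    h, w = narrow.shape[:2]
    if h > display.shape[0] or w > display.shape[1]:
        raise ValueError("narrow image must fit inside the display image")
    # Focal (high-resolution) region: centre the narrow image on the gaze point,
    # clamped so that the window stays inside the display image.
    cy, cx = center_yx
    top = int(np.clip(cy - h // 2, 0, display.shape[0] - h))
    left = int(np.clip(cx - w // 2, 0, display.shape[1] - w))
    display[top:top + h, left:left + w] = narrow
    return display
```

Because `np.repeat` returns a new array, the wide-angle input is left unmodified. A real system would instead warp the narrow image using the cameras' calibration.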
Tokutake does not explicitly disclose generating, using the at least one processing device, a mask for each image based on the region of the scene on which the user is focused, wherein different ones of the masks are associated with at least one of (i) different resolutions or (ii) different shapes; mapping, using the at least one processing device, at least some image data of each image onto a mesh based on the mask associated with that image; and rendering, using the at least one processing device, final views of the scene using the mapped image data of the images.
However, Wu discloses
generating, using the at least one processing device (Wu- Fig. 4 and ¶0086, at least disclose hardware 400 of FIG. 4, such hardware can include hardware processor 402, memory and/or storage 404, an input device controller 406, an input device 408, display/audio drivers 410, display and audio output circuitry 412, communication interface(s) 414, an antenna 416, and a bus 418), a mask for each image based on the region of the scene on which the user is focused, wherein different ones of the masks are associated with at least one of (i) different resolutions or (ii) different shapes (Wu- ¶0007, at least discloses generating, using the hardware processor, a foveated mesh in accordance with a foveation ratio parameter on which frames of the video content item are to be projected [mask for each image based on the region of the scene on which the user is focused], wherein the foveated mesh has a non-uniform position map that increases pixel density in a central portion of each frame of the video content item in comparison with peripheral portions of each frame of the video content item; ¶0012-0013, at least discloses generating a first foveated mesh based on a first foveation ratio parameter and generating a second foveated mesh based on a second foveation ratio parameter […] the foveated mesh is generated in response to determining that the received video content item is downsampled by a downsampling ratio from a first pixel resolution to a second pixel resolution and an improvement ratio at the central portion of each frame of the video content item corresponds with the downsampling ratio; ¶0028, at least discloses the mechanisms can generate a first foveated mesh according to a first foveation strategy (e.g., a function based on a first foveation ratio parameter) and a second foveated mesh according to a second foveation strategy (e.g., a function based on a second foveation ratio parameter); ¶0044, at least discloses process 100 can determine that a foveated mesh is to be generated for stereo transcodes of static videos that are associated with cropped equirectangular input meshes; ¶0053, at least discloses in response to determining that the received video content item is associated with a cropped equirectangular input mesh, process 100 can proceed to determine whether the received video content item is a candidate for generating a foveated mesh; ¶0055, at least discloses the improvement ratio at the central portion of each frame of the video content item can be limited by the downsampling ratio. For example, when downsampling a video content item from a 4k pixel resolution (e.g., 3849×2160 pixels or 4096×2160 pixels) to a 1080p resolution (e.g., 1920×1080 pixels) [different resolutions], the highest improvement ratio at the central portion of the video is about 4; ¶0064, at least discloses multiple foveated meshes can be generated. For example, a first foveated mesh can be generated according to a first foveation strategy (e.g., a polynomial fitting function based on a first foveation ratio parameter) and a second foveated mesh can be generated according to a second foveation strategy (e.g., a polynomial fitting function based on a second foveation ratio parameter); ¶0067, at least discloses the foveated mesh can be used to render the immersive video content such that an improvement in the overall perceived video resolution or quality of the immersive video content is achieved, where pixel density or resolution in a central region of interest in each frame of the immersive video content given a static fixation point is greater than the pixel density or resolution in peripheral regions in each frame of the immersive video content [different resolutions]; Claim 1, at least cites “generating, using a hardware processor, a foveated mesh in accordance with a foveation ratio parameter on which frames of a video content item are to be projected, wherein the foveated mesh has a non-uniform position map that increases pixel density in a central portion of each frame of the video content item in comparison with peripheral portions of each frame of the video content item”);
mapping, using the at least one processing device (As discussed above), at least some image data of each image onto a mesh based on the mask associated with that image (Wu- ¶0010-0011, at least disclose determining whether the received video content item is a static video or a dynamic video, the static video includes video content in which a camera did not substantially move when the video content was captured, the dynamic video includes a camera motion metadata track, and the method further comprises generating the foveated mesh in response to determining that the received video content item is the static video […] the at least one criterion includes determining whether the received video content item is associated with a particular type of input mesh and the method further comprises generating the foveated mesh in response to determining that the received video content item is associated with a cropped equirectangular mesh; ¶0053, at least discloses in response to determining that the received video content item is associated with a cropped equirectangular input mesh, process 100 can proceed to determine whether the received video content item is a candidate for generating a foveated mesh); and
rendering, using the at least one processing device (As discussed above), final views of the scene using the mapped image data of the images (Wu- ¶0015-0016, at least discloses store the video content item in a file format that includes the generated foveated mesh, wherein the immersive video content is rendered [rendering final views of the scene] by applying the video content item as a texture to the generated foveated mesh […] generating a foveated mesh in accordance with a foveation ratio parameter on which frames of the video content item are to be projected, wherein the foveated mesh has a non-uniform position map that increases pixel density in a central portion of each frame of the video content item in comparison with peripheral portions of each frame of the video content item; and storing the video content item in a file format that includes the generated foveated mesh, wherein the immersive video content is rendered by applying the video content item as a texture to the generated foveated mesh).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tokutake to incorporate the teachings of Wu, and apply the foveated mesh on which frames of the video content item are to be projected into Tokutake’s teachings for generating a mask for each image based on the region of the scene on which the user is focused, wherein different ones of the masks are associated with at least one of (i) different resolutions or (ii) different shapes; mapping at least some image data of each image onto a mesh based on the mask associated with that image; and rendering final views of the scene using the mapped image data of the images.
Doing so would provide a viewer with an immersive experience.
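The sketch below is a non-authoritative illustration of a foveated mesh with "a non-uniform position map that increases pixel density in a central portion" (Wu ¶0007). Mesh vertices are spaced uniformly on screen. The texture coordinate along each axis is warped by the hypothetical function t(u) = sign(u)(1 - (1 - |u|)^k). Wu describes polynomial fitting functions parameterized by a foveation ratio (¶0064) but does not state this exact form. Here k stands in for the foveation ratio, so the centre of the frame receives about k times the uniform texel density.

```python
import numpy as np


def foveated_axis(n: int, foveation_ratio: float) -> tuple[np.ndarray, np.ndarray]:
    """Screen positions u and texture coordinates t(u) along one axis, both in [-1, 1]."""
    if foveation_ratio < 1.0:
        raise ValueError("foveation_ratio must be >= 1")
    u = np.linspace(-1.0, 1.0, n)  # mesh vertices are uniformly spaced on screen
    # t(0) = 0, t(+-1) = +-1 and dt/du = foveation_ratio at the centre, so more
    # texels of the encoded frame are spent near the centre than at the edges.
    t = np.sign(u) * (1.0 - (1.0 - np.abs(u)) ** foveation_ratio)
    return u, t


def foveated_mesh(n: int, foveation_ratio: float) -> tuple[np.ndarray, np.ndarray]:
    """Separable 2-D mesh: (n*n, 2) vertex positions and matching texture coordinates."""
    u, t = foveated_axis(n, foveation_ratio)
    sx, sy = np.meshgrid(u, u)
    tx, ty = np.meshgrid(t, t)
    positions = np.stack([sx.ravel(), sy.ravel()], axis=1)
    texcoords = np.stack([tx.ravel(), ty.ravel()], axis=1)
    return positions, texcoords
```

Rendering would then apply each frame as a texture to this mesh, remapping t from [-1, 1] to [0, 1] UV space. Two meshes built with different ratios correspond loosely to Wu's first and second foveation strategies.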

Regarding claim 2, Tokutake in view of Wu discloses the method of Claim 1, and further discloses wherein:
the mask associated with each image identifies the region of the scene on which the user is focused (Tokutake- Fig. 4 and ¶0046, at least disclose the eye tracking system 105 analyzes in real time a face image of a user (person) captured by, for example, an infrared camera to acquire gaze information. On the basis of the gaze information, the eye tracking system 105 then controls the movement of the imaging direction of the normal camera 102 so that the imaging direction matches the gaze. On the basis of the gaze information, the eye tracking system 105 also controls the movement of the focal region (high-resolution region) so that the focal region matches the gaze [each image identifies the region of the scene on which the user is focused]); and
a portion of the mask associated with the region of the scene on which the user is focused has a higher resolution than one or more other portions of the mask (Tokutake- Fig. 4 and ¶0046, at least disclose On the basis of the gaze information, the eye tracking system 105 also controls the movement of the focal region (high-resolution region) so that the focal region matches the gaze [a higher resolution]; Wu- ¶0010-0011, at least disclose determining whether the received video content item is a static video or a dynamic video, the static video includes video content in which a camera did not substantially move when the video content was captured, the dynamic video includes a camera motion metadata track, and the method further comprises generating the foveated mesh in response to determining that the received video content item is the static video […] the at least one criterion includes determining whether the received video content item is associated with a particular type of input mesh and the method further comprises generating the foveated mesh in response to determining that the received video content item is associated with a cropped equirectangular mesh [a portion of the mask associated with the region of the scene]; ¶0053, at least discloses in response to determining that the received video content item is associated with a cropped equirectangular input mesh, process 100 can proceed to determine whether the received video content item is a candidate for generating a foveated mesh).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tokutake to incorporate the teachings of Wu, and apply controlling the movement of the focal region (high-resolution region) into Tokutake’s teachings so that the mask associated with each image identifies the region of the scene on which the user is focused, and a portion of the mask associated with the region of the scene on which the user is focused has a higher resolution than one or more other portions of the mask.
The same motivation that was utilized in the rejection of claim 1 applies equally to this claim.
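To make the claim 2 limitation concrete, the sketch below builds a per-pixel mask in which the gaze-centred portion carries a higher resolution scale than the rest of the image. The circular focus region, the radius, and the two scale values are assumptions made for illustration; neither the claim nor the cited passages fix them.

```python
import numpy as np


def foveation_mask(height: int, width: int, gaze_xy: tuple[float, float],
                   radius: float, inner_scale: float = 1.0,
                   outer_scale: float = 0.25) -> np.ndarray:
    """Per-pixel resolution scale: inner_scale inside a circle around the gaze
    point and outer_scale elsewhere (all parameters hypothetical)."""
    ys, xs = np.mgrid[0:height, 0:width]  # row and column index of every pixel
    gx, gy = gaze_xy
    in_focus = (xs - gx) ** 2 + (ys - gy) ** 2 <= radius ** 2
    return np.where(in_focus, inner_scale, outer_scale)
```

Calling this once per captured image with that image's gaze point (for example, separate left and right views) yields one mask per image. Differently sized masks then correspond to the claimed masks with different resolutions.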

The systems of claims 8-9 are similar in scope to the functions performed by the methods of claims 1-2; therefore, claims 8-9 are rejected under the same rationale.

Regarding claim 8, Tokutake in view of Wu discloses a video see-through (VST) extended reality (XR) device (Tokutake- Fig. 3A and ¶0035, at least disclose a configuration example of a display system 10A for a real space image, which is installed in a head-mounted display and is used in a video see-through type of AR device, MR device, or the like) comprising:
at least one display (Tokutake- Fig. 3A and ¶0035, at least disclose a display 104);
one or more imaging sensors (Tokutake- Fig. 3A-3B and ¶0035-0036, at least disclose display system 10A includes a wide-angle camera 101, a normal camera 102, a GPU 103, and a display 104 […] The wide-angle camera 101 constitutes a wide-angle image capturing unit, and has a wide angle of view and can capture a wide range image. The normal camera 102 constitutes a normal image capturing unit, and has a narrow angle of view but can capture a narrow range image at high resolution; ¶0045-0046, at least disclose display system 10B includes a wide-angle camera 101, a normal camera 102, a GPU 103, a display 104, and an eye tracking system 105 […] The eye tracking system 105 analyzes in real time a face image of a user (person) captured by, for example, an infrared camera to acquire gaze information); and
at least one processing device (Tokutake- Fig. 3 and ¶0035, at least disclose display system 10A includes a wide-angle camera 101, a normal camera 102, a GPU 103, and a display 104; Fig. 9 and ¶0093-0095, at least disclose a configuration example (functional configuration example) of an information processing device 200 corresponding to the display system 10E illustrated in (a) of FIG. 7. The information processing device 200 includes a wide-angle image capturing unit 201, a normal image capturing unit 202, a telephoto image capturing unit 203, a close-up image capturing unit 204, an image synthesis unit 205, a display unit 206, a subject distance measurement unit 207, an image capturing unit switching control unit 208, a gaze detection unit 209, and an imaging direction/image synthesis control unit 210; ¶0115, at least discloses the information processing device 200 illustrated in FIG. 9 corresponds to the display system 10E illustrated in (a) of FIG. 7. Although detailed description will be omitted, it goes without saying that information processing devices corresponding to the display systems 10A to 10D illustrated in FIGS. 3 to 6 can be configured in a similar manner) configured to:
obtain images of a scene captured using the one or more imaging sensors (see Claim 1 rejection for detailed analysis);
identify a region of the scene on which a user of the VST XR device is focused (see Claim 1 rejection for detailed analysis);
generate a mask for each image based on the region of the scene on which the user is focused (see Claim 1 rejection for detailed analysis), wherein different ones of the masks are associated with at least one of (i) different resolutions or (ii) different shapes (see Claim 1 rejection for detailed analysis);
map at least some image data of each image onto a mesh based on the mask associated with that image (see Claim 1 rejection for detailed analysis); and
render final views of the scene using the mapped image data of the images for presentation on the at least one display (see Claim 1 rejection for detailed analysis).

Regarding claims 15-16, all claim limitations are set forth in claims 1-2, recited here as a non-transitory machine readable medium containing instructions, and are rejected as discussed for claims 1-2.

Regarding claim 15, Tokutake in view of Wu discloses a non-transitory machine readable medium containing instructions that when executed cause at least one processor of a video see-through (VST) extended reality (XR) device (Tokutake- Fig. 3A and ¶0035, at least disclose a configuration example of a display system 10A for a real space image, which is installed in a head-mounted display and is used in a video see-through type of AR device, MR device, or the like; ¶0090, at least discloses “GPU Driver” refers to a GPU control driver. “GPU memory” refers to a memory used by “Rendering middleware” to perform synthesis and rendering processing to generate a display image) to:
obtain images of a scene captured using one or more imaging sensors of the VST XR device (see Claim 1 rejection for detailed analysis);
identify a region of the scene on which a user of the VST XR device is focused (see Claim 1 rejection for detailed analysis);
generate a mask for each image based on the region of the scene on which the user is focused (see Claim 1 rejection for detailed analysis), wherein different ones of the masks are associated with at least one of (i) different resolutions or (ii) different shapes (see Claim 1 rejection for detailed analysis);
map at least some image data of each image onto a mesh based on the mask associated with that image (see Claim 1 rejection for detailed analysis); and
render final views of the scene using the mapped image data of the images for presentation on at least one display of the VST XR device (see Claim 1 rejection for detailed analysis).




5.	Claims 3, 10 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Tokutake in view of Wu, further in view of Bastani et al., (“Bastani”) [US-2023/0245261-A1].
Regarding claim 3, Tokutake in view of Wu discloses the method of Claim 1, and further discloses wherein generating the mask for each image comprises generating, for each image, a mask defining a region in the scene (Wu- ¶0007, at least discloses generating, using the hardware processor, a foveated mesh in accordance with a foveation ratio parameter on which frames of the video content item are to be projected [generating the mask for each image comprises generating, for each image, a mask defining a region], wherein the foveated mesh has a non-uniform position map that increases pixel density in a central portion of each frame of the video content item in comparison with peripheral portions of each frame of the video content item).
The prior art does not explicitly disclose, but Bastani discloses
generating, for each image, a mask defining a region with a first shape or a second shape depending on whether the user is focusing on a closer or farther object in the scene (Bastani- Fig. 1 and ¶0040, at least disclose tile generator 122 defines a size, shape, position, and corresponding resolution of imagery for each of tiles 128 […] any of the size, position, and corresponding resolution of imagery for each of tiles 128 is determined by tile generator 122 based on gaze vector 136 and/or gaze error 126; ¶0109, at least discloses the fall-off of the fovea region is a difference in assigned image quality across different sets of tiles (e.g., the subsets of tiles defined in operation 1216); Claim 3, at least cites “establishing the first tile and the plurality of tiles comprises defining or adjusting a size of one or more of the first tile and the plurality of tiles, defining or adjusting a fall-off of a fovea region of the first tile and the plurality of tiles, or defining or adjusting a radius of the fovea region, using the gaze location on the display and the distance between the gaze location and the edge of the display.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tokutake/Wu to incorporate the teachings of Bastani, and apply adjusting a radius of the fovea region using the distance between the gaze location and the edge of the display into Tokutake/Wu’s teachings such that generating the mask for each image comprises generating, for each image, a mask defining a region with a first shape or a second shape depending on whether the user is focusing on a closer or farther object in the scene.
Doing so would provide eye tracking to track the user's eye and accordingly present images.

The system of claim 10 is similar in scope to the functions performed by the method of claim 3; therefore, claim 10 is rejected under the same rationale.

Regarding claim 17, all claim limitations are set forth in claim 3, recited here as a non-transitory machine readable medium containing instructions, and are rejected as discussed for claim 3.


6.	Claims 4-5, 11-12 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Tokutake in view of Wu, further in view of Shen, (“Shen”) [US-2023/0316640-A1].
Regarding claim 4, Tokutake in view of Wu discloses the method of Claim 1. Tokutake and Wu do not explicitly disclose, but Shen discloses, the method further comprising:
generating a depth hierarchy associated with depths within the scene for each image, wherein the depth hierarchy defines depths larger than a specified focal distance as background depths and depths smaller than the specified focal distance as foreground depths (Shen- Figs. 10A-10B show a depth image of a foreground object; ¶0005, at least discloses acquire a foreground object image, the foreground object image being an image viewing a foreground object from a virtual viewpoint and including no background; acquire a background image rendered using computer graphics, the background image being an image viewing a CG space from the virtual viewpoint and including background; Fig. 3E and ¶0038-0039, at least disclose As a result of rendering by the virtual viewpoint object rendering unit 104, depth information indicating a distance from the virtual viewpoint to the foreground object is also obtained. The depth information expressed as an image is referred to as a depth image […] The depth image of FIG. 3E is a depth image indicating a distance from the virtual viewpoint […] In the depth image, a pixel value of an area with no foreground is 0. In FIG. 3E, an area with a pixel value of 0 is shown in black. In FIG. 3E, a gray area indicates an area with a depth value other than 0, where the depth value increases as the gray becomes darker. With the increase in the depth value, a position of the object indicated by the corresponding pixel becomes away from the camera (virtual viewpoint). In this manner, in the course of generating an image of the foreground object viewed from the virtual viewpoint, the depth information (depth image) on the foreground object is obtained as two-dimensional intermediate information; ¶0041, at least discloses As a result of rendering, a texture image of the CG space viewed from the virtual viewpoint and depth information (depth image) indicating a distance from the virtual viewpoint to each background object of the CG space are obtained; Claim 1, at least cites “acquire a foreground object image, the foreground object image being an image viewing a foreground object from a virtual viewpoint and including no background; acquire a background image rendered using computer graphics, the background image being an image viewing a CG space from the virtual viewpoint and including background;”); and
densifying the foreground depths in each depth hierarchy (Shen- ¶0038-0039, at least disclose As a result of rendering by the virtual viewpoint object rendering unit 104, depth information indicating a distance from the virtual viewpoint to the foreground object is also obtained. The depth information expressed as an image is referred to as a depth image […] The depth image of FIG. 3E is a depth image indicating a distance from the virtual viewpoint […] In the depth image, a pixel value of an area with no foreground is 0. In FIG. 3E, an area with a pixel value of 0 is shown in black. In FIG. 3E, a gray area indicates an area with a depth value other than 0, where the depth value increases as the gray becomes darker. With the increase in the depth value, a position of the object indicated by the corresponding pixel becomes away from the camera (virtual viewpoint). In this manner, in the course of generating an image of the foreground object viewed from the virtual viewpoint, the depth information (depth image) on the foreground object is obtained as two-dimensional intermediate information).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tokutake/Wu to incorporate the teachings of Shen, and apply the depth images of the foreground object image and the background image into Tokutake/Wu’s teachings for generating a depth hierarchy associated with depths within the scene for each image, wherein the depth hierarchy defines depths larger than a specified focal distance as background depths and depths smaller than the specified focal distance as foreground depths; and densifying the foreground depths in each depth hierarchy.
Doing so would decrease the amount of computation in shadow generation.
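The claim 4 limitations can be sketched as follows: depths are split at a specified focal distance into foreground and background, and the foreground is then densified. This is a minimal sketch under stated assumptions, none of which come from the claims or from Shen:
- Missing depth samples are NaN.
- A depth exactly equal to the focal distance is placed in the background.
- Densification is a hypothetical 4-neighbour averaging fill that only grows into missing samples.

```python
import numpy as np


def split_by_focal_distance(depth: np.ndarray, focal_distance: float):
    """Boolean foreground/background masks; NaN marks a missing depth sample."""
    valid = ~np.isnan(depth)
    safe = np.where(valid, depth, np.inf)         # avoid comparing against NaN
    foreground = valid & (safe < focal_distance)  # nearer than the focal distance
    background = valid & ~foreground              # at or beyond it (assumption for ==)
    return foreground, background


def densify_foreground(depth: np.ndarray, foreground: np.ndarray, iterations: int = 1):
    """Grow foreground depths into missing neighbours by 4-neighbour averaging."""
    dense = np.where(foreground, depth, np.nan)
    filled = foreground.copy()
    missing = np.isnan(depth)  # background samples are never overwritten
    for _ in range(iterations):
        vals = np.pad(np.where(filled, dense, 0.0), 1)  # zero padding at the border
        cnts = np.pad(filled.astype(float), 1)
        # Sum and count of filled up/down/left/right neighbours of every pixel.
        nsum = vals[:-2, 1:-1] + vals[2:, 1:-1] + vals[1:-1, :-2] + vals[1:-1, 2:]
        ncnt = cnts[:-2, 1:-1] + cnts[2:, 1:-1] + cnts[1:-1, :-2] + cnts[1:-1, 2:]
        grow = missing & ~filled & (ncnt > 0)
        dense[grow] = nsum[grow] / ncnt[grow]
        filled |= grow
    return dense, filled
```

Each iteration extends the foreground by one pixel. More iterations, or a proper interpolation scheme, would be needed for sparse depth from real sensors.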

Regarding claim 5, Tokutake in view of Wu discloses the method of Claim 1. Tokutake and Wu do not explicitly disclose, but Shen discloses, the method further comprising:
separating image data of at least some of the images into foreground image data and background image data (Shen- ¶0005, at least discloses generate, based on two-dimensional information on a shape of the foreground object and information on a light in the CG space, a shadow image indicating a shadow of the foreground object corresponding to the CG space; and generate a combined image by combining the foreground object image, the background image, and the shadow image into a single image); and
performing object reconstruction for each of the at least some of the images, the object reconstruction comprising reconstructing an object associated with the foreground image data in the region of the scene on which the user is focused (Shen- ¶0034, at least discloses the three-dimensional shape estimation unit 103 may use the visual hull to estimate the three-dimensional shape of the foreground object. In the visual hull, foreground areas in silhouette images corresponding to the respective image capturing apparatuses constituting the image capturing apparatus 111 are back-projected to the three-dimensional space.);
wherein rendering the final views of the scene comprises rendering at least some of the final views of the scene using the reconstructed object (Shen- ¶0034, at least discloses the three-dimensional shape estimation unit 103 may use the visual hull to estimate the three-dimensional shape of the foreground object. In the visual hull, foreground areas in silhouette images corresponding to the respective image capturing apparatuses constituting the image capturing apparatus 111 are back-projected to the three-dimensional space. By calculating a portion of intersection of visual volumes derived from the respective foreground areas, the three-dimensional shape of the foreground object is obtained; ¶0036, at least discloses The virtual viewpoint object rendering unit 104 renders the three-dimensional model of the foreground object to obtain an image of the foreground object viewed from the virtual viewpoint set by the virtual viewpoint generation unit 102. As a result of rendering by the virtual viewpoint object rendering unit 104, a texture image of the foreground object viewed from the virtual viewpoint is obtained).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Tokutake/Wu to incorporate the teachings of Shen, and to apply Shen’s back-projection of foreground areas into three-dimensional space to Tokutake/Wu’s teachings, for separating image data of at least some of the images into foreground image data and background image data; and performing object reconstruction for each of the at least some of the images, the object reconstruction comprising reconstructing an object associated with the foreground image data in the region of the scene on which the user is focused; wherein rendering the final views of the scene comprises rendering at least some of the final views of the scene using the reconstructed object.
Doing so would decrease the amount of computation in shadow generation.
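For illustration only, the visual hull technique quoted from Shen ¶0034 (back-projecting each view's foreground silhouette into three-dimensional space and keeping the intersection of the resulting volumes) can be sketched as voxel carving in Python. This is a hypothetical simplification, not Shen's implementation: it assumes a discrete voxel grid and, in the example, orthographic projections along the coordinate axes in place of calibrated image capturing apparatuses. The names visual_hull and View are hypothetical.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]
# A view pairs a projection (voxel -> pixel) with that view's foreground silhouette.
View = Tuple[Callable[[Voxel], Pixel], Set[Pixel]]


def visual_hull(grid_size: int, views: List[View]) -> Set[Voxel]:
    """Keep each voxel whose projection lands inside every silhouette.

    This is the intersection of the volumes obtained by back-projecting
    each view's foreground area into the voxel grid.
    """
    hull: Set[Voxel] = set()
    for voxel in product(range(grid_size), repeat=3):
        if all(project(voxel) in silhouette for project, silhouette in views):
            hull.add(voxel)
    return hull


# Hypothetical example: three orthographic views of a 2x2x2 cube in a 4x4x4 grid.
square = {(a, b) for a in range(2) for b in range(2)}
views: List[View] = [
    (lambda v: (v[0], v[1]), square),  # front view: drop z
    (lambda v: (v[1], v[2]), square),  # side view: drop x
    (lambda v: (v[0], v[2]), square),  # top view: drop y
]
hull = visual_hull(4, views)
```

Voxel carving on a grid is only one way to realize a visual hull; intersecting silhouette cones as polyhedra is another. The grid version is used here because the intersection test reduces to one membership check per view.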

The systems of claims 11-12 are similar in scope to the functions performed by the methods of claims 4-5, and therefore claims 11-12 are rejected under the same rationale.

Regarding claims 18-19, all claim limitations are set forth in claims 4-5, recited in the form of a non-transitory machine-readable medium containing instructions, and are rejected as per the discussion for claims 4-5.


Allowable Subject Matter
8.	Claims 6-7, 13-14 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
9.	The following is a statement of reasons for the indication of allowable subject matter:
Regarding Claim 6, the combination of prior art teaches the method of Claim 1.
However, in the context of claims 1, 5, and 6 as a whole, the combination of prior art does not teach saving a reconstructed object associated with one of the images; and, for each of one or more subsequent images, transforming the saved reconstructed object based on a predicted head pose of the user to generate a transformed reconstructed object. Therefore, Claim 6, in the context of claims 1 and 5 as a whole, comprises allowable subject matter.

Regarding Claim 7, the combination of prior art teaches the method of Claim 1.
However, in the context of claims 1, 5, 6, and 7 as a whole, the combination of prior art does not teach generating the predicted head pose of the user for each of the one or more subsequent images, the predicted head pose of the user based on a latency of a pipeline between capture of the images and presentation of the final views of the scene based on the images. Therefore, Claim 7, in the context of claims 1, 5, and 6 as a whole, comprises allowable subject matter.
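For illustration only, the limitations of claims 6 and 7 discussed above (predicting the head pose over the capture-to-presentation pipeline latency, then transforming a saved reconstructed object by that predicted pose) can be sketched in Python. This is a hypothetical sketch, not the applicant's method: the constant-velocity extrapolation, the yaw-only rotation about the vertical y axis, the point-list object representation, the units (meters, seconds, radians), and the names HeadPose, predict_head_pose, and transform_to_view are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]  # world-space point of a saved reconstructed object


@dataclass
class HeadPose:
    x: float
    y: float
    z: float
    yaw: float  # radians, rotation about the vertical (y) axis (assumption)


def predict_head_pose(pose: HeadPose, velocity: Tuple[float, float, float],
                      yaw_rate: float, latency_s: float) -> HeadPose:
    """Extrapolate the head pose forward by the pipeline latency.

    Assumption: constant linear and angular velocity over the latency window.
    """
    vx, vy, vz = velocity
    return HeadPose(pose.x + vx * latency_s,
                    pose.y + vy * latency_s,
                    pose.z + vz * latency_s,
                    pose.yaw + yaw_rate * latency_s)


def transform_to_view(points: List[Point], pose: HeadPose) -> List[Point]:
    """Re-express saved world-space object points in the predicted head frame."""
    c, s = math.cos(pose.yaw), math.sin(pose.yaw)
    out = []
    for px, py, pz in points:
        dx, dy, dz = px - pose.x, py - pose.y, pz - pose.z
        # Apply the inverse (transpose) of the yaw rotation about the y axis.
        out.append((c * dx - s * dz, dy, s * dx + c * dz))
    return out
```

For example, with a 0.05 s latency and the head moving at 1 m/s along x, predict_head_pose moves the head to x = 0.05, and a point saved at x = 1.05 maps to approximately x = 1.0 in the predicted head frame. Constant-velocity extrapolation is the simplest predictor; a filter-based predictor could be substituted without changing how the saved object is re-expressed.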

Regarding Claim 13, the combination of prior art teaches the system of Claim 8.
However, in the context of claims 8, 12, and 13 as a whole, the combination of prior art does not teach a system configured to save a reconstructed object associated with one of the images and, for each of one or more subsequent images, transform the saved reconstructed object based on a predicted head pose of the user to generate a transformed reconstructed object. Therefore, Claim 13, in the context of claims 8 and 12 as a whole, comprises allowable subject matter.

Regarding Claim 14, the combination of prior art teaches the system of Claim 8.
However, in the context of claims 8, 12, 13, and 14 as a whole, the combination of prior art does not teach a system configured to generate the predicted head pose of the user for each of the one or more subsequent images, the predicted head pose of the user based on a latency of a pipeline between capture of the images and presentation of the final views of the scene based on the images. Therefore, Claim 14, in the context of claims 8, 12, and 13 as a whole, comprises allowable subject matter.

Regarding Claim 20, the combination of prior art teaches the non-transitory machine-readable medium of Claim 15.
However, in the context of claims 15, 19, and 20 as a whole, the combination of prior art does not teach instructions to save a reconstructed object associated with one of the images and, for each of one or more subsequent images, transform the saved reconstructed object based on a predicted head pose of the user to generate a transformed reconstructed object. Therefore, Claim 20, in the context of claims 15 and 19 as a whole, comprises allowable subject matter.




Conclusion
10.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. These references are listed on the attached PTO-892 form.
11.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL LE whose telephone number is (571) 272-5330. The examiner can normally be reached 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MICHAEL LE/Primary Examiner, Art Unit 2614
Prosecution Timeline

Jul 29, 2024: Application Filed
Apr 29, 2026: Non-Final Rejection mailed (§103)
May 06, 2026: Interview Requested
Precedent Cases

Applications granted by this same examiner with similar technology

18/462,703 (Patent 12633003): VIDEO GENERATION WITH LATENT DIFFUSION PROBABILISTIC MODELS. Granted May 19, 2026 (2y 8m to grant).
18/240,920 (Patent 12626460): RECURSIVE FIELD NETWORKS FOR OBJECT REPRESENTATION. Granted May 12, 2026 (2y 8m to grant).
18/528,979 (Patent 12626325): METHOD AND APPARATUS WITH IMAGE PROCESSING. Granted May 12, 2026 (2y 5m to grant).
18/453,828 (Patent 12620168): Extended reality streaming method and system. Granted May 05, 2026 (2y 8m to grant).
18/280,116 (Patent 12614244): METHODS FOR GENERATING AN UNDISTORTED IMAGE FROM A DISTORTED ORIGINAL IMAGE OF A GRAPHICAL REPRESENTATION. Granted Apr 28, 2026 (2y 8m to grant).
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 66%
With Interview (+22.1%): 88%
Median Time to Grant: 3y 3m (~1y 5m remaining)
PTA Risk: Low
Based on 873 resolved cases by this examiner. Grant probability derived from career allowance rate.