Prosecution Insights
Last updated: April 19, 2026
Application No. 18/834,191

VOLUMETRIC IMMERSIVE EXPERIENCE WITH MULTIPLE VIEWS

Status: Non-Final OA (§103)
Filed: Jul 29, 2024
Examiner: NGUYEN, ANH TUAN V
Art Unit: 2619
Tech Center: 2600 — Communications
Assignee: Dolby Laboratories Licensing Corporation
OA Round: 1 (Non-Final)

Grant Probability: 73% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
Grant Probability With Interview: 92%

Examiner Intelligence

Career allow rate: 73% (355 granted / 489 resolved), +10.6% vs Tech Center average (above average).
Interview lift: +19.2% (allowance rate among resolved cases with an interview vs. without).
Typical timeline: 2y 11m average prosecution; 38 applications currently pending.
Career history: 527 total applications across all art units.

Statute-Specific Performance

§101: 8.3% (-31.7% vs TC avg)
§103: 67.6% (+27.6% vs TC avg)
§102: 4.9% (-35.1% vs TC avg)
§112: 12.3% (-27.7% vs TC avg)

Tech Center averages are estimates. Based on career data from 489 resolved cases.

Office Action (§103)
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Applicant’s preliminary amendment/response filed 07/29/2024 has been entered and made of record. Claims 3-4, 7-10, and 12-14 were amended. Claims 1-14 are pending in the application.

Claim Objections

Claim 13 is objected to because of the following informalities: Claim 13 recites “performing any the method” instead of “performing the method.” Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-10 and 13-14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lakshman et al. (US 2018/0359489) in view of Yuan et al. (US 2005/0094729) and Ward (US 2013/0335438).

Regarding claim 1, Lakshman teaches/suggests: A method comprising:

receiving a multi-view input image, the multi-view input image covering a plurality of sampled views to an image space depicted in the multi-view input image (Lakshman [0176] “Each of the raw multiview images may comprise a plurality of raw texture images corresponding to a plurality of raw sampled views”);

generating, from the multi-view input image, a multi-view layered image stack of a plurality of layered images of a first dynamic range for the plurality of sampled views (Lakshman [0177] “To generate the multiview images for the plurality of sampled views, the post-camera ISP (306) may perform one or more post-processing operations, on the raw multiview images” [0233] “Each sampled view of the multiview image comprises a plurality of texture images and a plurality of depth images in a plurality of image layers”);

determining a target view of a viewer to the image space, the target view being determined based at least in part on a user pose data portion generated from a user pose tracking data collected while the viewer is viewing rendered images on an image display (Lakshman [0070] “A downstream device (e.g., a VR client device, an AR client device, a video decoder, etc.) operating in conjunction with the wearable device can determine the view position and the view direction of the target view in real time or in near real time by tracking or monitoring spatial positions and/or spatial directions of the wearable device used by the viewer while display images including the display image derived from the multiview image are rendered on the display of the wearable device to the viewer”);

using the target view of the viewer to select a set of user pose selected sampled views from among the plurality of sampled views represented in the multi-view input image (Lakshman [0234] “the image processing device uses the target view to select, from the plurality of sampled views of the multiview image, a set of sampled views”);

encoding a set of layered images for the set of user pose selected sampled views in the plurality of layered images of the multi-view layered image stack into a video signal to cause a recipient device of the video signal to generate a display image from the set of layered images for rendering on the image display (Lakshman [0235] “the image processing device encodes a texture image and a depth image for each sampled view in the set of sampled views into a multiview video signal to be transmitted to a downstream device” [0070] “while display images including the display image derived from the multiview image are rendered on the display of the wearable device to the viewer”).

Lakshman does not teach/suggest a plurality of alpha maps for the plurality of layered images. Yuan, however, teaches/suggests a plurality of alpha maps for the plurality of layered images (Yuan [0025] “Multilevel alpha maps are frequently used to blend different layers of image sequences”). Before the effective filing date of the claimed invention, it would have been obvious for one of ordinary skill in the art to modify the encoding of Lakshman to include the alpha maps of Yuan for blending.

Lakshman as modified by Yuan does not teach/suggest a plurality of beta scale maps for the plurality of layered images. Nor does Lakshman teach/suggest encoding … along with a set of alpha maps for the set of user pose selected sampled views in the plurality of alpha maps of the multi-view layered image stack and a set of beta scale maps for the set of user pose selected sampled views in the plurality of beta scale maps of the multi-view layered image stack. Ward, however, teaches/suggests a plurality of beta scale maps for the plurality of layered images (Ward [0090] “the image processing unit may use ratio values in the local multiscale gray scale ratio image with the luminance values derived from the input HDR image to generate a tone-mapped gray scale image” [0100] “each of the ratio images may correspond to a spatial resolution level of the number of spatial resolution levels”). Before the effective filing date of the claimed invention, it would have been obvious for one of ordinary skill in the art to modify the encoding of Lakshman as modified by Yuan to include the ratio images (the beta scale maps) of Ward for tone mapping.
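To make the pose-driven view-selection step concrete, here is a minimal sketch, assuming view directions are compared by cosine similarity; every function and parameter name is hypothetical and nothing in it comes from Lakshman or the application as filed:

```python
import numpy as np

def select_views(target_dir: np.ndarray, sampled_dirs: np.ndarray, k: int = 4) -> np.ndarray:
    """Pick the k sampled views whose view directions lie closest to the
    target view direction derived from user pose tracking data.
    Illustrative assumption only; not the application's actual method."""
    t = target_dir / np.linalg.norm(target_dir)
    s = sampled_dirs / np.linalg.norm(sampled_dirs, axis=1, keepdims=True)
    similarity = s @ t                   # cosine similarity per sampled view
    return np.argsort(-similarity)[:k]   # indices of the k nearest sampled views
```

In practice the target direction would be tracked from the wearable device in real time or near real time, per Lakshman [0070].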
As such, Lakshman as modified by Yuan and Ward teaches/suggests encoding … along with a set of alpha maps for the set of user pose selected sampled views in the plurality of alpha maps of the multi-view layered image stack and a set of beta scale maps for the set of user pose selected sampled views in the plurality of beta scale maps of the multi-view layered image stack (Lakshman [0235] “the image processing device encodes a texture image and a depth image for each sampled view in the set of sampled views into a multiview video signal to be transmitted to a downstream device” Yuan [0025] “Multilevel alpha maps are frequently used to blend different layers of image sequences” Ward [0100] “each of the ratio images may correspond to a spatial resolution level of the number of spatial resolution levels”).

Regarding claim 2, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 1, wherein the set of beta scale map can be used to apply scaling operations on the set of layered images to generate a set of scaled layered images of a second dynamic range for the set of user pose selected sampled views (Lakshman [0234] “the image processing device uses the target view to select, from the plurality of sampled views of the multiview image, a set of sampled views” Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”); wherein the second dynamic range is different from the first dynamic range (Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Regarding claim 3, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 1, wherein the display image represents one of: a standard dynamic range image, a high dynamic range image, or a display mapped image that is optimized for rendering on a target image display (Lakshman [0036] “rendering diffuse images in the diffuse image layer by a legacy video decoder that may be of a limited dynamic range … rendering overall texture images that contain both specular and diffuse image details from the diffuse and specular texture images in the different image layers by a compliant video decoder that may be of a relatively large dynamic range” Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
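Claim 2's scaling operations can be pictured with a short sketch of a per-pixel ratio map in the spirit of Ward's luminance ratio images; the names and the simple multiplicative model are illustrative assumptions, not the claimed method:

```python
import numpy as np

def apply_beta_scale(layer: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Scale each pixel of a first-dynamic-range layered image by its
    beta-scale (ratio) value to approximate a second dynamic range,
    loosely following Ward-style ratio images. Illustrative sketch only."""
    assert layer.shape[:2] == beta.shape, "beta map must match image resolution"
    return layer * beta[..., np.newaxis]  # broadcast the ratio over color channels
```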
Regarding claim 4, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 1, wherein the multi-view input image includes a plurality of single-view input images for the plurality of sampled views (Lakshman [0052] “image data of a sampled view comprises a single-view texture image in the plurality of single-view texture images and a corresponding single-view depth image in the plurality of single-view depth images”); wherein the plurality of single-view images of the first dynamic range is generated from the plurality of single-view input images used to generate the plurality of layered images (Lakshman [0176] “Each of the raw multiview images may comprise a plurality of raw texture images corresponding to a plurality of raw sampled views” [0052] “image data of a sampled view comprises a single-view texture image in the plurality of single-view texture images and a corresponding single-view depth image in the plurality of single-view depth images”); wherein each single-view image of the first dynamic range in the plurality of single-view images of the first dynamic range corresponds to a respective sampled view in the plurality of sampled views and is partitioned into a respective layered image for the respective sampled view in the plurality of layered images (Lakshman [0052] “image data of a sampled view comprises a single-view texture image in the plurality of single-view texture images and a corresponding single-view depth image in the plurality of single-view depth images” [0061] “Each image layer in the plurality of image layers comprises a single-view texture image and a single-view depth image corresponding to the single-view texture image”).

Regarding claim 5, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 4, wherein the plurality of single-view input images for the plurality of sampled views is used to generate a second plurality of single-view images of a different dynamic range for the plurality of sampled views (Lakshman [0176] “Each of the raw multiview images may comprise a plurality of raw texture images corresponding to a plurality of raw sampled views” Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”); wherein the second plurality of single-view images of the different dynamic range includes a second single-view image of the different dynamic range for the respective sampled view (Lakshman [0233] “Each sampled view of the multiview image comprises a plurality of texture images and a plurality of depth images in a plurality of image layers” Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”); wherein the plurality of beta scale maps includes a respective beta scale map for the respective sampled view (Ward [0100] “each of the ratio images may correspond to a spatial resolution level of the number of spatial resolution levels”); wherein the respective beta scale map includes beta scale data to be used to perform beta scaling operations on the single-view image of the first dynamic range to generate a beta scaled image of the different dynamic range that approximates the second single-view image of the different dynamic range (Ward [0090] “the image processing unit may use ratio values in the local multiscale gray scale ratio image with the luminance values derived from the input HDR image to generate a tone-mapped gray scale image … tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Regarding claim 6, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 5, wherein the beta scaling operations include one of: simple scaling with scaling factors (Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”), or applying one or more codeword mapping relationships to map codewords of the single-view image of the first dynamic range to generate corresponding codeword of the beta scaled image of the different dynamic range [This is yet to be considered because of the “one of” recitation.]. The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Regarding claim 7, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 5, wherein the beta scaling operations are performed in place of one or more of: global tone mapping, local tone mapping, display mapping operations, color space conversion, linear mapping, or non-linear mapping (Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Regarding claim 8, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 1, wherein the set of layered images for the set of user pose selected sampled views is encoded in a base layer of the video signal (Lakshman [0210] “Each signal layer (e.g., one of 330-1, 330-2, through 330-M′, etc.) may comprise media data fields/containers (e.g., one of 332-1, 332-2, through 332-M′, etc.) for carrying media data”).

Regarding claim 9, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 1, wherein the set of alpha maps and the set of beta scale maps for the set of user pose selected sampled views are carried in the video signal as image metadata in a data container separate from the set of layered images (Lakshman [0210] “Each such signal layer may comprise media related metadata fields/containers (e.g., one of 334-1, 334-2, through 334-M′, etc.) for carrying media related metadata” Yuan [0025] “Multilevel alpha maps are frequently used to blend different layers of image sequences” Ward [0100] “each of the ratio images may correspond to a spatial resolution level of the number of spatial resolution levels”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
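Claim 6 recites codeword mapping as an alternative to simple scaling; one common way to realize a codeword mapping relationship is a lookup table from source codewords to target codewords. The sketch below is a made-up example under that assumption (the 10-bit-to-8-bit gamma LUT is purely hypothetical, not from the application):

```python
import numpy as np

# Hypothetical 10-bit -> 8-bit codeword mapping relationship as a LUT.
lut = np.clip(255.0 * (np.arange(1024) / 1023.0) ** 2.2, 0, 255).astype(np.uint8)

def map_codewords(codewords: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Map each first-dynamic-range codeword to the corresponding codeword
    of the beta-scaled image in the different dynamic range."""
    return lut[codewords]  # codewords must be an integer-typed index array
```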
Regarding claim 10, Lakshman as modified by Yuan and Ward teaches/suggests: The method of Claim 1, wherein the plurality of layered images includes a layered image for a sampled view in the plurality of sampled views (Lakshman [0061] “the plurality of image layers for the sampled view comprises a diffuse image layer (denoted as ‘L1’) and a specular image layer (denoted as ‘L2’)”); wherein the layered image includes different image layers respectively at different depth sub-ranges from a view position of the sampled view (Lakshman [0062]-[0063] “a corresponding diffuse depth image 108-1 (denoted as ‘L1 depth’) in the diffuse image layer (‘L1’) comprises depth data of the diffuse visual objects in the 3D scene ... a corresponding specular depth image 108-2 (denoted as ‘L2 depth’) in the specular image layer (‘L2’) comprises depth data of the specular visual objects in the 3D scene”).

Regarding claim 13, Lakshman as modified by Yuan and Ward teaches/suggests: An apparatus performing any the method recited in Claim 1 (Lakshman Fig. 5: computer system 500). See the treatment of claim 1 above.

Regarding claim 14, Lakshman as modified by Yuan and Ward teaches/suggests: A non-transitory computer readable storage medium, storing software instructions, which when executed by one or more processors cause performance of the method recited in Claim 1 (Lakshman [0274] “a main memory 506, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504”). See the treatment of claim 1 above.

Claim(s) 11 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lakshman et al. (US 2018/0359489) in view of Yuan et al. (US 2005/0094729), Ward (US 2013/0335438), and Dimitrov et al. (US 2017/0357327).

Regarding claim 11, Lakshman as modified by Yuan and Ward teaches/suggests: A method comprising:

decoding, from a video signal, a set of layered images of a first dynamic range for a set of user pose selected sampled views (Lakshman [0247] “an image processing device decodes a multiview video signal into a set of texture images and a set of depth images for a set of sampled views of a multiview image”), the set of user pose selected sampled views having been selected based on user pose data from a plurality of sampled views covered by a multi-view source image, the multi-view source image having been used to generate a corresponding multi-view layered image stack (Lakshman [0233]-[0234] “Each sampled view of the multiview image comprises a plurality of texture images and a plurality of depth images in a plurality of image layers … the image processing device uses the target view to select, from the plurality of sampled views of the multiview image, a set of sampled views” [0176]-[0177] “Each of the raw multiview images may comprise a plurality of raw texture images corresponding to a plurality of raw sampled views … To generate the multiview images for the plurality of sampled views, the post-camera ISP (306) may perform one or more post-processing operations, on the raw multiview images” [0070] “A downstream device (e.g., a VR client device, an AR client device, a video decoder, etc.) operating in conjunction with the wearable device can determine the view position and the view direction of the target view in real time or in near real time by tracking or monitoring spatial positions and/or spatial directions of the wearable device used by the viewer while display images including the display image derived from the multiview image are rendered on the display of the wearable device to the viewer”); the corresponding multi-view layered image having been used to generate the set of layered images (Lakshman [0061] “the plurality of image layers for the sampled view comprises a diffuse image layer (denoted as ‘L1’) and a specular image layer (denoted as ‘L2’)”);

decoding, from the video signal, a set of alpha maps for the set of user pose selected sampled views (Lakshman [0247] “an image processing device decodes a multiview video signal into a set of texture images and a set of depth images for a set of sampled views of a multiview image” Yuan [0025] “Multilevel alpha maps are frequently used to blend different layers of image sequences”);

causing a display image derived from the set of layered images and the set of alpha maps to be rendered on a display of a wearable device (Lakshman [0250] “the image processing device causes a display image derived at least in part from the blended warped texture image of the target view to be rendered on a display of a wearable device” Yuan [0025] “Multilevel alpha maps are frequently used to blend different layers of image sequences”), where a set of beta scale maps for the set of user pose selected sampled views is decoded from the video signal (Lakshman [0247] “an image processing device decodes a multiview video signal into a set of texture images and a set of depth images for a set of sampled views of a multiview image” Ward [0090] “the image processing unit may use ratio values in the local multiscale gray scale ratio image with the luminance values derived from the input HDR image to generate a tone-mapped gray scale image” [0100] “each of the ratio images may correspond to a spatial resolution level of the number of spatial resolution levels”);

wherein the display image is of a second dynamic range different from the first dynamic range (Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”); wherein the display image is generated from the set of beta scale maps, the set of layered images and the set of alpha maps (Yuan [0025] “Multilevel alpha maps are frequently used to blend different layers of image sequences” Ward [0090] “tone-mapped luminance values in the tone-map gray scale image comprise a reduced dynamic range lower than that of the luminance values in the input HDR image”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.

Lakshman as modified by Yuan and Ward does not teach/suggest: using a current view of a viewer to adjust alpha values in the set of alpha maps for the set of user pose selected sampled views to generate adjusted alpha values in a set of adjusted alpha maps for the current view. Dimitrov, however, teaches/suggests using a current view of a viewer to adjust alpha values (Dimitrov [0030] “the blending engine may adjust a blending factor, such as an alpha blending factor ... the adjustment may account for differences in position and/or orientation of the one or more video cameras and the view point of the user”). Before the effective filing date of the claimed invention, it would have been obvious for one of ordinary skill in the art to modify the alpha maps of Lakshman as modified by Yuan and Ward to be adjusted as taught/suggested by Dimitrov for the blending.
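The Dimitrov-based adjustment mapped to claim 11, and the blending of claim 12, can be sketched as view-dependent attenuation of each sampled view's alpha map followed by a normalized blend. The Gaussian falloff model and all names below are assumptions for illustration, not Dimitrov's actual mechanism or the application's:

```python
import numpy as np

def adjust_alpha(alpha: np.ndarray, view_angle_delta: float, falloff: float = 0.1) -> np.ndarray:
    """Attenuate one sampled view's alpha map as the viewer's current view
    moves away from that sampled view, so nearer views dominate the blend."""
    weight = np.exp(-(view_angle_delta / falloff) ** 2)  # assumed falloff model
    return np.clip(alpha * weight, 0.0, 1.0)

def blend_views(images: list[np.ndarray], alphas: list[np.ndarray]) -> np.ndarray:
    """Normalized alpha blend of per-view intermediate images."""
    total = sum(alphas)
    total = np.where(total > 0, total, 1.0)  # guard against divide-by-zero
    return sum(img * (a / total)[..., np.newaxis] for img, a in zip(images, alphas))
```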
As such, Lakshman as modified by Yuan, Ward, and Dimitrov teaches/suggests: using a current view of a viewer to adjust alpha values in the set of alpha maps for the set of user pose selected sampled views to generate adjusted alpha values in a set of adjusted alpha maps for the current view (Yuan [0025] “Multilevel alpha maps are frequently used to blend different layers of image sequences” Dimitrov [0030] “the blending engine may adjust a blending factor, such as an alpha blending factor ... the adjustment may account for differences in position and/or orientation of the one or more video cameras and the view point of the user”).

Regarding claim 12, Lakshman as modified by Yuan, Ward, and Dimitrov teaches/suggests: The method of Claim 11, wherein the set of user pose selected sampled views includes two or more sampled views (Lakshman [0234] “the image processing device uses the target view to select, from the plurality of sampled views of the multiview image, a set of sampled views”); wherein the display image is generated by performing image blending operations on two or more intermediate images generated for the current view from the set of layered images and the set of adjusted alpha maps (Lakshman [0250] “the image processing device causes a display image derived at least in part from the blended warped texture image of the target view to be rendered on a display of a wearable device” Yuan [0025] “Multilevel alpha maps are frequently used to blend different layers of image sequences” Dimitrov [0030] “the blending engine may adjust a blending factor, such as an alpha blending factor ... the adjustment may account for differences in position and/or orientation of the one or more video cameras and the view point of the user”). The same rationales to combine as set forth in the rejection of claims 1 and 11 above are incorporated herein.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

US 11158026 – FOV extension
US 2019/0104324 – selective streaming based on FOV
US 2021/0166353 – generate bokeh images

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANH-TUAN V NGUYEN whose telephone number is 571-270-7513. The examiner can normally be reached on M-F 9AM-5PM ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JASON CHAN, can be reached on 571-272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANH-TUAN V NGUYEN/
Primary Examiner, Art Unit 2619

Prosecution Timeline

Jul 29, 2024: Application Filed
Mar 02, 2026: Non-Final Rejection (§103)
Apr 16, 2026: Applicant Interview (Telephonic)
Apr 16, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591359: ELECTRONIC DEVICE COMPRISING DISPLAY THAT OPTIMALLY DISPLAY CONTENT WITH RESPECT TO CAMERA HOLE, AND METHOD FOR CONTROLLING DISPLAY THEREOF (2y 5m to grant; granted Mar 31, 2026)
Patent 12592033: METHOD AND APPARATUS FOR DETECTING PICKED OBJECT, COMPUTER DEVICE, READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT (2y 5m to grant; granted Mar 31, 2026)
Patent 12573132: ASSIGNING PRIMITIVES TO TILES IN A GRAPHICS PROCESSING SYSTEM (2y 5m to grant; granted Mar 10, 2026)
Patent 12573161: Learning Articulated Shape Reconstruction from Imagery (2y 5m to grant; granted Mar 10, 2026)
Patent 12561893: COLOR AND INFRA-RED THREE-DIMENSIONAL RECONSTRUCTION USING IMPLICIT RADIANCE FUNCTIONS (2y 5m to grant; granted Feb 24, 2026)

Study what changed to get past this examiner; based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 73%
With Interview: 92% (+19.2% lift)
Median Time to Grant: 2y 11m
PTA Risk: Low

Based on 489 resolved cases by this examiner. Grant probability is derived from the career allow rate; the with-interview figure applies the +19.2% interview lift to the 73% base (73% + 19.2% ≈ 92%).
