DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the Applicants’ communication filed on February 8, 2024. By virtue of this communication, claims 1-20 are currently pending in the instant application.
Drawings
The drawings were submitted on February 8, 2024. These drawings have been reviewed and are accepted by the examiner.
Information Disclosure Statement
The Information Disclosure Statement (IDS) Form PTO-1449, filed on August 12, 2024, complies with the provisions of 37 CFR 1.97. Accordingly, the information disclosed therein has been considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-9, 13, 15, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over VAREKAMP (US 20190385352 A1) in view of Bleyer et al. (US 20210358084 A1).
Regarding claim 1. VAREKAMP discloses a method comprising:
at a device including one or more processors and non-transitory memory (VAREKAMP, see par. [0081]):
storing a first mesh of a physical environment in a first mesh buffer and a second mesh of the physical environment in a second mesh buffer (VAREKAMP, see at least par. [0031] Specifically, the first mesh position may be the position in the first mesh which by the view point transformation will be transformed/mapped to the first position (or similarly the first mesh position may be the position in the first mesh linked to a position in the first texture map which by the view point transformation will be transformed/mapped to the first position). Similarly, the second mesh position may be the position in the second mesh which by the view point transformation will be transformed/mapped to the first position (or similarly the second mesh position may be the position in the second mesh linked to a position in the second texture map which by the view point transformation will be transformed/mapped to the first position).);
rendering a first depth map for an image of the physical environment based on the first mesh (VAREKAMP, see at least par. [0082], [0082] In many embodiments the first texture map and the first mesh may be generated from a capture of a real life scene. The capture may be by a suitable set of cameras. For example, a single stereo camera or range camera may capture a real life scene and generate an image and a depth (/disparity) map. In other embodiments, a plurality of cameras at different positions may capture a two-dimensional image and a depth map may be generated from the captured images, e.g. by disparity estimation.) and a second depth map for the image of the physical environment based on the second mesh (VAREKAMP, see at least par. [0105], Equivalently, the weighting of the first light intensity value from the first view transformer 207 may be a monotonically increasing function of a magnitude of the gradient in the second mesh. The same may symmetrically be applicable to the weighting of the second light intensity value from the second view transformer 209. The weight of each texture may specifically be inversely proportional to the local gradient magnitude in the depth/disparity map or mesh that is associated with each texture.);
VAREKAMP does not disclose generating a blended depth map based on the first depth map, the second depth map, and a difference between a time of the image of the physical environment and a time of the second mesh. However,
Bleyer discloses:
generating a blended depth map based on the first depth map, the second depth map, and a difference between a time of the image of the physical environment and a time of the second mesh (Bleyer, see at least par. [0103] In addition, it should be noted that, in some instances, a filter for generating an interpolated depth map for a target timepoint utilizes image data from more than one other timepoints (e.g., an additional stereo pair of images and an additional depth map). For example, FIGS. 6A-6C illustrate generating an interpolated depth map 620 based on image data from a prior timepoint 655A, a target timepoint 655C, and a subsequent timepoint 655D. Stereo pair of images 610A and depth map 615A are associated with prior timepoint 655A and are representative of stereo pair of images 410A and depth map 415A from FIGS. 4A-4B, respectively. Stereo pair of images 610C is associated with target timepoint 655C and is representative of stereo pair of images 410C from FIG. 4B. Stereo pair of images 610D and depth map 615D are associated with subsequent timepoint 655D and are representative of stereo pair of images 410D and depth map 415D from FIG. 4B, respectively. The arrows directed toward interpolated depth map 620 indicate that, in some instances, interpolated depth map 620 is generated using image data from stereo pairs of images 610A, 610C, and 610D and depth maps 615A and 615D.).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system and method of VAREKAMP with generating a blended depth map based on the first depth map, the second depth map, and a difference between a time of the image of the physical environment and a time of the second mesh, as provided by Bleyer. The modification provides an improved system and method for generating a depth map and improved techniques and systems for calculating and processing depth information, particularly for systems that need to resolve parallax problems. (Bleyer, see par. [0012]).
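For clarity of the record, the claimed blending step can be illustrated by the following minimal sketch. The function name, the exponential weighting, and the parameter tau are the examiner's illustrative assumptions and are not drawn from VAREKAMP or Bleyer; the sketch merely shows one way a depth-map blend could depend on the time difference recited in claim 1.

```python
import numpy as np

def blend_depth_maps(depth_first, depth_second, t_image, t_second_mesh, tau=0.1):
    """Blend two per-pixel depth maps, weighting the second mesh's depth map
    by how close its time is to the time of the image (illustrative only;
    the exponential falloff and tau are assumptions, not from either reference)."""
    dt = abs(t_image - t_second_mesh)   # difference between image time and second-mesh time
    w = float(np.exp(-dt / tau))        # weight decays as the time difference grows
    return w * depth_second + (1.0 - w) * depth_first
```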
Regarding claim 3. VAREKAMP in view of Bleyer discloses the method of claim 1 (as rejected above), and VAREKAMP in view of Bleyer further discloses further comprising, in response to detecting an update trigger, overwriting the first mesh buffer with the second mesh and overwriting the second mesh buffer with a third mesh stored in a third mesh buffer (Bleyer, see pars. [0110]-[0112]).
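As a hypothetical sketch of the buffer rotation recited in claim 3 (the class and method names are the examiner's illustrative assumptions and follow the claim language rather than any disclosure of VAREKAMP or Bleyer):

```python
class MeshBuffers:
    """Three mesh buffers rotated on an update trigger (illustrative only)."""

    def __init__(self, first_mesh, second_mesh, third_mesh):
        self.first_buffer = first_mesh
        self.second_buffer = second_mesh
        self.third_buffer = third_mesh

    def on_update_trigger(self):
        # Overwrite the first mesh buffer with the second mesh and the
        # second mesh buffer with the third mesh stored in the third buffer.
        self.first_buffer = self.second_buffer
        self.second_buffer = self.third_buffer

    def receive_mesh_data(self, new_mesh):
        # cf. claim 9: the third mesh buffer is updated when mesh data arrives.
        self.third_buffer = new_mesh
```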
Regarding claim 4. VAREKAMP in view of Bleyer discloses the method of claim 3 (as rejected above), and VAREKAMP in view of Bleyer further discloses wherein the update trigger includes a temporal trigger detected when a threshold amount of time has elapsed since an initialization or a most recent update trigger (Bleyer, see pars. [0082, 0110-0112], [0082] However, generating a depth map from the stereo pair of images 410A for performing parallax correction is computationally expensive, particularly for high-resolution stereo pairs of images (which may be desirable for providing parallax-corrected pass-through views of an environment). For example, FIG. 4A illustrates the HMD 400 capturing the stereo pair of images 410A at a first timepoint (e.g., indicated according to time axis, t). In some instances, because of the computational expense associated with performing the depth calculations 420A to generate a depth map, the depth map 415A is not completed until a period of time has elapsed (e.g., several frames) since the first timepoint at which the HMD 400 captured the stereo pair of images 410A. The temporal delay associated with conventional depth calculations (e.g., depth calculations 420A) may cause high latency in operations that depend on updated depth information (e.g., providing parallax-corrected pass-through views of an environment).).
Regarding claim 5. VAREKAMP in view of Bleyer discloses the method of claim 3 (as rejected above), and VAREKAMP in view of Bleyer further discloses wherein the update trigger includes a motion-based trigger detected based on motion of the device (Bleyer, see at least par. [0123] Additionally, or alternatively, the HMD 800 may detect changes in the position of the HMD 800 relative to an environment based on changes in the depth maps over time. A difference in the depth information of two sequentially generated depth maps may also indicate that the perspective of the HMD 800 is changing with respect to at least a portion of the real-world environment. For instance, a difference in depth information between consecutively generated depth maps may indicate that objects in the environment are moving with respect to the HMD 800, which may trigger an increase in the frequency at which the HMD 800 generates depth maps (e.g., to more frequently update depth information for the moving objects in the environment).).
Regarding claim 6. VAREKAMP in view of Bleyer discloses the method of claim 3 (as rejected above), and VAREKAMP in view of Bleyer further discloses wherein the update trigger includes an eye-based trigger detected based on an eye characteristic of a user (Bleyer, see at least par. [0075] By performing these different transforms, the HMD 300 is able to perform three-dimensional (3D) geometric transforms on the raw camera images to transform the perspectives of the raw images in a manner so as to correlate with the perspectives of the user's pupils 330 and 335. Additionally, the 3D geometric transforms rely on depth computations in which the objects in the HMD 300's environment are mapped out to determine their depths as well as the pose 375).
Regarding claim 7. VAREKAMP in view of Bleyer discloses the method of claim 3 (as rejected above), and VAREKAMP in view of Bleyer further discloses wherein the update trigger includes a content-based trigger detected based on virtual content displayed by the device (Bleyer, see at least par. [0123] Additionally, or alternatively, the HMD 800 may detect changes in the position of the HMD 800 relative to an environment based on changes in the depth maps over time. A difference in the depth information of two sequentially generated depth maps may also indicate that the perspective of the HMD 800 is changing with respect to at least a portion of the real-world environment. For instance, a difference in depth information between consecutively generated depth maps may indicate that objects in the environment are moving with respect to the HMD 800, which may trigger an increase in the frequency at which the HMD 800 generates depth maps (e.g., to more frequently update depth information for the moving objects in the environment).).
Regarding claim 8. VAREKAMP in view of Bleyer discloses the method of claim 3 (as rejected above), and VAREKAMP in view of Bleyer further discloses further comprising, in response to determining that the third mesh is unstable, suppressing the update trigger for a particular amount of time (Bleyer, see pars. [0123] Additionally, or alternatively, the HMD 800 may detect changes in the position of the HMD 800 relative to an environment based on changes in the depth maps over time. A difference in the depth information of two sequentially generated depth maps may also indicate that the perspective of the HMD 800 is changing with respect to at least a portion of the real-world environment. For instance, a difference in depth information between consecutively generated depth maps may indicate that objects in the environment are moving with respect to the HMD 800, which may trigger an increase in the frequency at which the HMD 800 generates depth maps (e.g., to more frequently update depth information for the moving objects in the environment). [0078] Attention is now directed to FIG. 4A, which illustrates an HMD 400 capturing an environment 405. As used herein, “scene” and “environment” are used interchangeably and refer broadly to any real-world space comprising any arrangement and/or type of real-world objects. As used herein, “mixed-reality environment” refers to any real-world environment that includes virtual content implemented therein/thereon (e.g., holograms of an AR environment), or any immersive virtual environment that only includes virtual content (e.g., a VR environment). One will recognize that virtual content can include virtual representations of real-world objects.).
Regarding claim 9. VAREKAMP in view of Bleyer discloses the method of claim 3 (as rejected above), and VAREKAMP in view of Bleyer further discloses further comprising updating the third mesh buffer in response to receiving mesh data (Bleyer, see at least par. [0039] High frequency depth map generation (e.g., providing a depth map for each captured stereo pair of images) may improve user experiences that depend on near-real-time depth calculations, such as, for example, providing parallax-corrected pass-through images of a user's environment, hand or other object tracking, surface reconstruction mesh building or updating, streaming stereoscopic video, and/or others at a higher frame rate or with lower latency than would otherwise be possible using traditional techniques.).
Regarding claim 13. VAREKAMP in view of Bleyer discloses the method of claim 1 (as rejected above), and VAREKAMP in view of Bleyer further discloses further comprising transforming the image of the environment based on the blended depth map and a difference between a first perspective of the image of the environment and a second perspective (Bleyer, see pars. [0122]-[0123], A change in the pose of the HMD 800 may cause portions of real-world objects that were not previously visible to the user to become visible according to the user's new perspective. Therefore, in some implementations, the HMD 800 increases the frequency with which the HMD 800 generates depth maps in response to detecting a change in the pose of the HMD 800 (e.g., in order to more accurately capture the changes in the user's perspective/perception of the real-world environment). [0123] Additionally, or alternatively, the HMD 800 may detect changes in the position of the HMD 800 relative to an environment based on changes in the depth maps over time. A difference in the depth information of two sequentially generated depth maps may also indicate that the perspective of the HMD 800 is changing with respect to at least a portion of the real-world environment. For instance, a difference in depth information between consecutively generated depth maps may indicate that objects in the environment are moving with respect to the HMD 800, which may trigger an increase in the frequency at which the HMD 800 generates depth maps (e.g., to more frequently update depth information for the moving objects in the environment).).
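The transformation recited in claim 13 can be illustrated by a standard pinhole reprojection. The intrinsic matrix K and the 4x4 relative pose T_delta below are the examiner's illustrative assumptions, not parameters disclosed by either reference:

```python
import numpy as np

def reproject_pixel(u, v, blended_depth, K, T_delta):
    """Warp one pixel from the image's capture perspective to a second
    perspective using its blended depth value (illustrative sketch only)."""
    # Back-project the pixel to a 3D point in the capture camera's frame.
    point = blended_depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Apply the difference between the first and second perspectives.
    point_new = (T_delta @ np.append(point, 1.0))[:3]
    # Project the 3D point into the second perspective.
    uvw = K @ point_new
    return uvw[:2] / uvw[2]
```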
Regarding claim 15. Claim 15 recites a device that performs the same method as claim 1. Therefore, claim 15 is rejected based on the same rationale as claim 1 set forth above and incorporated herein.
Regarding claim 17, the device of claim 17 performs the same method as claim 3. Therefore, claim 17 is rejected based on the same rationale as claim 3 set forth above and incorporated herein.
Regarding claim 20. VAREKAMP in view of Bleyer discloses a non-transitory memory storing one or more programs, which, when executed by one or more processors of a device, cause the device (Bleyer, see par. [0035]) to perform the same method as claim 1. Therefore, claim 20 is rejected based on the same rationale as claim 1 set forth above and incorporated herein.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over VAREKAMP (US 20190385352 A1) in view of Bleyer et al. (US 20210358084 A1), as applied to claim 3 above, and further in view of Tsukizaki et al. (US 20020075262 A1).
Regarding claim 10. VAREKAMP in view of Bleyer discloses the method of claim 3 (as rejected above), but VAREKAMP in view of Bleyer does not disclose further comprising, in response to detecting a subsequent update trigger, overwriting the first mesh buffer with the third mesh and overwriting the second mesh buffer with a fourth mesh stored in the third mesh buffer. However,
Tsukizaki discloses:
further comprising, in response to detecting a subsequent update trigger, overwriting the first mesh buffer with the third mesh and overwriting the second mesh buffer with a fourth mesh stored in the third mesh buffer (Tsukizaki, see at least par. [0077] As described above, according to the first and second embodiments of the present invention, the frame F is divided into the mesh M, and it is determined whether or not overwrite rendering is possible for each mesh M in a predetermined range E where a predetermined object A should be rendered. More specifically, the Z value of each mesh Mc is subjected to clearing processing on every other dot (every other mesh), or other objects are inhibited from being rendered. By doing so, it is possible to visibly display an object hidden by a shade of another object without carrying out a determination process relative to a positional relationship between a virtual viewpoint and object and a semi-transparency processing for making object semitransparent at high speed and low cost.).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system and method of VAREKAMP in view of Bleyer with, in response to detecting a subsequent update trigger, overwriting the first mesh buffer with the third mesh and overwriting the second mesh buffer with a fourth mesh stored in the third mesh buffer, as provided by Tsukizaki. The modification provides an improved system and method for generating a depth map and improved techniques and systems for performing a display process at high speed and low cost. (Tsukizaki, see par. [0014]).
Claims 11-12, 14 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over VAREKAMP (US 20190385352 A1) in view of Bleyer et al. (US 20210358084 A1), as applied to claim 1 above, and further in view of Cole et al. (US 20160255327 A1).
Regarding claim 11. VAREKAMP in view of Bleyer discloses the method of claim 1 (as rejected above), but VAREKAMP in view of Bleyer does not disclose further comprising, in response to determining that a portion of the first mesh was not used in rendering the first depth map, updating the portion of the first mesh with a corresponding portion of the second mesh. However,
Cole discloses:
further comprising, in response to determining that a portion of the first mesh was not used in rendering the first depth map, updating the portion of the first mesh with a corresponding portion of the second mesh (Cole, see at least [0098], generating depth map of the environment of interest, generating 3D mesh models and UV maps, processing image content received from one or more camera devices positioned at one or more location in the environment, e.g., encoding image in one or more different formats, extract occluded image data in accordance with the features of the present invention, and communicating the image content as well as environmental model information and UV maps to one or more playback devices in accordance with the features of the invention. In some embodiments the processing system 608 may include a server with the server responding to requests for content and/or environmental information for use in rendering content, e.g., depth maps corresponding to environment of interest, and/or 3D environmental mesh models, UV maps and/or imaging content.).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system and method of VAREKAMP in view of Bleyer with updating the portion of the first mesh with a corresponding portion of the second mesh, as provided by Cole, for the same reasons set forth below with respect to claim 12. (Cole, see par. [0008]).
Regarding claim 12. VAREKAMP in view of Bleyer discloses the method of claim 1 (as rejected above), but VAREKAMP in view of Bleyer does not disclose further comprising, in response to determining that a portion of the first mesh was not used in rendering the first depth map and a portion of the second mesh was not used in rendering the second depth map, updating the portion of the first mesh and the portion of the second mesh with corresponding portions of a third mesh. However, Cole discloses:
further comprising, in response to determining that a portion of the first mesh was not used in rendering the first depth map and a portion of the second mesh was not used in rendering the second depth map, updating the portion of the first mesh and the portion of the second mesh with corresponding portions of a third mesh (Cole, see pars. [0092]-[0093], [0092] While the depth map generated from each image corresponds to a portion of the environment to be mapped, in some embodiments the depth maps generated from individual images are processed, e.g., stitched together, to form a composite map of the complete environment scanned using the light field cameras. Thus by using the light field cameras a relatively complete environmental map can be, and in some embodiments is generated. [0093] In the case of light field cameras, an array of micro-lenses captures enough information that one can refocus images after acquisition. It is also possible to shift, after image capture, one's viewpoint within the sub-apertures of the main lens, effectively obtaining multiple views. In the case of a light field camera, depth cues from both defocus and correspondence are available simultaneously in a single capture. This can be useful when attempting to fill in occluded information/scene portions not captured by the stereoscopic cameras.).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system and method of VAREKAMP in view of Bleyer with, in response to determining that a portion of the first mesh was not used in rendering the first depth map and a portion of the second mesh was not used in rendering the second depth map, updating the portion of the first mesh and the portion of the second mesh with corresponding portions of a third mesh, as provided by Cole. The modification provides an improved system and method for generating a depth map and improved techniques and systems for calculating and processing depth information, particularly for systems that need to allow a playback device to receive and/or use images of non-occluded portions of an environment along with at least some image content corresponding to occluded portions of the environment. (Cole, see par. [0008]).
Regarding claim 14. VAREKAMP in view of Bleyer discloses the method of claim 1 (as rejected above), but VAREKAMP in view of Bleyer does not disclose further comprising: generating virtual content based on the blended depth map; compositing the virtual content with the image of the physical environment to generate a display image; and displaying the display image. However,
Cole discloses:
further comprising:
generating virtual content based on the blended depth map (Cole, see at least par. [0087] In some embodiments the camera rig 801 may be mounted on a support structure such that it can be rotated around a vertical axis. In various embodiments the camera rig 801 may be deployed in an environment of interest, e.g., such as a stadium, auditorium, or another place where an event to be captured is taking place. In some embodiments the light field cameras of the camera rig 801 are used to capture images of the environment of interest, e.g., a 360 degree scene area of interest, and generate depth maps which can be used in simulating a 3D environment and displaying stereoscopic imaging content.);
compositing the virtual content with the image of the physical environment to generate a display image (Cole, see at least the second part of par. [0089], Such depth maps and/or composite depth map may, and in some embodiments are, provided to a playback device for use in displaying stereoscopic imaging content and simulating a 3D environment which can be experienced by the viewers.); and
displaying the display image (Cole, see par. [0093]).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system and method of VAREKAMP in view of Bleyer with generating virtual content based on the blended depth map, compositing the virtual content with the image of the physical environment to generate a display image, and displaying the display image, as provided by Cole. The modification provides an improved system and method for generating a depth map and improved techniques and systems for calculating and processing depth information, particularly for systems that need to allow a playback device to receive and/or use images of non-occluded portions of an environment along with at least some image content corresponding to occluded portions of the environment. (Cole, see par. [0008]).
Regarding claim 18, the device of claim 18 performs the same method as claim 11. Therefore, claim 18 is rejected based on the same rationale as claim 11 set forth above and incorporated herein.
Regarding claim 19, the device of claim 19 performs the same method as claim 12. Therefore, claim 19 is rejected based on the same rationale as claim 12 set forth above and incorporated herein.
Allowable Subject Matter
Claims 2 and 16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
VAREKAMP in view of Bleyer discloses claims 1 and 15 (as rejected above). However, the limitation:
wherein generating the blended depth map includes generating a weighted sum of an inverse of the first depth map and an inverse of the second depth map, wherein a weighting factor of the weighted sum is proportional to the difference between a time of the image of the physical environment and a time of the second mesh,
taken as a whole, renders claims 2 and 16 patentably distinct over the prior art of record.
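Expressed in illustrative notation (the symbols below are the examiner's own and are not drawn from the claims or the cited references), the indicated limitation corresponds to a blend performed in inverse depth:

```latex
\[
\frac{1}{D_{\mathrm{blend}}} \;=\; w\,\frac{1}{D_1} \;+\; (1 - w)\,\frac{1}{D_2},
\qquad
w \;\propto\; \bigl|\, t_{\mathrm{image}} - t_{\mathrm{mesh},2} \,\bigr|
\]
```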
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KIM THANH THI TRAN whose telephone number is (571)270-1408. The examiner can normally be reached Monday-Friday 8:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALICIA HARRINGTON, can be reached at 571-272-2330. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KIM THANH T TRAN/Examiner, Art Unit 2615
/JAMES A THOMPSON/Primary Examiner, Art Unit 2615