Prosecution Insights
Last updated: April 19, 2026
Application No. 18/759,361

DYNAMIC OVERLAPPING OF MOVING OBJECTS WITH REAL AND VIRTUAL SCENES FOR VIDEO SEE-THROUGH (VST) EXTENDED REALITY (XR)

Non-Final OA §103
Filed: Jun 28, 2024
Examiner: TAHA, AHMED
Art Unit: 2613
Tech Center: 2600 (Communications)
Assignee: Samsung Electronics Co., Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 62% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 2y 5m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 62% (grants 5 of 8 resolved cases; +0.5% vs TC avg)
Interview Lift: +75.0% (strong; resolved cases with an interview vs. without)
Typical Timeline: 2y 5m average prosecution
Career History: 43 total applications across all art units; 35 currently pending

Statute-Specific Performance

§101: 6.5% (-33.5% vs TC avg)
§102: 29.9% (-10.1% vs TC avg)
§103: 59.8% (+19.8% vs TC avg)
§112: 3.8% (-36.2% vs TC avg)
Tech Center averages are estimates; based on career data from 8 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 8, 11, 15, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Parra Pozo et al. (U.S. Patent Publication No. 2022/0413433) in view of Zobel et al. (U.S. Patent Publication No. 2023/0216999).

Regarding claim 1, Parra Pozo discloses a method comprising:

obtaining image frames of a scene captured using one or more imaging sensors of a video see-through (VST) extended reality (XR) device and depth data associated with the scene, the image frames capturing a moving object and static scene contents, the moving object comprising a portion of a body of a user (interpreted as capturing camera image frames and depth data for the same scene, where the frames include a moving body portion of a user and other static scene content) [Parra Pozo: 0030 "The pipeline stage for capturing audio, color images, and depth images can be performed on the sender side using capture devices such as microphones for audio, RGB (or black-and-white) cameras for color images, and depth sensors for depth images. As discussed below, these capture devices can come in a variety of types. These capture devices can be housed in one or more devices such as an XR system or a mobile phone."] [Parra Pozo: 0034 "the depth sensor may capture an image of the user while the user is gesturing with her hands in front of herself."] (teaches capturing color images (image frames) and depth images/depth data using cameras and depth sensors housed in an XR system, and further teaches that the captured scene includes a user's hands);

generating masks associated with the moving object using a machine learning model trained to separate pixels corresponding to human skin from other portions of the scene (interpreted as producing moving-object masks using a trained ML model that distinguishes human skin pixels from non-skin pixels) [Parra Pozo: 0020 "is a flow diagram illustrating a process used in some implementations of the present technology for training a machine learning model to perform depth densification, masking, and body modeling on holographic data"] [Parra Pozo: 0067 "The segmentation can identify portions of an image, such as the foreground (the portion showing a user), portions of a body such as a torso, arms, hands, and head, and a worn XR device."] (teaches generating masks using a trained ML model that can identify parts of the body, meaning it can differentiate human skin from other scene content); and

reconstructing images of the moving object based on the image frames, the depth data, and the masks (interpreted as using the RGB frames, the depth data, and the masks to reconstruct the moving-object imagery) [Parra Pozo: 0029 "use the masks to segment the images into parts needed for hologram generation; convert the depth images into a 3D mesh; paint the 3D mesh with color data"] (teaches using the masks to segment the captured images, then using the depth images to create a 3D mesh and applying color data to that mesh, which is reconstruction of the moving-object representation based on image frames, depth data, and masks).

Parra Pozo fails to explicitly disclose reconstructing images of the static scene contents based on the image frames and the depth data; combining the images of the moving object, the images of the static scene contents, and one or more virtual features to generate combined images; and rendering the combined images for presentation on at least one display of the VST XR device.

However, Zobel discloses reconstructing images of the static scene contents based on the image frames and the depth data (interpreted as using the RGB frames and depth data to reconstruct the static scene, i.e., the environment/background) [Zobel: 0080 "an imaging system receives depth data (corresponding to an environment) captured by a depth sensor, and the imaging system receiving first image data (a depiction of the environment) captured by an image sensor."] [Zobel: 0005 "The imaging system generates second image data by modifying the first image data according to the first motion vectors and/or the second motion vectors. The second image data includes a second depiction of the environment from a different perspective than the first image data"] (teaches reconstructing an environment depiction by using the depth data plus the first image data to generate second image data, which is interpreted as reconstruction); combining the images of the moving object, the images of the static scene contents, and one or more virtual features to generate combined images [Zobel: 0074 "Extended reality (XR) systems or devices can provide virtual content to a user and/or can combine real world views of physical environments (scenes) and virtual environments (including virtual content)."]; and rendering the combined images for presentation on at least one display of the VST XR device [Zobel: 0004 "An extended reality (XR) device is a device that displays an environment to a user"] (teaches the ability to display the images).

Parra Pozo and Zobel are considered analogous to the claimed invention because they are in the same field of XR/AR rendering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Parra Pozo to incorporate Zobel's teachings of reconstructing, combining, and displaying the images. The motivation for such a combination would be the benefit of improved compositing quality.
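For orientation, the pipeline the rejection maps out (capture, skin-mask, per-layer reconstruction, depth-aware composite, render) can be sketched in a few lines of numpy. This is a minimal illustrative sketch under our own assumptions: the names (skin_segmenter, compose_frame), the crude RGB skin heuristic, and the fixed virtual-content depth are stand-ins, not anything disclosed by the application or the cited references.

```python
# Minimal sketch of the claim-1 pipeline: capture -> moving-object mask ->
# per-layer reconstruction -> depth-aware compositing with virtual content.
# All names and thresholds here are illustrative assumptions.
import numpy as np

def skin_segmenter(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the claim's trained ML model: returns True on pixels
    classified as human skin (a crude RGB heuristic, not a real model)."""
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    return (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)

def compose_frame(frame, depth, virtual_rgba, virtual_depth_m=1.0):
    """Reconstruct moving-object and static-scene layers, then combine them
    with a virtual feature, hiding virtual content behind nearer real pixels."""
    mask = skin_segmenter(frame)                      # moving-object mask
    moving = np.where(mask[..., None], frame, 0)      # moving-object layer
    static = np.where(mask[..., None], 0, frame)      # static-scene layer
    combined = (moving + static).astype(np.float32)   # overlap along the mask
    occluded = depth[..., None] < virtual_depth_m     # real surface is nearer
    alpha = np.where(occluded, 0.0, virtual_rgba[..., 3:4] / 255.0)
    combined = (1 - alpha) * combined + alpha * virtual_rgba[..., :3]
    return combined.astype(np.uint8)                  # frame for the VST display

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)    # RGB frame
depth = np.random.uniform(0.3, 3.0, (480, 640)).astype(np.float32)  # meters
virtual = np.zeros((480, 640, 4), dtype=np.uint8)                   # RGBA overlay
out = compose_frame(frame, depth, virtual)
```

The depth test in the compositor is what produces the "dynamic overlapping" of the title: a real hand nearer than the virtual content correctly covers it.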
Regarding claim 4, Parra Pozo discloses the method of claim 1 but fails to explicitly disclose further comprising: correcting for parallax associated with the moving object in the images of the moving object; and separately correcting for parallax associated with the static scene contents in the images of the static scene contents.

However, Zobel discloses correcting for parallax associated with the moving object in the images of the moving object (interpreted as adjusting the moving-object images to compensate for parallax, the apparent positional shift caused by a change in viewpoint, so the moving object appears correctly aligned from the intended viewpoint) [Zobel: 0079 "the change in perspective can be used for 3D stabilization of video data, for instance to reduce or eliminate parallax movements that may be caused by a user's unsteady hand holding the camera and/or by the user's footsteps"] (teaches performing a change-in-perspective operation that reduces parallax); and separately correcting for parallax associated with the static scene contents in the images of the static scene contents [Zobel: 0076 "The imaging system uses the depth data to generate a first set of motion vectors. The first set of motion vectors correspond to a change in perspective of the depiction of the environment in the first image data, from a first perspective to a second perspective"] [Zobel: 0079, quoted above] (teaches using the depth data to compute a change in perspective for the environment depiction and using that change to reduce parallax).

Parra Pozo and Zobel are considered analogous to the claimed invention because they are in the same field of XR/AR rendering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Parra Pozo to incorporate Zobel's teachings of reducing parallax. The motivation for such a combination would be the benefit of improved stability.
Claims 8 and 15 are device and non-transitory machine-readable medium claims corresponding to claim 1 without any additional limitations; thus, claims 8 and 15 are rejected for the same reasons as claim 1 above. Claims 11 and 18 are device and non-transitory machine-readable medium claims corresponding to claim 4 without any additional limitations; thus, claims 11 and 18 are rejected for the same reasons as claim 4 above.

Regarding claim 19, Parra Pozo and Zobel disclose the non-transitory machine-readable medium of claim 15, wherein the instructions that when executed cause the at least one processor (Parra Pozo: 110; Fig. 1) to combine the images of the moving object, the images of the static scene contents, and the one or more virtual features comprise instructions that when executed cause the at least one processor to: overlap the images of the moving object with the images of the static scene contents based on estimated boundaries of the moving object within the scene (interpreted as combining/overlaying the moving-object imagery on top of the static-scene imagery along the estimated boundaries, which are a computed pixel region) [Parra Pozo: 0031 "The parts masks can be masks that specify which areas in an image show characteristics such as a mask for segmenting the sending user from the background, masks to identify particular body parts, such as the sending user's head, torso, arm segments, hands, etc."] [Parra Pozo: 0103 "At block 604, process 600 can segment the self view data. This segmentation can include first removing the background (the parts of the images not depicting the user) from the foreground (the parts of the images depicting the user). In some cases, the segmentation can further identify, in the foreground, parts of the sending user's body, such as her face, arms, hands, torso, etc. In some implementations, the segmentation can be obtained from the results of block 508 or 708."] (teaches segmenting the background images from the images of the user's body, which corresponds to combining the images of the static scene contents with the images of the moving object); and perform hole filling to generate image content for each portion of the scene behind the moving object that is disoccluded when the moving object moves (interpreted as follows: when the moving object blocks part of what is behind it and then moves so that the blocked area becomes newly visible (disoccluded), the system creates (hole-fills) image content for that area that would otherwise be missing) [Parra Pozo: 0034 "Thus, when the 3D mesh is created, there may be a hole in the representation of the sending user behind the user's hands. This can be filled in with the corresponding portion of the existing model of the sending user, created from a snapshot in time where the user's hands did not occlude the user's torso"] (teaches filling holes in the 3D mesh).
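The overlap-plus-hole-filling operation of claim 19 reduces to a mask-select followed by a fill from cached content, in the spirit of Parra Pozo's snapshot-based fill (0034). A minimal sketch under the assumption, ours, that static pixels with no data yet are stored as zeros:

```python
import numpy as np

def overlap_and_fill(moving, static, fg_mask, cached_static):
    """Overlay the moving-object layer on the static layer along the estimated
    boundary (fg_mask), then fill disoccluded pixels from a cached snapshot
    captured while they were unoccluded."""
    combined = np.where(fg_mask[..., None], moving, static)
    holes = (~fg_mask) & (static.sum(axis=-1) == 0)  # exposed but no data yet
    combined[holes] = cached_static[holes]
    return combined
```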
Claims 2, 3, 5, 6, 9, 10, 12, 13, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Parra Pozo et al. (U.S. Patent Publication No. 2022/0413433), in view of Zobel et al. (U.S. Patent Publication No. 2023/0216999), and in further view of Hilliges et al. (U.S. Patent No. 9,552,673).

Regarding claim 2, Parra Pozo discloses the method of claim 1 but fails to explicitly disclose further comprising: generating first depth maps associated with the moving object based on the depth data; and generating second depth maps associated with the static scene contents based on the depth data; wherein the masks are generated based on the first depth maps; wherein the images of the moving object are reconstructed based on the image frames, the first depth maps, and the masks; and wherein the images of the static scene contents are reconstructed based on the image frames and the second depth maps.

However, Zobel discloses wherein the images of the static scene contents are reconstructed based on the image frames and the second depth maps [Zobel: Abstract "An imaging system receives depth data (corresponding to an environment) from a depth sensor and first image data (a depiction of the environment) from an image sensor."] [Zobel: 0005 "The imaging system generates second image data by modifying the first image data according to the first motion vectors and/or the second motion vectors. The second image data includes a second depiction of the environment from a different perspective than the first image data."] (teaches reconstructing an environment depiction (second image data) from the first image data using depth data).

Further, Hilliges discloses generating first depth maps associated with the moving object based on the depth data [Hilliges: Col. 8, Lines 58-62 "Using the background depth map 602 the received depth map is segmented 606 to obtain a foreground depth map 608 depicting the real objects or parts of real objects"] (teaches producing a foreground depth map from a received depth map via segmentation); generating second depth maps associated with the static scene contents based on the depth data [Hilliges: Col. 8, Lines 50-56 "A reference background depth map is computed 600 by aggregating a plurality of depth maps from a depth camera (where the camera 106 comprises a depth camera). This may be done without any physical objects present in the interaction volume to provide a background depth map 602"] (teaches computing a background depth map of the static scene contents from aggregated depth maps, corresponding to a second depth map); wherein the masks are generated based on the first depth maps [Hilliges: Col. 8, Lines 58-62, quoted above] (teaches that the depth-based segmentation provides the foreground selection, i.e., the mask); and wherein the images of the moving object are reconstructed based on the image frames, the first depth maps, and the masks [Hilliges: Col. 9, Lines 53-56 "A background depth map (such as described above with reference to FIG. 6) is used to segment the RGB images in order to give foreground RGB images."] (teaches using depth-map-based segmentation to generate foreground RGB images: the foreground depth map 608 is the first depth map and the depth-based segmentation supplies the mask, so the resulting foreground RGB images are reconstructed from the image frames, the first depth maps, and the masks).

Parra Pozo, Zobel, and Hilliges are considered analogous to the claimed invention because they are in the same field of XR/AR rendering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Parra Pozo to incorporate Zobel's and Hilliges's teachings of generating a foreground depth map and a background depth map from depth data via segmentation. The motivation for such a combination would be the benefit of improved compositing quality.
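Hilliges's background-depth-map segmentation is straightforward to mirror: aggregate depth frames of the empty scene into a reference map, then label live pixels that sit meaningfully nearer than the reference as the moving object. A sketch with an assumed 5 cm threshold of our choosing:

```python
import numpy as np

def build_background_depth(depth_frames):
    """Reference background depth map aggregated from frames captured with
    no objects in the interaction volume (cf. Hilliges, col. 8)."""
    return np.median(np.stack(depth_frames), axis=0)

def split_depth_maps(depth, background, thresh_m=0.05):
    """Produce the first (moving-object) and second (static-scene) depth maps
    of claim 2; the foreground mask doubles as the object mask."""
    fg_mask = depth < (background - thresh_m)  # nearer than background
    first = np.where(fg_mask, depth, np.nan)   # first depth map (object)
    second = np.where(fg_mask, np.nan, depth)  # second depth map (scene)
    return first, second, fg_mask
```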
Regarding claim 3, Parra Pozo, Zobel, and Hilliges disclose the method of claim 2, wherein generating the second depth maps comprises: accessing a previous depth map corresponding to a previous image frame (interpreted as retrieving/using a depth map from an earlier time, i.e., a previous image frame; a depth map here means a depth image where each pixel encodes a distance/depth value) [Parra Pozo: 0030 "The captured depth information can be formed into depth images, e.g., as color or grayscale images where the hue or shade of each pixel represents a depth for that pixel."] [Parra Pozo: 0033 "the holographic calling system can also extrapolate to fill in the 3D mesh from other points of view, e.g., based on previous depth images that captured other portions of the sending user"] (teaches depth images as per-pixel depth representations and further teaches using previous depth images); and, for a current image frame, determining, based on the previous depth map, a depth for each portion of the scene behind the moving object that is disoccluded when the moving object moves.
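Claim 3's use of a previous depth map can be pictured as a per-pixel cache: whenever a pixel is unoccluded, its static-scene depth is refreshed; when the moving object later uncovers a pixel, its depth is read back from the cache. A hedged sketch; the class and its API are our own illustration, not an interface from any cited reference:

```python
import numpy as np

class StaticDepthCache:
    """Per-pixel cache of the most recently observed static-scene depth."""

    def __init__(self, shape):
        self.cache = np.full(shape, np.nan, dtype=np.float32)

    def update(self, depth, fg_mask):
        self.cache[~fg_mask] = depth[~fg_mask]  # refresh unoccluded pixels

    def depth_for_disoccluded(self, disoccluded_mask):
        """Depth for newly exposed pixels, taken from the previous depth map."""
        return np.where(disoccluded_mask, self.cache, np.nan)
```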
Regarding claim 5, Parra Pozo and Zobel disclose the method of claim 1 but fail to explicitly disclose wherein combining the images of the moving object, the images of the static scene contents, and the one or more virtual features comprises: overlapping the images of the moving object with the images of the static scene contents based on estimated boundaries of the moving object within the scene.

However, Hilliges discloses overlapping the images of the moving object with the images of the static scene contents based on estimated boundaries of the moving object within the scene (interpreted as placing the moving-object images on top of the static-scene images along estimated boundaries, i.e., the pixel-region boundary separating the moving object from the rest of the scene) [Hilliges: Col. 9, Lines 53-56 "A background depth map (such as described above with reference to FIG. 6) is used to segment the RGB images in order to give foreground RGB images"] (teaches segmenting RGB images to obtain foreground RGB images; the segmentation necessarily identifies which pixels are foreground (the moving object) versus background, which defines the estimated boundary within the scene).

Parra Pozo, Zobel, and Hilliges are considered analogous to the claimed invention because they are in the same field of XR/AR rendering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Parra Pozo to incorporate Zobel's and Hilliges's teachings of segmenting the images. The motivation for such a combination would be the benefit of improved compositing quality.

Regarding claim 6, Parra Pozo, Zobel, and Hilliges disclose the method of claim 5, wherein combining the images of the moving object, the images of the static scene contents, and the one or more virtual features further comprises: performing hole filling to generate image content for each portion of the scene behind the moving object that is disoccluded when the moving object moves (interpreted as in claim 19 above: when the moving object blocks part of what is behind it and then moves so that the blocked area becomes newly visible (disoccluded), the system creates (hole-fills) image content for that area that would otherwise be missing) [Parra Pozo: 0034 "Thus, when the 3D mesh is created, there may be a hole in the representation of the sending user behind the user's hands. This can be filled in with the corresponding portion of the existing model of the sending user, created from a snapshot in time where the user's hands did not occlude the user's torso"] (teaches filling holes in the 3D mesh).

Claims 9 and 16 are device and non-transitory machine-readable medium claims corresponding to claim 2 without any additional limitations; thus, claims 9 and 16 are rejected for the same reasons as claim 2 above. Claims 10 and 17 are device and non-transitory machine-readable medium claims corresponding to claim 3 without any additional limitations; thus, claims 10 and 17 are rejected for the same reasons as claim 3 above. Claim 12 is a device claim corresponding to claim 5 without any additional limitations; thus, claim 12 is rejected for the same reasons as claim 5 above. Claim 13 is a device claim corresponding to claim 6 without any additional limitations; thus, claim 13 is rejected for the same reasons as claim 6 above.

Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Parra Pozo et al. (U.S. Patent Publication No. 2022/0413433), in view of Zobel et al. (U.S. Patent Publication No. 2023/0216999), and in further view of Mironov et al. (U.S. Patent Publication No. 2022/0095149).

Regarding claim 7, Parra Pozo and Zobel disclose the method of claim 1 but fail to explicitly disclose further comprising: estimating a latency associated with a pipeline of the VST XR device; and modifying the images of the moving object and the images of the static scene contents, or the combined images, based on an estimated head pose of the user when the rendered combined images will be presented to the user.

However, Mironov discloses estimating a latency associated with a pipeline of the VST XR device [Mironov: 0034 "In general, overall latency of a VR system can be defined as the time from the moment a head pose is captured to the moment when a frame reflecting this head pose is presented"] (defines how the latency is measured); and modifying the images of the moving object and the images of the static scene contents, or the combined images, based on an estimated head pose of the user when the rendered combined images will be presented to the user (interpreted as changing the images using a head pose estimated for the future presentation time) [Mironov: 0039 "FIG. 7 is an example signaling diagram 700 between the headset 210 and the PC 110 showing pose prediction. By utilizing the prediction time based on a previous latency, the headset 210 can then predict an estimated head pose for a future moment by extrapolating the current head pose"] [Mironov: 0033 "One way of addressing latency is to use timewarp, where an offset of the image is used when it is received to account for the time it took to receive it. That is, the image rendered is shifted for an older orientation of the headset by the amount the headset has moved since the moment the pose used for rendering was captured to the moment the image rendered with this pose is presented."] (teaches estimating the head pose at presentation time and using it to compensate for latency).

Parra Pozo, Zobel, and Mironov are considered analogous to the claimed invention because they are in the same field of XR/AR rendering. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Parra Pozo and Zobel to incorporate Mironov's teachings of latency compensation. The motivation for such a combination would be the benefit of improved visual quality and reduced latency.

Claims 14 and 20 are device and non-transitory machine-readable medium claims corresponding to claim 7 without any additional limitations; thus, claims 14 and 20 are rejected for the same reasons as claim 7 above.
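Mironov's latency compensation combines two steps: extrapolate the head pose to the estimated presentation time, then shift (timewarp) the already-rendered frame by the accumulated pose error. A one-axis sketch; the pixels-per-radian constant is an assumption of ours, not a value from the reference:

```python
import numpy as np

def predict_pose(yaw_now, yaw_prev, dt_s, latency_s):
    """Extrapolate head yaw to the presentation moment (cf. Mironov 0039)."""
    yaw_rate = (yaw_now - yaw_prev) / dt_s
    return yaw_now + yaw_rate * latency_s

def timewarp_shift(image, yaw_render, yaw_present, px_per_rad=800.0):
    """Timewarp (cf. Mironov 0033): shift the rendered image horizontally by
    the yaw accrued between the render pose and the display pose."""
    shift_px = int(round((yaw_present - yaw_render) * px_per_rad))
    return np.roll(image, -shift_px, axis=1)
```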
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHMED TAHA, whose telephone number is (571) 272-6805. The examiner can normally be reached 8:30 am - 5 pm, Mon - Fri.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, XIAO WU, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AHMED TAHA/
Examiner, Art Unit 2613

/XIAO M WU/
Supervisory Patent Examiner, Art Unit 2613

Prosecution Timeline

Jun 28, 2024
Application Filed
Jan 07, 2026
Non-Final Rejection — §103
Mar 16, 2026
Applicant Interview (Telephonic)
Mar 16, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12565101: WINDSHIELD AND VISIBILITY IMPROVEMENTS FOR DRIVERS IN ADVERSE WEATHER AND LIGHTING CONDITIONS
Granted Mar 03, 2026 (2y 5m to grant)

Patent 12561880: AUGMENTED REALITY TATTOO
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed in these cases to get past this examiner, based on the 2 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 62%
With Interview: 99% (+75.0%)
Median Time to Grant: 2y 5m
PTA Risk: Low
Based on 8 resolved cases by this examiner; grant probability is derived from the career allow rate.
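The displayed figures are reproducible from the stated inputs (5 granted of 8 resolved, +75% lift) under one assumption of ours that the tool does not state: the interview lift is applied multiplicatively and the result is capped at 99%. A quick check:

```python
granted, resolved, lift = 5, 8, 0.75
allow_rate = granted / resolved                      # 0.625 -> shown as 62%
with_interview = min(allow_rate * (1 + lift), 0.99)  # 1.094, capped -> 99%
print(f"{allow_rate:.0%} base, {with_interview:.0%} with interview")
```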

Free tier: 3 strategy analyses per month