DETAILED ACTION
Response to Amendment
The Applicants’ amendment, filed 12/10/2025, was received and entered. As a result, independent claims 1, 11 and 16 were amended to recite the feature of “displaying, by the second client device, the multilayer video wherein the background image and the backgroundless video have an orientation relative to each other that is based on the pose” (emphasis added). No claims were cancelled and no new claims were added. Therefore, claims 1-20 are pending in this application at this time.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 6, 9, 11-12, 16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (US 9,024,997) in view of Culbertson et al. (US 2005/0168402).
Regarding claim 1, Kumar et al. (hereinafter “Kumar”) teaches a method, comprising:
capturing a video of a first participant with a first camera of a first client device (i.e., a first mobile device 100, as shown in figure 1, having a camera to capture a video of a user 120; col.4, lines 42-54);
removing a background of the video to create a backgroundless video (i.e., the captured background is removed and discarded, leaving a backgroundless video containing only the image, or foreground, of user 120; col.5, lines 21-28);
creating a multilayer video by combining the backgroundless video with a background image (i.e., a new background, selected by the user associated with the first mobile device 100, is added behind the image (foreground) of user 120; col.5, lines 33-36);
transferring the multilayer video to a second client device (i.e., the new or appropriate background and the foreground are combined to create an output video, which is sent to the second mobile device; col.5, lines 37-44).
It is noted that Kumar does not clearly teach the features of detecting a pose of a face of a second participant with a second camera of the second client device; and displaying, by the second client device, the multilayer video wherein the background image and the backgroundless video have an orientation relative to each other that is based on the pose. However, Culbertson et al. (hereinafter “Culbertson”) teaches a system and method for communicating gaze in an immersive virtual environment, including interactions of participants within the immersive virtual environment. Culbertson further teaches that the immersive virtual environment comprises four participants W, X, Y and Z, as shown in figure 1A, wherein participant W is a viewing participant who views the other remote participants in the immersive virtual environment on his or her monitor (para. [0031]-[0032]). Culbertson further teaches that the monitor displays a view of the virtual environment from the viewpoint of the viewing participant within the virtual environment (para. [0039]), wherein the view of the virtual environment comprises a plurality of objects located within the virtual environment (para. [0040]). Culbertson further teaches tracking or detecting the physical gaze (i.e., a pose of the face) of the viewing participant to determine a physical direction of the gaze within a physical environment that includes, in part, the viewing participant and the monitor (para. [0041]). Culbertson further teaches displaying, on the monitor of participant W’, the viewed participants in the virtual environment, as shown in figure 4B, such that they have an orientation relative to each other (the remote participants) that is based on the pose (the physical direction of the gaze of the viewing participant) (para. [0057], [0061] and [0063]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of detecting a pose of a face of a second participant with a second camera of the second client device, and displaying, by the second client device, the multilayer video wherein the background image and the backgroundless video have an orientation relative to each other that is based on the pose, as taught by Culbertson, into the method of Kumar in order to enhance the video-conference experience for the viewing participant.
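For purposes of illustration only, the following Python sketch shows one possible arrangement of the operations discussed above: removing the background of a frame, compositing the backgroundless foreground over a background image, and offsetting the background relative to the foreground based on a detected pose. The sketch is not drawn from Kumar or Culbertson; the function names, the pixels-per-degree mapping, and the source of the foreground mask are assumptions introduced solely for the sketch.

import numpy as np

def remove_background(frame, mask):
    # mask is 1 where the participant (foreground) is present, 0 elsewhere;
    # how the mask is produced (e.g., segmentation) is outside this sketch.
    foreground = frame * mask[..., None]
    return foreground, mask

def composite_multilayer(foreground, mask, background_image, offset=(0, 0)):
    # Shift the background layer by "offset" pixels (derived from the viewer's
    # pose) and lay the backgroundless foreground on top of it.
    dy, dx = offset
    shifted = np.roll(background_image, shift=(dy, dx), axis=(0, 1))
    alpha = mask[..., None].astype(np.float32)
    return (alpha * foreground + (1.0 - alpha) * shifted).astype(np.uint8)

def pose_to_offset(yaw_deg, pitch_deg, pixels_per_degree=4):
    # Hypothetical mapping from the viewer's head pose to a background offset.
    return (int(pitch_deg * pixels_per_degree), int(yaw_deg * pixels_per_degree))

# Example: a 480x640 frame, a rectangular foreground mask, a flat background,
# and a small yaw/pitch change of the viewing participant.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
mask = np.zeros((480, 640), dtype=np.uint8)
mask[120:360, 200:440] = 1
background = np.full((480, 640, 3), 80, dtype=np.uint8)
fg, m = remove_background(frame, mask)
out = composite_multilayer(fg, m, background, offset=pose_to_offset(5.0, -2.0))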
Regarding claims 2, 6 and 20, Culbertson further teaches that the physical direction of the gaze of the viewing participant 310 is expressed in a physical coordinate system 320, such as by horizontal and vertical components (horizontal and vertical orientations) (para. [0052]).
Regarding claim 9, Kumar further teaches that the videoconference server 130 compiles an output video with the appropriate background and foreground. The videoconference server 130 then sends the output video to the second mobile device 140 via the second base transceiver station 136 (col.5, lines 37-44).
Regarding claim 11, Kumar teaches a non-transitory computer-readable medium storing instructions (i.e., a videoconferencing server 330, as shown in figure 3, comprising connection logic 332, video logic 333 and a database 331, which serve as memories storing instructions executed by the videoconferencing server 330; col.9, line 26 through col.10, line 19) operable to cause one or more processors to perform operations comprising:
capturing a video of a first participant with a first camera of a first client device (i.e., a first mobile device 100, as shown in figure 1, having a camera to capture a video of a user 120; col.4, lines 42-54);
removing a background of the video to create a backgroundless video (i.e., the captured background is removed and discarded, leaving a backgroundless video containing only the image, or foreground, of user 120; col.5, lines 21-28);
creating a multilayer video by combining the backgroundless video with a background image (i.e., a new background, selected by the user associated with the first mobile device 100, is added behind the image (foreground) of user 120; col.5, lines 33-36);
transferring the multilayer video to a second client device (i.e., the new or appropriate background and the foreground are combined to create an output video, which is sent to the second mobile device; col.5, lines 37-44).
It is noted that Kumar does not clearly teach the features of detecting a pose of a face of a second participant with a second camera of the second client device; and displaying, by the second client device, the multilayer video wherein the background image and the backgroundless video have an orientation relative to each other that is based on the pose. However, Culbertson et al. (hereinafter “Culbertson”) teaches a system and method for communicating gaze in an immersive virtual environment, including interactions of participants within the immersive virtual environment. Culbertson further teaches that the immersive virtual environment comprises four participants W, X, Y and Z, as shown in figure 1A, wherein participant W is a viewing participant who views the other remote participants in the immersive virtual environment on his or her monitor (para. [0031]-[0032]). Culbertson further teaches that the monitor displays a view of the virtual environment from the viewpoint of the viewing participant within the virtual environment (para. [0039]), wherein the view of the virtual environment comprises a plurality of objects located within the virtual environment (para. [0040]). Culbertson further teaches tracking or detecting the physical gaze (i.e., a pose of the face) of the viewing participant to determine a physical direction of the gaze within a physical environment that includes, in part, the viewing participant and the monitor (para. [0041]). Culbertson further teaches displaying, on the monitor of participant W’, the viewed participants in the virtual environment, as shown in figure 4B, such that they have an orientation relative to each other (the remote participants) that is based on the pose (the physical direction of the gaze of the viewing participant) (para. [0057], [0061] and [0063]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of detecting a pose of a face of a second participant with a second camera of the second client device, and displaying, by the second client device, the multilayer video wherein the background image and the backgroundless video have an orientation relative to each other that is based on the pose, as taught by Culbertson, into the teachings of Kumar in order to enhance the video-conference experience for the viewing participant.
Regarding claim 12, Culbertson further teaches that the physical direction of the gaze of the viewing participant 310 is expressed in a physical coordinate system 320, such as by horizontal and vertical components (horizontal and vertical orientations) (para. [0052]).
Regarding claim 16, Kumar teaches a system, comprising:
one or more memories; and one or more processors configured to execute instructions stored in the one or more memories (i.e., a videoconferencing server 330, as shown in figure 3, comprising connection logic 332, video logic 333 and a database 331, which serve as memories storing instructions executed by the videoconferencing server 330; col.9, line 26 through col.10, line 19) to:
capture a video of a first participant with a first camera of a first client device (i.e., a first mobile device 100, as shown in figure 1, having a camera to capture a video of a user 120; col.4, lines 42-54);
remove a background of the video to create a backgroundless video (i.e., the captured background is removed and discarded, leaving a backgroundless video containing only the image, or foreground, of user 120; col.5, lines 21-28);
create a multilayer video by combining the backgroundless video with a background image (i.e., a new background, selected by the user associated with the first mobile device 100, is added behind the image (foreground) of user 120; col.5, lines 33-36);
transfer the multilayer video to a second client device (i.e., the new or appropriate background and the foreground are combined to create an output video, which is sent to the second mobile device; col.5, lines 37-44).
It is noted that Kumar does not clearly teach the features of detecting a pose of a face of a second participant with a second camera of the second client device; and displaying, by the second client device, the multilayer video wherein the background image and the backgroundless video have an orientation relative to each other that is based on the pose. However, Culbertson et al. (hereinafter “Culbertson”) teaches a system and method for communicating gaze in an immersive virtual environment, including interactions of participants within the immersive virtual environment. Culbertson further teaches that the immersive virtual environment comprises four participants W, X, Y and Z, as shown in figure 1A, wherein participant W is a viewing participant who views the other remote participants in the immersive virtual environment on his or her monitor (para. [0031]-[0032]). Culbertson further teaches that the monitor displays a view of the virtual environment from the viewpoint of the viewing participant within the virtual environment (para. [0039]), wherein the view of the virtual environment comprises a plurality of objects located within the virtual environment (para. [0040]). Culbertson further teaches tracking or detecting the physical gaze (i.e., a pose of the face) of the viewing participant to determine a physical direction of the gaze within a physical environment that includes, in part, the viewing participant and the monitor (para. [0041]). Culbertson further teaches displaying, on the monitor of participant W’, the viewed participants in the virtual environment, as shown in figure 4B, such that they have an orientation relative to each other (the remote participants) that is based on the pose (the physical direction of the gaze of the viewing participant) (para. [0057], [0061] and [0063]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of detecting a pose of a face of a second participant with a second camera of the second client device, and displaying, by the second client device, the multilayer video wherein the background image and the backgroundless video have an orientation relative to each other that is based on the pose, as taught by Culbertson, into the system of Kumar in order to enhance the video-conference experience for the viewing participant.
Regarding claims 18 and 19, Culbertson further teaches that the physical direction of the gaze of the viewing participant 310 is expressed in a physical coordinate system 320 (shown in figures 3A and 3B), such as by horizontal and vertical components (horizontal and vertical orientations) (para. [0052] and [0056]).
Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (US 9,024,997) in view of Culbertson et al. (US 2005/0168402) as applied to claim 1 above, and further in view of Slotznick (US 11,601,618 as cited in the previous Office Action).
Regarding claim 3, Kumar and Culbertson, in combination, teach all subject matter as claimed above, except for the feature of the background image being obtained from a storage server. However, Slotznick teaches a communication system 1300, as shown in figure 13. The system 1300 includes a plurality of communication platforms for providing respective layers. Slotznick further teaches that one of the communication platforms, CP2-CP6, may store and/or generate the background layer for the devices of the participants (col.31, lines 10-20).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the feature of the background image being obtained from a storage server, as taught by Slotznick, into the combination of Kumar and Culbertson in order to provide the background image to the devices of the participants.
Regarding claim 4, Slotznick further teaches the features of displaying the video streams with multiple layers, as shown in figures 1C, 3B, 4B and 6A, etc. (col.7, lines 4-16; col.28, lines 20-40). Culbertson also teaches these limitations in paragraphs [0057], [0061] and [0063].
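For purposes of illustration only, the following Python sketch shows one way a client could obtain a background image from a storage server, as recited in claim 3. The URL, endpoint path, and image identifier are hypothetical; Slotznick does not disclose this code.

import requests

def fetch_background(server_url, image_id):
    # Request the stored background image (e.g., PNG bytes) from the server;
    # decoding the bytes into an image array is handled elsewhere.
    response = requests.get(f"{server_url}/backgrounds/{image_id}", timeout=10)
    response.raise_for_status()
    return response.content

# Example (hypothetical server and identifier):
# png_bytes = fetch_background("https://storage.example.com", "room-01")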
Claims 7-8 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (US 9,024,997) in view of Culbertson et al. (US 2005/0168402) as applied to claims 1 and 11 above, and further in view of Lindmark (US 2025/0078870, also cited in the previous Office Action).
Regarding claims 7, 8 and 14, Kumar and Culbertson, in combination, teach all subject matter as claimed above, except for the features of performing each of a horizontal perspective transformation or a vertical perspective transformation of the background image according to either a yaw or a pitch of the pose. However, Lindmark teaches a system and a method of generating a 3D effect in a video stream based on movement, e.g., yaw or pitch, of a viewing participant. Lindmark further teaches a server machine 150, as shown in figure 1, that includes a 3D effect engine 151. The 3D effect engine 151 can dynamically modify a presentation position of a background layer of a video stream to produce a video stream with a modified background that provides a 3D effect for a viewing participant (para. [0042]). Lindmark further teaches that positions or poses of the viewing participant's head and/or eyes can be tracked and detected by a local camera associated with the viewing participant. The detected positions or poses of the viewing participant's head and/or eyes include movements of the viewing participant looking to the right or left side, looking up, etc., i.e., yaw or pitch poses, so that portions of the background become visible in the modified background layer (para. [0024]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of performing each of a horizontal perspective transformation or a vertical perspective transformation of the background image according to either a yaw or a pitch of the pose, as taught by Lindmark, into the combination of Kumar and Culbertson in order to provide an updated video in which the background is modified based on the movements of the viewing participant.
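For purposes of illustration only, the following Python sketch shows one way a horizontal or vertical perspective transformation of a background image could be driven by the yaw or pitch of a detected pose. The sketch is not Lindmark's implementation; the homography construction and the gain constant are assumptions introduced solely for the sketch.

import numpy as np
import cv2

def perspective_warp_background(background, yaw_deg, pitch_deg, gain=1e-4):
    h, w = background.shape[:2]
    # Small projective terms: yaw skews the image horizontally, pitch vertically.
    H = np.array([
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [gain * np.radians(yaw_deg), gain * np.radians(pitch_deg), 1.0],
    ], dtype=np.float64)
    return cv2.warpPerspective(background, H, (w, h))

# Example: warp a background slightly as the viewer turns right and looks up.
bg = np.zeros((480, 640, 3), dtype=np.uint8)
warped = perspective_warp_background(bg, yaw_deg=10.0, pitch_deg=-5.0)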
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (US 9,024,997) in view of Culbertson et al. (US 2005/0168402) as applied to claim 1 above, and further in view of Chiou et al. (US 2016/0343389).
Regarding claim 10, Kumar and Culbertson, in combination, teach all subject matter as claimed above, except for the features of detecting the pose of the face by determining a direction and quantity of pixels that the face has moved relative to a previously detected pose of the face; and performing a perspective transformation of the background image based on the direction and the quantity of pixels. However, Chiou et al. (hereinafter "Chiou") teaches a voice control system comprising a voice receiving unit, an image capturing unit, a storage unit and a control unit. The control unit detects a mouth on a human face. Based on the face features, such as movements, pixels and directions, a direction and quantity of pixels are determined, and a perspective transformation of the background image is performed based on the direction and the quantity of pixels (para. [0055]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features of detecting the pose of the face by determining a direction and quantity of pixels that the face has moved relative to a previously detected pose of the face, and performing a perspective transformation of the background image based on the direction and the quantity of pixels, as taught by Chiou, into the combination of Kumar and Culbertson in order to provide an updated video and background corresponding to the movement of the user's head.
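For purposes of illustration only, the following Python sketch shows one way the direction and quantity of pixels that a face has moved between detections could be computed and used to shift the background image. The sketch is not Chiou's implementation; the face-center inputs and the parallax factor are assumptions introduced solely for the sketch.

import numpy as np

def face_displacement(prev_center, curr_center):
    # Direction and quantity of pixels the face moved between detections.
    dx = curr_center[0] - prev_center[0]
    dy = curr_center[1] - prev_center[1]
    return dx, dy

def shift_background(background, dx, dy, parallax=0.5):
    # Translate the background opposite to the face motion to mimic perspective.
    shift_x = int(round(-dx * parallax))
    shift_y = int(round(-dy * parallax))
    return np.roll(background, shift=(shift_y, shift_x), axis=(0, 1))

# Example: the face center moved 12 px right and 4 px up between frames.
dx, dy = face_displacement((320, 240), (332, 236))
bg = np.zeros((480, 640, 3), dtype=np.uint8)
shifted = shift_background(bg, dx, dy)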
Claims 13 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (US 9,024,997) in view of Culbertson et al. (US 2005/0168402) as applied to claims 1 and 11 above, and further in view of Kim et al. (EP 3148184, as submitted in the IDS and cited in the previous Office Action).
Regarding claim 13, Kumar and Culbertson, in combination, teach all subject matter as claimed above, except for the features wherein the background image includes distance information for at least one layer of the background image, and the orientation is further based on the distance information. However, Kim et al. (hereinafter "Kim") teaches distance information, or depth, as shown in figure 10 (para. [0085]-[0087]), for the purpose of displaying a video whose background image changes according to a change in the location of the participant.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the features wherein the background image includes distance information for at least one layer of the background image, and the orientation is further based on the distance information, as taught by Kim, into the combination of Kumar and Culbertson in order to enhance the video-conference experience for the viewing participant.
Regarding claim 17, Kim teaches distance information, or depth, as shown in figure 10 (para. [0085]-[0087]), as discussed above. Culbertson also teaches the limitations of the claim in paragraphs [0057], [0061] and [0063].
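For purposes of illustration only, the following Python sketch shows one way per-layer distance information could influence the orientation of a multilayer background, with nearer layers shifting more than farther ones as the viewer's pose changes. The sketch is not Kim's implementation; the layer contents, distances, and scaling rule are assumptions introduced solely for the sketch.

import numpy as np

def layered_parallax(layers, pose_offset_px, reference_distance=1.0):
    # layers: list of (image, distance) pairs, ordered far to near.
    # Nearer layers (smaller distance) receive a larger shift.
    shifted = []
    for image, distance in layers:
        scale = reference_distance / max(distance, 1e-6)
        dy = int(round(pose_offset_px[0] * scale))
        dx = int(round(pose_offset_px[1] * scale))
        shifted.append(np.roll(image, shift=(dy, dx), axis=(0, 1)))
    return shifted

# Example: two background layers at distances 4.0 and 1.5 and a pose offset of
# (0, 20) pixels; the nearer layer shifts more than the farther layer.
far = np.zeros((480, 640, 3), dtype=np.uint8)
near = np.zeros((480, 640, 3), dtype=np.uint8)
out_layers = layered_parallax([(far, 4.0), (near, 1.5)], pose_offset_px=(0, 20))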
Allowable Subject Matter
Claims 5 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicants' amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for response to this final action is set to expire THREE MONTHS from the date of this action. In the event a first response is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event will the statutory period for response expire later than SIX MONTHS from the date of this final action.
Any response to this final action should be mailed to:
BOX AF
Commissioner of Patents and Trademarks
Washington, D.C. 20231
Or faxed to:
(703) 872-9314 or (301) 273-8300 (for formal communications; please mark “EXPEDITED PROCEDURE”)
Or, if it is an informal or draft communication, please label “PROPOSED” or “DRAFT”.
Hand Carry Deliveries to:
Customer Service Window
(Randolph Building)
407 Dulany Street
Alexandria, VA 22314
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BINH TIEU, whose telephone number is (571) 272-7510. The examiner can normally be reached from 9-5. The examiner's fax number is (571) 273-7510 and e-mail address is BINH.TIEU@USPTO.GOV.
Examiner interviews are available via telephone or video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, FAN S. TSANG, can be reached at (571) 272-7547.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. If you have any questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/Binh Kien Tieu/Primary Examiner, Art Unit 2694
Date: February 2026