DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The abstract of the disclosure is objected to because:
The abstract of the disclosure does not commence on a separate sheet in accordance with 37 CFR 1.52(b)(4) and 1.72(b).
A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Claim Objections
Claims 1-2, 4, 6-7, 9, 13-14 are objected to because of the following informalities:
Claim 1: the phrases “the apparatus” in line 6 and “the user” in the third-to-last line lack antecedent basis.
Claims 2 and 9: the phrase “the one or more of the first type of reference images” in the last 3 lines lacks antecedent basis.
Claim 4: the phrase “references images” should read “reference images”.
Claim 6: the phrase “the upper face region” in the last 3 lines lacks antecedent basis.
Claims 6 and 13: the phrases “the live captured image of the first user” in the last 2 lines and “the data of the full face image” in the first 2 lines lack antecedent basis.
Claims 7 and 14: the phrase “the apparatus” in the last 2 lines lacks antecedent basis.
Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2, 4, 7, 8-9, 11, and 14 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Grau et al. (US 20180158246 A1, hereinafter “Grau”).
Regarding claim 1,
Grau teaches:
A server for removing an image of a first apparatus that occludes a portion of a face of a first user in a video stream comprising (Grau: ¶37, “. . . the image processing unit 410 may be . . . remote server . . .”; ¶81, “Process 900 may include “warp occluded face parts from images to 3D model” 912. Specifically, the result is a mapping to effectively remove the pixel data that represents the HMD itself, and replace it with the image data from the internal images that show the occluded area of the user's face. . .”; ¶155, “Examples of content may include. . .data from . . . videoconference, streaming video. . .”; NOTE: The first apparatus is the HMD. Grau’s system removes the pixel data of the HMD so that the whole face of the user is displayed because the HMD occludes a portion of a user’s face when worn. Also see figures 1A-1B, “FIG. 1B is an image of the user in FIG. 1A without the HMD and showing an image of the user as desired for a representation in a virtual or augmented reality according to the implementations provided herein” (Grau ¶4)):
one or more processors (Grau: ¶136, “. . . one or more of the processors 1340 . . .”); and
one or more memories storing instructions that, when executed, configure the one or more processors to (Grau: ¶136, “. . . and at least one memory 1344 communicatively coupled to the processor to perform the operations . . .”):
receive captured video data of the first user wearing the apparatus that occludes the portion of the face of the first user (Grau: ¶3, “FIG. 1A is an image of a user wearing a head mounted display (HMD) . . .”; ¶33, “. . . cameras 306 are positioned to face toward the user to record external images of the user wearing the HMD . . .”; ¶38, “. . . the color images of the external image capture device(s), which may be video images . . . pre-processing units may perform . . . division into frames . . . for sufficient image processing . . .”; ¶107, “The blending takes into account slight differences in shading and color from the external image . . . and may include varying the frame to frame rate of the blending depending on the rate of change of the image data (whether a stable flat area or an area with quick changes in color and/or brightness from frame to frame”); NOTE: Fig. 1A shows a first user wearing an apparatus that occludes a portion of the face of the first user. Grau’s system pre-processes the video by dividing it into frames; a frame of the video is captured video data. Further, paragraph 107 details the process of blending the mapped non-occluded image with the external image to generate the final image showing the full face of the user without the HMD. The system takes into account differences in shading and color from the external image, as well as changes in brightness from frame to frame. The captured video data in Grau’s system therefore includes shading data, color data, and brightness data.);
obtain first type of reference images of the first user including the occluded portion and non-occluded portion of the face of the first user (Grau: ¶61-62, “process 600 may include “obtain external images of user without HMD” 603. Thus, such images then may contain the images of the entire face of the user including the occluded areas . . . During the learning stage to generate an appearance model, a library of appearance images may be generated for matching to the internal IR images during run-time. . .”; NOTE: The obtained first type of reference images are Grau’s external images of the user without the HMD. The external images showing the user’s entire face are referenced and used to create an appearance model and library.);
acquire lighting information corresponding to lighting of the first user in the captured video stream (Grau: ¶32, “learning an appearance model based on color video images and a 3D model to provide the . . . brightness . . . for the occluded area . . .”; ¶80, “. . . a neural network, such as a convolutional neural network (CNN) may be used to map the IR and lighting from non-occluded parts of the face. Neural networks (like CNNs) can take multiple inputs, in this case the external and internal images, and maps this to the anticipated facial image . . .”; ¶173, “. . . and using a neural network to map at least lighting from non-occluded areas of the face to the at least part of the occluded area . . .”; NOTE: The brightness and lighting from the non-occluded areas of the face are the lighting information acquired and used for mapping and for providing brightness to the occluded area. Grau’s system acquires lighting information from color video images of the first user so it can provide brightness to the area occluded by the HMD. Also see paragraph 107 as cited above regarding blending. Grau’s system takes into account differences in shading, color, and brightness from frame to frame, which are lighting information corresponding to the user in the captured video stream. Therefore, Grau’s system acquires lighting information.);
generate data of the first user including a full face image using a trained machine learning model based on the obtained first type of reference images (Grau: Fig. 1A and 1B, ¶47, “Neural networks (like CNNs) can take multiple inputs, in this case the external and internal images, and maps this to the anticipated facial image. The mapping would be trained using a training set that contains many examples. . .”; ¶30, “the present method and system propose to augment the image of one or more external cameras with internal images of the occluded areas captured by one or more cameras mounted inside the HMD. This may be performed with closed virtual or augmented reality HMDs that completely cover a part of a user's face, as shown in FIG. 1A where an image 100 from an external camera has a face 102 of a user covered by a virtual reality HMD 104. Image 106 (FIG. 1B) shows the user's face 102 without the HMD (and without occlusions) and as desired for the face model in the virtual reality . . .”; NOTE: Figure 1A shows a user wearing an HMD covering the eye portion; Fig. 1B is the generated data including a full face image with the HMD removed. Grau’s system uses a CNN to map and generate the image shown in Fig. 1B.); and
causing the generated data of the user to be displayed on a display of a second apparatus that occludes a portion of a face of a second user (Grau: ¶55, “. . . The final image then may be displayed in the HMD to a second or more users so those additional users can view the full face of the first user. . .”; NOTE: The generated data of the user is the final image displayed on the HMD worn by a second user/users. If the second user is wearing the HMD, then it occludes a portion of the face of the second user).
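For illustration only: the operation the above mapping attributes to Grau’s ¶81 (removing the pixel data that represents the HMD and replacing it with synthesized face data) can be pictured, per video frame, as a masked composite. The following is a minimal sketch under assumed inputs; the array names, mask, and feathering weight are hypothetical and are not taken from Grau.

    import numpy as np

    def composite_frame(external_frame, synthesized_face, hmd_mask, feather=1.0):
        # external_frame   : HxWx3 array, live captured frame showing the HMD
        # synthesized_face : HxWx3 array, face data mapped from the reference images / 3D model
        # hmd_mask         : HxW boolean array, True where the HMD occludes the face
        # feather          : blend weight inside the mask (1.0 = full replacement)
        out = external_frame.astype(np.float64).copy()
        m = hmd_mask.astype(np.float64)[..., None] * feather
        out = (1.0 - m) * out + m * synthesized_face.astype(np.float64)
        return out.clip(0, 255).astype(np.uint8)

    # hypothetical 480x640 frame with an assumed HMD region
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    face = np.zeros((480, 640, 3), dtype=np.uint8)
    mask = np.zeros((480, 640), dtype=bool)
    mask[100:220, 200:440] = True
    result = composite_frame(frame, face, mask)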
Regarding claim 2, depending on claim 1,
Grau teaches:
The server according to claim 1, wherein execution of the instructions further configures the one or more processors to:
obtain, from the first type of reference images stored in a storage device, facial landmarks representing an entire face of the first user including the occluded portion and non-occluded portion of the face of the first user (Grau ¶61-63, “. . . process 600 may include “obtain external images of user without HMD” 603. Thus, such images then may contain the images of the entire face of the user including the occluded areas . . . include detecting and tracking facial landmarks with object detection, depth-sensing, depth-processing (creating a 3D map or space with objects in a captured scene), and background, foreground, and/or object segmentation”; ¶40, “The 3D head model unit 422 uses the images of the external image capture device(s) 402 to form a 3D model of at least the face or head . . .”; NOTE: The images taken by the capture devices are inherently stored in a storage device for processing. The facial landmarks are then obtained with object detection from the captured images of the user. The images of the user without the HMD include the occluded portion because the user is not wearing the HMD.);
provide, as an input to the trained machine learning model, a second type of reference images of the first user not wearing the first apparatus captured during a live image capture process at time preceding capturing of the video data that includes the lighting information (Grau: ¶44, “. . . images are taken . . . without the HMD being worn by the user, this process is repeated for each or multiple frames of a learning video sequence run provided by the internal and external cameras 402 and 406. The result is an appearance model for a user that has a library of stored 3D color images where each image shows at least possible facial expressions and eye gaze directions for the occluded area including the eyes and position of the eyebrows . . .”; ¶105-107, “The process 1100 may include “select matching non-occluded image from appearance model library” 1106, or in other words, generate the non-occluded image employing the appearance model. Thus, the matching or indexing is performed by using the information from internal and external cameras and determined . . . found through machine learning techniques . . . learning separation function based on neural networks, support vector machine (SVM), or cluster techniques . . . The process 1100 may include “blend non-occluded image(s) into external image” 1110. The blending takes into account slight differences in shading and color from the external image to the internal image . . . a more robust blending technique may be used when the results are still rough, and may include varying the frame to frame rate of the blending depending on the rate of change of the image data (whether a stable flat area or an area with quick changes in color and/or brightness from frame to frame . . .”; ¶39, “. . . external images taken of the user in various poses and eye gaze directions without the user wearing the HMD are provided . . .”; NOTE: Grau’s system first creates an appearance model during a learning stage using images of the first user not wearing the first apparatus (the HMD). The images are taken during a live image capture process because the images are of a live person captured by a camera. In process 1106, machine learning is used for matching and uses the appearance model library, which is created using the external images of the user without the HMD. In process 1110, the blending is the final step and takes into account lighting information, i.e., the brightness, color, and shading information, during the actual session in which the occluded areas are filled and the face is reconstructed without the HMD. Since process 1106 occurs before process 1110, the images of the first user not wearing the first apparatus were captured during a live image capture process at a time preceding capture of the video data that includes the lighting information.
NOTE: external images taken without the HMD >> appearance model >> machine learning used for matching (step 1106) >> fill image to reconstruct the face (step 1108, as described in paragraph 106) >> blending operation taking lighting information into account (step 1110).)
generate the data of the first user (Grau: Fig. 1B, ¶45, “. . . During the actual use of the HMDs . . . a face occlusion synthesis unit 432 to perform synthesis to compute an image of the occluded parts to be placed on external images . . . to form a representation of the user's head at the time. . .”; ¶80, “. . . anticipated facial image. . .”; NOTE: The generated data of the first user is Fig. 1B, or the anticipated facial image described in paragraph 80, where the HMD is removed from the image and Fig. 1B shows the full face of the user. Grau’s system first takes external images without the HMD as first type reference images, which include parts of the face that are normally occluded by an HMD during actual sessions)
by using the lighting information to select a region from the one or more of the first type of reference images corresponding to the first apparatus (Grau: ¶32, “. . . by learning an appearance model based on color video images and a 3D model to provide the position (or landmarks), color, brightness, and so forth for the occluded area. . .”; ¶80, “a neural network, such as a convolutional neural network (CNN) may be used to map the IR and lighting from non-occluded parts of the face. Neural networks (like CNNs) can take multiple inputs, in this case the external and internal images, and maps this to the anticipated facial image. The mapping would be trained using a training set that contains many examples of different persons under different lighting conditions. The mapping makes use of the registered and warped images, but could also work on unwarped images.”; NOTE: Grau’s system maps the lighting information based on the training set, which uses first type of reference images (external images) under different lighting conditions. Grau’s system knows the lighting information for the external and internal images used by the CNN. Grau’s system also chooses the closest appearance image by matching against the appearance model as described in ¶124. Grau’s mapping inherently selects a region from the first type of reference images, which are the external and internal images used for training, so it can map the lighting to the anticipated facial image. The mapping of the lighting information would not be possible if the system did not have information about the region of the face to which the lighting is to be mapped.).
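For illustration only: one way to picture the lighting-guided selection described in the NOTE above (choosing a region of a first-type reference image that matches the lighting of the live frame) is a simple brightness comparison over the non-occluded region. This is a hypothetical sketch; the library structure and the mean-luma statistic are assumptions and do not represent Grau’s disclosed CNN mapping.

    import numpy as np

    def mean_brightness(img_rgb, mask):
        # average luma over the masked (non-occluded) region of an RGB image
        luma = img_rgb @ np.array([0.299, 0.587, 0.114])
        return luma[mask].mean()

    def select_reference_region(live_frame, non_occluded_mask, library, hmd_region_mask):
        # pick the reference image whose non-occluded-region brightness is closest
        # to the live frame, then keep only its region at the HMD location
        target = mean_brightness(live_frame, non_occluded_mask)
        scores = [abs(mean_brightness(ref, non_occluded_mask) - target) for ref in library]
        best = library[int(np.argmin(scores))]
        return np.where(hmd_region_mask[..., None], best, 0)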
Regarding claim 4, depending on claim 1,
Grau teaches:
The server according to claim 1,
wherein the trained machine learning model is user specific (Grau: ¶92, “the process 1000 may include “compute appearance model” . . . use machine learning, for example, a CNN method.”; ¶42, “An appearance model learning unit 428 generates a library of images of possible facial expressions for the occluded face area of the particular user . . . during a preliminary run to learn or train the appearance model.”; NOTE: The trained machine learning model is the trained appearance model, which uses CNN machine learning. The model is user specific because it is learned for the particular user.)
and trained using a set of reference images of the first user to identify facial landmarks in each reference image of the set of references images (Grau: ¶42, “. . . during the learning stage, the internal and external images are both registered . . . the parameters for warping all can be obtained from the IR images such as eye gaze points, eyebrow landmarks, and so forth.”; ¶63, “. . . “pre-process image data” 604. . . include . . . division into frames. . . apply sufficient image processing to raw image. . . detecting and tracking facial landmarks with object detection . . .”; NOTE: The set of reference images is the internal and external images used for training the appearance model; they are images of the first user as described in ¶44. Grau’s system obtains landmarks as parameters for warping, such as eye gaze points and eyebrow landmarks (¶42, 63). Grau’s system pre-processes the images, including division into frames, and detects facial landmarks in each reference image (frames from the captured video of the external and internal cameras, where each frame corresponds to an external/internal image).)
and predict an upper face image from at least one of the first type of set of reference images used when removing the image of the first apparatus that occludes the face of the first user (Grau: ¶47, “an appearance model image matching unit 436 matches the mapped internal images to a matching occluded area image in the appearance model library 430 to generate a non-occluded image to be used on the avatar of the user. This also may be referred to as computing a synthetic image of the occluded parts. The matching is performed by matching algorithms such as sum of absolute differences (SADs) of face landmark points on the internal image and the non-occluded image from the library or by retrieving the occlusion from a CNN or similar machine learning technique”; ¶81, “. . . “warp occluded face parts from images to 3D model” 912. Specifically, the result is a mapping to effectively remove the pixel data that represents the HMD itself, and replace it with the image data from the internal images that show the occluded area of the user's face”; NOTE: Grau’s system predicts an upper face image by matching or computing a synthetic image of the occluded parts. The upper face image is Grau’s synthetic image of the occluded parts. When an HMD is worn by a user, the occluded parts inherently include the upper face region, as illustrated in Fig. 1A. Fig. 1A is an image in which the user is wearing an HMD; Fig. 1B is an image of the user with the first apparatus already removed. The first apparatus is the HMD, and the HMD occludes the upper face region of the first user.).
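Grau’s ¶47, quoted above, names sum of absolute differences (SAD) over face landmark points as one matching algorithm for selecting an appearance-library image. The sketch below illustrates only that generic idea; the landmark array layout and library size are assumptions.

    import numpy as np

    def sad(a, b):
        # sum of absolute differences between two (N, 2) landmark coordinate arrays
        return np.abs(a - b).sum()

    def match_appearance_image(internal_landmarks, library_landmarks):
        # index of the library appearance image whose landmarks best match the
        # landmarks detected on the internal (inside-HMD) image
        scores = [sad(internal_landmarks, lib) for lib in library_landmarks]
        return int(np.argmin(scores))

    # hypothetical: 10 eye/eyebrow landmarks per image, 200 library images
    internal = np.random.rand(10, 2)
    library = [np.random.rand(10, 2) for _ in range(200)]
    best_index = match_appearance_image(internal, library)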
Regarding claim 7, depending on claim 1,
Grau teaches:
The server according to claim 1, wherein execution of the instructions further configures the one or more processors to
obtain first facial landmarks of a non-occluded portion of the face (Grau: ¶63-64, “. . . detecting and tracking facial landmarks with object detection. . . adapted to a real person's face by identifying specific points (such . . . nose, the corners of the mouth, and so forth). . .”; ¶39, “. . . By one approach, external images taken while the user was wearing the HMD are provided to the 3D head model unit 422 . . .”; NOTE: The nose and mouth are in the non-occluded portion when an HMD is worn by a person. If the user is wearing the HMD, the only visible part for landmark detection would be the non-occluded portion of the face, which is the nose and mouth region.);
obtain second facial landmarks representing the entire face of the user including the occluded portion and non-occluded portion of the face of the user (Grau: ¶32, “. . . learning an appearance model based on color video images and a 3D model to provide the position (or landmarks), . . .”; ¶39, “. . . external images taken of the user in various poses and eye gaze directions without the user wearing the HMD are provided relatively directly to an appearance model unit 424. . .”; NOTE: If the user is not wearing an HMD, the whole face is visible, including the occluded portion (upper face region) and the non-occluded portion of the face of the user (lower face region). When the whole face is visible, landmarks representing the entire face can be detected); and
provide one or more types of reference images of the user with the first and second obtained facial landmarks to the trained machine learning model (Grau: ¶80, “. . . Neural networks (like CNNs) can take multiple inputs, in this case the external and internal images, and maps this to the anticipated facial image.”; ¶42, “When a photo-realistic avatar is formed by using an RGB-D external camera, the parameters for warping all can be obtained from the IR images such as eye gaze points, eyebrow landmarks, and so forth. . .”; NOTE: The landmark information is used for warping to form a photo-realistic avatar. The trained machine learning model is the neural network (CNN) that maps the inputs to the anticipated image. If the first and second landmarks are missing, the mapping will not be accurate.)
to remove the apparatus from the received captured video data (Grau: Figs. 1A-1B, ¶29, “. . . a photo-realistic avatar (PRA) that is generated by using video of a user. . .”; ¶112, “The resulting image can be used as it is in video conferencing applications . . .”; ¶81, “. . . remove the pixel data that represents the HMD itself, and replace it with the image data from the internal images that show the occluded area of the user's face.”; NOTE: Fig. 1B shows an image with the apparatus (HMD) removed, as compared to Fig. 1A.).
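For illustration only: the distinction drawn above between first facial landmarks (non-occluded lower face) and second facial landmarks (entire face) can be pictured with a conventional 68-point landmark layout (e.g., the common iBUG numbering, in which points 17-26 are the eyebrows and 36-47 the eyes). The index split below is an assumption used only to illustrate the claim mapping; it is not taken from Grau.

    # assumed 68-point landmark convention: eyebrows (17-26) and eyes (36-47) would be
    # covered by an HMD, while jaw, nose, and mouth points remain visible
    UPPER_FACE_IDX = list(range(17, 27)) + list(range(36, 48))
    LOWER_FACE_IDX = [i for i in range(68) if i not in UPPER_FACE_IDX]

    def split_landmarks(full_face_landmarks):
        # split second facial landmarks (entire face, from the reference images)
        # into the portion an HMD would occlude and the portion left visible
        upper = [full_face_landmarks[i] for i in UPPER_FACE_IDX]
        lower = [full_face_landmarks[i] for i in LOWER_FACE_IDX]
        return upper, lower

    # hypothetical usage with 68 (x, y) points
    landmarks = [(float(i), float(i)) for i in range(68)]
    occluded_part, visible_part = split_landmarks(landmarks)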
Regarding claims 8-9, 11, and 14,
Method claims 8-9, 11, and 14 are drawn to the methods corresponding to the instructions recited in apparatus claims 1-2, 4, and 7, respectively. Therefore, method claims 8-9, 11, and 14 are rejected for the same reasons of anticipation as set forth above.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Grau in view of Ono (US 20210334517 A1, hereinafter “Ono”).
Regarding claim 3, depending on claim 1,
Grau teaches:
The server according to claim 1,
Although Grau teaches acquiring lighting information corresponding to lighting of the first user in the captured video stream, recording a first user not wearing the first apparatus, and taking into account luminance, brightness, color, and shading information from the reference images, Grau fails to explicitly teach: wherein the lighting information includes information characterizing lighting applied to the first user and light being reflected from the first user not wearing the first apparatus.
The analogous art Ono teaches:
wherein the lighting information includes information characterizing
lighting applied to the first user and light being reflected from the first user not wearing the first apparatus (Ono: ¶94, “As shown in FIG. 6, the TOF sensor 100 applies light that has been so modulated that the intensity of the light varies periodically from the irradiation unit 102 toward the subject. The applied light is reflected from the subject and is detected as the reflected light by the light receiving unit 104 of the TOF sensor 100. As shown in FIG. 6, the detected reflected light (lower part of FIG. 6) has a phase difference with respect to the irradiation light (upper part of FIG. 6). The phase difference is larger as the distance from the TOF sensor 100 to the subject is longer, and is smaller as the distance from the TOF sensor 100 to the subject is shorter.”; NOTE: The applied light is characterized as light that has been modulated and applied to the first user (subject), and is also characterized as irradiation light. The light being reflected from the first user is Ono’s applied light reflected from the first user (subject), characterized by the information detected by the light receiving unit 104. With reference to Ono’s Fig. 1, the first user (subject) is not wearing any apparatus and the whole face is shown.)
It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention to combine Grau and Ono to include: wherein the lighting information includes information characterizing lighting applied to the first user and light being reflected from the first user not wearing the first apparatus.
The reason for doing so is to perform “identification accurately without being affected by variations in ambient light” (Ono: ¶10).
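For context only: Ono’s ¶94 describes the phase difference between the irradiated and reflected light growing with subject distance. The standard indirect time-of-flight relation underlying such sensors, d = c·Δφ/(4π·f_mod), is sketched below with an assumed modulation frequency; this relation is general background and is not quoted from Ono.

    import math

    C = 299_792_458.0  # speed of light, m/s

    def tof_distance(phase_diff_rad, modulation_freq_hz=20e6):
        # distance implied by the phase difference between the emitted modulated
        # light and the detected reflected light (phase-wrapping ambiguity ignored)
        return C * phase_diff_rad / (4 * math.pi * modulation_freq_hz)

    # e.g., a phase shift of pi/2 rad at an assumed 20 MHz modulation is about 1.87 m
    print(round(tof_distance(math.pi / 2), 2))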
Regarding claim 10,
Method claim 10 is drawn to the method corresponding to the instructions recited in apparatus claim 3. Therefore, method claim 10 is rejected for the same reasons of obviousness as set forth above.
Claims 5-6 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Grau in view of Le Clerc et al. (US 20170178306 A1, hereinafter “Le Clerc”).
Regarding claim 5, depending on claim 4,
Grau teaches:
The server according to claim 4,
Although Grau’s machine learning model is trained to use images of the user and to predict by matching images of the upper face region with the registered images from the learning stage, Grau fails to teach: wherein the trained machine learning model is further trained to use, a live captured image of a lower face region with lower face regions from the set of reference images to predict facial landmarks for an upper face region that corresponds to the live captured image of the lower face region.
The analogous art Le Clerc teaches:
wherein the trained machine learning model is further trained to use, a live captured image of a lower face region with lower face regions from the set of reference images (Le Clerc: ¶31, “. . . synthesizing, a first face in a first image, by determining a first occluded part of the first face that is occluded by an occluding object, for example a Head-Mounted Display (HMD). One or more first visible part(s) of the first face is (are) determined . . . applying a regressor to the first attributes, the regressor modelling . . .”; ¶54, “a regressor . . . The regression function of the regressor is learnt from . . . a set of second parameters p.sub.i and λ.sub.i being associated with each second image of the training set of second images. . . machine learning . . . as random ferns . . . or random forests, may be used to obtain the desired hashing”; ¶43, “FIG. 5 shows a training method for learning an appearance model from a set of second images of second faces and for learning a regressor . .”; NOTE: The trained machine learning model is the regressor model, which is trained on the training set of second images. The live captured image corresponds to the training set of second images. Le Clerc’s system can distinguish between an upper face region (the part determined to be occluded by an HMD) and an image of a lower face region with lower face regions (the visible parts of the first face, since the occluded part includes the upper portion of the face covered by the HMD). If the upper portion is not visible because the HMD covers the face, then the visible parts will be the lower face region of the face, including the mouth; see Fig. 3, which illustrates a live person’s image.)
to predict facial landmarks for an upper face region that corresponds to the live captured image of the lower face region. (Le Clerc: Fig. 6, ¶55-59, steps 60-64, “FIG. 6 shows a method of generating, e.g. synthesizing, a first face in a first image . . . provides the texture of the reconstructed face . . . as well as the locations of the landmarks defining the geometry of the face. . .”)
¶56, “. . . step 61, a first occluded and a first visible part of the first face are determined . . . input image 60. . . The first occluded part corresponds to the part of the first face in the first image that is occluded by an occluding object, for example a HMD . . . A first visible part of the first face may then be obtained as the part of the first face that is complementary to the first occluded part”; NOTE: Step 61: the upper face region is the occluded part of the face; the visible part is the lower face region, since the upper face is not visible due to the HMD.
¶57, “. . . step 62, first attributes representative of the first visible part are obtained . . . the first visible part is for example subdivided into a determined set of possibly overlapping rectangles (or triangles) . . .”; NOTE: Step 62: information for the lower face region (visible parts) is obtained.
¶58, “. . . step 63, first parameters {p.sub.i} and {λ.sub.i} representative of the first face of the first image are obtained by applying the regressor learnt at step 54 to the vector of first attributes, the output of the regressor being a vector of first parameters {p.sub.i} and {λ.sub.i} describing the first face of the first image 11. . .”; NOTE: Step 63: the regressor learnt at step 54 is applied to the attributes of the visible (lower face) region to obtain parameters describing the entire first face.
¶59, “. . . step 64, the first face is generated based on the first parameters obtained at step 63. Synthesizing the first face corresponds to reconstructing the first face in its entirety, . . . provides the texture of the reconstructed face in the reference “shape-free” geometry as well as the locations of the landmarks defining the geometry of the face.”; NOTE: Step 64: the final image is generated, including the occluded parts, with the HMD removed from the image.
NOTE: Step 61 identifies the upper (occluded) and lower (visible) face regions; step 62 obtains attribute information for the lower face region; step 63 applies the regression model to the information obtained for the lower face region in step 62, and the predicted facial landmarks are among the output parameters of the regression model used for synthesis, including the landmarks defining the geometry of the face in step 64; step 64 synthesizes the complete image without the HMD.
It would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention to combine Grau and Le Clerc and implement a regression model wherein the trained machine learning model is further trained to use, a live captured image of a lower face region with lower face regions from the set of reference images to predict facial landmarks for an upper face region that corresponds to the live captured image of the lower face region.
The reason for doing so is to “enhance the user experience during video-conference via HMD for example or enhance the possibility to recognize a person on a video (for example for security reason in video surveillance application). In addition, the reconstruction of the full face better conveys its expressions and emotions, which are known to be important non-verbal communication cues” (Le Clerc: ¶33).
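Le Clerc’s ¶54, quoted above, states that the regression function may be learnt with machine-learning techniques such as random ferns or random forests. The sketch below illustrates that kind of regressor (lower-face attributes in, upper-face landmark coordinates out) using scikit-learn’s random forest as a stand-in; the feature layout, landmark counts, and random training data are assumptions for illustration only.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # hypothetical training data:
    #   X: attributes from the visible lower face region of each reference image
    #      (here, flattened lower-face landmark coordinates)
    #   Y: upper face landmark coordinates for the same reference images
    rng = np.random.default_rng(0)
    X = rng.random((500, 2 * 20))   # 20 lower-face landmarks per image
    Y = rng.random((500, 2 * 22))   # 22 upper-face landmarks per image

    regressor = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, Y)

    # at run time: attributes from the live captured lower face region
    live_lower = rng.random((1, 2 * 20))
    predicted_upper_landmarks = regressor.predict(live_lower).reshape(-1, 2)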
Regarding claim 6, depending on claim 4,
The combination of Grau and Le Clerc teaches:
The server according to claim 4,
wherein the data of the full face image is three dimensional data generated using extracted upper face regions of the set of reference images that are mapped onto the upper face region in the live captured image of the first user to remove the upper face region occluded by the first apparatus (Grau: Claim 12, “The method of claim 1 comprising: generating a 3D model of at least the person's face; generating an appearance model of the occluded area and comprising a library of appearance images of the person with different poses, facial expressions, or eye gaze directions than other appearance images; registering the location of internal images of the image data with the 3D model to register the internal images with external images from an external camera registered with the 3D model; synthesizing the internal images by finding a closest appearance image from the library and that best matches the internal image; blending the appearance image with a face displayed on a corresponding one of the external images to form a synthesized image of the occluded area; and merging the synthesized image with other parts of the corresponding external image.”; NOTE: The full face is at least the person’s face in 3D. The extracted upper face regions are the occluded area. Grau finds the closest match in the library of reference images (appearance images) that best matches the upper face regions (occluded area); since synthesis is performed on the matched image, the image is inherently extracted. Also, referring to Figs. 1A-1B, Fig. 1A is the live captured image of the user wearing an HMD covering the upper face region, and Fig. 1B is the result of Grau’s method, in which the HMD is removed from the image and the full face is reconstructed by mapping the extracted upper face region data onto the live image so the whole face is visible.).
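For illustration only: mapping extracted upper-face data from the registered 3D model onto the upper face region of the live captured image implies projecting 3D model points into the external camera’s image plane. The pinhole-projection sketch below shows that generic step; the camera intrinsics, pose, and vertex values are assumptions and are not taken from Grau.

    import numpy as np

    def project_points(vertices_3d, K, R, t):
        # project 3D head-model vertices (N, 3) into pixel coordinates (N, 2)
        # using an assumed pinhole camera: intrinsics K, rotation R, translation t
        cam = vertices_3d @ R.T + t      # world -> camera coordinates
        uvw = cam @ K.T                  # camera -> homogeneous pixel coordinates
        return uvw[:, :2] / uvw[:, 2:3]

    # hypothetical camera and a couple of upper-face vertices from the 3D model
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, 0.6])   # head about 0.6 m from the camera
    upper_face_vertices = np.array([[0.00, 0.05, 0.00],
                                    [0.03, 0.06, 0.01]])
    pixels = project_points(upper_face_vertices, K, R, t)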
Regarding claim 12,
Method claim 12 is drawn to the method corresponding to the instructions recited in apparatus claim 5. Therefore, method claim 12 is rejected for the same reasons of obviousness as set forth above.
Regarding claim 13,
Method claim 13 is drawn to the method corresponding to the instructions recited in apparatus claims 5-6. Therefore, method claim 13 is rejected for the same reasons of obviousness as set forth above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PATRICK GALERA whose telephone number is (571)272-5070. The examiner can normally be reached Mon-Fri 0800-1700 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon can be reached at 571-270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PATRICK P GALERA/
Examiner, Art Unit 2617

/KING Y POON/
Supervisory Patent Examiner, Art Unit 2617