Prosecution Insights
Last updated: April 19, 2026

Application No.: 18/239,712
Title: 3D FACIAL RECONSTRUCTION AND VISUALIZATION IN DENTAL TREATMENT PLANNING
Status: Non-Final Office Action under §103 (OA Round 3)
Filed: Aug 29, 2023
Examiner: MA, MICHELLE HAU
Art Unit: 2617
Tech Center: 2600 (Communications)
Assignee: Align Technology, Inc.

Forecast: 81% grant probability (Favorable), rising to 99% with an examiner interview
Projected OA rounds: 3-4
Projected time to grant: 2y 7m
Examiner Intelligence

Career allowance rate: 81% (17 granted / 21 resolved), +19.0% vs Tech Center average (above average)
Interview lift: +36.4% across resolved cases with an interview (a strong lift)
Typical timeline: 2y 7m average prosecution; 35 applications currently pending
Career history: 56 total applications across all art units

Statute-Specific Performance

§101: 3.0% (-37.0% vs TC avg)
§103: 84.2% (+44.2% vs TC avg)
§102: 6.4% (-33.6% vs TC avg)
§112: 5.5% (-34.5% vs TC avg)

Tech Center averages are estimates. Based on career data from 21 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 28, 2026 has been entered.

Response to Amendment

The amendment filed January 28, 2026 has been entered. Claims 1, 4, 6-8, 10-16, 19, 21-25, and 28 remain pending in the application.

Response to Arguments

Applicant's arguments, see pages 10-12 and 14-15 of the Remarks, filed January 28, 2026, with respect to the rejection(s) of claim(s) 1, 4, 6-8, 10-16, 19, 21-25, and 28 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Ben-Hamadou et al. (US 20220175491 A1), Wang et al. (US 20190012804 A1), and Induchoodan et al. (Depth recovery from stereo images).

Applicant's arguments on pages 13-14 regarding the lack of teaching from Wang, filed January 28, 2026, have been fully considered but are not persuasive. The applicant argued that "Wang is a multi-camera system…" and thus fails to teach the limitations: "capturing, using at least one a mobile device with a built-in camera, a plurality of 2-dimensional images of a patient's face by moving the mobile device around the patient's face, wherein the plurality of 2-dimensional images are captured at different times"; "generating one or more camera matrices comprising camera poses for at least some of the plurality of 2-dimensional images, intrinsic camera parameters of the built-in camera, and extrinsic camera parameters of the built-in camera"; and "generating a plurality of depth maps from the plurality of 2-dimensional images, where each depth map of the plurality of depth maps is generated from a 2-dimensional image of the plurality of 2-dimensional images based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices".

In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Ben-Hamadou teaches capturing, using at least one a mobile device with a built-in camera, a plurality of 2-dimensional images of a patient's face by moving the mobile device around the patient's face, wherein the plurality of 2-dimensional images are captured at different times (Paragraph 0103, 0105, 0108 – "the first capture step 110 is performed, for example, by utilizing an image capture device…use of a smartphone equipped with a camera enables this capture step 110 to be performed… a plurality of images is captured. In some variants, at least two captured images are captured from a different angle relative to the patient's face. In some variants, at least one portion of a plurality of images is captured along a circular arc surrounding the patient's face…during the step 110 of capturing at least one image of the patient's face shape, at least one captured image is in two dimensions"; Note: the smartphone, which is a mobile device, captures 2D images along a circular arc around the patient's face. It would be obvious to one of ordinary skill in the art that the images are captured at different times because a single camera capturing multiple images at different angles must be moved to each angle, which implies the images are taken at different times).

Wang teaches generating one or more camera matrices comprising camera poses for at least some of the plurality of 2-dimensional images, intrinsic camera parameters of the built-in camera, and extrinsic camera parameters of the built-in camera (Paragraph 0136 – "Specifically, the 2D projection p of a visible 3D point P∈P.sub.A.sup.i to a virtual camera i is computed as: p=K[R|t]P, where K and [R|t] are the respective intrinsic and extrinsic parameters of said virtual camera. More specifically, the 2D projection p may be computed as: [equation image omitted] where K, R and t are the camera intrinsic (K) and extrinsic (R, t) parameters, respectively, of each virtual camera estimated by SfM"; Note: the camera matrix is shown by the equation and it is comprised of intrinsic and extrinsic parameters, as well as the camera pose (the x, y, z values)).

A combination of Wang and Induchoodan teaches generating a plurality of depth maps from the plurality of 2-dimensional images, where each depth map of the plurality of depth maps is generated from a 2-dimensional image of the plurality of 2-dimensional images (Wang: Paragraph 0115-0117 – "An operation 6.2 comprises generating depth map images corresponding to the stereo pair images, e.g. the left-eye panoramic image and the right-eye panoramic image…An operation 6.3 comprises re-projecting the stereo pair panoramic images to obtain a plurality of second images, each associated with a respective virtual camera. For example, operation 6.3 may correspond with operation 4.3 in FIG. 4. An operation 6.4 comprises re-projecting the stereo pair depth map images to generate a re-projected depth map associated with each second image"; Note: stereo pair images are 2D and are used to create depth maps) based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices (Induchoodan: Paragraph 1 in 2nd Col. of Page 2, Paragraph 1 in 1st Col. of Page 3, Paragraph 4 in 1st Col. of Page 5 – "This matrix has a number of entries called the intrinsic parameters (of the camera)…The matrix D describes the change of the pose of the camera (the world coordinate system) and has 6 independent parameters (3 parameters for R and the other 3 parameters for T) called the extrinsic parameters…A depth map is a two-dimensional array where the x and y distance information corresponds to the rows and columns of the array as in an ordinary image, and the corresponding depth readings (z values) are stored in the array's elements (pixels). By intersecting the optical rays of two matched pixels we reconstruct the corresponding 3-D point i.e., the depth information can be obtained by triangulation of corresponding image points with known stereoscopic camera parameters and disparity"; Note: camera poses and parameters, which are represented by camera matrices, are used to generate depth maps).

While Wang is a multi-camera system, its method of generating camera matrices and depth maps can still be used with Ben-Hamadou since Ben-Hamadou discloses capturing multiple images of the same subject/scene from different angles. Therefore, the limitations are taught by the combination of the references. See the 103 rejection below for more details.

Claim Objections

Claim 19 is objected to because of the following informalities: Claim 19 recites the limitation "the media" in line 1. There is insufficient antecedent basis for this limitation in the claim. Perhaps "the media" should read "the plurality of 2-dimensional images". Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 10, 14-16, 19, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Ben-Hamadou et al. (US 20220175491 A1) in view of Wang et al. (US 20190012804 A1) and Induchoodan et al. (Depth recovery from stereo images), hereinafter Ben-Hamadou, Wang, and Induchoodan, respectively.

Regarding claim 1, Ben-Hamadou teaches a method (Paragraph 0083, 0117 – "method 100 of estimating and viewing a result of a dental treatment plan…the 3D reconstruction of the shape of the patient's face is performed") comprising:

capturing, using a mobile device with a built-in camera, a plurality of 2-dimensional images of a patient's face by moving the mobile device around the patient's face, wherein the plurality of 2-dimensional images are captured at different times (Paragraph 0103, 0105, 0108 – "the first capture step 110 is performed, for example, by utilizing an image capture device…use of a smartphone equipped with a camera enables this capture step 110 to be performed… a plurality of images is captured. In some variants, at least two captured images are captured from a different angle relative to the patient's face. In some variants, at least one portion of a plurality of images is captured along a circular arc surrounding the patient's face…during the step 110 of capturing at least one image of the patient's face shape, at least one captured image is in two dimensions"; Note: the smartphone, which is a mobile device, captures 2D images along a circular arc around the patient's face. It would be obvious to one of ordinary skill in the art that the images are captured at different times because a single camera capturing multiple images at different angles must be moved to each angle, which implies the images are taken at different times);

transforming the plurality of 2-dimensional images into a 3-dimensional representation of the patient's face (Paragraph 0070, 0085-0087, 0108 – "the terms "reconstructing" and "modeling" are equivalent insofar as they designate the transposition of a physical object into a 3D virtual space…a step 105 of reconstructing a patient's face shape in a first 3D virtual space comprising: a first step 110 of capturing, by an RGB-D capture device, at least one image of the patient's face; and a first step 120 of fitting the patient's face shape onto a parametric model of at least one portion of the face based on at least one captured image of the face… during the step 110 of capturing at least one image of the patient's face shape, at least one captured image is in two dimensions. These embodiments make the step 120 of fitting the face shape in three dimensions more complex, but make it possible to use capture devices that are less expensive and more widely available"; Note: the images are transformed into a 3D model of the patient's face. It would have been obvious to one of ordinary skill in the art to combine the embodiments to have the images be 2D because as stated by Ben-Hamadou, it would "make it possible to use capture devices that are less expensive and more widely available" (Paragraph 0108)); and

transmitting the 3-dimensional representation of the patient's face to one or more processing components to be integrated with a 3-dimensional representation of a dentition of the patient generated based on an intra-oral scan to visualize a dental treatment plan for the patient (Paragraph 0120, 0125, 0164-0165, 0169 – "The fitting step 120 is performed, for example, by utilizing an electronic computing circuit, such as a computer or server, configured to compute a face shape in a virtual geometric reference space from detected anatomical landmarks…The reconstruction step 125 can be carried out, for example, by a practitioner utilizing an intra-oral scanning device providing, when used, a 3D model of the dentition of a patient… After these steps of reconstructing the face shape 105 and of reconstructing the dentition 125 have been performed, an assembly step 142 takes place. During this assembly step 142, the two models reconstructed separately are assembled in the reference space. This step can be based on the 3D adjustment of the dentition model onto the face model by taking as reference facial photos in which a portion of the dentition and all of the face are visible…The step 145 of determining at least one dental treatment plan is performed"; Note: the face 3D model and dentition 3D model are assembled/integrated together. The dentition 3D model is generated from an intra-oral scan. It is implied that the 3D face model is transmitted to a processing component because it is acquired in the assembly step, and a processing component is necessary for performing the step).

Ben-Hamadou does not teach determining a plurality of camera poses, where each camera pose of the plurality of camera poses is determined for a 2-dimensional image of the plurality of 2-dimensional images; generating one or more camera matrices comprising camera poses for at least some of the plurality of 2-dimensional images, intrinsic camera parameters of the built-in camera, and extrinsic camera parameters of the built-in camera; generating a plurality of depth maps from the plurality of 2-dimensional images, where each depth map of the plurality of depth maps is generated from a 2-dimensional image of the plurality of 2-dimensional images based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices; fusing depth maps for two or more images together to generate a plurality of stereo depth maps; and generating the 3-dimensional representation of the patient's face based on combining information from the plurality of stereo depth maps.

However, Wang teaches determining a plurality of camera poses, where each camera pose of the plurality of camera poses is determined for a 2-dimensional image of the plurality of 2-dimensional images (Paragraph 0096-0101 – "a plurality of first images 21 which are captured by a plurality of multi-directional image capture apparatuses 10 may be received…the first images 21 may be processed to generate a plurality of stereo-pairs of panoramic images 22. At operation 4.3, the stereo-pairs of panoramic images 22 may be re-projected to generate re-projected second images 23. At operation 4.4, the second images 23 from operation 4.3 may be processed to obtain positions and orientations of virtual cameras…At operation 4.6, positions and orientations of the plurality of multi-directional image capture apparatuses 10 may be determined based on the positions and orientations of the virtual cameras 11 determined at operation 4.4"; Note: the positions and orientations of the image capture apparatus, which are camera poses, are determined for images. The 2-dimensional images were previously taught by Ben-Hamadou in this rejection of claim 1 and could have been used in this process to determine the camera poses for each image); generating one or more camera matrices comprising camera poses for at least some of the plurality of 2-dimensional images, intrinsic camera parameters of the built-in camera, and extrinsic camera parameters of the built-in camera (Paragraph 0136 – "Specifically, the 2D projection p of a visible 3D point P∈P.sub.A.sup.i to a virtual camera i is computed as: p=K[R|t]P where K and [R|t] are the respective intrinsic and extrinsic parameters of said virtual camera. More specifically, the 2D projection p may be computed as: [equation image omitted] where K, R and t are the camera intrinsic (K) and extrinsic (R, t) parameters, respectively, of each virtual camera estimated by SfM"; Note: the camera matrix is shown by the equation and it is comprised of intrinsic and extrinsic parameters, as well as the camera pose (the x, y, z values)); generating a plurality of depth maps from the plurality of 2-dimensional images, where each depth map of the plurality of depth maps is generated from a 2-dimensional image of the plurality of 2-dimensional images (Paragraph 0115-0117 – "An operation 6.2 comprises generating depth map images corresponding to the stereo pair images, e.g. the left-eye panoramic image and the right-eye panoramic image…An operation 6.3 comprises re-projecting the stereo pair panoramic images to obtain a plurality of second images, each associated with a respective virtual camera. For example, operation 6.3 may correspond with operation 4.3 in FIG. 4. An operation 6.4 comprises re-projecting the stereo pair depth map images to generate a re-projected depth map associated with each second image"; Note: stereo pair images are 2D and are used to create depth maps); fusing depth maps for two or more images together to generate a plurality of stereo depth maps (Paragraph 0117 – "An operation 6.4 comprises re-projecting the stereo pair depth map images to generate a re-projected depth map associated with each second image"; Note: there are stereo depth maps); and generating the 3-dimensional representation of the patient's face based on combining information from the plurality of stereo depth maps (Paragraph 0119 – "An operation 6.6 comprises determining a second 3D model based on the plurality of re-projected depth map images"; Note: the images of the patient's face were previously taught by Ben-Hamadou in this rejection of claim 1 and could have been used in this process to generate a 3D representation).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to determine camera poses and matrices because when multiple images are taken of the same subject from different angles, like in Ben-Hamadou, knowing the camera poses and matrices makes image stitching or alignment easier for 3D reconstruction. It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to generate and fuse depth maps for the benefit of realistic 3D reconstruction. The depth maps provide important data for transforming 2D images into 3D. Moreover, one depth map typically corresponds to one image, so if there are multiple images, there would be multiple depth maps that would need to be combined for a full reconstruction. Finally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to generate a 3D representation based on the depth maps. Using depth maps is one of the common methods to generate a 3D model, and the method is beneficial because it provides an efficient way of understanding and processing the spatial aspect of a scene, and it focuses only on the visible portion of the scene.
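[Editor's note: the projection quoted from Wang, p = K[R|t]P, is the standard pinhole camera model, with K holding the intrinsic parameters and [R|t] the extrinsic pose. The following is a minimal illustrative sketch of that computation only; the function name and all numeric values are hypothetical and do not come from the cited references.]

    # Minimal pinhole projection sketch: p = K [R|t] P (hypothetical values).
    import numpy as np

    def project_point(P_world, K, R, t):
        """Project a 3D point into pixel coordinates via p = K [R|t] P."""
        Rt = np.hstack([R, t.reshape(3, 1)])   # 3x4 extrinsic matrix [R|t]
        P_h = np.append(P_world, 1.0)          # homogeneous 3D point (X, Y, Z, 1)
        p = K @ Rt @ P_h                       # projected point, up to scale
        return p[:2] / p[2]                    # divide by depth to get pixels

    K = np.array([[800.0,   0.0, 320.0],       # intrinsics: focal lengths fx, fy
                  [  0.0, 800.0, 240.0],       # and principal point (cx, cy)
                  [  0.0,   0.0,   1.0]])
    R = np.eye(3)                              # pose: identity rotation
    t = np.array([0.0, 0.0, 0.5])              # half a meter along the optical axis

    print(project_point(np.array([0.1, 0.0, 2.0]), K, R, t))  # -> [352. 240.]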
Ben-Hamadou modified by Wang still does not teach that generating a plurality of depth maps is based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices. However, Induchoodan teaches generating a plurality of depth maps based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices (Paragraph 1 in 2nd Col. of Page 2, Paragraph 1 in 1st Col. of Page 3, Paragraph 4 in 1st Col. of Page 5 – "This matrix has a number of entries called the intrinsic parameters (of the camera)…The matrix D describes the change of the pose of the camera (the world coordinate system) and has 6 independent parameters (3 parameters for R and the other 3 parameters for T) called the extrinsic parameters…A depth map is a two-dimensional array where the x and y distance information corresponds to the rows and columns of the array as in an ordinary image, and the corresponding depth readings (z values) are stored in the array's elements (pixels). By intersecting the optical rays of two matched pixels we reconstruct the corresponding 3-D point i.e., the depth information can be obtained by triangulation of corresponding image points with known stereoscopic camera parameters and disparity"; Note: camera poses and parameters, which are represented by camera matrices, are used to generate depth maps).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Induchoodan to generate depth maps based on camera poses and matrices because the pose and matrix information assists in estimating depth. Specifically, camera poses help determine object distances, and camera matrices help compute the shifts between images at different angles. Thus, these data are commonly used to generate depth maps.

Regarding claim 4, Ben-Hamadou in view of Wang and Induchoodan teaches the method of claim 1. Ben-Hamadou further teaches wherein the plurality of 2-dimensional images comprise a video of the patient's face (Paragraph 0103, 0106, 0108 – "the first capture step 110 is performed, for example, by utilizing an image capture device. This image capture device is, for example, a camera or a video capture camera… In some variants, the capture device only captures videos. In some variants, the capture device captures a combination of photographs and videos… during the step 110 of capturing at least one image of the patient's face shape, at least one captured image is in two dimensions"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to take a video of the patient's face because a video may be more convenient in providing a smooth capture of multiple angles compared to taking individual photographs.
Regarding claim 10, Ben-Hamadou teaches a method of generating a 3-dimensional representation of a patient's face for dental treatment planning (Paragraph 0083, 0117 – "method 100 of estimating and viewing a result of a dental treatment plan…the 3D reconstruction of the shape of the patient's face is performed"), the method comprising:

receiving, at a processing component communicatively coupled to a mobile device with a camera (Paragraph 0103, 0109 – "the first capture step 110 is performed, for example, by utilizing an image capture device…use of a smartphone equipped with a camera enables this capture step 110 to be performed… The first detection step 115 is performed, for example, by utilizing an electronic computing circuit, such as a computer or server, configured to detect at least one anatomical landmark from at least one captured image"; Note: the smartphone is a mobile device, and it is implied to be coupled to the computing circuit, which is equivalent to the processing component, since they communicate with each other to send/receive the images), a 3-dimensional representation of the patient's face generated (Paragraph 0070, 0085-0087, 0108 – "the terms "reconstructing" and "modeling" are equivalent insofar as they designate the transposition of a physical object into a 3D virtual space…a step 105 of reconstructing a patient's face shape in a first 3D virtual space comprising: a first step 110 of capturing, by an RGB-D capture device, at least one image of the patient's face; and a first step 120 of fitting the patient's face shape onto a parametric model of at least one portion of the face based on at least one captured image of the face"; Note: a 3D model of the patient's face is generated) based on a plurality of 2-dimensional images of the patient's face captured from multiple angles by the mobile device at different times (Paragraph 0103, 0105, 0108 – "the first capture step 110 is performed, for example, by utilizing an image capture device…use of a smartphone equipped with a camera enables this capture step 110 to be performed… a plurality of images is captured. In some variants, at least two captured images are captured from a different angle relative to the patient's face. In some variants, at least one portion of a plurality of images is captured along a circular arc surrounding the patient's face…during the step 110 of capturing at least one image of the patient's face shape, at least one captured image is in two dimensions"; Note: the smartphone, which is a mobile device, captures 2D images along a circular arc around the patient's face. It would be obvious to one of ordinary skill in the art that the images are captured at different times because a single camera capturing multiple images at different angles must be moved to each angle, which implies the images are taken at different times); and

integrating, at the processing component, the 3-dimensional representation of the patient's face with a 3-dimensional representation of a dentition of the patient generated based on an intra-oral scan to visualize a dental treatment plan for the patient (Paragraph 0120, 0125, 0164-0165, 0169 – "The fitting step 120 is performed, for example, by utilizing an electronic computing circuit, such as a computer or server, configured to compute a face shape in a virtual geometric reference space from detected anatomical landmarks…The reconstruction step 125 can be carried out, for example, by a practitioner utilizing an intra-oral scanning device providing, when used, a 3D model of the dentition of a patient… After these steps of reconstructing the face shape 105 and of reconstructing the dentition 125 have been performed, an assembly step 142 takes place. During this assembly step 142, the two models reconstructed separately are assembled in the reference space. This step can be based on the 3D adjustment of the dentition model onto the face model by taking as reference facial photos in which a portion of the dentition and all of the face are visible…The step 145 of determining at least one dental treatment plan is performed"; Note: the face 3D model and dentition 3D model are assembled/integrated together. The dentition 3D model is generated from an intra-oral scan. It is implied that the 3D face model is integrated at a processing component because a processing component is necessary for performing the step).

Ben-Hamadou does not teach determining a plurality of camera poses, where each camera pose of the plurality of camera poses is determined for a 2-dimensional image of the plurality of 2-dimensional images; generating one or more camera matrices comprising camera poses for at least some of the plurality of 2-dimensional images, intrinsic camera parameters of the built-in camera, and extrinsic camera parameters of the built-in camera; generating a plurality of depth maps from the plurality of 2-dimensional images, where each depth map of the plurality of depth maps is generated from a 2-dimensional image of the plurality of 2-dimensional images based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices; fusing depth maps for two or more images together to generate a plurality of stereo depth maps; and generating the 3-dimensional representation of the patient's face based on combining information from the plurality of stereo depth maps.

However, Wang teaches determining a plurality of camera poses, where each camera pose of the plurality of camera poses is determined for a 2-dimensional image of the plurality of 2-dimensional images (Paragraph 0096-0101 – "a plurality of first images 21 which are captured by a plurality of multi-directional image capture apparatuses 10 may be received…the first images 21 may be processed to generate a plurality of stereo-pairs of panoramic images 22. At operation 4.3, the stereo-pairs of panoramic images 22 may be re-projected to generate re-projected second images 23. At operation 4.4, the second images 23 from operation 4.3 may be processed to obtain positions and orientations of virtual cameras…At operation 4.6, positions and orientations of the plurality of multi-directional image capture apparatuses 10 may be determined based on the positions and orientations of the virtual cameras 11 determined at operation 4.4"; Note: the positions and orientations of the image capture apparatus, which are camera poses, are determined for images. The 2-dimensional images were previously taught by Ben-Hamadou in this rejection of claim 1 and could have been used in this process to determine the camera poses for each image); generating one or more camera matrices comprising camera poses for at least some of the plurality of 2-dimensional images, intrinsic camera parameters of the built-in camera, and extrinsic camera parameters of the built-in camera (Paragraph 0136 – "Specifically, the 2D projection p of a visible 3D point P∈P.sub.A.sup.i to a virtual camera i is computed as: p=K[R|t]P where K and [R|t] are the respective intrinsic and extrinsic parameters of said virtual camera. More specifically, the 2D projection p may be computed as: [equation image omitted] where K, R and t are the camera intrinsic (K) and extrinsic (R, t) parameters, respectively, of each virtual camera estimated by SfM"; Note: the camera matrix is shown by the equation and it is comprised of intrinsic and extrinsic parameters, as well as the camera pose (the x, y, z values)); generating a plurality of depth maps from the plurality of 2-dimensional images, where each depth map of the plurality of depth maps is generated from a 2-dimensional image of the plurality of 2-dimensional images (Paragraph 0115-0117 – "An operation 6.2 comprises generating depth map images corresponding to the stereo pair images, e.g. the left-eye panoramic image and the right-eye panoramic image…An operation 6.3 comprises re-projecting the stereo pair panoramic images to obtain a plurality of second images, each associated with a respective virtual camera. For example, operation 6.3 may correspond with operation 4.3 in FIG. 4. An operation 6.4 comprises re-projecting the stereo pair depth map images to generate a re-projected depth map associated with each second image"; Note: stereo pair images are 2D and are used to create depth maps); fusing depth maps for two or more images together to generate a plurality of stereo depth maps (Paragraph 0117 – "An operation 6.4 comprises re-projecting the stereo pair depth map images to generate a re-projected depth map associated with each second image"; Note: there are stereo depth maps); and generating the 3-dimensional representation of the patient's face based on combining information from the plurality of stereo depth maps (Paragraph 0119 – "An operation 6.6 comprises determining a second 3D model based on the plurality of re-projected depth map images"; Note: the images of the patient's face were previously taught by Ben-Hamadou in this rejection of claim 1 and could have been used in this process to generate a 3D representation).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to determine camera poses and matrices because when multiple images are taken of the same subject from different angles, like in Ben-Hamadou, knowing the camera poses and matrices makes image stitching or alignment easier for 3D reconstruction. It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to generate and fuse depth maps for the benefit of realistic 3D reconstruction. The depth maps provide important data for transforming 2D images into 3D. Moreover, one depth map typically corresponds to one image, so if there are multiple images, there would be multiple depth maps that would need to be combined for a full reconstruction. Finally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to generate a 3D representation based on the depth maps. Using depth maps is one of the common methods to generate a 3D model, and the method is beneficial because it provides an efficient way of understanding and processing the spatial aspect of a scene, and it focuses only on the visible portion of the scene.

Ben-Hamadou modified by Wang still does not teach that generating a plurality of depth maps is based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices. However, Induchoodan teaches generating a plurality of depth maps based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices (Paragraph 1 in 2nd Col. of Page 2, Paragraph 1 in 1st Col. of Page 3, Paragraph 4 in 1st Col. of Page 5 – "This matrix has a number of entries called the intrinsic parameters (of the camera)…The matrix D describes the change of the pose of the camera (the world coordinate system) and has 6 independent parameters (3 parameters for R and the other 3 parameters for T) called the extrinsic parameters…A depth map is a two-dimensional array where the x and y distance information corresponds to the rows and columns of the array as in an ordinary image, and the corresponding depth readings (z values) are stored in the array's elements (pixels). By intersecting the optical rays of two matched pixels we reconstruct the corresponding 3-D point i.e., the depth information can be obtained by triangulation of corresponding image points with known stereoscopic camera parameters and disparity"; Note: camera poses and parameters, which are represented by camera matrices, are used to generate depth maps). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Induchoodan to generate depth maps based on camera poses and matrices because the pose and matrix information assists in estimating depth. Specifically, camera poses help determine object distances, and camera matrices help compute the shifts between images at different angles. Thus, these data are commonly used to generate depth maps.
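[Editor's note: the fusion step argued above (per-image depth maps combined into one 3-dimensional representation) is commonly implemented by back-projecting each depth map through its camera matrix into a shared world frame. The sketch below illustrates that general idea only; the backproject helper, the naive concatenation, and all values are hypothetical and are not the method of any cited reference.]

    # Back-project per-image depth maps into one world-frame point cloud (toy fusion).
    import numpy as np

    def backproject(depth, K, R, t):
        """Lift a depth map to world-space 3D points using intrinsics K and pose (R, t)."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3xN pixels
        rays = np.linalg.inv(K) @ pix                 # camera-frame viewing rays
        pts_cam = rays * depth.reshape(1, -1)         # scale each ray by its depth
        return (R.T @ (pts_cam - t.reshape(3, 1))).T  # camera frame -> world frame

    K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1.0]])
    depth_a = np.full((480, 640), 2.0)                # two toy depth maps
    depth_b = np.full((480, 640), 2.1)
    cloud = np.vstack([
        backproject(depth_a, K, np.eye(3), np.zeros(3)),
        backproject(depth_b, K, np.eye(3), np.array([0.1, 0, 0])),
    ])                                                # naive fusion: concatenate points
    print(cloud.shape)                                # (614400, 3)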
Regarding claim 14, Ben-Hamadou teaches a system (Paragraph 0103, 0109, 0184 – "the first capture step 110 is performed, for example, by utilizing an image capture device…The first detection step 115 is performed, for example, by utilizing an electronic computing circuit, such as a computer or server, …This reduction step 295 is performed, for example, by utilizing an electronic computing circuit executing a computation and image processing computer program"; Note: all the devices and circuits used in the method make up the system) comprising:

a mobile device comprising a camera; and a processing component communicatively coupled to the mobile device (Paragraph 0103, 0109 – "the first capture step 110 is performed, for example, by utilizing an image capture device…use of a smartphone equipped with a camera enables this capture step 110 to be performed… The first detection step 115 is performed, for example, by utilizing an electronic computing circuit, such as a computer or server, configured to detect at least one anatomical landmark from at least one captured image"; Note: the smartphone is a mobile device, and it is implied to be coupled to the computing circuit, which is equivalent to the processing component, since they communicate with each other to send/receive the images),

wherein the mobile device is configured to capture a plurality of 2-dimensional images of a patient's face by moving the mobile device around the patient's face, wherein the plurality of 2-dimensional images are captured at different times (Paragraph 0103, 0105, 0108 – "the first capture step 110 is performed, for example, by utilizing an image capture device…use of a smartphone equipped with a camera enables this capture step 110 to be performed… a plurality of images is captured. In some variants, at least two captured images are captured from a different angle relative to the patient's face. In some variants, at least one portion of a plurality of images is captured along a circular arc surrounding the patient's face…during the step 110 of capturing at least one image of the patient's face shape, at least one captured image is in two dimensions"; Note: the smartphone, which is a mobile device, captures 2D images along a circular arc around the patient's face. It would be obvious to one of ordinary skill in the art that the images are captured at different times because a single camera capturing multiple images at different angles must be moved to each angle, which implies the images are taken at different times); transform the plurality of 2-dimensional images into a 3-dimensional representation of the patient's face (Paragraph 0070, 0085-0087, 0108 – "the terms "reconstructing" and "modeling" are equivalent insofar as they designate the transposition of a physical object into a 3D virtual space…a step 105 of reconstructing a patient's face shape in a first 3D virtual space comprising: a first step 110 of capturing, by an RGB-D capture device, at least one image of the patient's face; and a first step 120 of fitting the patient's face shape onto a parametric model of at least one portion of the face based on at least one captured image of the face… during the step 110 of capturing at least one image of the patient's face shape, at least one captured image is in two dimensions. These embodiments make the step 120 of fitting the face shape in three dimensions more complex, but make it possible to use capture devices that are less expensive and more widely available"; Note: the images are transformed into a 3D model of the patient's face. It would have been obvious to one of ordinary skill in the art to combine the embodiments to have the images be 2D because as stated by Ben-Hamadou, it would "make it possible to use capture devices that are less expensive and more widely available" (Paragraph 0108)) and transmit the 3-dimensional representation of the patient's face to the processing component; and

the processing component is configured to integrate the 3-dimensional representation of the patient's face with a 3-dimensional representation of a dentition of the patient generated based on an intra-oral scan to visualize a dental treatment plan for the patient (Paragraph 0120, 0125, 0164-0165, 0169 – "The fitting step 120 is performed, for example, by utilizing an electronic computing circuit, such as a computer or server, configured to compute a face shape in a virtual geometric reference space from detected anatomical landmarks…The reconstruction step 125 can be carried out, for example, by a practitioner utilizing an intra-oral scanning device providing, when used, a 3D model of the dentition of a patient… After these steps of reconstructing the face shape 105 and of reconstructing the dentition 125 have been performed, an assembly step 142 takes place. During this assembly step 142, the two models reconstructed separately are assembled in the reference space. This step can be based on the 3D adjustment of the dentition model onto the face model by taking as reference facial photos in which a portion of the dentition and all of the face are visible…The step 145 of determining at least one dental treatment plan is performed"; Note: the face 3D model and dentition 3D model are assembled/integrated together. The dentition 3D model is generated from an intra-oral scan. It is implied that the 3D face model is transmitted to a processing component because it is acquired in the assembly step, and a processing component is necessary for performing the step).

Ben-Hamadou does not teach determining a plurality of camera poses, where each camera pose of the plurality of camera poses is determined for a 2-dimensional image of the plurality of 2-dimensional images; generating one or more camera matrices comprising camera poses for at least some of the plurality of 2-dimensional images, intrinsic camera parameters of the built-in camera, and extrinsic camera parameters of the built-in camera; generating a plurality of depth maps from the plurality of 2-dimensional images, where each depth map of the plurality of depth maps is generated from a 2-dimensional image of the plurality of 2-dimensional images based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices; fusing depth maps for two or more images together to generate a plurality of stereo depth maps; and generating the 3-dimensional representation of the patient's face based on combining information from the plurality of stereo depth maps.

However, Wang teaches determining a plurality of camera poses, where each camera pose of the plurality of camera poses is determined for a 2-dimensional image of the plurality of 2-dimensional images (Paragraph 0096-0101 – "a plurality of first images 21 which are captured by a plurality of multi-directional image capture apparatuses 10 may be received…the first images 21 may be processed to generate a plurality of stereo-pairs of panoramic images 22. At operation 4.3, the stereo-pairs of panoramic images 22 may be re-projected to generate re-projected second images 23. At operation 4.4, the second images 23 from operation 4.3 may be processed to obtain positions and orientations of virtual cameras…At operation 4.6, positions and orientations of the plurality of multi-directional image capture apparatuses 10 may be determined based on the positions and orientations of the virtual cameras 11 determined at operation 4.4"; Note: the positions and orientations of the image capture apparatus, which are camera poses, are determined for images. The 2-dimensional images were previously taught by Ben-Hamadou in this rejection of claim 1 and could have been used in this process to determine the camera poses for each image); generating one or more camera matrices comprising camera poses for at least some of the plurality of 2-dimensional images, intrinsic camera parameters of the built-in camera, and extrinsic camera parameters of the built-in camera (Paragraph 0136 – "Specifically, the 2D projection p of a visible 3D point P∈P.sub.A.sup.i to a virtual camera i is computed as: p=K[R|t]P where K and [R|t] are the respective intrinsic and extrinsic parameters of said virtual camera. More specifically, the 2D projection p may be computed as: [equation image omitted] where K, R and t are the camera intrinsic (K) and extrinsic (R, t) parameters, respectively, of each virtual camera estimated by SfM"; Note: the camera matrix is shown by the equation and it is comprised of intrinsic and extrinsic parameters, as well as the camera pose (the x, y, z values)); generating a plurality of depth maps from the plurality of 2-dimensional images, where each depth map of the plurality of depth maps is generated from a 2-dimensional image of the plurality of 2-dimensional images (Paragraph 0115-0117 – "An operation 6.2 comprises generating depth map images corresponding to the stereo pair images, e.g. the left-eye panoramic image and the right-eye panoramic image…An operation 6.3 comprises re-projecting the stereo pair panoramic images to obtain a plurality of second images, each associated with a respective virtual camera. For example, operation 6.3 may correspond with operation 4.3 in FIG. 4. An operation 6.4 comprises re-projecting the stereo pair depth map images to generate a re-projected depth map associated with each second image"; Note: stereo pair images are 2D and are used to create depth maps); fusing depth maps for two or more images together to generate a plurality of stereo depth maps (Paragraph 0117 – "An operation 6.4 comprises re-projecting the stereo pair depth map images to generate a re-projected depth map associated with each second image"; Note: there are stereo depth maps); and generating the 3-dimensional representation of the patient's face based on combining information from the plurality of stereo depth maps (Paragraph 0119 – "An operation 6.6 comprises determining a second 3D model based on the plurality of re-projected depth map images"; Note: the images of the patient's face were previously taught by Ben-Hamadou in this rejection of claim 1 and could have been used in this process to generate a 3D representation).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to determine camera poses and matrices because when multiple images are taken of the same subject from different angles, like in Ben-Hamadou, knowing the camera poses and matrices makes image stitching or alignment easier for 3D reconstruction. It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to generate and fuse depth maps for the benefit of realistic 3D reconstruction. The depth maps provide important data for transforming 2D images into 3D. Moreover, one depth map typically corresponds to one image, so if there are multiple images, there would be multiple depth maps that would need to be combined for a full reconstruction. Finally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Wang to generate a 3D representation based on the depth maps. Using depth maps is one of the common methods to generate a 3D model, and the method is beneficial because it provides an efficient way of understanding and processing the spatial aspect of a scene, and it focuses only on the visible portion of the scene.

Ben-Hamadou modified by Wang still does not teach that generating a plurality of depth maps is based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices. However, Induchoodan teaches generating a plurality of depth maps based at least in part on the camera pose for the 2-dimensional image and the one or more camera matrices (Paragraph 1 in 2nd Col. of Page 2, Paragraph 1 in 1st Col. of Page 3, Paragraph 4 in 1st Col. of Page 5 – "This matrix has a number of entries called the intrinsic parameters (of the camera)…The matrix D describes the change of the pose of the camera (the world coordinate system) and has 6 independent parameters (3 parameters for R and the other 3 parameters for T) called the extrinsic parameters…A depth map is a two-dimensional array where the x and y distance information corresponds to the rows and columns of the array as in an ordinary image, and the corresponding depth readings (z values) are stored in the array's elements (pixels). By intersecting the optical rays of two matched pixels we reconstruct the corresponding 3-D point i.e., the depth information can be obtained by triangulation of corresponding image points with known stereoscopic camera parameters and disparity"; Note: camera poses and parameters, which are represented by camera matrices, are used to generate depth maps). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Induchoodan to generate depth maps based on camera poses and matrices because the pose and matrix information assists in estimating depth. Specifically, camera poses help determine object distances, and camera matrices help compute the shifts between images at different angles. Thus, these data are commonly used to generate depth maps.

Regarding claim 15, Ben-Hamadou in view of Wang and Induchoodan teaches the system of claim 14. Ben-Hamadou further teaches wherein the processing component is further configured to receive the 3-dimensional representation of the patient's dentition from a dental scanner (Paragraph 0125, 0128-0129 – "The reconstruction step 125 can be carried out, for example, by a practitioner utilizing an intra-oral scanning device providing, when used, a 3D model of the dentition of a patient. This intra-oral scanning device then successively carries out: the second step 130 of capturing at least one image of an object representative of the dentition of the patient and… the second step 140 of modeling the 3D shape of at least one tooth based on the detected position of at least one set of points. The term "intra-oral scanning device" means both the scanning device and the electronic computing device connected to it that provides a model of the shape of at least one portion of the dentition of the patient"; Note: the dental/intra-oral scanner generates and provides a 3D model of the dentition. It is implied to be received by the electronic computing device, which is a processing component, since the electronic computing device is connected to the overall scanning device and is required to perform further processing).

Regarding claim 16, Ben-Hamadou in view of Wang and Induchoodan teaches the system of claim 14. Ben-Hamadou further teaches wherein the processing component is further configured to integrate with one or more dental treatment planning applications to use a visualization of the dental treatment plan (Paragraph 0169, 0174, 0177, 0180 – "The step 145 of determining at least one dental treatment plan is performed, for example, by utilizing an electronic computing device configured to determine a possible treatment plan for at least one tooth… the treatment plan must improve or respect the patient's aesthetic criteria…The optional selection step 150 consists of selecting, via a human-machine interface, a treatment plan from among at least one treatment plan determined during the determination step 145. This selection step 150 can consist, for example, in clicking on a button of a digital interface representative of a treatment plan, the click triggering the selection of said treatment plan… The step 155 of computing an image is performed, for example, in a similar way to the first fitting step 120 and based on the treatment plan selected"; Note: the computing device, which is the processing component, incorporates software that allows for determining and visualizing a treatment plan. Since the software has a user interface, it is equivalent to the treatment planning application).

Regarding claim 19, Ben-Hamadou in view of Wang and Induchoodan teaches the system of claim 14. Ben-Hamadou further teaches wherein the media includes a video of the patient's face (Paragraph 0103, 0106, 0108 – "the first capture step 110 is performed, for example, by utilizing an image capture device. This image capture device is, for example, a camera or a video capture camera… In some variants, the capture device only captures videos. In some variants, the capture device captures a combination of photographs and videos… during the step 110 of capturing at least one image of the patient's face shape, at least one captured image is in two dimensions"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to take a video of the patient's face because a video may be more convenient in providing a smooth capture of multiple angles compared to taking individual photographs.

Regarding claim 21, Ben-Hamadou in view of Wang and Induchoodan teaches the system of claim 19. Ben-Hamadou further teaches wherein a result of integrating the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition allows an operator of the mobile device to visualize movement of facial tissues, lips, and facial expressions once the dental treatment plan is applied to the patient's teeth (Paragraph 0180 – "The step 155 of computing an image is performed, for example, in a similar way to the first fitting step 120 and based on the treatment plan selected and of the impact of this treatment plan on the parameters of the model of the patient's face shape. For example, the rotation [of] a tooth can result in the movement of a lip or a cheek"; Note: the image after applying the treatment plan to the teeth visualizes movement of lips, facial tissue (cheeks), and expressions (lips and cheeks constitute expression)).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Ben-Hamadou in view of Wang, Induchoodan, and Peng et al. (US 11257299 B2), hereinafter Peng.

Regarding claim 6, Ben-Hamadou in view of Wang and Induchoodan teaches the method of claim 1. Ben-Hamadou does not teach wherein the 3-dimensional representation of the dentition is used as a reference for scaling the 3-dimensional representation of the patient's face. On the other hand, Peng teaches wherein the 3-dimensional representation of the patient's face is used as a reference for scaling the 3-dimensional representation of the dentition (Col. 10 lines 6-14 – "According to the oral cavity position parameter and the expression parameter of the three-dimensional face model, the position and the scale of the sample oral cavity model are configured and the oral cavity shape is adjusted respectively, so that the sample oral cavity model can keep consistent with the three-dimensional face model, the problem of synthesis distortion of the sample oral cavity model and the three-dimensional face model is resolved, and further the synthesis effect of the generated three-dimensional face expression model is good"; Note: the 3D face model is used as a reference to scale the 3D model of the oral cavity model, which is the dentition).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Peng to scale the patient’s face relative to the dentition, because if the size of the dentition is preferred to the size of the face, then the face should be scaled to meet the size of the dentition. When combining the patient’s face and dentition, there is a finite number of ways to scale them; either the face is scaled relative to the dentition or the dentition is scaled relative to the face. One of ordinary skill in the art could have scaled the patient’s face relative to the dentition with a reasonable expectation of success and would have done so in the case when the shape and size of the patient’s dentition is ideal or preferred in comparison to the shape and size of the patient’s face. Therefore, it would have been obvious to try the solution of using the patient’s dentition as a reference to scale the patient’s face. It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Peng to scale one of the models to the other. Since Ben-Hamadou places the models in the same space (Paragraph 0164 – “After these steps of reconstructing the face shape 105 and of reconstructing the dentition 125 have been performed, an assembly step 142 takes place. During this assembly step 142, the two models reconstructed separately are assembled in the reference space”), scaling one of the models would be beneficial for ensuring that the sizes of the models are accurate in reference to each other, which makes it easier to visualize the face and teeth for treatment. Claims 7-8 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Ben-Hamadou in view of Wang, Induchoodan, and Hauer et al. (CN 112789008 A), hereinafter Hauer. Regarding claim 7, Ben-Hamadou in view of Wang and Induchoodan teaches the method of claim 1. Ben-Hamadou does not teach wherein the plurality of 2-dimensional images are captured via an application installed on the mobile device having access to a built-in camera. However, Hauer teaches wherein the plurality of 2-dimensional images are captured via an application installed on the mobile device having access to a built-in camera (Paragraph 0023 – “The figure illustrates a view of an application of a smart device 20 within the camera function of said smart device, by means of which the user is guided to produce the production of an optimal image of the face”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Hauer to capture images via an application that provides access to a camera because “by guiding the user through the production of facial images and/or films with the help of the application of smart devices”, the user is provided with “information on how to produce a suitable photograph or film” (Hauer: Paragraph 0010). Regarding claim 8, Ben-Hamadou in view of Wang, Induchoodan, and Hauer teaches the method of claim 7. Ben-Hamadou does not teach wherein the application provides real-time guidance for moving the mobile device around the patient's face in order to optimize the capture of the plurality of 2-dimensional images. 
However, Hauer teaches wherein the application provides real-time guidance for moving the mobile device around the patient's face in order to optimize the capture of the plurality of 2-dimensional images (Paragraph 0023 – “FIG. 2 illustrates a scene within the application, where location points 13 are displayed on the user interface 10 of the smart device 20 along the center line 12, with the help of the smart device 20, the user is prompted to capture images at various locations by moving the smart device 20. In order to generate image data for producing a complex three-dimensional model, the user may be further prompted by arrows or the like to capture images from positions above and below the center line 12”; Note: the location points, prompts, and arrows are real-time guidance for moving the device). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Hauer to have an application that provides guidance for moving the device because “by guiding the user through the production of facial images and/or films with the help of the application of smart devices”, the user is provided with “information on how to produce a suitable photograph or film. This ensures that the photograph or film is in such a condition that it can be easily processed by software used for computer-aided design of restorations or treatment planning” (Hauer: Paragraph 0010).

Regarding claim 22, Ben-Hamadou in view of Wang, Induchoodan, and Hauer teaches the system of claim 14. Ben-Hamadou does not teach wherein a mobile application on the mobile device is configured to provide real-time guidance for moving the mobile device around the patient's face in order to optimize capture of the plurality of 2-dimensional images. However, Hauer teaches wherein a mobile application on the mobile device is configured to provide real-time guidance for moving the mobile device around the patient's face in order to optimize capture of the plurality of 2-dimensional images (Paragraph 0023 – “FIG. 2 illustrates a scene within the application, where location points 13 are displayed on the user interface 10 of the smart device 20 along the center line 12, with the help of the smart device 20, the user is prompted to capture images at various locations by moving the smart device 20. In order to generate image data for producing a complex three-dimensional model, the user may be further prompted by arrows or the like to capture images from positions above and below the center line 12”; Note: the location points, prompts, and arrows are real-time guidance for moving the device). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Hauer to have an application that provides guidance for moving the device because “by guiding the user through the production of facial images and/or films with the help of the application of smart devices”, the user is provided with “information on how to produce a suitable photograph or film. This ensures that the photograph or film is in such a condition that it can be easily processed by software used for computer-aided design of restorations or treatment planning” (Hauer: Paragraph 0010).
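Hauer's location points and arrows imply a loop that compares the phone's current pose against a set of target capture positions around the face. A rough Python sketch of that guidance loop follows; the arc width, view count, tolerance, and function names are all assumptions rather than anything Hauer discloses:

```python
import numpy as np

def capture_waypoints(n_views: int = 7, arc_deg: float = 120.0) -> np.ndarray:
    """Evenly spaced yaw targets (radians) for sweeping a phone around a face."""
    half = np.radians(arc_deg) / 2.0
    return np.linspace(-half, half, n_views)

def guidance_prompt(current_yaw: float, target_yaw: float, tol_deg: float = 5.0) -> str:
    """Turn the pose error into the kind of real-time cue Hauer describes."""
    err_deg = np.degrees(target_yaw - current_yaw)
    if abs(err_deg) <= tol_deg:
        return "Hold still, capturing"
    return "Move right" if err_deg > 0 else "Move left"
```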
Claims 11-12 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Ben-Hamadou in view of Wang, Induchoodan, Peng, and Mednikov et al. (US 20200000552 A1), hereinafter Mednikov.

Regarding claim 11, Ben-Hamadou in view of Wang and Induchoodan teaches the method of claim 10. Ben-Hamadou does not teach wherein integrating the 3-dimensional representation of the patient's face with the 3-dimensional representation of the dentition comprises: identifying an inner mouth region of the patient in the 3-dimensional representation of the patient's face; removing the inner mouth region from the 3-dimensional representation of the patient's face; determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's face to the 3-dimensional representation of the patient's intra-oral scan; replacing the inner mouth region removed from the 3-dimensional representation with the 3-dimensional representation of the patient's dentition; and aligning the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition based on the scaled-rigid relative transform.

However, Mednikov teaches identifying an inner mouth region of the patient in the patient's face (Paragraph 0207 – “FIG. 13D shows a process of identifying a patient's lip contours and mouth opening. At panel 4710 an image of a patient's face, and imparticular the mouth and the region near the mouth, is shown. The lip contours and mouth opening are determined based on such an image”); and removing the inner mouth region from the patient's face (Paragraph 0235 – “the portion of the 2D image within the mouth opening 1010 is deleted or otherwise removed from the 2D image 600 of the patient. In some embodiments, the 3D bite model 1040, or a 2D projection of the 3D bite model 1040 is placed or rendered behind the 2D image 600 such that the 3D bite model 1040, or at least a portion of the 3D bite model 1040, is visible through the mouth opening of the 2D image of the patient”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Mednikov to identify an inner mouth region and remove it from the patient’s face for the benefit of being able to replace the mouth region of the person’s face with a model of their teeth and accurately match them (Mednikov: Paragraph 0198). This assists in providing a “facial visualization of the 3D model of the shape and position of the teeth undergoing the treatment plan” (Mednikov: Paragraph 0278).

Ben-Hamadou modified by Mednikov still does not teach determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's face to the 3-dimensional representation of the patient's intra-oral scan; replacing the inner mouth region removed from the 3-dimensional representation with the 3-dimensional representation of the patient's dentition; and aligning the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition based on the scaled-rigid relative transform. However, Peng teaches determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's dentition to the 3-dimensional representation of the patient's face (Col. 6 lines 43-66 – “the electronic device obtains an rotation angle and a translation of the sample oral cavity model relative to the three-dimensional face model…The rotation angle refers to a rotation angle and a rotation direction of the sample oral cavity model relative to the three-dimensional face model in a space (a three-dimensional coordinate system)…The translation refers to a relative distance between the sample oral cavity model and the three-dimensional face model on a plane (such as projected onto an x or y plane)…According to the oral cavity position parameter and the expression parameter of the three-dimensional face model, the position and the scale of the sample oral cavity model are configured”; Note: rigid transformations, rotation and translation, are performed followed by scaling, which is equivalent to a scaled-rigid relative transform); replacing the inner mouth region removed from the 3-dimensional representation with the 3-dimensional representation of the patient's intra-oral scan (Fig. 5 – The mouth region of the 3D face is replaced with a 3D oral cavity model; see screenshot of Fig. 5 below); and aligning the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition based on the scaled-rigid relative transform (Col. 6 lines 43-54 – “the electronic device…obtains target coordinate information of the sample oral cavity model in the three-dimensional face model based on the rotation angle, the translation, the current coordinate information, coordinate information of the three-dimensional face model, and coordinate information of an average face model; and moves the sample oral cavity model to a target position indicated by the target coordinate information”; Note: the oral cavity model is aligned with the face model using coordinates based on the scaled-rigid relative transform, which is the rotation and translation, of the oral cavity model).

[Screenshot of Fig. 5 (taken from Peng)]

Peng does not directly teach determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's face to the 3-dimensional representation of the patient's dentition. However, Peng teaches determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's dentition to the 3-dimensional representation of the patient's face (Col. 6 lines 43-66 – “the electronic device obtains an rotation angle and a translation of the sample oral cavity model relative to the three-dimensional face model…The rotation angle refers to a rotation angle and a rotation direction of the sample oral cavity model relative to the three-dimensional face model in a space (a three-dimensional coordinate system)…The translation refers to a relative distance between the sample oral cavity model and the three-dimensional face model on a plane (such as projected onto an x or y plane)…According to the oral cavity position parameter and the expression parameter of the three-dimensional face model, the position and the scale of the sample oral cavity model are configured”; Note: rigid transformations, rotation and translation, are performed followed by scaling, which is equivalent to a scaled-rigid relative transform).
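In computer-vision terms, the “scaled-rigid relative transform” recited here is a similarity transform: a rotation R and a translation t plus a uniform scale s, which are exactly the three operations Peng's quoted passages walk through. Neither the claims nor Peng, as quoted, names a particular solver; as one standard way to recover such a transform from matched 3D landmarks, Umeyama's closed-form method can serve as an illustrative sketch (the landmark inputs and the pipeline comments below are assumptions, not material from any cited reference):

```python
import numpy as np

def scaled_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Closed-form similarity transform (s, R, t) mapping src onto dst.

    Umeyama-style sketch for (N, 3) arrays of matched landmarks, e.g.
    face-model points vs. intra-oral-scan points. Illustrative only.
    """
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / n                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    sign = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # guard against mirror fits
        sign[2, 2] = -1.0
    R = U @ sign @ Vt                              # best-fit rotation
    s = np.trace(np.diag(D) @ sign) / ((src_c ** 2).sum() / n)  # uniform scale
    t = mu_d - s * R @ mu_s                        # translation
    return s, R, t

# Hypothetical use in the claimed integration sequence:
#   1. cut the identified inner-mouth region out of the face mesh;
#   2. s, R, t = scaled_rigid_transform(face_landmarks, scan_landmarks)
#   3. face_aligned = s * face_vertices @ R.T + t
#   4. insert the dentition model where the mouth region was removed.
```

The reflection guard is the easy-to-miss detail: without it, a least-squares fit can return a mirrored solution that superimposes the dentition backwards.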
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Peng to determine a scaled-rigid relative transform for the patient’s face to the dentition, because if the size and shape of the dentition is preferred to the size and shape of the face, then the face should be transformed and scaled to meet the size of the intra-oral scan. When combining the patient’s face and dentition, there is a finite number of ways to perform a scaled-rigid relative transform; either it is performed for the face relative to the dentition or for the dentition relative to the face. One of ordinary skill in the art could have performed a scaled-rigid relative transform for the patient’s face relative to the dentition with a reasonable expectation of success and would have done so in the case when the shape and size of the patient’s dentition is ideal or preferred in comparison to the shape and size of the patient’s face. Therefore, it would have been obvious to try the solution of performing a scaled-rigid relative transform for the patient’s face to the dentition.

It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Peng to scale/transform one of the models to the other. Since Ben-Hamadou places the models in the same space (Paragraph 0164 – “After these steps of reconstructing the face shape 105 and of reconstructing the dentition 125 have been performed, an assembly step 142 takes place. During this assembly step 142, the two models reconstructed separately are assembled in the reference space”), scaling/transforming one of the models would be beneficial for ensuring that the sizes and spacing of the models are accurate in reference to each other, which makes it easier to visualize the face and teeth for treatment.

Additionally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Peng to replace the inner mouth region of the face model with the dentition model and align them for the benefit of better visualizing the patient’s teeth in relation to their face. In other words, having a face model by itself would not allow the user to be able to see all of the patient’s teeth, and having only a teeth model would not allow the user to see how the teeth shape and positions affect the patient’s face. Therefore, assembling the face model and teeth model together creates a more helpful visualization.

Regarding claim 12, Ben-Hamadou in view of Wang, Induchoodan, Mednikov, and Peng teaches the method of claim 11. Ben-Hamadou further teaches wherein the inner mouth region is identified in a 2-dimensional space using a machine learning-based model (Paragraph 0161-0162 – “a step 210 of automatically learning how to recognize a tooth based on at least one captured image, the computation step 135 being based on the machine learning performed. The learning step 210 is carried out, for example, by utilizing a machine learning algorithm based on a sample of captured images or of a set of points extracted from captured images representative of determined teeth”; Note: the machine learning model identifies teeth from captured (2D) images. Teeth are part of the inner mouth region).

Regarding claim 23, Ben-Hamadou in view of Wang and Induchoodan teaches the system of claim 14.
Ben-Hamadou does not teach wherein the processing component is configured to integrate the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition by: identifying an inner mouth region of the patient in the 3-dimensional representation of the patient's face; removing the inner mouth region from the 3-dimensional representation of the patient's face; determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's face to the 3-dimensional representation of the patient's intra-oral scan; replacing the inner mouth region removed from the 3-dimensional representation with the 3-dimensional representation of the patient's dentition; and aligning the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition based on the scaled-rigid relative transform.

However, Mednikov teaches identifying an inner mouth region of the patient in the patient's face (Paragraph 0207 – “FIG. 13D shows a process of identifying a patient's lip contours and mouth opening. At panel 4710 an image of a patient's face, and imparticular the mouth and the region near the mouth, is shown. The lip contours and mouth opening are determined based on such an image”); and removing the inner mouth region from the patient's face (Paragraph 0235 – “the portion of the 2D image within the mouth opening 1010 is deleted or otherwise removed from the 2D image 600 of the patient. In some embodiments, the 3D bite model 1040, or a 2D projection of the 3D bite model 1040 is placed or rendered behind the 2D image 600 such that the 3D bite model 1040, or at least a portion of the 3D bite model 1040, is visible through the mouth opening of the 2D image of the patient”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Mednikov to identify an inner mouth region and remove it from the patient’s face for the benefit of being able to replace the mouth region of the person’s face with a model of their teeth and accurately match them (Mednikov: Paragraph 0198). This assists in providing a “facial visualization of the 3D model of the shape and position of the teeth undergoing the treatment plan” (Mednikov: Paragraph 0278).

Ben-Hamadou modified by Mednikov still does not teach determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's face to the 3-dimensional representation of the patient's intra-oral scan; replacing the inner mouth region removed from the 3-dimensional representation with the 3-dimensional representation of the patient's dentition; and aligning the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition based on the scaled-rigid relative transform. However, Peng teaches determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's dentition to the 3-dimensional representation of the patient's face (Col. 6 lines 43-66 – “the electronic device obtains an rotation angle and a translation of the sample oral cavity model relative to the three-dimensional face model…The rotation angle refers to a rotation angle and a rotation direction of the sample oral cavity model relative to the three-dimensional face model in a space (a three-dimensional coordinate system)…The translation refers to a relative distance between the sample oral cavity model and the three-dimensional face model on a plane (such as projected onto an x or y plane)…According to the oral cavity position parameter and the expression parameter of the three-dimensional face model, the position and the scale of the sample oral cavity model are configured”; Note: rigid transformations, rotation and translation, are performed followed by scaling, which is equivalent to a scaled-rigid relative transform); replacing the inner mouth region removed from the 3-dimensional representation with the 3-dimensional representation of the patient's intra-oral scan (Fig. 5 – The mouth region of the 3D face is replaced with a 3D oral cavity model; see screenshot of Fig. 5 above); and aligning the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition based on the scaled-rigid relative transform (Col. 6 lines 43-54 – “the electronic device…obtains target coordinate information of the sample oral cavity model in the three-dimensional face model based on the rotation angle, the translation, the current coordinate information, coordinate information of the three-dimensional face model, and coordinate information of an average face model; and moves the sample oral cavity model to a target position indicated by the target coordinate information”; Note: the oral cavity model is aligned with the face model using coordinates based on the scaled-rigid relative transform, which is the rotation and translation, of the oral cavity model).

Peng does not directly teach determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's face to the 3-dimensional representation of the patient's dentition. However, Peng teaches determining a scaled-rigid relative transform for the 3-dimensional representation of the patient's dentition to the 3-dimensional representation of the patient's face (Col. 6 lines 43-66 – “the electronic device obtains an rotation angle and a translation of the sample oral cavity model relative to the three-dimensional face model…The rotation angle refers to a rotation angle and a rotation direction of the sample oral cavity model relative to the three-dimensional face model in a space (a three-dimensional coordinate system)…The translation refers to a relative distance between the sample oral cavity model and the three-dimensional face model on a plane (such as projected onto an x or y plane)…According to the oral cavity position parameter and the expression parameter of the three-dimensional face model, the position and the scale of the sample oral cavity model are configured”; Note: rigid transformations, rotation and translation, are performed followed by scaling, which is equivalent to a scaled-rigid relative transform).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Peng to determine a scaled-rigid relative transform for the patient’s face to the dentition, because if the size and shape of the dentition is preferred to the size and shape of the face, then the face should be transformed and scaled to meet the size of the intra-oral scan. When combining the patient’s face and dentition, there is a finite number of ways to perform a scaled-rigid relative transform; either it is performed for the face relative to the dentition or for the dentition relative to the face. One of ordinary skill in the art could have performed a scaled-rigid relative transform for the patient’s face relative to the dentition with a reasonable expectation of success and would have done so in the case when the shape and size of the patient’s dentition is ideal or preferred in comparison to the shape and size of the patient’s face. Therefore, it would have been obvious to try the solution of performing a scaled-rigid relative transform for the patient’s face to the dentition.

It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Peng to scale/transform one of the models to the other. Since Ben-Hamadou places the models in the same space (Paragraph 0164 – “After these steps of reconstructing the face shape 105 and of reconstructing the dentition 125 have been performed, an assembly step 142 takes place. During this assembly step 142, the two models reconstructed separately are assembled in the reference space”), scaling/transforming one of the models would be beneficial for ensuring that the sizes and spacing of the models are accurate in reference to each other, which makes it easier to visualize the face and teeth for treatment.

Additionally, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Peng to replace the inner mouth region of the face model with the dentition model and align them for the benefit of better visualizing the patient’s teeth in relation to their face. In other words, having a face model by itself would not allow the user to be able to see all of the patient’s teeth, and having only a teeth model would not allow the user to see how the teeth shape and positions affect the patient’s face. Therefore, assembling the face model and teeth model together creates a more helpful visualization.

Regarding claim 24, Ben-Hamadou in view of Wang, Induchoodan, Mednikov, and Peng teaches the system of claim 23. Ben-Hamadou further teaches wherein the inner mouth region is identified in a 2-dimensional space using a machine learning-based model (Paragraph 0161-0162 – “a step 210 of automatically learning how to recognize a tooth based on at least one captured image, the computation step 135 being based on the machine learning performed. The learning step 210 is carried out, for example, by utilizing a machine learning algorithm based on a sample of captured images or of a set of points extracted from captured images representative of determined teeth”; Note: the machine learning model identifies teeth from captured (2D) images. Teeth are part of the inner mouth region).

Claims 13 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Ben-Hamadou in view of Wang, Induchoodan, and Mednikov.

Regarding claim 13, Ben-Hamadou in view of Wang and Induchoodan teaches the method of claim 10. Ben-Hamadou does not teach wherein the 3-dimensional representation of the patient's dentition provides a visualization of the patient's teeth after the dental treatment plan is completed. However, Mednikov teaches wherein the 3-dimensional representation of the patient's dentition provides a visualization of the patient's teeth after the dental treatment plan is completed (Paragraph 0277 – “The 3D bite model 3220 can be a 3D digital visualization of the teeth, and it can be used to visualize the shape and position of the teeth in a current state or in a future state wherein the future state can be the shape and position of the teeth during or after undergoing a treatment plan to reposition the teeth of the patient”; Note: the 3D bite model, which is dentition, provides a visualization of the patient’s teeth after undergoing a treatment plan). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Mednikov to provide a visualization of the patient’s teeth after the dental treatment plan is completed for the benefit of “increasing the effectiveness and acceptance of orthodontic treatment” (Mednikov: Paragraph 0006) and assisting dentists in making decisions about what treatment a patient should have.

Regarding claim 25, Ben-Hamadou in view of Wang and Induchoodan teaches the system of claim 14. Ben-Hamadou does not teach wherein the processing component is configured to output a final result of integrating the 3-dimensional representation of the patient's face with the 3-dimensional representation of the patient's dentition to one or more dental planning tools for further analysis. However, Mednikov teaches wherein the processing component is configured to output a final result to one or more dental planning tools for further analysis (Paragraph 0314-0316 – “FIG. 48 illustrates a treatment-comparison display 4800 for comparing a plurality of treatment plans. A plurality of modified images 4810, 4820, and 4830 of the patient are simultaneously displayed, each showing a modified image according to a different, alternative treatment plan… the treatment-comparison display 4800 illustrates a scrollbar positioned below one of the modified images, which can be used to adjust the displayed stage of treatment for each modified image of the plurality of treatment plans, allowing the patient to see each change simultaneously”; Note: modified images, which show the patient’s face and teeth, are output to a treatment-comparison display, which is a dental planning tool). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Mednikov to output the result of the patient’s face and intra-oral scan to a dental planning tool because it “allows patients, as well as medical professionals, to view the progress and outcome of alternative treatments in parallel, which can aid in directly perceiving the costs and benefits of different treatment options” (Mednikov: Paragraph 0316).

Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Ben-Hamadou in view of Wang, Induchoodan, and Lim et al. (US 20100328307 A1), hereinafter Lim.
Regarding claim 28, Ben-Hamadou in view of Wang and Induchoodan teaches the method of claim 1. Ben-Hamadou does not teach wherein generating the 3-dimensional representation comprises: generating a 3-dimensional mesh that lacks color or texture information; and applying texturing to the 3-dimensional mesh based on color information from the plurality of 2-dimensional images to generate the 3-dimensional representation.

However, Lim teaches generating a 3-dimensional mesh that lacks color or texture information (Fig. 7, Paragraph 0038 – “FIG. 7 illustrates an example of a template mesh model used in an image processing apparatus according to an embodiment”; Note: the template mesh has no color or texture, as shown in Fig. 7; see screenshot of Fig. 7 below. It is implied that the template mesh was generated, as it could not exist otherwise); and applying texturing to the 3-dimensional mesh based on color information from the plurality of 2-dimensional images to generate the 3-dimensional representation (Paragraph 0097, 0101 – “a color tone, a skin pattern, and the like of the face portion 210 (FIG. 2) of the input color image 200 (FIG. 2) may be applied to the entire skin of the template texture 300… Each of vertexes may be mapped with a particular color value in the template texture 300 (FIG. 3). When color values of the template texture 300 (FIG. 3) are applied to the vertexes of the template mesh model 700, a 3D facial model may be generated”; Note: the colors in the input color image are used to create a texture, and the texture is used on the mesh to generate a 3D model).

[Screenshot of Fig. 7 (taken from Lim)]

Since Ben-Hamadou already teaches using a parametric model (Paragraph 0087 – “first step 120 of fitting the patients face shape onto a parametric model of at least one portion of the face based on at least one captured image of the face”), it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ben-Hamadou to incorporate the teachings of Lim to generate a plain 3D mesh and then apply texture to it based on the 2D images for the benefit of being able to “generate a three dimensional (3D) facial model of a human being in a relatively quick period of time, without a need for a particular photographing environment, and thereby enhance an efficiency of 3D modeling” (Lim: Paragraph 0009).
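Lim's two-stage pattern, a bare mesh first and color afterwards, is common in facial reconstruction. As a loose sketch of the general idea (not Lim's template-texture method, which maps vertices to a prebuilt texture), per-vertex colors can be sampled by projecting the untextured mesh into one of the captured photos; every name and parameter here is an assumption:

```python
import numpy as np

def texture_mesh_from_image(vertices, K, R, t, image):
    """Assign per-vertex colors by projecting an untextured mesh into a photo.

    Hypothetical sketch: `vertices` is (N, 3), K the 3x3 camera intrinsics,
    (R, t) the camera pose, `image` an (H, W, 3) RGB array. A real system
    would also handle occlusion and blend colors across multiple views.
    """
    cam = (R @ vertices.T).T + t          # world -> camera coordinates
    uv = (K @ cam.T).T                    # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]           # perspective divide -> pixel coords
    h, w = image.shape[:2]
    u = np.clip(uv[:, 0].round().astype(int), 0, w - 1)
    v = np.clip(uv[:, 1].round().astype(int), 0, h - 1)
    return image[v, u]                    # one RGB color per vertex
```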
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Salazar-Gamarra et al. (Monoscopic photogrammetry to obtain 3D models by a mobile device: a method for making facial prostheses) teaches a method of generating a 3D model of a person’s face based on multiple 2D images of the person, captured by a mobile device.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE HAU MA whose telephone number is (571)272-2187. The examiner can normally be reached M-Th 7-5:30. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon, can be reached at (571) 270-0728.

The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHELLE HAU MA/
Examiner, Art Unit 2617

/KING Y POON/
Supervisory Patent Examiner, Art Unit 2617

Prosecution Timeline

Aug 29, 2023
Application Filed
May 20, 2025
Non-Final Rejection — §103
Aug 11, 2025
Applicant Interview (Telephonic)
Aug 11, 2025
Examiner Interview Summary
Sep 12, 2025
Response Filed
Oct 20, 2025
Final Rejection — §103
Jan 28, 2026
Request for Continued Examination
Jan 30, 2026
Response after Non-Final Action
Mar 10, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602750
DIFFERENTIABLE EMULATION OF NON-DIFFERENTIABLE IMAGE PROCESSING FOR ADJUSTABLE AND EXPLAINABLE NON-DESTRUCTIVE IMAGE AND VIDEO EDITING
2y 5m to grant Granted Apr 14, 2026
Patent 12597208
BUILDING INFORMATION MODELING SYSTEMS AND METHODS
2y 5m to grant Granted Apr 07, 2026
Patent 12573217
SERVER, METHOD AND COMPUTER PROGRAM FOR GENERATING SPATIAL MODEL FROM PANORAMIC IMAGE
2y 5m to grant Granted Mar 10, 2026
Patent 12561851
HIGH-RESOLUTION IMAGE GENERATION USING DIFFUSION MODELS
2y 5m to grant Granted Feb 24, 2026
Patent 12536734
Dynamic Foveated Point Cloud Rendering System
2y 5m to grant Granted Jan 27, 2026
Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
81%
Grant Probability
99%
With Interview (+36.4%)
2y 7m
Median Time to Grant
High
PTA Risk
Based on 21 resolved cases by this examiner. Grant probability derived from career allow rate.
