DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 18–26 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because independent claim 18 recites a “computer readable storage medium including computer executable instructions ...” but does not include language limiting the medium to non-transitory forms. The specification also does not contain a clear definition or disclaimer indicating that the claimed medium excludes transitory embodiments, such as signals or carrier waves.
Accordingly, the broadest reasonable interpretation of claim 18 encompasses transitory signal embodiments, which are not statutory subject matter under 35 U.S.C. 101, as held in In re Nuijten, 500 F.3d 1346 (Fed. Cir. 2007). Transitory signals are not a machine, manufacture, composition of matter, or process, and therefore do not fall within any of the statutory categories of patentable subject matter. Therefore, claim 18 fails the eligibility requirement under 35 U.S.C. 101.
Dependent claims 19–26 fail to cure this deficiency of their independent claim 18 and are rejected accordingly.
Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1–20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.
Independent claims 1, 9, and 18 recite computer-implemented functions including, among other limitations: (i) using a face landmark model to identify positions of facial features; (ii) deriving a rough PD scale from an estimated pupillary distance, based on average iris diameter and average camera field of view; (iii) estimating a distance from the user to the camera based on the detected size of a known reference object and the assumed field of view; and (iv) converting pixel-based landmark distances into real-world distances using the derived scale and estimated distance.
Applicant is respectfully reminded that, for computer-implemented functional claims, “examiners should determine whether the specification discloses the computer and the algorithm (e.g., the necessary steps and/or flowcharts) that perform the claimed function in sufficient detail such that one of ordinary skill in the art can reasonably conclude that the inventor possessed the claimed subject matter.” MPEP §2161.01(I).
As an initial matter, the Examiner notes that claims 1, 9, and 18 appear to be originally-filed claims. However, originally-filed claim language does not necessarily satisfy written description. Accord Ariad Pharm., Inc. v. Eli Lilly & Co., 598 F.3d 1336, 1349 (Fed. Cir. 2010). That is to say, originally-filed claims 1, 9, and 18 themselves do not provide an algorithm that performs the claimed functions in sufficient detail such that one of ordinary skill in the art can reasonably make and/or use the invention.
The Examiner further notes that independent claims 1, 9, and 18 recite the same functional limitations in system, method, and computer-readable medium form, including using a face landmark model to detect coplanar landmarks, deriving a rough PD scale, estimating user-to-camera distance, and calculating real-world landmark distances. Claims 2–8, 10–17, and 19–20 further recite only conventional or result-based variations of these same features, such as using specific facial points, reference object types, or measurement directions. These dependent claims do not provide additional structural or algorithmic disclosure that remedies the lack of written description in the independent claims. Accordingly, the subject matter for which written description is lacking resides in the core limitations of the independent claims, and the dependent claims fail to cure these deficiencies.
Furthermore, the specification does not describe an algorithm that performs the foregoing claimed functions in sufficient detail such that one of ordinary skill in the art can reasonably conclude that the inventor possessed the claimed subject matter at filing.
The specification states, in a results-oriented manner, that the system performs “3D face alignment using a face landmark model to identify positions of facial features… including the eyes, pupils… using the 3D landmark locations” (Fig. 3, Step 302), that a user’s PD is estimated by “considering an average camera field of view (FOV) and the user's estimated iris diameter based on 2D iris landmarks” [0039], and that “a rough PD scale is derived and used to draw a rectangular region-of-interest” [0028]. It further states that the user-to-camera distance is determined using the card size in pixels and an estimated field of view [0039], and that a pixel-to-meter ratio is calculated and applied to determine physical PD or face width values [0040]. However, the specification does not provide concrete algorithmic steps for achieving those results across the full breadth of the claims. These are high-level functional descriptions of results (functional descriptions define what a system, product, or process is expected to do, outlining its intended behavior, features, and capabilities without specifying the exact method for achieving them). The specification fails to disclose, in algorithmic terms, how the computer:
(i) uses a face landmark model to detect specific facial landmarks such as iris centers and face width points (e.g., what model architecture is used, what training data, output format, and inference criteria; how detection performs across various poses, occlusions, and lighting conditions); (ii) derives a rough PD scale based on assumed iris size and average FOV (e.g., what average iris diameter or FOV values are used, how biometric variation is accounted for, what “rough” means in terms of precision, and what formula or scaling procedure applies); (iii) estimates the user-to-camera distance using a detected reference object and an average FOV (e.g., how angular distortion or head tilt is addressed, what object-to-pixel conversion is performed, how deviations in device-specific FOVs are handled); and (iv) converts pixel distances between detected landmarks into real-world measurements (e.g., what scale is applied, how calibration errors or pose adjustments are handled, and what accuracy range is supported).
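By way of illustration only, and not drawn from Applicant's disclosure, the level of algorithmic detail contemplated by MPEP §2161.01(I) for limitations (ii) and (iv) might resemble the following sketch, in which the average iris diameter constant and all names are hypothetical assumptions:

```python
# Hypothetical sketch only; not Applicant's disclosed algorithm.
AVG_IRIS_DIAMETER_MM = 11.7  # assumed population-average iris diameter

def rough_scale_mm_per_px(iris_diameter_px: float) -> float:
    """Rough millimeters-per-pixel scale, assuming the detected iris
    has the average physical diameter."""
    return AVG_IRIS_DIAMETER_MM / iris_diameter_px

def rough_pd_mm(pupil_distance_px: float, iris_diameter_px: float) -> float:
    """Convert the pixel distance between pupil centers into a rough
    pupillary-distance estimate in millimeters using that scale."""
    return pupil_distance_px * rough_scale_mm_per_px(iris_diameter_px)
```

The point of the rejection is that the specification supplies neither this level of detail nor any alternative: it does not state what average values are assumed, what precision "rough" denotes, or what formula relates them.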
Applicant is also reminded: “if the specification does not provide a disclosure of the computer and algorithm in sufficient detail to demonstrate to one of ordinary skill in the art that the inventor possessed the invention including how to program the disclosed computer to perform the claimed function, a rejection under 35 U.S.C. 112(a) … for lack of written description must be made.” MPEP §2161.01(I).
Therefore, because an algorithm or structure for carrying out the functions of (i) using a face landmark model, (ii) deriving a rough PD scale, (iii) determining user-to-camera distance, and (iv) converting pixel distance to physical measurement, is not disclosed in sufficient detail such that one of ordinary skill in the art can reasonably conclude that the inventor invented and possessed the claimed subject matter, and in accordance with MPEP §2161.01(I), claims 1–20 are rejected for lack of written description under 35 U.S.C. §112(a).
Dependent claims 2–8, 10–17, and 19–20 fail to cure this deficiency of their respective independent claims and are rejected accordingly.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 5, 13, and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Each of claims 5, 13, and 22 refers to “from the RGB camera.” However, their respective independent claims (claims 1, 9, and 18) only introduce “a camera” and do not specify that the camera is an RGB camera. Although claims 4, 12, and 21 (respectively) narrow “a camera” to “an RGB camera,” claims 5, 13, and 22 do not depend from those narrowing claims and therefore lack proper antecedent basis for the term “the RGB camera.”
Accordingly, it is unclear in each of claims 5, 13, and 22 whether “the RGB camera” refers to the same camera introduced earlier, to a newly introduced but undefined camera, or to a limitation only present in a separate dependent claim. This renders the scope of these claims unclear and indefinite under 35 U.S.C. §112(b). Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1–3, 8–10, 16–20, and 25–26 are rejected under 35 U.S.C. §103 as being unpatentable over Choukroun (Choukroun et al., US 2016/0166145 A1, 2016) in view of Foley (Foley et al., US 6,535,223 B1, 2003).
Regarding claim 1, with deficiencies of Choukroun noted in square brackets [], Choukroun teaches a system for measuring facial features ([abstract], using an image sensor for determining at least one ocular measurement), the system comprising:
a camera configured to produce output signals on a channel corresponding to one or more images (in [0127], Choukroun discloses a system consisting of a camera (webcam, handheld camera, tablet, depth-map camera), a screen, and other channel media used to acquire at least one image of the head of the user [0010]);
a memory including computer-executable instructions; and a processor coupled to the memory and operative to execute the computer-executable instructions, the computer-executable instructions causing the processor to perform operations including (Choukroun discloses a system comprising: a processor as part of the system architecture [0129], a software package allowing the image capture and the calculations to be carried out may take the form of an application and/or an Internet site/plug-in for a web browser [0189], e.g. smartphone [0071]):
(i) performing 3D face alignment of a user's face using a face landmark model to identify positions of facial features, including 2D iris landmarks, wherein using the 3D landmark locations, a 3D pose of the user's face is estimated, wherein the 3D pose includes roll, pitch, yaw, x, y, and z coordinates (Choukroun teaches metric 3D facial detection and tracking to optimize the pose of the face for image capture [0166]. The system uses a 3D facial model matched to 2D facial landmarks for tracking [0073], and real-time feedback on facial orientation is provided [0070]. Further, Choukroun discusses transforming 3D points using rotation and translation matrices Rm and Tm between the camera and facial components like the gauge and pupils [0090–0108]. These transformations are the mathematical equivalent of estimating 3D pose, where Rm (rotation matrix) corresponds to roll, pitch, and yaw, whereas Tm (translation vector) corresponds to x, y, z position. This combination is a standard formulation for 6-DOF pose estimation in computer vision. While Choukroun does not literally list "roll, pitch, yaw, x, y, z," it describes the underlying mathematical operations that any person of ordinary skill in the art would recognize as an implementation of those parameters in pose estimation);
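To illustrate the asserted equivalence, the following sketch shows the standard Z-Y-X Euler decomposition by which roll, pitch, and yaw are recovered from a rotation matrix such as Rm; this is textbook computer-vision material offered only as illustration, not code taken from Choukroun:

```python
import math

def euler_from_rotation(R):
    """Recover (roll, pitch, yaw) in radians from a 3x3 rotation matrix
    (row-major nested lists) using the common Z-Y-X convention. The
    translation vector Tm supplies x, y, z directly, completing a
    6-degree-of-freedom pose."""
    pitch = math.asin(max(-1.0, min(1.0, -R[2][0])))  # clamp for safety
    roll = math.atan2(R[2][1], R[2][2])
    yaw = math.atan2(R[1][0], R[0][0])
    return roll, pitch, yaw
```

A pure yaw rotation matrix, for example, yields back its yaw angle with zero roll and pitch, which is why a POSITA would read Rm and Tm as carrying exactly the claimed pose parameters.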
(ii) performing reference card placement and detection, including [estimating the user's pupillary distance (PD) using an average camera field of view (FOV) and estimated iris diameter based on the 2D iris landmarks], wherein a derived rough PD scale is used to form a rectangular region-of-interest (ROI) on the user's forehead in an image captured by the camera, wherein the ROI illustrates correct placement of the reference card, and wherein the reference card is positioned in the ROI and detected by the camera ([0146] & [0171]: Choukroun teaches rectangular reference card placement on the forehead. Once the PD is estimated, an ROI scale would be derived by a POSITA to guide card placement; framing rectangles guide the user in positioning the face and card, and the card is placed on the forehead and detected by the camera); and
(iii) performing one or more facial measurement calculations, including converting the size of the detected reference card in pixels to real-world dimensions, forming a pixel-to-distance ratio, and using the calculated pixel-to-distance ratio to convert distance between landmarks in pixels to actual facial measurements in metric units, wherein given an estimated camera FOV, a distance between the user and the camera is determined, and calculating one or more facial measurements, wherein the one or more facial measurements are scaled according to the user's distance from the camera (in [0177–0188]: Choukroun teaches calculating the actual size of facial features using the known physical size of a rigid object, e.g. a credit card or eyeglasses, visible in the image. The equation D = dP × tMM / tMP uses the pixel size of the reference card (tMP) and its known size (tMM) to derive a scaling ratio. Choukroun describes using the above ratio to convert pixel distance between pupils or other facial features into real-world millimeters. If the focal length is not known, it may be estimated using the apparent size of the reference object; the camera-to-user distance dC is derived using iris or gauge size and perspective. Pupillary distance (PD) and face width are explicitly calculated using this scaling method. Choukroun adjusts final measurements based on camera distance dC, applying corrections when the distance is small, e.g. <60 cm).
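The cited relation can be restated compactly; the code below is merely Choukroun's equation D = dP × tMM / tMP expressed as a function, and the example numbers are illustrative only (the 85.6 mm credit-card width is the standard ID-1 dimension, not a value quoted from the reference):

```python
def pixel_to_metric(d_pixels: float, ref_mm: float, ref_pixels: float) -> float:
    """Choukroun's relation D = dP * tMM / tMP: scale a pixel distance
    dP by the ratio of the reference object's known physical size (tMM)
    to its apparent size in pixels (tMP)."""
    return d_pixels * ref_mm / ref_pixels

# Illustrative only: a credit card (ID-1 width 85.6 mm) spanning 428 px
# gives a 0.2 mm/px scale, so a 310 px inter-pupil distance maps to 62 mm.
```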
As noted above in square brackets, Choukroun does not teach, but Foley teaches:
estimating the user's pupillary distance (PD) using an average camera field of view (FOV) and estimated iris diameter based on the 2D iris landmarks (Foley teaches estimating PD using the known diameter of the iris, the human iris being of a relatively fixed size, together with pupil spacing to estimate the real PD [Fig. 1 and/or Page 10, Col. 3, Lines 35–50]).
It would have been prima facie obvious to a POSITA, before the effective filing date of the claimed invention, to adapt Foley’s known iris-based pupillary distance estimation technique and biometric scaling principle as an initial calibration step integrated into Choukroun’s facial measurement system, wherein the reference object (e.g., credit card) is guided into position on the user’s forehead using a rectangular region of interest derived from an estimated PD and average camera field of view. This integration would enable (i) preliminary facial scale estimation using 2D iris landmarks and known anatomical dimensions, and (ii) guided reference card placement with improved accuracy and ease of use, thereby allowing Choukroun’s system to perform precise metric facial measurements while reducing reliance on manual card alignment or exact user-camera distance. The combination is a predictable integration of Foley’s biometric image-scaling approach with Choukroun’s camera-based facial measurement and visual alignment interface, to enhance user experience and improve facial measurement reliability in consumer-facing imaging systems.
Regarding claim 2, the combination of Choukroun and Foley teaches the system of claim 1, wherein the one or more calculated facial measurements include a pupillary distance (PD) ([0010]: Choukroun explicitly discloses measuring pupillary distance (PD), which is the primary measurement determined by Choukroun's system).
Regarding claim 3, the combination of Choukroun and Foley teaches the system of claim 1, wherein the one or more calculated facial measurements include a face width (FW) (in [0182–0183], Choukroun teaches using a reference object to determine a pixel-to-metric scaling ratio and then applying this ratio to various facial dimensions, including pupillary distance and distances between other facial landmarks. Foley reinforces the use of facial landmarks for real-world metric calculations [Page 11, Col. 6, Lines 35–40]).
Regarding claim 8, the combination of Choukroun and Foley teaches the system of claim 1, wherein the reference object comprises a reference card ([0146]: Choukroun explicitly discloses using a reference card, such as a credit card, placed on the user’s forehead for scale calibration).
Regarding claims 9–13, 16, 18–22, and 25, the rationale provided for claims 1–5 and 8 is incorporated herein. In addition, the system for measuring facial features of claims 1–5 and 8 corresponds to the method of claims 9–13 and 16, as well as the computer-readable storage medium of claims 18–22 and 25, and performs the steps disclosed herein. Therefore, these claims are likewise rejected as unpatentable.
Regarding claim 17, the combination of Choukroun and Foley teaches the method of claim 9, wherein detecting the reference object comprises edge detection (in [0208] & [0211], Choukroun teaches how the card’s silhouette or outline is used to identify its projected shape: see the line / outline formed [corresponding to the edge detection] between these points and thus follow the silhouette of the card exactly, that form the card's perimeter; using this geometric shape to extract dimensions for scaling; tracking the projective deformation of the card outline in the image).
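As a generic illustration of the edge-detection concept mapped above (this sketch is not Choukroun's algorithm; a practical system would more likely use a standard library routine such as a Canny detector), the card's outline can be located by flagging pixels with large intensity differences:

```python
def edge_map(img, thresh=1):
    """Minimal gradient-based edge sketch: flag pixels whose horizontal
    or vertical intensity difference to the next pixel meets a threshold.
    `img` is a grid (list of rows) of grayscale values."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = abs(img[y][x + 1] - img[y][x])  # horizontal gradient
            gy = abs(img[y + 1][x] - img[y][x])  # vertical gradient
            if max(gx, gy) >= thresh:
                edges[y][x] = 1
    return edges
```

Pixels flagged along the card's border trace its silhouette; the enclosed rectangle's pixel dimensions then feed the scaling ratio discussed for claim 1.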
Regarding claim 26, the rationale provided for claim 17 is incorporated herein. In addition, the method of using a camera for measuring point distances of facial features of claim 17 corresponds to the computer-readable storage medium of claim 26, and performs the steps disclosed herein. Therefore, claim 26 is likewise rejected as unpatentable.
Claims 4–5, 12–13, and 21–22 are rejected under 35 U.S.C. §103 as being unpatentable over Choukroun as modified by Foley, and further in view of Yang (Yang et al., US 2014/0160123 A1, 2014).
Regarding claim 4, Choukroun as modified by Foley teaches the system of claim 1, but fails to explicitly disclose, whereas Yang teaches, wherein the camera comprises an RGB camera including an RGB sensor, wherein the RGB camera is configured to produce output signals on an RGB channel corresponding to one or more RGB images (in [0005], Yang explicitly teaches using an RGB camera including an RGB sensor for facial/head image capture and 3D modeling. The use of an RGB camera was well known and expected in such systems; almost all standard consumer and professional digital cameras are RGB cameras).
It would have been prima facie obvious to a POSITA, before the effective filing date of the claimed invention, to implement Choukroun’s [as modified by Foley] facial measurement system using an RGB camera as taught by Yang, because Yang demonstrates that RGB cameras are well-suited and commonly used for facial tracking and analysis. Substituting Yang’s explicitly disclosed RGB camera into Choukroun’s system is a predictable use of prior art elements according to their established functions.
Regarding claim 5, Choukroun as modified by Foley teaches the system of claim 1, but fails to explicitly disclose, whereas Yang teaches, wherein the one or more images comprise a plurality of frames of video from the RGB camera (in [0005] & [0006], Yang explicitly teaches capturing a plurality of RGB video frames from an RGB camera for facial imaging).
It would have been prima facie obvious to a POSITA, before the effective filing date of the claimed invention, to implement Choukroun’s [as modified by Foley] facial measurement system using video frames from an RGB camera, as taught by Yang, in order to improve robustness and accuracy of facial tracking and alignment. The combination merely applies Yang’s known RGB video acquisition technique to Choukroun’s facial measurement workflow and yields predictable results.
Regarding claims 12–13 and 21–22, the rationale provided for claims 4–5 is incorporated herein. In addition, the system for measuring facial features of claims 4–5 corresponds to the method of claims 12–13, as well as the computer-readable storage medium of claims 21–22, and performs the steps disclosed herein. Therefore, these claims are likewise rejected as unpatentable.
Claims 6–7, 14–15, and 23–24 are rejected under 35 U.S.C. §103 as being unpatentable over Choukroun as modified by Foley, and further in view of Tang (Tang et al., WO 2016/026063 A1, 2016).
Regarding claim 6, Choukroun as modified by Foley teaches the system of claim 1, but fails to explicitly disclose, whereas Tang teaches, wherein the face landmark model comprises a convolutional neural network ([0016]: Tang discloses a face landmark system comprising a convolutional neural network).
It would have been prima facie obvious to a POSITA, before the effective filing date of the claimed invention, to incorporate Tang’s convolutional neural network into Choukroun’s [as modified by Foley] facial measurement system in order to enhance the accuracy and robustness of facial landmark localization. Tang teaches a multi-layer convolutional neural network (CNN) architecture trained to extract facial features and predict landmark positions, which includes deep layers and auxiliary tasks ([0025] & [0053]). The combination is a predictable integration of Tang’s known deep-learning-based facial landmark models with Choukroun’s image-based facial measurement system, improving precision in face alignment and supporting landmark detection under varied lighting and pose conditions.
Regarding claim 7, Choukroun as modified by Foley teaches the system of claim 1, but fails to explicitly disclose, whereas Tang teaches, wherein the face landmark model comprises a deep landmark detection network ([0025]: Tang discloses a face landmark detection system comprising a convolutional neural network, "a plurality of (for example, three) convolution-pooling layers comprising one or more (for example, three) convolutional layers and one or more (for example, three) pooling layers, one convolutional layer and one fully connected layer". A training unit optimizes multiple auxiliary tasks alongside landmark detection, creating a multi-task deep learning structure).
It would have been prima facie obvious to a POSITA, before the effective filing date of the claimed invention, to implement Tang’s deep landmark detection network into Choukroun’s [as modified by Foley] facial measurement system in order to improve accuracy and stability of facial landmark identification used for measurement and alignment. The combination is a predictable integration of Tang’s known deep-learning-based landmark detector into Choukroun’s image-based facial measurement system.
Regarding claims 14–15 and 23–24, the rationale provided for claims 6–7 is incorporated herein. In addition, the system for measuring facial features of claims 6–7 corresponds to the method of claims 14–15, as well as the computer-readable storage medium of claims 23–24, and performs the steps disclosed herein. Therefore, these claims are likewise rejected as unpatentable.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEN KUDO whose telephone number is (571)272-4498. The examiner can normally be reached M-F 8am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
KEN KUDO
Examiner
Art Unit 2671
/KEN KUDO/Examiner, Art Unit 2671
/VINCENT RUDOLPH/Supervisory Patent Examiner, Art Unit 2671