Prosecution Insights
Last updated: April 19, 2026
Application No. 18/443,920

METHOD FOR DETECTING SYNTHETIC CONTENT IN VIDEOS

Non-Final OA (§103, §112)
Filed
Feb 16, 2024
Examiner
CHANG, DANIEL CHEOLJIN
Art Unit
2669
Tech Center
2600 — Communications
Assignee
Telefonica Innovacion Digital, S.L.U.
OA Round
1 (Non-Final)
Grant Probability: 89% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 89%, above average (117 granted / 132 resolved; +26.6% vs TC avg)
Interview Lift: +11.7% (moderate), based on resolved cases with interview
Typical Timeline: 2y 6m average prosecution; 25 applications currently pending
Career History: 157 total applications across all art units

Statute-Specific Performance

§101: 8.1% (-31.9% vs TC avg)
§103: 53.4% (+13.4% vs TC avg)
§102: 14.1% (-25.9% vs TC avg)
§112: 20.7% (-19.3% vs TC avg)

TC averages are estimates; based on career data from 132 resolved cases.

Office Action

Rejections: §103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Notice to Applicants

This communication is in response to the Application filed on 02/16/2024. Claims 1-15 are pending.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 4 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 4 recites the limitation “the head” (line 10). There is insufficient antecedent basis for this limitation in the claim. It is unclear whether “the head” refers back to “the subject's head” in claim 1 or to something else. Clarification/explanation is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-9, 11, 12, 14 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over GAO, Yuan et al. (U.S. Publication No. 2022/0309836) (hereafter, "GAO, Y") in view of Wong et al. (U.S. Publication No. 2018/0349682) (hereafter, "Wong") and further in view of LI et al. (U.S. Publication No. 2022/0277596) (hereafter, "LI").

GAO, Y teaches a computer-implemented method for detecting synthetic content in videos, the method comprising ([0006] provide an AI-based face recognition method and apparatus, a device, and a medium, which can defend against online face verification attacks whose attack types are a synthetic attack; [0013] A first liveness detection function and a second liveness detection function are combined.
When both detection results of the two liveness detection functions indicate that a target face in a video frame is a liveness type, the video frame includes a live target face) obtaining at least an image to be analysed from a video, the image containing at least one body part of a subject ([0049] A 3D camera (including a color camera and a depth camera) is disposed on the terminal 120 for acquiring a face image (including at least one of a photo or a video frame) of a user 160; [0053] Step 301: Obtain n groups of input video frames, at least one group of video frames including a color video frame and a depth video frame of a target face, n being a positive integer; [0061] Step 303: Invoke a second liveness detection function to recognize the depth video frames in the n groups of video frames), the at least one body part comprising at least a head of the subject ([0049] the terminal 120 continuously shoots photos or shoots a video of the face of the user 160 ... The face image of the user 160 may be an image with additional actions such as blinking, head turning, smiling, and mouth opening, or an image without additional actions), and the method comprising the following steps executed by one or more processors: obtaining a plurality of … points corresponding to the at least one body part and collecting information for each of the obtained … points ([0060] The neural network model in the first liveness detection function may recognize a facial feature point on the target face in the color video frame, and obtain position coordinates of the facial feature point on the target face. 
The facial feature point is a feature point corresponding to a position of a facial part); calculating … vectors comprising information of position and movement of the plurality of the obtained … points to detect spatial positions of the body part ([0060] a distance change of the facial feature point is calculated by the calculation unit, or the distance change of the facial feature point is determined by the AI classifier, so as to determine whether the target face completes a target action; [0095] Step 4031: Invoke the first liveness detection function to calculate a ratio of a maximum horizontal distance to a maximum longitudinal distance between the facial feature points belonging to a same facial feature part, the facial feature part including at least one of an eye or a mouth). GAO, Y does not expressly teach … three-dimensional … three-dimensional … in real-time three-dimensional … three-dimensional … detecting anomalies in real-time by comparing the calculated three-dimensional vectors with reference information of points corresponding to the at least one body part stored in matrices and verifying at least one criterion according to at least a frequency of eye blink or to a pose of the subject's head; and providing a result in real-time indicating whether a synthetic content is detected in the video, the result being based on the detected anomalies and each verified criterion. However, Wong teaches … three-dimensional … three-dimensional ([0081] At 320, the authentication subsystem or an image analysis subsystem (e.g., image analysis subsystem 112) may extract feature points from each image frame of the sequence of image frames. The feature points may relate to features (e.g., landmarks) on the face present in the image frames, such as edges of the face, nose, eyes, mouth, ears, and eyebrows of the user being authenticated ... Facial features may be extracted for each image frame. 
The facial features may be represented by 2-D or 3-D feature points, which may be described by 2-D or 3-D coordinates) … in real-time three-dimensional … three-dimensional ([0083] the linear or angular velocity of the rotation of the face may be calculated based on the change of the locations of one or more feature points (e.g., the tip (apex) of the nose, center of an eye, a corner or center of the mouth) in consecutive image frames; [0081] The facial features may be represented by 2-D or 3-D feature points; [0073] At 230, the authentication subsystem may determine liveness detection results based upon the captured sequence of image frames, using one or more of eye blinking, smiling, or head movement-based detection techniques … the liveness detection may include pre-processing the captured sequence of image frames, such as extracting feature points in the image frames) … three-dimensional ([0083] the linear or angular velocity of the rotation of the face may be calculated based on the change of the locations of one or more feature points (e.g., the tip (apex) of the nose, center of an eye, a corner or center of the mouth) in consecutive image frames; [0081] The facial features may be represented by 2-D or 3-D feature points) … verifying at least one criterion according to at least a frequency of eye blink or to a pose of the subject's head; and ([0041] Techniques disclosed herein can be used to more accurately and more confidently detect the liveness of a face of a user being authenticated by detecting facial motions in, for example, eyes (e.g., blinking), mouth (e.g., smiling), and/or head (e.g., rotation) of the user being authenticated, to prevent spoofing attacks during image-based user authentication; [0074] the overall liveness detection condition may be determined based on one type of facial motion, such as eye blinks, and the face may be determined to be live if a criterion of the type of facial motion (e.g., the number of eye blinks in a certain time 
period) is satisfied ... the face may be determined to be live if more than 15 eye blinks are detected in 3 minutes; [0045]) … each verified criterion ([0041] Techniques disclosed herein can be used to more accurately and more confidently detect the liveness of a face of a user being authenticated by detecting facial motions in, for example, eyes (e.g., blinking), mouth (e.g., smiling), and/or head (e.g., rotation) of the user being authenticated, to prevent spoofing attacks during image-based user authentication; [0074] the number of eye blinks in a certain time period). It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of GAO, Y to incorporate the step/system of calculating three-dimensional vectors in real-time by extracting 3-D feature points on the face and verifying criteria according to the frequency of eye blink and/or to a rotation of the user's head for preventing spoofing attacks taught by Wong. The suggestion/motivation for doing so would have been to improve the accuracy of liveness detection to identify fake faces ([0004] techniques disclosed herein may be used to detect facial forgery (e.g., using still images) during image-based user authentication, based on more accurate liveness detection of the face of a user of a secure system). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. The combination of GAO, Y and Wong does not expressly teach detecting anomalies in real-time by comparing the calculated … vectors with reference information of points corresponding to the at least one body part stored in matrices and providing a result in real-time indicating whether a synthetic content is detected in the video, the result being based on the detected anomalies and.
However, LI teaches detecting anomalies in real-time by comparing the calculated … vectors with reference information of points corresponding to the at least one body part stored in matrices and ([0128] by parsing face video data to obtain eye movement probability sets that reflect an eye movement state of a to-be-detected face, that is, an eye movement state sequence, analyzing whether an abnormal blink behavior exists in the face video data according to two particular eye movement probability sets; [0129] the foregoing two particular eye movement probability sets are respectively processed by using two classification models, to output a first probability value and a second probability value that respectively reflect a probability of existence of an excessively fast or frequent abnormal blink behavior in the face video data; [0141]) providing a result in real-time indicating whether a synthetic content is detected in the video, the result being based on the detected anomalies and ([0102] The frequent blink behavior is an abnormal blink behavior, indicating that the quantity of blinks exceeds the quantity of blinks generated by normal physiological activities, and then it is determined that there is a high possibility that the face video data is fake face video data; [0103] The excessively fast blink behavior is an abnormal blink behavior, indicating that the blink frequency is excessively high and exceeds a blink frequency generated by normal physiological activities, and then it is determined that there is a high possibility that the face video data is fake face video data). 
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of the combination of GAO, Y and Wong to incorporate the step/system of detecting anomalies by comparing the eye movement with the two particular eye movement probability sets and determining whether a fake face is detected in the video based on detected anomalies taught by LI. The suggestion/motivation for doing so would have been to improve the accuracy in recognition of highly realistic fake face videos ([0006] provide a face anti-spoofing recognition method and apparatus, a device, and a storage medium, which can improve the accuracy in recognition of highly realistic fake face videos, thereby improving the security of face recognition; [0054] determining whether the to-be-detected face is a real face, the accuracy in recognition of highly realistic fake face videos is improved, thereby improving the security of face recognition, and effectively preventing lawbreakers from forging identities of others by using fake face videos). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine GAO, Y and Wong with LI to obtain the invention as specified in claim 1.

Regarding claim 4, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above.
Wong teaches wherein the at least one verified criterion is the pose of the subject's head and ([0041] detect the liveness of a face of a user being authenticated by detecting facial motions in … head (e.g., rotation) of the user being authenticated, to prevent spoofing attacks during image-based user authentication; [0045] the angle of rotation of the head (e.g., yaw, pitch, or roll) in response to an instruction may be determined based on the feature points in the series of image frames and used for liveness detection) the detected anomalies comprise at least one of: i) movements of the subject's head having a speed that exceeds a predetermined first threshold, ii) turns of the subject's body part having a speed that exceeds a predetermined second threshold, iii) anomalies with respect to facial symmetry, iv) movements of another body part of the subject, different from the head, having a speed that exceeds a predetermined first threshold anomalies with respect to a focal point, v) anomalies with respect to facial expressions related to emotions, and/or vi) anomalies with respect to movement of lips of the subject ([0124] changes to certain portions of the mouth of the subject present in the image frames may be used to detect facial motions, such as smiles; [0125] image frame 1200 depicting the mouth of a subject in one state, such as a no-smile state ... feature points may be extracted from the image frame to represent the mouth of the subject; [0126] image frame 1250 depicting the mouth of the subject in another state, such as an open-smile state; [0127] Different motions in the mouth may cause different changes in at least some of these parameters. For example, at least some of these parameters may be different for a mouth at different states, such as a no-smile state, a closed-smile state, and an open-smile state. 
Thus, based on the changes in at least some of these parameters, different states of the mouth may be determined in the image frames; [0129] Based on statistical data, a respective threshold value may be determined for each of the parameters associated with a mouth ... a threshold for the change in total height of the mouth may be set to 25%, a threshold for the change in width-to-height ratio of the mouth may be set to 50%, and so on. In some implementations, if the values of the parameters in a state indicate that more than two parameters (e.g., three or more parameters) have a change greater than the corresponding threshold with respect to the no-smile state, a smile (or another motion of the mouth) may be detected).

Regarding claim 5, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. Wong teaches wherein the at least one verified criterion is the frequency of eye blink and detecting anomalies comprises at least one of: calculating the frequency from a start time and an end time of eye blink and ([0041] Techniques disclosed herein can be used to more accurately and more confidently detect the liveness of a face of a user being authenticated by detecting facial motions in, for example, eyes (e.g., blinking), mouth (e.g., smiling), and/or head (e.g., rotation) of the user being authenticated, to prevent spoofing attacks during image-based user authentication; [0074] the overall liveness detection condition may be determined based on one type of facial motion, such as eye blinks, and the face may be determined to be live if a criterion of the type of facial motion (e.g., the number of eye blinks in a certain time period) is satisfied ... the face may be determined to be live if more than 15 eye blinks are detected in 3 minutes). Wong does not expressly teach calculating a speed of eye closure.
However, LI teaches calculating a speed of eye closure ([0086] Compare the eye feature information in the time domain to obtain an eye movement change trend; [0118] The fact that the probability is smaller than the predetermined threshold is a basis for determining that there is no excessively fast … blink behavior in the face video data. The probability is a probability that reflects existence of an abnormal blink behavior of the to-be-detected face; [0061]; [0128]; [0141]; [0144]). It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of the combination of GAO, Y and Wong to incorporate the step/system of detecting anomalies by obtaining an eye movement speed taught by LI. Motivation for this combination has been stated in claim 1.

Regarding claim 6, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. Wong teaches wherein the information of the obtained three-dimensional points is collected either in real-time from the video being currently captured by a camera or from the video previously recorded by the camera ([0081] an image analysis subsystem (e.g., image analysis subsystem 112) may extract feature points from each image frame of the sequence of image frames. The feature points may relate to features (e.g., landmarks) on the face present in the image frames, such as edges of the face, nose, eyes, mouth, ears, and eyebrows of the user being authenticated ... Facial features may be extracted for each image frame. The facial features may be represented by 2-D or 3-D feature points; [0071] prompt ... may instruct the user to face a camera on the user device or making some facial motions. [0072] image capture subsystem 122 may comprise a camera, which when activated by image capture app 124, is configured to capture a video of the user's face, the video comprising a sequence of image frames).
Regarding claim 7, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. GAO, Y teaches wherein detecting anomalies is performed during a video call ([0241] The camera component 1206 is configured to acquire an image or a video. In one embodiment, the camera assembly 1206 includes a front-facing camera and a rear-facing camera. Generally, the front-facing camera is configured to implement a video call or self-portrait; [0013] The first liveness detection function can resist a copy attack and a mask attack, and the second liveness detection function can resist a synthetic attack and the copy attack, which can protect user information security more comprehensively).

Regarding claim 8, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. Wong teaches wherein detecting anomalies is adjusted to a frame rate defined by frames per second of the video ([0096] a spontaneous eye blink rate may be, for example, about 15-30 or fewer blinks per minute, which may vary depending on factors such as fatigue, stress, amount of sleep etc. ... a person's eyes may blink approximately once every 2-4 seconds or longer. An eye blink may last about 100 milliseconds (ms) to about 400 ms, and typically about 250 ms. Thus, if the image frame capturing rate is, for example, 25 fps, 30 fps, or higher, two or more frames (e.g., 6-8 or more image frames) may be captured during an eye blink; [0112] The step size for the sliding windows may be determined based on, for example, the image frame capturing rate, and may be larger if the image frame capturing rate is higher. The step size for the sliding windows may be, for example, one, two, five, or more. For example, if the step size is two, the first window may include closity values for image frames N to N+M−1, the second window may include closity values for image frames N+2 to N+M+1).
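For orientation, the blink-timing arithmetic Wong relies on for claim 8 (a roughly 250 ms blink captured at 25-30 fps spans several frames, and a spontaneous blink rate of about 15-30 or fewer blinks per minute) can be sketched as follows. This is an illustrative sketch only; the function names and the plausibility thresholds are assumptions, not taken from any cited reference.

```python
# Illustrative sketch (not from the cited references): how many frames a
# single eye blink spans at a given capture rate, and a hedged plausibility
# check on an observed blink rate.

def frames_per_blink(blink_ms: float, fps: float) -> int:
    """Number of image frames captured during one eye blink."""
    return round(blink_ms / 1000.0 * fps)

def blink_rate_is_plausible(blinks: int, window_minutes: float,
                            min_per_min: float = 2.0,
                            max_per_min: float = 30.0) -> bool:
    """Assumed criterion: spontaneous blinking is roughly 15-30 or fewer
    blinks per minute (one blink every 2-4 seconds or longer)."""
    rate = blinks / window_minutes
    return min_per_min <= rate <= max_per_min

# A typical 250 ms blink at 30 fps spans several frames:
print(frames_per_blink(250, 30))         # -> 8
print(blink_rate_is_plausible(45, 3.0))  # 15 blinks/min -> True
```

A rate far outside that band (for example, 200 blinks in one minute) would fail the check, which is the kind of "excessively fast or frequent abnormal blink behavior" LI treats as evidence of fake face video data.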
Regarding claim 9, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. Wong teaches wherein the video is received from an input source selected from a webcam, a video conference and a video file ([0041] a series of captured image frames (e.g., from a video stream captured by a camera) of a person's face; [0072] The image frames may be saved ... as frames in a video file).

Regarding claim 11, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. LI teaches wherein the provided result is a subset of verification data obtained in verifying the at least one criterion ([0129] the foregoing two particular eye movement probability sets are respectively processed by using two classification models, to output a first probability value and a second probability value that respectively reflect a probability of existence of an excessively fast or frequent abnormal blink behavior in the face video data; and whether the to-be-detected face is a real face is determined according to whether the two probability values are respectively less than a first threshold and a second threshold; [0121] By setting the first threshold, the effect of face anti-spoofing recognition can be effectively improved; [0122] By setting the second threshold, the effect of face anti-spoofing recognition can be effectively improved).

Regarding claim 12, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. LI teaches wherein the provided result is a binary evaluation between real human and deep fake ([0106] The first dataset is processed by using a first classification model, to obtain the first probability value.
The first probability value is a probability that the to-be-detected face is determined to be a real face according to the first dataset; [0107] The first classification model is a first support vector machine (SVM) classifier, configured to determine a first probability value corresponding to the first dataset according to the inputted first dataset. SVM is a generalized linear classifier that performs binary classification on data in a supervised learning manner).

Regarding claim 14, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. Wong teaches further comprising displaying by a user interface the obtained three-dimensional points ([0104] FIG. 10A ... the three feature points on the left side of the eye, and the second angle may be determined based on, for example, the three feature points on the right side of the eye; [0105] FIG. 10B ... the three feature points on the left side of the eye, and the second angle may be determined based on, for example, the three feature points on the right side of the eye; [0153] Mobile device 1600 may include a display module 1660 and a user input module 1670 ... display module 1660 may comprise a multi-touch-sensitive display. User input module 1670 may include, without limitation, a touchscreen, a touch pad).

Regarding claim 15, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. GAO, Y teaches wherein the steps are executed by one processor of a personal computer, a laptop, a tablet, a smartphone or any programmable device providing a video player ([0049] the terminal 120 is merely used as an example for description. Types of a terminal device include at least one of a smartphone, a tablet computer, an e-book reader, a Moving Picture Experts Group Audio Layer III (MP3) player, a Moving Picture Experts Group Audio Layer IV (MP4) player, a laptop portable computer, or a desktop computer.
The following embodiment is described by using an example in which the terminal includes a smartphone).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over GAO, Yuan et al. (U.S. Publication No. 2022/0309836) (hereafter, "GAO, Y") in view of Wong et al. (U.S. Publication No. 2018/0349682) (hereafter, "Wong") and further in view of LI et al. (U.S. Publication No. 2022/0277596) (hereafter, "LI") and KONDO (U.S. Publication No. 2024/0257326).

Regarding claim 2, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. LI teaches wherein detecting anomalies ([0128] by parsing face video data to obtain eye movement probability sets that reflect an eye movement state of a to-be-detected face, that is, an eye movement state sequence, analyzing whether an abnormal blink behavior exists in the face video data according to two particular eye movement probability sets; [0141]; [0144]). LI does not expressly teach further comprises using a combination of translation and rotation to calculate motion values of the obtained three-dimensional points and generating homogeneous transformation matrices with the calculated motion values. However, Wong teaches further comprises using a combination of translation and rotation to calculate motion values of the obtained three-dimensional points and ([0083] the linear or angular velocity of the rotation of the face may be calculated based on the change of the locations of one or more feature points (e.g., the tip (apex) of the nose, center of an eye, a corner or center of the mouth) in consecutive image frames; [0089] all extracted feature points (represented by circles) are rotated counter-clockwise by an angle, such as angle θ determined from FIG. 5, with respect to feature point #30. The resultant feature points are represented by the black dots. For example, feature point #27 may be moved to a new location represented by feature point #27′, while feature point #30 may not be moved.
As such, each feature point, except the reference point (i.e., feature point #30), may have new x-y coordinates; [0090] the x-y coordinates of the feature points may be shifted with respect to a reference point, such that the relative coordinates of the feature points with respect to the reference point (relative coordinates of (0,0)) may be determined; [0081] The facial features may be represented by 2-D or 3-D feature points). It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of the combination of LI to incorporate the step/system of calculating motion values of the extracted 3-D feature points using shift and rotation taught by Wong. Motivation for this combination has been stated in claim 1. The combination of LI and Wong does not expressly teach generating homogeneous transformation matrices with the calculated motion values. However, KONDO teaches generating homogeneous transformation matrices with the calculated motion values ([0059] In S150 (FIG. 3), the processor 110 calculates a homography matrix according to the results of the keypoint matching. The homography matrix is a matrix that represents homography (also called projective transformation); [0156] the homography transformation of the captured image ip in S190 (FIG. 3) includes the coordinate transformation by the homography matrix H. The homography matrix H includes elements h11, h12, h21, and h22 forming the first submatrix SM1 of two rows and two columns. The first submatrix SM1 represents coordinate transformation including rotation and scaling between a two-dimensional coordinate system (indicating coordinates x and y); [0143] the processor 110 uses the homography matrix H to calculate an abnormality indicator value).
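For orientation, the claim 2 limitation at issue here (combining a rotation and a translation into a single homogeneous transformation matrix) is a standard construction. The following is a minimal sketch of that construction only; the names and values are illustrative and are not drawn from any cited reference.

```python
import math

# Minimal sketch of the standard construction behind claim 2's limitation:
# packing a 2-D rotation and a translation into one 3x3 homogeneous
# transformation matrix, then applying it to a feature point.

def homogeneous_transform(theta: float, tx: float, ty: float):
    """3x3 matrix combining rotation by theta (radians) and translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx],
            [s,  c, ty],
            [0.0, 0.0, 1.0]]

def apply(T, point):
    """Apply T to a 2-D point expressed in homogeneous coordinates."""
    x, y = point
    v = (x, y, 1.0)
    return tuple(sum(T[i][j] * v[j] for j in range(3)) for i in range(2))

# Rotate a feature point 90 degrees about the origin, then shift it by (2, 0):
T = homogeneous_transform(math.pi / 2, 2.0, 0.0)
print(apply(T, (1.0, 0.0)))  # approximately (2.0, 1.0)
```

The same pattern extends to 3-D points with a 4x4 matrix; KONDO's homography matrix H is the projective (plane-to-plane) generalization of this idea.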
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of the combination of LI and Wong to incorporate the step/system of generating a homogeneous transformation matrix with the motion values for calculating an abnormality indicator taught by KONDO. The suggestion/motivation for doing so would have been to improve the accuracy in detection of visual anomalies on the object ([0004] there is room for improvement in the process for determining whether an object has an abnormality). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine LI and Wong with KONDO to obtain the invention as specified in claim 2.

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over GAO, Yuan et al. (U.S. Publication No. 2022/0309836) (hereafter, "GAO, Y") in view of Wong et al. (U.S. Publication No. 2018/0349682) (hereafter, "Wong") and further in view of LI et al. (U.S. Publication No. 2022/0277596) (hereafter, "LI") and GAO, XING et al. (WO2021227360) (hereafter, "GAO, X").

Regarding claim 3, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. The combination of GAO, Y and Wong with LI does not expressly teach wherein detecting anomalies further comprises using a camera projection matrix of a camera configured to capture the video, the camera projection matrix being used to map the obtained three-dimensional points in space to their two-dimensional projections in the obtained image.
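The 3-D-to-2-D mapping recited in this claim 3 limitation corresponds to the standard pinhole camera projection. The sketch below illustrates only that general mapping; the focal length and principal point are assumed values, not parameters from the application or the cited art.

```python
# Illustrative pinhole-projection sketch of the claim 3 limitation: a camera
# projection maps a 3-D point in space to its 2-D pixel coordinates. The
# intrinsics (fx, fy, cx, cy) below are assumed, illustrative values.

def project(point3d, fx=800.0, fy=800.0, cx=320.0, cy=240.0):
    """Map a 3-D camera-frame point (X, Y, Z) to pixel coordinates (u, v)."""
    X, Y, Z = point3d
    if Z <= 0:
        raise ValueError("point must lie in front of the camera")
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return (u, v)

# A point 2 m in front of the camera, slightly right of the optical axis:
print(project((0.1, 0.0, 2.0)))  # -> (360.0, 240.0)
```

GAO, X's PnP step solves the inverse problem: given such 2-D/3-D point correspondences, recover the camera pose matrix.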
However, GAO, X teaches wherein detecting anomalies further comprises using a camera projection matrix of a camera configured to capture the video (Page 10, line 45-49, the RANSAC (Random Sample Consensus) algorithm is used to eliminate false matching points. After the matching of the feature points is completed, the basic matrix and homography matrix of the two-dimensional picture and video frame are obtained, and the matched feature points are screened by the RANSAC algorithm based on the basic matrix and the homography matrix to eliminate the feature points with matching errors; Page 3, line 17-19, The two-dimensional feature point coordinates and the three-dimensional feature point coordinates are substituted into the PnP algorithm and the nonlinear optimization algorithm to obtain the camera pose matrix), the camera projection matrix being used to map the obtained three-dimensional points in space to their two-dimensional projections in the obtained image (Page 11, line 40-41, the PnP (Perspective-n-Point) algorithm is a method for solving 3D to 2D point pair motion, which can be solved by P3P; Page 12, line 13-15, The camera pose matrix, and then optimize the camera parameters through a nonlinear optimization algorithm to obtain focal length information and/or distortion parameters). It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of combination of GAO, Y, Wong and LI to incorporate the step/system of detecting/eliminating false matching points by using camera pose matrix which maps 3D points to 2D image taught by GAO, X. 
The suggestion/motivation for doing so would have been to improve the accuracy of projecting the video frame (Page 5, lines 38-41: without the need for staff to manually configure the camera parameters accurately, video projection efficiency is improved, and by matching the video frame and the two-dimensional picture, the video frame can be projected onto the correct position of the three-dimensional model, effectively improving the video projection effect). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine GAO, Y, Wong, and LI with GAO, X to obtain the invention as specified in claim 3.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over GAO, Yuan et al. (U.S. Publication No. 2022/0309836) (hereafter, "GAO, Y") in view of Wong et al. (U.S. Publication No. 2018/0349682) (hereafter, "Wong"), further in view of LI et al. (U.S. Publication No. 2022/0277596) (hereafter, "LI"), and STEWART et al. (U.S. Publication No. 2021/0327431) (hereafter, "STEWART").

Regarding claim 10, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. LI teaches wherein the provided result is an indication that the video is fake face video data ([0102]: the frequent blink behavior is an abnormal blink behavior, indicating that the quantity of blinks exceeds the quantity of blinks generated by normal physiological activities, and then it is determined that there is a high possibility that the face video data is fake face video data; [0103]: the excessively fast blink behavior is an abnormal blink behavior, indicating that the blink frequency is excessively high and exceeds a blink frequency generated by normal physiological activities, and then it is determined that there is a high possibility that the face video data is fake face video data).
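The blink-frequency heuristic quoted from LI above can be sketched as a simple rate check. This is an illustrative sketch only; the 30 blinks-per-minute threshold and the function name are assumptions, not values from LI.

```python
def blink_anomaly(blink_timestamps, duration_s, max_blinks_per_min=30.0):
    """Flag abnormally frequent blinking in a face video.

    A blink rate well above normal physiological activity suggests the
    face video data may be fake. The 30 blinks/min default is an assumed
    placeholder threshold, not a value taken from the cited reference.
    """
    # Convert the blink count over the clip duration to blinks per minute
    rate_per_min = len(blink_timestamps) / (duration_s / 60.0)
    return rate_per_min > max_blinks_per_min
```

For example, 40 blinks in a 30-second clip (80 blinks/min) would be flagged, while 3 blinks over a minute would not.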
LI does not expressly teach a weighted average of a confidence score of the video being a real human calculated for each of the criteria, where a weight is assigned to each criterion according to a relevance of the criterion. However, STEWART teaches a weighted average of a confidence score of the video being a real human calculated for each of the criteria ([0115]: LipSecure can be used as a 'liveness' check to validate that a real person is present; [0352]: a liveness detection system comprising; [0355]: (iii) a computer vision subsystem configured to analyse the video stream received, and to determine, using a lip reading or viseme processing subsystem, if the end-user has spoken or mimed the or each word, letter, character or digit, and to output a confidence score that the end-user is a "live" person), where a weight is assigned to each criterion according to a relevance of the criterion ([0190]: the confidence scoring algorithm is an adaptively weighted scoring process, which is based on the principle that a selection of visemes and resulting words are more difficult to identify than others).

It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of the combination of GAO, Y, Wong, and LI to incorporate the step/system of producing a confidence score that the user is a real person by using an adaptively weighted scoring process for each of the criteria (visemes) according to the relevance (difficulty of identification) of each criterion, as taught by STEWART. The suggestion/motivation for doing so would have been to prevent spoofing using a video ([0115]: LipSecure is a cloud service, which provides a liveness check to user authentication services to prevent spoofing; [0008]: checking liveness during biometric identification to prevent spoofing using a video or static photograph of a person (a.k.a. 'replay attack')).
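The claimed weighted average of per-criterion confidence scores can be sketched as below. This is an illustrative sketch only; the criterion names and weights are assumptions, not values from STEWART or the application.

```python
def weighted_confidence(scores, weights):
    """Combine per-criterion confidence scores (each in [0, 1]) into a
    single real-human confidence value, weighting each criterion by its
    assigned relevance. Illustrative sketch only."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    # Weighted average: sum(score * weight) / sum(weight)
    return sum(scores[c] * weights[c] for c in scores) / total_weight
```

For example, with assumed scores {"blink": 0.9, "pose": 0.5} and weights {"blink": 2.0, "pose": 1.0}, the combined confidence is (0.9*2.0 + 0.5*1.0) / 3.0, about 0.767.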
Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine GAO, Y, Wong, and LI with STEWART to obtain the invention as specified in claim 10.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over GAO, Yuan et al. (U.S. Publication No. 2022/0309836) (hereafter, "GAO, Y") in view of Wong et al. (U.S. Publication No. 2018/0349682) (hereafter, "Wong"), further in view of LI et al. (U.S. Publication No. 2022/0277596) (hereafter, "LI"), and MICHAELI et al. (U.S. Publication No. 2024/0127630) (hereafter, "MICHAELI").

Regarding claim 13, the combination of GAO, Y and Wong with LI teaches all the limitations of claim 1 above. The combination of GAO, Y and Wong with LI does not expressly teach wherein the provided result is a warning generated to notify a user about a deep fake. However, MICHAELI teaches wherein the provided result is a warning generated to notify a user about a deep fake ([0013]: where any of the sequential analyses detect an anomaly, an alert indicating the presence of deepfake content is generated; [0088]: the alert is an electronic message ... the alert includes one or more pixel locations or coordinates at which there is deepfake content present in an observation of the visual content; [0089]: the alert may be configured to be presented by display in a graphical user interface; [0105]: deepfake detection method 200 adds a warning to the audio-visual content at the frame at which the anomaly occurred to visually indicate the deepfake content).
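The alert behavior quoted from MICHAELI above (an electronic-message warning carrying the frame and pixel coordinates of the suspected deepfake content) could be sketched as follows. The function name and message format are assumptions for illustration, not from the reference.

```python
def make_deepfake_alerts(anomalies):
    """Turn detected anomalies into user-facing warning messages.

    Each anomaly is an assumed (frame_index, (x, y)) pair identifying
    where suspected deepfake content appears; the output mirrors the
    idea of an electronic-message alert that includes pixel coordinates.
    Illustrative sketch only.
    """
    return [
        f"WARNING: possible deepfake content at frame {frame}, pixel {xy}"
        for frame, xy in anomalies
    ]
```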
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of the combination of GAO, Y, Wong, and LI to incorporate the step/system of generating an alert upon detecting deepfake content, as taught by MICHAELI. The suggestion/motivation for doing so would have been to improve the accuracy of identifying deepfake content ([0018]: the ability to analyze the audio-video at fine resolutions in real time enables more accurate and more sensitive identification of deepfake content). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine GAO, Y, Wong, and LI with MICHAELI to obtain the invention as specified in claim 13.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL C. CHANG, whose telephone number is (571) 270-1277. The examiner can normally be reached Monday-Thursday and alternate Fridays, 8:00-5:00. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Chan S. Park, can be reached at (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/DANIEL C CHANG/
Examiner, Art Unit 2669

/CHAN S PARK/
Supervisory Patent Examiner, Art Unit 2669

Prosecution Timeline

Feb 16, 2024
Application Filed
Feb 02, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592097
REAL-TIME, FINE-RESOLUTION HUMAN INTRA-GAIT PATTERN RECOGNITION BASED ON DEEP LEARNING MODELS
2y 5m to grant Granted Mar 31, 2026
Patent 12579672
STEREO VISION-BASED HEIGHT CLEARANCE DETECTION
2y 5m to grant Granted Mar 17, 2026
Patent 12573047
Control Method, Device, Equipment and Storage Medium for Interactive Reproduction of Target Object
2y 5m to grant Granted Mar 10, 2026
Patent 12548296
Spatially Preserving Flattening in Deep Learning Neural Networks
2y 5m to grant Granted Feb 10, 2026
Patent 12541868
Image Registration Method and Apparatus, Electronic Apparatus, and Storage Medium
2y 5m to grant Granted Feb 03, 2026
Based on the 5 most recent grants by this examiner.


Prosecution Projections

1-2
Expected OA Rounds
89%
Grant Probability
99%
With Interview (+11.7%)
2y 6m
Median Time to Grant
Low
PTA Risk
Based on 132 resolved cases by this examiner. Grant probability derived from career allow rate.
