DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This is in response to applicant's amendment/response filed on 02/18/2026, which has been entered and made of record. Claims 1, 4, and 8-11 have been amended. Claims 2-3, 7, and 12 have been cancelled. Claims 1, 4-6, 8-11, and 13-16 are pending in the application; claims 13-16 remain withdrawn.
The amendment to the specification is acceptable and has been entered.
Response to Arguments
Applicant's arguments filed on 02/18/2026 have been fully considered but are moot in view of the new grounds of rejection presented below, which were necessitated by the amendment to claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 5, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 9,538,130 to Ford et al. in view of U.S. PGPub 2022/0386759 to Fu et al., further in view of U.S. PGPub 2021/0144338 to Astarabadi et al.
Regarding claim 1, Ford et al. teach a method of applying effects to a sequence of input video frames to define a sequence of output video frames of a video chat, a video conference or a teleconsultation (abstract, “ A computing device participating in a video conference may determine that a frame of a video stream includes features of a face, extract a portion of the frame that includes a first pair of eyes, and determine that the first pair of eyes are looking in a non-forward direction. The computing device may retrieve, from a database, a stored portion that includes a second pair of eyes that are looking in a forward direction, and modify the frame by substituting the stored portion for the portion in the frame to create a modified frame”), the method comprising:
receiving the sequence of frames comprising a first frame followed by a second frame (col 3:1-26, “the software application may capture multiple video frames from the imaging device. The software application may store one or more of the captured video frames for use in the video conference”, col 9:5-25, “The camera 110(1) may capture a series of frames of a video stream, such as frames 202(1), 202(2), 202(3), 202(4), 202(5), and 202(6). The matcher 120(1) may determine whether each of the frames 202 includes a particular part of a face (e.g., one eye or both eyes) that specifies particular criteria (e.g., each eye appears to be looking forward)”);
processing the first frame to determine face landmarks for a user's face in the first frame (col 3:58-67 and col 4:1-5, “During the video conference, the software application may capture frames in a video stream, identify those frames in which the participant does not appear to be gazing forward, and blend a previously extracted portion of a frame that includes the participant's eyes (e.g., gazing at the camera) with the frames to create modified frames in which the participant appears to be gazing forward (e.g. at the other participants in the video conference)”, col 10:56-67 and col 11:1-29, “in FIG. 1, in response to determining (e.g., using a classifier) that the frame 114(1) includes the face of a human, the decomposer 118(1) may decompose the frame 114(1) into the portions 116(1). At least one of the portions 116(1) may include a part (e.g., an eye, a pair of eyes, a mouth, a nose, an eyebrow, a forehead, etc.) of a face. The decomposer 118(1) may determine whether one or more of the portions 116(1) that include a particular part of a face satisfy one or more criteria …. in FIG. 1, one or more of the stored portions 126(1) may be combined with the frame 114(1) or the portions 116(1) to create the modified frame 123(1) in which the eyes appear to be looking forward.”, col 13:59-67 and col 14:1-52, “the classifier may be trained to recognize facial features to determine whether a frame includes a face of a human being …. the frame may be modified by substituting the portion (e.g., in which the eyes do not appear to be looking forward) with the stored portion (e.g., in which the eyes appear to be looking forward) to create a modified frame. At 514, the modified frame may be sent to a server, and the process may proceed to 502 to receive a next frame. For example, in FIG. 1, if the classifier determines that the frame 114(1) includes a first pair of eyes that do not appear to be looking forward, the matcher 120(1) may identify one (or more) of the stored portions 126(1) that include a second pair of eyes that appear to be looking in the forward direction. The frame modifier 122(1) may modify the frame 114(1) using one (or more) of the stored portions 126(1) to create the modified frame 123(1)”);
processing the second frame to determine face landmarks for the user's face in the second frame (col 10:56-67 and col 11:1-29, “in FIG. 1, in response to determining (e.g., using a classifier) that the frame 114(1) includes the face of a human, the decomposer 118(1) may decompose the frame 114(1) into the portions 116(1). At least one of the portions 116(1) may include a part (e.g., an eye, a pair of eyes, a mouth, a nose, an eyebrow, a forehead, etc.) of a face. The decomposer 118(1) may determine whether one or more of the portions 116(1) that include a particular part of a face satisfy one or more criteria …. in FIG. 1, one or more of the stored portions 126(1) may be combined with the frame 114(1) or the portions 116(1) to create the modified frame 123(1) in which the eyes appear to be looking forward.”, col 13:59-67 and col 14:1-52, “the classifier may be trained to recognize facial features to determine whether a frame includes a face of a human being …. the frame may be modified by substituting the portion (e.g., in which the eyes do not appear to be looking forward) with the stored portion (e.g., in which the eyes appear to be looking forward) to create a modified frame. At 514, the modified frame may be sent to a server, and the process may proceed to 502 to receive a next frame. For example, in FIG. 1, if the classifier determines that the frame 114(1) includes a first pair of eyes that do not appear to be looking forward, the matcher 120(1) may identify one (or more) of the stored portions 126(1) that include a second pair of eyes that appear to be looking in the forward direction. The frame modifier 122(1) may modify the frame 114(1) using one (or more) of the stored portions 126(1) to create the modified frame 123(1)”, col 11:22-54, “At 314, a modified frame may be created based on the frame and the stored portions. At 316, the modified frame may be sent to the server, and the process may proceed to 302, to receive a next frame”, col 14:23-51, “the modified frame may be sent to a server, and the process may proceed to 502 to receive a next frame”); and
providing the output frame for the video chat, the video conference, or the teleconsultation (col 14:23-51, “The computing device 102(1) may send a video stream that includes multiple frames to the server 104 for distribution to other participating devices (e.g., the computing device 102(N))”).
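For discussion purposes only, Ford's portion-substitution technique may be summarized by the following minimal sketch; the frame is assumed to be a NumPy image array, and the inputs eye_box, eyes_forward, and stored_eye_patch are hypothetical names that do not appear in the reference:

    def substitute_stored_portion(frame, eye_box, eyes_forward, stored_eye_patch):
        # Illustrative gaze correction in the style of Ford et al.: when the
        # extracted eye region is not gazing forward, paste a stored
        # forward-gaze portion over it to create the modified frame.
        x, y, w, h = eye_box
        if not eyes_forward:
            frame = frame.copy()
            frame[y:y + h, x:x + w] = stored_eye_patch  # substitute the stored portion
        return frame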
Ford et al., however, do not teach applying one or more makeup effects to the user's face in the first frame to define an output frame, the one or more effects applied relative to at least some of the face landmarks, wherein the one or more makeup effects simulate one or more makeup products applied to the user's face; wherein processing the first frame to determine face landmarks comprises detecting an occlusion of the user's face, and wherein applying the one or more makeup effects to the user's face comprises selectively hiding or rendering one or more parts of the one or more makeup effects based on the occlusion.
In related endeavor, Fu et al. teach applying one or more makeup effects to the user's face in the first frame to define an output frame, the one or more effects applied relative to at least some of the face landmarks (abstract, “the method comprising providing a facial image of a user with makeups being applied thereto, locating facial landmarks from the facial image of the user in one or more regions…. The disclosure also provides systems and methods for virtually generating output effects on an input image having a face, for creating dynamic texturing to a lip region of a facial image, for a virtual eye makeup add-on that may include multiple layers, a makeup recommendation system based on a trained neural network model, a method for providing a virtual makeup tutorial, a method for fast facial detection and landmark tracking which may also reduce lag associated with fast movement and to reduce shaking from lack of movement, a method of adjusting brightness and of calibrating a color and a method for advanced landmark location and feature detection using a Gaussian mixture model”, par 0024-0025, “generating an output effect on an input image having a face, comprising: (a) providing a facial image of a user with facial landmarks; (b) locating the facial landmarks from the facial image of the user, wherein the facial landmarks include a first region, and wherein the landmarks associated with the first region are associated with lips of the facial image having a lip color and the first region includes a lip region; (c) converting the lip region of the image into at least one color channel and detecting and analyzing a light distribution of the lip region; (d) feeding the at least one color channel into histogram matching over a varying light distribution to identify a histogram having a pre-defined light distribution that varies from the light distribution of the lip region thereby generating at least one output effect; and (e) combining the output effect with the first image to provide a resultant image having the lip color and the at least one output effect applied to the lip”, par 0125, “landmark detection techniques which use landmarks for facial feature extraction, and particularly preferred for use with lip region extraction, are enhanced to take into account situations wherein an input image may include difficult to detect facial regions, particularly lips such as those having lip gestures (puckered kiss face or a large distorted smile) or lips having occlusions within the lip region (finger tips, teeth, tongue or any object cover the lips)”), wherein the one or more makeup effects simulate one or more makeup products applied to the user's face (abstract, “The present disclosure provides systems and methods for virtual facial makeup simulation through virtual makeup removal and virtual makeup add-ons, virtual end effects and simulated textures”, par 0206, “ images with makeup (that are annotated as noted above) are also used as templates for lip color annotations and FIG. 10C shows related lip texture annotations (output effects) with the colors labeled as shown and the output effects or textures identified with a discrete corresponding value”, par 0207-0215, “The make-up recommendation 7050 can be derived from a makeup recommender 7020 from the trained system and models such as trained models 4040, although a separate trained model may be created solely for use with a recommendation system. 
Product matching 7030 can also be used using a makeup product database, which may be the same or different from the makeup database 7045 (as shown in FIG. 11, it is the same database)”); processing the second frame to determine face landmarks for the user's face in the second frame (par 0037, “generating a set of candidate key frames based on frame differences, color histograms and/or camera motion, and selecting final key frames based on a set of criteria and whether a different type of makeup on a prior or next frame “, par 0212, “generating a set of candidate key-frames 7020b using general video key-frame detection methods based on frame differences, color histograms, and/or camera motion; selecting the final key-frames based on specified makeup-related criteria, e.g., frontal face, face occlusion, hand motion, and/or face expression (usually having a smile), and whether there exists different makeup between its prior or next key-frames “, par 0220, “when an input in the form of a video frame having a face 8010 is input to the system 8000 for detecting the face and adjusting the brightness, a facial landmark detection algorithm is used to detect the face region and facial landmarks from the input image in step 8020, so as to obtain the face position and shapes in the image. Then, the system uses a skin color estimator 8030 based on the landmark information from the image to estimate the normalized skin color of the face. In the other path 8040, with the facial landmark detected, the system assigns different weighting factors to the face region, image center region, and the border region, and then calculates the average brightness of the image 8045”); wherein processing the first frame to determine face landmarks comprises detecting an occlusion of the user's face (par 0125, “ landmark detection techniques which use landmarks for facial feature extraction, and particularly preferred for use with lip region extraction, are enhanced to take into account situations wherein an input image may include difficult to detect facial regions, particularly lips such as those having lip gestures (puckered kiss face or a large distorted smile) or lips having occlusions within the lip region (finger tips, teeth, tongue or any object cover the lips). In such situations use of only landmarks does not typically provide an accurate facial region, such as an accurate lip region. The present embodiment utilizes color information to further improve landmark detection results to obtain and detect an optimal facial region, such as a preferred optimal lip region”, par 0212, “generating a set of candidate key-frames 7020b using general video key-frame detection methods based on frame differences, color histograms, and/or camera motion; selecting the final key-frames based on specified makeup-related criteria, e.g., frontal face, face occlusion, hand motion, and/or face expression (usually having a smile), and whether there exists different makeup between its prior or next key-frames”), and wherein applying the one or more makeup effects to the user's face comprises selectively hiding or rendering one or more parts of the one or more makeup effects based on the occlusion (par 0129, “This method can be used in a makeup removal method for replacing a colored lip with a plain lip or in a makeup add-on method to remove an existing lip region and replace it with another colored lip region. 
The goal of this method is to refine the lip region based on a landmark parsing result, since in many cases a landmark detection may not provide a true lip region, particularly based on distortion or occlusion”).
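For discussion purposes only, the selective hiding described by Fu et al. may be approximated as an alpha blend of a rendered makeup layer gated by a per-pixel occlusion map; this minimal sketch is not drawn from the reference, and all input names are hypothetical:

    import numpy as np

    def apply_makeup_with_occlusion(frame, makeup_layer, makeup_alpha, occlusion_mask):
        # frame, makeup_layer: HxWx3 uint8 images; makeup_alpha and
        # occlusion_mask: HxW floats in [0, 1], with occlusion_mask = 1.0
        # where an occluding object covers the face.
        visible = makeup_alpha * (1.0 - occlusion_mask)  # hide the effect under the occluder
        out = (frame.astype(np.float32) * (1.0 - visible[..., None])
               + makeup_layer.astype(np.float32) * visible[..., None])
        return out.astype(np.uint8)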
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ford et al. to include applying one or more makeup effects to the user's face in the first frame to define an output frame, the one or more effects applied relative to at least some of the face landmarks, wherein the one or more makeup effects simulate one or more makeup products applied to the user's face; wherein processing the first frame to determine face landmarks comprises detecting an occlusion of the user's face, and wherein applying the one or more makeup effects to the user's face comprises selectively hiding or rendering one or more parts of the one or more makeup effects based on the occlusion, as taught by Fu et al., in order to improve virtual facial makeup simulation (including virtual makeup tutorials, makeup recommendations, automatic brightness adjustment and color calibration using a color map and standard, and a framework for fast facial landmark detection and tracking) that leads to true, consistent, and realistic results.
Ford et al. as modified by Fu et al., however, do not teach, in parallel to applying the one or more makeup effects to the user's face in the first frame, processing the second frame to determine face landmarks for the user's face in the second frame.
In related endeavor, Astarabadi et al. teach applying one or more makeup effects to the user's face in the first frame to define an output frame, the one or more effects applied relative to at least some of the face landmarks (par 0034, “The first instance of the application can also prompt the first user to select or activate augmentation schema, such as: a virtual accessory schema (e.g., glasses); a virtual makeup schema (e.g., lipstick, blush, eyeliner, eyelashes); a virtual grooming schema (e.g., eyebrow profiles, beard profiles, haircuts and styles); a virtual jewelry schema (e.g., earrings, a nose ring, a lip ring); and/or virtual clothing schema (e.g., a suit and tie, business causal dress, casual dress, beachwear, athletic wear, sleepwear). The first instance of the application can then return a command to the second device to inject virtual representations of glasses, makeup schema, etc.—thus selected by the user—into the synthetic video feed generated at the second device during the upcoming video call”, par 0085, “ the remote computer system can train the conditional generative adversarial network to output a synthetic face image based on a set of input conditions, including: a facial landmark container, which captures relative locations (and/or sizes, orientations) of facial landmarks that represent a facial expression; and a face model, which contains a (pseudo-) unique set of coefficients characterizing a unique human face and secondary physiognomic features (e.g., face shape, skin tone, facial hair, makeup, freckles, wrinkles, eye color, hair color, hair style, and/or jewelry)”, par 0099, “the device can execute the foregoing process to tune coefficients within a face model for the user such that insertion of this face model and the facial landmark container—extracted from the authentic face image—into the synthetic face generator produces a realistic approximation of the facial expression, face shape, skin tone, facial hair, makeup, freckles, wrinkles, eye color, hair color, hair style, and/or jewelry, etc. depicted in the authentic face image”, par 0117-0119, “Insertion of this look model and a first facial landmark container—extracted from a look image—into the synthetic face generator produces a realistic approximation of the facial expression, face shape, skin tone, facial hair, makeup, freckles, wrinkles, eye color, hair color, hair style, and/or jewelry, etc. depicted in the look image”), wherein the one or more makeup effects simulate one or more makeup products applied to the user's face (par 0034, “The first instance of the application can also prompt the first user to select or activate augmentation schema, such as: a virtual accessory schema (e.g., glasses); a virtual makeup schema (e.g., lipstick, blush, eyeliner, eyelashes); a virtual grooming schema (e.g., eyebrow profiles, beard profiles, haircuts and styles); a virtual jewelry schema (e.g., earrings, a nose ring, a lip ring); and/or virtual clothing schema (e.g., a suit and tie, business causal dress, casual dress, beachwear, athletic wear, sleepwear). 
The first instance of the application can then return a command to the second device to inject virtual representations of glasses, makeup schema, etc.—thus selected by the user—into the synthetic video feed generated at the second device during the upcoming video call”, par 0117-0119, “ insertion of this look model and a different facial landmark container—such as extracted from a video frame captured by the device during a later video call—into the synthetic face generator produces a realistic approximation of: the face shape, skin tone, facial hair, makeup, freckles, wrinkles, eye color, hair color, hair style, and/or jewelry, etc. depicted in the look image; and the facial expression depicted in the video frame”); in parallel to applying the one or more makeup effects to the user's face in the first frame, processing the second frame to determine face landmarks for the user's face in the second frame (par 0066, “rendering the first synthetic face image at a second time in Block S260; outputting the first audio packet at approximately (e.g., within 50 milliseconds of) the second time in Block S262; capturing a second video feed in Block S210; for a second frame, in the second video feed, captured at approximately (e.g., within one second of) the first time, detecting a second constellation of facial landmarks in the second frame in Block S220 and representing the second constellation of facial landmarks in a second facial landmark container in Block S222; and transmitting the second facial landmark container to the first device in Block S230”, par 0112-0113, “Insertion of this face model and a first facial landmark container—extracted from a first frame in this set—into the synthetic face generator produces a first realistic approximation of the facial expression, face shape, skin tone, facial hair, makeup, freckles, wrinkles, eye color, hair color, hair style, and/or jewelry, etc. depicted in the first frame. Similarly, insertion of this face model and a second facial landmark container—extracted from a second frame in this set—into the synthetic face generator produces a second realistic approximation of the facial expression, face shape, skin tone, facial hair, makeup, freckles, wrinkles, eye color, hair color, hair style, and/or jewelry, etc. depicted in the second frame”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ford et al. as modified by Fu et al. to include, in parallel to applying the one or more makeup effects to the user's face in the first frame, processing the second frame to determine face landmarks for the user's face in the second frame, as taught by Astarabadi et al., in order to implement facial deconstruction and facial reconstruction models (such as models trained on a population of users, or on the first user specifically, based on deep learning or artificial intelligence techniques) that rapidly decompose a first video feed recorded at the first device into a first facial landmark feed and reconstruct this first facial landmark feed into a first synthetic video depicting the highest-import content from the first video feed (i.e., the first user's face).
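For discussion purposes only, the claimed parallelism may be illustrated by dispatching landmark detection for the next frame on a worker thread while the current frame is being rendered; this sketch assumes hypothetical detect_landmarks and apply_effects callables and is not drawn from Astarabadi et al.:

    from concurrent.futures import ThreadPoolExecutor

    def run_pipeline(frames, detect_landmarks, apply_effects):
        outputs = []
        with ThreadPoolExecutor(max_workers=1) as pool:
            pending = pool.submit(detect_landmarks, frames[0])
            for i, frame in enumerate(frames):
                landmarks = pending.result()
                # Start landmark detection for the second frame before the
                # first frame is rendered, so the two steps run in parallel.
                if i + 1 < len(frames):
                    pending = pool.submit(detect_landmarks, frames[i + 1])
                outputs.append(apply_effects(frame, landmarks))
        return outputs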
Regarding claim 5, Ford et al. as modified by Fu et al. and Astarabadi et al. teach all the limitations of claim 1, and Ford et al. further teach wherein the method is performed by a video chat or video conferencing application (abstract).
Regarding claim 9, Ford et al. as modified by Fu et al. and Astarabadi et al. teach all the limitations of claim 1, and Fu et al. further teach wherein detecting the occlusion of the user's face comprises using a deep neural network to classify, localize, or segment an occluding object (par 0264-0276, “After the landmarks are generated, a 3-layer Neural Network Model is used as a correctness validation model 3070 to filter the wrong shapes. The neural network layers are preferably a convolution layer, an up-sample layer and a mapping layer…. the correlation coefficient may also be used to classify which landmarks are occluded …. The new framework is even smarter since it has the patches coefficient match module to detect which landmark is occluded”).
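For discussion purposes only, per-pixel occlusion segmentation with a deep neural network may be sketched as follows; this toy model is not the three-layer validation network described by Fu et al., and its architecture is purely illustrative:

    import torch
    import torch.nn as nn

    class OcclusionSegmenter(nn.Module):
        # Toy per-pixel occlusion classifier: maps an RGB image to an
        # occlusion-probability map that can gate makeup rendering.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, 1),  # one-channel occlusion logit map
            )

        def forward(self, rgb):
            # rgb: Nx3xHxW float tensor; returns NxHxW occlusion probabilities.
            return torch.sigmoid(self.net(rgb)).squeeze(1)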
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 9,538,130 to Ford et al. in view of U.S. PGPub 2022/0386759 to Fu et al., further in view of U.S. PGPub 2021/0144338 to Astarabadi et al., further in view of U.S. PGPub 2023/0215118 to Causse et al.
Regarding claim 4, Ford et al. as modified by Fu et al. and Astarabadi et al. teach all the limitations of claim 1, and Fu et al. further teach providing a user interface presenting a plurality of makeup effects associated with respective products for selection through user input (par 0206, “images with makeup (that are annotated as noted above) are also used as templates for lip color annotations and FIG. 10C shows related lip texture annotations (output effects) with the colors labeled as shown and the output effects or textures identified with a discrete corresponding value”, par 0207-0215, “The make-up recommendation 7050 can be derived from a makeup recommender 7020 from the trained system and models such as trained models 4040, although a separate trained model may be created solely for use with a recommendation system. Product matching 7030 can also be used using a makeup product database, which may be the same or different from the makeup database 7045 (as shown in FIG. 11, it is the same database)”), but do not teach the user interface configured to provide access to an e-commerce interface to conduct a product purchase transaction.
In related endeavor, Causse et al. further teach further comprising providing a user interface presenting a plurality of makeup effects associated with respective products for selection through user input, the user interface configured to provide access to an e-commerce interface to conduct a product purchase transaction (par 0076, “The database 120 also stores data of products in a product table 316, which enables the product catalog system 124 to perform operations related to providing an augmented reality experience with respect to a product (e.g., a given physical item that may be available for purchase or sale)”, par 0123, “imaging processing algorithms and recognition techniques may be used to detect the user's face in the image. Based on the selected AR content generator, the augmented reality content generator module 706 can generate and render an AR experience based on the selected AR content generator from the carousel interface for display on a given client device (e.g., the client device 102)”, Fig 11, par 0149-0155, “a SKU selector for a product(s) [0151] a call to action button for which different product URLs can be linked [0152] a display of relevant purchasing information (e.g., price, SKU name, merchant, and the like) [0153] a way to favorite, save, or bookmark individual products shown in the AR content generator or experience …. additional information 1150 related to the product (e.g., a description or product name) can be included in the interface 1100. To revert to the interface 1000, an input can be received anywhere outside of additional information 1150 or product information 1160”, par 0158, “The AR content generator can then render for display AR content upon a user's facial features based on a corresponding product (e.g., a beauty product such as lip stick, makeup, and the like) of the product card.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ford et al. as modified by Fu et al. and Astarabadi et al. to include providing a user interface presenting a plurality of makeup effects associated with respective products for selection through user input, the user interface configured to provide access to an e-commerce interface to conduct a product purchase transaction, as taught by Causse et al., in order to apply AR content related to a product to a representation of the user's face or other part of the user's body (e.g., arm, leg, and the like), render and display how a beauty product would appear on the representation of the user's face on a given client device, and thereby provide augmented reality experiences of products that can be purchased.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 9,538,130 to Ford et al. in view of U.S. PGPub 2022/0386759 to Fu et al., further in view of U.S. PGPub 2021/0144338 to Astarabadi et al., further in view of U.S. PGPub 2022/0207802 to Troutman et al.
Regarding claim 6, Ford et al. as modified by Fu et al. and Astarabadi et al. teach all the limitations of claim 1, but do not teach wherein the method is performed by a teleconsultation application.
In related endeavor, Troutman et al. teach wherein the method is performed by a teleconsultation application (par 0008, par 0078, par 0108, “Once the mobile application is started, the User 530 may be given a choice of using the Digital Makeup Artist 520 for teaching in a tutorial or providing advice in a makeup consultation session (to be described later)”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ford et al. as modified by Fu et al. and Astarabadi et al. to include wherein the method is performed by a teleconsultation application as taught by Troutman et al., in order to analyze the user's face image to determine facial characteristics and to generate image frames displayed in synchronization with the interaction with the digital makeup artist, providing advice based on the analyzed face image, the needs of the user, the stored cosmetic routine information, common makeup looks, cosmetic products for skin types and ethnicity, and the user's look preferences.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 9,538,130 to Ford et al. in view of U.S. PGPub 2022/0386759 to Fu et al., further in view of U.S. PGPub 2021/0144338 to Astarabadi et al., further in view of U.S. PGPub 2021/0158021 to Wu et al.
Regarding claim 8, Ford et al. as modified by Fu et al. and Astarabadi et al. teach all the limitations of claim 1, but do not teach wherein selectively hiding or rendering one or more parts of the one or more makeup effects based on the occlusion comprises representing an occluding object with occlusion mask information at a pixel level according to the occlusion as detected to render the one or more makeup effects in portions of the user's face that are not occluded and hide or not render the one or more makeup effects in portions of the user's face that are occluded.
In related endeavor, Wu et al. teach wherein selectively hiding or rendering one or more parts of the one or more makeup effects based on the occlusion comprises representing an occluding object with occlusion mask information at a pixel level according to the occlusion as detected to render the one or more makeup effects in portions of the user's face that are not occluded and hide or not render the one or more makeup effects in portions of the user's face that are occluded (par 0006, “acquiring an occlusion mask of the target face image, wherein the occlusion mask is configured to indicate a face visible area which is not subject to an occluder and a face invisible area which is subject to the occluder in the target face image; and generating a second fusion image based on the occlusion mask and the first fusion image”, par 0033-0035, “An occlusion mask is configured to distinguish a face visible area not occluded by the occluder (that is, not occluded) from a face invisible area occluded by the occluder (that is, having the occluder) in the face image. As an example, pixels taking a first value in the occlusion mask indicates the face visible area, and pixels taking a second value indicates the face invisible area”, par 0041-0043, “the face visible area not occluded by the occluder in the face image may be distinguished from the face invisible area occluded by the occluder based the image semantic segmentation …. the virtual special effect includes, but is not limited to, virtual makeup or virtual accessories. For example, the virtual makeup or virtual accessories selected by the user may be fused with face parts matched in the face image”, par 0047-0049, “Since the user face in the face image is not occluded by the occluder, the virtual lipstick special effect can be completely fitted to the lip area. However, with respect to FIG. 3, after performing the face key point detection and fitting a lip area based on the face key point detection result, the virtual lipstick special effect is finally rendered on the user's finger since the user's face in the face image is occluded by the finger”, par 0083-0086, “by taking that the face part the user wants to apply makeup is lips and the virtual special effect is lipstick as an example, an occluder (specifically, the user's finger), as shown in FIG. 8, appears in the face image and partially covers the lips. Thus, the lipstick special effect may be directly added to the lip area after fitting the lip area requiring the makeup based on the face key point detection result obtained by performing the face key detection on the face image”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ford et al. as modified by Fu et al. and Astarabadi et al. to include wherein selectively hiding or rendering one or more parts of the one or more makeup effects based on the occlusion comprises representing an occluding object with occlusion mask information at a pixel level according to the occlusion as detected to render the one or more makeup effects in portions of the user's face that are not occluded and hide or not render the one or more makeup effects in portions of the user's face that are occluded, as taught by Wu et al., in order to enhance the beauty of the image with a virtual special effect.
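For discussion purposes only, the pixel-level masking described by Wu et al. may be sketched as rendering an effect only where its fitted facial region is visible; the inputs region_mask and occlusion_mask are hypothetical boolean maps, and the sketch is not drawn from the reference:

    import numpy as np

    def render_effect_region(frame, effect_color, region_mask, occlusion_mask, alpha=0.6):
        # region_mask: HxW bool, the fitted facial region (e.g., the lips);
        # occlusion_mask: HxW bool, True where an occluder (e.g., a finger)
        # covers the face. The effect is painted only on visible pixels, so
        # it is never rendered on top of the occluding object.
        paint = region_mask & ~occlusion_mask
        out = frame.astype(np.float32)
        out[paint] = (1.0 - alpha) * out[paint] + alpha * np.asarray(effect_color, np.float32)
        return out.astype(np.uint8)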
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 9,538,130 to Ford et al. in view of U.S. PGPub 2022/0386759 to Fu et al., further in view of U.S. PGPub 2021/0144338 to Astarabadi et al., further in view of U.S. PGPub 2021/0165998 to Cao et al.
Regarding claim 10, Ford et al. as modified by Fu et al. and Astarabadi et al. teach all the limitations of claim 1, but do not teach stabilizing object landmarks for the second frame in accordance with a prediction of the location of the object landmarks for the second frame using an optical flow function.
In related endeavor, Cao et al. teach further comprising stabilizing object landmarks for the second frame in accordance with a prediction of the location of the object landmarks for the second frame using an optical flow function (par 0024-0027, “Embodiments of the present disclosure describe systems and methods for improving model expressiveness and rigid stability for real-time monocular object tracking using region-based models. This approach may incorporate dense motion-guided correctives from fast optical flow to improve tracking fidelity and reduce residual expression-fitting errors while improving rigid stability in a joint optimization framework for rigid object pose and expression parameters”, par 0063-0065, “the detected landmarks are too sparse to recover the complete motion of the face, especially in regions where landmarks are absent (e.g., the cheek regions). In such circumstances, besides landmark locations, other denser motion cues may be leveraged to extract true local motion and to correct landmark detection errors. In on example, a fast optical flow estimation method is employed on the input video stream inside the face region on-the-fly to extract dense motion flow and then map this motion flow to each face vertex projection in the screen space through bilinear interpolation, annotated by U.sub.i. Given rigid pose T′ and expression coefficients B′ from a previous frame, the L.sub.2 norm of the flow residuals, e.sub.flow.sup.k between the current projections of each face vertex I and the flow-predicted locations P.sup.k(T′, B′.sup.k).sub.i+U.sub.i, should be minimized ….. The dynamic rigidity weights w.sup.k with current dense motion flow U may be used to enforce stronger stabilization to still frames while relaxing restrictions on fast moving frames”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ford et al. as modified by Fu et al. and Astarabadi et al. to include stabilizing object landmarks for the second frame in accordance with a prediction of the location of the object landmarks for the second frame using an optical flow function, as taught by Cao et al., in order to incorporate dense motion-guided correctives from fast optical flow that improve tracking fidelity and reduce residual expression-fitting errors while improving rigid stability in a joint optimization framework for rigid object pose and expression parameters, enabling real-time object tracking with improved accuracy and stability compared to previous systems.
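For discussion purposes only, flow-based landmark stabilization may be sketched with OpenCV's pyramidal Lucas-Kanade tracker; the fixed blend weight is an illustrative choice, and the sketch does not reproduce Cao's joint optimization framework:

    import cv2
    import numpy as np

    def stabilize_landmarks(prev_gray, curr_gray, prev_pts, detected_pts, blend=0.5):
        # prev_pts, detected_pts: Nx2 landmark coordinates for the first and
        # second frames. Optical flow predicts where the first frame's
        # landmarks moved; blending the prediction with the fresh detections
        # damps frame-to-frame jitter.
        flow_pts, status, _err = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_pts.astype(np.float32).reshape(-1, 1, 2), None)
        flow_pts = flow_pts.reshape(-1, 2)
        ok = status.reshape(-1).astype(bool)
        out = detected_pts.astype(np.float32).copy()
        out[ok] = blend * flow_pts[ok] + (1.0 - blend) * out[ok]
        return out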
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 9,538,130 to Ford et al. in view of U.S. PGPub 2022/0386759 to Fu et al., further in view of U.S. PGPub 2021/0144338 to Astarabadi et al., further in view of U.S. PGPub 2022/0139107 to Cheng et al.
Regarding claim 11, Ford et al. as modified by Fu et al. and Astarabadi et al. teach all the limitations of claim 1, but do not teach computing an optical flow function in relation to the second frame for predicting locations of the face landmarks within the second frame responsive to locations in the first frame, determining an optical flow error for the second frame, and skipping a detecting of pixel locations for the second frame responsive to the optical flow error and using pixel locations responsive to the optical flow function.
In related endeavor, Cheng et al. further teach further comprises computing an optical flow function in relation to the second frame for predicting locations of the face landmarks within the second frame responsive to locations in the first frame, determining an optical flow error for the second frame, skipping a detecting of pixel locations for the second frame responsive to the optical flow error and using pixel locations responsive to the optical flow function (par 0016, “ the first image and the second image may be images in a stream of images and/or a video stream from a camera which is providing a live video stream of images that include a face. The facial detection engine uses a facial detection model, and the like to detect facial landmarks in each of the images, and the optical flow landmark engine uses an optical flow model to predict the movement of the facial landmarks from image to image. In particular, for the first image and the second image, for a given facial landmark, a detected facial landmark position is detected, and an optical flow landmark position is determined”, Fig 6, par 0083-0099, “Denote p.sub.m,(1,2) as the mth facial landmark in the second image 602, obtained by applying an optical flow model (e.g., using the optical flow landmark engine 120, and the like) on p.sub.m,1, from the first image 601 to the second image 602. …. In the first image 601, there are p.sub.m,1 and p.sub.m,((1,2),1) and in second image 602, there are p.sub.m,2 and p.sub.m,(1,2), where p.sub.m,1 and p.sub.m,2 are determined using facial detection (e.g., using the landmark detection engine 110, and the like), and p.sub.m,((1,2),1) and p.sub.m,(1,2) are determined using an optical flow model detection (e.g., using the optical flow landmark engine 120, and the like) …. Define an outer distance of between eyes (not depicted) of the face 603 as the distance between two facial landmarks (e.g., with reference to FIG. 2, a distance between two facial landmarks p.sub.m,1 which correspond to LM37 and LM46 of the facial landmarks 200 may be determined) as determined using the facial detection (e.g., using the optical flow landmark engine 120, and the like) in the first image 601”, par 0110, “n Equation (3), p.sub.m,3(smoothed) is the set of smoothed temporal landmarks for a third image, p.sub.m,(1,2,3) is the set of optical flow landmark positions for the third image determined using optical flow of a detected optical landmark from a first image to a second image to the third image, p.sub.m,(2,3) is the set of optical flow landmark positions for the third image determined using optical flow of a detected optical landmark from the second image to the third image, and p.sub.m,3 is the set of detected landmark positions for the third image. In Equation (3), α,β and γ are respective weights determined using differences between detected landmark positions in the first image and the second image, and sigmoid functions, similar to Equation (1)”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Ford et al. as modified by Fu et al. and Astarabadi et al. to include computing an optical flow function in relation to the second frame for predicting locations of the face landmarks within the second frame responsive to locations in the first frame, determining an optical flow error for the second frame, and skipping a detecting of pixel locations for the second frame responsive to the optical flow error and using pixel locations responsive to the optical flow function, as taught by Cheng et al., in order to determine the landmark positions in the images based on optical flow of the landmarks and a neural network model between the images, thereby accurately determining facial expressions and/or emotions based on the landmark tracking from image to image in a video stream.
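For discussion purposes only, skipping detection based on an optical flow error may be sketched as follows; detect_fn is a hypothetical full landmark detector, and the error threshold is an illustrative value, not one taken from Cheng et al.:

    import cv2
    import numpy as np

    def track_or_detect(prev_gray, curr_gray, prev_pts, detect_fn, err_thresh=3.0):
        # Predict the second frame's landmark locations by optical flow and
        # measure the per-point tracking error; when every point tracks with
        # a small error, detection is skipped and the flow-predicted
        # locations are used instead.
        flow_pts, status, err = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_pts.astype(np.float32).reshape(-1, 1, 2), None)
        flow_pts = flow_pts.reshape(-1, 2)
        tracked = status.reshape(-1) == 1
        mean_err = float(err.reshape(-1)[tracked].mean()) if tracked.any() else float("inf")
        if tracked.all() and mean_err < err_thresh:
            return flow_pts            # skip detection for the second frame
        return detect_fn(curr_gray)    # fall back to full landmark detection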
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jin Ge whose telephone number is (571) 272-5556. The examiner can normally be reached from 8:00 to 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Chan can be reached at (571)272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JIN GE
Examiner
Art Unit 2619
/JIN GE/Primary Examiner, Art Unit 2619