DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 8 is objected to because of the following informalities:
As to claim 8, “the first probability” in line 8 of the claim should be changed to “a first probability”, since “a first probability” was not previously recited in the claim. Appropriate correction is required.
As to claim 8, “the second probability” in line 9 of the claim should be changed to “a second probability”, since “a second probability” was not previously recited in the claim. Appropriate correction is required.
As to claim 8, the extra period at the end of the claim should be deleted, as each claim must end with a single period. See MPEP 608.01(m). Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-7, 9-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu et al. (CN 111353546 A, cited in the IDS (Information Disclosure Statement); the attached English machine translation is relied upon in this rejection) in view of Cao et al. (CN 111401216 A, cited in the IDS; the attached English machine translation is relied upon in this rejection).
As to claim 1, Zhu et al. teaches a method for training a face swap model, performed by a computer device ([0002]: computer device; [0069]: the training is applied to a face-swapping scenario), the method comprising: acquiring a sample triplet, the sample triplet comprising a source face image, a template image, and a reference image ([0091-0092]: obtain a triplet sample, which includes a source image, a first image, and a target image; the object in each image is a face); concatenating an expression feature of the template image and an identity feature of the source face image to obtain a combined feature ([0068]: the source image and the first image are input into the generator in the image processing model to obtain the output image; [0095-0097]: the source image and the target image both contain a male face A; a face image containing a female face B is selected as the first image; the male face A in the source image and the male face A in the target image differ in posture, expression, makeup, and background, while the female face B in the first image has the same posture, expression, makeup, and background as the male face A in the target image; the terminal inputs the source image and the first image into the generator in the image processing model, which replaces the object in the first image with the object in the source image to obtain the output image; [0103]); respectively predicting an image attribute discrimination result of the swapped face image and an image attribute discrimination result of the reference image by using a discriminator network of the face swap model, an image attribute comprising forged image and non-forged image ([0097-0101]: the source image, the first image, the target image, and the output image are input into the discriminator in the image processing model to obtain image attribute discrimination results; the image attributes include fake images and non-fake images); and
updating the generator network and the discriminator network, based on the image attribute discrimination result of the swapped face image and the image attribute discrimination result of the reference image ([0097-0102]: the source image, the first image, the target image, and the output image are input into the discriminator in the image processing model to obtain image attribute discrimination results; the image attributes include fake images and non-fake images; the parameters of the generator and the discriminator are adjusted based on the image attribute discrimination results), but does not explicitly disclose performing encoding based on the source face image and the template image by using a generator network of the face swap model to obtain an encoding feature required for face swap; fusing the encoding feature and the combined feature to obtain a fused feature; performing decoding based on the fused feature by using the generator network of the face swap model to obtain a swapped face image; calculating a difference between an expression feature of the swapped face image and the expression feature of the template image; calculating a difference between an identity feature of the swapped face image and an identity feature of the source face image; and updating the generator network and the discriminator network based on the calculated difference of the expression features between the swapped face image and the template image and the calculated difference of the identity features between the swapped face image and the source face image.
However, Cao et al. teaches performing encoding based on the source face image and the template image by using a generator network of the face swap model to obtain an encoding feature required for face swap ([0011]; [0108-0109]: the initial facial image and the template facial image are encoded using an encoding model); fusing the encoding feature and the combined feature to obtain a fused feature ([0011-0012]; [0031]; [0115]: fusing the facial identity features, the attribute features, and the common encoded feature); performing decoding based on the fused feature by using the generator network of the face swap model to obtain a swapped face image ([0126-0128]: the output of the feature fusion model is the input to the decoding model; [0162]: the target facial image (i.e., the face-swapping result image) is obtained through the decoding model (decoding module)); calculating a difference between an expression feature of the swapped face image and the expression feature of the template image ([0084]: attribute features include expression; [0137]; [0150]: a loss function is constructed based on the loss of the difference between the target face image and the template face image; [0162]: target facial image (i.e., the face-swapping result image)); calculating a difference between an identity feature of the swapped face image and an identity feature of the source face image ([0084]: facial identity features obtained through calculation; [0137]; [0140]: constructing a loss function for the generator network based on the differences in facial identity features between the first initial facial image sample and the first target facial image sample; [0162]: target facial image (i.e., the face-swapping result image)); and updating the generator network and the discriminator network based on the calculated difference of the expression features between the swapped face image and the template image ([0084]: attribute features include expression; [0132]; [0134]; [0137]: constructing a loss function for the generator network based on the discriminant loss of the discriminant network; [0150-0151]: a loss function is constructed based on the loss of the difference between the target face image and the template face image; [0141]: the weighted sum of the discriminator loss, the reconstruction loss, the identity loss (the difference between Xid and Zid), and the attribute loss (the difference between Xatt and Yatt) is used as the supervised training loss function for the adversarially trained generative network and the discriminator network; [0162]: target facial image (i.e., the face-swapping result image)) and the calculated difference of the identity features between the swapped face image and the source face image ([0084]: facial identity features obtained through calculation; [0137]; [0140-0141]: constructing a loss function for the generator network based on the differences in facial identity features between the first initial facial image sample and the first target facial image sample, and using the weighted sum of the discriminator loss, the reconstruction loss, and the identity loss (the difference between Xid and Zid) as the supervised training loss function for the adversarially trained generative network and the discriminator network; [0162]: target facial image (i.e., the face-swapping result image)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhu et al. by encoding, fusing, decoding, calculating, and updating the generator network and the discriminator network as taught by Cao et al. in order to improve the efficiency of image processing.
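For orientation only, the following is a minimal PyTorch sketch of the kind of training step the combined references describe: encoding the concatenated source and template images, concatenating identity and expression features into a combined feature, fusing, decoding to a swapped face image, discriminating, and updating both networks on the resulting losses. Every module name, shape, and loss weight here is an assumption for illustration; nothing below is taken from Zhu or Cao.

```python
# Illustrative sketch only; architecture and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Encoder consumes the source and template images concatenated channel-wise.
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.ReLU())
        # 1x1 convolution fusing the encoding feature with the combined feature.
        self.fuse = nn.Conv2d(feat_dim + 2 * feat_dim, feat_dim, 1)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, source, template, id_feat, expr_feat):
        enc = self.encoder(torch.cat([source, template], dim=1))  # encoding feature
        combined = torch.cat([id_feat, expr_feat], dim=1)         # combined feature
        combined = combined[:, :, None, None].expand(-1, -1, enc.size(2), enc.size(3))
        fused = self.fuse(torch.cat([enc, combined], dim=1))      # fused feature
        return self.decoder(fused)                                # swapped face image

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1))

    def forward(self, image):
        # Probability that the input is a non-forged image.
        return torch.sigmoid(self.net(image).mean(dim=(1, 2, 3)))

def train_step(gen, disc, opt_g, opt_d, source, template, reference, id_net, expr_net):
    id_src = id_net(source)          # identity feature of the source face image
    expr_tmpl = expr_net(template)   # expression feature of the template image
    swapped = gen(source, template, id_src, expr_tmpl)

    # Discriminator update: reference labeled non-forged, swapped labeled forged.
    p_fake, p_real = disc(swapped.detach()), disc(reference)
    loss_d = (F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake))
              + F.binary_cross_entropy(p_real, torch.ones_like(p_real)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator update: adversarial term plus identity and expression differences.
    p_gen = disc(swapped)
    loss_adv = F.binary_cross_entropy(p_gen, torch.ones_like(p_gen))
    loss_id = 1 - F.cosine_similarity(id_net(swapped), id_src).mean()
    loss_expr = F.l1_loss(expr_net(swapped), expr_tmpl)
    loss_g = loss_adv + loss_id + loss_expr
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return float(loss_g), float(loss_d)
```

A weighted sum of these terms, of the kind Cao's [0141] describes for the discriminator, reconstruction, identity, and attribute losses, would replace the unweighted sum above.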
As to claim 2, Zhu et al. in view of Cao et al. teaches the method according to claim 1, wherein the acquiring of the sample triplet comprises: acquiring a first image and a second image, the first image and the second image corresponding to a same identity attribute and corresponding to different non-identity attributes (Zhu et al., [0095]: the source image and the target image both contain a male face A; the male face A in the source image and the male face A in the target image differ in posture, expression, makeup, and background); acquiring a third image, the third image and the first image corresponding to different identity attributes (Zhu et al., [0210]: obtain a second image, where the second image and the target image correspond to different identity attributes); replacing an object in the second image with an object in the third image to obtain a fourth image (Zhu et al., [0210]: replaces the objects in the target image with the objects in the second image to obtain the first image); and constructing the sample triplet by using the first image as the source face image (Zhu et al., [0210]: source image), the fourth image as the template image (Zhu et al., [0210]: first image), and the second image as the reference image (Zhu et al., [0210]: target image).
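Claim 2's triplet construction can be pictured with a short sketch; swap_face is a hypothetical helper standing in for the replacement step Zhu describes at [0210]:

```python
# Hypothetical helper names; only the construction order mirrors the claim.
def build_sample_triplet(first, second, third, swap_face):
    # first and second share an identity but differ in non-identity attributes;
    # third carries a different identity from first.
    fourth = swap_face(target=second, source=third)  # object in second replaced by third's
    return {
        "source_face_image": first,   # supplies the identity to transfer
        "template_image": fourth,     # supplies pose, expression, and background
        "reference_image": second,    # ground truth for the desired swap result
    }
```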
As to claim 3, Zhu et al. teaches the method as discussed above, but does not explicitly disclose further comprising: extracting a feature from the template image by using an expression recognition network of the face swap model to obtain the expression feature of the template image; and extracting a feature from the source face image by using a face recognition network of the face swap model to obtain the identity feature of the source face image, the expression recognition network and the face recognition network both being pre-trained neural network models. However, Cao et al. teaches extracting a feature from the template image by using an expression recognition network of the face swap model to obtain the expression feature of the template image ([0016]: encode the template facial image to obtain attribute features of the template facial image; [0084]: attribute features include expression; [0071]: artificial intelligence systems extract information from images; [0108]; [0158]: features are extracted from the template image); and extracting a feature from the source face image by using a face recognition network of the face swap model to obtain the identity feature of the source face image ([0016]: encode the initial facial image to obtain facial identity features; [0108]; [0158]: extract facial regions from the initial image), the expression recognition network and the face recognition network both being pre-trained neural network models ([0016]: encode the template facial image to obtain attribute features of the template facial image, and encode the initial facial image to obtain facial identity features; [0084]: attribute features include expression; [0085]: encode the initial facial image and the template facial image using a machine learning-based neural network; [0088]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhu et al. by extracting a feature from the template image by using an expression recognition network of the face swap model to obtain the expression feature of the template image, and extracting a feature from the source face image by using a face recognition network of the face swap model to obtain the identity feature of the source face image, the expression recognition network and the face recognition network both being pre-trained neural network models, as taught by Cao et al. in order to improve the efficiency of image processing.
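A minimal sketch of the pre-trained feature extractors claim 3 recites follows; the two backbones are placeholder linear models, since neither reference ties the claim to a specific architecture:

```python
import torch
import torch.nn as nn

# Placeholder frozen backbones; real systems would load pre-trained weights here.
expr_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 128, 256)).eval()
face_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 128, 256)).eval()

template = torch.rand(1, 3, 128, 128)
source = torch.rand(1, 3, 128, 128)
with torch.no_grad():                        # pre-trained networks stay frozen
    expr_feat = expr_net(template)           # expression feature of the template image
    id_feat = face_net(source)               # identity feature of the source face image
combined = torch.cat([id_feat, expr_feat], dim=1)  # combined feature from claim 1
```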
As to claim 4, Zhu et al. teaches the method as discussed above, and further teaches concatenating the source face image and the template image to obtain an input image ([0068]; [0095]; [0097]: the source image and the target image both contain a male face A; a face image containing a female face B is selected as the first image; the male face A in the source image and the male face A in the target image differ in posture, expression, makeup, and background, while the female face B in the first image has the same posture, expression, makeup, and background as the male face A in the target image) and inputting the input image into the face swap model ([0068]; [0095]; [0097]: the terminal inputs the source image and the first image into the generator in the image processing model), but does not explicitly disclose wherein performing the encoding based on the source face image and the template image by using the generator network of the face swap model to obtain the encoding feature required for face swap comprises: encoding the input image by using the generator network of the face swap model to obtain the encoding feature required for face swap on the template image. However, Cao et al. teaches wherein performing the encoding based on the source face image and the template image by using the generator network of the face swap model to obtain the encoding feature required for face swap ([0011]; [0108-0109]: the initial facial image and the template facial image are encoded using an encoding model) comprises: encoding the input image by using the generator network of the face swap model to obtain the encoding feature required for face swap on the template image ([0011]; [0110]: encoding the initial facial image and the template facial image together to obtain the common encoded features corresponding to the initial facial image and the template facial image; [0180-0082]: face swapping).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhu et al. by performing the encoding based on the source face image and the template image by using the generator network of the face swap model, comprising encoding the input image by using the generator network of the face swap model to obtain the encoding feature required for face swap on the template image, as taught by Cao et al. in order to improve the efficiency of image processing.
As to claim 6, Zhu et al. in view of Cao et al. teaches the method according to claim 1, wherein respectively predicting the image attribute discrimination result of the swapped face image and the image attribute discrimination result of the reference image by using the discriminator network of the face swap model (Zhu et al., [0097-0101]: the source image, the first image, the target image, and the output image are input into the discriminator in the image processing model to obtain image attribute discrimination results; the image attributes include fake images and non-fake images) comprises: inputting the swapped face image into the discriminator network of the face swap model, and predicting a first probability that the swapped face image is a non-forged image by using the discriminator network (Zhu et al., [0011]; [0022]: input the source image, the first image, the target image, and the output image into the discriminator in the image processing model to obtain image attribute discrimination results, the image attributes including non-forged images; [0073]: the source image, the first image, and the target image are used as the first combination input to the discriminator in the image processing model to obtain a probability representing a non-forged image; [0133]: a face image from the face-swapped video is used as the source image); and inputting the reference image into the discriminator network of the face swap model, and predicting a second probability that the reference image is a non-forged image by using the discriminator network (Zhu et al., [0022]; [0073]: the source image, the first image, and the target image are used as the first combination input to the discriminator in the image processing model to obtain a probability representing a non-forged image).
As to claim 7, Zhu et al. teaches the method as discussed above, but does not explicitly disclose wherein after the swapped face image is obtained, the method further comprises: extracting a feature from the swapped face image by using the expression recognition network of the face swap model to obtain the expression feature of the swapped face image; and extracting a feature from the swapped face image by using the face recognition network of the face swap model to obtain the identity feature of the swapped face image, the expression recognition network and the face recognition network both being pre-trained neural network models.
However, Cao et al. teaches wherein after the swapped face image is obtained ([0036]; [0171]; [0179-0180]: face swap), the method further comprises: extracting a feature from the swapped face image by using the expression recognition network of the face swap model to obtain the expression feature of the swapped face image ([0013]: the target facial image is matched with the attribute features of the template facial image; [0084]: attribute features include expression; [0137]); and extracting a feature from the swapped face image by using the face recognition network of the face swap model to obtain the identity feature of the swapped face image ([0013]: the target facial image is matched with the facial identity features of the initial facial image; [0137]), the expression recognition network and the face recognition network both being pre-trained neural network models ([0013]: the target facial image is matched with the facial identity features of the initial facial image and with the attribute features of the template facial image; [0084]: attribute features include expression; [0072]; [0088]: neural network). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhu et al. by extracting a feature from the swapped face image by using the expression recognition network of the face swap model to obtain the expression feature of the swapped face image, and extracting a feature from the swapped face image by using the face recognition network of the face swap model to obtain the identity feature of the swapped face image, the expression recognition network and the face recognition network both being pre-trained neural network models, as taught by Cao et al. in order to improve the efficiency of image processing.
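Reusing the assumed Discriminator from the claim 1 sketch above, the two probabilities recited in claims 6 and 18 reduce to two forward passes:

```python
# Continues the claim 1 sketch; disc, swapped, and reference are as defined there.
with torch.no_grad():
    first_probability = disc(swapped)      # P(swapped face image is non-forged)
    second_probability = disc(reference)   # P(reference image is non-forged)
# Training pushes second_probability toward 1 and first_probability toward 0 for
# the discriminator, while the generator pushes first_probability toward 1.
```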
As to claim 9, Zhu et al. in view of Cao et al. teaches the method according to claim 1, further comprising: respectively recognizing facial key points in the template image and facial key points in the swapped face image by using a pre-trained facial key point network to obtain facial key point information of the template image and facial key point information of the swapped face image (Zhu et al., [0080]: obtain the features of the output image and the target image; [0092]: the object in the image is a face; [0125]; [0183-0185]; [0196]); and constructing a key point loss based on a difference between the facial key point information of the template image and the facial key point information of the swapped face image, the key point loss being configured for participating in training for the generator network of the face swap model (Zhu et al., [0092]: the object in the image is a face; [0116]; [0119]; [0173]: determining the difference between the output image and the target image; [0260]).
As to claim 10, Zhu et al. in view of Cao et al. teaches the method according to claim 1, further comprising: respectively extracting an image feature from the swapped face image and an image feature from the reference image by using a pre-trained feature extraction network to obtain the image feature of the swapped face image and the image feature of the reference image (Zhu et al., [0080]: obtain the features of the output image and the target image; [0125]; [0183-0185]; [0196]; [0260]); and constructing a similarity loss based on a difference between the image feature of the swapped face image and the image feature of the reference image, the similarity loss being configured for participating in the training for the generator network of the face swap model (Zhu et al., [0156]: a loss function is constructed based on the feature vector similarity between the output image and the target image, the channel space similarity, the multi-level feature similarity, and the feature vector similarity between the output image and the source image; [0158]; [0260]: the loss function is constructed to train the generator).
As to claim 11, Zhu et al. teaches the method as discussed above, but does not explicitly disclose further comprising: constructing a reconstruction loss based on a pixel-level difference between the swapped face image and the reference image, the reconstruction loss being configured for participating in the training for the generator network of the face swap model.
However, Cao et al. teaches further comprising: constructing a reconstruction loss based on a pixel-level difference between the swapped face image and the reference image, the reconstruction loss being configured for participating in the training for the generator network of the face swap model ([0097]; [0134]: the loss function is related to pixel differences between the first target face image sample and the label image; [0140-0141]: the loss function of the generative network can be constructed based on the loss between the first target facial image sample Result and the label image Target used as the training label (pixel reconstruction loss)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Zhu et al. by constructing a reconstruction loss based on a pixel-level difference between the swapped face image and the reference image, the reconstruction loss being configured for participating in the training for the generator network of the face swap model, as taught by Cao et al. in order to improve the efficiency of image processing.
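The three auxiliary losses of claims 9-11 can be summarized in one sketch; kp_net and feat_net are stand-ins for the pre-trained facial key point network and feature extraction network the claims recite, not networks named by either reference:

```python
import torch.nn.functional as F

def auxiliary_generator_losses(swapped, template, reference, kp_net, feat_net):
    # Claim 9: key point loss between template and swapped-face key points.
    kp_loss = F.l1_loss(kp_net(swapped), kp_net(template))
    # Claim 10: similarity loss between deep features of swapped and reference images.
    sim_loss = F.l1_loss(feat_net(swapped), feat_net(reference))
    # Claim 11: reconstruction loss from the pixel-level difference to the reference.
    recon_loss = F.l1_loss(swapped, reference)
    return kp_loss, sim_loss, recon_loss  # each can join the generator's total loss
```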
As to claim 12, Zhu et al. teaches the method as discussed above, further comprising: acquiring a to-be-face-swapped video and a source face image comprising a target face ([0087-0088]: an image and a video to be swapped; all the faces in the video to be face-swapped are replaced with the faces in the image to be face-swapped); acquiring, for each video frame of the to-be-face-swapped video, an expression feature of the video frame ([0080]: obtain the features of the output image and the target image; [0093]: in the video face-swapping scenario, the face in the source image and the face in the target image are the same person's face, but the facial expression, makeup, the person's posture, and the background in the image can be partially the same, completely different, or completely identical; [0125]); acquiring an identity feature of the source face image comprising the target face ([0093]: the face in the source image and the face in the target image are the same person's face; [0121]; [0123]); concatenating the expression feature and the identity feature to obtain a combined feature ([0095]; [0121-0123]: non-identity attributes such as expression; the original video and the target video have the same non-identity attributes and different identity attributes; if object A is in the original video, the target video is obtained by replacing object A with object B); and outputting a swapped face video in which an object in the video frame is replaced with the target face ([0087]: replaces all the faces in the video to be face-swapped with the faces in the image to be face-swapped; [0123]), but does not explicitly disclose performing encoding based on the source face image comprising the target face and the video frame by using the trained generator network of the face swap model to obtain an encoding feature required for face swap, and performing decoding based on a fused feature obtained by fusing the encoding feature and the combined feature.
However, Cao et al. teaches performing encoding based on the source face image comprising the target face and the video frame by using the trained generator network of the face swap model to obtain an encoding feature required for face swap ([0016]: encoding module; [0086]: encode the video frame; [0108-0109]: encoding the initial image; [0192]), and performing decoding based on a fused feature obtained by fusing the encoding feature and the combined feature ([0016-0018]: the fusion module is used to fuse the facial identity features, the attribute features, and the common encoded features to obtain the target features, and the target features are decoded).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Zhu et al. by performing encoding based on the source face image comprising the target face and the video frame by using the trained generator network of the face swap model to obtain an encoding feature required for face swap, and performing decoding based on a fused feature obtained by fusing the encoding feature and the combined feature.as taught by Cao et al. in order to improve efficiency of image processing.
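At inference time, claim 12's per-frame use of the trained generator amounts to a loop like the following; the generator and helper networks are the assumptions carried over from the earlier sketches:

```python
import torch

@torch.no_grad()
def swap_video(frames, source, gen, face_net, expr_net):
    id_feat = face_net(source)            # identity feature of the target face
    swapped_frames = []
    for frame in frames:                  # each 1-image frame batch acts as the template
        expr_feat = expr_net(frame)       # expression feature of the video frame
        swapped_frames.append(gen(source, frame, id_feat, expr_feat))
    return torch.cat(swapped_frames)      # the swapped face video, frame by frame
```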
As to claim 13, Zhu et al. teaches a computer device ([0002]: computer device), comprising: one or more processors and a memory containing a computer-executable program that, when being executed, causes the one or more processors ([0024]; [0300]: a memory storing a computer program, and the processor executing the computer program) to perform: acquiring a sample triplet, the sample triplet comprising a source face image, a template image, and a reference image ([0091-0092]: obtain a triplet sample, which includes a source image, a first image, and a target image; the object in each image is a face); concatenating an expression feature of the template image and an identity feature of the source face image to obtain a combined feature ([0068]: the source image and the first image are input into the generator in the image processing model to obtain the output image; [0095-0097]: the source image and the target image both contain a male face A; a face image containing a female face B is selected as the first image; the male face A in the source image and the male face A in the target image differ in posture, expression, makeup, and background, while the female face B in the first image has the same posture, expression, makeup, and background as the male face A in the target image; the terminal inputs the source image and the first image into the generator in the image processing model, which replaces the object in the first image with the object in the source image to obtain the output image); respectively predicting an image attribute discrimination result of the swapped face image and an image attribute discrimination result of the reference image by using a discriminator network of the face swap model, an image attribute comprising forged image and non-forged image ([0097-0101]: the source image, the first image, the target image, and the output image are input into the discriminator in the image processing model to obtain image attribute discrimination results; the image attributes include fake images and non-fake images); and
updating the generator network and the discriminator network, based on the image attribute discrimination result of the swapped face image and the image attribute discrimination result of the reference image ([0097-0102]: the source image, the first image, the target image, and the output image are input into the discriminator in the image processing model to obtain image attribute discrimination results; the image attributes include fake images and non-fake images; the parameters of the generator and the discriminator are adjusted based on the image attribute discrimination results), but does not explicitly disclose performing encoding based on the source face image and the template image by using a generator network of the face swap model to obtain an encoding feature required for face swap; fusing the encoding feature and the combined feature to obtain a fused feature; performing decoding based on the fused feature by using the generator network of the face swap model to obtain a swapped face image; calculating a difference between an expression feature of the swapped face image and the expression feature of the template image; calculating a difference between an identity feature of the swapped face image and an identity feature of the source face image; and updating the generator network and the discriminator network based on the calculated difference of the expression features between the swapped face image and the template image and the calculated difference of the identity features between the swapped face image and the source face image.
However, Cao et al. teaches performing encoding based on the source face image and the template image by using a generator network of the face swap model to obtain an encoding feature required for face swap ([0011]; [0108-0109]: the initial facial image and the template facial image are encoded using an encoding model); fusing the encoding feature and the combined feature to obtain a fused feature ([0011-0012]; [0031]; [0115]: fusing the facial identity features, the attribute features, and the common encoded feature); performing decoding based on the fused feature by using the generator network of the face swap model to obtain a swapped face image ([0126-0128]: the output of the feature fusion model is the input to the decoding model; [0162]: the target facial image (i.e., the face-swapping result image) is obtained through the decoding model (decoding module)); calculating a difference between an expression feature of the swapped face image and the expression feature of the template image ([0084]: attribute features include expression; [0137]; [0150]: a loss function is constructed based on the loss of the difference between the target face image and the template face image; [0162]: target facial image (i.e., the face-swapping result image)); calculating a difference between an identity feature of the swapped face image and an identity feature of the source face image ([0084]: facial identity features obtained through calculation; [0137]; [0140]: constructing a loss function for the generator network based on the differences in facial identity features between the first initial facial image sample and the first target facial image sample; [0162]: target facial image (i.e., the face-swapping result image)); and updating the generator network and the discriminator network based on the calculated difference of the expression features between the swapped face image and the template image ([0084]: attribute features include expression; [0132]; [0134]; [0137]: constructing a loss function for the generator network based on the discriminant loss of the discriminant network; [0150-0151]: a loss function is constructed based on the loss of the difference between the target face image and the template face image; [0141]: the weighted sum of the discriminator loss, the reconstruction loss, the identity loss (the difference between Xid and Zid), and the attribute loss (the difference between Xatt and Yatt) is used as the supervised training loss function for the adversarially trained generative network and the discriminator network; [0162]: target facial image (i.e., the face-swapping result image)) and the calculated difference of the identity features between the swapped face image and the source face image ([0084]: facial identity features obtained through calculation; [0137]; [0140-0141]: constructing a loss function for the generator network based on the differences in facial identity features between the first initial facial image sample and the first target facial image sample, and using the weighted sum of the discriminator loss, the reconstruction loss, and the identity loss (the difference between Xid and Zid) as the supervised training loss function for the adversarially trained generative network and the discriminator network; [0162]: target facial image (i.e., the face-swapping result image)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Zhu et al. by encoding, fusing, decoding, calculating, and updating the generator network and the discriminator network as taught by Cao et al. in order to improve the efficiency of image processing.
As to claim 14, Zhu et al. in view of Cao et al. teaches the device according to claim 13, wherein the one or more processors (Zhu et al., [0300]: processor) are configured to perform: acquiring a first image and a second image, the first image and the second image corresponding to a same identity attribute and corresponding to different non-identity attributes (Zhu et al., [0095]: the source image and the target image both contain a male face A; the male face A in the source image and the male face A in the target image differ in posture, expression, makeup, and background); acquiring a third image, the third image and the first image corresponding to different identity attributes (Zhu et al., [0210]: obtain a second image, where the second image and the target image correspond to different identity attributes); replacing an object in the second image with an object in the third image to obtain a fourth image (Zhu et al., [0210]: replaces the objects in the target image with the objects in the second image to obtain the first image); and constructing the sample triplet by using the first image as the source face image (Zhu et al., [0210]: source image), the fourth image as the template image (Zhu et al., [0210]: first image), and the second image as the reference image (Zhu et al., [0210]: target image).
As to claim 15, Zhu et al. teaches the device as discussed above, but does not explicitly disclose wherein the one or more processors are configured to perform: extracting a feature from the template image by using an expression recognition network of the face swap model to obtain the expression feature of the template image; and extracting a feature from the source face image by using a face recognition network of the face swap model to obtain the identity feature of the source face image, the expression recognition network and the face recognition network both being pre-trained neural network models. However, Cao et al. teaches wherein the one or more processors ([0211]: processor) are configured to perform: extracting a feature from the template image by using an expression recognition network of the face swap model to obtain the expression feature of the template image ([0016]: encode the template facial image to obtain attribute features of the template facial image; [0084]: attribute features include expression; [0071]: artificial intelligence systems extract information from images; [0108]; [0158]: features are extracted from the template image); and extracting a feature from the source face image by using a face recognition network of the face swap model to obtain the identity feature of the source face image ([0016]: encode the initial facial image to obtain facial identity features; [0108]; [0158]: extract facial regions from the initial image), the expression recognition network and the face recognition network both being pre-trained neural network models ([0016]: encode the template facial image to obtain attribute features of the template facial image, and encode the initial facial image to obtain facial identity features; [0084]: attribute features include expression; [0085]: encode the initial facial image and the template facial image using a machine learning-based neural network; [0088]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Zhu et al. by extracting a feature from the template image by using an expression recognition network of the face swap model to obtain the expression feature of the template image, and extracting a feature from the source face image by using a face recognition network of the face swap model to obtain the identity feature of the source face image, the expression recognition network and the face recognition network both being pre-trained neural network models, as taught by Cao et al. in order to improve the efficiency of image processing.
As to claim 16, Zhu et al. teaches the device as discussed above, wherein the one or more processors (Zhu et al., [0300]: processor) are configured to perform: concatenating the source face image and the template image to obtain an input image ([0068]; [0095]; [0097]: the source image and the target image both contain a male face A; a face image containing a female face B is selected as the first image; the male face A in the source image and the male face A in the target image differ in posture, expression, makeup, and background, while the female face B in the first image has the same posture, expression, makeup, and background as the male face A in the target image) and inputting the input image into the face swap model ([0068]; [0095]; [0097]: the terminal inputs the source image and the first image into the generator in the image processing model), but does not explicitly disclose encoding the input image by using the generator network of the face swap model to obtain the encoding feature required for face swap on the template image. However, Cao et al. teaches encoding the input image by using the generator network of the face swap model to obtain the encoding feature required for face swap on the template image ([0011]; [0110]: encoding the initial facial image and the template facial image together to obtain the common encoded features corresponding to the initial facial image and the template facial image; [0180-0082]: face swapping).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Zhu et al. by encoding the input image by using the generator network of the face swap model to obtain the encoding feature required for face swap on the template image as taught by Cao et al. in order to improve the efficiency of image processing.
As to claim 18, Zhu et al. in view of Cao et al. teaches the device as discussed above, wherein the one or more processors (Zhu et al., [0300]: processor) are configured to perform: inputting the swapped face image into the discriminator network of the face swap model, and predicting a first probability that the swapped face image is a non-forged image by using the discriminator network (Zhu et al., [0011]; [0022]: input the source image, the first image, the target image, and the output image into the discriminator in the image processing model to obtain image attribute discrimination results, the image attributes including non-forged images; [0073]: the source image, the first image, and the target image are used as the first combination input to the discriminator in the image processing model to obtain a probability representing a non-forged image; [0133]: a face image from the face-swapped video is used as the source image); and inputting the reference image into the discriminator network of the face swap model, and predicting a second probability that the reference image is a non-forged image by using the discriminator network (Zhu et al., [0022]; [0073]: the source image, the first image, and the target image are used as the first combination input to the discriminator in the image processing model to obtain a probability representing a non-forged image).
As to claim 19, Zhu et al. teaches the device as discussed above, but does not explicitly disclose wherein the one or more processors are configured to perform: extracting a feature from the swapped face image by using the expression recognition network of the face swap model to obtain the expression feature of the swapped face image; and extracting a feature from the swapped face image by using the face recognition network of the face swap model to obtain the identity feature of the swapped face image, the expression recognition network and the face recognition network both being pre-trained neural network models. However, Cao et al. teaches wherein the one or more processors ([0211]: processor) are configured to perform: extracting a feature from the swapped face image by using the expression recognition network of the face swap model to obtain the expression feature of the swapped face image ([0013]: the target facial image is matched with the attribute features of the template facial image; [0084]: attribute features include expression; [0137]); and extracting a feature from the swapped face image by using the face recognition network of the face swap model to obtain the identity feature of the swapped face image ([0013]: the target facial image is matched with the facial identity features of the initial facial image; [0137]), the expression recognition network and the face recognition network both being pre-trained neural network models ([0013]: the target facial image is matched with the facial identity features of the initial facial image and with the attribute features of the template facial image; [0084]: attribute features include expression; [0072]; [0088]: neural network). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Zhu et al.
by extracting a feature from the swapped face image by using the expression recognition network of the face swap model to obtain the expression feature of the swapped face image, and extracting a feature from the swapped face image by using the face recognition network of the face swap model to obtain the identity feature of the swapped face image, the expression recognition network and the face recognition network both being pre-trained neural network models, as taught by Cao et al. in order to improve the efficiency of image processing.
As to claim 20, Zhu et al. teaches a non-transitory computer-readable storage medium, storing a computer-executable instruction that, when being executed, causes one or more processors ([0024]: a memory storing a computer program, and the processor executing the computer program) to perform: acquiring a sample triplet, the sample triplet comprising a source face image, a template image, and a reference image ([0091-0092]: obtain a triplet sample, which includes a source image, a first image, and a target image; the object in each image is a face); concatenating an expression feature of the template image and an identity feature of the source face image to obtain a combined feature ([0068]: the source image and the first image are input into the generator in the image processing model to obtain the output image; [0095-0097]: the source image and the target image both contain a male face A; a face image containing a female face B is selected as the first image; the male face A in the source image and the male face A in the target image differ in posture, expression, makeup, and background, while the female face B in the first image has the same posture, expression, makeup, and background as the male face A in the target image; the terminal inputs the source image and the first image into the generator in the image processing model, which replaces the object in the first image with the object in the source image to obtain the output image); respectively predicting an image attribute discrimination result of the swapped face image and an image attribute discrimination result of the reference image by using a discriminator network of the face swap model, an image attribute comprising forged image and non-forged image ([0097-0101]: the source image, the first image, the target image, and the output image are input into the discriminator in the image processing model to obtain image attribute discrimination results; the image attributes include fake images and non-fake images); and
updating the generator network and the discriminator network, based on the image attribute discrimination result of the swapped face image and the image attribute discrimination result of the reference image ([0097-0102]: the source image, the first image, the target image, and the output image are input into the discriminator in the image processing model to obtain image attribute discrimination results; the image attributes include fake images and non-fake images; the parameters of the generator and the discriminator are adjusted based on the image attribute discrimination results), but does not explicitly disclose performing encoding based on the source face image and the template image by using a generator network of the face swap model to obtain an encoding feature required for face swap; fusing the encoding feature and the combined feature to obtain a fused feature; performing decoding based on the fused feature by using the generator network of the face swap model to obtain a swapped face image; calculating a difference between an expression feature of the swapped face image and the expression feature of the template image; calculating a difference between an identity feature of the swapped face image and an identity feature of the source face image; and updating the generator network and the discriminator network based on the calculated difference of the expression features between the swapped face image and the template image and the calculated difference of the identity features between the swapped face image and the source face image.
However, Cao et al. teaches performing encoding based on the source face image and the template image by using a generator network of the face swap model to obtain an encoding feature required for face swap ([0011]; [0108-0109]: the initial facial image and the template facial image are encoded using an encoding model); fusing the encoding feature and the combined feature to obtain a fused feature ([0011-0012]; [0031]; [0115]: fusing the facial identity features, the attribute features, and the common encoded feature); performing decoding based on the fused feature by using the generator network of the face swap model to obtain a swapped face image ([0126-0128]: the output of the feature fusion model is the input to the decoding model; [0162]: the target facial image (i.e., the face-swapping result image) is obtained through the decoding model (decoding module)); calculating a difference between an expression feature of the swapped face image and the expression feature of the template image ([0084]: attribute features include expression; [0137]; [0150]: a loss function is constructed based on the loss of the difference between the target face image and the template face image; [0162]: target facial image (i.e., the face-swapping result image)); calculating a difference between an identity feature of the swapped face image and an identity feature of the source face image ([0084]: facial identity features obtained through calculation; [0137]; [0140]: constructing a loss function for the generator network based on the differences in facial identity features between the first initial facial image sample and the first target facial image sample; [0162]: target facial image (i.e., the face-swapping result image)); and updating the generator network and the discriminator network based on the calculated difference of the expression features between the swapped face image and the template image ([0084]: attribute features include expression; [0132]; [0134]; [0137]: constructing a loss function for the generator network based on the discriminant loss of the discriminant network; [0150-0151]: a loss function is constructed based on the loss of the difference between the target face image and the template face image; [0141]: the weighted sum of the discriminator loss, the reconstruction loss, the identity loss (the difference between Xid and Zid), and the attribute loss (the difference between Xatt and Yatt) is used as the supervised training loss function for the adversarially trained generative network and the discriminator network; [0162]: target facial image (i.e., the face-swapping result image)) and the calculated difference of the identity features between the swapped face image and the source face image ([0084]: facial identity features obtained through calculation; [0137]; [0140-0141]: constructing a loss function for the generator network based on the differences in facial identity features between the first initial facial image sample and the first target facial image sample, and using the weighted sum of the discriminator loss, the reconstruction loss, and the identity loss (the difference between Xid and Zid) as the supervised training loss function for the adversarially trained generative network and the discriminator network; [0162]: target facial image (i.e., the face-swapping result image)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the device of Zhu et al. by encoding, fusing, decoding, calculating, and updating the generator network and the discriminator network as taught by Cao et al. in order to improve the efficiency of image processing.
Allowable Subject Matter
Claims 5, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim 8 would be allowable if rewritten to overcome the claim objections set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STACY KHOO whose telephone number is (571)270-3698. The examiner can normally be reached Mon-Fri 8:00 am-5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Eason can be reached at 571-270-7230. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/STACY KHOO/Primary Examiner, Art Unit 2624