DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged that this application is a 371 of PCT/CN2023/113992. Applicant claims the benefit of foreign priority from Application No. CN202211075798.7, filed 09/05/2022. Claims 1-20 have been afforded the benefit of this filing date.
Information Disclosure Statement
The information disclosure statement (IDS) dated 01/18/2024 has been considered and placed in the application file.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4-8, 12-14, and 16-19 are rejected under 35 U.S.C. 103 as unpatentable over Cao et al., CN111401216A (using a translation from Espacenet and images translated by Google Translate), hereafter referred to as Cao, in view of Zhu et al., CN111353546A (using a translation from Espacenet and images translated by Google Translate), hereafter referred to as Zhu.
Regarding Claim 1, Cao teaches an image processing method (Cao Pg 1 ¶05 and Pg 1 ¶09 disclose an image processing method) performed by a computer device (Cao Pg 2 ¶03 and Pg 3 ¶06 disclose a computer device with memory to execute a computer program), the image processing method comprising:
comprising a first source image (Cao Pg 2 ¶04 discloses an initial facial image) and a fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) being based on identity swapping processing of the real labeled image (Cao Pg 2 ¶14 discloses swapping the target feature on the facial image so that the target facial image matches the facial identity characteristics of the initial facial image and matches the attribute characteristics of the template facial image), the first source image (Cao Pg 2 ¶04 discloses an initial facial image) having a same identity attribute (Cao Pg 6 ¶04 discloses the initial facial image provides facial identity features), and the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) having a same non-identity attribute (Cao Pg 6 ¶04 discloses the template image provides the attribute features);
inputting (Cao Fig 5 discloses inputting the image into the model) the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) into an identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) and performing identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the first source image (Cao Pg 2 ¶04 discloses an initial facial image) to obtain a first identity swapping image of the fake template image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character);
comprising a second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) and a real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the fake labeled image being based on identity swapping processing (Cao Pg 2 ¶14 discloses swapping the target feature on the facial image so that the target facial image matches the facial identity characteristics of the initial facial image and matches the attribute characteristics of the template facial image) of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image), the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) having a same identity attribute (Cao Pg 6 ¶04 discloses the initial facial image provides facial identity features), and the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) having a same non-identity attribute (Cao Pg 6 ¶04 discloses the template image provides the attribute features);
inputting (Cao Fig 5 discloses inputting the image into the model) the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) into the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) and performing identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) to obtain a second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image); and
training (Cao Pg 2 ¶17 discloses the method for training the model) the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) to generate a trained identity swapping model (Cao Pg 13 ¶08 discloses training the model) to perform identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on a target template image (Cao Pg 9 ¶04 discloses the template facial image reference) based on a target source image (Cao Pg 12 ¶05 and Pg 16 ¶02 disclose a target image).
Cao does not explicitly disclose obtaining a fake template sample group including a real labeled image, obtaining a fake labeled sample group including a fake labeled image, or basing the training on the fake template sample group and the fake labeled sample group.
Zhu is in the same field of image analysis for producing synthesized images or video. Further, Zhu teaches obtaining a fake template sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample that includes an image with attributes including a fake image) and a real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image), obtaining a fake labeled sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample group that includes an image with attributes including a fake image) and a fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image), and training based on the fake template sample group and the fake labeled sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample that includes an image with attributes including a fake image).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao by including real labeled images as part of the fake template sample group, updating the parameters of the model, and assigning weights to the pixel differences as taught by Zhu, to make an invention that can more accurately differentiate between the features needed to produce a successful and clear swapped image. One of ordinary skill in the art would have been motivated to combine the references because there is a need to improve the stability and robustness of the image processing model: the image processing model obtained by the triplet sample training can ensure that the synthesized area is consistent with the original image or original video in shape, illumination, and action, thereby improving the quality of the synthesized image and video (Zhu Pg 14 ¶06).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
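For illustration only, and not as a characterization of either reference's actual implementation, the two-branch data flow recited in claim 1 can be sketched as follows; all names and values below are hypothetical stand-ins:

```python
# Illustrative sketch of the two-branch training flow of claim 1.
# All names and values are hypothetical; this is not code from Cao or Zhu.

def swap_identity(source, template):
    # Toy "identity swapping": the output keeps the source's identity
    # attribute and the template's non-identity attribute.
    return {"identity": source["identity"],
            "non_identity": template["non_identity"]}

# First branch: fake template sample group (first source + fake template).
first_source  = {"identity": "A", "non_identity": "smiling"}
fake_template = {"identity": "B", "non_identity": "smiling"}
first_swap = swap_identity(first_source, fake_template)

# Second branch: fake labeled sample group (second source + real template).
second_source = {"identity": "C", "non_identity": "frowning"}
real_template = {"identity": "D", "non_identity": "frowning"}
second_swap = swap_identity(second_source, real_template)

# Each output shares the source's identity attribute and the template's
# non-identity attribute, matching the claim's attribute limitations.
print(first_swap)   # {'identity': 'A', 'non_identity': 'smiling'}
print(second_swap)  # {'identity': 'C', 'non_identity': 'frowning'}
```

Both swapped outputs then feed a single training step of the same model, which is the point of the two-branch construction.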
Regarding Claim 2, Cao in view of Zhu teaches the image processing method according to claim 1, wherein the training comprises:
determining a pixel reconstruction loss (Cao Pg 13 ¶03-¶04 discloses a pixel reconstruction loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on a first pixel difference (Cao Pg 16 ¶10 discloses the pixel difference between the first images) between the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image) and a second pixel difference (Cao Pg 13 ¶08 discloses a pixel difference between the second images) between the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) and the fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image);
determining a feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 disclose the difference in face identity features used to construct the training loss function) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on a feature difference (Cao Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) between the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image);
extracting face features (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, and eyebrows) of the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character), the first source image (Cao Pg 2 ¶04 discloses an initial facial image), the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image), the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image), and the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) to determine an identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature);
performing discriminative processing (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) to obtain an adversarial loss (Cao Pg 13 ¶04 discloses a loss function of the adversarial training generation network) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature); and
performing summation (Cao Pg 13 ¶04 discloses performing weighted summation) on the pixel reconstruction loss (Cao Pg 13 ¶03-¶04 discloses a pixel reconstruction loss), the feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 disclose the difference in face identity features used to construct the training loss function), the identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss), and the adversarial loss (Cao Pg 13 ¶04 discloses a loss function of the adversarial training generation network) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature), to obtain loss information of the identity swapping model (Cao Pg 13 ¶03 discloses obtaining loss information from the model), and updating model parameters of the identity swapping model (Zhu Fig 11, 1108 and Pg 20 ¶01-¶03 disclose updating the samples to continue training the model) based on the loss information of the identity swapping model (Cao Pg 13 ¶03 discloses obtaining loss information from the model) to train (Cao Pg 2 ¶17 discloses the method for training the model) the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature). See claim 1, its parent claim, for the rationale.
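For illustration only, the summation of the four loss terms described above can be sketched as follows. The weight values are assumptions (neither reference's actual weights), and the weighted form follows the weighted summation Cao describes at Pg 13 ¶04:

```python
# Hypothetical sketch of combining the four training losses of claim 2
# by weighted summation. The weights and loss values are illustrative
# assumptions, not values from Cao or Zhu.

def total_loss(pixel_recon, feat_recon, identity, adversarial,
               weights=(1.0, 1.0, 1.0, 1.0)):
    # Weighted sum of the four loss terms.
    w_p, w_f, w_i, w_a = weights
    return (w_p * pixel_recon + w_f * feat_recon
            + w_i * identity + w_a * adversarial)

# Exact binary fractions chosen so the arithmetic is exact.
loss = total_loss(0.5, 0.25, 0.125, 0.5, weights=(1.0, 2.0, 4.0, 1.0))
print(loss)  # 2.0
```

The resulting scalar is the "loss information" that drives the parameter update of the swapping model.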
Regarding Claim 4, Cao in view of Zhu teaches the image processing method according to claim 2, wherein the identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) comprises a first identity loss (Cao Pg 13 ¶04 discloses performing an identity loss) and a second identity loss (Cao Pg 14 ¶04 discloses the identity loss also being calculated for the second image);
and the extracting comprises:
determining the first identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss) based on a similarity between face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and face features (Cao Pg 13 ¶02 discloses facial identity features) of the first source image (Cao Pg 2 ¶04 discloses an initial facial image), and a similarity between face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) and face features (Cao Pg 13 ¶02 discloses facial identity features) of the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image); and
determining the second identity loss (Cao Pg 14 ¶04 discloses the identity loss also being calculated for the second image) based on a similarity between the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and face features (Cao Pg 13 ¶02 discloses facial identity features) of the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), a similarity between the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the first source image (Cao Pg 2 ¶04 discloses an initial facial image) and the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), a similarity between the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) and face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), and a similarity between the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) and the face features of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image). See claim 1, its parent claim, for the rationale.
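For illustration only, an identity loss built from face-feature similarity of the kind recited in claim 4 can be sketched as follows. The feature vectors and the choice of cosine similarity are assumptions, not the references' actual measure:

```python
import math

# Hypothetical sketch of similarity-based identity losses (claim 4).
# Vectors and the cosine measure are illustrative assumptions.

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

swap_features     = [0.9, 0.1, 0.0]   # first identity swapping image
source_features   = [1.0, 0.0, 0.0]   # first source image
template_features = [0.0, 1.0, 0.0]   # fake template image

# First identity loss: the swapped face should keep the SOURCE identity,
# so high similarity to the source yields a low loss.
first_identity_loss = 1.0 - cosine_similarity(swap_features, source_features)

# Second identity loss: the swapped face should NOT keep the template
# identity, so the raw similarity to the template serves as a penalty.
second_identity_loss = cosine_similarity(swap_features, template_features)

assert first_identity_loss < second_identity_loss
```

With these toy vectors the swap is close to the source and far from the template, so the first loss is near zero while the second remains positive.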
Regarding Claim 5, Cao in view of Zhu teaches the image processing method according to claim 2, wherein the performing discriminative processing (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) comprises:
obtaining a discriminative model (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network);
inputting the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) into the discriminative model (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) and performing discriminative processing (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) to obtain a first discriminative result (Cao Pg 12 ¶06 discloses the output being the difference in pixels between the first images);
inputting the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) into the discriminative model (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) and performing discriminative processing (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) on the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) to obtain a second discriminative result (Cao Pg 13 ¶08 discloses the second target facial image samples being used as negative samples); and
determining the adversarial loss (Cao Pg 13 ¶04 discloses a loss function of the adversarial training discrimination network) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on the first discriminative result (Cao Pg 12 ¶06 discloses the output being the difference in pixels between the first images) and the second discriminative result (Cao Pg 13 ¶08 discloses the second target facial image samples being used as negative samples). See claim 1, its parent claim, for the rationale.
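For illustration only, the discriminative processing of claim 5 can be sketched as follows. The scores, labels, and the use of binary cross-entropy are assumptions; Cao's Pg 13 ¶08 treatment of the second output as a negative sample motivates the label choice:

```python
import math

# Hypothetical sketch of deriving an adversarial loss from two
# discriminator scores (claim 5). Scores and the BCE form are
# illustrative assumptions, not from Cao or Zhu.

def bce(score, label):
    # Binary cross-entropy for a single discriminator score in (0, 1).
    eps = 1e-12
    return -(label * math.log(score + eps)
             + (1.0 - label) * math.log(1.0 - score + eps))

first_result  = 0.8   # score for the first identity swapping image
second_result = 0.3   # score for the second identity swapping image

# Discriminator-side loss: the first output is trained toward "real"
# (label 1) while the second serves as a negative sample (label 0).
adversarial_loss = bce(first_result, 1.0) + bce(second_result, 0.0)
```

Summing the two per-image terms yields one scalar adversarial loss, which then enters the overall weighted summation of claim 2.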
Regarding Claim 6, Cao in view of Zhu teaches the image processing method according to claim 2, wherein determining a pixel reconstruction loss (Cao Pg 13 ¶03-¶04 discloses a pixel reconstruction loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on a first pixel difference (Cao Pg 16 ¶10 discloses the pixel difference between the first images) between the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image) and a second pixel difference (Cao Pg 13 ¶08 discloses a pixel difference between the second images) between the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) and the fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image) comprises:
obtaining a first weight (Zhu Pg 14 ¶02 discloses assigning a weight to each feature vector) corresponding to the first pixel difference (Zhu Pg 14 ¶02 discloses assigning a weight to each feature vector corresponding to the similarity of each feature vector) and a second weight (Zhu Pg 14 ¶02 discloses assigning a weight to each feature vector) corresponding to the second pixel difference (Zhu Pg 14 ¶02 discloses assigning a weight to each feature vector corresponding to the similarity of each feature vector);
performing weighted processing (Cao Pg 13 ¶04 and Pg 14 ¶05 disclose weighted summation being performed) on the first pixel difference (Cao Pg 16 ¶10 discloses the pixel difference between the first images) based on the first weight (Zhu Pg 14 ¶02 discloses assigning a weight to each feature vector), to obtain a first weighted pixel difference (Cao Pg 16 ¶10 discloses the pixel difference between the first images and Pg 7 ¶06 discloses weight addition on characteristics for use in the network);
performing weighted processing (Cao Pg 13 ¶04 and Pg 14 ¶05 disclose weighted summation being performed) on the second pixel difference (Cao Pg 13 ¶08 discloses a pixel difference between the second images) based on the second weight (Zhu Pg 14 ¶02 discloses assigning a weight to each feature vector), to obtain a second weighted pixel difference (Cao Pg 13 ¶08 discloses a pixel difference between the second images and Pg 7 ¶06 discloses weight addition on characteristics for use in the network); and
performing summation (Cao Pg 13 ¶04 discloses performing weighted summation) on the first weighted pixel difference (Cao Pg 16 ¶10 discloses the pixel difference between the first images and Pg 7 ¶06 discloses weight addition on characteristics for use in the network) and the second weighted pixel difference (Cao Pg 13 ¶08 discloses a pixel difference between the second images and Pg 7 ¶06 discloses weight addition on characteristics for use in the network), to obtain the pixel reconstruction loss (Cao Pg 13 ¶03-¶04 discloses a pixel reconstruction loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature). See claim 1, its parent claim, for the rationale.
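For illustration only, the weighted pixel reconstruction loss of claim 6 can be sketched as follows, using mean absolute differences over toy one-dimensional "images"; the pixel values and weights are illustrative assumptions:

```python
# Hypothetical sketch of claim 6's weighted pixel reconstruction loss:
# each pixel difference is scaled by its weight, then the two weighted
# terms are summed. All values are illustrative, not from Cao or Zhu.

def mean_abs_diff(img_a, img_b):
    # Mean absolute per-pixel difference between two flat images.
    return sum(abs(a - b) for a, b in zip(img_a, img_b)) / len(img_a)

first_pixel_diff  = mean_abs_diff([10, 20, 30], [12, 18, 30])  # vs. real labeled image
second_pixel_diff = mean_abs_diff([5, 5, 5],   [5, 8, 2])      # vs. fake labeled image

first_weight, second_weight = 1.0, 0.5   # assumed weights

pixel_reconstruction_loss = (first_weight * first_pixel_diff
                             + second_weight * second_pixel_diff)
print(round(pixel_reconstruction_loss, 4))  # 2.3333
```

Distinct weights let the training emphasize one branch (e.g., the real-labeled comparison) over the other.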
Regarding Claim 7, Cao in view of Zhu teaches the image processing method according to claim 1, wherein the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) comprises an encoding network (Cao Pg 7 ¶03 discloses a neural network which is an encoding model) and a decoding network (Cao Pg 10 ¶02 discloses a decoding model); and
performing the identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the first source image (Cao Pg 2 ¶04 discloses an initial facial image) comprises:
calling the encoding network (Cao Pg 7 ¶03 discloses a neural network which is an encoding model) to perform fusion encoding processing (Cao Pg 10 ¶03 discloses the feature fusion model being an encoding model) on the first source image (Cao Pg 2 ¶04 discloses an initial facial image) and the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), to obtain an encoding result (Cao Pg 8 ¶04 discloses the result of the encoding being obtaining the facial features of the first image); and
calling the decoding network (Cao Pg 10 ¶02 discloses a decoding model) to perform decoding processing (Cao Pg 11 ¶04 discloses the decoding model is used to decode the target feature to obtain an image that matches the facial identity feature of the initial facial image sample) on the encoding result (Cao Pg 8 ¶04 discloses the result of the encoding being obtaining the facial features of the first image) to obtain the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) of the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image). See claim 1, its parent claim, for the rationale.
Regarding Claim 8, Cao in view of Zhu teaches the image processing method according to claim 7, wherein calling the encoding network (Cao Pg 7 ¶03 discloses a neural network which is an encoding model) to perform fusion encoding processing (Cao Pg 10 ¶03 discloses the feature fusion model being an encoding model) on the first source image (Cao Pg 2 ¶04 discloses an initial facial image) and the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) comprises:
performing splicing processing (Cao Pg 20 ¶04 discloses performing splicing) on the first source image (Cao Pg 2 ¶04 discloses an initial facial image) and the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), to obtain a spliced image (Cao Pg 20 ¶04 discloses splicing the image features from the image);
performing feature learning (Cao Pg 15 ¶09 and Pg 8 ¶06 disclose a recognition feature coding model and using machine learning for facial identity features) on the spliced image (Cao Pg 20 ¶04 discloses splicing the image features from the image) to obtain identity swapping features (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character);
performing face feature recognition (Cao Pg 15 ¶09 and Pg 8 ¶06 disclose a recognition feature coding model and using machine learning for facial identity features) on the first source image (Cao Pg 2 ¶04 discloses an initial facial image) to obtain face features of the first source image (Cao Pg 8 ¶02 discloses obtaining the facial identity characteristics of the initial facial image); and
performing feature fusion processing (Cao Pg 10 ¶02-¶04 discloses a feature fusion model used to obtain target features including the facial identity feature and the attribute feature) on the identity swapping features (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the face features of the first source image (Cao Pg 8 ¶02 discloses obtaining the facial identity characteristics of the initial facial image) to obtain the encoding result (Cao Pg 8 ¶04 discloses the result of the encoding being obtaining the facial features of the first image). See claim 1, its parent claim, for the rationale.
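For illustration only, the splice-learn-recognize-fuse pipeline of claim 8 can be sketched as follows. The stand-in "networks" below are trivial functions chosen purely to make the data flow concrete; none of them reflects either reference's actual architecture:

```python
# Hypothetical sketch of claim 8's fusion encoding steps: splice
# (concatenate) source and template, learn swap features from the
# spliced image, recognize the source's face features, and fuse the
# two feature sets into the encoding result.

def splice(source, template):
    return source + template                 # concatenation stand-in

def learn_swap_features(spliced):
    return [0.1 * x for x in spliced]        # stand-in feature extractor

def recognize_face_features(source):
    return [0.01 * x for x in source]        # stand-in face recognizer

def fuse(swap_features, face_features):
    return swap_features + face_features     # stand-in feature fusion

source, template = [1, 2, 3], [4, 5, 6]
spliced = splice(source, template)
encoding_result = fuse(learn_swap_features(spliced),
                       recognize_face_features(source))
print(len(encoding_result))  # 9
```

The encoding result then goes to the decoding network of claim 7 to produce the first identity swapping image.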
Regarding Claim 12, Cao in view of Zhu teaches the image processing method according to claim 1, wherein training (Cao Pg 2 ¶17 discloses the method for training the model) the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) to generate a trained identity swapping model (Cao Pg 13 ¶08 discloses training the model) to perform the identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on a target template image (Cao Pg 9 ¶04 discloses the template facial image reference) based on a target source image (Cao Pg 12 ¶05 and Pg 16 ¶02 disclose a target image) comprises:
receiving the target source image (Cao Pg 12 ¶05 and Pg 16 ¶02 disclose a target image) and the target template image (Cao Pg 9 ¶04 discloses the template facial image reference) that are to be processed (Cao Pg 16 ¶03 discloses the image and template being processed); and
inputting the target template image (Cao Figure 5 discloses inputting the template face image into a model) into the trained identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) and performing identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on the target template image (Cao Pg 9 ¶04 discloses the template facial image reference) based on the target source image (Cao Pg 12 ¶05 and Pg 16 ¶02 disclose a target image) to obtain an identity swapping image (Cao Figure 5 discloses the output being the target facial image result) of the target template image (Cao Pg 9 ¶04 discloses the template facial image reference), wherein the target source image (Cao Pg 12 ¶05 and Pg 16 ¶02 disclose a target image) and the identity swapping image (Cao Figure 5 discloses the output being the target facial image result) of the target template image (Cao Pg 9 ¶04 discloses the template facial image reference) have a same identity attribute (Cao Pg 6 ¶04 discloses the initial facial image provides facial identity features), and the target template image (Cao Pg 9 ¶04 discloses the template facial image reference) and the identity swapping image (Cao Figure 5 discloses the output being the target facial image result) of the target template image (Cao Pg 9 ¶04 discloses the template facial image reference) have a same non-identity attribute (Cao Pg 6 ¶04 discloses the template image provides the attribute features). See claim 1, its parent claim, for the rationale.
Regarding Claim 13, Cao teaches an image processing apparatus (Cao Pg 1 ¶01 and Pg 1 ¶15 disclose an image processing device), comprising:
at least one memory (Cao Pg 2 ¶03 discloses a memory) configured to store program code (Cao Pg 2 ¶03 discloses the memory storing a computer program); and
at least one processor (Cao Pg 2 ¶03 discloses a processor) configured to read the program code and operate as instructed by the program code (Cao Pg 2 ¶03 discloses a processor reads and executes the program), the program code comprising:
obtaining code (Cao Pg 2 ¶03 discloses a computer program) configured to cause at least one of the at least one processor (Cao Pg 2 ¶03 discloses a processor reads and executes the program) comprising a first source image (Cao Pg 2 ¶04 discloses an initial facial image) and a fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) being based on identity swapping processing of the real labeled image (Cao Pg 2 ¶14 discloses the swapping of the target feature on the facial image so that the target facial image matches the facial identity characteristics of the initial facial image, and matches the attribute characteristics of the template facial image), the first source image (Cao Pg 2 ¶04 discloses an initial facial image) having a same identity attribute (Cao Pg 6 ¶04 discloses that the initial facial image provides facial identity features), and the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) having a same non-identity attribute (Cao Pg 6 ¶04 discloses that the template image provides the attribute features); and
processing code (Cao Pg 2 ¶03 discloses a processor reads and executes the program) configured to cause at least one of the at least one processor (Cao Pg 2 ¶03 discloses a processor) to input (Cao Fig 5 discloses inputting the image into the model) the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) into an identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) and performing identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the first source image (Cao Pg 2 ¶04 discloses an initial facial image) to obtain a first identity swapping image of the fake template image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character), wherein the obtaining code (Cao Pg 2 ¶03 discloses a computer program) is further configured to cause at least one of the at least one processor (Cao Pg 2 ¶03 discloses a processor reads and executes the program)
comprising a second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image), a real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the fake labeled image being based on identity swapping processing (Cao Pg 2 ¶14 discloses the swapping of the target feature on the facial image so that the target facial image matches the facial identity characteristics of the initial facial image, and matches the attribute characteristics of the template facial image) of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image), the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) having a same identity attribute (Cao Pg 6 ¶04 discloses that the initial facial image provides facial identity features), and the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) having a same non-identity attribute (Cao Pg 6 ¶04 discloses that the template image provides the attribute features); and
the processing code (Cao Pg 2 ¶03 discloses a processor reads and executes the program) is further configured to cause at least one of the at least one processor (Cao Pg 2 ¶03 discloses a processor reads and executes the program) to:
input (Cao Fig 5 discloses inputting the image into the model) the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) into the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) and performing identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) to obtain a second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image); and
train (Cao Pg 2 ¶17 discloses the method for training the model) the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature), the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character), and the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) to generate a trained identity swapping model (Cao Pg 13 ¶08 discloses training the model) to perform identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on a target template image (Cao Pg 9 ¶04 discloses the template facial image reference) based on a target source image (Cao Pg 12 ¶05 and Pg 16 ¶02 disclose a target image).
Cao does not explicitly disclose to obtain a fake template sample group, a real labeled image, and the real labeled image, and the real labeled image, to obtain a fake labeled sample group, and a fake labeled image, and the fake labeled image, and the fake labeled image, based on the fake template sample group, the fake labeled sample group.
Zhu is in the same field of endeavor, image analysis to produce synthesized images or video. Further, Zhu teaches to obtain a fake template sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample that includes an image with attributes including a fake image), a real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image), and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image), and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image), to obtain a fake labeled sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample group that includes an image with attributes including a fake image), and a fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image), and the fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image), and the fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao by including real labeled images as part of the fake template sample group, updating the parameters of the model, and assigning weights to the pixel differences, as taught by Zhu, to make an invention that can more accurately differentiate between the features needed to produce a successful and clear swapped image. One of ordinary skill in the art would have been motivated to combine the references because there is a need to improve the stability and robustness of the image processing model: the image processing model obtained by triplet-sample training can ensure that the synthesized area is consistent with the original image or original video in shape, illumination, and action, thereby improving the quality of the synthesized image or video (Zhu Pg 14 ¶06).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 14, Cao in view of Zhu teaches the image processing apparatus according to claim 13, wherein the processing code (Cao Pg 2 ¶03 discloses a processor reads and executes the program) is further configured to cause at least one of the at least one processor (Cao Pg 2 ¶03 discloses a processor reads and executes the program) to determine a pixel reconstruction loss (Cao Pg 13 ¶03-¶04 discloses a pixel reconstruction loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on a first pixel difference (Cao Pg 16 ¶10 discloses the pixel difference between the first images) between the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image) and a second pixel difference (Cao Pg 13 ¶08 discloses a pixel difference between the second images) between the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) and the fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image);
determine a feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 disclose the difference in face identity features used to construct the training loss function) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on a feature difference (Cao Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) between the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image);
extract face features (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows) of the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character), the first source image (Cao Pg 2 ¶04 discloses an initial facial image), the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image), the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image), and the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) to determine an identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature);
perform discriminative processing (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) to obtain an adversarial loss (Cao Pg 13 ¶04 discloses a loss function of the adversarial training generation network) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature); and
perform summation (Cao Pg 13 ¶04 discloses performing weighted summation) on the pixel reconstruction loss (Cao Pg 13 ¶03-¶04 discloses a pixel reconstruction loss), the feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 disclose the difference in face identity features used to construct the training loss function), the identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss), and the adversarial loss (Cao Pg 13 ¶04 discloses a loss function of the adversarial training generation network) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature), to obtain loss information of the identity swapping model (Cao Pg 13 ¶03 discloses obtaining loss information from the model), and updating model parameters of the identity swapping model (Zhu Fig 11, 1108 and Pg 20 ¶01-¶03 disclose updating the samples to continue training the model) based on the loss information of the identity swapping model (Cao Pg 13 ¶03 discloses obtaining loss information from the model) to train (Cao Pg 2 ¶17 discloses the method for training the model) the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature). See the rationale set forth for claim 13, its parent claim.
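For clarity of the record, the weighted-summation step described above (Cao Pg 13 ¶04) can be sketched as follows. This is an illustrative sketch only; the function name, argument names, and equal default weights are assumptions and do not appear in either reference:

```python
# Illustrative sketch only: weighted summation of the four component losses
# used to obtain the loss information of the identity swapping model.
# The weight values and names are assumptions, not language from Cao or Zhu.

def total_loss(pixel_recon, feature_recon, identity, adversarial,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Return the weighted sum of the component losses."""
    w1, w2, w3, w4 = weights
    return (w1 * pixel_recon + w2 * feature_recon
            + w3 * identity + w4 * adversarial)

# With equal weights the result reduces to a plain sum of the components.
loss = total_loss(0.5, 0.2, 0.1, 0.3)
```

The resulting scalar would then serve as the loss information used to update the model parameters during training.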
Regarding Claim 16, Cao in view of Zhu teaches the image processing apparatus according to claim 14, wherein the identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) comprises a first identity loss (Cao Pg 13 ¶04 discloses performing an identity loss) and a second identity loss (Cao Pg 14 ¶04 discloses the identity loss also being calculated for the second image); and
the processing code (Cao Pg 2 ¶03 discloses a processor reads and executes the program) is further configured to cause at least one of the at least one processor (Cao Pg 2 ¶03 discloses a processor reads and executes the program) to:
determine the first identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss) based on a similarity between face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and face features (Cao Pg 13 ¶02 discloses facial identity features) of the first source image (Cao Pg 2 ¶04 discloses an initial facial image) and a similarity between face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) and face features (Cao Pg 13 ¶02 discloses facial identity features) of the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image); and
determine the second identity loss (Cao Pg 14 ¶04 discloses the identity loss also being calculated for the second image) based on a similarity between the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and face features (Cao Pg 13 ¶02 discloses facial identity features) of the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), a similarity between the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the first source image (Cao Pg 2 ¶04 discloses an initial facial image) and the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), a similarity between the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) and face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), and a similarity between the face features (Cao Pg 13 ¶03 discloses the face similarity between the images) of the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) and the face features of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image). See the rationale set forth for claim 13, its parent claim.
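As an illustration of how a similarity between face features may enter an identity loss, the following is a minimal sketch assuming cosine similarity over feature vectors. The function names and the 1 − similarity formulation are assumptions for illustration, not disclosures of Cao or Zhu:

```python
import math

# Illustrative sketch only: an identity loss term based on the cosine
# similarity between two face-feature vectors. Names and the 1 - similarity
# formulation are assumptions, not language from Cao or Zhu.

def cosine_similarity(a, b):
    """Cosine of the angle between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def identity_loss_term(swapped_features, source_features):
    """Higher similarity between the swapped image's features and the
    source image's features yields a lower loss."""
    return 1.0 - cosine_similarity(swapped_features, source_features)
```

Under this sketch, the first and second identity losses would each aggregate such terms over the image pairs recited in the claim.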
Regarding Claim 17, Cao in view of Zhu teaches the image processing apparatus according to claim 14, wherein the processing code (Cao Pg 2 ¶03 discloses a processor reads and executes the program) is further configured to cause at least one of the at least one processor (Cao Pg 2 ¶03 discloses a processor reads and executes the program) to:
obtain a discriminative model (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network);
input the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) into the discriminative model (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) and performing discriminative processing (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) to obtain a first discriminative result (Cao Pg 12 ¶06 discloses the output being the difference in pixels between the first images);
input the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) into the discriminative model (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) and performing discriminative processing (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) on the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) to obtain a second discriminative result (Cao Pg 13 ¶08 discloses the output being the second target facial image samples being used as negative samples); and
determine the adversarial loss (Cao Pg 13 ¶04 discloses a loss function of the adversarial training discrimination network) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on the first discriminative result (Cao Pg 12 ¶06 discloses the output being the difference in pixels between the first images) and the second discriminative result (Cao Pg 13 ¶08 discloses the output being the second target facial image samples being used as negative samples). See the rationale set forth for claim 13, its parent claim.
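The step of determining an adversarial loss from the two discriminative results can be sketched as follows, assuming (for illustration only) a standard non-saturating GAN-style generator loss over discriminator scores in (0, 1); this particular formulation is not taken from either reference:

```python
import math

# Illustrative sketch only: an adversarial loss computed from the
# discriminative results for the first and second identity swapping images.
# A non-saturating GAN-style formulation is assumed for illustration.

def adversarial_loss(first_result, second_result, eps=1e-7):
    """Average negative log of the discriminator's 'real' scores
    (each score assumed to lie in (0, 1))."""
    scores = (first_result, second_result)
    return sum(-math.log(max(s, eps)) for s in scores) / len(scores)
```

A score near 1 (the discriminator judges the swapped image real) drives this loss toward zero; scores near 0 drive it up.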
Regarding Claim 18, Cao teaches a non-transitory computer-readable storage medium (Cao Pg 21 ¶04 discloses the memory of the computer device includes a non-volatile storage medium and an internal memory, which is a form of non-transitory media) storing computer code (Cao Pg 21 ¶04 discloses the storage medium including a computer program) which, when executed by at least one processor (Cao Pg 21 ¶04 discloses the computer program is executed by the processor to realize an image processing/model training method), causes the at least one processor to at least:
comprising a first source image (Cao Pg 2 ¶04 discloses an initial facial image) and a fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) being based on identity swapping processing of the real labeled image (Cao Pg 2 ¶14 discloses the swapping of the target feature on the facial image so that the target facial image matches the facial identity characteristics of the initial facial image, and matches the attribute characteristics of the template facial image), the first source image (Cao Pg 2 ¶04 discloses an initial facial image) having a same identity attribute (Cao Pg 6 ¶04 discloses that the initial facial image provides facial identity features), and the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) having a same non-identity attribute (Cao Pg 6 ¶04 discloses that the template image provides the attribute features);
input (Cao Fig 5 discloses inputting the image into the model) the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) into an identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) and performing identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the first source image (Cao Pg 2 ¶04 discloses an initial facial image) to obtain a first identity swapping image of the fake template image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character);
comprising a second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image), a real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the fake labeled image being based on identity swapping processing (Cao Pg 2 ¶14 discloses the swapping of the target feature on the facial image so that the target facial image matches the facial identity characteristics of the initial facial image, and matches the attribute characteristics of the template facial image) of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image), the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) having a same identity attribute (Cao Pg 6 ¶04 discloses that the initial facial image provides facial identity features), and the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) having a same non-identity attribute (Cao Pg 6 ¶04 discloses that the template image provides the attribute features);
input (Cao Fig 5 discloses inputting the image into the model) the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) into the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) and performing identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) based on the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image) to obtain a second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) of the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image); and
train (Cao Pg 2 ¶17 discloses the method for training the model) the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature), the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character), and the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) to generate a trained identity swapping model (Cao Pg 13 ¶08 discloses training the model) to perform identity swapping processing (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) on a target template image (Cao Pg 9 ¶04 discloses the template facial image reference) based on a target source image (Cao Pg 12 ¶05 and Pg 16 ¶02 disclose a target image).
Cao does not explicitly disclose obtain a fake template sample group, a real labeled image, and the real labeled image, and the real labeled image, obtain a fake labeled sample group, and a fake labeled image, and the fake labeled image, and the fake labeled image, based on the fake template sample group, the fake labeled sample group.
Zhu is in the same field of endeavor, image analysis to produce synthesized images or video. Further, Zhu teaches to obtain a fake template sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample that includes an image with attributes including a fake image), a real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image), and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image), and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image), to obtain a fake labeled sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample group that includes an image with attributes including a fake image), and a fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image), and the fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image), and the fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image) based on the fake template sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample that includes an image with attributes including a fake image), the fake labeled sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample that includes an image with attributes including a fake image).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao by including real labeled images as part of the fake template sample group, updating the parameters of the model, and assigning weights to the pixel differences, as taught by Zhu, to make an invention that can more accurately differentiate between the features needed to produce a successful and clear swapped image. One of ordinary skill in the art would have been motivated to combine the references because there is a need to improve the stability and robustness of the image processing model: the image processing model obtained by triplet-sample training can ensure that the synthesized area is consistent with the original image or original video in shape, illumination, and action, thereby improving the quality of the synthesized image or video (Zhu Pg 14 ¶06).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 19, Cao in view of Zhu teaches the non-transitory computer-readable storage medium (Cao Pg 21 ¶04 discloses the memory of the computer device includes a non-volatile storage medium and an internal memory, which is a form of non-transitory media) according to claim 18, wherein the training comprises:
determining a pixel reconstruction loss (Cao Pg 13 ¶03-¶04 discloses a pixel reconstruction loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on a first pixel difference (Cao Pg 16 ¶10 discloses the pixel difference between the first images) between the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image) and a second pixel difference (Cao Pg 13 ¶08 discloses a pixel difference between the second images) between the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) and the fake labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶01-¶02 disclose labeling the second combination as a fake image);
determining a feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 disclose the difference in face identity features used to construct the training loss function) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on a feature difference (Cao Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) between the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 disclose labeling the first combination as a non-fake image);
extracting face features (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows) of the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features on an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character), the first source image (Cao Pg 2 ¶04 discloses an initial facial image), the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image), the second source image (Cao Pg 11 ¶02 and Pg 13 ¶08 disclose a second initial facial image), and the real template image (Cao Pg 1 ¶10-¶12 discloses a template facial image) to determine an identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 disclose an identity loss) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 disclose a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature);
performing discriminative processing (Cao Pg 13 ¶08 discloses inputting the image into a discriminant network) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the second identity swapping image (Cao Pg 13 ¶08 discloses a second output image) to obtain an adversarial loss (Cao Pg 13 ¶04 discloses a loss function of the adversarial training generation network) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature); and
performing summation (Cao Pg 13 ¶04 discloses performing weighted summation) on the pixel reconstruction loss (Cao Pg 13 ¶03-¶04 discloses a pixel reconstruction loss), the feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function), the identity loss (Cao Pg 13 ¶04 and Pg 14 ¶04 discloses an identity loss), and the adversarial loss (Cao Pg 13 ¶04 discloses a loss function of the adversarial training generation network) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature), to obtain loss information of the identity swapping model (Cao Pg 13 ¶03 discloses obtaining loss information from the model), and updating model parameters of the identity swapping model (Zhu Fig 11 1108 Pg 20 ¶01-¶03 discloses updating the samples to continue training the model) based on the loss information of the identity swapping model (Cao Pg 13 ¶03 discloses obtaining loss information from the model) to train (Cao Pg 2 ¶17 discloses the method for training the model) the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature). See claim 18, its parent claim, for rationale.
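For clarity of the record, the loss combination mapped above (a weighted summation of the pixel reconstruction, feature reconstruction, identity, and adversarial losses to obtain the loss information used in training) may be sketched as follows. This is an illustrative sketch only; the function names and default weights are assumptions and are not drawn from Cao or Zhu.

```python
import numpy as np

def pixel_reconstruction_loss(pred, target):
    # Mean absolute pixel difference between a generated image and its label image.
    return float(np.mean(np.abs(pred - target)))

def total_loss(l_pix, l_feat, l_id, l_adv, weights=(1.0, 1.0, 1.0, 1.0)):
    # Weighted summation of the four loss terms; the resulting scalar is the
    # loss information used to update the model parameters.
    w_pix, w_feat, w_id, w_adv = weights
    return w_pix * l_pix + w_feat * l_feat + w_id * l_id + w_adv * l_adv
```

With unit weights, the summation reduces to a plain sum of the four terms.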
Claims 3, 9-11, 15, and 20 are rejected under 35 U.S.C. 103 as unpatentable over Cao in view of Zhu in further view of Berlin et al. (US Patent No. 11,308,957, hereafter referred to as Berlin).
Regarding Claim 3, Cao in view of Zhu teaches the image processing method according to claim 2, wherein determining the feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) comprises:
to perform image feature extraction (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character),
to perform the image feature extraction (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows) on the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image) to obtain a second feature extraction result (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows),
calculating feature differences (Cao Pg 13 ¶03 discloses calculating the facial identity feature difference)
performing summation (Cao Pg 13 ¶04 discloses performing weighted summation) of the feature differences (Cao Pg 13 ¶03 discloses calculating the facial identity feature difference) to obtain the feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature).
Cao in view of Zhu does not explicitly disclose obtaining an image feature extraction network comprising a plurality of image feature extraction layers;
calling the image feature extraction network,
to obtain a first feature extraction result, the first feature extraction result comprising an identity swapping image feature extracted from each image feature extraction layer of the plurality of image feature extraction layers;
calling the image feature extraction network the second feature extraction result comprising a labeled image feature extracted from each image feature extraction layer of the plurality of image feature extraction layers;
between the identity swapping image feature and the labeled image feature that are extracted from each image feature extraction layer; and
of the image feature extraction layers.
Berlin is in the same field of image analysis to produce synthesized images or video. Further, Berlin teaches obtaining an image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features) comprising a plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
calling the image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features)
to obtain a first feature extraction result (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features, the result being eyes, mouth, nose), the first feature extraction result comprising an identity swapping image feature (Berlin Col 10 Lines 50-60 disclose the facial features being used in the facial swapping process) extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers and Col 8 Lines 50-55 disclose extracting only necessary information in the encoder layer) of the plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
calling the image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features) the second feature extraction result comprising a labeled image (Berlin Col 10 Lines 54-58 disclose automatically adding text, image, and/or graphic tags (e.g., timestamps) to the video) feature extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers and Col 8 Lines 50-55 disclose extracting only necessary information in the encoder layer) of the plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
between the identity swapping image feature (Berlin Col 10 Lines 50-60 disclose the facial features being used in the facial swapping process) and the labeled image feature (Berlin Col 10 Lines 54-58 disclose automatically adding text, image, and/or graphic tags (e.g., timestamps) to the video) that are extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers); and
of the image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao in view of Zhu by including a feature extraction network with multiple layers and calculating the mean and variance of features, as taught by Berlin, to make an invention that can more accurately differentiate between the features needed to produce a successful and clear swapped image; thus, one of ordinary skill in the art would be motivated to combine the references, since there is a need for techniques that may provide a more accurate model while reducing the time and computer resources needed to create the CGI model (Berlin Col 7 Lines 40-46).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
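For clarity of the record, the per-layer construction recited in claim 3 (a feature extracted from each layer of a multi-layer extraction network, with the per-layer feature differences summed to obtain the feature reconstruction loss) may be sketched as follows. The helper name and the use of a mean absolute difference per layer are assumptions for illustration, not taken from the cited references.

```python
import numpy as np

def feature_reconstruction_loss(swap_features, label_features):
    # swap_features / label_features: one feature array per extraction layer,
    # taken from the identity swapping image and the real labeled image.
    assert len(swap_features) == len(label_features)
    per_layer = [float(np.mean(np.abs(a - b)))
                 for a, b in zip(swap_features, label_features)]
    # Summation of the feature differences over the image feature extraction layers.
    return sum(per_layer)
```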
Regarding Claim 9, Cao in view of Zhu teaches the image processing method according to claim 8, wherein (Cao Pg 10 ¶02-¶04 discloses a feature fusion model used to obtain target features including: the facial identity feature, attribute feature) on the identity swapping features (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the face features of the first source image (Cao Pg 8 ¶02 discloses obtaining the facial identity characteristics of the initial facial image) to obtain the encoding result (Cao Pg 8 ¶04 discloses the result of the encoding being obtaining the facial features of the first image) comprises:
of the identity swapping features (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) of the identity swapping features (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character);
of the face features (Cao Pg 8 ¶02 discloses obtaining the facial identity characteristics of the initial facial image) of the face features (Cao Pg 8 ¶02 discloses obtaining the facial identity characteristics of the initial facial image); and
performing the feature fusion processing (Cao Pg 10 ¶02-¶04 discloses a feature fusion model used to obtain target features including: the facial identity feature, attribute feature) on the identity swapping features (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the face features (Cao Pg 8 ¶02 discloses obtaining the facial identity characteristics of the initial facial image)
of the identity swapping features (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character)
of the identity swapping features (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character)
of the face features (Cao Pg 8 ¶02 discloses obtaining the facial identity characteristics of the initial facial image) of the face features (Cao Pg 8 ¶02 discloses obtaining the facial identity characteristics of the initial facial image) to obtain the encoding result (Cao Pg 8 ¶04 discloses the result of the encoding being obtaining the facial features of the first image).
Cao in view of Zhu does not explicitly disclose calculating a mean and a variance; calculating a mean and a variance; calculating a mean and a variance; and calculating a mean and a variance.
Berlin is in the same field of image analysis to produce synthesized images or video. Further, Berlin teaches calculating a mean (Berlin Col 14 Lines 50-55 disclose calculating the mean of the facial feature alignment) and a variance (Berlin Col 28 Lines 53-60 disclose calculating the variance); calculating a mean (Berlin Col 14 Lines 50-55 disclose calculating the mean of the facial feature alignment) and a variance (Berlin Col 28 Lines 53-60 disclose calculating the variance); calculating a mean (Berlin Col 14 Lines 50-55 disclose calculating the mean of the facial feature alignment) and a variance (Berlin Col 28 Lines 53-60 disclose calculating the variance); and calculating a mean (Berlin Col 14 Lines 50-55 disclose calculating the mean of the facial feature alignment) and a variance (Berlin Col 28 Lines 53-60 disclose calculating the variance).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao in view of Zhu by including a feature extraction network with multiple layers and calculating the mean and variance of features, as taught by Berlin, to make an invention that can more accurately differentiate between the features needed to produce a successful and clear swapped image; thus, one of ordinary skill in the art would be motivated to combine the references, since there is a need for techniques that may provide a more accurate model while reducing the time and computer resources needed to create the CGI model (Berlin Col 7 Lines 40-46).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
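For clarity of the record, the mean-and-variance calculations recited in claim 9 resemble statistic-based feature fusion. A minimal sketch of one such scheme follows; the use of adaptive-instance-normalization-style re-scaling is an assumption, not a formulation taken from Cao, Zhu, or Berlin, and the array shapes and function names are hypothetical.

```python
import numpy as np

def channel_stats(feat):
    # Per-channel mean and variance over the spatial dimensions of a (C, H, W) map.
    return feat.mean(axis=(1, 2)), feat.var(axis=(1, 2))

def fuse(identity_feat, face_feat, eps=1e-5):
    # Normalize the identity swapping features with their own statistics, then
    # re-scale them with the statistics of the source face features.
    id_mean, id_var = channel_stats(identity_feat)
    f_mean, f_var = channel_stats(face_feat)
    normalized = (identity_feat - id_mean[:, None, None]) / np.sqrt(id_var + eps)[:, None, None]
    return normalized * np.sqrt(f_var + eps)[:, None, None] + f_mean[:, None, None]
```

Under this sketch, the fused map carries the per-channel statistics of the face features while retaining the spatial structure of the identity swapping features.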
Regarding Claim 10, Cao in view of Zhu teaches the image processing method according to claim 1, wherein obtaining the fake template sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample that includes an image with attributes including a fake image) comprises:
obtaining an initial source image (Zhu Pg 5 ¶02 discloses a source image and a first image) corresponding to the first source image (Cao Pg 2 ¶04 discloses an initial facial image), and obtaining an initial labeled image (Zhu Pg 12 ¶07 discloses the first combination belonging to a non-fake image and corresponding label) corresponding to the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image);
on the initial source image (Zhu Pg 5 ¶02 discloses a source image and a first image) corresponding to the first source image (Cao Pg 2 ¶04 discloses an initial facial image), to obtain the first source image (Cao Pg 2 ¶04 discloses an initial facial image), on the initial labeled image (Zhu Pg 12 ¶07 discloses the first combination belonging to a non-fake image and corresponding label) corresponding to the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image), to obtain the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image);
obtaining a reference source image (Cao Pg 9 ¶04 and Pg 10 ¶01 discloses template facial image reference), and performing the identity swapping processing (Cao Pg 2 ¶14 discloses the swapping of the target feature on the facial image so that the target facial image matches the facial identity characteristics of the initial facial image, and matches the attribute characteristics of the template facial image) on the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image) based on the reference source image (Cao Pg 9 ¶04 and Pg 10 ¶01 discloses template facial image reference), to obtain the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image); and
generating the fake template sample group (Zhu Pg 1 ¶10-¶12 discloses a triplet sample that includes an image with attributes including a fake image) based on the first source image (Cao Pg 2 ¶04 discloses an initial facial image), the fake template image (Cao Pg 1 ¶10-¶12 discloses a template facial image), and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image).
Cao in view of Zhu does not explicitly disclose performing face region cropping, and performing the face region cropping.
Berlin is in the same field of image analysis to produce synthesized images or video. Further, Berlin teaches performing face region cropping (Berlin Col 12 Lines 59-63 disclose cropping the video to remove unneeded video content (e.g., footage that does not include a specified face)) and performing the face region cropping (Berlin Col 12 Lines 59-63 disclose cropping the video to remove unneeded video content (e.g., footage that does not include a specified face)).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao in view of Zhu by including a feature extraction network with multiple layers, calculating the mean and variance of features, and performing face cropping, as taught by Berlin, to make an invention that can more accurately differentiate between the features needed to produce a successful and clear swapped image; thus, one of ordinary skill in the art would be motivated to combine the references, since there is a need for techniques that may provide a more accurate model while reducing the time and computer resources needed to create the CGI model (Berlin Col 7 Lines 40-46).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 11, Cao in view of Zhu in further view of Berlin teaches the image processing method according to claim 10, wherein performing the face region cropping (Berlin Col 12 Lines 59-63 disclose cropping the video to remove unneeded video content (e.g., footage that does not include a specified face)) on the initial source image (Zhu Pg 5 ¶02 discloses a source image and a first image) corresponding to the first source image (Cao Pg 2 ¶04 discloses an initial facial image) comprises:
performing face detection (Cao Pg 15 ¶08 discloses performing face detection) on the initial source image (Zhu Pg 5 ¶02 discloses a source image and a first image) corresponding to the first source image (Cao Pg 2 ¶04 discloses an initial facial image), to determine a face region (Cao Pg 16 ¶02-¶04 and Pg 17 ¶09 discloses identifying a facial region) in the initial source image (Zhu Pg 5 ¶02 discloses a source image and a first image) corresponding to the first source image (Cao Pg 2 ¶04 discloses an initial facial image);
performing, in the face region (Cao Pg 16 ¶02-¶04 and Pg 17 ¶09 discloses identifying a facial region), face registration (Cao Pg 15 ¶08 discloses performing face registration) on the initial source image (Zhu Pg 5 ¶02 discloses a source image and a first image) corresponding to the first source image (Cao Pg 2 ¶04 discloses an initial facial image), to determine face key points (Cao Pg 15 ¶04 discloses key points that characterize facial features) in the initial source image (Zhu Pg 5 ¶02 discloses a source image and a first image) corresponding to the first source image (Cao Pg 2 ¶04 discloses an initial facial image); and
performing cropping processing (Berlin Col 12 Lines 59-63 disclose cropping the video to remove unneeded video content (e.g., footage that does not include a specified face)) on the initial source image (Zhu Pg 5 ¶02 discloses a source image and a first image) corresponding to the first source image (Cao Pg 2 ¶04 discloses an initial facial image) based on the face key points (Cao Pg 15 ¶04 discloses key points that characterize facial features), to obtain the first source image (Cao Pg 2 ¶04 discloses an initial facial image). See Claim 10, its parent claim, for rationale.
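For clarity of the record, the detection, registration, and cropping sequence recited in claim 11 may be sketched as follows. `detect` and `register` are hypothetical stand-ins for a face detection model and a key-point registration model, and the margin value is illustrative; none of these names are drawn from the cited references.

```python
def crop_face(image, detect, register, margin=16):
    # image: a list of pixel rows. detect() returns a face region box;
    # register() returns face key points inside that region.
    box = detect(image)                      # (x0, y0, x1, y1) face region
    points = register(image, box)            # [(x, y), ...] face key points
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Crop around the key points, padded by the margin and clamped to the image.
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + margin, len(image[0]))
    bottom = min(max(ys) + margin, len(image))
    return [row[left:right] for row in image[top:bottom]]
```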
Regarding Claim 15, Cao in view of Zhu teaches the image processing apparatus according to claim 14, wherein the processing code (Cao Pg 2 ¶03 discloses a processor reads and executes the program) is further configured to cause at least one of the at least one processor (Cao Pg 2 ¶03 discloses a processor reads and executes the program)
to perform image feature extraction (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character),
to perform the image feature extraction (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows) on the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image) to obtain a second feature extraction result (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows),
calculate feature differences (Cao Pg 13 ¶03 discloses calculating the facial identity feature difference)
perform summation (Cao Pg 13 ¶04 discloses performing weighted summation) of the feature differences (Cao Pg 13 ¶03 discloses calculating the facial identity feature difference) to obtain the feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature).
Cao in view of Zhu does not explicitly disclose obtain an image feature extraction network comprising a plurality of image feature extraction layers;
calling the image feature extraction network to obtain a first feature extraction result, the first feature extraction result comprising an identity swapping image feature extracted from each image feature extraction layer of the plurality of image feature extraction layers;
calling the image feature extraction network the second feature extraction result comprising a labeled image feature extracted from each image feature extraction layer of the plurality of image feature extraction layers
between the identity swapping image feature and the labeled image feature that are extracted from each image feature extraction layer; and
of the image feature extraction layers.
Berlin is in the same field of image analysis to produce synthesized images or video. Further, Berlin teaches obtain an image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features) comprising a plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
calling the image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features) to obtain a first feature extraction result (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features, the result being eyes, mouth, nose), the first feature extraction result comprising an identity swapping image feature (Berlin Col 10 Lines 50-60 disclose the facial features being used in the facial swapping process) extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers and Col 8 Lines 50-55 disclose extracting only necessary information in the encoder layer) of the plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
calling the image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features) the second feature extraction result comprising a labeled image (Berlin Col 10 Lines 54-58 disclose automatically adding text, image, and/or graphic tags (e.g., timestamps) to the video) feature extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers and Col 8 Lines 50-55 disclose extracting only necessary information in the encoder layer) of the plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
between the identity swapping image feature (Berlin Col 10 Lines 50-60 disclose the facial features being used in the facial swapping process) and the labeled image feature (Berlin Col 10 Lines 54-58 disclose automatically adding text, image, and/or graphic tags (e.g., timestamps) to the video) that are extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers); and
of the image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao in view of Zhu by including a feature extraction network with multiple layers and calculating the mean and variance of features, as taught by Berlin, to make an invention that can more accurately differentiate between the features needed to produce a successful and clear swapped image; thus, one of ordinary skill in the art would be motivated to combine the references, since there is a need for techniques that may provide a more accurate model while reducing the time and computer resources needed to create the CGI model (Berlin Col 7 Lines 40-46).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 20, Cao in view of Zhu teaches the non-transitory computer-readable storage medium (Cao Pg 21 ¶04 discloses the memory of the computer device includes a non-volatile storage medium and an internal memory, which is a form of non-transitory media) according to claim 19, wherein determining the feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature) based on the feature difference (Cao Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) between the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character) and the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image) comprises:
to perform image feature extraction (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows) on the first identity swapping image (Cao Fig 9 discloses how the fusion model takes facial features from an input image and fuses them with the template to create an image using the facial features of the original on a video game or movie character),
to perform the image feature extraction (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows) on the real labeled image (Zhu Pg 5 ¶09 and Pg 6 ¶02 discloses labeling the first combination as a non-fake image) to obtain a second feature extraction result (Cao Pg 15 ¶04 discloses feature points from the images including eyes, nose, mouth, eyebrows),
calculating feature differences (Cao Pg 13 ¶03 discloses calculating the facial identity feature difference),
performing summation (Cao Pg 13 ¶04 discloses performing weighted summation) of the feature differences (Cao Pg 13 ¶03 discloses calculating the facial identity feature difference) to obtain the feature reconstruction loss (Cao Pg 13 ¶04 and Pg 16 ¶10 discloses the difference in face identity features used to construct the training loss function) of the identity swapping model (Cao Fig 10, 1003 and Pg 2 ¶01 discloses a fusion model that fuses the facial identity feature and the attribute feature to obtain the target feature).
Cao in view of Zhu does not explicitly disclose obtaining an image feature extraction network comprising a plurality of image feature extraction layers;
calling the image feature extraction network to obtain a first feature extraction result, the first feature extraction result comprising an identity swapping image feature extracted from each image feature extraction layer of the plurality of image feature extraction layers,
calling the image feature extraction network the second feature extraction result comprising a labeled image feature extracted from each image feature extraction layer of the plurality of image feature extraction layers;
between the identity swapping image feature and the labeled image feature that are extracted from each image feature extraction layer; and
of the image feature extraction layers.
Berlin is in the same field of image analysis to produce synthesized images or video. Further, Berlin teaches obtaining an image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features) comprising a plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
calling the image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features) to obtain a first feature extraction result (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features, the result being eyes, mouth, nose), the first feature extraction result comprising an identity swapping image feature (Berlin Col 10 Lines 50-60 disclose the facial features being used in the facial swapping process) extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers and Col 8 Lines 50-55 disclose extracting only necessary information in the encoder layer) of the plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
calling the image feature extraction network (Berlin Col 10 Lines 50-55 disclose a learning engine identifying and classifying facial features) the second feature extraction result comprising a labeled image (Berlin Col 10 Lines 54-58 disclose automatically adding text, image, and/or graphic tags (e.g., timestamps) to the video) feature extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers and Col 8 Lines 50-55 disclose extracting only necessary information in the encoder layer) of the plurality of image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers);
between the identity swapping image feature (Berlin Col 10 Lines 50-60 disclose the facial features being used in the facial swapping process) and the labeled image feature (Berlin Col 10 Lines 54-58 disclose automatically adding text, image, and/or graphic tags (e.g., timestamps) to the video) that are extracted from each image feature extraction layer (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers); and
of the image feature extraction layers (Berlin Col 16 Lines 1-2 disclose this engine may have one or more hidden layers).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Cao in view of Zhu by including a feature extraction network with multiple layers and calculating the mean and variance of features, as taught by Berlin, to make an invention that can more accurately differentiate between the features needed to produce a successful and clear swapped image; thus, one of ordinary skill in the art would be motivated to combine the references, since there is a need for techniques that may provide a more accurate model while reducing the time and computer resources needed to create the CGI model (Berlin Col 7 Lines 40-46).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
References Cited
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
US Patent No. 8,472,722 to Nayar et al. discloses methods, systems, and media for swapping faces in images.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RACHEL LYNN ROBERTS whose telephone number is (571)272-6413. The examiner can normally be reached Monday- Friday 7:30am- 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Oneal Mistry, can be reached at (313) 446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RACHEL L ROBERTS/Examiner, Art Unit 2674
/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674