Prosecution Insights
Last updated: April 19, 2026
Application No. 18/033,237

METHOD AND APPARATUS FOR RECONSTRUCTING FACE IMAGE BY USING VIDEO IDENTITY CLARIFICATION NETWORK

Final Rejection under §103
Filed: Apr 21, 2023
Examiner: CROCKETT, JOSHUA BRIGHAM
Art Unit: 2661
Tech Center: 2600 (Communications)
Assignee: Seoul National University R&DB Foundation
OA Round: 2 (Final)

Predictions:
- Grant Probability: 72% (Favorable)
- Expected OA Rounds: 3-4
- Expected Time to Grant: 3y 0m
- Grant Probability with Interview: 99%

Examiner Intelligence

- Career Allow Rate: 72% (above average; 13 granted / 18 resolved; +10.2% vs TC avg)
- Interview Lift: +27.5% (allowance rate of resolved cases with interview vs. without)
- Typical Timeline: 3y 0m avg prosecution; 26 applications currently pending
- Career History: 44 total applications across all art units
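The headline figures above are simple ratios; as a sanity check of the arithmetic (variable names are illustrative, not part of the analytics export):

```python
granted, resolved = 13, 18            # from the examiner's career history
allow_rate = granted / resolved       # career allow rate
assert round(allow_rate * 100) == 72  # matches the 72% shown above

# The "+10.2% vs TC avg" delta implies a Tech Center average near 62%.
tc_avg = allow_rate - 0.102
assert abs(tc_avg - 0.62) < 0.005
```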

Statute-Specific Performance

Allowance rate by statute (vs Tech Center average estimate):

- §101: 6.0% (-34.0% vs TC avg)
- §103: 47.5% (+7.5% vs TC avg)
- §102: 10.1% (-29.9% vs TC avg)
- §112: 35.1% (-4.9% vs TC avg)

Based on career data from 18 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in Application No. 18/033,237 (the instant application), filed on 04/21/2023.

Response to Arguments

Claims 1-3, 5, 12-14, and 18 have been amended. Claims 1-20 are pending in this action. Applicant's arguments (pp. 8-11, section "Rejection of claims 1-9 and 12-18 under 35 U.S.C. 103," filed 18 November 2025) with respect to the rejections of claims 1-9 and 12-18 under 35 U.S.C. 103 have been fully considered and are partially persuasive.

Applicant argues that Li does not disclose that the face image is tracked from a series of frames of an input video. The examiner agrees.

Applicant argues that Shi's methods differ from and do not disclose the amendments recited in claim 1, lines 11-16. The examiner finds that Shi discloses: determining a reference image of the plurality of face images ([0059] a key image frame, i.e. a reference image, is obtained from the video stream, i.e. the plurality of face images); performing inter-frame motion estimation between the reference image and at least one other face image of the plurality of face images ([0061] the face is tracked in the complete sequence from the target image frame; tracking the face is understood as inter-frame motion estimation); and combining the reference image and the at least one other face image so as to generate the reconstructed face image ([0064] the facial sequence, which includes the reference image and other images, is used to obtain a super-resolution facial image.
As plural images are the input and the output is a singular super-resolution facial image, it is understood that the input images are in some way combined). However, Shi does not disclose expressly warping between the reference image and at least one other face image of the plurality of face images. Therefore, the rejection has been withdrawn.

However, upon further consideration, a new ground of rejection is made under 35 U.S.C. 103 in view of Wheeler et al. ("Multi-Frame Super-Resolution for Face Recognition"; full reference on the PTO-892 included in this action; hereafter, Wheeler). Wheeler discloses warping between the reference image and at least one other face image of the plurality of face images (pg. 3 col. 1 para. 1, the images are warped to a mean face shape, which is understood as the reference image). The full rejection, including motivations to combine, is included below in the section "Claim Rejections - 35 USC § 103." Therefore, claim 1 remains rejected under 35 U.S.C. 103. Claim 12 recites subject matter substantially similar to claim 1 and is likewise rejected. Claims 2-9 and 13-18 depend from claims 1 and 12, respectively, and likewise remain rejected.

Applicant's arguments (pp. 11-12, section "Rejection of claims 10, 11, 19 and 20 under 35 U.S.C. 103," filed 18 November 2025) with respect to the rejections of claims 10, 11, 19, and 20 under 35 U.S.C. 103 have been fully considered and, in light of the above response regarding claims 1-9 and 12-18, are not persuasive. Therefore, claims 10, 11, 19, and 20 remain rejected under 35 U.S.C. 103.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C.
102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 and 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. ("Learning Face Image Super-Resolution Through Facial Semantic Attribute Transformation and Self-Attentive Structure Enhancement"; full reference on the PTO-892 included in this action; hereafter, Li) in view of Shi et al. (CN 111488779 A; hereafter, Shi), in further view of Wheeler et al. ("Multi-Frame Super-Resolution for Face Recognition"; full reference on the PTO-892 included in this action; hereafter, Wheeler).

Regarding claim 1, Li discloses: A face image reconstruction method executed by a face image reconstruction device comprising a processor (pg. 476 col. 2 para. 4, the method is performed on a GPU, which is a processor), the method comprising: acquiring training data (pg. 476 col. 2 para.
5, training data is acquired from large databases); and training a video identity clarification model (video identity clarification network (VICN)) on the basis of the training data (pg. 471 col. 1 para. 2 and Fig. 3, the model on which this method is based generates high-resolution face images from low-resolution images and is therefore understood to clarify the identity; pg. 476 col. 1 para. 4, the model is trained), wherein the training comprises: generating, by executing a generator of the video identity clarification model, a reconstructed face image (it is commonly understood in the art that a model is trained by operating the model and then adjusting factors within the model according to some loss function; therefore, references which describe the operation of a model are understood to include the operation during training of the model; pg. 471 col. 2 last para. through pg. 472 col. 1 para. 1 and Fig. 3, a "hallucinated" face is generated, which is understood as a reconstructed face image) in which identity of a face shown in the at least one face image has been clarified (pg. 471 col. 2 last para. through pg. 472 col. 1 para. 1 and Fig. 3, the hallucinated face is an optimized attempt to approximate the high-resolution face image, which is understood as the face image being clarified), and discriminating the reconstructed face image on the basis of the ground truth face image by executing a discriminator of the video identity clarification model (pg. 472 col. 1 para. 1 and Fig.
3, several discriminators are run comparing the generated image against the ground truth image), which is in a generative adversarial network (GAN) competition relationship with the generator (pg. 471 col. 2 para. 2, the generator is in a relationship with the discriminator).

Li does not disclose expressly acquiring training data comprising a plurality of face images tracked from a series of frames of an input video, acquiring ground truth images for the plurality of face images, determining a reference image of the plurality of face images, performing inter-frame motion estimation between the reference image and at least one other face image, and combining the reference image and the at least one other face image so as to generate the reconstructed face image.

Shi discloses: acquiring training data comprising a plurality of face images ([0058] at least two images containing a face are extracted from a video, which is a plurality of face images) tracked from a series of frames of an input video ([0060] the face image is tracked through continuous frames, [0061] such as a video) and a ground truth face image for the plurality of face images ([0069] the video data used includes high-definition videos, i.e. ground truth, and blurred videos; [0073] generate a facial feature pair between images of two resolutions, high and low, see [0071]; [0074] use the feature pair to train the neural network; therefore, the high-definition video is understood as ground truth face images); determining a reference image of the plurality of face images ([0059] a key image frame, i.e. a reference image, is obtained from the video stream, i.e.
the plurality of face images), performing inter-frame motion estimation between the reference image and at least one other face image of the plurality of face images ([0061] the face is tracked in the complete sequence from the target image frame; tracking the face is understood as inter-frame motion estimation), and combining the reference image and the at least one other face image so as to generate the reconstructed face image ([0064] the facial sequence, which includes the reference image and other images, is used to obtain a super-resolution facial image; as plural images are the input and the output is a singular super-resolution facial image, it is understood that the input images are in some way combined).

Li and Shi are combinable because they are from the same field of endeavor of improving the resolution of face images (Li, pg. 468 col. 2 para. 2; Shi, [0006]). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the face image selection comprising a plurality of images of Shi with the invention of Li. The motivation for doing so would have been to "obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). It would likewise have been obvious to combine the reference face image operations of Shi with the invention of Li. The motivation for doing so would have been that "the target image frame [i.e. reference image] contains complete image information of the face to be recognized" (Shi, [0059]). Therefore, it would have been obvious to combine Shi with Li.

Li in view of Shi does not disclose expressly warping between the reference image and at least one other face image of the plurality of face images. Wheeler discloses: warping between the reference image and at least one other face image of the plurality of face images (pg. 3 col. 1 para.
1, the images are warped to a mean face shape, which is understood as the reference image). Wheeler is combinable with Li in view of Shi because it is from the same field of endeavor of facial image super-resolution (Wheeler, pg. 1 col. 1 para. 3). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the warping of Wheeler with the invention of Li in view of Shi. The motivation for doing so would have been that "The warping scales up and aligns each face image" (Wheeler, pg. 4 col. 1 para. 1). Therefore, it would have been obvious to combine Wheeler with Li in view of Shi to obtain the invention as specified in claim 1.

Regarding claim 2, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1. Li does not disclose expressly that the face images are extracted from a series of frames based on face feature point information. Shi discloses: wherein the acquiring of the training data comprises extracting the plurality of face images from the series of frames ([0058] at least two images containing a face are extracted from a video, which is a plurality of face images) based on face feature point tracking information ([0063] facial key points are identified in the image frames) between successive frames of the series of frames (the above calculation of facial feature points is performed on the [0061] sequence of frames containing the face, which is understood as successive frames).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the face image selection of Shi with the invention of Li. The motivation for doing so would have been to "obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). Therefore, it would have been obvious to combine Shi with Li to obtain the invention as specified in claim 2.
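The claim steps at issue in claims 1 and 2 (selecting a reference frame, inter-frame motion estimation, warping, and combining a tracked face sequence) can be sketched in a minimal form. The following is an illustrative toy only, assuming purely translational motion and a brute-force shift search; it is not the method of Li, Shi, or Wheeler, and the function name is invented for illustration:

```python
import numpy as np

def reconstruct_from_frames(frames, ref_idx=0):
    """Toy multi-frame combination: estimate per-frame translational
    motion relative to a reference frame, warp each frame onto the
    reference, and average the aligned stack."""
    ref = frames[ref_idx]
    aligned = []
    for f in frames:
        # Inter-frame motion estimation: brute-force search for the
        # integer (dy, dx) shift that best matches the reference.
        best, best_err = (0, 0), np.inf
        for dy in range(-3, 4):
            for dx in range(-3, 4):
                err = np.mean((np.roll(f, (dy, dx), axis=(0, 1)) - ref) ** 2)
                if err < best_err:
                    best, best_err = (dy, dx), err
        # Warping: apply the estimated shift to align with the reference.
        aligned.append(np.roll(f, best, axis=(0, 1)))
    # Combining: average the aligned frames (a simple noise-reducing fusion).
    return np.mean(aligned, axis=0)
```

With frames that are exact integer shifts of the reference, the estimated shifts invert the motion and the averaged output recovers the reference frame.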
Regarding claim 3, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1. Li further discloses: wherein the generating of the reconstructed face image comprises executing a multi-frame face resolution enhancer of the generator (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, portions of the Attribute Transformation Network may be understood as a multi-frame face resolution enhancer) so as to generate an intermediate-reconstructed face image (pg. 472 col. 1 para. 1 and Fig. 4a, an intermediate high-resolution face image is generated, which is understood as the reconstructed face; it is generated from the low-resolution image, which may be understood as at least one face image).

Li does not disclose expressly generating an intermediate-reconstructed face image from the plurality of face images. Shi discloses: to generate an intermediate-reconstructed face image from the plurality of face images ([0064] the reconstructed image is generated from a model which may be understood as a face resolution enhancer; the model receives as input a facial feature point heat map; [0063] the facial feature point heat map is a probability distribution graph; as this is generated after the input step of the process and before the reconstruction step, and as a graph is an image, this is understood as an intermediate image and is generated from a sequence of images, see [0061]).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the generation of the intermediate image from the plurality of images of Shi with the invention of Li. The motivation for doing so would have been that "It is understandable that the face contained in the video image may be one face or multiple faces.
Therefore, it is necessary to track the image information of the face to be identified in the target image frame and obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). In other words, because there may be more than one face present, it is necessary to generate the intermediate image from a tracked sequence of the target face to avoid identifying plural faces as the target face. Therefore, it would have been obvious to combine Shi with Li to obtain the invention as specified in claim 3.

Regarding claim 4, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 3. Li further discloses: wherein the generating of the reconstructed face image comprises: executing a face landmark estimator of the generator (pg. 474 col. 1 para. 2 and Fig. 5 and Fig. 7, the Boundary Extraction Unit is understood as a landmark estimator) so as to estimate multiple face landmarks on the basis of the intermediate-reconstructed face image (pg. 474 col. 1 para. 2, the Boundary Extraction Unit estimates facial boundaries, which under the broadest reasonable interpretation may be understood as estimates of multiple face landmarks); and executing a face upsampler of the generator so as to upsample the intermediate-reconstructed face image (pg. 474 col. 2 para. 2 and Fig. 8, the Feature Fusion Unit may be understood as a face upsampler because it upsamples the facial boundary heatmaps which are extracted from the intermediate image) by using the multiple face landmarks (pg. 474 col. 2 para. 2 and Fig. 8, the facial boundary heatmaps are upsampled, which is understood as upsampling by using the multiple landmarks).
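The general idea of landmark-guided upsampling discussed for claim 4 can be illustrated with a toy sketch: upsample an intermediate image, then use a heatmap built from estimated landmark coordinates as guidance. The Gaussian heatmap, the blend rule, and the function name are assumptions for illustration only, not Li's actual architecture:

```python
import numpy as np

def upsample_with_landmarks(image, landmarks, scale=2):
    """Toy 'face upsampler': nearest-neighbor upsample an intermediate
    image, then modulate it with a soft landmark heatmap so that
    landmark regions are emphasized (illustrative guidance rule)."""
    h, w = image.shape
    # Upsample the intermediate image (nearest-neighbor via Kronecker product).
    up = np.kron(image, np.ones((scale, scale)))
    # Build a soft heatmap from the estimated landmark coordinates.
    yy, xx = np.mgrid[0:h * scale, 0:w * scale]
    heat = np.zeros_like(up)
    for (y, x) in landmarks:
        heat += np.exp(-((yy - y * scale) ** 2 + (xx - x * scale) ** 2) / 8.0)
    # Use the landmarks as guidance: blend the heatmap into the result.
    return up * (1.0 + 0.1 * np.clip(heat, 0.0, 1.0))
```

The output has the upsampled shape, and pixels near the supplied landmarks receive a slightly larger weight than the rest of the image.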
Regarding claim 5, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 4. Li further discloses: wherein: the generating of the reconstructed face image further comprises generating an intermediate image (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, an intermediate high-resolution face image is generated), which is the intermediate-reconstructed face image having enhanced resolution (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, the intermediate image is mapped to the same item as the intermediate-reconstructed face image and has an enhanced resolution, as shown by upsampling the low-resolution image to the same size as the target image), by using an intermediate image generator comprising multiple residual blocks (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, portions of the Attribute Transformation Network may be understood as an intermediate image generator and comprise multiple residual blocks); wherein the estimating comprises estimating the multiple face landmarks (pg. 474 col. 1 para. 2, the Boundary Extraction Unit estimates facial boundaries, which under the broadest reasonable interpretation may be understood as estimates of multiple face landmarks) on the basis of the intermediate image (Fig. 5, the Boundary Extraction Unit receives as input the intermediate image); and wherein the upsampling comprises upsampling the intermediate image (pg. 474 col. 2 para. 2 and Fig.
8, the Feature Fusion Unit may be understood as a face upsampler because it upsamples the facial boundary heatmaps which are extracted from the intermediate image) by using the estimated multiple face landmarks (pg. 474 col. 2 para. 2 and Fig. 8, the facial boundary heatmaps are upsampled, which is understood as upsampling by using the multiple landmarks).

Regarding claim 6, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1. Li discloses: wherein the training of the video identity clarification model (pg. 475 col. 2 para. 6, heatmap loss is considered and compares the boundary features of the output image and the ground truth image, showing that this is used in training the model) further comprises extracting, by executing a face feature extractor of the video identity clarification model (pg. 474 col. 1 para. 2 and Fig. 5 and Fig. 7, the Boundary Extraction Unit is understood as a face feature extractor), a feature map of the reconstructed face image (pg. 475 col. 2 para. 3, the output of the SE-Net, including the Boundary Extraction Unit, is a facial boundary heatmap, which is understood as the feature map of the reconstructed face image) and a feature map of the ground truth face image (pg. 477 col. 1 para. 2, the boundary map, i.e. face features, of the ground truth image is determined).

Regarding claim 7, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1.
Li further discloses: wherein the training of the video identity clarification model further comprises: calculating a training objective function (training loss function) (pg. 473 col. 2 para. 3, a total training loss function is determined); and alternately training the generator and the discriminator (pg. 473 col. 2 para. 3, the total loss considers individually the reconstruction loss and the adversarial loss and then combines them, which is understood as alternately training the generator and the discriminator) so as to minimize a function value of the training objective function (it is commonly understood in the art to minimize the value of a loss function when training; for example, in training the adversarial loss, Li expressly states in pg. 473 col. 1 para. 5 that the function is minimized).

Regarding claim 8, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 7. Li further discloses: wherein the training objective function comprises: a first objective function comprising a GAN loss function for the generator (pg. 473 col. 1 para. 5, the reconstruction loss is on the generator); and a second objective function based on a GAN loss function for the discriminator (pg. 473 col. 1 para. 6, the adversarial loss considers the function of the discriminator).

Regarding claim 9, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 8. Li further discloses: wherein the first objective function comprises a pixel reconstruction accuracy function between the reconstructed face image and the ground truth face image (pg. 473 col. 1 para.
5, the reconstruction loss is pixel-wise and compares the generated image to the ground truth image), an estimation accuracy function of face landmarks estimated during the generating of the reconstructed face image (pg. 473 col. 2 para. 2, the attribute loss considers the accuracy of attributes; attributes may be understood as facial landmarks), and a face feature similarity function between the reconstructed face image and the ground truth face image (pg. 475 col. 2 last para., the heatmap loss is considered to compare face features between the generated face image and the ground truth face image).

Regarding claim 12, Li discloses: A face image reconstruction device comprising: a memory (pg. 476 col. 2 para. 4, the workstation includes a memory) configured to store a video identity clarification model (pg. 471 col. 1 para. 2, the model on which this method is based generates high-resolution face images from low-resolution images and is therefore understood to clarify the identity; a person of ordinary skill in the art would understand that the model would be stored on the memory) comprising a generator and a discriminator which is in a generative adversarial network competition relationship with the generator (pg. 471 col. 2 para. 2, the generator is in a relationship with the discriminator); and a processor (pg. 476 col. 2 para. 4, the method is performed on a GPU, which is a processor) configured to execute training of the video identity clarification model (pg. 471 col. 1 para. 2 and Fig.
3, the model on which this method is based generates high-resolution face images from low-resolution images and is therefore understood to clarify the identity; pg. 476 col. 1 para. 4, the model is trained) on the basis of training data (pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, faces are cropped from images; face images are downsampled as the low-resolution image, which is understood as the at least one face image), wherein the processor is configured, in order to perform the executing of the training, to: generate, by executing the generator, a reconstructed face image (it is commonly understood in the art that a model is trained by operating the model and then adjusting factors within the model according to some loss function; therefore, references which describe the operation of a model are understood to include the operation during training of the model; pg. 472 col. 2 last para. through pg. 473 col. 1 para. 1 and Fig. 3, a "hallucinated" face is generated, which is understood as a reconstructed face image) in which identity of a face shown in the at least one face image has been clarified (pg. 472 col. 2 last para. through pg. 473 col. 1 para. 1 and Fig. 3, the hallucinated face is an optimized attempt to approximate the high-resolution face image, which is understood as the face image being clarified); and discriminate the reconstructed face image on the basis of the ground truth face image by executing the discriminator (pg. 472 col. 1 para. 1 and Fig. 3, several discriminators are run comparing the generated image against the ground truth image).
Li does not disclose expressly acquiring training data comprising a plurality of face images tracked from a series of frames of an input video, acquiring ground truth images for the plurality of face images, determining a reference image of the plurality of face images, performing inter-frame motion estimation between the reference image and at least one other face image, and combining the reference image and the at least one other face image so as to generate the reconstructed face image.

Shi discloses: training data comprising a plurality of face images ([0058] at least two images containing a face are extracted from a video, which is a plurality of face images) tracked from a series of frames of an input video ([0060] the face image is tracked through continuous frames, [0061] such as a video) and a ground truth face image for the plurality of face images ([0069] the video data used includes high-definition videos, i.e. ground truth, and blurred videos; [0073] generate a facial feature pair between images of two resolutions, high and low, see [0071]; [0074] use the feature pair to train the neural network; therefore, the high-definition video is understood as ground truth face images); determining a reference image of the plurality of face images ([0059] a key image frame, i.e. a reference image, is obtained from the video stream, i.e.
the plurality of face images), performing inter-frame motion estimation between the reference image and at least one other face image of the plurality of face images ([0061] the face is tracked in the complete sequence from the target image frame; tracking the face is understood as inter-frame motion estimation), and combining the reference image and the at least one other face image so as to generate the reconstructed face image ([0064] the facial sequence, which includes the reference image and other images, is used to obtain a super-resolution facial image; as plural images are the input and the output is a singular super-resolution facial image, it is understood that the input images are in some way combined).

Li and Shi are combinable because they are from the same field of endeavor of improving the resolution of face images (Li, pg. 468 col. 2 para. 2; Shi, [0006]). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the face image selection comprising a plurality of images of Shi with the invention of Li. The motivation for doing so would have been to "obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). It would likewise have been obvious to combine the reference face image operations of Shi with the invention of Li. The motivation for doing so would have been that "the target image frame [i.e. reference image] contains complete image information of the face to be recognized" (Shi, [0059]). Therefore, it would have been obvious to combine Shi with Li.

Li in view of Shi does not disclose expressly warping between the reference image and at least one other face image of the plurality of face images. Wheeler discloses: warping between the reference image and at least one other face image of the plurality of face images (pg. 3 col. 1 para.
1, the images are warped to a mean face shape, which is understood as the reference image). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the warping of Wheeler with the invention of Li in view of Shi. The motivation for doing so would have been that "The warping scales up and aligns each face image" (Wheeler, pg. 4 col. 1 para. 1). Therefore, it would have been obvious to combine Wheeler with Li in view of Shi to obtain the invention as specified in claim 12.

Regarding claim 13, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li further discloses: wherein the processor is configured to acquire the training data (pg. 476 col. 2 para. 5, training data is acquired from large databases). Li does not disclose expressly that the face images are extracted from the series of frames based on face feature point information. Shi discloses: the processor is configured, in order to acquire the training data, to extract the plurality of face images from the series of frames ([0058] at least two images containing a face are extracted from a video, which is a plurality of face images) based on face feature point tracking information ([0063] facial key points are identified in the image frames) between successive frames of the series of frames (the above calculation of facial feature points is performed on the [0061] sequence of frames containing the face, which is understood as successive frames).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the face image selection of Shi with the invention of Li. The motivation for doing so would have been to "obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]).
Therefore, it would have been obvious to combine Shi with Li to obtain the invention as specified in claim 13. Regarding claim 14, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li further discloses: wherein the generator comprises a multi-frame face resolution enhancer (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, portions of the Attribute Transformation Network may be understood as a multi-frame face resolution enhancer), and the processor is configured, in order to perform the generating of the reconstructed face image, to generate an intermediate-reconstructed face image from the at least one face image (pg. 472 col. 1 para. 1 and Fig. 4a, an intermediate high resolution face image is generated which is understood as the reconstructed face. It is generated from the low resolution image which may be understood as at least one face image) by executing the multi-frame face resolution enhancer. Li does not disclose expressly to generate an intermediate-reconstructed face image from the plurality of face images. Shi discloses: to generate an intermediate-reconstructed face image from the plurality of face images ([0064] the reconstructed image is generated from a model which may be understood as a face resolution enhancer. The model receives as input a facial feature point heat map. [0063] the facial feature point heat map is a probability distribution graph. As this is generated after the input step of the process and before the reconstruction step and as a graph is an image this is understood as an intermediate image and is generated from a sequence of images, see [0061]). 
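The "facial feature point heat map" the rejection characterizes as a probability distribution graph (Shi [0063]) can be illustrated with a small sketch: for one landmark, build a grid image whose values form a probability distribution peaked at the landmark. The Gaussian form, grid size, and function name are assumptions for illustration, not Shi's actual construction.

```python
import math

def landmark_heatmap(x0, y0, size=5, sigma=1.0):
    """2-D Gaussian heat map centered on one facial feature point,
    normalized so the grid sums to 1 (a probability distribution)."""
    heat = [[math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in heat)
    return [[v / total for v in row] for row in heat]
```

Because the result is itself a grid of values, it can be treated as an intermediate image passed to a downstream reconstruction model, which matches the rejection's reading of Shi.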
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the generation of the intermediate image from the plurality of images of Shi with the invention of Li. The motivation for doing so would have been that "It is understandable that the face contained in the video image may be one face or multiple faces. Therefore, it is necessary to track the image information of the face to be identified in the target image frame and obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). In other words, because there may be more than one face present, it is necessary to generate the intermediate image from a tracked sequence of the target face to avoid identifying plural faces as the target face. Therefore, it would have been obvious to combine Shi with Li to obtain the invention as specified in claim 14. Regarding claim 15, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 14. Li further discloses: wherein the generator further comprises a face landmark estimator (pg. 474 col. 1 para. 2 and Fig. 5 and Fig. 7, the Boundary Extraction Unit is understood as a landmark estimator) and a face upsampler (pg. 474 col. 2 para. 2 and Fig. 8, the Feature Fusion Unit may be understood as a face upsampler), and the processor is configured, in order to perform the generating of the reconstructed face image, to: execute the face landmark estimator so as to estimate multiple face landmarks on the basis of the intermediate-reconstructed face image (pg. 474 col. 1 para. 
2, the Boundary Extraction Unit estimates facial boundaries which under the broadest reasonable interpretation may be understood as estimates of multiple face landmarks); and execute the face upsampler so as to upsample the intermediate-reconstructed face image (pg. 474 col. 2 para. 2 and Fig. 8, the Feature Fusion Unit may be understood as a face upsampler because it upsamples the facial boundary heatmaps which are extracted from the intermediate image) by using the multiple face landmarks (pg. 474 col. 2 para. 2 and Fig. 8, the facial boundary heatmaps are upsampled which is understood as upsampling by using the multiple landmarks). Regarding claim 16, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 15. Li further discloses: wherein the generator further comprises an intermediate image generator comprising multiple residual blocks (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, portions of the Attribute Transformation Network may be understood as an intermediate image generator and comprises multiple residual blocks), and the processor is configured, in order to perform the generating of the reconstructed face image, to: generate an intermediate image (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, an intermediate high resolution face image is generated), which is the intermediate-reconstructed face image having enhanced resolution (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 
4a, the intermediate image is mapped to the same item as the intermediate-reconstructed face image and has an enhanced resolution as shown by upsampling the low resolution image to the same size as the target image), by using the intermediate image generator; execute the face landmark estimator so as to estimate the multiple face landmarks (pg. 474 col. 1 para. 2, the Boundary Extraction Unit estimates facial boundaries which under the broadest reasonable interpretation may be understood as estimates of multiple face landmarks) on the basis of the intermediate image (Fig. 5, the Boundary Extraction Unit receives as input the intermediate image); and execute the face upsampler so as to upsample the intermediate image (pg. 474 col. 2 para. 2 and Fig. 8, the Feature Fusion Unit may be understood as a face upsampler because it upsamples the facial boundary heatmaps which are extracted from the intermediate image) by using the multiple face landmarks estimated based on the intermediate image (pg. 474 col. 2 para. 2 and Fig. 8, the facial boundary heatmaps are upsampled which is understood as upsampling by using the multiple landmarks). Regarding claim 17, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li further discloses: wherein the video identity clarification model further comprises a face feature extractor (pg. 474 col. 1 para. 2 and Fig. 5 and Fig. 
7, the Boundary Extraction Unit is understood as a face feature extractor), and the processor is configured, in order to perform the executing of the training (pg. 475 col. 2 para. 6, Heatmap loss is considered and compares the boundary features of the output image and the ground truth image showing that this is used in training the model) to extract a feature map of the reconstructed face image (pg. 475 col. 2 para. 3, the output of the SE-Net, including the Boundary Extraction Unit, is a facial boundary heatmap which is understood as the feature map of the reconstructed face image) and a feature map of the ground truth face image by executing a face feature extractor (pg. 477 col. 1 para. 2, the boundary map, i.e. face features, of the ground truth image are determined). Regarding claim 18, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li further discloses: wherein: the processor is configured, in order to perform the executing of the training, to calculate a training objective function (pg. 473 col. 2 para. 3, a total training loss function is determined), and alternately train the generator and the discriminator (pg. 473 col. 2 para. 3, the total loss considers individually the reconstruction loss and the adversarial loss and then combines them which is understood as alternately training the generator and the discriminator) so as to minimize a function value of the training objective function (it is commonly understood in the art to minimize the value of a loss function when training. For example, in training the adversarial loss, Li expressly states in pg. 473 col. 1 para. 
5 that the function is minimized); the training objective function comprises a first objective function comprising a GAN loss function for the generator (pg. 473 col. 1 para. 5, the reconstruction loss is on the generator) and a second objective function based on a GAN loss function for the discriminator (pg. 473 col. 1 para. 6, the adversarial loss considers the function of the discriminator); and the first objective function comprises a pixel reconstruction accuracy function between the reconstructed face image and the ground truth face image (pg. 473 col. 1 para. 5, the reconstruction loss is pixel wise and compares the generated image to the ground truth image), an estimation accuracy function of face landmarks estimated during the generating of the reconstructed face image (pg. 473 col. 2 para. 2, the attribute loss considers the accuracy of attributes. Attributes may be understood as facial landmarks), and a face feature similarity function between the reconstructed face image and the ground truth face image (pg. 475 col. 2 last para., the heatmap loss is considered to compare face features between the generated face image and the ground truth face image). Claims 10-11 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. ("Learning Face Image Super-Resolution Through Facial Semantic Attribute Transformation and Self-Attentive Structure Enhancement" the full reference is contained on the PTO-892 included in this action; hereafter, Li) in view of Shi et al. (CN 111488779 A; hereafter, Shi) in further view of Wheeler et al. 
("Multi-Frame Super-Resolution for Face Recognition", full reference on the PTO-892 included in this action; hereafter, Wheeler) and of Prince et al. (US 20220058438 A1; hereafter, Prince). Regarding claim 10, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1. Lin in view of Shi in further view of Wheeler does not disclose expressly a second training of fine-tuning the model based on second training data. Prince discloses: executing second training of fine-tuning the video identity clarification model ([0053] the EDSR model is pre-trained and fine-tuned, i.e. second training. The EDSR [0044] is a deep residual network for image super resolution. When considered in combination with Li, it is understood as an identity clarification model), based on second training data comprising at least one face image of a search target (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data, i.e. second training data. [0053] the image data includes an aliased low resolution image which is the target low resolution image with aliasing added. When considered in combination with Li, it is understood that the process represented here with an aliased low resolution image may be performed with a down sampled face image, Li pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, i.e. a face image of a search target. The reason for this is that the aliased low resolution image performs the same function as the face image that is downsampled from full resolution of Li, pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, because both are an edited original or target image which are edited in order to train a network to reconstruct the original or target image) and a reference face image for the at least one face image of the search target (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data. 
[0053] the image data includes a low resolution image without aliasing which is understood as the reference image). Prince is combinable with Li in view of Shi in further view of Wheeler because it is solving the same problem of processing images to improve resolution (Prince, [0002]). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the second training of Prince with the invention of Li in view of Shi in further view of Wheeler. The motivation for doing so would have been “fine-tuning a pre-trained model is accurate and fast” (Prince, [0053]). Therefore, it would have been obvious to combine Prince with Li in view of Shi in further view of Wheeler to obtain the invention as specified in claim 10. Regarding claim 11, Li in view of Shi in further view of Wheeler and Prince discloses the subject matter of claim 10. Li further discloses: wherein the executing of the second training comprises executing the generating and the discriminating of the reconstructed face image (it is commonly understood in the art that a model is trained by operating the model and then adjusting factors within the model according to some loss function. Therefore, references which describe the operating of a model are understood to include the operation during training of a model. pg. 472 col. 2 last para. through pg. 472 col. 1 para. 1 and Fig. 3, a "hallucinated" face is generated and discriminated which is understood as a reconstructed face image. Performing a second training is understood as repeating the training process which the system may perform). Li in view of Shi in further view of Wheeler does not disclose a second training data. Prince discloses: training based on the second training data (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data, therefore, the second training is performed based on second training data). 
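The "second training" structure the rejection reads onto Prince (a model pre-trained on one data set, then fine-tuned on different data) can be sketched with a deliberately simple stand-in. The one-parameter linear model, learning rate, and data below are illustrative assumptions, not Prince's EDSR network.

```python
def train(weight, data, lr=0.1, epochs=50):
    """Gradient descent on a one-parameter linear model y = weight * x,
    starting from the given weight (so training can be resumed)."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

# First training on the pre-training data; second training (fine-tuning)
# continues from the learned weight on a different, task-specific data set.
w_pretrained = train(0.0, [(1.0, 2.0), (2.0, 4.0)])          # converges to w = 2
w_finetuned = train(w_pretrained, [(1.0, 3.0), (2.0, 6.0)])  # adapts to w = 3
```

The point of the sketch is only the two-phase structure: fine-tuning starts from the pre-trained parameters rather than from scratch, which is the characterization of Prince [0053] relied on above.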
Regarding claim 19, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li in view of Shi in further view of Wheeler does not disclose expressly a second training of fine-tuning the model based on second training data. Prince discloses: execute second training of fine-tuning the video identity clarification model ([0053] the EDSR model is pre-trained and fine-tuned, i.e. second training. The EDSR [0044] is a deep residual network for image super resolution. When considered in combination with Li, it is understood as an identity clarification model), based on second training data comprising at least one face image of a search target (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data, i.e. second training data. [0053] the image data includes an aliased low resolution image which is the target low resolution image with aliasing added. When considered in combination with Li, it is understood that the process represented here with an aliased low resolution image may be performed with a down sampled face image, Li pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, i.e. a face image of a search target. The reason for this is that the aliased low resolution image performs the same function as the face image that is downsampled from full resolution of Li, pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, because both are an edited original or target image which are edited in order to train a network to reconstruct the original or target image) and a reference face image for the at least one face image of the search target (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data. [0053] the image data includes a low resolution image without aliasing which is understood as the reference image). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the second training of Prince with the invention of Li in view of Shi in further view of Wheeler. The motivation for doing so would have been “fine-tuning a pre-trained model is accurate and fast” (Prince, [0053]). Therefore, it would have been obvious to combine Prince with Li in view of Shi in further view of Wheeler to obtain the invention as specified in claim 19. Regarding claim 20, Li in view of Shi in further view of Wheeler and Prince discloses the subject matter of claim 19. Li further discloses: wherein the executing of the second training comprises executing the generating and the discriminating of the reconstructed face image (it is commonly understood in the art that a model is trained by operating the model and then adjusting factors within the model according to some loss function. Therefore, references which describe the operating of a model are understood to include the operation during training of a model. pg. 472 col. 2 last para. through pg. 472 col. 1 para. 1 and Fig. 3, a "hallucinated" face is generated and discriminated which is understood as a reconstructed face image. Performing a second training is understood as repeating the training process which the system may perform). Li in view of Shi in further view of Wheeler does not disclose a second training data. Prince discloses: training based on the second training data (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data, therefore, the second training is performed based on second training data). Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. CN 111563427 A, Ning et al., discloses a system which performs aligning of facial images which may be understood as warping so that it can improve the resolution of facial images. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSHUA B CROCKETT whose telephone number is (571)270-7989. The examiner can normally be reached Monday-Thursday 8am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John M Villecco can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. 
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /JOSHUA B. CROCKETT/Examiner, Art Unit 2661 /JOHN VILLECCO/Supervisory Patent Examiner, Art Unit 2661

Prosecution Timeline

Apr 21, 2023
Application Filed
Aug 19, 2025
Non-Final Rejection — §103
Nov 18, 2025
Response Filed
Jan 08, 2026
Final Rejection — §103
Apr 07, 2026
Request for Continued Examination
Apr 13, 2026
Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592060
ARTIFICIAL INTELLIGENCE DEVICE AND 3D AGENCY GENERATING METHOD THEREOF
2y 5m to grant Granted Mar 31, 2026
Patent 12587704
VIDEO DATA TRANSMISSION AND RECEPTION METHOD USING HIGH-SPEED INTERFACE, AND APPARATUS THEREFOR
2y 5m to grant Granted Mar 24, 2026
Patent 12567150
EDITING PRESEGMENTED IMAGES AND VOLUMES USING DEEP LEARNING
2y 5m to grant Granted Mar 03, 2026
Patent 12561839
SYSTEMS AND METHODS FOR CALIBRATING IMAGE SENSORS OF A VEHICLE
2y 5m to grant Granted Feb 24, 2026
Patent 12529639
METHOD FOR ESTIMATING HYDROCARBON SATURATION OF A ROCK
2y 5m to grant Granted Jan 20, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
72%
Grant Probability
99%
With Interview (+27.5%)
3y 0m
Median Time to Grant
Moderate
PTA Risk
Based on 18 resolved cases by this examiner. Grant probability derived from career allow rate.
