Prosecution Insights
Last updated: April 19, 2026
Application No. 18/033,237

METHOD AND APPARATUS FOR RECONSTRUCTING FACE IMAGE BY USING VIDEO IDENTITY CLARIFICATION NETWORK

Final Rejection under §103
Filed: Apr 21, 2023
Examiner: CROCKETT, JOSHUA BRIGHAM
Art Unit: 2661
Tech Center: 2600 (Communications)
Assignee: Seoul National University R&DB Foundation
OA Round: 2 (Final)

Predictions:
- Grant Probability: 72% (Favorable)
- Expected OA Rounds: 3-4
- Expected Time to Grant: 3y 0m
- Grant Probability with Interview: 99%

Examiner Intelligence

- Career Allow Rate: 72% (above average; 13 granted / 18 resolved; +10.2% vs TC avg)
- Interview Lift: +27.5% (allowance rate of resolved cases with interview vs. without)
- Typical Timeline: 3y 0m avg prosecution; 26 applications currently pending
- Career History: 44 total applications across all art units
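The headline figures above are simple ratios; as a sanity check of the arithmetic (variable names are illustrative, not part of the analytics export):

```python
granted, resolved = 13, 18            # from the examiner's career history
allow_rate = granted / resolved       # career allow rate
assert round(allow_rate * 100) == 72  # matches the 72% shown above

# The "+10.2% vs TC avg" delta implies a Tech Center average near 62%.
tc_avg = allow_rate - 0.102
assert abs(tc_avg - 0.62) < 0.005
```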

Statute-Specific Performance

Allowance rate by statute (vs Tech Center average estimate):

- §101: 6.0% (-34.0% vs TC avg)
- §103: 47.5% (+7.5% vs TC avg)
- §102: 10.1% (-29.9% vs TC avg)
- §112: 35.1% (-4.9% vs TC avg)

Based on career data from 18 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in Application No. 18/033,237 (the instant application), filed on 04/21/2023.

Response to Arguments

Claims 1-3, 5, 12-14, and 18 have been amended. Claims 1-20 are pending in this action. Applicant's arguments (pp. 8-11, section "Rejection of claims 1-9 and 12-18 under 35 U.S.C. 103," filed 18 November 2025) with respect to the rejections of claims 1-9 and 12-18 under 35 U.S.C. 103 have been fully considered and are partially persuasive.

Applicant argues that Li does not disclose that the face image is tracked from a series of frames of an input video. The examiner agrees.

Applicant argues that Shi's methods differ from and do not disclose the amendments recited in claim 1, lines 11-16. The examiner finds that Shi discloses: determining a reference image of the plurality of face images ([0059] a key image frame, i.e. a reference image, is obtained from the video stream, i.e. the plurality of face images); performing inter-frame motion estimation between the reference image and at least one other face image of the plurality of face images ([0061] the face is tracked in the complete sequence from the target image frame; tracking the face is understood as inter-frame motion estimation); and combining the reference image and the at least one other face image so as to generate the reconstructed face image ([0064] the facial sequence, which includes the reference image and other images, is used to obtain a super-resolution facial image.
As plural images are the input and the output is a singular super-resolution facial image, it is understood that the input images are in some way combined). However, Shi does not disclose expressly warping between the reference image and at least one other face image of the plurality of face images. Therefore, the rejection has been withdrawn.

However, upon further consideration, a new ground of rejection is made under 35 U.S.C. 103 in view of Wheeler et al. ("Multi-Frame Super-Resolution for Face Recognition"; full reference on the PTO-892 included in this action; hereafter, Wheeler). Wheeler discloses warping between the reference image and at least one other face image of the plurality of face images (pg. 3 col. 1 para. 1, the images are warped to a mean face shape, which is understood as the reference image). The full rejection, including motivations to combine, is included below in the section "Claim Rejections - 35 USC § 103." Therefore, claim 1 remains rejected under 35 U.S.C. 103. Claim 12 recites subject matter substantially similar to claim 1 and is likewise rejected. Claims 2-9 and 13-18 depend from claims 1 and 12, respectively, and likewise remain rejected.

Applicant's arguments (pp. 11-12, section "Rejection of claims 10, 11, 19 and 20 under 35 U.S.C. 103," filed 18 November 2025) with respect to the rejections of claims 10, 11, 19, and 20 under 35 U.S.C. 103 have been fully considered and, in light of the above response regarding claims 1-9 and 12-18, are not persuasive. Therefore, claims 10, 11, 19, and 20 remain rejected under 35 U.S.C. 103.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C.
102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 and 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. ("Learning Face Image Super-Resolution Through Facial Semantic Attribute Transformation and Self-Attentive Structure Enhancement"; full reference on the PTO-892 included in this action; hereafter, Li) in view of Shi et al. (CN 111488779 A; hereafter, Shi), in further view of Wheeler et al. ("Multi-Frame Super-Resolution for Face Recognition"; full reference on the PTO-892 included in this action; hereafter, Wheeler).

Regarding claim 1, Li discloses: A face image reconstruction method executed by a face image reconstruction device comprising a processor (pg. 476 col. 2 para. 4, the method is performed on a GPU, which is a processor), the method comprising: acquiring training data (pg. 476 col. 2 para.
5, training data is acquired from large databases); and training a video identity clarification model (video identity clarification network (VICN)) on the basis of the training data (pg. 471 col. 1 para. 2 and Fig. 3, the model on which this method is based generates high-resolution face images from low-resolution images and is therefore understood to clarify the identity; pg. 476 col. 1 para. 4, the model is trained), wherein the training comprises: generating, by executing a generator of the video identity clarification model, a reconstructed face image (it is commonly understood in the art that a model is trained by operating the model and then adjusting factors within the model according to some loss function; therefore, references which describe the operation of a model are understood to include the operation during training of the model; pg. 471 col. 2 last para. through pg. 472 col. 1 para. 1 and Fig. 3, a "hallucinated" face is generated, which is understood as a reconstructed face image) in which identity of a face shown in the at least one face image has been clarified (pg. 471 col. 2 last para. through pg. 472 col. 1 para. 1 and Fig. 3, the hallucinated face is an optimized attempt to approximate the high-resolution face image, which is understood as the face image being clarified), and discriminating the reconstructed face image on the basis of the ground truth face image by executing a discriminator of the video identity clarification model (pg. 472 col. 1 para. 1 and Fig.
3, several discriminators are run comparing the generated image against the ground truth image), which is in a generative adversarial network (GAN) competition relationship with the generator (pg. 471 col. 2 para. 2, the generator is in a relationship with the discriminator).

Li does not disclose expressly acquiring training data comprising a plurality of face images tracked from a series of frames of an input video, acquiring ground truth images for the plurality of face images, determining a reference image of the plurality of face images, performing inter-frame motion estimation between the reference image and at least one other face image, and combining the reference image and the at least one other face image so as to generate the reconstructed face image.

Shi discloses: acquiring training data comprising a plurality of face images ([0058] at least two images containing a face are extracted from a video, which is a plurality of face images) tracked from a series of frames of an input video ([0060] the face image is tracked through continuous frames, [0061] such as a video) and a ground truth face image for the plurality of face images ([0069] the video data used includes high-definition videos, i.e. ground truth, and blurred videos; [0073] generate a facial feature pair between images of two resolutions, high and low, see [0071]; [0074] use the feature pair to train the neural network; therefore, the high-definition video is understood as ground truth face images); determining a reference image of the plurality of face images ([0059] a key image frame, i.e. a reference image, is obtained from the video stream, i.e.
the plurality of face images), performing inter-frame motion estimation between the reference image and at least one other face image of the plurality of face images ([0061] the face is tracked in the complete sequence from the target image frame; tracking the face is understood as inter-frame motion estimation), and combining the reference image and the at least one other face image so as to generate the reconstructed face image ([0064] the facial sequence, which includes the reference image and other images, is used to obtain a super-resolution facial image; as plural images are the input and the output is a singular super-resolution facial image, it is understood that the input images are in some way combined).

Li and Shi are combinable because they are from the same field of endeavor of improving the resolution of face images (Li, pg. 468 col. 2 para. 2; Shi, [0006]). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the face image selection comprising a plurality of images of Shi with the invention of Li. The motivation for doing so would have been to "obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). It would likewise have been obvious to combine the reference face image operations of Shi with the invention of Li. The motivation for doing so would have been that "the target image frame [i.e. reference image] contains complete image information of the face to be recognized" (Shi, [0059]). Therefore, it would have been obvious to combine Shi with Li.

Li in view of Shi does not disclose expressly warping between the reference image and at least one other face image of the plurality of face images. Wheeler discloses: warping between the reference image and at least one other face image of the plurality of face images (pg. 3 col. 1 para.
1, the images are warped to a mean face shape, which is understood as the reference image). Wheeler is combinable with Li in view of Shi because it is from the same field of endeavor of facial image super-resolution (Wheeler, pg. 1 col. 1 para. 3). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the warping of Wheeler with the invention of Li in view of Shi. The motivation for doing so would have been that "The warping scales up and aligns each face image" (Wheeler, pg. 4 col. 1 para. 1). Therefore, it would have been obvious to combine Wheeler with Li in view of Shi to obtain the invention as specified in claim 1.

Regarding claim 2, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1. Li does not disclose expressly that the face images are extracted from a series of frames based on face feature point information. Shi discloses: wherein the acquiring of the training data comprises extracting the plurality of face images from the series of frames ([0058] at least two images containing a face are extracted from a video, which is a plurality of face images) based on face feature point tracking information ([0063] facial key points are identified in the image frames) between successive frames of the series of frames (the above calculation of facial feature points is performed on the [0061] sequence of frames containing the face, which is understood as successive frames).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the face image selection of Shi with the invention of Li. The motivation for doing so would have been to "obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). Therefore, it would have been obvious to combine Shi with Li to obtain the invention as specified in claim 2.
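The claim steps at issue in claims 1 and 2 (selecting a reference frame, inter-frame motion estimation, warping, and combining a tracked face sequence) can be sketched in a minimal form. The following is an illustrative toy only, assuming purely translational motion and a brute-force shift search; it is not the method of Li, Shi, or Wheeler, and the function name is invented for illustration:

```python
import numpy as np

def reconstruct_from_frames(frames, ref_idx=0):
    """Toy multi-frame combination: estimate per-frame translational
    motion relative to a reference frame, warp each frame onto the
    reference, and average the aligned stack."""
    ref = frames[ref_idx]
    aligned = []
    for f in frames:
        # Inter-frame motion estimation: brute-force search for the
        # integer (dy, dx) shift that best matches the reference.
        best, best_err = (0, 0), np.inf
        for dy in range(-3, 4):
            for dx in range(-3, 4):
                err = np.mean((np.roll(f, (dy, dx), axis=(0, 1)) - ref) ** 2)
                if err < best_err:
                    best, best_err = (dy, dx), err
        # Warping: apply the estimated shift to align with the reference.
        aligned.append(np.roll(f, best, axis=(0, 1)))
    # Combining: average the aligned frames (a simple noise-reducing fusion).
    return np.mean(aligned, axis=0)
```

With frames that are exact integer shifts of the reference, the estimated shifts invert the motion and the averaged output recovers the reference frame.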
Regarding claim 3, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1. Li further discloses: wherein the generating of the reconstructed face image comprises executing a multi-frame face resolution enhancer of the generator (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, portions of the Attribute Transformation Network may be understood as a multi-frame face resolution enhancer) so as to generate an intermediate-reconstructed face image (pg. 472 col. 1 para. 1 and Fig. 4a, an intermediate high-resolution face image is generated, which is understood as the reconstructed face; it is generated from the low-resolution image, which may be understood as at least one face image).

Li does not disclose expressly generating an intermediate-reconstructed face image from the plurality of face images. Shi discloses: to generate an intermediate-reconstructed face image from the plurality of face images ([0064] the reconstructed image is generated from a model which may be understood as a face resolution enhancer; the model receives as input a facial feature point heat map; [0063] the facial feature point heat map is a probability distribution graph; as this is generated after the input step of the process and before the reconstruction step, and as a graph is an image, this is understood as an intermediate image and is generated from a sequence of images, see [0061]).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the generation of the intermediate image from the plurality of images of Shi with the invention of Li. The motivation for doing so would have been that "It is understandable that the face contained in the video image may be one face or multiple faces.
Therefore, it is necessary to track the image information of the face to be identified in the target image frame and obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). In other words, because there may be more than one face present, it is necessary to generate the intermediate image from a tracked sequence of the target face to avoid identifying plural faces as the target face. Therefore, it would have been obvious to combine Shi with Li to obtain the invention as specified in claim 3.

Regarding claim 4, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 3. Li further discloses: wherein the generating of the reconstructed face image comprises: executing a face landmark estimator of the generator (pg. 474 col. 1 para. 2 and Fig. 5 and Fig. 7, the Boundary Extraction Unit is understood as a landmark estimator) so as to estimate multiple face landmarks on the basis of the intermediate-reconstructed face image (pg. 474 col. 1 para. 2, the Boundary Extraction Unit estimates facial boundaries, which under the broadest reasonable interpretation may be understood as estimates of multiple face landmarks); and executing a face upsampler of the generator so as to upsample the intermediate-reconstructed face image (pg. 474 col. 2 para. 2 and Fig. 8, the Feature Fusion Unit may be understood as a face upsampler because it upsamples the facial boundary heatmaps which are extracted from the intermediate image) by using the multiple face landmarks (pg. 474 col. 2 para. 2 and Fig. 8, the facial boundary heatmaps are upsampled, which is understood as upsampling by using the multiple landmarks).
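The general idea of landmark-guided upsampling discussed for claim 4 can be illustrated with a toy sketch: upsample an intermediate image, then use a heatmap built from estimated landmark coordinates as guidance. The Gaussian heatmap, the blend rule, and the function name are assumptions for illustration only, not Li's actual architecture:

```python
import numpy as np

def upsample_with_landmarks(image, landmarks, scale=2):
    """Toy 'face upsampler': nearest-neighbor upsample an intermediate
    image, then modulate it with a soft landmark heatmap so that
    landmark regions are emphasized (illustrative guidance rule)."""
    h, w = image.shape
    # Upsample the intermediate image (nearest-neighbor via Kronecker product).
    up = np.kron(image, np.ones((scale, scale)))
    # Build a soft heatmap from the estimated landmark coordinates.
    yy, xx = np.mgrid[0:h * scale, 0:w * scale]
    heat = np.zeros_like(up)
    for (y, x) in landmarks:
        heat += np.exp(-((yy - y * scale) ** 2 + (xx - x * scale) ** 2) / 8.0)
    # Use the landmarks as guidance: blend the heatmap into the result.
    return up * (1.0 + 0.1 * np.clip(heat, 0.0, 1.0))
```

The output has the upsampled shape, and pixels near the supplied landmarks receive a slightly larger weight than the rest of the image.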
Regarding claim 5, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 4. Li further discloses: wherein: the generating of the reconstructed face image further comprises generating an intermediate image (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, an intermediate high-resolution face image is generated), which is the intermediate-reconstructed face image having enhanced resolution (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, the intermediate image is mapped to the same item as the intermediate-reconstructed face image and has an enhanced resolution, as shown by upsampling the low-resolution image to the same size as the target image), by using an intermediate image generator comprising multiple residual blocks (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, portions of the Attribute Transformation Network may be understood as an intermediate image generator and comprise multiple residual blocks); wherein the estimating comprises estimating the multiple face landmarks (pg. 474 col. 1 para. 2, the Boundary Extraction Unit estimates facial boundaries, which under the broadest reasonable interpretation may be understood as estimates of multiple face landmarks) on the basis of the intermediate image (Fig. 5, the Boundary Extraction Unit receives as input the intermediate image); and wherein the upsampling comprises upsampling the intermediate image (pg. 474 col. 2 para. 2 and Fig.
8, the Feature Fusion Unit may be understood as a face upsampler because it upsamples the facial boundary heatmaps which are extracted from the intermediate image) by using the estimated multiple face landmarks (pg. 474 col. 2 para. 2 and Fig. 8, the facial boundary heatmaps are upsampled, which is understood as upsampling by using the multiple landmarks).

Regarding claim 6, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1. Li discloses: wherein the training of the video identity clarification model (pg. 475 col. 2 para. 6, heatmap loss is considered and compares the boundary features of the output image and the ground truth image, showing that this is used in training the model) further comprises extracting, by executing a face feature extractor of the video identity clarification model (pg. 474 col. 1 para. 2 and Fig. 5 and Fig. 7, the Boundary Extraction Unit is understood as a face feature extractor), a feature map of the reconstructed face image (pg. 475 col. 2 para. 3, the output of the SE-Net, including the Boundary Extraction Unit, is a facial boundary heatmap, which is understood as the feature map of the reconstructed face image) and a feature map of the ground truth face image (pg. 477 col. 1 para. 2, the boundary map, i.e. face features, of the ground truth image is determined).

Regarding claim 7, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1.
Li further discloses: wherein the training of the video identity clarification model further comprises: calculating a training objective function (training loss function) (pg. 473 col. 2 para. 3, a total training loss function is determined); and alternately training the generator and the discriminator (pg. 473 col. 2 para. 3, the total loss considers individually the reconstruction loss and the adversarial loss and then combines them, which is understood as alternately training the generator and the discriminator) so as to minimize a function value of the training objective function (it is commonly understood in the art to minimize the value of a loss function when training; for example, in training the adversarial loss, Li expressly states in pg. 473 col. 1 para. 5 that the function is minimized).

Regarding claim 8, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 7. Li further discloses: wherein the training objective function comprises: a first objective function comprising a GAN loss function for the generator (pg. 473 col. 1 para. 5, the reconstruction loss is on the generator); and a second objective function based on a GAN loss function for the discriminator (pg. 473 col. 1 para. 6, the adversarial loss considers the function of the discriminator).

Regarding claim 9, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 8. Li further discloses: wherein the first objective function comprises a pixel reconstruction accuracy function between the reconstructed face image and the ground truth face image (pg. 473 col. 1 para.
5, the reconstruction loss is pixel-wise and compares the generated image to the ground truth image), an estimation accuracy function of face landmarks estimated during the generating of the reconstructed face image (pg. 473 col. 2 para. 2, the attribute loss considers the accuracy of attributes; attributes may be understood as facial landmarks), and a face feature similarity function between the reconstructed face image and the ground truth face image (pg. 475 col. 2 last para., the heatmap loss is considered to compare face features between the generated face image and the ground truth face image).

Regarding claim 12, Li discloses: A face image reconstruction device comprising: a memory (pg. 476 col. 2 para. 4, the workstation includes a memory) configured to store a video identity clarification model (pg. 471 col. 1 para. 2, the model on which this method is based generates high-resolution face images from low-resolution images and is therefore understood to clarify the identity; a person of ordinary skill in the art would understand that the model would be stored on the memory) comprising a generator and a discriminator which is in a generative adversarial network competition relationship with the generator (pg. 471 col. 2 para. 2, the generator is in a relationship with the discriminator); and a processor (pg. 476 col. 2 para. 4, the method is performed on a GPU, which is a processor) configured to execute training of the video identity clarification model (pg. 471 col. 1 para. 2 and Fig.
3, the model on which this method is based generates high-resolution face images from low-resolution images and is therefore understood to clarify the identity; pg. 476 col. 1 para. 4, the model is trained) on the basis of training data (pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, faces are cropped from images; face images are downsampled as the low-resolution image, which is understood as the at least one face image), wherein the processor is configured, in order to perform the executing of the training, to: generate, by executing the generator, a reconstructed face image (it is commonly understood in the art that a model is trained by operating the model and then adjusting factors within the model according to some loss function; therefore, references which describe the operation of a model are understood to include the operation during training of the model; pg. 472 col. 2 last para. through pg. 473 col. 1 para. 1 and Fig. 3, a "hallucinated" face is generated, which is understood as a reconstructed face image) in which identity of a face shown in the at least one face image has been clarified (pg. 472 col. 2 last para. through pg. 473 col. 1 para. 1 and Fig. 3, the hallucinated face is an optimized attempt to approximate the high-resolution face image, which is understood as the face image being clarified); and discriminate the reconstructed face image on the basis of the ground truth face image by executing the discriminator (pg. 472 col. 1 para. 1 and Fig. 3, several discriminators are run comparing the generated image against the ground truth image).
Li does not disclose expressly acquiring training data comprising a plurality of face images tracked from a series of frames of an input video, acquiring ground truth images for the plurality of face images, determining a reference image of the plurality of face images, performing inter-frame motion estimation between the reference image and at least one other face image, and combining the reference image and the at least one other face image so as to generate the reconstructed face image.

Shi discloses: training data comprising a plurality of face images ([0058] at least two images containing a face are extracted from a video, which is a plurality of face images) tracked from a series of frames of an input video ([0060] the face image is tracked through continuous frames, [0061] such as a video) and a ground truth face image for the plurality of face images ([0069] the video data used includes high-definition videos, i.e. ground truth, and blurred videos; [0073] generate a facial feature pair between images of two resolutions, high and low, see [0071]; [0074] use the feature pair to train the neural network; therefore, the high-definition video is understood as ground truth face images); determining a reference image of the plurality of face images ([0059] a key image frame, i.e. a reference image, is obtained from the video stream, i.e.
the plurality of face images), performing inter-frame motion estimation between the reference image and at least one other face image of the plurality of face images ([0061] the face is tracked in the complete sequence from the target image frame; tracking the face is understood as inter-frame motion estimation), and combining the reference image and the at least one other face image so as to generate the reconstructed face image ([0064] the facial sequence, which includes the reference image and other images, is used to obtain a super-resolution facial image; as plural images are the input and the output is a singular super-resolution facial image, it is understood that the input images are in some way combined).

Li and Shi are combinable because they are from the same field of endeavor of improving the resolution of face images (Li, pg. 468 col. 2 para. 2; Shi, [0006]). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the face image selection comprising a plurality of images of Shi with the invention of Li. The motivation for doing so would have been to "obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). It would likewise have been obvious to combine the reference face image operations of Shi with the invention of Li. The motivation for doing so would have been that "the target image frame [i.e. reference image] contains complete image information of the face to be recognized" (Shi, [0059]). Therefore, it would have been obvious to combine Shi with Li.

Li in view of Shi does not disclose expressly warping between the reference image and at least one other face image of the plurality of face images. Wheeler discloses: warping between the reference image and at least one other face image of the plurality of face images (pg. 3 col. 1 para.
1, the images are warped to a mean face shape, which is understood as the reference image). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the warping of Wheeler with the invention of Li in view of Shi. The motivation for doing so would have been that "The warping scales up and aligns each face image" (Wheeler, pg. 4 col. 1 para. 1). Therefore, it would have been obvious to combine Wheeler with Li in view of Shi to obtain the invention as specified in claim 12.

Regarding claim 13, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li further discloses: wherein the processor is configured to acquire the training data (pg. 476 col. 2 para. 5, training data is acquired from large databases). Li does not disclose expressly that the face images are extracted from the series of frames based on face feature point information. Shi discloses: the processor is configured, in order to acquire the training data, to extract the plurality of face images from the series of frames ([0058] at least two images containing a face are extracted from a video, which is a plurality of face images) based on face feature point tracking information ([0063] facial key points are identified in the image frames) between successive frames of the series of frames (the above calculation of facial feature points is performed on the [0061] sequence of frames containing the face, which is understood as successive frames).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the face image selection of Shi with the invention of Li. The motivation for doing so would have been to "obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]).
Therefore, it would have been obvious to combine Shi with Li to obtain the invention as specified in claim 13. Regarding claim 14, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li further discloses: wherein the generator comprises a multi-frame face resolution enhancer (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, portions of the Attribute Transformation Network may be understood as a multi-frame face resolution enhancer), and the processor is configured, in order to perform the generating of the reconstructed face image, to generate an intermediate-reconstructed face image from the at least one face image (pg. 472 col. 1 para. 1 and Fig. 4a, an intermediate high resolution face image is generated which is understood as the reconstructed face. It is generated from the low resolution image which may be understood as at least one face image) by executing the multi-frame face resolution enhancer. Li does not disclose expressly to generate an intermediate-reconstructed face image from the plurality of face images. Shi discloses: to generate an intermediate-reconstructed face image from the plurality of face images ([0064] the reconstructed image is generated from a model which may be understood as a face resolution enhancer. The model receives as input a facial feature point heat map. [0063] the facial feature point heat map is a probability distribution graph. As this is generated after the input step of the process and before the reconstruction step and as a graph is an image this is understood as an intermediate image and is generated from a sequence of images, see [0061]). 
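The "facial feature point heat map" the rejection characterizes as a probability distribution graph (Shi [0063]) can be illustrated with a small sketch: for one landmark, build a grid image whose values form a probability distribution peaked at the landmark. The Gaussian form, grid size, and function name are assumptions for illustration, not Shi's actual construction.

```python
import math

def landmark_heatmap(x0, y0, size=5, sigma=1.0):
    """2-D Gaussian heat map centered on one facial feature point,
    normalized so the grid sums to 1 (a probability distribution)."""
    heat = [[math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]
    total = sum(sum(row) for row in heat)
    return [[v / total for v in row] for row in heat]
```

Because the result is itself a grid of values, it can be treated as an intermediate image passed to a downstream reconstruction model, which matches the rejection's reading of Shi.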
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the generation of the intermediate image from the plurality of images of Shi with the invention of Li. The motivation for doing so would have been that "It is understandable that the face contained in the video image may be one face or multiple faces. Therefore, it is necessary to track the image information of the face to be identified in the target image frame and obtain the complete face sequence corresponding to the face to be identified" (Shi, [0061]). In other words, because there may be more than one face present, it is necessary to generate the intermediate image from a tracked sequence of the target face to avoid identifying plural faces as the target face. Therefore, it would have been obvious to combine Shi with Li to obtain the invention as specified in claim 14. Regarding claim 15, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 14. Li further discloses: wherein the generator further comprises a face landmark estimator (pg. 474 col. 1 para. 2 and Fig. 5 and Fig. 7, the Boundary Extraction Unit is understood as a landmark estimator) and a face upsampler (pg. 474 col. 2 para. 2 and Fig. 8, the Feature Fusion Unit may be understood as a face upsampler), and the processor is configured, in order to perform the generating of the reconstructed face image, to: execute the face landmark estimator so as to estimate multiple face landmarks on the basis of the intermediate-reconstructed face image (pg. 474 col. 1 para. 
2, the Boundary Extraction Unit estimates facial boundaries which under the broadest reasonable interpretation may be understood as estimates of multiple face landmarks); and execute the face upsampler so as to upsample the intermediate-reconstructed face image (pg. 474 col. 2 para. 2 and Fig. 8, the Feature Fusion Unit may be understood as a face upsampler because it upsamples the facial boundary heatmaps which are extracted from the intermediate image) by using the multiple face landmarks (pg. 474 col. 2 para. 2 and Fig. 8, the facial boundary heatmaps are upsampled which is understood as upsampling by using the multiple landmarks). Regarding claim 16, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 15. Li further discloses: wherein the generator further comprises an intermediate image generator comprising multiple residual blocks (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, portions of the Attribute Transformation Network may be understood as an intermediate image generator and comprises multiple residual blocks), and the processor is configured, in order to perform the generating of the reconstructed face image, to: generate an intermediate image (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 4a, an intermediate high resolution face image is generated), which is the intermediate-reconstructed face image having enhanced resolution (pg. 472 col. 1 last para. through pg. 472 col. 2 para. 1 and Fig. 
4a, the intermediate image is mapped to the same item as the intermediate-reconstructed face image and has an enhanced resolution as shown by upsampling the low resolution image to the same size as the target image), by using the intermediate image generator; execute the face landmark estimator so as to estimate the multiple face landmarks (pg. 474 col. 1 para. 2, the Boundary Extraction Unit estimates facial boundaries which under the broadest reasonable interpretation may be understood as estimates of multiple face landmarks) on the basis of the intermediate image (Fig. 5, the Boundary Extraction Unit receives as input the intermediate image); and execute the face upsampler so as to upsample the intermediate image (pg. 474 col. 2 para. 2 and Fig. 8, the Feature Fusion Unit may be understood as a face upsampler because it upsamples the facial boundary heatmaps which are extracted from the intermediate image) by using the multiple face landmarks estimated based on the intermediate image (pg. 474 col. 2 para. 2 and Fig. 8, the facial boundary heatmaps are upsampled which is understood as upsampling by using the multiple landmarks). Regarding claim 17, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li further discloses: wherein the video identity clarification model further comprises a face feature extractor (pg. 474 col. 1 para. 2 and Fig. 5 and Fig. 
7, the Boundary Extraction Unit is understood as a face feature extractor), and the processor is configured, in order to perform the executing of the training (pg. 475 col. 2 para. 6, Heatmap loss is considered and compares the boundary features of the output image and the ground truth image showing that this is used in training the model) to extract a feature map of the reconstructed face image (pg. 475 col. 2 para. 3, the output of the SE-Net, including the Boundary Extraction Unit, is a facial boundary heatmap which is understood as the feature map of the reconstructed face image) and a feature map of the ground truth face image by executing a face feature extractor (pg. 477 col. 1 para. 2, the boundary map, i.e. face features, of the ground truth image are determined). Regarding claim 18, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li further discloses: wherein: the processor is configured, in order to perform the executing of the training, to calculate a training objective function (pg. 473 col. 2 para. 3, a total training loss function is determined), and alternately train the generator and the discriminator (pg. 473 col. 2 para. 3, the total loss considers individually the reconstruction loss and the adversarial loss and then combines them which is understood as alternately training the generator and the discriminator) so as to minimize a function value of the training objective function (it is commonly understood in the art to minimize the value of a loss function when training. For example, in training the adversarial loss, Li expressly states in pg. 473 col. 1 para. 
5 that the function is minimized); the training objective function comprises a first objective function comprising a GAN loss function for the generator (pg. 473 col. 1 para. 5, the reconstruction loss is on the generator) and a second objective function based on a GAN loss function for the discriminator (pg. 473 col. 1 para. 6, the adversarial loss considers the function of the discriminator); and the first objective function comprises a pixel reconstruction accuracy function between the reconstructed face image and the ground truth face image (pg. 473 col. 1 para. 5, the reconstruction loss is pixel wise and compares the generated image to the ground truth image), an estimation accuracy function of face landmarks estimated during the generating of the reconstructed face image (pg. 473 col. 2 para. 2, the attribute loss considers the accuracy of attributes. Attributes may be understood as facial landmarks), and a face feature similarity function between the reconstructed face image and the ground truth face image (pg. 475 col. 2 last para., the heatmap loss is considered to compare face features between the generated face image and the ground truth face image). Claims 10-11 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. ("Learning Face Image Super-Resolution Through Facial Semantic Attribute Transformation and Self-Attentive Structure Enhancement" the full reference is contained on the PTO-892 included in this action; hereafter, Li) in view of Shi et al. (CN 111488779 A; hereafter, Shi) in further view of Wheeler et al. 
("Multi-Frame Super-Resolution for Face Recognition", full reference on the PTO-892 included in this action; hereafter, Wheeler) and of Prince et al. (US 20220058438 A1; hereafter, Prince). Regarding claim 10, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 1. Lin in view of Shi in further view of Wheeler does not disclose expressly a second training of fine-tuning the model based on second training data. Prince discloses: executing second training of fine-tuning the video identity clarification model ([0053] the EDSR model is pre-trained and fine-tuned, i.e. second training. The EDSR [0044] is a deep residual network for image super resolution. When considered in combination with Li, it is understood as an identity clarification model), based on second training data comprising at least one face image of a search target (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data, i.e. second training data. [0053] the image data includes an aliased low resolution image which is the target low resolution image with aliasing added. When considered in combination with Li, it is understood that the process represented here with an aliased low resolution image may be performed with a down sampled face image, Li pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, i.e. a face image of a search target. The reason for this is that the aliased low resolution image performs the same function as the face image that is downsampled from full resolution of Li, pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, because both are an edited original or target image which are edited in order to train a network to reconstruct the original or target image) and a reference face image for the at least one face image of the search target (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data. 
[0053] the image data includes a low resolution image without aliasing which is understood as the reference image). Prince is combinable with Li in view of Shi in further view of Wheeler because it is solving the same problem of processing images to improve resolution (Prince, [0002]). It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the second training of Prince with the invention of Li in view of Shi in further view of Wheeler. The motivation for doing so would have been “fine-tuning a pre-trained model is accurate and fast” (Prince, [0053]). Therefore, it would have been obvious to combine Prince with Li in view of Shi in further view of Wheeler to obtain the invention as specified in claim 10. Regarding claim 11, Li in view of Shi in further view of Wheeler and Prince discloses the subject matter of claim 10. Li further discloses: wherein the executing of the second training comprises executing the generating and the discriminating of the reconstructed face image (it is commonly understood in the art that a model is trained by operating the model and then adjusting factors within the model according to some loss function. Therefore, references which describe the operating of a model are understood to include the operation during training of a model. pg. 472 col. 2 last para. through pg. 472 col. 1 para. 1 and Fig. 3, a "hallucinated" face is generated and discriminated which is understood as a reconstructed face image. Performing a second training is understood as repeating the training process which the system may perform). Li in view of Shi in further view of Wheeler does not disclose a second training data. Prince discloses: training based on the second training data (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data, therefore, the second training is performed based on second training data). 
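The "second training" structure the rejection reads onto Prince (a model pre-trained on one data set, then fine-tuned on different data) can be sketched with a deliberately simple stand-in. The one-parameter linear model, learning rate, and data below are illustrative assumptions, not Prince's EDSR network.

```python
def train(weight, data, lr=0.1, epochs=50):
    """Gradient descent on a one-parameter linear model y = weight * x,
    starting from the given weight (so training can be resumed)."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

# First training on the pre-training data; second training (fine-tuning)
# continues from the learned weight on a different, task-specific data set.
w_pretrained = train(0.0, [(1.0, 2.0), (2.0, 4.0)])          # converges to w = 2
w_finetuned = train(w_pretrained, [(1.0, 3.0), (2.0, 6.0)])  # adapts to w = 3
```

The point of the sketch is only the two-phase structure: fine-tuning starts from the pre-trained parameters rather than from scratch, which is the characterization of Prince [0053] relied on above.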
Regarding claim 19, Li in view of Shi in further view of Wheeler discloses the subject matter of claim 12. Li in view of Shi in further view of Wheeler does not disclose expressly a second training of fine-tuning the model based on second training data. Prince discloses: execute second training of fine-tuning the video identity clarification model ([0053] the EDSR model is pre-trained and fine-tuned, i.e. second training. The EDSR [0044] is a deep residual network for image super resolution. When considered in combination with Li, it is understood as an identity clarification model), based on second training data comprising at least one face image of a search target (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data, i.e. second training data. [0053] the image data includes an aliased low resolution image which is the target low resolution image with aliasing added. When considered in combination with Li, it is understood that the process represented here with an aliased low resolution image may be performed with a down sampled face image, Li pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, i.e. a face image of a search target. The reason for this is that the aliased low resolution image performs the same function as the face image that is downsampled from full resolution of Li, pg. 476 col. 2 last para. through pg. 477 col. 1 para. 1, because both are an edited original or target image which are edited in order to train a network to reconstruct the original or target image) and a reference face image for the at least one face image of the search target (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data. [0053] the image data includes a low resolution image without aliasing which is understood as the reference image). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention to combine the second training of Prince with the invention of Li in view of Shi in further view of Wheeler. The motivation for doing so would have been “fine-tuning a pre-trained model is accurate and fast” (Prince, [0053]). Therefore, it would have been obvious to combine Prince with Li in view of Shi in further view of Wheeler to obtain the invention as specified in claim 19. Regarding claim 20, Li in view of Shi in further view of Wheeler and Prince discloses the subject matter of claim 19. Li further discloses: wherein the executing of the second training comprises executing the generating and the discriminating of the reconstructed face image (it is commonly understood in the art that a model is trained by operating the model and then adjusting factors within the model according to some loss function. Therefore, references which describe the operating of a model are understood to include the operation during training of a model. pg. 472 col. 2 last para. through pg. 472 col. 1 para. 1 and Fig. 3, a "hallucinated" face is generated and discriminated which is understood as a reconstructed face image. Performing a second training is understood as repeating the training process which the system may perform). Li in view of Shi in further view of Wheeler does not disclose a second training data. Prince discloses: training based on the second training data (Claim 10, the data that a network is pre-trained on and fine-tuned with are different data, therefore, the second training is performed based on second training data). Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. CN 111563427 A, Ning et al., discloses a system which performs aligning of facial images which may be understood as warping so that it can improve the resolution of facial images. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSHUA B CROCKETT whose telephone number is (571)270-7989. The examiner can normally be reached Monday-Thursday 8am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John M Villecco can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. 
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /JOSHUA B. CROCKETT/Examiner, Art Unit 2661 /JOHN VILLECCO/Supervisory Patent Examiner, Art Unit 2661

Prosecution Timeline

Apr 21, 2023
Application Filed
Aug 19, 2025
Non-Final Rejection — §103
Nov 18, 2025
Response Filed
Jan 08, 2026
Final Rejection — §103
Apr 07, 2026
Request for Continued Examination
Apr 13, 2026
Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592060
ARTIFICIAL INTELLIGENCE DEVICE AND 3D AGENCY GENERATING METHOD THEREOF
2y 5m to grant Granted Mar 31, 2026
Patent 12587704
VIDEO DATA TRANSMISSION AND RECEPTION METHOD USING HIGH-SPEED INTERFACE, AND APPARATUS THEREFOR
2y 5m to grant Granted Mar 24, 2026
Patent 12567150
EDITING PRESEGMENTED IMAGES AND VOLUMES USING DEEP LEARNING
2y 5m to grant Granted Mar 03, 2026
Patent 12561839
SYSTEMS AND METHODS FOR CALIBRATING IMAGE SENSORS OF A VEHICLE
2y 5m to grant Granted Feb 24, 2026
Patent 12529639
METHOD FOR ESTIMATING HYDROCARBON SATURATION OF A ROCK
2y 5m to grant Granted Jan 20, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
72%
Grant Probability
99%
With Interview (+27.5%)
3y 0m
Median Time to Grant
Moderate
PTA Risk
Based on 18 resolved cases by this examiner. Grant probability derived from career allow rate.
