DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments filed on November 19, 2025 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Response to Amendment
The amendment to the claims received on November 19, 2025 has been entered.
The amendment of claims 1, 10 and 19 is acknowledged.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 6-10 and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Assouline’602 (US 2023/0196602), and further in view of Zukerman’322 (US 2018/0190322) and Jeong’262 (US 2022/0207262).
With respect to claim 19, Assouline’602 teaches a system (Fig.5) comprising:
a memory component [the system shown in Fig.5 is inherently disclosed with a memory storing a program to be executed by the one or more processors in order to allow the system to provide the desired functions]; and
a processing device coupled to the memory component, the processing device to execute instructions stored on the memory component which cause the system to perform operations [the system shown in Fig.5 is inherently disclosed with one or more processors since a working system needs to include at least one processor to perform its desired functions] comprising:
receiving an input digital video including a plurality of frames [regarding the received video 710 and the received video 720 shown in Fig.7];
providing the input digital video to the video manipulation network (Fig.7);
Assouline’602 does not teach after receiving the input digital video, receiving, from an appearance input, modifications to a target appearance of a subject of the input digital video, the modifications including changes to one or more of an expression and facial features of the subject of the input digital video; generating a plurality of target appearance frames corresponding to the input digital video based on the appearance input defining a target appearance of a subject of the input digital video; training a video manipulation network, using the plurality of target appearance frames and the input digital video, to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance of the subject in the plurality of target appearance frames; and generating, by the video manipulation network, a temporally consistent output digital video wherein the subject of the temporally consistent output digital video has its appearance modified to match the target appearance.
Zukerman’322 teaches after receiving the input digital video, receiving, from an appearance input, modifications to a target appearance of a subject of the input digital video, the modifications including changes to one or more of an expression and facial features of the subject of the input digital video (Fig.4A and paragraphs 33 and 34);
generating a plurality of target appearance frames corresponding to the input digital video based on the appearance input defining a target appearance of a subject of the input digital video (paragraph 64).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Assouline’602 according to the teachings of Zukerman’322 to provide a graphical user interface enabling a user to select a desired sticker to modify the appearance of the subjects in the frames of the received video and thereby create a new edited video, because this will allow the appearance of the subject in the frames of the received video to be modified more effectively.
The combination of Assouline’602 and Zukerman’322 does not teach training a video manipulation network, using the plurality of target appearance frames and the input digital video, to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance of the subject in the plurality of target appearance frames; and generating, by the video manipulation network, a temporally consistent output digital video wherein the subject of the temporally consistent output digital video has its appearance modified to match the target appearance.
Since Assouline’602 has suggested that a machine learning model (a video manipulation network) is trained to generate new videos having the modified contents according to training data including the videos (Fig. 5 and paragraphs 105-114), and Zukerman’322 has suggested that a user selects, via a graphical user interface, a desired sticker to change the appearance of the subjects in an input video and to generate a modified video whose frames have the modified appearance of the subjects (Fig.4A and paragraphs 33, 34 and 64), it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to generate a modified video, with frames having the modified appearance of the subjects, that is associated with a received video, and then to use the modified video and the received video to train a machine learning model (a video manipulation network) (training a video manipulation network, using the plurality of target appearance frames and the input digital video, to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance of the subject in the plurality of target appearance frames), because this will allow the machine learning model (a video manipulation network) to generate a modified video with the modified appearance of the subjects more effectively.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 to generate a modified video, with frames having the modified appearance of the subjects, that is associated with a received video, and then to use the modified video and the received video to train a machine learning model (a video manipulation network) (training a video manipulation network, using the plurality of target appearance frames and the input digital video, to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance of the subject in the plurality of target appearance frames), because this will allow the machine learning model (a video manipulation network) to generate a modified video with the modified appearance of the subjects more effectively.
The combination of Assouline’602 and Zukerman’322 does not teach generating, by the video manipulation network, a temporally consistent output digital video wherein the subject of the temporally consistent output digital video has its appearance modified to match the target appearance.
Jeong’262 teaches generating, by the video manipulation network, a temporally consistent output digital video wherein the subject of the temporally consistent output digital video has its appearance modified to match the target appearance [as shown in Fig.1A, synthesized video data having its appearance modified to match a target appearance is generated from original video data. A video manipulation network is considered to be disclosed to generate, from original video data, the synthesized video data having its appearance modified to match a target appearance. The synthesized video data is considered to be generated either temporarily or permanently, since the synthesized video data is either stored to be used in the future or deleted after it is used].
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to generate, from original video data, synthesized video data having its appearance modified to match a target appearance, because this will allow the desired video data to be generated from original video data more effectively.
With respect to claim 1, it is a method claim that claims how the system of claim 19 manipulates the content in a video to a new appearance. Claim 1 is obvious in view of Assouline’602, Zukerman’322 and Jeong’262 because the claimed combination operates in the same manner as described in the rejection of claim 19. In addition, since the references disclose a system to manipulate the content in a video to a new appearance, the process (method) of manipulating the content in a video to a new appearance is inherently disclosed as being performed by a processor in the system when the system performs that operation.
With respect to claim 6, which further limits claim 1, the combination of Assouline’602 and Zukerman’322 does not teach wherein training a video prediction network to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance, further comprises: training a plurality of video manipulation networks, wherein the plurality of video manipulation networks include an encoder network, a first decoder network, and a second decoder network.
Jeong’262 teaches wherein training a video prediction network to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance, further comprises: training a plurality of video manipulation networks, wherein the plurality of video manipulation networks includes an encoder network, a first decoder network, and a second decoder network (Fig.4 and Fig.8).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to train a prediction network according to the generated synthesized video data having its appearance modified to match a target appearance because this will allow the desired video data to be generated from other video data more effectively.
With respect to claim 7, which further limits claim 6, the combination of Assouline’602 and Zukerman’322 does not teach wherein training a plurality of video manipulation networks, wherein the plurality of video manipulation networks include an encoder network, a first decoder network, and a second decoder network, further comprises: providing the plurality of frames and the plurality of target appearance frames to the encoder network; generating, by the encoder network, a representation of the plurality of frames and a representation of the plurality of target appearance frames; reconstructing, by the first decoder network, a plurality of reconstructed frames from the representation of the plurality of frames; reconstructing, by the second decoder network, a plurality of reconstructed target appearance frames from the representation of the plurality of target appearance frames; and training the first decoder network, second decoder network, and encoder network, by comparing the plurality of reconstructed frames to the plurality of frames and the plurality of reconstructed target appearance frames to the plurality of target appearance frames using a loss function.
Jeong’262 teaches wherein training a plurality of video manipulation networks, wherein the plurality of video manipulation networks includes an encoder network, a first decoder network, and a second decoder network (paragraph 9), further comprises:
providing the plurality of frames and the plurality of target appearance frames to the encoder network (paragraph 9);
generating, by the encoder network, a representation of the plurality of frames and a representation of the plurality of target appearance frames (paragraph 9);
reconstructing, by the first decoder network, a plurality of reconstructed frames from the representation of the plurality of frames (paragraph 9);
reconstructing, by the second decoder network, a plurality of reconstructed target appearance frames from the representation of the plurality of target appearance frames (paragraph 9); and
training the first decoder network, second decoder network, and encoder network, by comparing the plurality of reconstructed frames to the plurality of frames and the plurality of reconstructed target appearance frames to the plurality of target appearance frames using a loss function (paragraph 9).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to train a prediction network according to the generated synthesized video data having its appearance modified to match a target appearance because this will allow the desired video data to be generated from other video data more effectively.
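For illustration only, the training procedure recited in claim 7 (a shared encoder feeding a first and a second decoder, each reconstructing one set of frames, with all three networks trained by comparing reconstructions to their inputs using a loss function) can be sketched as below; the single-layer linear networks, the random stand-in frame data, and the plain gradient-descent update are assumptions made for the sketch, not details taken from any cited reference.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, N = 64, 16, 32                       # flattened frame size, latent size, frame count
frames = rng.normal(size=(N, D))           # stand-in for the plurality of frames
targets = rng.normal(size=(N, D))          # stand-in for the target appearance frames

W_enc = 0.1 * rng.normal(size=(D, H))      # shared encoder network
W_dec1 = 0.1 * rng.normal(size=(H, D))     # first decoder network (reconstructs frames)
W_dec2 = 0.1 * rng.normal(size=(H, D))     # second decoder network (reconstructs targets)

def train_step(lr=0.5):
    """Encode both frame sets, reconstruct each with its own decoder, and
    update all three networks from a combined MSE reconstruction loss."""
    global W_enc, W_dec1, W_dec2
    z1, z2 = frames @ W_enc, targets @ W_enc         # latent representations
    e1 = z1 @ W_dec1 - frames                        # reconstruction error, frames
    e2 = z2 @ W_dec2 - targets                       # reconstruction error, target frames
    loss = (e1 ** 2).mean() + (e2 ** 2).mean()       # loss function of claim 7
    # Gradients of the two MSE terms with respect to each weight matrix.
    g_dec1 = 2 * z1.T @ e1 / e1.size
    g_dec2 = 2 * z2.T @ e2 / e2.size
    g_enc = (2 * frames.T @ (e1 @ W_dec1.T) / e1.size
             + 2 * targets.T @ (e2 @ W_dec2.T) / e2.size)
    W_dec1 -= lr * g_dec1
    W_dec2 -= lr * g_dec2
    W_enc -= lr * g_enc
    return loss

losses = [train_step() for _ in range(300)]
```

The sketch only shows the structure of the recited training loop: one encoder shared by two reconstruction paths, each path compared against its own input set, with a single combined loss driving all three updates.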
With respect to claim 8, which further limits claim 6, the combination of Assouline’602 and Zukerman’322 does not teach wherein the video prediction network comprises the encoder network and the second decoder network.
Jeong’262 teaches wherein the video prediction network comprises the encoder network and the second decoder network (paragraph 9).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to train a prediction network according to the generated synthesized video data having its appearance modified to match a target appearance because this will allow the desired video data to be generated from other video data more effectively.
With respect to claim 9, which further limits claim 1, the combination of Assouline’602 and Zukerman’322 does not teach wherein a subject of the input digital video includes a representation of a person's face and wherein the target appearance includes a change to an expression or appearance of the person's face.
Jeong’262 teaches wherein a subject of the input digital video includes a representation of a person's face and wherein the target appearance includes a change to an expression or appearance of the person's face (Fig.1A).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to train a prediction network according to the generated synthesized video data depicting the appearance of a human because this will allow the desired video data associated with human appearance to be generated from other video data more effectively.
With respect to claims 10 and 15-18, they are analyzed and rejected for the same reasons set forth in the rejection of claims 1 and 6-9.
Claims 2, 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Assouline’602 (US 2023/0196602), Zukerman’322 (US 2018/0190322), Jeong’262 (US 2022/0207262) and further in view of Gopalkrishna’108 (US 2023/0005108).
With respect to claim 2, which further limits claim 1, the combination of Assouline’602, Zukerman’322 and Jeong’262 does not teach wherein the plurality of frames represents a subset of the input digital video.
Gopalkrishna’108 teaches wherein the plurality of frames represents a subset of the input digital video (Fig.1, item 102).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322 and Jeong’262 according to the teaching of Gopalkrishna’108 to train a machine learning model to manipulate an object in a video to a desired target appearance, so that a new video is generated by replacing the said object with the said desired target appearance of the object, because this will allow the desired video data to be generated from original video data more effectively.
With respect to claim 11, it is analyzed and rejected for the same reasons set forth in the rejection of claim 2.
With respect to claim 20, which further limits claim 19, the combination of Assouline’602, Zukerman’322 and Jeong’262 does not teach wherein the plurality of frames represents a subset of the input digital video.
Gopalkrishna’108 teaches wherein the plurality of frames represents a subset of the input digital video (Fig.1, item 102).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322 and Jeong’262 according to the teaching of Gopalkrishna’108 to train a machine learning model to manipulate an object in a video to a desired target appearance, so that a new video is generated by replacing the said object with the said desired target appearance of the object, because this will allow the desired video data to be generated from original video data more effectively.
Claims 3, 4, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Assouline’602 (US 2023/0196602), Zukerman’322 (US 2018/0190322), Jeong’262 (US 2022/0207262) and further in view of Wu’299 (US 2024/0112299).
With respect to claim 3, which further limits claim 1, the combination of Assouline’602, Zukerman’322 and Jeong’262 does not teach wherein generating a plurality of target appearance frames from the plurality of frames, further comprises: processing the plurality of frames by a subject cropping network to generate a plurality of input cropped images.
Wu’299 teaches wherein generating a plurality of target appearance frames from the plurality of frames, further comprises: processing the plurality of frames by a subject cropping network to generate a plurality of input cropped images (Fig.2).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322 and Jeong’262 according to the teaching of Wu’299 to determine the target video frames in the original video and then to perform the desired cropping operation on the desired area of the target video frames because this will allow the contents in original video data to be processed more effectively.
With respect to claim 4, which further limits claim 3, the combination of Assouline’602, Zukerman’322, Jeong’262 and Wu’299 does not teach providing the plurality of input cropped images to a subject manipulation network; identifying a plurality of latent representations corresponding to the target appearance in the plurality of input cropped images; and generating a plurality of target appearance cropped images using the plurality of latent representations.
Since Assouline’602 has suggested that the image frames in a video are extracted and the interested contents in the extracted image frames are replaced with desired contents (Fig. 5 and paragraphs 105-114), and Wu’299 has suggested that a cropping box is used to crop the frame pictures generated from an original video (Fig. 2), it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to extract the interested image frames from an input video and then to use the cropping box to crop, from the extracted image frames, the desired contents to be replaced with desired contents (providing the plurality of input cropped images to a subject manipulation network; identifying a plurality of latent representations corresponding to the target appearance in the plurality of input cropped images; and generating a plurality of target appearance cropped images using the plurality of latent representations), because this will allow the target image frames extracted from a video to be manipulated more effectively.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322, Jeong’262 and Wu’299 to extract the interested image frames from an input video and then to use the cropping box to crop, from the extracted image frames, the desired contents to be replaced with desired contents (providing the plurality of input cropped images to a subject manipulation network; identifying a plurality of latent representations corresponding to the target appearance in the plurality of input cropped images; and generating a plurality of target appearance cropped images using the plurality of latent representations), because this will allow the target image frames extracted from a video to be manipulated more effectively.
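For illustration only, the cropping operation discussed above (extracting image frames from a video and applying a cropping box to obtain the input cropped images) can be sketched as below; the fixed box returned by the stand-in detect_subject function and the zero-valued stand-in frames are assumptions, standing in for an actual subject cropping network and real video data.

```python
import numpy as np

def detect_subject(frame):
    """Stand-in for a subject cropping network: returns a cropping box as
    (top, left, height, width). A trained network would predict this box."""
    return (8, 8, 16, 16)

def crop_frames(frames):
    """Apply each frame's cropping box to produce the input cropped images."""
    crops = []
    for frame in frames:
        top, left, h, w = detect_subject(frame)
        crops.append(frame[top:top + h, left:left + w])
    return crops

video = [np.zeros((32, 32, 3)) for _ in range(4)]  # extracted image frames
cropped = crop_frames(video)                       # plurality of input cropped images
```

The cropped images would then be the inputs to the subject manipulation network recited in claim 4.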
With respect to claims 12-13, they are being analyzed and rejected for the same reason set forth in the rejection of claims 3-4.
Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Assouline’602 (US 2023/0196602), Zukerman’322 (US 2018/0190322), Jeong’262 (US 2022/0207262), Wu’299 (US 2024/0112299) and further in view of Gopalkrishna’108 (US 2023/0005108).
With respect to claim 5, which further limits claim 4, the combination of Assouline’602, Zukerman’322, Jeong’262 and Wu’299 does not teach blending the plurality of target appearance cropped images and the plurality of frames using a subject blending network to generate the plurality of target appearance frames.
Gopalkrishna’108 teaches blending the plurality of target appearance cropped images and the plurality of frames using a subject blending network to generate the plurality of target appearance frames [when the original text in the extracted image frames is replaced with the desired text, the original text is considered to be cropped (paragraph 39)].
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322, Jeong’262 and Wu’299 according to the teaching of Gopalkrishna’108 to train a machine learning model to manipulate an object in a video to a desired target appearance, so that a new video is generated by replacing the said object with the said desired target appearance of the object, because this will allow the desired video data to be generated from original video data more effectively.
With respect to claim 14, it is analyzed and rejected for the same reasons set forth in the rejection of claim 5.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUO LONG CHEN whose telephone number is (571)270-3759. The examiner can normally be reached on M-F 9am - 5pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tieu, Benny can be reached on (571) 272-7490. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HUO LONG CHEN/Primary Examiner, Art Unit 2682