DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments filed on November 19, 2025 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Response to Amendment
The amendment to the claims received on November 19, 2025 has been entered.
The amendment of claims 1, 10 and 19 is acknowledged.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 6-10 and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Assouline’602 (US 2023/0196602), and further in view of Zukerman’322 (US 2018/0190322) and Jeong’262 (US 2022/0207262).
With respect to claim 19, Assouline’602 teaches a system (Fig.5) comprising:
a memory component [the system shown in Fig.5 is inherently disclosed with a memory storing a program to be executed by the one or more processors in order to allow the system to provide the desired functions]; and
a processing device coupled to the memory component, the processing device to execute instructions stored on the memory component which cause the system to perform operations [the system shown in Fig.5 is inherently disclosed with one or more processors since a working system needs to include at least one processor to perform its desired functions] comprising:
receiving an input digital video including a plurality of frames [regarding the received video 710 and the received video 720 shown in Fig.7];
providing the input digital video to the video manipulation network (Fig.7);
Assouline’602 does not teach after receiving the input digital video, receiving, from an appearance input, modifications to a target appearance of a subject of the input digital video, the modifications including changes to one or more of an expression and facial features of the subject of the input digital video; generating a plurality of target appearance frames corresponding to the input digital video based on the appearance input defining a target appearance of a subject of the input digital video; training a video manipulation network, using the plurality of target appearance frames and the input digital video, to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance of the subject in the plurality of target appearance frames; and generating, by the video manipulation network, a temporally consistent output digital video wherein the subject of the temporally consistent output digital video has its appearance modified to match the target appearance.
Zukerman’322 teaches after receiving the input digital video, receiving, from an appearance input, modifications to a target appearance of a subject of the input digital video, the modifications including changes to one or more of an expression and facial features of the subject of the input digital video (Fig.4A and paragraphs 33 and 34);
generating a plurality of target appearance frames corresponding to the input digital video based on the appearance input defining a target appearance of a subject of the input digital video (paragraph 64).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Assouline’602 according to the teachings of Zukerman’322 to provide a graphical user interface enabling a user to select a desired sticker to modify the appearance of the subjects in the frames of the received video and thereby create a new edited video, because this will allow the appearance of the subject in the frames of the received video to be modified more effectively.
The combination of Assouline’602 and Zukerman’322 does not teach training a video manipulation network, using the plurality of target appearance frames and the input digital video, to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance of the subject in the plurality of target appearance frames; and generating, by the video manipulation network, a temporally consistent output digital video wherein the subject of the temporally consistent output digital video has its appearance modified to match the target appearance.
Since Assouline’602 has suggested that a machine learning model (a video manipulation network) is trained to generate new videos having the modified contents according to training data including the videos (Fig. 5 and paragraphs 105-114), and Zukerman’322 has suggested that a user selects, via a graphical user interface, a desired sticker to change the appearance of the subjects in an input video and to generate a modified video whose frames have the modified appearance of the subjects (Fig.4A and paragraphs 33, 34 and 64), it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to generate a modified video, with frames having the modified appearance of the subjects, that is associated with a received video, and then to use the modified video and the received video to train a machine learning model (a video manipulation network) (training a video manipulation network, using the plurality of target appearance frames and the input digital video, to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance of the subject in the plurality of target appearance frames), because this will allow the machine learning model (a video manipulation network) to generate a modified video with the modified appearance of the subjects more effectively.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 to generate a modified video, with frames having the modified appearance of the subjects, that is associated with a received video, and then to use the modified video and the received video to train a machine learning model (a video manipulation network) (training a video manipulation network, using the plurality of target appearance frames and the input digital video, to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance of the subject in the plurality of target appearance frames), because this will allow the machine learning model (a video manipulation network) to generate a modified video with the modified appearance of the subjects more effectively.
The combination of Assouline’602 and Zukerman’322 does not teach generating, by the video manipulation network, a temporally consistent output digital video wherein the subject of the temporally consistent output digital video has its appearance modified to match the target appearance.
Jeong’262 teaches generating, by the video manipulation network, a temporally consistent output digital video wherein the subject of the temporally consistent output digital video has its appearance modified to match the target appearance [as shown in Fig.1A, synthesized video data having its appearance modified to match a target appearance is generated from original video data. A video manipulation network is considered to be disclosed to generate, from original video data, the synthesized video data having its appearance modified to match a target appearance. The synthesized video data is considered to be generated either temporarily or permanently, since the synthesized video data is either stored to be used in the future or deleted after it is used].
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to generate, from original video data, synthesized video data having its appearance modified to match a target appearance, because this will allow the desired video data to be generated from original video data more effectively.
With respect to claim 1, it is a method claim that claims how the system of claim 19 manipulates the content in a video to a new appearance. Claim 1 is obvious in view of Assouline’602, Zukerman’322 and Jeong’262 because the claimed combination operates in the same manner as described in the rejection of claim 19. In addition, since the references disclose a system to manipulate the content in a video to a new appearance, the process (method) of manipulating the content in a video to a new appearance is inherently disclosed as being performed by a processor in the system when the system performs that operation.
With respect to claim 6, which further limits claim 1, the combination of Assouline’602 and Zukerman’322 does not teach wherein training a video prediction network to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance, further comprises: training a plurality of video manipulation networks, wherein the plurality of video manipulation networks include an encoder network, a first decoder network, and a second decoder network.
Jeong’262 teaches wherein training a video prediction network to generate a digital video wherein a subject of the digital video has its appearance modified to match the target appearance, further comprises: training a plurality of video manipulation networks, wherein the plurality of video manipulation networks includes an encoder network, a first decoder network, and a second decoder network (Fig.4 and Fig.8).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to train a prediction network according to the generated synthesized video data having its appearance modified to match a target appearance because this will allow the desired video data to be generated from other video data more effectively.
With respect to claim 7, which further limits claim 6, the combination of Assouline’602 and Zukerman’322 does not teach wherein training a plurality of video manipulation networks, wherein the plurality of video manipulation networks include an encoder network, a first decoder network, and a second decoder network, further comprises: providing the plurality of frames and the plurality of target appearance frames to the encoder network; generating, by the encoder network, a representation of the plurality of frames and a representation of the plurality of target appearance frames; reconstructing, by the first decoder network, a plurality of reconstructed frames from the representation of the plurality of frames; reconstructing, by the second decoder network, a plurality of reconstructed target appearance frames from the representation of the plurality of target appearance frames; and training the first decoder network, second decoder network, and encoder network, by comparing the plurality of reconstructed frames to the plurality of frames and the plurality of reconstructed target appearance frames to the plurality of target appearance frames using a loss function.
Jeong’262 teaches wherein training a plurality of video manipulation networks, wherein the plurality of video manipulation networks includes an encoder network, a first decoder network, and a second decoder network (paragraph 9), further comprises:
providing the plurality of frames and the plurality of target appearance frames to the encoder network (paragraph 9);
generating, by the encoder network, a representation of the plurality of frames and a representation of the plurality of target appearance frames (paragraph 9);
reconstructing, by the first decoder network, a plurality of reconstructed frames from the representation of the plurality of frames (paragraph 9);
reconstructing, by the second decoder network, a plurality of reconstructed target appearance frames from the representation of the plurality of target appearance frames (paragraph 9); and
training the first decoder network, second decoder network, and encoder network, by comparing the plurality of reconstructed frames to the plurality of frames and the plurality of reconstructed target appearance frames to the plurality of target appearance frames using a loss function (paragraph 9).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to train a prediction network according to the generated synthesized video data having its appearance modified to match a target appearance because this will allow the desired video data to be generated from other video data more effectively.
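For illustration only, the training procedure recited in claim 7 (a shared encoder feeding a first and a second decoder, each reconstructing one set of frames, with all three networks trained by comparing reconstructions to their inputs using a loss function) can be sketched as below; the single-layer linear networks, the random stand-in frame data, and the plain gradient-descent update are assumptions made for the sketch, not details taken from any cited reference.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, N = 64, 16, 32                       # flattened frame size, latent size, frame count
frames = rng.normal(size=(N, D))           # stand-in for the plurality of frames
targets = rng.normal(size=(N, D))          # stand-in for the target appearance frames

W_enc = 0.1 * rng.normal(size=(D, H))      # shared encoder network
W_dec1 = 0.1 * rng.normal(size=(H, D))     # first decoder network (reconstructs frames)
W_dec2 = 0.1 * rng.normal(size=(H, D))     # second decoder network (reconstructs targets)

def train_step(lr=0.5):
    """Encode both frame sets, reconstruct each with its own decoder, and
    update all three networks from a combined MSE reconstruction loss."""
    global W_enc, W_dec1, W_dec2
    z1, z2 = frames @ W_enc, targets @ W_enc         # latent representations
    e1 = z1 @ W_dec1 - frames                        # reconstruction error, frames
    e2 = z2 @ W_dec2 - targets                       # reconstruction error, target frames
    loss = (e1 ** 2).mean() + (e2 ** 2).mean()       # loss function of claim 7
    # Gradients of the two MSE terms with respect to each weight matrix.
    g_dec1 = 2 * z1.T @ e1 / e1.size
    g_dec2 = 2 * z2.T @ e2 / e2.size
    g_enc = (2 * frames.T @ (e1 @ W_dec1.T) / e1.size
             + 2 * targets.T @ (e2 @ W_dec2.T) / e2.size)
    W_dec1 -= lr * g_dec1
    W_dec2 -= lr * g_dec2
    W_enc -= lr * g_enc
    return loss

losses = [train_step() for _ in range(300)]
```

The sketch only shows the structure of the recited training loop: one encoder shared by two reconstruction paths, each path compared against its own input set, with a single combined loss driving all three updates.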
With respect to claim 8, which further limits claim 6, the combination of Assouline’602 and Zukerman’322 does not teach wherein the video prediction network comprises the encoder network and the second decoder network.
Jeong’262 teaches wherein the video prediction network comprises the encoder network and the second decoder network (paragraph 9).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to train a prediction network according to the generated synthesized video data having its appearance modified to match a target appearance because this will allow the desired video data to be generated from other video data more effectively.
With respect to claim 9, which further limits claim 1, the combination of Assouline’602 and Zukerman’322 does not teach wherein a subject of the input digital video includes a representation of a person's face and wherein the target appearance includes a change to an expression or appearance of the person's face.
Jeong’262 teaches wherein a subject of the input digital video includes a representation of a person's face and wherein the target appearance includes a change to an expression or appearance of the person's face (Fig.1A).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602 and Zukerman’322 according to the teaching of Jeong’262 to train a prediction network according to the generated synthesized video data depicting the appearance of a human because this will allow the desired video data associated with human appearance to be generated from other video data more effectively.
With respect to claims 10 and 15-18, they are analyzed and rejected for the same reasons set forth in the rejection of claims 1 and 6-9.
Claims 2, 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Assouline’602 (US 2023/0196602), Zukerman’322 (US 2018/0190322), Jeong’262 (US 2022/0207262) and further in view of Gopalkrishna’108 (US 2023/0005108).
With respect to claim 2, which further limits claim 1, the combination of Assouline’602, Zukerman’322 and Jeong’262 does not teach wherein the plurality of frames represents a subset of the input digital video.
Gopalkrishna’108 teaches wherein the plurality of frames represents a subset of the input digital video (Fig.1, item 102).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322 and Jeong’262 according to the teaching of Gopalkrishna’108 to train a machine learning model to manipulate an object in a video to a desired target appearance, so that a new video is generated by replacing the said object with the said desired target appearance of the object, because this will allow the desired video data to be generated from original video data more effectively.
With respect to claim 11, it is analyzed and rejected for the same reasons set forth in the rejection of claim 2.
With respect to claim 20, which further limits claim 19, the combination of Assouline’602, Zukerman’322 and Jeong’262 does not teach wherein the plurality of frames represents a subset of the input digital video.
Gopalkrishna’108 teaches wherein the plurality of frames represents a subset of the input digital video (Fig.1, item 102).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322 and Jeong’262 according to the teaching of Gopalkrishna’108 to train a machine learning model to manipulate an object in a video to a desired target appearance, so that a new video is generated by replacing the said object with the said desired target appearance of the object, because this will allow the desired video data to be generated from original video data more effectively.
Claims 3, 4, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Assouline’602 (US 2023/0196602), Zukerman’322 (US 2018/0190322), Jeong’262 (US 2022/0207262) and further in view of Wu’299 (US 2024/0112299).
With respect to claim 3, which further limits claim 1, the combination of Assouline’602, Zukerman’322 and Jeong’262 does not teach wherein generating a plurality of target appearance frames from the plurality of frames, further comprises: processing the plurality of frames by a subject cropping network to generate a plurality of input cropped images.
Wu’299 teaches wherein generating a plurality of target appearance frames from the plurality of frames, further comprises: processing the plurality of frames by a subject cropping network to generate a plurality of input cropped images (Fig.2).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322 and Jeong’262 according to the teaching of Wu’299 to determine the target video frames in the original video and then to perform the desired cropping operation on the desired area of the target video frames because this will allow the contents in original video data to be processed more effectively.
With respect to claim 4, which further limits claim 3, the combination of Assouline’602, Zukerman’322, Jeong’262 and Wu’299 does not teach providing the plurality of input cropped images to a subject manipulation network; identifying a plurality of latent representations corresponding to the target appearance in the plurality of input cropped images; and generating a plurality of target appearance cropped images using the plurality of latent representations.
Since Assouline’602 has suggested that the image frames in a video are extracted and the interested contents in the extracted image frames are replaced with desired contents (Fig. 5 and paragraphs 105-114), and Wu’299 has suggested that a cropping box is used to crop the frame pictures generated from an original video (Fig. 2), it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to extract the interested image frames from an input video and then to use the cropping box to crop, from the extracted image frames, the desired contents to be replaced with desired contents (providing the plurality of input cropped images to a subject manipulation network; identifying a plurality of latent representations corresponding to the target appearance in the plurality of input cropped images; and generating a plurality of target appearance cropped images using the plurality of latent representations), because this will allow the target image frames extracted from a video to be manipulated more effectively.
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322, Jeong’262 and Wu’299 to extract the interested image frames from an input video and then to use the cropping box to crop, from the extracted image frames, the desired contents to be replaced with desired contents (providing the plurality of input cropped images to a subject manipulation network; identifying a plurality of latent representations corresponding to the target appearance in the plurality of input cropped images; and generating a plurality of target appearance cropped images using the plurality of latent representations), because this will allow the target image frames extracted from a video to be manipulated more effectively.
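For illustration only, the cropping operation discussed above (extracting image frames from a video and applying a cropping box to obtain the input cropped images) can be sketched as below; the fixed box returned by the stand-in detect_subject function and the zero-valued stand-in frames are assumptions, standing in for an actual subject cropping network and real video data.

```python
import numpy as np

def detect_subject(frame):
    """Stand-in for a subject cropping network: returns a cropping box as
    (top, left, height, width). A trained network would predict this box."""
    return (8, 8, 16, 16)

def crop_frames(frames):
    """Apply each frame's cropping box to produce the input cropped images."""
    crops = []
    for frame in frames:
        top, left, h, w = detect_subject(frame)
        crops.append(frame[top:top + h, left:left + w])
    return crops

video = [np.zeros((32, 32, 3)) for _ in range(4)]  # extracted image frames
cropped = crop_frames(video)                       # plurality of input cropped images
```

The cropped images would then be the inputs to the subject manipulation network recited in claim 4.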
With respect to claims 12-13, they are being analyzed and rejected for the same reason set forth in the rejection of claims 3-4.
Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Assouline’602 (US 2023/0196602), Zukerman’322 (US 2018/0190322), Jeong’262 (US 2022/0207262), Wu’299 (US 2024/0112299) and further in view of Gopalkrishna’108 (US 2023/0005108).
With respect to claim 5, which further limits claim 4, the combination of Assouline’602, Zukerman’322, Jeong’262 and Wu’299 does not teach blending the plurality of target appearance cropped images and the plurality of frames using a subject blending network to generate the plurality of target appearance frames.
Gopalkrishna’108 teaches blending the plurality of target appearance cropped images and the plurality of frames using a subject blending network to generate the plurality of target appearance frames [when the original text in the extracted image frames is replaced with the desired text, the original text is considered to be cropped (paragraph 39)].
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Assouline’602, Zukerman’322, Jeong’262 and Wu’299 according to the teaching of Gopalkrishna’108 to train a machine learning model to manipulate an object in a video to a desired target appearance, so that a new video is generated by replacing the said object with the said desired target appearance of the object, because this will allow the desired video data to be generated from original video data more effectively.
With respect to claim 14, it is analyzed and rejected for the same reasons set forth in the rejection of claim 5.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUO LONG CHEN whose telephone number is (571)270-3759. The examiner can normally be reached on M-F 9am - 5pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tieu, Benny can be reached on (571) 272-7490. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HUO LONG CHEN/Primary Examiner, Art Unit 2682