DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 10 December 2025 have been fully considered but they are not persuasive.
With regard to claim 1 and similar claims in substance, Applicant states, “Neither the Pan reference nor the Hinz reference include any teachings, suggestions, or motivations for integrating the latent code facial representations as taught by Hinz with the 3D geometry facial representations disclosed by Pan. A 3D geometry of a face and a latent encoding of a face are incompatible representations, and the retargeting/alignment technique as disclosed by Pan would not be operable to act on a latent encoding of a face, because the retargeting/alignment technique of Pan performs per-vertex positional adjustments on a 3D geometry, and the latent encodings of Hinz do not include vertex information associated with a face. Pan also generates the texture for a 3D geometry by estimating colors of the face in the video frame, such as skin color, eye color, lip color, eye shape, facial hair, etc. at each vertex of the 3D geometry generated for the face. See Pan at ¶ [0047]. Further, Pan indicates the color of light and shadows on the face as colors associated with each vertex of the 3D geometry generated for the face. See Id. These per-vertex operations are not compatible with a latent encoding of an image that does not include vertex information, as taught by Hinz. Applicant contends that a person having ordinary skill in the art, having viewed both Pan and Hinz, would be neither motivated to combine the latent encoding of Hinz with the 3D geometries of Pan, nor informed as to how to integrate the latent encodings of Hinz into the various per-vertex operations disclosed by Pan.” Applicant also makes additional arguments related to an improper combination of Pan and Hinz.
However, the 3D geometry facial representations of Pan are not relied upon in forming the combination with Hinz. Claim 1 of the application, for example, comprises four key limitations. The first two limitations describe “identifying, based on an actor frame included in the audiovisual sequence, one or more regions included in the actor frame; identifying, based on a dubber frame included in a visual recording of a dubber performance, one or more regions included in the dubber frame.” In essence, these two limitations simply identify at least one region in a first type of frame and at least one region in a second type of frame. Pan (Paragraph 0024) discloses “In some embodiments, a dubbing application performs three-dimensional (3D) tracking of (1) the face of an actor within video frames of a first media content item in order to generate 3D geometry representing the face of the actor in each video frame of the first media content item, and (2) the face of a dubber within video frames of a second media content item in order to generate 3D geometry representing the face of the dubber in each video frame of the second media content item.” Pan therefore identifies at least one region in a first type of frame by tracking a region containing the face of an actor within video frames of a first media content item, and identifies at least one region in a second type of frame by tracking a region containing the face of a dubber within video frames of a second media content item. While Pan describes 3D tracking of each of the respective faces, the claim does not limit how the tracking or identifying is performed. Therefore, Pan discloses the claimed identifying functions of the first two limitations by tracking faces in frames of different content items.
The last two limitations of claim 1 describe “generating a plurality of latent vectors based on at least one identified region included in the actor frame and at least one identified region included in the dubber frame; and generating an output image based on the plurality of latent vectors.” Hinz (Paragraphs 0023 and 0054-0056 and Figure 3) discloses determining latent vectors of two different images, a source digital image and a target digital image, for combining into a combined digital image. As discussed above, Pan discloses two different images, at least one video frame of a first media content item and at least one video frame of a second media content item, each with tracked face regions. Hinz's determination of latent vectors for two different images for combination into a third image could have been applied to Pan's video frames of the first and second media content items having tracked face regions, yielding the predictable result of determining latent vectors for a video frame of the first media content item and a video frame of the second media content item, each having tracked face regions, for combination into a combined image. Therefore, the combination of Pan and Hinz discloses all the limitations of claim 1 and of similar claims in substance.
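For clarity of the record, the four claimed steps as mapped to the combination can be sketched as follows. This is an illustrative sketch only, not the implementation disclosed by Pan or Hinz: the region tracker, encoder, and decoder below are hypothetical stand-ins, and all identifiers, dimensions, and shapes are assumptions made solely for illustration.

```python
# Illustrative sketch of the four claimed steps as mapped to the combination.
# NOT the disclosure of Pan or Hinz: identify_regions() stands in for Pan's
# face tracking (Paragraph 0024), and encode()/decode() stand in for Hinz's
# latent-vector determination and image generation (Paragraphs 0023, 0054-0056).
import numpy as np

LATENT_DIM = 512  # assumed latent-vector length

def identify_regions(frame: np.ndarray) -> dict[str, np.ndarray]:
    """Stand-in region identification: crop a central 'face' region."""
    h, w = frame.shape[:2]
    return {"face": frame[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]}

def encode(region: np.ndarray) -> np.ndarray:
    """Stand-in encoder: project a region to a fixed-length latent vector."""
    return region.astype(np.float32).ravel()[:LATENT_DIM]

def decode(latent: np.ndarray, out_shape=(256, 256, 3)) -> np.ndarray:
    """Stand-in decoder: produce an output image from a combined latent."""
    return np.resize(latent, out_shape)

def visual_dub(actor_frame: np.ndarray, dubber_frame: np.ndarray) -> np.ndarray:
    actor_regions = identify_regions(actor_frame)      # limitation 1
    dubber_regions = identify_regions(dubber_frame)    # limitation 2
    latents = [encode(r) for r in (*actor_regions.values(),
                                   *dubber_regions.values())]  # limitation 3
    return decode(np.concatenate(latents))             # limitation 4
```

Under this reading, the first two limitations correspond to the two region-identification calls, and the last two limitations correspond to the encoding of the identified regions into latent vectors and the generation of an output image from them.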
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 6-9, 11, 15-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pan et al. (US 2025/0209759 A1) in view of Hinz et al. (US 2023/0342893 A1).
Regarding claim 1, Pan discloses a computer-implemented method for performing visual dubbing of an audiovisual sequence, the computer-implemented method comprising: identifying, based on an actor frame included in the audiovisual sequence, one or more regions included in the actor frame (Paragraph 0024, tracking a face of an actor in a frame of a first media content item); and identifying, based on a dubber frame included in a visual recording of a dubber performance, one or more regions included in the dubber frame (Paragraph 0024, tracking a face of a dubber in a frame of a second media content item). Pan does not clearly disclose generating a plurality of latent vectors based on at least one identified region included in the actor frame and at least one identified region included in the dubber frame, and generating an output image based on the plurality of latent vectors. Hinz discloses determining latent vectors for a source digital image and a target digital image (Paragraph 0023) and using the latent vectors to generate a combined image (Figure 3 and Paragraphs 0054-0056). Hinz's technique of determining latent vectors for two images to generate a combined image would have been recognized by one of ordinary skill in the art as applicable to Pan's frames having face regions of an actor and a dubber, and the results would have been predictable: latent vectors determined for frames having the faces of an actor and a dubber and used to generate a combined image. Therefore, the claimed subject matter would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention.
Regarding claim 2, Pan discloses wherein the one or more regions included in the actor frame include an actor right eye region, an actor left eye region, and an actor mouth region, and the one or more regions included in the dubber frame include a dubber mouth region (Paragraph 0039, landmarks on the face that can be determined include eye and mouth landmarks).
Regarding claim 3, Pan discloses wherein the one or more regions included in the actor frame further include an actor rest of frame region that includes one or more portions of the actor frame that are not included in any of the actor right eye region, the actor left eye region, or the actor mouth region (Paragraph 0039, background region not part of the faces).
Regarding claim 6, Hinz discloses concatenating the plurality of latent vectors into a combined latent vector (Paragraph 0055, combined latent vector).
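As an illustrative aside, concatenation of per-region latent vectors into a single combined latent vector can be sketched as follows; the region names and the 512-element length are assumptions for illustration, not values taken from Hinz.

```python
# Illustrative only: concatenating per-region latent vectors into one
# combined latent vector. Region names and dimensions are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
z_actor_left_eye = rng.standard_normal(512)
z_actor_right_eye = rng.standard_normal(512)
z_dubber_mouth = rng.standard_normal(512)

z_combined = np.concatenate([z_actor_left_eye, z_actor_right_eye, z_dubber_mouth])
assert z_combined.shape == (1536,)  # one vector carrying all region latents
```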
Regarding claim 7, Hinz discloses generating, via a decoder in a machine learning model and based on the combined latent vector, a decoded image including a modified actor mouth; (Paragraph 0021 and figure 2, combined digital image generated by a decoder with swapped mouths) and modifying one or more of lighting, contrast, or smoothing associated with the modified actor mouth (Paragraph 0099, global contrast factor for transferring lighting characteristics).
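The recited lighting/contrast modification can be illustrated with a simple global-contrast adjustment. The following sketch is a hypothetical simplification offered for clarity only; it is not a reproduction of Hinz's global contrast factor computation (Paragraph 0099).

```python
# Illustrative only: a simple global contrast adjustment of a decoded
# mouth region toward a target region's statistics. This is a
# hypothetical simplification, not Hinz's disclosed computation.
import numpy as np

def match_contrast(decoded_mouth: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Scale the decoded region so its contrast (std. of intensity)
    matches the target's while adopting the target's mean."""
    factor = target.std() / max(decoded_mouth.std(), 1e-6)  # global contrast factor
    adjusted = (decoded_mouth - decoded_mouth.mean()) * factor + target.mean()
    return np.clip(adjusted, 0.0, 1.0)  # assumes images normalized to [0, 1]
```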
Regarding claim 8, Pan discloses wherein identifying the one or more regions included in the actor frame further comprises identifying a set of two-dimensional (2D) coordinates within the actor frame associated with facial landmarks included in the actor frame (Paragraph 0039, facial landmarks located at the corners of the eyes, at the corners of the mouth, at the ends of the eyeballs, etc.).
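For illustration, facial landmarks expressed as 2D (x, y) pixel coordinates, and a region derived from them, might look like the following; the specific landmark names and coordinate values are hypothetical and are not taken from Pan.

```python
# Illustrative only: facial landmarks as 2D (x, y) pixel coordinates and a
# region of interest derived from them. Names and values are hypothetical.
LANDMARKS = {
    "left_eye_outer_corner": (212.0, 148.5),
    "right_eye_outer_corner": (341.0, 150.2),
    "mouth_left_corner": (238.4, 301.7),
    "mouth_right_corner": (322.9, 303.1),
}

def bounding_box(points, pad: float = 10.0):
    """Return a padded (x_min, y_min, x_max, y_max) box around 2D points."""
    xs, ys = zip(*points)
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)

mouth_region = bounding_box([LANDMARKS["mouth_left_corner"],
                             LANDMARKS["mouth_right_corner"]])
```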
Regarding claim 9, Pan discloses wherein the facial landmarks include one or more of an eye, a nose, a mouth, an eyebrow, or a facial contour (Paragraph 0039, eyes and mouth).
Regarding claim 11, Hinz discloses wherein generating the plurality of latent vectors is performed by a plurality of encoders included in a machine learning model (Paragraph 0052, a neural network encoder projects image segments into latent vectors, such as StyleGAN).
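A multi-encoder arrangement of the kind recited, one encoder per identified region, can be sketched as follows. The architecture, dimensions, and region names are assumptions for illustration; this sketch is not the StyleGAN-based encoder referenced by Hinz.

```python
# Illustrative only: a plurality of encoders, one per identified region.
# The linear architecture and dimensions are hypothetical; this is not
# the StyleGAN-based encoder referenced by Hinz (Paragraph 0052).
import torch
import torch.nn as nn

class RegionEncoder(nn.Module):
    def __init__(self, in_pixels: int = 64 * 64 * 3, latent_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_pixels, latent_dim))

    def forward(self, region: torch.Tensor) -> torch.Tensor:
        return self.net(region)

encoders = nn.ModuleDict({
    name: RegionEncoder()
    for name in ("actor_left_eye", "actor_right_eye",
                 "actor_rest_of_frame", "dubber_mouth")
})
region = torch.rand(1, 3, 64, 64)     # hypothetical region crop
z = encoders["dubber_mouth"](region)  # one latent vector per region encoder
```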
Regarding claim 15, the reasoning discussed for claim 1 applies.
Regarding claim 16, the reasoning discussed for claim 2 applies.
Regarding claim 17, the reasoning discussed for claim 3 applies.
Regarding claim 19, the reasoning discussed for claim 8 applies.
Regarding claim 20, the reasoning discussed for claim 9 applies.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Pan et al. (US 2025/0209759 A1) in view of Hinz et al. (US 2023/0342893 A1) and further in view of Oktay et al. (US 2025/0173613 A1).
Regarding claim 5, Pan in view of Hinz discloses all limitations as discussed in claim 1. Pan in view of Hinz does not clearly disclose wherein each latent vector included in the plurality of latent vectors has an associated length, and the lengths associated with each of the plurality of latent vectors are equal. Oktay discloses that latent vectors can be the same as one another (Paragraph 0056). Oktay's latent vectors that can be the same as one another would have been recognized by one of ordinary skill in the art as applicable to the latent vectors determined for faces in frames of Pan in view of Hinz, and the results would have been predictable: determining latent vectors having the same length for faces in frames. Therefore, the claimed subject matter would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention.
Allowable Subject Matter
Claims 4, 10, 12-14, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding claim 4, the prior art does not clearly disclose the computer-implemented method of claim 3, wherein each of the plurality of latent vectors is generated based on a different one of the actor right eye region, the actor left eye region, the actor rest of frame region, and the dubber mouth region.
Regarding claim 10, the prior art does not clearly disclose the computer-implemented method of claim 1, wherein the plurality of latent vectors is a first plurality of latent vectors, further comprising: identifying, based on the output image, one or more regions included in the output image; generating a second plurality of latent vectors based on at least one identified region included in the actor frame and at least one of the one or more regions included in the output image; and generating a double-swapped output image based on the second plurality of latent vectors.
Regarding claim 12, the prior art does not clearly disclose the computer-implemented method of claim 1, wherein generating the plurality of latent vectors further comprises: generating an original dubber mouth latent vector based on an identified original dubber mouth region associated with the dubber frame included in the visual recording of the dubber performance; generating a modified dubber frame based on the identified original dubber mouth region and an identified original actor mouth region associated with the actor frame included in the audiovisual sequence; generating a modified dubber mouth latent vector based on the modified dubber frame; and calculating a latent vector difference based on the original dubber mouth latent vector and the modified dubber mouth latent vector. An illustrative sketch of this latent-difference computation follows the discussion of claim 18 below.
Regarding claim 18, the reasoning discussed for claim 4 applies.
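For clarity regarding claim 12 above, the recited latent-difference computation can be sketched as follows. The encoder and the region contents are hypothetical placeholders for illustration only; this sketch does not represent Applicant's disclosed implementation.

```python
# Illustrative only: the latent-difference computation recited in claim 12,
# using a hypothetical placeholder encoder and placeholder region contents.
import numpy as np

rng = np.random.default_rng(0)

def encode(region: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: project a region to a 512-element latent vector."""
    return region.astype(np.float32).ravel()[:512]

original_dubber_mouth = rng.random((64, 64, 3))  # identified original dubber mouth region
modified_dubber_mouth = rng.random((64, 64, 3))  # mouth region from the modified dubber frame

z_original = encode(original_dubber_mouth)       # original dubber mouth latent vector
z_modified = encode(modified_dubber_mouth)       # modified dubber mouth latent vector
latent_difference = z_modified - z_original      # claimed latent vector difference
```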
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Prasad et al. (US 2025/0285399 A1) discloses identifying latent vectors of face regions.
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHI HOANG whose telephone number is (571)270-3417. The examiner can normally be reached Monday through Friday, 8:00 AM to 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JASON CHAN can be reached at (571)272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHI HOANG/Primary Examiner, Art Unit 2619