Last updated: April 18, 2026

Application No. 17/382,027

SYNTHESIZING VIDEO FROM AUDIO USING ONE OR MORE NEURAL NETWORKS

Final Rejection §103

Filed: Jul 21, 2021
Examiner: RICHER, AARON M
Art Unit: 2617
Tech Center: 2600 (Communications)
Assignee: Nvidia Corporation
OA Round: 6 (Final)

Interview Optional

+19.5% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 465 resolved cases, 2023–2026

Examiner Intelligence

RICHER, AARON M

Career Allow Rate: grants 51% of resolved cases (236 granted / 465 resolved); -11.2% vs TC avg
Interview Lift: +19.5% (strong, about +20%), comparing resolved cases with an interview against those without
Typical timeline: 4y 0m average prosecution; 28 currently pending
Career history: 493 total applications across all art units

Statute-Specific Performance

§101: 9.4% (-30.6% vs TC avg)
§103: 54.7% (+14.7% vs TC avg)
§102: 13.1% (-26.9% vs TC avg)
§112: 19.9% (-20.1% vs TC avg)

Tech Center average is an estimate. Based on career data from 465 resolved cases.

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 26 November 2025 have been fully considered but they are not persuasive. 
Applicant’s arguments with respect to the prior art have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 6, 7, 12, 13, 18, 19, 24, 25, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Song (WO 2021052224 A1, herein represented by U.S. Publication 2021/0357625) in view of Park (U.S. Publication 2021/0357088).

As to claim 1, Song discloses a processor (fig. 8, element 82; p. 10, sections 0174-0178; a processor is configured to execute instructions from a medium, such as a memory, which also stores data for performing a programmed method), comprising:
one or more circuits to use one or more neural networks to generate a first plurality of video frames of a person uttering speech based on speech information corresponding to the speech and based on an image corresponding to the user (fig. 5; p. 7, sections 0101-0112; p. 7, section 0120; a first neural network is trained using an audio clip with voice/speech information, to generate face key point information corresponding to input video data of a person’s face and speech).
Song does not expressly disclose, but Park discloses that the generation is without corresponding video frames and modifying the generated first plurality of video frames to produce a second plurality of video frames of a representation of a user uttering the speech based on a single image corresponding to the user (fig. 3; fig. 6-7; p. 4, sections 0065-0070; p. 5, sections 0076-0077; p. 6, section 0088; p. 7, sections 0102-0108; p. 8, sections 0115-0116; generation of a character animation based on driving information including mouth movements and words for a character to speak is performed; at this stage, the character is synthetic like the characters in the figures and not based on input video; later, a user inputs “a facial image” and movements, which would include mouth motions associated with speaking, are mapped such that the input facial image replaces one of the synthetic character images in the animation). The motivation for this is to allow more personal and realistic content so that a user can experience the content more dynamically (p. 1, section 0005). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify Song to generate without corresponding video frames and modify the generated first plurality of video frames to produce a second plurality of video frames of a representation of a user uttering the speech based on a single image corresponding to the user in order to allow more personal and realistic content so that a user can experience the content more dynamically as taught by Park. 

As to claim 6, Song discloses wherein the plurality of video frames is representative of an amount of emotion or pattern of speech determined from the speech information (fig. 3; p. 3, sections 0046-0052; p. 4, section 0060; p. 9, sections 0146-0147; the voice/speech information is analyzed to determine a facial expression representing a particular emotional state; the inpainted image from the second neural network uses the expression information and video is generated from the inpainted image).

As to claim 7, see the rejection to claim 1.

As to claim 12, see the rejection to claim 6.

As to claim 13, see the rejection to claim 1.

As to claim 18, see the rejection to claim 6.

As to claim 19, see the rejection to claim 1.

As to claim 24, see the rejection to claim 6.

As to claim 25, see the rejection to claim 1.

As to claim 30, see the rejection to claim 6.

Claims 2-5, 8-11, 14-17, 20-23, and 26-29 are rejected under 35 U.S.C. 103 as being unpatentable over Song in view of Park and further in view of Liao (U.S. Publication 2021/0390748).

As to claim 2, Song does not disclose, but Liao does disclose wherein the plurality of frames includes a representation of one or more three-dimensional character models uttering a corresponding portion of the speech information (p. 1, section 0006; p. 6, sections 0065-0067; p. 6-7, section 0070; p. 7, section 0072; p. 7, section 0079-p. 8, section 0083; using a neural network, a 3D skeleton model is used to create poses for video frames corresponding to spoken voice/speech information). The motivation for this is to produce output video with synchronized, realistic, and expressive body dynamics at low cost (p. 1, sections 0003-0004). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify Song and Park to have the plurality of frames include a representation of one or more three-dimensional character models uttering a corresponding portion of the speech information in order to produce output video with synchronized, realistic, and expressive body dynamics at low cost as taught by Liao. 

As to claim 3, Song does not disclose, but Liao does disclose wherein the plurality of frames includes a representation of the one or more users uttering the corresponding portion of the speech information as represented by the one or more three-dimensional character models in the plurality of video frames (p. 1, sections 0006-0007; p. 5, section 0059; p. 6, sections 0065-0067; p. 6-7, section 0070; p. 7, section 0072; p. 7, section 0079-p. 8, section 0083; the 3D body model first video information, input video of a person, and speech-to-text information are used to generate video frames of a speaking person). Motivation for the combination is given in the rejection to claim 2.

As to claim 4, Song does not disclose, but Liao does disclose wherein the one or more neural networks is trained to correlate key points between the one or more three-dimensional character models represented in the plurality of video frames and at least one of shape information or pose information for the user (p. 1, section 0006; p. 5, section 0060; p. 6, section 0062; p. 6, sections 0064-0067; correspondence between points in the 3D model and 2D point positions of a person speaking in input video that correspond to projected pose information is found using the neural network). Motivation for the combination is given in the rejection to claim 2.

As to claim 5, Song does not disclose, but Liao does disclose wherein the one or more circuits are further to use the one or more neural networks to synthesize the speech information as voice information from text (fig. 3, element 325; p. 4-5, section 0053; p. 5, section 0056; p. 5, section 0060; a neural network is shown that trains speech video synthesis and mapping from text input). Motivation for the combination is given in the rejection to claim 2.

As to claim 8, see the rejection to claim 2.

As to claim 9, see the rejection to claim 3.

As to claim 10, see the rejection to claim 4.

As to claim 11, see the rejection to claim 5.

As to claim 14, see the rejection to claim 2.

As to claim 15, see the rejection to claim 3.

As to claim 16, see the rejection to claim 4.

As to claim 17, see the rejection to claim 5.

As to claim 20, see the rejection to claim 2.

As to claim 21, see the rejection to claim 3.

As to claim 22, see the rejection to claim 4.

As to claim 23, see the rejection to claim 5.

As to claim 26, see the rejection to claim 2.

As to claim 27, see the rejection to claim 3.

As to claim 28, see the rejection to claim 4.

As to claim 29, see the rejection to claim 5.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AARON M RICHER whose telephone number is (571)272-7790. The examiner can normally be reached 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon, can be reached at (571)272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AARON M RICHER/Primary Examiner, Art Unit 2617
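The reply periods described in the Conclusion can be sketched as simple date arithmetic. This is a minimal illustration, not a docketing tool. It assumes the mailing date is the Jan 07, 2026 Final Rejection entry shown in the prosecution timeline (the action text itself does not state a mailing date), and the helper `add_months` is a hypothetical name introduced here. It does not model weekend or holiday adjustments, extension fees, or the advisory-action exception.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to the month's last day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Assumption: mailing date taken from the "Jan 07, 2026" Final Rejection timeline entry.
mailing_date = date(2026, 1, 7)

two_month_mark = add_months(mailing_date, 2)        # first-reply window mentioned in the action
shortened_period_end = add_months(mailing_date, 3)  # THREE MONTHS shortened statutory period
statutory_limit = add_months(mailing_date, 6)       # SIX MONTHS outer limit

print("Two-month mark:      ", two_month_mark)
print("Shortened period end:", shortened_period_end)
print("Six-month limit:     ", statutory_limit)
```

Under the assumed mailing date, the three dates come out as March 7, April 7, and July 7, 2026.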


Prosecution Timeline

Jul 21, 2021: Application Filed
Jul 30, 2022: Non-Final Rejection (§103)
Feb 06, 2023: Response Filed
Feb 09, 2023: Final Rejection (§103)
Mar 08, 2023: Interview Requested
Apr 25, 2023: Examiner Interview Summary
Apr 25, 2023: Applicant Interview (Telephonic)
Jul 06, 2023: Request for Continued Examination
Jul 11, 2023: Response after Non-Final Action
Sep 02, 2023: Non-Final Rejection (§103)
Mar 08, 2024: Response Filed
Mar 22, 2024: Final Rejection (§103)
Sep 30, 2024: Notice of Allowance
Dec 18, 2024: Response after Non-Final Action
Jan 05, 2025: Response after Non-Final Action
Mar 13, 2025: Response after Non-Final Action
May 20, 2025: Request for Continued Examination
May 21, 2025: Response after Non-Final Action
Aug 24, 2025: Non-Final Rejection (§103)
Nov 26, 2025: Response Filed
Jan 07, 2026: Final Rejection (§103)
Apr 09, 2026: Request for Continued Examination
Apr 16, 2026: Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

17/437,298 (Patent 12586151): Frame Rate Extrapolation. 2y 5m to grant; granted Mar 24, 2026

17/877,606 (Patent 12579600): SEAMLESS VIDEO IN HETEROGENEOUS CORE INFORMATION HANDLING SYSTEM. 2y 5m to grant; granted Mar 17, 2026

17/345,439 (Patent 12571669): DETECTING AND GENERATING A RENDERING OF FILL LEVEL AND DISTRIBUTION OF MATERIAL IN RECEIVING VEHICLE(S). 2y 5m to grant; granted Mar 10, 2026

17/773,700 (Patent 12555305): Systems And Methods For Generating And/Or Using 3-Dimensional Information With Camera Arrays. 2y 5m to grant; granted Feb 17, 2026

17/166,586 (Patent 12548233): 3D TEXTURING VIA A RENDERING LOSS. 2y 5m to grant; granted Feb 10, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 7-8
Grant Probability: 51%
With Interview (+19.5%): 70%
Median Time to Grant: 4y 0m
PTA Risk: High

Based on 465 resolved cases by this examiner. Grant probability derived from career allow rate.
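The projection figures are consistent with simple arithmetic on the career counts reported on this page. The sketch below shows one way the displayed numbers could be reconciled. It assumes the interview lift is added in percentage points to the unrounded allow rate, which the page does not state; the variable names are illustrative only.

```python
# Career counts reported on the page.
granted = 236
resolved = 465

# Career allow rate as a percentage, before rounding.
allow_rate = 100 * granted / resolved

# Assumption: the +19.5% interview lift is additive, in percentage points.
interview_lift = 19.5
with_interview = allow_rate + interview_lift

print(f"Career allow rate: {allow_rate:.2f}% (displayed as {round(allow_rate)}%)")
print(f"With interview:    {with_interview:.2f}% (displayed as {round(with_interview)}%)")
```

Under that assumption the rounded values match the 51% and 70% shown on the page. Like the page's own figures, these are descriptive statistics of one examiner's history rather than a prediction for this application.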
