Prosecution Insights
Last updated: April 18, 2026
Application No. 17/382,027

SYNTHESIZING VIDEO FROM AUDIO USING ONE OR MORE NEURAL NETWORKS

Status: Final Rejection (§103)
Filed: Jul 21, 2021
Examiner: RICHER, AARON M
Art Unit: 2617
Tech Center: 2600 (Communications)
Assignee: Nvidia Corporation
OA Round: 6 (Final)
Grant Probability: 51% (Moderate)
Expected OA Rounds: 7-8
Time to Grant: 4y 0m
Grant Probability with Interview: 70%

Examiner Intelligence

Career Allow Rate: 51% (236 granted / 465 resolved; -11.2% vs TC avg)
Interview Lift: +19.5% among resolved cases with an interview (strong; roughly +20%)
Avg Prosecution (typical timeline): 4y 0m; 28 applications currently pending
Total Applications (career history): 493, across all art units
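
As a cross-check, the headline allow rate follows directly from the raw counts in this panel. A minimal sketch, assuming the dashboard simply rounds the ratio (its exact methodology is not stated):

```python
# Recompute the career allow rate from the counts shown above.
# The rounding to a whole percent ("51%") is an assumption.
granted, resolved = 236, 465

career_allow_rate = granted / resolved
print(f"career allow rate: {career_allow_rate:.1%}")  # 50.8%, shown as 51%
```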

Statute-Specific Performance

§101: 9.4% (-30.6% vs TC avg)
§103: 54.7% (+14.7% vs TC avg)
§102: 13.1% (-26.9% vs TC avg)
§112: 19.9% (-20.1% vs TC avg)

Tech Center averages are estimates; figures are based on career data from 465 resolved cases.
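
The "vs TC avg" deltas read as simple percentage-point differences, so the implied Tech Center averages can be recovered by subtraction. A short sketch under that assumption (the dashboard does not publish its exact math):

```python
# Assumed interpretation: delta = examiner_rate - tc_avg, in percentage points.
examiner_rate = {"101": 9.4, "103": 54.7, "102": 13.1, "112": 19.9}
delta_vs_tc  = {"101": -30.6, "103": 14.7, "102": -26.9, "112": -20.1}

for statute, rate in examiner_rate.items():
    tc_avg = rate - delta_vs_tc[statute]  # implied Tech Center average
    print(f"§{statute}: examiner {rate:.1f}% vs implied TC avg {tc_avg:.1f}%")
```

Under this reading, all four statutes imply the same baseline near 40%, which suggests the original chart plotted a single Tech Center-wide average line.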

Office Action (§103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 26 November 2025 have been fully considered but they are not persuasive. Applicant's arguments with respect to the prior art have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 U.S.C. § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

    A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 6, 7, 12, 13, 18, 19, 24, 25, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Song (WO 2021052224 A1, herein represented by U.S. Publication 2021/0357625) in view of Park (U.S. Publication 2021/0357088).

As to claim 1, Song discloses a processor (fig. 8, element 82; p. 10, sections 0174-0178; a processor is configured to execute instructions from a medium, such as a memory, which also stores data for performing a programmed method), comprising: one or more circuits to use one or more neural networks to generate a first plurality of video frames of a person uttering speech based on speech information corresponding to the speech and based on an image corresponding to the user (fig. 5; p. 7, sections 0101-0112; p. 7, section 0120; a first neural network is trained using an audio clip with voice/speech information to generate face key point information corresponding to input video data of a person's face and speech).

Song does not expressly disclose, but Park discloses, that the generation is without corresponding video frames and modifying the generated first plurality of video frames to produce a second plurality of video frames of a representation of a user uttering the speech based on a single image corresponding to the user (fig. 3; figs. 6-7; p. 4, sections 0065-0070; p. 5, sections 0076-0077; p. 6, section 0088; p. 7, sections 0102-0108; p. 8, sections 0115-0116; generation of a character animation based on driving information, including mouth movements and words for a character to speak, is performed; at this stage, the character is synthetic, like the characters in the figures, and not based on input video; later, a user inputs "a facial image," and movements, which would include mouth motions associated with speaking, are mapped such that the input facial image replaces one of the synthetic character images in the animation). The motivation for this is to allow more personal and realistic content so that a user can experience the content more dynamically (p. 1, section 0005). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify Song to generate without corresponding video frames and to modify the generated first plurality of video frames to produce a second plurality of video frames of a representation of a user uttering the speech based on a single image corresponding to the user, in order to allow more personal and realistic content so that a user can experience the content more dynamically, as taught by Park.

As to claim 6, Song discloses wherein the plurality of video frames is representative of an amount of emotion or pattern of speech determined from the speech information (fig. 3; p. 3, sections 0046-0052; p. 4, section 0060; p. 9, sections 0146-0147; the voice/speech information is analyzed to determine a facial expression representing a particular emotional state; the inpainted image from the second neural network uses the expression information, and video is generated from the inpainted image).

As to claim 7, see the rejection to claim 1. As to claim 12, see the rejection to claim 6. As to claim 13, see the rejection to claim 1. As to claim 18, see the rejection to claim 6. As to claim 19, see the rejection to claim 1. As to claim 24, see the rejection to claim 6. As to claim 25, see the rejection to claim 1. As to claim 30, see the rejection to claim 6.

Claims 2-5, 8-11, 14-17, 20-23, and 26-29 are rejected under 35 U.S.C. 103 as being unpatentable over Song in view of Park and further in view of Liao (U.S. Publication 2021/0390748).

As to claim 2, Song does not disclose, but Liao does disclose, wherein the plurality of frames includes a representation of one or more three-dimensional character models uttering a corresponding portion of the speech information (p. 1, section 0006; p. 6, sections 0065-0067; p. 6-7, section 0070; p. 7, section 0072; p. 7, section 0079-p. 8, section 0083; using a neural network, a 3D skeleton model is used to create poses for video frames corresponding to spoken voice/speech information). The motivation for this is to produce output video with synchronized, realistic, and expressive body dynamics at low cost (p. 1, sections 0003-0004). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify Song and Park to have the plurality of frames include a representation of one or more three-dimensional character models uttering a corresponding portion of the speech information, in order to produce output video with synchronized, realistic, and expressive body dynamics at low cost, as taught by Liao.

As to claim 3, Song does not disclose, but Liao does disclose, wherein the plurality of frames includes a representation of the one or more users uttering the corresponding portion of the speech information as represented by the one or more three-dimensional character models in the plurality of video frames (p. 1, sections 0006-0007; p. 5, section 0059; p. 6, sections 0065-0067; p. 6-7, section 0070; p. 7, section 0072; p. 7, section 0079-p. 8, section 0083; the 3D body model, first video information, input video of a person, and speech-to-text information are used to generate video frames of a speaking person). Motivation for the combination is given in the rejection to claim 2.

As to claim 4, Song does not disclose, but Liao does disclose, wherein the one or more neural networks is trained to correlate key points between the one or more three-dimensional character models represented in the plurality of video frames and at least one of shape information or pose information for the user (p. 1, section 0006; p. 5, section 0060; p. 6, section 0062; p. 6, sections 0064-0067; correspondence between points in the 3D model and 2D point positions of a person speaking in input video that correspond to projected pose information is found using the neural network). Motivation for the combination is given in the rejection to claim 2.

As to claim 5, Song does not disclose, but Liao does disclose, wherein the one or more circuits are further to use the one or more neural networks to synthesize the speech information as voice information from text (fig. 3, element 325; p. 4-5, section 0053; p. 5, section 0056; p. 5, section 0060; a neural network is shown that trains speech video synthesis and mapping from text input). Motivation for the combination is given in the rejection to claim 2.

As to claim 8, see the rejection to claim 2. As to claim 9, see the rejection to claim 3. As to claim 10, see the rejection to claim 4. As to claim 11, see the rejection to claim 5. As to claim 14, see the rejection to claim 2. As to claim 15, see the rejection to claim 3. As to claim 16, see the rejection to claim 4. As to claim 17, see the rejection to claim 5. As to claim 20, see the rejection to claim 2. As to claim 21, see the rejection to claim 3. As to claim 22, see the rejection to claim 4. As to claim 23, see the rejection to claim 5. As to claim 26, see the rejection to claim 2. As to claim 27, see the rejection to claim 3. As to claim 28, see the rejection to claim 4. As to claim 29, see the rejection to claim 5.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AARON M RICHER, whose telephone number is (571) 272-7790. The examiner can normally be reached 9AM-5PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, King Poon, can be reached at (571) 272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AARON M RICHER/
Primary Examiner, Art Unit 2617
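
For orientation on the technology being mapped, the rejection reads claim 1 onto a two-stage pipeline: an audio-driven network that predicts face key points (as attributed to Song) and a generator that renders video frames from those key points plus a single reference image (as attributed to Park). The sketch below illustrates only that general shape; it is not the Song, Park, or applicant implementation, and every module name, dimension, and the toy key-point rasterization is an assumption.

```python
# Illustrative two-stage audio-to-video sketch (hypothetical, for orientation).
import torch
import torch.nn as nn

class AudioToKeypoints(nn.Module):
    """Maps a window of audio features to 2D face key points (cf. Song)."""
    def __init__(self, audio_dim=80, num_keypoints=68):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, 256), nn.ReLU(),
            nn.Linear(256, num_keypoints * 2),
        )

    def forward(self, audio_feats):                   # (T, audio_dim)
        out = self.net(audio_feats)                   # (T, num_keypoints * 2)
        return out.view(audio_feats.shape[0], -1, 2)  # (T, K, 2)

class KeypointsToFrames(nn.Module):
    """Renders frames from key points and one reference image (cf. Park)."""
    def __init__(self, img_channels=3):
        super().__init__()
        self.fuse = nn.Conv2d(img_channels + 1, 16, 3, padding=1)
        self.out = nn.Conv2d(16, img_channels, 3, padding=1)

    def forward(self, keypoints, ref_image):          # (T, K, 2), (C, H, W)
        T = keypoints.shape[0]
        C, H, W = ref_image.shape
        # Rasterize key points into per-frame heatmaps (toy version).
        heat = torch.zeros(T, 1, H, W)
        xy = (keypoints.clamp(0, 1) * torch.tensor([W - 1, H - 1])).long()
        for t in range(T):
            heat[t, 0, xy[t, :, 1], xy[t, :, 0]] = 1.0
        # Condition every frame on the single reference image.
        ref = ref_image.expand(T, C, H, W)
        return self.out(torch.relu(self.fuse(torch.cat([ref, heat], dim=1))))

# Usage with dummy data: 25 audio frames -> 25 video frames from one image.
a2k, k2f = AudioToKeypoints(), KeypointsToFrames()
frames = k2f(a2k(torch.randn(25, 80)), torch.rand(3, 64, 64))
print(frames.shape)  # torch.Size([25, 3, 64, 64])
```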

Prosecution Timeline

Jul 21, 2021: Application Filed
Jul 30, 2022: Non-Final Rejection (§103)
Feb 06, 2023: Response Filed
Feb 09, 2023: Final Rejection (§103)
Mar 08, 2023: Interview Requested
Apr 25, 2023: Examiner Interview Summary
Apr 25, 2023: Applicant Interview (Telephonic)
Jul 06, 2023: Request for Continued Examination
Jul 11, 2023: Response after Non-Final Action
Sep 02, 2023: Non-Final Rejection (§103)
Mar 08, 2024: Response Filed
Mar 22, 2024: Final Rejection (§103)
Sep 30, 2024: Notice of Allowance
Dec 18, 2024: Response after Non-Final Action
Jan 05, 2025: Response after Non-Final Action
Mar 13, 2025: Response after Non-Final Action
May 20, 2025: Request for Continued Examination
May 21, 2025: Response after Non-Final Action
Aug 24, 2025: Non-Final Rejection (§103)
Nov 26, 2025: Response Filed
Jan 07, 2026: Final Rejection (§103)
Apr 09, 2026: Request for Continued Examination
Apr 16, 2026: Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586151: Frame Rate Extrapolation (2y 5m to grant; granted Mar 24, 2026)
Patent 12579600: SEAMLESS VIDEO IN HETEROGENEOUS CORE INFORMATION HANDLING SYSTEM (2y 5m to grant; granted Mar 17, 2026)
Patent 12571669: DETECTING AND GENERATING A RENDERING OF FILL LEVEL AND DISTRIBUTION OF MATERIAL IN RECEIVING VEHICLE(S) (2y 5m to grant; granted Mar 10, 2026)
Patent 12555305: Systems And Methods For Generating And/Or Using 3-Dimensional Information With Camera Arrays (2y 5m to grant; granted Feb 17, 2026)
Patent 12548233: 3D TEXTURING VIA A RENDERING LOSS (2y 5m to grant; granted Feb 10, 2026)
Study what changed in these cases to get past this examiner (based on the 5 most recent grants).

Prosecution Projections

Expected OA Rounds: 7-8
Grant Probability: 51%
With Interview: 70% (+19.5%)
Median Time to Grant: 4y 0m
PTA Risk: High
Based on 465 resolved cases by this examiner. Grant probability derived from career allow rate.
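
The "with interview" projection is consistent with simply adding the interview lift to the baseline probability. A one-line sketch, assuming the lift is additive in percentage points (the page does not state its model):

```python
# Assumed additive model: baseline grant probability plus interview lift.
baseline, lift = 0.51, 0.195
print(f"with interview: {baseline + lift:.1%}")  # 70.5%, displayed as 70%
```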
