Last updated: April 19, 2026

Application No. 18/435,873

DIGITAL MODEL GENERATION USING NEURAL NETWORKS

Final Rejection §103

Filed: Feb 07, 2024
Examiner: WELCH, DAVID T
Art Unit: 2613
Tech Center: 2600 (Communications)
Assignee: Nvidia Corporation
OA Round: 2 (Final)

Interview Optional

+27.2% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 303 resolved cases, 2023–2026

Examiner Intelligence

WELCH, DAVID T

Career Allow Rate: 82% (above average)
247 granted / 303 resolved (+19.5% vs TC avg)

Interview Lift: +27.2% (strong; resolved cases with an interview vs. without)

Typical timeline

Avg Prosecution: 3y 2m
29 currently pending

Career history

Total Applications: 332 (across all art units)

Statute-Specific Performance

§101: 11.6% (-28.4% vs TC avg)
§103: 47.4% (+7.4% vs TC avg)
§102: 20.6% (-19.4% vs TC avg)
§112: 12.2% (-27.8% vs TC avg)

Tech Center averages are estimates. Based on career data from 303 resolved cases.

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claims 1, 8, and 15 are objected to because of the following informalities: each of these claims recites predicting body movements based on “the conditional inputs, emotional state of the speaker, and textual input.” Although the Examiner can understand the meaning, scope, and intention of this phrase, it should be amended to clearly recite the antecedent basis for the emotional state and the textual input. For example, this phrase should be amended to read --the conditional inputs, the emotional state of the speaker, and the textual input--. Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-10, 12-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hwang et al. (U.S. Patent Application Publication No. 2024/0203099), referred herein as Hwang, in view of Wang et al. (U.S. Patent Application Publication No. 2023/0267916), referred herein as Wang.
Regarding claim 1, Hwang teaches one or more processors, comprising circuitry to use one or more neural networks (figs 1 and 15-17, processor 180, neural network portions 530, 630, 730, and 820) to:
obtain one or more conditional inputs (figs 15-17, inputs 516/730; paragraphs 280 and 281; paragraphs 291 and 292), audio data of speech (figs 15-17, audio input 518/820; paragraphs 280 and 281; paragraph 295), and a textual input (paragraphs 188 and 190; paragraphs 280 and 281; paragraph 296), at least one of the one or more conditional inputs comprising 3D position data or depth information of one or more features of an object (paragraphs 280 and 281; paragraphs 291 and 292);
determine an expression of a speaker of the speech represented by the audio data (paragraph 220; paragraph 253);
predict based, at least in part, on the conditional inputs, the expression of the speaker, and the textual input, one or more body movements and facial expressions of an object when pronouncing the textual input (paragraphs 188 and 190; paragraphs 222 and 255; paragraphs 280 and 281; paragraph 297); and
generate, based on the one or more predicted body movements and predicted facial expressions, a 3D model of the object that depicts the object performing the one or more body movements while pronouncing the textual input and with facial expressions that correspond to the expression (figs 15-17, model generator 530; paragraphs 220 and 222; paragraphs 253 and 255; paragraphs 284 and 299).
Hwang teaches determining expressions based on the text, audio data of speech, and 3D position data, as shown above. However, Hwang does not explicitly teach determining an emotional state, which is reflected in the 3D model.

However, in a similar field of endeavor, Wang teaches a system comprising neural networks configured to obtain a textual input and audio data of speech, and predict 3D model animation based, at least in part, on the textual input and the audio data of speech (paragraphs 15, 19, and 27; paragraphs 46 and 48; paragraph 70), and further configured to predict an emotional state of a speaker in order to animate the 3D model to pronounce the textual input such that it corresponds to the emotional state (paragraph 45; paragraphs 52 and 53; paragraph 58).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the text input and emotional state determination of Wang with the text, audio data, and 3D position data processing of Hwang because this helps ensure that the emotional context of input text is accurately reflected in the animation, thereby improving the quality of the 3D model generation while retaining the speed and efficiency of its generation (see, for example, Wang, paragraphs 15 and 54).
Regarding claim 2, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the one or more predicted body movements of the object comprises one or more motions of limbs of the object and the one or more predicted facial expression of the object comprises one or more motions of features of a face of the object (Hwang, fig 17; paragraphs 182 and 188; paragraphs 252, 253, and 260; paragraphs 291 and 297).

Regarding claim 3, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the one or more neural networks comprise a second portion to generate the one or more predicted body movements (Hwang, fig 17, neural network portion 730/530; paragraph 284; paragraphs 291 and 292; paragraph 299), and a first portion to generate the one or more predicted facial expressions based, at least in part, on the audio data of speech (Hwang, fig 17, neural network portion 630/530; paragraph 284; paragraphs 289 and 290; paragraph 299).
Regarding claim 5, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the one or more predicted body movements and facial expressions of the object indicate body language, conveying the emotional state of the speaker of the speech represented by the audio data, through motion of limbs and the facial expressions of the object (Hwang, paragraphs 220 and 253; paragraphs 284 and 299; Wang, paragraph 45; paragraphs 52 and 53; paragraph 58; the motivation to combine is similar to that discussed above in the rejection of claim 1).
Regarding claim 6, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the object is an avatar of one or more portions of a human (Hwang, fig 17; paragraphs 284, 297 and 299; paragraphs 324-326).
Regarding claim 7, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the audio data comprises one or more utterances of speech and the one or more conditional inputs correspond to the one or more utterances of speech (Hwang, paragraphs 280 and 281; paragraphs 291 and 292; paragraphs 295 and 297).
Regarding claims 8-10 and 12-14, the limitations of these claims substantially correspond to the limitations of claims 1-3 and 5-7, respectively; thus they are rejected on similar grounds as their corresponding claims.
Regarding claims 15-17, 19, and 20, the limitations of these claims substantially correspond to the limitations of claims 1-3, 5, and 6, respectively; thus they are rejected on similar grounds as their corresponding claims.

Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hwang, in view of Wang, and further in view of Li et al. (U.S. Patent Application Publication No. 2024/0153184), referred herein as Li.
Regarding claim 4, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the one or more conditional inputs are generated by one or more neural networks and indicate the 3D position data or depth information of the object corresponding to the one or more predicted body movements and facial expressions of the object (Hwang, paragraphs 253 and 255; paragraphs 280 and 281; paragraphs 289 and 291).
Hwang in view of Wang does not teach an input comprising a heatmap.
However, in a similar field of endeavor, Li teaches a system comprising circuits to use one or more neural networks to generate motions of an avatar object based on input motion and audio (fig 5; paragraph 56, lines 1-4; paragraph 58, lines 1-22; paragraph 64, the last 5 lines), wherein heatmaps are input that indicate a position of the object (paragraph 61, lines 1-5 and the last 8 lines; paragraph 62, lines 1-6 and the last 8 lines).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the heatmap body position indication of Li with the body position processing of Hwang in view of Wang because this provides highly accurate body position identification that is efficient enough to perform in real-time, while still reducing processing resource requirements (see, for example, Li, paragraph 21, lines 1-10 and the last 5 lines; paragraph 62, the last 2 lines).
Regarding claims 11 and 18, the limitations of each of these claims substantially correspond to the limitations of claim 4; thus they are rejected on similar grounds.

Response to Arguments
Applicant’s arguments with respect to the prior art rejections have been fully considered, but are moot in view of the new grounds of rejection presented above.

Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Choe (U.S. Patent Application Publication No. 2016/0378965); Electronic apparatus and method for controlling functions in the electronic apparatus using a bio-metric sensor.
Harazi (U.S. Patent Application Publication No. 2023/0252972); Emotion-based text to speech.
Tiwari (U.S. Patent Application Publication No. 2024/0070399); Multi-level emotional enhancement of dialogue.
Nandwana (U.S. Patent Application Publication No. 2024/0087596); Artificial latency for moderating voice communication.
Harpale (U.S. Patent Application Publication No. 2025/0029305); Rendering XR avatars based on acoustical features.
Groves (U.S. Patent Application Publication No. 2024/0038205); Systems, apparatuses, and/or methods for real-time adaptive music generation.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
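The reply-period rule in the preceding paragraph can be sketched as date arithmetic. This is an illustration only, not a docketing tool: the helper names `add_months` and `shortened_period_end` are hypothetical, the mailing date of Jan 22, 2026 is assumed from the "Final Rejection" entry in the prosecution timeline, and the sketch ignores weekend and federal-holiday rollover as well as extension fees.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to month end."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def shortened_period_end(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """End of the shortened statutory period as described in the action.

    Default: three months from mailing. If a first reply was filed within
    two months and the advisory action is mailed after the three-month
    date, the period ends on the advisory mailing date, but never later
    than six months from mailing.
    """
    three = add_months(mailing, 3)
    six = add_months(mailing, 6)
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailing, 2)
        and advisory_mailed > three
    ):
        return min(advisory_mailed, six)
    return three


# Assumed mailing date, taken from the timeline's Final Rejection entry.
mailing = date(2026, 1, 22)
print(shortened_period_end(mailing))   # three-month date
print(add_months(mailing, 6))          # six-month statutory limit
```

Passing `first_reply` and `advisory_mailed` exercises the advisory-action branch; any real deadline should be confirmed against the actual mailing date on the action and the applicable rules.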
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID T WELCH whose telephone number is (571)270-5364. The examiner can normally be reached Monday-Thursday, 8:30-5:30 EST, and alternate Fridays, 9:00-2:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DAVID T. WELCH
Primary Examiner
Art Unit 2613





Prosecution Timeline

Feb 07, 2024: Application Filed
Aug 20, 2025: Non-Final Rejection (§103)
Oct 22, 2025: Interview Requested
Nov 03, 2025: Examiner Interview Summary
Nov 03, 2025: Applicant Interview (Telephonic)
Jan 06, 2026: Response Filed
Jan 22, 2026: Final Rejection (§103)
Feb 03, 2026: Interview Requested
Feb 25, 2026: Examiner Interview Summary
Feb 25, 2026: Examiner Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

18/413,383 (Patent 12602742): IMAGE PROCESSING APPARATUS, BINARIZATION METHOD, AND NON-TRANSITORY RECORDING MEDIUM. 2y 5m to grant; granted Apr 14, 2026.

18/529,550 (Patent 12602842): TEXTURE GENERATION USING MULTIMODAL EMBEDDINGS. 2y 5m to grant; granted Apr 14, 2026.

18/737,274 (Patent 12592048): System and Method for Creating Anchors in Augmented or Mixed Reality. 2y 5m to grant; granted Mar 31, 2026.

18/641,421 (Patent 12579734): METHOD FOR RENDERING VIEWPOINTS AND ELECTRONIC DEVICE. 2y 5m to grant; granted Mar 17, 2026.

17/779,661 (Patent 12573119): APPARATUS AND METHOD FOR GENERATING SPEECH SYNTHESIS IMAGE. 2y 5m to grant; granted Mar 10, 2026.

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview (+27.2%): 99%
Median Time to Grant: 3y 2m
PTA Risk: Moderate

Based on 303 resolved cases by this examiner. Grant probability derived from career allow rate.
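The arithmetic behind these figures can be checked with a short sketch. The granted and resolved counts come from the examiner statistics above. The function names are hypothetical, and the 99% cap is an assumption: the page does not state how the "With Interview" figure is computed, but adding the 27.2-point lift to 82% would exceed 100%, so some cap appears to be applied.

```python
def allow_rate_pct(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


def with_interview_pct(base_pct: float, lift_points: float, cap: float = 99.0) -> float:
    """Additive lift in percentage points, capped (the cap is an assumption)."""
    return min(base_pct + lift_points, cap)


rate = allow_rate_pct(247, 303)          # counts reported on this page
displayed = round(rate)                  # the page shows a whole-number percentage
naive = displayed + 27.2                 # uncapped sum of rate and lift
capped = with_interview_pct(displayed, 27.2)

print(f"allow rate: {rate:.1f}% (displayed as {displayed}%)")
print(f"uncapped with interview: {naive:.1f}%")
print(f"capped with interview: {capped:.0f}%")
```

Because the projection is derived from a career-wide rate, it says nothing specific about the merits of this application's claims.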