Prosecution Insights
Last updated: April 19, 2026
Application No. 18/435,873

DIGITAL MODEL GENERATION USING NEURAL NETWORKS

Status: Final Rejection (§103)
Filed: Feb 07, 2024
Examiner: WELCH, DAVID T
Art Unit: 2613
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 2 (Final)

Grant Probability: 82% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 2m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82% (above average; +19.5% vs TC avg), 247 granted of 303 resolved
Interview Lift: +27.2% (strong); allowance-rate gap for resolved cases with vs. without an interview
Typical Timeline: 3y 2m average prosecution; 29 applications currently pending
Career History: 332 total applications across all art units
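These cards are simple ratios over the examiner's 303 resolved cases. A minimal Python sketch of the arithmetic follows; the with/without-interview split is hypothetical, since the dashboard publishes only the aggregate lift:

    # Career allow rate, from the counts shown above.
    granted, resolved = 247, 303
    allow_rate = granted / resolved                  # ~0.815, shown as 82%

    # Interview lift: allowance-rate gap between interviewed and
    # non-interviewed cases. The split below is invented for illustration;
    # only the totals above come from the card.
    granted_w, resolved_w = 99, 100                  # hypothetical split
    granted_wo, resolved_wo = granted - granted_w, resolved - resolved_w
    lift = granted_w / resolved_w - granted_wo / resolved_wo
    print(f"allow rate {allow_rate:.0%}, interview lift {lift:+.1%}")
    # -> allow rate 82%, interview lift +26.1% (the card reports +27.2%)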

Statute-Specific Performance

§101: 11.6% (-28.4% vs TC avg)
§103: 47.4% (+7.4% vs TC avg)
§102: 20.6% (-19.4% vs TC avg)
§112: 12.2% (-27.8% vs TC avg)

Tech Center averages are estimates. Based on career data from 303 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 1, 8, and 15 are objected to because of the following informalities: each of these claims recites predicting body movements based on "the conditional inputs, emotional state of the speaker, and textual input." Although the Examiner can understand the meaning, scope, and intention of this phrase, it should be amended to clearly recite the antecedent basis for the emotional state and the textual input. For example, this phrase should be amended to read --the conditional inputs, the emotional state of the speaker, and the textual input--. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-10, 12-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hwang et al. (U.S. Patent Application Publication No. 2024/0203099), referred to herein as Hwang, in view of Wang et al. (U.S. Patent Application Publication No. 2023/0267916), referred to herein as Wang.

Regarding claim 1, Hwang teaches one or more processors, comprising circuitry to use one or more neural networks (figs 1 and 15-17, processor 180, neural network portions 530, 630, 730, and 820) to: obtain one or more conditional inputs (figs 15-17, inputs 516/730; paragraphs 280 and 281; paragraphs 291 and 292), audio data of speech (figs 15-17, audio input 518/820; paragraphs 280 and 281; paragraph 295), and a textual input (paragraphs 188 and 190; paragraphs 280 and 281; paragraph 296), at least one of the one or more conditional inputs comprising 3D position data or depth information of one or more features of an object (paragraphs 280 and 281; paragraphs 291 and 292); determine an expression of a speaker of the speech represented by the audio data (paragraph 220; paragraph 253); predict based, at least in part, on the conditional inputs, the expression of the speaker, and the textual input, one or more body movements and facial expressions of an object when pronouncing the textual input (paragraphs 188 and 190; paragraphs 222 and 255; paragraphs 280 and 281; paragraph 297); and generate, based on the one or more predicted body movements and predicted facial expressions, a 3D model of the object that depicts the object performing the one or more body movements while pronouncing the textual input and with facial expressions that correspond to the expression (figs 15-17, model generator 530; paragraphs 220 and 222; paragraphs 253 and 255; paragraphs 284 and 299).

Hwang teaches determining expressions based on the text, audio data of speech, and 3D position data, as shown above. However, Hwang does not explicitly teach determining an emotional state that is reflected in the 3D model.

In a similar field of endeavor, Wang teaches a system comprising neural networks configured to obtain a textual input and audio data of speech, and predict 3D model animation based, at least in part, on the textual input and the audio data of speech (paragraphs 15, 19, and 27; paragraphs 46 and 48; paragraph 70), and further configured to predict an emotional state of a speaker in order to animate the 3D model to pronounce the textual input such that it corresponds to the emotional state (paragraph 45; paragraphs 52 and 53; paragraph 58). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the text input and emotional state determination of Wang with the text, audio data, and 3D position data processing of Hwang, because this helps ensure that the emotional context of input text is accurately reflected in the animation, thereby improving the quality of the 3D model generation while retaining the speed and efficiency of its generation (see, for example, Wang, paragraphs 15 and 54).
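The claim 1 mapping, together with the two-portion split discussed at claim 3 below, describes a two-branch architecture: audio drives an inferred speaker expression and the facial-expression branch, while conditional inputs (3D position or depth data), text features, and the inferred emotional state drive the body-movement branch, with both outputs feeding a 3D model generator. A minimal PyTorch sketch of that shape follows; every module name and dimension is invented for illustration, and this is not code from the application or any cited reference:

    import torch
    import torch.nn as nn

    class AnimationPredictor(nn.Module):
        # Illustrative two-branch pipeline in the shape of claims 1 and 3.
        def __init__(self, d_audio=80, d_cond=63, d_text=256,
                     d_hidden=128, n_emotions=8, d_face=52, d_body=72):
            super().__init__()
            # Infer the speaker's expression/emotional state from audio features.
            self.emotion_head = nn.Sequential(
                nn.Linear(d_audio, d_hidden), nn.ReLU(),
                nn.Linear(d_hidden, n_emotions))
            # "First portion": facial expressions from audio + emotional state.
            self.face_branch = nn.Sequential(
                nn.Linear(d_audio + n_emotions, d_hidden), nn.ReLU(),
                nn.Linear(d_hidden, d_face))          # e.g. blendshape weights
            # "Second portion": body movements from conditional inputs
            # (3D positions / depth), text features, and emotional state.
            self.body_branch = nn.Sequential(
                nn.Linear(d_cond + d_text + n_emotions, d_hidden), nn.ReLU(),
                nn.Linear(d_hidden, d_body))          # e.g. joint rotations

        def forward(self, audio, cond, text):
            emotion = self.emotion_head(audio).softmax(dim=-1)
            face = self.face_branch(torch.cat([audio, emotion], dim=-1))
            body = self.body_branch(torch.cat([cond, text, emotion], dim=-1))
            # A downstream generator would pose the 3D model from (face, body).
            return face, body, emotion

For shape checking, AnimationPredictor()(torch.randn(2, 80), torch.randn(2, 63), torch.randn(2, 256)) returns facial weights (2, 52), a body pose (2, 72), and the emotion distribution (2, 8).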
Regarding claim 2, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the one or more predicted body movements of the object comprise one or more motions of limbs of the object and the one or more predicted facial expressions of the object comprise one or more motions of features of a face of the object (Hwang, fig 17; paragraphs 182 and 188; paragraphs 252, 253, and 260; paragraphs 291 and 297).

Regarding claim 3, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the one or more neural networks comprise a second portion to generate the one or more predicted body movements (Hwang, fig 17, neural network portion 730/530; paragraph 284; paragraphs 291 and 292; paragraph 299), and a first portion to generate the one or more predicted facial expressions based, at least in part, on the audio data of speech (Hwang, fig 17, neural network portion 630/530; paragraph 284; paragraphs 289 and 290; paragraph 299).

Regarding claim 5, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the one or more predicted body movements and facial expressions of the object indicate body language, conveying the emotional state of the speaker of the speech represented by the audio data, through motion of limbs and the facial expressions of the object (Hwang, paragraphs 220 and 253; paragraphs 284 and 299; Wang, paragraph 45; paragraphs 52 and 53; paragraph 58; the motivation to combine is similar to that discussed above in the rejection of claim 1).

Regarding claim 6, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the object is an avatar of one or more portions of a human (Hwang, fig 17; paragraphs 284, 297, and 299; paragraphs 324-326).

Regarding claim 7, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the audio data comprises one or more utterances of speech and the one or more conditional inputs correspond to the one or more utterances of speech (Hwang, paragraphs 280 and 281; paragraphs 291 and 292; paragraphs 295 and 297).

Regarding claims 8-10 and 12-14, the limitations of these claims substantially correspond to the limitations of claims 1-3 and 5-7, respectively; thus, they are rejected on similar grounds as their corresponding claims.
Regarding claims 15-17, 19, and 20, the limitations of these claims substantially correspond to the limitations of claims 1-3, 5, and 6, respectively; thus, they are rejected on similar grounds as their corresponding claims.

Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Hwang, in view of Wang, and further in view of Li et al. (U.S. Patent Application Publication No. 2024/0153184), referred to herein as Li.

Regarding claim 4, Hwang in view of Wang teaches the one or more processors of claim 1, wherein the one or more conditional inputs are generated by one or more neural networks and indicate the 3D position data or depth information of the object corresponding to the one or more predicted body movements and facial expressions of the object (Hwang, paragraphs 253 and 255; paragraphs 280 and 281; paragraphs 289 and 291). Hwang in view of Wang does not teach an input comprising a heatmap. However, in a similar field of endeavor, Li teaches a system comprising circuits to use one or more neural networks to generate motions of an avatar object based on input motion and audio (fig 5; paragraph 56, lines 1-4; paragraph 58, lines 1-22; paragraph 64, the last 5 lines), wherein heatmaps are input to indicate a position of the object (paragraph 61, lines 1-5 and the last 8 lines; paragraph 62, lines 1-6 and the last 8 lines). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the heatmap body position indication of Li with the body position processing of Hwang in view of Wang, because this provides highly accurate body position identification that is efficient enough to perform in real time, while still reducing processing resource requirements (see, for example, Li, paragraph 21, lines 1-10 and the last 5 lines; paragraph 62, the last 2 lines).

Regarding claims 11 and 18, the limitations of each of these claims substantially correspond to the limitations of claim 4; thus, they are rejected on similar grounds.
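Li's heatmap inputs reflect standard pose-estimation practice: the network emits one 2D confidence map per body joint, and a joint's image position is recovered at the map's peak. Below is a minimal sketch of that decoding step, assuming the conventional argmax formulation rather than Li's actual implementation:

    import torch

    def keypoints_from_heatmaps(heatmaps: torch.Tensor) -> torch.Tensor:
        """Recover per-joint (x, y) pixel positions from confidence maps.

        heatmaps: (J, H, W), one map per body joint.
        Returns: (J, 2) integer coordinates of each map's peak.
        """
        num_joints, height, width = heatmaps.shape
        flat_idx = heatmaps.view(num_joints, -1).argmax(dim=-1)
        return torch.stack([flat_idx % width, flat_idx // width], dim=-1)

With a soft-argmax (a probability-weighted average of coordinates) the same decoding becomes differentiable, which matters when the recovered positions feed a downstream network as conditional inputs.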
Response to Arguments

Applicant's arguments with respect to the prior art rejections have been fully considered, but are moot in view of the new grounds of rejection presented above.

Conclusion

The following prior art, made of record and not relied upon, is considered pertinent to applicant's disclosure:

Choe (U.S. Patent Application Publication No. 2016/0378965): Electronic apparatus and method for controlling functions in the electronic apparatus using a bio-metric sensor.
Harazi (U.S. Patent Application Publication No. 2023/0252972): Emotion-based text to speech.
Tiwari (U.S. Patent Application Publication No. 2024/0070399): Multi-level emotional enhancement of dialogue.
Nandwana (U.S. Patent Application Publication No. 2024/0087596): Artificial latency for moderating voice communication.
Harpale (U.S. Patent Application Publication No. 2025/0029305): Rendering XR avatars based on acoustical features.
Groves (U.S. Patent Application Publication No. 2024/0038205): Systems, apparatuses, and/or methods for real-time adaptive music generation.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.

In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID T WELCH, whose telephone number is (571) 270-5364. The examiner can normally be reached Monday-Thursday, 8:30-5:30 EST, and alternate Fridays, 9:00-2:30 EST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Xiao Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DAVID T. WELCH
Primary Examiner
Art Unit 2613

/DAVID T WELCH/
Primary Examiner, Art Unit 2613

Prosecution Timeline

Feb 07, 2024: Application Filed
Aug 20, 2025: Non-Final Rejection (§103)
Oct 22, 2025: Interview Requested
Nov 03, 2025: Examiner Interview Summary
Nov 03, 2025: Applicant Interview (Telephonic)
Jan 06, 2026: Response Filed
Jan 22, 2026: Final Rejection (§103)
Feb 03, 2026: Interview Requested
Feb 25, 2026: Examiner Interview Summary
Feb 25, 2026: Examiner Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602742: IMAGE PROCESSING APPARATUS, BINARIZATION METHOD, AND NON-TRANSITORY RECORDING MEDIUM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602842: TEXTURE GENERATION USING MULTIMODAL EMBEDDINGS (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592048: System and Method for Creating Anchors in Augmented or Mixed Reality (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579734: METHOD FOR RENDERING VIEWPOINTS AND ELECTRONIC DEVICE (granted Mar 17, 2026; 2y 5m to grant)
Patent 12573119: APPARATUS AND METHOD FOR GENERATING SPEECH SYNTHESIS IMAGE (granted Mar 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 99% (+27.2%)
Median Time to Grant: 3y 2m
PTA Risk: Moderate
Based on 303 resolved cases by this examiner. Grant probability derived from career allow rate.
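Per the footnote, the projected figures follow from the career numbers above. A sketch of one plausible derivation; the 99% ceiling is an assumption about how such a dashboard might bound the interview-adjusted probability, not a documented formula:

    base = 247 / 303                          # career allow rate, ~82%
    lift = 0.272                              # observed interview lift
    with_interview = min(base + lift, 0.99)   # capped; displayed as 99%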
