Prosecution Insights
Last updated: April 19, 2026
Application No. 18/675,365

METHOD, DEVICE, AND PROGRAM PRODUCT FOR GENERATING AVATAR ANIMATION

Non-Final OA §103
Filed: May 28, 2024
Examiner: PUNTIER, CHRIS ALEJANDRO
Art Unit: 2616
Tech Center: 2600 — Communications
Assignee: DELL PRODUCTS, L.P.
OA Round: 1 (Non-Final)

Grant Probability: 94% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 94% (29 granted / 31 resolved), +31.5% vs TC avg (above average)
Interview Lift: +10.0% (moderate), measured on resolved cases with interview
Typical Timeline: 2y 6m average prosecution; 12 applications currently pending
Career History: 43 total applications across all art units

Statute-Specific Performance

§101: 6.6% (-33.4% vs TC avg)
§102: 15.4% (-24.6% vs TC avg)
§103: 70.9% (+30.9% vs TC avg)
§112: 6.6% (-33.4% vs TC avg)
Percentages are compared against an estimated Tech Center average; based on career data from 31 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 5/28/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Allowable Subject Matter

Claims 3-8 and 13-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103

Claims 1, 11, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Abdelaziz (US-20210248804-A1) in view of Sagar (US-20220108510-A1) and Tong (US-20170039750-A1).

Regarding claim 1, Abdelaziz discloses a method for generating an avatar animation, comprising: generating an animation instruction vector for the avatar animation based on input text (para. [0274-0276]: "In some examples, neural network 810 determines a vector representation of text 802 and a vector representation of the determined emotional state… In some examples, neural network 810 determines speech data set 812 and animation parameters 814 based on the first intermediate vector representing text 802 and the determined emotional state. Thus, the outputs of neural network 810, including speech data set 812 and animation parameters 814 are based on both text 802 and the determined emotional state, and thus are based on both text 802 and optionally indication of emotional state 804." Abdelaziz explicitly generates a vector representation of text and uses that as the basis to produce animation control outputs, aligning with the claim element.)

However, Abdelaziz alone does not fully disclose determining an animation sequence of the avatar animation based on the animation instruction vector, the animation sequence indicating multiple frames of the avatar animation and transitions between the multiple frames; determining a facial blended shape of the avatar animation based on the animation instruction vector, the facial blended shape indicating a facial expression of the avatar animation; and generating an avatar animation corresponding to the input text based on the animation sequence and the facial blended shape.

The combination of Abdelaziz and Sagar does disclose determining an animation sequence of the avatar animation based on the animation instruction vector, the animation sequence indicating multiple frames of the avatar animation and transitions between the multiple frames (Abdelaziz discloses in para. [0299]: "As illustrated by FIG. 8, system 800 includes avatar animator 820. In some examples, avatar animator 820 receives animation parameters 814 from neural network 810. Once avatar animator 820 receives animation parameters avatar animator 820 generates avatar data 822 for animating an avatar (e.g., avatar 906, 914, 920) using animation parameters 814." Sagar discloses in para. [0066]: "In one embodiment, real-time generation of speech animation uses model visemes to predict the animation sequences at onsets of visemes and a look-up table based (data-driven) algorithm to predict the dynamics at transitions of visemes." Abdelaziz supplies a system that produces animation parameters and then generates avatar data for animating an avatar using those parameters. Sagar supplies the missing explicit structure of an "animation sequence" with "transitions." Combining them, the sequence/transition concept of Sagar is the form of the generated avatar animation data of Abdelaziz, driven by the instruction vector parameters.); and generating an avatar animation corresponding to the input text based on the animation sequence and the facial blended shape (Abdelaziz discloses in para. [0302]: "As illustrated by FIG. 8, system 800 also includes output processor 830. In some examples, output processor 830 processes avatar data 822 to produce an avatar (e.g., avatar 906, 914, 920) for display. In some examples, output processor 830 receives speech data set 812 generated by neural network 810 and avatar data 822 generated by avatar animator 820. In some examples, output processor 830 processes speech data set 812 to generate speech output 832. In some examples, output processor 830 includes a neural network (e.g., a WaveRNN) trained to generate speech output 832 based on speech data set 812." Sagar discloses in para. [0099]: "Expression animations may be presented as a collection of time-varying FACS AU weights which are added to speech (lip synchronization) animation." Abdelaziz teaches generating avatar data for animating an avatar and processing that avatar data to produce an avatar for display, including facial features. Sagar provides the missing structure by teaching that expression is represented as time-varying weights that are added to the lip-sync animation.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Sagar into the teachings of Abdelaziz in order to operationalize smooth animations from parameters and achieve a predictable way to produce expressive avatar animations.

The combination of Sagar and Abdelaziz still does not fully disclose determining a facial blended shape of the avatar animation based on the animation instruction vector, the facial blended shape indicating a facial expression of the avatar animation. The combination of Abdelaziz and Tong does disclose determining a facial blended shape of the avatar animation based on the animation instruction vector, the facial blended shape indicating a facial expression of the avatar animation (Tong discloses in para. [0011]: "The facial expression and speech tracker may further include an animation message generation function to select a plurality of blend shapes, including assignment of weights of the blend shapes, for animating the avatar, based on tracked facial expressions or speech of the user." Further, in para. [0022]: "The blend shapes may be decided or selected for various facial expression and speech tracker 102 capabilities and target mobile device system requirements. During operation, facial expression and speech tracker 102 may select various blend shapes, and assign the blend shape weights, based on the facial expression and/or speech determined. The selected blend shapes and their assigned weights may be output as part of animation messages 120." Tong expressly discloses selecting blend shapes and assigning weights to them for animating an avatar's face to produce the target facial expression. Although there is no explicit disclosure of an "animation instruction vector," Tong ties blend shape weights to the tracked facial expression, which can be used in the system taught by Abdelaziz.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Tong into the teachings of Abdelaziz in order to allow the system to yield more predictable movements and greater efficiency and scalability.
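Editor's note, for orientation only: the blend-shape mechanism the Tong citation describes is, at its core, a weighted combination of per-vertex offsets added to a neutral face mesh. The minimal Python sketch below illustrates that idea; it is not code from Abdelaziz, Sagar, or Tong, and every name, dimension, and value in it is hypothetical.

```python
import numpy as np

# Hypothetical illustration of weighted blend shapes (not code from any cited reference).
# A face mesh is a set of 3D vertex positions; each blend shape stores per-vertex
# offsets from the neutral mesh for one expression component (e.g., "smile", "brow_up").

NUM_VERTICES = 4  # tiny toy mesh, for illustration only
neutral = np.zeros((NUM_VERTICES, 3))

blend_shapes = {
    "smile":    np.random.default_rng(0).normal(scale=0.01, size=(NUM_VERTICES, 3)),
    "jaw_open": np.random.default_rng(1).normal(scale=0.01, size=(NUM_VERTICES, 3)),
    "brow_up":  np.random.default_rng(2).normal(scale=0.01, size=(NUM_VERTICES, 3)),
}

def apply_blend_shapes(neutral_mesh, shapes, weights):
    """Return the deformed mesh: neutral + sum_i w_i * offset_i, weights clamped to [0, 1]."""
    mesh = neutral_mesh.copy()
    for name, w in weights.items():
        mesh += np.clip(w, 0.0, 1.0) * shapes[name]
    return mesh

# Example weights selecting a happy, talking face.
weights = {"smile": 0.8, "jaw_open": 0.3, "brow_up": 0.1}
deformed = apply_blend_shapes(neutral, blend_shapes, weights)
print(deformed.shape)  # (4, 3)
```

In a text-driven pipeline of the kind discussed above, the weights would be produced from the animation instruction vector rather than hard-coded as they are in this sketch.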
Regarding claim 11, claim 11 is similar in scope to claim 1. However, claim 11 recites an electronic device, comprising: at least one processor; and a memory coupled to the at least one processor and having instructions stored therein, the instructions, when executed by the at least one processor, causing the electronic device to perform actions comprising. Abdelaziz also discloses this in para. [0047]: "In some examples, a non-transitory computer-readable storage medium of memory 202 is used to store instructions (e.g., for performing aspects of processes described below) for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In other examples, the instructions (e.g., for performing aspects of the processes described below) are stored on a non-transitory computer-readable storage medium (not shown) of the server system 108 or are divided between the non-transitory computer-readable storage medium of memory 202 and the non-transitory computer-readable storage medium of server system 108." The rest of claim 11 is rejected under the same rationale as claim 1.

Regarding claim 20, claim 20 is similar in scope to claims 1 and 11 and is thus rejected under the same rationale.

Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Abdelaziz, Sagar, and Tong as applied to claim 1 above, and further in view of Ferstl (Ferstl, Ylva, and Rachel McDonnell. "Investigating the use of recurrent motion modelling for speech gesture generation." Proceedings of the 18th International Conference on Intelligent Virtual Agents. 2018.)

Regarding claim 2, the combination of Abdelaziz, Sagar, and Tong discloses all the elements of claim 1 as discussed above. However, they do not fully disclose wherein generating an animation instruction vector for the avatar animation based on input text comprises: determining an animation instruction at a second instant based on emotional features and contextual features at a first instant of the input text and the animation instruction at the first instant. The combination of Abdelaziz, Sagar, Tong, and Ferstl does disclose wherein generating an animation instruction vector for the avatar animation based on input text comprises: determining an animation instruction at a second instant based on emotional features and contextual features at a first instant of the input text and the animation instruction at the first instant (Abdelaziz discloses in para. [0270]: "In some examples, indication of emotional state 804 is based on contextual data of the user or the user device. As discussed above, the contextual data includes user-specific data, vocabulary, and/or preferences relevant to the user input." Ferstl then discloses on page 94, top of the second column: "…our pre-training task is predicting the next few frames of a motion sequence, receiving as an input the preceding motion sequence. This requires a modelling of the dynamics of human motion. State-of-the-art work has shown the potential of recurrent neural networks for modelling human motion [15, 20, 25]. Recurrent networks can model sequential data by using recurrent connections between network activations at consecutive timesteps." Abdelaziz discloses the emotional and contextual features and ties them into the instruction vector pipeline, deriving the emotional state from the received text and using contextual data to generate animation outputs. Ferstl discloses explicit recurrence across instants, teaching predicting later frames from earlier frames by taking the preceding motion sequence as an input.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Ferstl into the combination of teachings of Abdelaziz, Sagar, and Tong in order to be able to produce avatar animations at different times based on previous instants.

Claim 12 is similar in scope to claim 2 and is thus rejected under the same rationale.
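Editor's note, for orientation only: the recurrence at issue in claim 2 (the instruction at a second instant depends on the features at a first instant plus the instruction already produced at that first instant) is the standard recurrent update that the Ferstl citation refers to. The sketch below is a hypothetical illustration of one such step, not an implementation from any cited reference; all dimensions and weights are invented.

```python
import numpy as np

# Hypothetical recurrent step (not code from Abdelaziz, Sagar, Tong, or Ferstl):
# the instruction at instant t+1 is a function of the features at instant t and
# the instruction already produced at instant t.

rng = np.random.default_rng(42)
FEAT_DIM, INSTR_DIM = 8, 6            # made-up sizes
W_feat  = rng.normal(scale=0.1, size=(INSTR_DIM, FEAT_DIM))
W_instr = rng.normal(scale=0.1, size=(INSTR_DIM, INSTR_DIM))
bias    = np.zeros(INSTR_DIM)

def next_instruction(features_t, instruction_t):
    """instruction_{t+1} = tanh(W_feat @ features_t + W_instr @ instruction_t + b)."""
    return np.tanh(W_feat @ features_t + W_instr @ instruction_t + bias)

# Unroll over a short sequence of emotional/contextual feature vectors.
features_seq = rng.normal(size=(5, FEAT_DIM))   # one feature vector per instant
instruction = np.zeros(INSTR_DIM)               # initial instruction
for features in features_seq:
    instruction = next_instruction(features, instruction)
print(instruction.shape)  # (6,)
```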
Claims 9, 10, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Abdelaziz, Sagar, and Tong as applied to claim 1 above, and further in view of Liu (Liu, Zhilei, et al. "Conditional Adversarial Synthesis of 3D Facial Action Units." arXiv preprint arXiv:1802.07421 (2018).)

Regarding claim 9, the combination of Abdelaziz, Sagar, and Tong discloses all the elements of claim 1 as discussed above. However, the combination does not fully disclose wherein determining the facial blended shape of the avatar animation based on the animation instruction vector comprises: determining, by a facial expression generative adversarial network, the facial blended shape at a fifth instant based on emotional features at the fifth instant. The combination of Abdelaziz, Sagar, Tong, and Liu does disclose wherein determining the facial blended shape of the avatar animation based on the animation instruction vector comprises: determining, by a facial expression generative adversarial network, the facial blended shape at a fifth instant based on emotional features at the fifth instant (Liu discloses, on page 2, para. 2: "In particular, these are conditional GANs, with the conditioning on target AU labels. In this way, we are able to generate 3D faces with target expressions specified by different desired AU labels, which can then be rendered to generate the corresponding high-resolution facial images." Liu discloses the use of a conditional GAN to generate facial expressions, with the output being a 3DMM expression parameter. However, Liu does not disclose any time-instant wording. Sagar discloses in para. [0099]: "Expression animations may be presented as a collection of time-varying FACS AU weights which are added to speech (lip synchronization) animation." This ties expression to time-indexed states and represents the animation as time-varying weights, a close analogue to the "instant" described in the claim element through its time-indexed sequences.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Liu into the combination of teachings of Abdelaziz, Sagar, and Tong in order to achieve higher-quality facial animations through the use of the GAN.

Regarding claim 10, the combination of Abdelaziz, Sagar, Tong, and Liu discloses all the elements of claim 9 as discussed above. The combination also discloses wherein generating an avatar animation corresponding to the input text based on the animation sequence and the facial blended shape comprises: generating the avatar animation at a sixth instant based on the animation sequence at the sixth instant and the facial blended shape at the sixth instant (Sagar discloses in para. [0099]: "Expression animations may be presented as a collection of time-varying FACS AU weights which are added to speech (lip synchronization) animation." Further, in para. [0098]: "A Speech animation can be composited with Expression animations to form expressive speech animations. shows an animation system. Under a simplified embodiment, an Animation Composer 005 receives two animation inputs, including Speech animation and Expression animation." Sagar teaches time-indexed animation sequences, as a time series of weights, and time-indexed facial expression weights, and generating the final expressive animation by compositing them, aligning with the claim element.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Sagar into the combination of teachings of Abdelaziz, Tong, and Liu in order to have a conventional way to generate an animation at specific time instants based on the animation sequence and current facial expressions with predictable success.

Claim 19 is similar in scope to claim 9 and is thus rejected under the same rationale.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRIS ALEJANDRO PUNTIER whose telephone number is (703) 756-1893. The examiner can normally be reached M-F 7:30-5:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Hajnik, can be reached at 571-272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHRIS ALEJANDRO PUNTIER/
Examiner, Art Unit 2616

/DANIEL F HAJNIK/
Supervisory Patent Examiner, Art Unit 2616
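Editor's note, for orientation only: the claim 10 discussion above leans on Sagar's compositing of time-varying expression weights with a speech (lip-sync) track. A minimal hypothetical sketch of that per-frame compositing step follows; channel names, frame counts, and data are invented, and this is not code from any cited reference.

```python
import numpy as np

# Hypothetical per-frame compositing of a speech (lip-sync) track with an
# expression track, both expressed as time-varying weights over shared channels.

channels = ["jaw_open", "lip_pucker", "smile", "brow_up"]
num_frames = 100

rng = np.random.default_rng(7)
speech_track     = rng.uniform(0.0, 1.0, size=(num_frames, len(channels)))  # lip sync
expression_track = rng.uniform(0.0, 0.5, size=(num_frames, len(channels)))  # emotion

def composite(speech, expression):
    """Add expression weights onto speech weights frame by frame, clamped to [0, 1]."""
    return np.clip(speech + expression, 0.0, 1.0)

combined = composite(speech_track, expression_track)
# Each row of `combined` is one frame's blend-shape/AU weights for the avatar.
print(combined.shape)  # (100, 4)
```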

Prosecution Timeline

May 28, 2024
Application Filed
Feb 06, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586298: CONTROLLED ILLUMINATION FOR IMPROVED 3D MODEL RECONSTRUCTION
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12586291: Fast Large-Scale Radiance Field Reconstruction
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12573103: ENVIRONMENT MAP UPSCALING FOR DIGITAL IMAGE GENERATION
Granted Mar 10, 2026 (2y 5m to grant)

Patent 12548226: SYSTEMS AND METHODS FOR A THREE-DIMENSIONAL DIGITAL PET REPRESENTATION PLATFORM
Granted Feb 10, 2026 (2y 5m to grant)

Patent 12536679: APPLICATION MATCHING METHOD AND APPLICATION MATCHING DEVICE
Granted Jan 27, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 94%
With Interview (+10.0%): 99%
Median Time to Grant: 2y 6m
PTA Risk: Low
Based on 31 resolved cases by this examiner. Grant probability derived from career allow rate.
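For transparency, these projection figures follow directly from the examiner statistics above: 29 allowances out of 31 resolved cases is roughly 94%, and the interview lift adds about 10 percentage points. The short sketch below reproduces that arithmetic; the assumption that the with-interview figure is capped at 99% is inferred from the displayed value, not stated in the report.

```python
# Reproducing the headline projection figures from the stated inputs.
granted, resolved = 29, 31
allow_rate = granted / resolved                  # ≈ 0.935, shown as 94%
interview_lift = 0.10                            # +10.0 percentage points
# Assumption: the "with interview" figure is capped at 99% rather than 100%.
with_interview = min(allow_rate + interview_lift, 0.99)

print(f"Career allow rate: {allow_rate:.0%}")       # 94%
print(f"With interview:    {with_interview:.0%}")   # 99%
```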
