Prosecution Insights
Last updated: April 19, 2026
Application No. 18/329,831

FACIAL ANIMATION USING EMOTIONS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

Status: Final Rejection (§103)
Filed: Jun 06, 2023
Examiner: HSU, JONI
Art Unit: 2611
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 4 (Final)

Grant Probability: 87% (Favorable)
Expected OA Rounds: 5-6
Median Time to Grant: 2y 9m
With Interview: 95%

Examiner Intelligence

Career Allow Rate: 87% (above average), 741 granted / 848 resolved, +25.4% vs TC avg
Interview Lift: +7.2% (moderate), measured across resolved cases with an interview
Avg Prosecution: 2y 9m typical timeline, with 34 applications currently pending
Total Applications: 882, across all art units

Statute-Specific Performance

§101: 8.4% (-31.6% vs TC avg)
§102: 11.4% (-28.6% vs TC avg)
§103: 59.7% (+19.7% vs TC avg)
§112: 3.1% (-36.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 848 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments, see pp. 13-15, filed February 20, 2026, with respect to the 35 U.S.C. 112 rejections have been fully considered and are persuasive. The 35 U.S.C. 112 rejections of Claims 1-14 and 16-20 have been withdrawn. Applicant's arguments with respect to claims 1-14 and 16-21 have been considered but are moot because new grounds of rejection are made in view of McDuff (US 20200279553A1).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 9, 13, 17, 18, 20, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Beith (US 20240078732A1) in view of McDuff (US 20200279553A1).
As per Claim 1, Beith teaches a method comprising:

obtaining audio data (204) representative of one or more words (microphones are configured to generate audio data 204; microphone configured to capture speech of the user, [0074]) to be output by an animated character;

generating, using one or more first neural networks (222) and based at least on the audio data, a first output indicating an emotional state that is associated with displacing first points of an upper portion of a facial representation of the animated character when outputting the one or more words;

determining, using one or more second neural networks and based at least on the audio data and the first output, a second output associated with a facial animation for the animated character, wherein a displacement in the first points of the upper portion of the facial representation represented by the facial animation is caused by the emotional state while a displacement in second points of a lower portion of the facial representation is caused by the audio data; and

causing, based on the second output, the animated character to perform the facial animation while outputting the one or more words (audio unit 222 includes a deep learning neural network, [0080]; semantical context is based on an emotion 270 associated with the speech 258 represented in the audio data 204; processors 116 are configured to process the audio data 204 to predict the emotion 270; in addition to detecting emotion associated with the meanings of words of the user's speech 258, the audio unit 222 can include machine learning models that are configured to detect audible emotions based on the speaking characteristics of the user; feature data generator 120 may be configured to associate particular facial expressions with various audible emotions; adjusted face data 134 causes the avatar facial expression 156 to represent the emotion 270 (e.g., smiling to express happiness, eyes narrowed to express anger, eyes widened to express surprise, etc.), [0083]; feature data generator 120 includes an image unit 226 that is configured to generate a facial representation 228 based on the image data 208 and that may indicate the semantical context 122; by processing the image data 208 using neural networks of the image unit 226, the resulting adjusted face data 134 can provide a more accurate and realistic facial expression of the avatar 154; facial representation 228 includes an indication of expressions, movements of the user, [0086]; context-based future speech prediction network 1210 processes the audio data 204 to determine a predicted word in context 1220; context-based future speech prediction network 1210 includes a neural network, [0135]; representation generator 1230 is configured to generate a representation 1250 of the predicted word in context 1220; the representation 1250 may be concatenated to the image-based features 322 to generate the feature data 124, [0136]; the context-based future speech prediction network 1210 and the representation generator 1230 enable prediction, based on a context of spoken words, of what a word will be, which is used to predict an avatar's facial image or to ensure compliance frame-to-frame, to ensure that the image of the avatar pronouncing words is transitioning correctly over time, [0137]).
However, Beith does not teach wherein a greater displacement in the first points of the upper portion of the facial representation is caused by the emotional state as compared to the audio data, while a greater displacement in the second points of the lower portion is caused by the audio data as compared to the emotional state.

McDuff teaches a greater displacement in the first points of the upper portion of the facial representation caused by the emotional state as compared to the audio data (emotion and head pose synthesizer 420 may receive the sentiment output from the text sentiment recognizer 404 to modify the emotional expressiveness of the upper face of the synthesized output 422, [0080]) and a greater displacement in the second points of the lower portion caused by the audio data as compared to the emotional state (phoneme recognizer 406 may act on a stream of audio samples to identify phonemes, or visemes, for use in animating the lips of the embodied conversational agent 302, [0069]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Beith to include this split, because McDuff suggests that emotions are expressed more by the upper face [0080] and that the audio is used to animate the lips so that the conversational agent appears to be saying the words in the audio [0069].
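To make the disputed arrangement concrete, the two-network pipeline recited in Claim 1, with emotion dominating upper-face displacement and audio dominating lower-face displacement, can be sketched as below. This is a minimal illustrative sketch, not the Applicant's or any cited reference's implementation: the module names, feature dimensions, fixed 0.8/0.2 mixing weights, and half-and-half vertex split are all assumptions (only the 5022-vertex mesh size appears in the record, via Karras [0090]).

```python
import torch
import torch.nn as nn

# Illustrative dimensions; all assumptions except the mesh size.
AUDIO_DIM, EMOTION_DIM, N_VERTICES = 128, 16, 5022

class EmotionNet(nn.Module):
    """First network(s): audio features -> emotional-state vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(AUDIO_DIM, 64), nn.ReLU(), nn.Linear(64, EMOTION_DIM))

    def forward(self, audio_feats):
        return self.net(audio_feats)

class AnimationNet(nn.Module):
    """Second network(s): audio + emotion -> per-vertex 3D displacements,
    mixed so emotion dominates the upper face and audio the lower face."""
    def __init__(self, upper_mask):
        super().__init__()
        self.from_audio = nn.Linear(AUDIO_DIM, N_VERTICES * 3)
        self.from_emotion = nn.Linear(EMOTION_DIM, N_VERTICES * 3)
        # upper_mask[v] = 1.0 for upper-face vertices, 0.0 for lower-face;
        # expanded to match the flattened (x, y, z) layout.
        self.register_buffer("upper", upper_mask.repeat_interleave(3))

    def forward(self, audio_feats, emotion):
        d_audio = self.from_audio(audio_feats)
        d_emotion = self.from_emotion(emotion)
        # Greater displacement from emotion on the upper face; greater
        # displacement from audio on the lower face (fixed weights assumed).
        disp = (self.upper * (0.8 * d_emotion + 0.2 * d_audio)
                + (1 - self.upper) * (0.2 * d_emotion + 0.8 * d_audio))
        return disp.view(-1, N_VERTICES, 3)

upper_mask = (torch.arange(N_VERTICES) < N_VERTICES // 2).float()  # assumed split
emotion_net, anim_net = EmotionNet(), AnimationNet(upper_mask)
audio = torch.randn(1, AUDIO_DIM)                 # stand-in audio features
displacements = anim_net(audio, emotion_net(audio))
print(displacements.shape)                        # torch.Size([1, 5022, 3])
```

In a trained system the region weighting would be learned rather than hard-coded; the fixed mask here only makes the claimed "greater displacement" relationship explicit.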
As per Claims 9 and 13, these claims are each similar in scope to Claim 1 and are therefore rejected under the same rationale.

As per Claim 17, Beith teaches wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system implemented using one or more large language models; a system for performing conversational AI operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources (deep learning architecture neural network, [0078]).

As per Claims 18 and 20, these claims are similar in scope to Claims 1 and 17, respectively, and are therefore rejected under the same rationale.

As per Claim 21, Beith teaches wherein the one or more neural networks are trained such that the emotional state drives animation of the upper portion of the facial representation of the character [0080, 0083] and the audio data drives animation of a lower portion of the facial representation of the character [0135, 0137]. However, Beith does not teach that the emotional state mostly drives animation of the upper portion or that the audio data mostly drives animation of the lower portion. McDuff teaches that the emotional state mostly drives animation of the upper portion of the facial representation [0080] and the audio data mostly drives animation of the lower portion [0069]. This is obvious for the reasons given in the rejection of Claim 1.

Claims 2-4, 10, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Beith (US 20240078732A1) and McDuff (US 20200279553A1) in view of Karras (US 20180336464A1).

As per Claim 2, Beith and McDuff are relied upon for the teachings discussed above relative to Claim 1. However, Beith and McDuff do not teach that the second output corresponds to locations of a plurality of vertices associated with one or more of the first points or the second points, an individual vertex of the plurality of vertices representing a three-dimensional point associated with the facial representation of the animated character. Karras teaches this limitation (creating facial animation sequences in a vertex mesh based on audio input, [0079]; produce the final 3D positions of a plurality of control vertices of a mesh 808; the facial mesh includes 5022 control vertices that can be moved in 3D positions, [0090]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Beith and McDuff to output vertex locations in this manner, because Karras suggests that a realistic-looking 3D face can thereby be animated [0079, 0090].

As per Claim 3, Beith and McDuff do not teach: determining, using one or more third neural networks and based at least on the audio data, a third output, wherein the determining of the first output indicating the emotional state is based at least on the third output, and the determining of the second output associated with the facial animation for the animated character is based at least on the first output and the third output.
Karras teaches determining, using one or more third neural networks (810) and based at least on the audio data (802), a third output, wherein the determining of the first output indicating the emotional state is based at least on the third output, and the determining of the second output associated with the facial animation is based at least on the first output and the third output (the first part of the deep neural network is a formant analysis network 810; the formant analysis network 810 receives an audio input 802 and produces a time-varying sequence of speech features that are passed to the articulation network 820; the articulation network 820 analyzes the temporal evolution of the features and outputs a single abstract feature vector that describes the facial pose; the articulation network 820 outputs a set of 256+E abstract features, where E is the number of components of the emotional state vector 804; the set of abstract features output by the articulation network 820 is fed to an output network 830, which generates the final 3D positions of a plurality of vertices in a mesh output 808, [0083], [0090], [0079]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Beith and McDuff to include this arrangement, because Karras suggests that it is useful for generating plausible and expressive 3D facial animation based exclusively on a vocal audio track [0081].

As per Claim 4, Beith and McDuff do not teach wherein the determining of the second output associated with the facial animation comprises: determining, using at least one third neural network of the one or more second neural networks, and based at least on the audio data and the first output, a third output; and determining, using at least one fourth neural network of the one or more second neural networks, and based at least on the third output, the second output associated with the facial animation for the animated character. Karras teaches this arrangement, with the articulation network (820) as the third neural network and the output network (830) as the fourth [0083, 0090, 0079]. This is obvious for the reasons given in the rejection of Claim 3.

As per Claim 10, this claim is similar in scope to Claim 2 and is therefore rejected under the same rationale.
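As a rough sketch of the Karras cascade mapped onto Claims 3 and 4: a formant-analysis network feeds an articulation network whose pose vector is joined with an E-component emotional state vector (the "256+E" abstract features) before an output network regresses the final vertex positions. The 256+E feature width and the 5022-vertex mesh come from the cited paragraphs; every layer type, kernel size, and other dimension below is an assumption.

```python
import torch
import torch.nn as nn

E = 16             # components of the emotional state vector (assumed size)
N_VERTICES = 5022  # control vertices in the output mesh (Karras [0090])

# "Third" network in the claim mapping: audio -> time-varying speech features.
formant_analysis = nn.Sequential(
    nn.Conv1d(1, 72, kernel_size=9, stride=4), nn.ReLU(),
    nn.Conv1d(72, 108, kernel_size=9, stride=4), nn.ReLU())

# Articulation stage: temporal features -> one abstract pose vector.
articulation = nn.Sequential(
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(108, 256))

# Output stage: 256 + E abstract features -> final 3D vertex positions.
output_net = nn.Linear(256 + E, N_VERTICES * 3)

audio = torch.randn(1, 1, 16000)   # stand-in: one channel of raw audio
emotion = torch.randn(1, E)        # emotional state vector (804)

features = formant_analysis(audio)             # (1, 108, T')
pose = articulation(features)                  # (1, 256)
abstract = torch.cat([pose, emotion], dim=1)   # the "256 + E" features
mesh = output_net(abstract).view(1, N_VERTICES, 3)
print(mesh.shape)                              # torch.Size([1, 5022, 3])
```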
As per Claim 11, Beith and McDuff do not teach that the one or more processors are to determine, using one or more neural networks and based at least on the audio data, a third output, wherein the first output indicating the emotional state is determined based at least on the third output, and wherein the second output associated with the facial animation for the character is determined based at least on the first output and the third output. Karras teaches that the one or more processing units are to determine, using one or more neural networks (820) and based at least on the audio data, a third output, wherein the first output associated with the emotional state is determined based at least on the third output, and the second output associated with the facial animation for the character is determined based at least on the first output and the third output [0083, 0090, 0079]. This is obvious for the reasons given in the rejection of Claim 3.

Claims 5, 12, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Beith (US 20240078732A1) and McDuff (US 20200279553A1) in view of Bolzoni (US 20240021196A1).

As per Claim 5, Beith and McDuff are relied upon for the teachings discussed above relative to Claim 1. Beith teaches generating, using one or more first neural networks and based at least on the audio data, a first output indicating an emotional state; determining, using one or more second neural networks and based at least on the audio data and the first output, a second output associated with a facial animation; and causing, based at least on the second output, the animated character to perform the facial animation, as discussed in the rejection of Claim 1. However, Beith and McDuff do not teach generating, using the one or more first neural networks and based at least on second audio data corresponding to one or more second words, a third output associated with at least one of the emotional state or a second emotional state; determining, using the one or more second neural networks and based at least on the second audio data and the third output, a fourth output associated with a second facial animation; and causing, based at least on the fourth output, the animated character to perform the second facial animation. Bolzoni teaches that the user's first response (first audio data) is "I'm very worried. It's Peter, we got into an argument again" [0046], and the user's second response (second audio data) is "I believe he doesn't love me anymore so I dumped him" [0048].
Thus, Bolzoni teaches generating, using the one or more first neural networks (150) and based at least on second audio data corresponding to one or more second words, a third output associated with at least one of the emotional state or a second emotional state (the user may reply by stating "I believe he doesn't love me anymore so I dumped him", [0048]; once again, the input analyzer 150 may convert the user's audio input into text and may extract metadata therefrom; the input analyzer 150 may enrich the text with the metadata and transmit the enriched text to the state machine 155; the state machine 155 may classify the user's response of "I believe he doesn't love me anymore so I dumped him" during the second iteration as feeling (sadness) and anxiety, [0049], [0025]); determining, using the one or more second neural networks (155) and based at least on the second audio data and the third output, a fourth output associated with a face; and causing, based at least on the fourth output, a character with the face ([0036]; based on the classification of the user's response as feeling (anger), the state machine 155 may determine that the guidance state would be an appropriate state to transition to; the state machine 155 may transition to the guidance state, [0050]; the third iteration may begin with the state machine 155 outputting the base prompt to the output composer 160 along with the extracted metadata; based on the metadata extracted from the user's response during the second iteration, the output composer 160 may modify the base prompt of the inquiry; the output composer 160 may then output the modified prompt to the speaker and display for presentation to the user via the avatar 350, [0051], Fig. 3B).

Since Beith teaches the processing of first audio data as discussed in the rejection of Claim 1, this teaching from Bolzoni of performing the same processing for second audio data can be implemented into the device of Beith to include a fourth output associated with a second facial animation and to cause, based at least on the fourth output, the animated character to perform the second facial animation. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Beith and McDuff to include this second iteration, because Bolzoni suggests that this way the conversation can continue based on the next sentence, providing an interactive conversation platform that can engage in conversation with a user in a manner that simulates humanistic interaction, including learned understanding of users [0020].
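Bolzoni's loop, in which each utterance is classified for emotion and drives a state transition that shapes the avatar's next prompt, reduces to a small state machine. The toy below stubs the input analyzer with keyword checks and invents the transition table and prompt text; only the two sample utterances and the "guidance" and "anxiety relief" state names echo the cited paragraphs.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    text: str
    emotion: str  # label a real input analyzer would produce with a model

def classify_emotion(text: str) -> str:
    """Keyword stub standing in for Bolzoni's input analyzer (150)."""
    lowered = text.lower()
    if "worried" in lowered or "anxious" in lowered:
        return "anxiety"
    if "doesn't love me" in lowered or "dumped" in lowered:
        return "sadness"
    return "neutral"

# Transition table and prompts are invented; "guidance" and "anxiety relief"
# echo states named in the cited paragraphs ([0040], [0050]).
TRANSITIONS = {
    ("inquiry", "anxiety"): "anxiety_relief",
    ("anxiety_relief", "sadness"): "guidance",
    ("anxiety_relief", "neutral"): "inquiry",
}
PROMPTS = {
    "inquiry": "What's on your mind?",
    "guidance": "Let's talk through what happened.",
    "anxiety_relief": "Take a slow breath with me.",
}

state = "inquiry"
for reply in ["I'm very worried. It's Peter, we got into an argument again",
              "I believe he doesn't love me anymore so I dumped him"]:
    turn = Turn(reply, classify_emotion(reply))
    state = TRANSITIONS.get((state, turn.emotion), state)  # stay put if no rule
    print(f"user felt {turn.emotion!r} -> state {state!r}: {PROMPTS[state]}")
```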
As per Claim 12, this claim is similar in scope to Claim 5 and is therefore rejected under the same rationale.

As per Claim 14, Beith teaches wherein the determination of the second output associated with the facial animation of the character is based at least on the audio data, as discussed in the rejection of Claim 1. However, Beith and McDuff do not teach wherein the one or more processors are further to: receive input data representative of one or more inputs; and generate, based at least on the input data, a third output by updating at least a portion of the first output, wherein the second output associated with the facial animation of the character is determined based at least on the audio data and the third output. Bolzoni teaches receiving input data representative of one or more inputs and generating, based at least on the input data, a third output by updating at least a portion of the first output (if the state machine 155 classifies a user's response during a current iteration as anxious, the state machine 155 may determine that the next state to transition to may be the anxiety relief state; the state machine 155 may wait until the user's responses from a certain number of previous iterations/states of the user session indicate that the user has been feeling anxious before determining that it should transition to the anxiety relief state; the state machine 155 may continue in the anxiety relief state until it determines the user's anxiety has been lowered; upon detecting that the user's responses for a consecutive number of iterations of the user session no longer indicate anxiety, the state machine 155 may determine that it should transition to the inquiry state to resume discussing the issues that initially caused the anxiety in the user to begin with, [0040]), wherein the second output associated with the face of the character is determined based at least on the audio data and the third output [0024, 0040, 0041, 0050, 0051] (Fig. 3B). Since Beith teaches that the second output is associated with the facial animation of the character, as discussed in the rejection of Claim 1, this teaching from Bolzoni can be implemented into the device of Beith so that the second output associated with the facial animation of the character is determined based at least on the audio data and the third output. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to make this modification, because Bolzoni suggests that this way the conversation can be adjusted based on the user's emotions, providing an interactive conversation platform that simulates humanistic interaction, including learned understanding of users [0020].

Claims 6, 7, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Beith (US 20240078732A1) and McDuff (US 20200279553A1) in view of Li (US 20170243387A1).

As per Claim 6, Beith and McDuff are relied upon for the teachings discussed above relative to Claim 1.
However, Beith and McDuff do not teach wherein the one or more first neural networks are trained based at least on animating the upper portion of the facial representation of the animated character without animating the lower portion, and the one or more second neural networks are trained based at least on animating the lower portion without animating the upper portion. Li teaches wherein the one or more first neural networks (311) are trained based at least on animating the upper portion of the facial representation without animating the lower portion (eyes convolutional neural network 311; the convolutional neural network, trained by the data from the training system, creates a facial animation for the eyes, [0047]), and the one or more second neural networks (312) are trained based at least on animating the lower portion without animating the upper portion (for the mouth convolutional neural network 312, mouth animation control weights are applied; these rely upon the trained datasets created by the training system, [0049]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Beith and McDuff to train the networks on the upper and lower portions separately, because Li suggests that properly trained neural networks can be applied to the mouth image data and the eyes image data to derive extremely accurate facial and speech animation [0019].

As per Claim 7, Beith and McDuff do not teach wherein the upper portion of the facial representation includes at least one of one or more eyes, a nose, or one or more eyebrows, and the lower portion includes at least one of a mouth, one or more cheeks, or a chin. Li teaches an upper portion including the eyes [0047] and a lower portion including the mouth [0049]. This is obvious for the reasons given in the rejection of Claim 6.

As per Claim 19, this claim is similar in scope to Claim 6 and is therefore rejected under the same rationale.
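Li's division of labor, one network trained only on the eye region and another only on the mouth region, might look like two independent CNNs over cropped face regions, each producing its own animation control weights. The architecture, crop sizes, and output dimensions below are assumptions for illustration.

```python
import torch
import torch.nn as nn

def region_cnn(out_dim: int) -> nn.Module:
    """Small CNN mapping a cropped face region to animation control weights."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim))

# Trained independently: the eyes network never sees mouth crops and vice
# versa, mirroring Li's eyes CNN (311) and mouth CNN (312).
eyes_net = region_cnn(out_dim=10)    # e.g. 10 eye/brow control weights (assumed)
mouth_net = region_cnn(out_dim=20)   # e.g. 20 mouth control weights (assumed)

eye_crops = torch.randn(8, 3, 64, 64)    # batch of upper-face crops
mouth_crops = torch.randn(8, 3, 64, 64)  # batch of lower-face crops
print(eyes_net(eye_crops).shape, mouth_net(mouth_crops).shape)
# torch.Size([8, 10]) torch.Size([8, 20])
```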
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Beith (US 20240078732A1) and McDuff (US 20200279553A1) in view of Choi (US 11,568,647 B2). Beith and McDuff are relied on for the teachings discussed above relative to Claim 1. However, Beith and McDuff do not teach wherein the one or more first neural networks are trained using adversarial training in order to learn a discriminator that predicts whether the emotional state is from a distribution. Choi teaches this limitation (first generative adversarial networks; first discriminator configured to compare the emotion expression image created from the first generator with a preset comparison image, and determine whether or not the input image is a comparison image or a created image according to the comparison result, col. 10, line 64 - col. 11, line 41). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to train the first neural networks adversarially in this way, because Choi suggests that this improves the GAN so that it can create a more complex emotion expression video (col. 1, lines 34-50).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Beith (US 20240078732A1) and McDuff (US 20200279553A1) in view of Li (US 20170243387A1) and Karras (US 20180336464A1). Beith and McDuff are relied on for the teachings discussed above relative to Claim 9. However, Beith and McDuff do not teach wherein the one or more neural networks are trained using a first loss function associated with the upper portion of the facial representation of the character and a second loss function associated with a lower portion of the facial representation. Li teaches first neural networks trained based at least on animating the upper portion [0047] and second neural networks trained based at least on animating a lower portion [0049], as discussed in the rejection of Claim 6, but Beith, McDuff, and Li do not teach the two region-specific loss functions. Karras teaches neural networks trained using a loss function associated with the facial representation of the character (training the network involves comparing the output of the network with a desired target output in the training dataset using a loss function; the network parameters are then updated based on the result of the loss function, [0092], [0083, 0090]). This teaching of the loss function from Karras can be implemented into the neural networks of Li so that the one or more neural networks are trained using a first loss function associated with the upper portion and a second loss function associated with the lower portion of the facial representation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Beith, McDuff, and Li with region-specific loss functions for the upper and lower portions of the facial representation, because Karras suggests that this way the neural network can be trained to be more accurate [0092].

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONI HSU, whose telephone number is (571) 272-7785. The examiner can normally be reached M-F, 10am-6:30pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kee Tung, can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JONI HSU/
Primary Examiner, Art Unit 2611
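Taken together, the training-related limitations addressed in the office action above (Claim 16's separate loss terms for the upper and lower face, and Claim 8's adversarially learned discriminator over emotional states) might combine into a training step like the sketch below. The stand-in networks, the vertex split, and the GAN formulation are assumptions for illustration, not any reference's actual method.

```python
import torch
import torch.nn as nn

N_VERTICES, EMOTION_DIM = 5022, 16
upper = torch.arange(N_VERTICES) < N_VERTICES // 2  # assumed region split

generator = nn.Linear(128, N_VERTICES * 3)   # stand-in animation model
emotion_enc = nn.Linear(128, EMOTION_DIM)    # stand-in emotion network
# Discriminator judges whether an emotional-state vector looks drawn from
# the reference distribution (Claim 8's adversarial setup).
discriminator = nn.Sequential(nn.Linear(EMOTION_DIM, 32), nn.ReLU(),
                              nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()

audio = torch.randn(4, 128)             # stand-in audio features
target = torch.randn(4, N_VERTICES, 3)  # ground-truth vertex positions

pred = generator(audio).view(4, N_VERTICES, 3)
# Claim 16: one loss over the upper-face vertices, another over the lower.
loss_upper = ((pred[:, upper] - target[:, upper]) ** 2).mean()
loss_lower = ((pred[:, ~upper] - target[:, ~upper]) ** 2).mean()

# Claim 8: the discriminator learns to separate "real" emotion vectors
# (sampled from a reference distribution) from encoder outputs, while the
# encoder is pushed to produce states the discriminator accepts.
real_emotion = torch.randn(4, EMOTION_DIM)
fake_emotion = emotion_enc(audio)
loss_disc = (bce(discriminator(real_emotion), torch.ones(4, 1))
             + bce(discriminator(fake_emotion.detach()), torch.zeros(4, 1)))
loss_gen = loss_upper + loss_lower + bce(discriminator(fake_emotion),
                                         torch.ones(4, 1))
print(f"disc {loss_disc.item():.3f}  gen {loss_gen.item():.3f}")
```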

Prosecution Timeline

Jun 06, 2023
Application Filed
Mar 11, 2025
Non-Final Rejection — §103
May 28, 2025
Applicant Interview (Telephonic)
May 28, 2025
Examiner Interview Summary
May 28, 2025
Response Filed
Sep 08, 2025
Final Rejection — §103
Oct 09, 2025
Response after Final Action
Oct 29, 2025
Request for Continued Examination
Nov 06, 2025
Response after Non-Final Action
Nov 21, 2025
Non-Final Rejection — §103
Feb 20, 2026
Examiner Interview Summary
Feb 20, 2026
Applicant Interview (Telephonic)
Feb 20, 2026
Response Filed
Mar 05, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592028
METHODS AND DEVICES FOR IMMERSING A USER IN AN IMMERSIVE SCENE AND FOR PROCESSING 3D OBJECTS
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12586306
METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM PRODUCT FOR MODELING OBJECT
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12586260
CREATING IMAGE ENHANCEMENT TRAINING DATA PAIRS
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12581168
A METHOD FOR A MEDIA FILE GENERATING AND A METHOD FOR A MEDIA FILE PROCESSING
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12561850
IMAGE GENERATION WITH LEGIBLE SCENE TEXT
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed in these cases to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 87%
With Interview: 95% (+7.2%)
Median Time to Grant: 2y 9m
PTA Risk: High
Based on 848 resolved cases by this examiner. Grant probability is derived from the career allow rate.
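The headline figures are internally consistent: the snippet below assumes the tool divides grants by resolved cases and adds the interview lift in percentage points, which reproduces the displayed 87% and 95%, though this methodology is inferred from the numbers rather than documented.

```python
granted, resolved = 741, 848
interview_lift = 7.2  # percentage points, from the examiner card

allow_rate = 100 * granted / resolved
print(f"career allow rate: {allow_rate:.1f}%")                   # 87.4, shown as 87%
print(f"with interview:    {allow_rate + interview_lift:.1f}%")  # 94.6, shown as 95%
```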

Free tier: 3 strategy analyses per month