Prosecution Insights
Last updated: April 19, 2026
Application No. 18/509,134

TECHNIQUES FOR ANIMATING AN AVATAR BASED ON SENSOR DATA FROM AN ARTIFICIAL-REALITY HEADSET COLLECTED WHILE PREPARING A SPEECH-BASED COMMUNICATION, AND SYSTEMS AND METHODS USING THESE TECHNIQUES

Final Rejection (§102, §103)
Filed: Nov 14, 2023
Examiner: DEMETER, HILINA K
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: Meta Platforms Technologies, LLC
OA Round: 2 (Final)
Grant Probability: 72% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 1m
Grant Probability With Interview: 91%

Examiner Intelligence

Career Allow Rate: 72% (above average; 472 granted / 659 resolved; +9.6% vs TC avg)
Interview Lift: +19.4% among resolved cases with interview (strong)
Typical Timeline: 3y 1m average prosecution; 27 applications currently pending
Career History: 686 total applications across all art units
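
As a quick cross-check of how these headline figures relate to one another (a minimal sketch; treating the interview lift and the TC-average delta as additive percentage-point differences is an assumption, not something the report states):

    # Minimal sketch, not the tool's actual formula: reproduces the headline figures
    # from the raw counts above, assuming the interview lift and the "vs TC avg"
    # delta are additive percentage-point differences.
    granted, resolved = 472, 659
    allow_rate = granted / resolved          # 0.716 -> reported as 72%
    with_interview = allow_rate + 0.194      # "+19.4% Interview Lift" -> ~91%
    implied_tc_avg = allow_rate - 0.096      # "+9.6% vs TC avg" -> ~62%
    print(f"{allow_rate:.1%} career | {with_interview:.1%} with interview | {implied_tc_avg:.1%} TC avg")
    # -> 71.6% career | 91.0% with interview | 62.0% TC avg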

Statute-Specific Performance

§101: 8.7% (-31.3% vs TC avg)
§103: 61.0% (+21.0% vs TC avg)
§102: 14.5% (-25.5% vs TC avg)
§112: 6.7% (-33.3% vs TC avg)
Tech Center average figures are estimates. Based on career data from 659 resolved cases.

Office Action

Grounds of rejection: §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed on 01/30/2026 have been fully considered but they are not persuasive.

On page 8, Applicant argues that Beith does not teach “during a period of time in which the wearer of augmented-reality glasses is recording an electronic communication that includes vocalization”. In response, Beith teaches in para. [0062] that sensor data associated with a user, such as audio data representing the user's speech, image data representing one or more portions of the user's face, motion data corresponding to movement of the user or the user's head, or a combination thereof, is used to determine a semantical context associated with such data. For example, the semantical context can correspond to the meaning of a word, phrase, or sentence spoken (or predicted to be spoken) by the user, which may be used to inform the avatar's facial expression. Also see para. [0174], where mouth animation is shown. Thus, the stated argument is not persuasive.

On page 9, Applicant argues that Beith does not teach “animate head movements of an avatar based on receiving first data indicating movements of a head of the wearer during a period of time in which the wearer of the augmented reality glasses is recording an electronic communication that includes vocalizations”. In response, Beith discloses in para. [0176] that the representation 152 of the avatar 154, the audio output 2340, or both, can be sent to a second device (e.g., transmitted to a headset of a user of the system 2300, a device of a remote user, a server, etc.) for display of the avatar 154, playback of the audio output 2340, or both. Also see para. [0221], noting that the VAD can also be configured to check whether other applications are in use that may indicate whether non-user speech may be present, such as a video conferencing application, an audio or video playback application, etc., which may further inform the VAD as to whether speech in the audio data 204 is from the user. Thus, the stated argument is not persuasive.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-8, 10-11, and 14-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Beith et al. (US Publication Number 2024/0078731 A1, hereinafter “Beith”).

(1) Regarding claim 1: As shown in fig. 1, Beith disclosed a method (para. [0058], note that systems and methods of generating avatar facial expressions are disclosed), comprising: animating head movements of an avatar representing a wearer of augmented-reality glasses (154, fig. 1; para. [0077], note that the face data adjuster 130 can receive face data 132, such as data corresponding to a rough mesh that represents a face of a user 108 and that is used as a reference for generation of a face of the avatar 154), based on receiving first data indicating movements of a head of the wearer of the augmented-reality glasses during a period of time in which the wearer of the augmented-reality glasses is recording an electronic communication that includes vocalizations (para. [0074], note that the semantical context 122 is based on a meaning of speech represented in the audio data, based on an emotion associated with speech represented in the audio data, based on an audio event detected in the audio data, or a combination thereof; the sensor data 106 includes image data (e.g., video data), and the semantical context 122 is based on an emotion associated with an expression on a user's face represented in the image data), wherein the first data was obtained from a first sensor while the wearer was wearing the augmented-reality glasses (para. [0082], note that the device 102 can correspond to a mobile phone or computer device (e.g., a laptop computer or a server), and the one or more sensors 104 can be coupled to or integrated in an extended reality (“XR”) headset, such as a virtual reality (“VR”), augmented reality (“AR”), or mixed reality (“MR”) headset device (e.g., an HMD), that is worn by the user 108); and animating mouth movements of the avatar representing the wearer, based on receiving second data indicating the vocalizations of the wearer during the period of time (para. [0062], note that sensor data associated with a user, such as audio data representing the user's speech, image data representing one or more portions of the user's face, motion data corresponding to movement of the user or the user's head, or a combination thereof, is used to determine a semantical context associated with such data; for example, the semantical context can correspond to the meaning of a word, phrase, or sentence spoken (or predicted to be spoken) by the user, which may be used to inform the avatar's facial expression; also see para. [0174], mouth animation), wherein the second data was obtained from a second sensor, distinct from the first sensor, while the wearer was wearing the augmented-reality glasses (para. [0177], note that the audio data 204 may represent speech of a user of the system 2300 (e.g., captured by one or more microphones 202), and the feature data generator 120 may be configured to process the audio data 204 to generate the first output data 2320 representing the user's speech), wherein an animated version of the avatar having the head movements and the mouth movements is provided to a device other than the augmented-reality glasses for presentation in conjunction with playback of the electronic communication at the device other than the augmented-reality glasses (para. [0176], note that the representation 152 of the avatar 154, the audio output 2340, or both, can be sent to a second device (e.g., transmitted to a headset of a user of the system 2300, a device of a remote user, a server, etc.) for display of the avatar 154, playback of the audio output 2340, or both).

(2) Regarding claim 2: Beith further disclosed the method of claim 1, wherein: the first sensor is an inertial measurement unit (IMU) of the augmented-reality glasses (para. [0081], note that the one or more sensors 104 include one or more motion sensors, such as an inertial measurement unit (IMU) or other sensors configured to detect movement, acceleration, orientation, or a combination thereof), and the second sensor is a microphone of the augmented-reality glasses (para. [0177], note that the audio data 204 may represent speech of a user of the system 2300 (e.g., captured by one or more microphones 202)).

(3) Regarding claim 3: Beith further disclosed the method of claim 2, wherein the first data indicating the movements of the head of the wearer and the second data indicating the vocalizations are obtained without using any imaging sensors of the augmented-reality glasses (para. [0177], note that the audio data 204 may represent speech of a user of the system 2300 (e.g., captured by one or more microphones 202)).

(4) Regarding claim 4: Beith further disclosed the method of claim 3, wherein: the augmented-reality glasses include one or more imaging sensors that are configured to detect movements of the wearer (para. [0081], note that the one or more sensors 104 include one or more motion sensors, such as an inertial measurement unit (IMU) or other sensors configured to detect movement, acceleration, orientation, or a combination thereof); and before the first data and the second data have been obtained, the one or more imaging sensors of the augmented-reality glasses are caused to be deactivated based on one or more of (i) an indication of a limited amount of storage at the augmented-reality glasses, and (ii) an indication of a limited amount of battery at the augmented-reality glasses (para. [0105], note that both of the above-described implementations enable reduction in camera usage, which results in power savings due to the one or more cameras 206 being used less, turned off, or omitted from the system 200 entirely).

(5) Regarding claim 5: Beith further disclosed the method of claim 1, wherein the animating of the head movements and the animating of the mouth movements are performed at another electronic device, distinct from the augmented-reality glasses (para. [0176], note that the representation 152 of the avatar 154, the audio output 2340, or both, can be sent to a second device (e.g., transmitted to a headset of a user of the system 2300, a device of a remote user, a server, etc.) for display of the avatar 154, playback of the audio output 2340, or both).

(6) Regarding claim 6: Beith further disclosed the method of claim 1, further comprising animating body movements of the avatar based on detecting that the wearer is performing a physical activity (para. [0233], note that the assistant operations can include initiating or joining an online activity with one or more other participants, such as an online game or virtual conference, in which the user is represented by the avatar 154; for example, the wireless speaker and voice activated device 3902 may send the representation 152 of the avatar 154, the audio output 2340, or both, to another device (e.g., a gaming server) that can include the avatar 154 in a virtual setting that is shared by the other participants).

(7) Regarding claim 7: Beith further disclosed the method of claim 1, wherein a visual aspect of the avatar is based on a particular application from which the electronic communication is initiated at the augmented-reality glasses (para. [0233], note that the assistant operations can include initiating or joining an online activity with one or more other participants, such as an online game or virtual conference, in which the user is represented by the avatar 154).

(8) Regarding claim 8: Beith further disclosed the method of claim 1, wherein animating one or more of the (i) head movements of the avatar, and (ii) the mouth movements of the avatar cause animation of one or more accessory elements presented in conjunction with the avatar (para. [0098], note that by processing the image data 208 using neural networks of the image unit 226 that are trained to specifically detect facial expressions and movements associated with speaking, conveying emotion, etc., such as in the vicinity of the eyes and mouth, and using such detected facial expressions and movements when generating the feature data 124, the resulting adjusted face data 134 can provide a more accurate and realistic facial expression of the avatar 154).

(9) Regarding claim 10: Beith further disclosed the method of claim 1, wherein animating the head movements is based on fusing the first data from the first sensor with the second data from the second sensor (para. [0114], note that the one or more audio-based features 320 and the one or more image-based features 322 are combined (e.g., concatenated, fused, etc.) in the feature data 124 to be used by the face data adjuster 130 in generating the adjusted face data 134).

(10) Regarding claim 11: Beith further disclosed the method of claim 1, wherein animating the head movements is further based on the second data (para. [0062], note that sensor data associated with a user, such as audio data representing the user's speech, image data representing one or more portions of the user's face, motion data corresponding to movement of the user or the user's head, or a combination thereof, is used to determine a semantical context associated with such data).

(11) Regarding claim 14: Beith further disclosed the method of claim 1, wherein: the augmented-reality glasses include (i) a first temple that includes a first audio sensor, and (ii) a second temple with a second audio sensor (para. [0103], note that the system 200 uses the one or more microphones 202 to capture/record the user's auditory behaviors to recognize sounds generated by the user and identify emotions); and based on the first data indicating the movements of the head of the wearer by the first sensor, applying a higher weighting to data from the first audio sensor than data from the second audio sensor (para. [0103], note that the recognized auditory information can inform the system 200 as to the current behavior, emotion, or both, that the user's face is demonstrating, and the user's face also has facial expressions associated with the behavior or emotion; for example, if the user is laughing, then the system 200 can exclude certain facial expressions that are not associated with laughter and can therefore select from a smaller set of specific facial expressions when determining the avatar facial expression 156).

(12) Regarding claim 15: Beith further disclosed the method of claim 1, further comprising: adjusting the avatar based on data from additional sensors of the augmented-reality glasses or another electronic device that is in electronic communication with the augmented-reality glasses (para. [0103], note that the system 200 may identify a laugh in the audio data 204, and in response to identifying the laugh, the system 200 can adjust the avatar facial expression 156 to make the mouth smile bigger, make the eyes tighten, enhance crow's feet around eyes, show dimples, etc.), the additional sensors including one or more of: a neuromuscular-signal sensor (e.g., an electromyography (EMG) electrode); a time-of-flight (TOF) sensor; a mechanomyography (MMG) sensor; a photoplethysmography (PPG) sensor; and a camera (para. [0105], note that the device 102 may intermittently use the one or more cameras 206 to augment the audio data 204 to assist in creating the expressions of the avatars; both of the above-described implementations enable reduction in camera usage, which results in power savings due to the one or more cameras 206 being used less).

The proposed rejection of claims 1-4 renders inherent the steps of the non-transitory computer-readable medium claims 16-19 (para. [0007], note that a non-transitory computer-readable medium includes instructions that, when executed by one or more processors, cause the one or more processors to process image data corresponding to a user's face to generate face data) and the device claim 20 (para. [0082], the device 102 can correspond to a mobile phone or computer device (e.g., a laptop computer or a server), and the one or more sensors 104 can be coupled to or integrated in an extended reality (“XR”) headset, such as a virtual reality (“VR”), augmented reality (“AR”), or mixed reality (“MR”) headset device (e.g., an HMD), that is worn by the user 108), because these steps occur in the operation of the proposed rejection as discussed above. Thus, arguments similar to those presented above for claims 1-4 are equally applicable to claims 16-20.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Beith in view of Ni et al. (US Publication Number 2018/0225857 A1, hereinafter “Ni”).

(1) Regarding claim 9: Beith disclosed most of the subject matter as described above except for specifically teaching wherein animating the head movements of the avatar representing the wearer includes applying a model to the first data, the model selected from a group consisting of (i) an axis-angle model, (ii) a Euler model, and (iii) a quaternion model. However, Ni disclosed wherein animating the head movements of the avatar representing the wearer includes applying a model to the first data, the model selected from a group consisting of (i) an axis-angle model, (ii) a Euler model, and (iii) a quaternion model (para. [0072], note that a head bone structure of the head portion of the rigged 3D model may be animated based on a first quaternion; the first quaternion may be computed from a joint position of the head bone structure in the head portion). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein animating the head movements of the avatar representing the wearer includes applying a model to the first data, the model selected from a group consisting of (i) an axis-angle model, (ii) a Euler model, and (iii) a quaternion model. The suggestion/motivation for doing so would have been to provide the ability to animate a realistic 3D model and visualize real objects in a 3D computer graphics environment (abs.). Therefore, it would have been obvious to combine Beith with Ni to obtain the invention as specified in claim 9.

(2) Regarding claim 12: Beith disclosed most of the subject matter as described above except for specifically teaching wherein the second data is applied to a quaternion model for animating head movements. However, Ni disclosed wherein the second data is applied to a quaternion model for animating head movements (para. [0072], note that a head bone structure of the head portion of the rigged 3D model may be animated based on a first quaternion; the first quaternion may be computed from a joint position of the head bone structure in the head portion). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the second data is applied to a quaternion model for animating head movements. The suggestion/motivation for doing so would have been to provide the ability to animate a realistic 3D model and visualize real objects in a 3D computer graphics environment (abs.). Therefore, it would have been obvious to combine Beith with Ni to obtain the invention as specified in claim 12.

(3) Regarding claim 13: Beith further disclosed the method of claim 12, wherein the second data is used to determine an amount of tilt correction to be applied to at least part of the first data (para. [0109], note that the device 102 may prevent the avatar 154 from expressing behaviors indicating that the user 108 is inattentive during an interaction, such as when the user 108 checks the user's phone (e.g., head tilts downward, eye focus lowers, facial expression suddenly changes, etc.); the feature data generator 120 may adjust the feature data 124 to cause the avatar 154 to express subtle visual facial cues to make the communication more comfortable, to exhibit courteous behaviors, etc., that are not actually expressed by the user 108).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Todasco et al. (US Publication Number 2024/0004456 A1) disclosed systems and methods for automated configuration of augmented and virtual reality avatars for user-specific behaviors.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

/HILINA K DEMETER/
Primary Examiner, Art Unit 2617
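
For readers unfamiliar with the animation techniques discussed in the rejection, the sketch below illustrates the general idea of (i) a quaternion model applied to IMU head-movement data, the feature at issue in claims 9 and 12, and (ii) audio-driven mouth animation as recited in claim 1. It is not code from Beith, Ni, or the application; all names, signatures, and constants are hypothetical.

    # Illustrative sketch only; not code from Beith, Ni, or the application.
    # (i) Head movements: IMU orientation quaternions -> rotation matrices for the
    #     avatar's head bone (a "quaternion model" of head pose).
    # (ii) Mouth movements: per-frame loudness of the recorded vocalizations -> a
    #     mouth-open value in [0, 1]. All names and constants are hypothetical.
    import numpy as np

    def quat_to_matrix(q):
        """Convert a unit quaternion (w, x, y, z) into a 3x3 rotation matrix."""
        w, x, y, z = q
        return np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])

    def head_pose_from_imu(imu_quaternions):
        """First data (IMU): one head-bone rotation matrix per animation frame."""
        return [quat_to_matrix(q / np.linalg.norm(q)) for q in imu_quaternions]

    def mouth_openness_from_audio(audio_frames, gain=5.0):
        """Second data (microphone): map per-frame RMS loudness to mouth openness in [0, 1]."""
        rms = np.sqrt(np.mean(np.square(audio_frames), axis=1))
        return np.clip(gain * rms, 0.0, 1.0)

    # Three frames of fabricated sensor data (10 ms of 48 kHz audio per frame).
    imu = np.array([[1.0, 0.0, 0.0, 0.0], [0.99, 0.10, 0.0, 0.0], [0.98, 0.0, 0.20, 0.0]])
    audio = np.random.default_rng(0).normal(0.0, 0.05, size=(3, 480))
    head_rotations = head_pose_from_imu(imu)
    mouth = mouth_openness_from_audio(audio)
    print(mouth.round(2), head_rotations[1].round(3), sep="\n")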

Prosecution Timeline

Nov 14, 2023: Application Filed
Sep 25, 2025: Non-Final Rejection — §102, §103
Jan 30, 2026: Response Filed
Feb 05, 2026: Applicant Interview (Telephonic)
Feb 05, 2026: Examiner Interview Summary
Mar 06, 2026: Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602864: EVENT ROUTING IN 3D GRAPHICAL ENVIRONMENTS
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12592042: SYSTEMS AND METHODS FOR MAINTAINING SECURITY OF VIRTUAL OBJECTS IN A DISTRIBUTED NETWORK
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12586297: INTERACTIVE IMAGE GENERATION
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12579724: EXPRESSION GENERATION METHOD AND APPARATUS, DEVICE, AND MEDIUM
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12561906: METHOD FOR GENERATING AT LEAST ONE GROUND TRUTH FROM A BIRD'S EYE VIEW
Granted Feb 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 72%
With Interview: 91% (+19.4%)
Median Time to Grant: 3y 1m
PTA Risk: Moderate
Based on 659 resolved cases by this examiner. Grant probability is derived from the career allow rate.
