Last updated: May 29, 2026
Application No. 18/532,611
DEVICE AND METHOD FOR CONTROLLING A VIRTUAL AVATAR ON AN ELECTRONIC DEVICE

Final Rejection §103
Filed: Dec 07, 2023
Priority: Dec 08, 2022 (GB 2218463.4)
Examiner: SHENG, XIN
Art Unit: 2619
Tech Center: 2600 (Communications)
Assignee: Sony Interactive Entertainment LLC
OA Round: 2 (Final)
Interview Optional

Examiner has a relatively high allowance rate (72%) and a +17.2% interview lift. A written response may suffice.
Based on 404 resolved cases, 2023–2026
Examiner Intelligence

SHENG, XIN
Career Allowance Rate: 72% (above average), 293 granted / 404 resolved, +10.5% vs TC avg
Interview Lift: +17.2% (strong), comparing resolved cases with and without an interview
Typical timeline: 2y 4m average prosecution, 14 currently pending
Career history: 421 total applications across all art units
Statute-Specific Performance

§101: 1.6% (-38.4% vs TC avg)
§103: 94.5% (+54.5% vs TC avg)
§102: 1.0% (-39.0% vs TC avg)
§112: 0.3% (-39.7% vs TC avg)
Based on career data from 404 resolved cases
Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant’s amendments and remarks submitted 02/05/2026 have been entered and considered. Claims 1, 14, and 19 are amended. Claim 2 is canceled. This action is made final.

Response to Arguments
Applicant’s arguments filed on 02/05/2026 have been fully considered but are not persuasive.
Regarding Claim 1 limitation “identifying a user characteristic from the input data; and detecting a frequency of the user characteristic, wherein if the frequency exceeds a predetermined limit the user characteristic is determined to be the user mannerism;”.

Applicant argues “Kipman does not teach or suggest the concept of "frequency" in the context of user characteristics or mannerisms. Instead, Kipman describes confidence testing between pose models and defines frequency as being a number of confidence tests…. Accordingly, rather than identifying a user characteristic from input data, detecting a frequency of the user characteristic, and determining the user characteristic is a user mannerism if the frequency exceeds a predetermined limit, Kipman describes performing confidence testing repeatedly to determine an accurate pose. Since a confidence test is different from a user characteristic and since Kipman fails to disclose any analysis or operations involving determining how often a user exhibits a particular characteristic,”.
However, Kipman, abstract, the invention teaches a method of tracking a target includes receiving an observed depth image of the target from a source and analyzing the observed depth image with a prior-trained collection of known poses to find an exemplar pose that represents an observed pose of the target. The method further includes rasterizing a model of the target into a synthesized depth image having a rasterized pose and adjusting the rasterized pose of the model into a model-fitting pose based, at least in part, on differences between the observed depth image and the synthesized depth image. Either the exemplar pose or the model-fitting pose is then selected to represent the target.
[00136] As indicated at 146, a posed model acquired via model fitting can optionally be compared to a posed model acquired via exemplar. In particular, one or more confidence tests can be used to determine which pose is believed to be a more accurate representation of the target. When such a comparison is made, the pose that is believed to be more accurate can be selected while the other pose is discarded and/or saved to facilitate subsequent pose determinations.
[00137] The relative frequency of when an exemplar pose, a model-fitting pose, or a combined pose are tested and/or chosen can be varied without departing from the scope of this disclosure. In some embodiments, a pose acquired via model fitting can be tested against a pose acquired via exemplar every frame. In other embodiments, such a comparison may only be carried out every nth frame, anytime the target moves or changes poses by more than a threshold, or every time confidence in either the model fitting model or the exemplar model falls below a threshold.
Therefore, a confidence test is applied to the detected user pose, movement, gesture, etc. (mannerism). The frequency is the number of confidence tests applied. If the frequency is over a predetermined threshold, the confidence score is high. The frequency of the confidence test reflects the number of frames the test is applied to (see [00137], “In some embodiments, a pose acquired via model fitting can be tested against a pose acquired via exemplar every frame”). It would be obvious to a person of ordinary skill in the art that the more often the same pose is detected across the frames, the more confident the system will be that the detected pose is accurate. A high frequency of a detected pose (gesture) indicates the user’s mannerism.
Perez and Kipman are analogous art because both teach methods of tracking/capturing user characteristics and applying them to an avatar. Kipman further teaches analyzing the user’s characteristics and calculating a confidence score. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the avatar modification method based on captured user mannerisms (taught in Perez) to further use the confidence score to assess the accuracy of the user characteristics (taught in Kipman), so as to provide the user with a more realistic visual representation in a virtual environment such as a video game (Kipman, [0001]).
Therefore, the combination of Perez and Kipman still teaches the above-mentioned limitation of Claim 1.
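For orientation only, the disputed Claim 1 limitation (detecting a frequency of a user characteristic and treating it as a mannerism when the frequency exceeds a predetermined limit) can be read as an occurrence count compared against a limit. The following is a minimal sketch under that reading. The function name, the characteristic labels, and the limit value are hypothetical and are taken from neither the application nor the cited references.

```python
from collections import Counter


def detect_mannerisms(observations, limit):
    """Return the characteristics whose occurrence count exceeds `limit`.

    `observations` is a sequence of characteristic labels already identified
    from input data (hypothetical labels; the identification step is out of
    scope for this sketch).
    """
    counts = Counter(observations)
    return {label for label, n in counts.items() if n > limit}


# Hypothetical stream of identified characteristics.
observed = ["shrug", "wave", "shrug", "nod", "shrug", "wave"]
mannerisms = detect_mannerisms(observed, limit=2)
print(mannerisms)
```

Here "shrug" occurs three times and is the only label that exceeds the limit of 2; "wave" occurs exactly twice and does not.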


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 4-5, 7-9, 11-16, 19 are rejected under 35 U.S.C. 103 as being unpatentable over Perez et al (US20110007079) in view of Kipman et al (WO2010088033).

Regarding Claim 1. Perez teaches A computer-implemented method for controlling a virtual avatar on an electronic device (Perez, abstract, the invention describes method of customizing avatar based on captured user data. Data captured with respect to a human may be analyzed and applied to a visual representation of a user such that the visual representation begins to reflect the behavioral characteristics of the user. For example, a system may have a capture device that captures data about the user in the physical space. The system may identify the user's characteristics, tendencies, voice patterns, behaviors, gestures, etc. Over time, the system may learn a user's tendencies and intelligently apply animations to the user's avatar such that the avatar behaves and responds in accordance with the identified behaviors of the user. The animations applied to the avatar may be animations selected from a library of pre-packaged animations, or the animations may be entered and recorded by the user into the avatar's avatar library.), the method comprising:
providing a virtual avatar associated with a user, wherein the virtual avatar
is associated with a plurality of poses and a plurality of animations (Perez, [0052] As shown, in FIG. 2, the computing environment 12 may include an avatar library 196 that comprises animations selectable for application to the user's avatar. The user profile 198 may include an avatar library or otherwise associated with an avatar library with animations specific to the user. The lookup tables may include the user's behaviors and tendencies with respect to various contextual circumstances. The profile may be use to apply animations to the user's avatar so that the avatar closely reflects the behaviors of the user.
[0053] The animations in the avatar library may comprise any characteristic that may be applied to an avatar. The characteristics may be indicative of a user's behavior. Thus, the animation selected for application to the user's behavior may be selected to correspond to the user's behaviors.);
receiving input data associated with the user from at least one input source (Perez, [0027] The capture device 20 may capture data representative of a user's behaviors. For example, the capture device may capture characteristics that are indicative of the user's behaviors. Such characteristics may include the user's body position, facial expressions, vocal commands, speech patterns, gestures, or the like.);
processing the input data, the input data comprising images of the user
and/or audio data captured from the user (Perez, [0027] The capture device 20 may capture data representative of a user's behaviors. For example, the capture device may capture characteristics that are indicative of the user's behaviors. Such characteristics may include the user's body position, facial expressions, vocal commands, speech patterns, gestures, or the like.
Although Perez does not explicitly state that images of the user are captured, the capture device obtains data such as facial expressions and gestures, which requires images to be captured.);
determining a user mannerism from the processed input data by: (Perez, [0018] The system may track the user and any motion in the physical space and identify characteristics of the user that can be applied to the user's visual representation. The identified characteristics may be indicative of the user's behaviors. For example, the system may identify the user's physical characteristics, tendencies, voice patterns, gestures, etc. The system may continue to track the user over time and apply modifications or updates to the user's avatar based on the history of the tracked data. For example, the capture device may continue to identify behaviors and mannerisms, emotions, speech patterns, or the like, of a user and apply these to the user's avatar. The fidelity of the rendered avatar, with respect to the resemblance of the avatar to the avatar, increases over time as the system gathers history data for that user.); 

Perez fails to explicitly teach, however, Kipman teaches identifying a user characteristic from the input data; and
detecting a frequency of the user characteristic, wherein if the frequency exceeds a predetermined limit the user characteristic is determined to be the user mannerism; (Kipman, abstract, the invention teaches a method of tracking a target includes receiving an observed depth image of the target from a source and analyzing the observed depth image with a prior-trained collection of known poses to find an exemplar pose that represents an observed pose of the target. The method further includes rasterizing a model of the target into a synthesized depth image having a rasterized pose and adjusting the rasterized pose of the model into a model-fitting pose based, at least in part, on differences between the observed depth image and the synthesized depth image. Either the exemplar pose or the model-fitting pose is then selected to represent the target.
[00135] A posed model acquired using the above-described model fitting process, or a rasterized version thereof, may be compared to the observed depth image in order to assess a relative confidence in the acquired pose. Such a confidence may be assessed per joint, body part, or pixel, or the confidence may be assessed for the model as a whole.
[00136] As indicated at 146, a posed model acquired via model fitting can optionally be compared to a posed model acquired via exemplar. In particular, one or more confidence tests can be used to determine which pose is believed to be a more accurate representation of the target. When such a comparison is made, the pose that is believed to be more accurate can be selected while the other pose is discarded and/or saved to facilitate subsequent pose determinations.
[00137] The relative frequency of when an exemplar pose, a model-fitting pose, or a combined pose are tested and/or chosen can be varied without departing from the scope of this disclosure. In some embodiments, a pose acquired via model fitting can be tested against a pose acquired via exemplar every frame. In other embodiments, such a comparison may only be carried out every nth frame, anytime the target moves or changes poses by more than a threshold, or every time confidence in either the model fitting model or the exemplar model falls below a threshold.
Therefore, a confidence test is applied to the detected user pose, movement, gesture, etc. (mannerism). The frequency is the number of confidence tests applied. If the frequency is over a predetermined threshold, the confidence score is high).
Perez and Kipman are analogous art because both teach methods of tracking/capturing user characteristics and applying them to an avatar. Kipman further teaches analyzing the user’s characteristics and calculating a confidence score. Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the avatar modification method based on captured user mannerisms (taught in Perez) to further use the confidence score to assess the accuracy of the user characteristics (taught in Kipman), so as to provide the user with a more realistic visual representation in a virtual environment such as a video game (Kipman, [0001]).
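The scheduling options quoted from Kipman [00137] (comparing the model-fitting pose against the exemplar pose every frame, every nth frame, on a pose change above a threshold, or when confidence falls below a threshold) can be sketched as a simple policy check. This is an illustrative sketch only. The policy names, parameter names, and default values are hypothetical and do not come from Kipman.

```python
def should_compare(policy, frame_index, pose_change, model_conf, exemplar_conf,
                   n=5, change_threshold=0.3, conf_threshold=0.6):
    """Decide whether to run the pose comparison on this frame.

    Each policy mirrors one alternative embodiment quoted from [00137];
    all thresholds are assumed values chosen for illustration.
    """
    if policy == "every_frame":
        return True
    if policy == "every_nth_frame":
        return frame_index % n == 0
    if policy == "on_pose_change":
        return pose_change > change_threshold
    if policy == "on_low_confidence":
        # Triggered when confidence in either model drops below the threshold.
        return min(model_conf, exemplar_conf) < conf_threshold
    raise ValueError(f"unknown policy: {policy}")


# Frame 10 with the default n=5 falls on a comparison frame.
print(should_compare("every_nth_frame", 10, 0.0, 0.9, 0.9))
```

Note that in this sketch the quantity being scheduled is how often the comparison runs, not how often a given pose is observed.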

The combination of Perez and Kipman further teaches and updating at least one of the plurality of poses and/or animations of the virtual avatar, or creating a new pose and/or animation, to apply the user mannerism to the virtual avatar (Perez, [0018] The system may track the user and any motion in the physical space and identify characteristics of the user that can be applied to the user's visual representation. The identified characteristics may be indicative of the user's behaviors. For example, the system may identify the user's physical characteristics, tendencies, voice patterns, gestures, etc. The system may continue to track the user over time and apply modifications or updates to the user's avatar based on the history of the tracked data. For example, the capture device may continue to identify behaviors and mannerisms, emotions, speech patterns, or the like, of a user and apply these to the user's avatar. The fidelity of the rendered avatar, with respect to the resemblance of the avatar to the avatar, increases over time as the system gathers history data for that user).

Regarding Claim 4. The combination of Perez and Kipman further teaches The computer-implemented method of claim 1, wherein determining the user mannerism comprises:
determining a link between the user mannerism and a trigger event or a
trigger input in the input data; and
associating the updated pose and/or animation, or the new pose and/or
animation, with the trigger event or trigger input (Perez, [0052] As shown, in FIG. 2, the computing environment 12 may include an avatar library 196 that comprises animations selectable for application to the user's avatar. The user profile 198 may include an avatar library or otherwise associated with an avatar library with animations specific to the user. The lookup tables may include the user's behaviors and tendencies with respect to various contextual circumstances. The profile may be use to apply animations to the user's avatar so that the avatar closely reflects the behaviors of the user.
[0053] The animations in the avatar library may comprise any characteristic that may be applied to an avatar. The characteristics may be indicative of a user's behavior. Thus, the animation selected for application to the user's behavior may be selected to correspond to the user's behaviors.
[0054] The animations in the avatar library 193 may be a stock library of animations. In an example embodiment, the animations applied to the avatar may be animated with an animations selected from a library of pre-packaged animations, such as those that come with a program, application, or a system, for example. The animation selected for application to the user's avatar may be that which correspond to the user's inputs learned by the system to reflect certain behaviors. For example, the system may identify that the user tends to jump up and down in a certain context, such as when achieving success in a game application. Thus, when the system identifies a similar set of contextual circumstances such as a similar state of the game (e.g., success), the system may select an animation that reflects jumping up and down and apply the animation to the user's avatar. The pre-canned animations may be defined for an application or for a system. For example, the jumping up and down animation may be applicable to a gaming application, but an open/close file animation applied to an avatar may be the same system-wide.
Therefore, specific contextual circumstances (a trigger event) trigger specific movement of the user.).
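The behavior described in Perez [0054], where a learned context such as game success selects a matching animation from the avatar library, can be pictured as a context-to-animation lookup. The sketch below is a simplified illustration. The class name, method names, context keys, and the "idle" fallback are hypothetical and are not taken from Perez.

```python
class AvatarLibrary:
    """Toy lookup table from a contextual circumstance to an animation name."""

    def __init__(self):
        self._by_context = {}

    def learn(self, context, animation):
        # Associate an observed user behavior with the context it occurred in.
        self._by_context[context] = animation

    def select(self, context, default="idle"):
        # Return the learned animation for this context, or a fallback.
        return self._by_context.get(context, default)


library = AvatarLibrary()
library.learn("game_success", "jump_up_and_down")

selected = library.select("game_success")
fallback = library.select("menu_open")
print(selected, fallback)
```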

Regarding Claim 5. The combination of Perez and Kipman further teaches The computer-implemented method of claim 4, comprising using image recognition and/or audio recognition to determine a trigger event or trigger input from the input data (Perez, [0054] The animations in the avatar library 193 may be a stock library of animations. In an example embodiment, the animations applied to the avatar may be animated with an animations selected from a library of pre-packaged animations, such as those that come with a program, application, or a system, for example. The animation selected for application to the user's avatar may be that which correspond to the user's inputs learned by the system to reflect certain behaviors. For example, the system may identify that the user tends to jump up and down in a certain context, such as when achieving success in a game application. Thus, when the system identifies a similar set of contextual circumstances such as a similar state of the game (e.g., success), the system may select an animation that reflects jumping up and down and apply the animation to the user's avatar. The pre-canned animations may be defined for an application or for a system. For example, the jumping up and down animation may be applicable to a gaming application, but an open/close file animation applied to an avatar may be the same system-wide.
Therefore, the system uses captured images of the user jumping up and down to determine that a circumstance such as game success triggers the user’s cheering behavior.).

Regarding Claim 7. The combination of Perez and Kipman further teaches The computer-implemented method of claim 1, comprising using audio recognition and/or speech recognition to determine the user characteristic and/or user mannerism from the input data (Perez, [0121] The identity of characteristics indicative of a user's behaviors may include information that may be associated with the particular user 602 such as behavioral tendencies, speech patterns, facial expressions, skeletal movements, words spoken, history data, voice recognition information, or the like. The user's characteristics may comprise physical features of the user, such as: eye size, type, and color; hair length, type, and color; skin color; clothing and clothing colors. For example, colors may be identified based on a corresponding RGB image. Other target characteristics for a human target may include, for example, height and/or arm length and may be obtained based on, for example, a body scan, a skeletal model, the extent of a user 602 on a pixel area or any other suitable process or data. The computing system 610 may use body recognition techniques to interpret the image data and may size and shape the visual representation of the user 602 according to the size, shape and depth of the user's 602 appendages.
[0127] The system can learn any of the user's natural or idle behaviors in such circumstances and associate them to the user. For example, the system may identify how the player walks and save the motion as the walking animation in the avatar library for that user. The system can watch and listen to a user during activity under various circumstances and scenarios that may not involve a gesture or other active control of the system or the executing application. For example, when a user greets a friend in a remote game playing experience, the system may detect that the user typically greets friends with a typically greeting such as "Hi, buddy, how are you?" The same user may greet unknown players with a greeting such as "Hello, my name is ... " The system may use the captured data, including the voice fluctuations, words spoken, and any other motion, and add it to the avatar library for the user's avatar.).

Regarding Claim 8. The combination of Perez and Kipman further teaches The computer-implemented method of claim 1, further comprising outputting the updated pose and/or animation, or the new pose and/or animation (Perez, [0134] In both scenarios that the system or user adds, updates, or rewrites gesture or animation data, the system may record the user's inputs and validate the redefined gesture or animation data. For example, if the user is performing a "wave" gesture, the system can detect that the motion corresponds to the "wave" gesture via the gesture recognition engine as described above. The system may detect where the gesture starts and stops and prompt the user to perform a new motion in the physical space to overwrite the current "wave" gesture data and/or create an additional "wave" gesture that corresponds to the user's recorded motion.).

Regarding Claim 9. The combination of Perez and Kipman further teaches The computer-implemented method of claim 8, wherein outputting the updated pose and/or animation, or the new pose and/or animation, comprises: modifying at least one of a live state of the virtual avatar or the updated or new pose and/or animation to avoid a potential conflict (Perez, [0134] In both scenarios that the system or user adds, updates, or rewrites gesture or animation data, the system may record the user's inputs and validate the redefined gesture or animation data. For example, if the user is performing a "wave" gesture, the system can detect that the motion corresponds to the "wave" gesture via the gesture recognition engine as described above. The system may detect where the gesture starts and stops and prompt the user to perform a new motion in the physical space to overwrite the current "wave" gesture data and/or create an additional "wave" gesture that corresponds to the user's recorded motion.
Therefore, the new gesture is updated in the system to overwrite the current gesture.).

Regarding Claim 11. The combination of Perez and Kipman further teaches The computer-implemented method of claim 9, wherein modifying at least one of the live state of the virtual avatar or the updated or new pose and/or animation to avoid the potential conflict comprises: blending or interpolating between the live state of the virtual avatar and the updated or the new pose and/or animation (Kipman, abstract, the invention teaches a method of tracking a target includes receiving an observed depth image of the target from a source and analyzing the observed depth image with a prior-trained collection of known poses to find an exemplar pose that represents an observed pose of the target. The method further includes rasterizing a model of the target into a synthesized depth image having a rasterized pose and adjusting the rasterized pose of the model into a model-fitting pose based, at least in part, on differences between the observed depth image and the synthesized depth image. Either the exemplar pose or the model-fitting pose is then selected to represent the target.
[00136] As indicated at 146, a posed model acquired via model fitting can optionally be compared to a posed model acquired via exemplar. In particular, one or more confidence tests can be used to determine which pose is believed to be a more accurate representation of the target. When such a comparison is made, the pose that is believed to be more accurate can be selected while the other pose is discarded and/or saved to facilitate subsequent pose determinations. In some embodiments, high-confidence aspects of one pose may be combined with high-confidence aspects of the other pose to produce a combined pose that is believed to be a better representation of the target than either the model obtained via model fitting or the model obtained via exemplar. It is to be understood that in some embodiments, model fitting, as discussed with reference to steps 116-145 of FIG. 7, may be skipped if the relative confidence in the exemplar pose is above a predetermined threshold.).
The reasoning for combining Perez and Kipman is the same as described for Claim 1.
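Claim 11 recites blending or interpolating between the avatar's live state and the updated or new pose, while the cited Kipman passage [00136] describes combining high-confidence aspects of two poses. Both operations can be sketched side by side, assuming for illustration that a pose is a mapping from joint name to a single angle in degrees. That representation, the function names, and the sample values are hypothetical and come from neither the application nor the references.

```python
def interpolate_pose(live, target, t):
    """Linearly interpolate each joint angle from `live` (t=0) to `target` (t=1)."""
    return {joint: (1 - t) * live[joint] + t * target[joint] for joint in live}


def combine_by_confidence(pose_a, conf_a, pose_b, conf_b):
    """For each joint, keep the value from whichever pose has higher confidence."""
    return {
        joint: pose_a[joint] if conf_a[joint] >= conf_b[joint] else pose_b[joint]
        for joint in pose_a
    }


live_state = {"elbow": 0.0, "knee": 10.0}
new_pose = {"elbow": 90.0, "knee": 30.0}
halfway = interpolate_pose(live_state, new_pose, 0.5)

combined = combine_by_confidence(
    {"elbow": 80.0, "knee": 12.0}, {"elbow": 0.9, "knee": 0.2},
    {"elbow": 70.0, "knee": 28.0}, {"elbow": 0.4, "knee": 0.8},
)
print(halfway, combined)
```

The first function produces intermediate values between two poses. The second selects existing values per joint and never produces an intermediate one.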

Regarding Claim 12. The combination of Perez and Kipman further teaches The computer-implemented method of claim 1, wherein the user mannerism or the user characteristic includes one or more of: a facial expression; a sound; a word or a phrase spoken by the user; an emote; a pose; a breathing pattern; a gesture; and/or an action (Perez, [0027] The capture device 20 may capture data representative of a user's behaviors. For example, the capture device may capture characteristics that are indicative of the user's behaviors. Such characteristics may include the user's body position, facial expressions, vocal commands, speech patterns, gestures, or the like.).

Regarding Claim 13. The combination of Perez and Kipman further teaches The computer-implemented method of claim 1, wherein the at least one input source comprises at least one primary input source configured to provide primary input data comprising images of the user and/or audio data captured from the user, and at least one secondary input source configured to provide secondary input data (Perez, [0018] The system may track the user and any motion in the physical space and identify characteristics of the user that can be applied to the user's visual representation. The identified characteristics may be indicative of the user's behaviors. For example, the system may identify the user's physical characteristics, tendencies, voice patterns, gestures, etc. The system may continue to track the user over time and apply modifications or updates to the user's avatar based on the history of the tracked data. For example, the capture device may continue to identify behaviors and mannerisms, emotions, speech patterns, or the like, of a user and apply these to the user's avatar. The fidelity of the rendered avatar, with respect to the resemblance of the avatar to the avatar, increases over time as the system gathers history data for that user.
Therefore, the system has a history of the tracked data (primary input data) to preset the avatar initially. The continuous tracking data of the user is the secondary input source, which is also used to update the avatar.).

Regarding Claim 14. The combination of Perez and Kipman further teaches The computer-implemented method of claim 13, further comprising characterizing the primary input data using the secondary input data, optionally wherein characterizing the primary input data using the secondary input data comprises associating the updated pose and/or animation, or the new pose and/or animation, with an event received in the secondary input data (Perez, [0052] As shown, in FIG. 2, the computing environment 12 may include an avatar library 196 that comprises animations selectable for application to the user's avatar. The user profile 198 may include an avatar library or otherwise associated with an avatar library with animations specific to the user. The lookup tables may include the user's behaviors and tendencies with respect to various contextual circumstances. The profile may be use to apply animations to the user's avatar so that the avatar closely reflects the behaviors of the user.
[0053] The animations in the avatar library may comprise any characteristic that may be applied to an avatar. The characteristics may be indicative of a user's behavior. Thus, the animation selected for application to the user's behavior may be selected to correspond to the user's behaviors.
[0054] The animations in the avatar library 193 may be a stock library of animations. In an example embodiment, the animations applied to the avatar may be animated with an animations selected from a library of pre-packaged animations, such as those that come with a program, application, or a system, for example. The animation selected for application to the user's avatar may be that which correspond to the user's inputs learned by the system to reflect certain behaviors. For example, the system may identify that the user tends to jump up and down in a certain context, such as when achieving success in a game application. Thus, when the system identifies a similar set of contextual circumstances such as a similar state of the game (e.g., success), the system may select an animation that reflects jumping up and down and apply the animation to the user's avatar. The pre-canned animations may be defined for an application or for a system. For example, the jumping up and down animation may be applicable to a gaming application, but an open/close file animation applied to an avatar may be the same system-wide.
Therefore, specific contextual circumstances (a trigger event) trigger specific movement of the user.).

Regarding Claim 15. The combination of Perez and Kipman further teaches The computer-implemented method of claim 14, wherein the secondary input data comprises gameplay data (Perez, [0052] As shown, in FIG. 2, the computing environment 12 may include an avatar library 196 that comprises animations selectable for application to the user's avatar. The user profile 198 may include an avatar library or otherwise associated with an avatar library with animations specific to the user. The lookup tables may include the user's behaviors and tendencies with respect to various contextual circumstances. The profile may be use to apply animations to the user's avatar so that the avatar closely reflects the behaviors of the user.
[0053] The animations in the avatar library may comprise any characteristic that may be applied to an avatar. The characteristics may be indicative of a user's behavior. Thus, the animation selected for application to the user's behavior may be selected to correspond to the user's behaviors.
[0054] The animations in the avatar library 193 may be a stock library of animations. In an example embodiment, the animations applied to the avatar may be animated with an animations selected from a library of pre-packaged animations, such as those that come with a program, application, or a system, for example. The animation selected for application to the user's avatar may be that which correspond to the user's inputs learned by the system to reflect certain behaviors. For example, the system may identify that the user tends to jump up and down in a certain context, such as when achieving success in a game application. Thus, when the system identifies a similar set of contextual circumstances such as a similar state of the game (e.g., success), the system may select an animation that reflects jumping up and down and apply the animation to the user's avatar. The pre-canned animations may be defined for an application or for a system. For example, the jumping up and down animation may be applicable to a gaming application, but an open/close file animation applied to an avatar may be the same system-wide.
Therefore, specific contextual circumstances (trigger events) trigger specific movements of the user. The trigger event is related to the state of the game (e.g., success).).
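
For illustration only, the context-to-animation selection that the cited Perez passages describe can be sketched as a small lookup structure. This is a minimal sketch under stated assumptions: the class `AvatarLibrary`, its methods, and the context and behavior labels are hypothetical names chosen for this example, and none of them is taken from Perez or from the application.

```python
from collections import Counter, defaultdict


class AvatarLibrary:
    """Hypothetical per-user store that counts behaviors observed per context."""

    def __init__(self):
        # Maps a context label (e.g., a game state) to a tally of observed behaviors.
        self._observed = defaultdict(Counter)

    def record(self, context, behavior):
        """Note that the user exhibited `behavior` while `context` was active."""
        self._observed[context][behavior] += 1

    def select_animation(self, context, default="idle"):
        """Return the behavior seen most often in `context`, or `default` if none."""
        behaviors = self._observed.get(context)
        if not behaviors:
            return default
        return behaviors.most_common(1)[0][0]


library = AvatarLibrary()
library.record("game_success", "jump_up_and_down")
library.record("game_success", "jump_up_and_down")
library.record("game_success", "fist_pump")

selected = library.select_animation("game_success")  # a context seen before
fallback = library.select_animation("game_failure")  # a context never seen
```

In this sketch the game state plays the role of the trigger event: a recurring context selects the animation most often associated with it, and an unseen context falls back to a default.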

Regarding Claim 16. The combination of Perez and Kipman further teaches The computer-implemented method of claim 1, further comprising storing each updated pose and/or animation, and/or each new pose and/or animation, in a memory (Perez, [0127] The system can learn any of the user's natural or idle behaviors in such circumstances and associate them to the user. For example, the system may identify how the player walks and save the motion as the walking animation in the avatar library for that user. The system can watch and listen to a user during activity under various circumstances and scenarios that may not involve a gesture or other active control of the system or the executing application. For example, when a user greets a friend in a remote game playing experience, the system may detect that the user typically greets friends with a typically greeting such as "Hi, buddy, how are you?" The same user may greet unknown players with a greeting such as "Hello, my name is ... " The system may use the captured data, including the voice fluctuations, words spoken, and any other motion, and add it to the avatar library for the user's avatar.
Therefore, the captured new user behavior is added to the avatar library. It is obvious to a person with ordinary skill in the art that a data library can be stored in a location such as a database server, a memory, or a storage device.).

Regarding Claim 19. The combination of Perez and Kipman further teaches A computing device for controlling a virtual avatar on an electronic device, comprising:
a memory comprising computer readable instructions; and
a processor configured to read the computer readable instructions that
when executed causes the computing device to carry out the computer implemented method of claim 1 (Perez, [0083] In FIG. 4, the computing environment 220 comprises a computer 241, which typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 241 and includes both volatile and nonvolatile media, removable and non-removable media. The system memory 222 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 223 and random access memory (RAM) 260. A basic input/output system 224 (BIOS), containing the basic routines that help to transfer information between elements within computer 241, such as during start-up, is typically stored in ROM 223. RAM 260 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 259. By way of example, and not limitation, FIG. 4 illustrates operating system 225, application programs 226, other program modules 227, and program data 228.).

Claims 3, 17 are rejected under 35 U.S.C. 103 as being unpatentable over Perez et al (US20110007079) in view of Kipman et al (WO2010088033) further in view of Zhang et al (CN114967937).

Regarding Claim 3. The combination of Perez and Kipman fails to explicitly teach, however, Zhang teaches The computer-implemented method of claim 1, wherein determining the user mannerism comprises comparing at least a portion of the input data to input data received from one or more other users (Zhang, abstract, the invention teaches a virtual human motion generation method and system. The generation method and system comprise the steps of identifying identity information of a target object so as to generate an archive of the object, further, analyzing actions of the target object when the target object communicates with a virtual person, and particularly, setting small habitual actions which are made by focusing on the target object in an unconscious manner as habitual behaviors of the target object, recording the habitual behavior in a file of the target object; furthermore, whether the other target objects have positive attitudes when communicating with the virtual person and based on the set character of the virtual person, at least one habitual action of the virtual person for reproducing the target object at a proper time is calculated, so that the interaction scene of the two parties is more intimate.
[0003] On another front, related research indicates that in real-life human communication situations, if both parties are enthusiastic about communicating and maintain a positive attitude, they often unconsciously mimic the other person's gestures; for example, spreading their hands, shrugging their shoulders, or nodding rhythmically while speaking, thus creating a more friendly atmosphere.
Therefore, it is obvious to a person with ordinary skill in the art to further consider other users' gestures when simulating the avatar of the user of interest.).
Perez, Kipman and Zhang are analogous art because they all teach methods of tracking/capturing user characteristics and applying them to an avatar. Zhang further teaches considering the effect of other users' gestures when simulating the avatar for the user of interest. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the avatar modification method based on captured user mannerism (taught in Perez and Kipman), to further consider the effect of surrounding other users' gestures (taught in Zhang), so as to provide a more accurate method to simulate real life communication scenarios (Zhang, [0001-0004]).

Regarding Claim 17. The combination of Perez, Kipman and Zhang further teaches The computer-implemented method of claim 1, wherein the step of determining the user mannerism is at least partially implemented by a data model (Zhang, abstract, the invention teaches a virtual human motion generation method and system. The generation method and system comprise the steps of identifying identity information of a target object so as to generate an archive of the object, further, analyzing actions of the target object when the target object communicates with a virtual person, and particularly, setting small habitual actions which are made by focusing on the target object in an unconscious manner as habitual behaviors of the target object, recording the habitual behavior in a file of the target object; furthermore, whether the other target objects have positive attitudes when communicating with the virtual person and based on the set character of the virtual person, at least one habitual action of the virtual person for reproducing the target object at a proper time is calculated, so that the interaction scene of the two parties is more intimate.
[0111] Building the classification model requires implementing the machine learning steps of the classification model using a large amount of data; in a specific example, machine learning includes acquiring information on multiple different poses, which includes multiple consecutive action frames and multiple sets of (X, Y, Z) coordinate data; machine learning may include using two support vector machines, one using a linear kernel and the other using an RBF kernel, to build two classification models; both classification models can be trained using a tolerance of ε=0.00001 and a one-to-one method;).
Perez, Kipman and Zhang are analogous art because they all teach method of tracking/capturing user characteristics and apply to avatar. Zhang further teaches using machine learning to set up classification data model to detect user mannerism. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the avatar modification method based on captured user mannerism (taught in Perez and Kipman), to further use the machine learning data model to obtain user characteristics (taught in Zhang), so as to provide a more efficient method to simulate real life communication scenarios (Zhang, [0001-0004]).
	
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Perez et al (US20110007079) in view of Kipman et al (WO2010088033) further in view of Du et al (US20140055554).

Regarding Claim 6. The combination of Perez and Kipman fails to explicitly teach, however, Du teaches The computer-implemented method of claim 1, comprising using facial tracking and/or head tracking to determine the user characteristic and/or user mannerism from the input data (Du, abstract, the invention describes a video communication system that replaces actual live images of the participating users with animated avatars. A method may include selecting an avatar, initiating communication, capturing an image, detecting a face in the image, determining facial characteristics from the face, including eye movement and eyelid movement of a user indicative of direction of user gaze and blinking, respectively, converting the facial features to avatar parameters, and transmitting at least one of the avatar selection or avatar parameters.
Perez, [0121] The identity of characteristics indicative of a user's behaviors may include information that may be associated with the particular user 602 such as behavioral tendencies, speech patterns, facial expressions, skeletal movements, words spoken, history data, voice recognition information, or the like. The user's characteristics may comprise physical features of the user, such as: eye size, type, and color; hair length, type, and color; skin color; clothing and clothing colors. For example, colors may be identified based on a corresponding RGB image. Other target characteristics for a human target may include, for example, height and/or arm length and may be obtained based on, for example, a body scan, a skeletal model, the extent of a user 602 on a pixel area or any other suitable process or data. The computing system 610 may use body recognition techniques to interpret the image data and may size and shape the visual representation of the user 602 according to the size, shape and depth of the user's 602 appendages.).
Perez, Kipman and Du are analogous art because they all teach method of tracking/capturing user characteristics and apply to avatar. Du further teaches tracking user’s facial expression to determine user’s facial characteristics. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the avatar modification method based on captured user mannerism (taught in Perez and Kipman), to further use the facial tracking method to capture user’s facial characteristics (taught in Du), so as to provide user with realistic avatar in interactive environment (Du, [0001-0003]). 

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Perez et al (US20110007079) in view of Kipman et al (WO2010088033) further in view of Khan et al (US10668382).

Regarding Claim 10. The combination of Perez and Kipman fails to explicitly teach, however, Khan teaches The computer-implemented method of claim 9, wherein modifying at least one of the live state of the virtual avatar or the updated or new pose and/or animation to avoid the potential conflict comprises: suppressing at least one feature of the live state virtual avatar; and/or overriding at least one feature of the updated or the new pose and/or animation (Khan, abstract, the invention describes methods and systems for augmenting a video game with an avatar of a real world person. A method provides for executing a video game being played by a user via a head mounted display (HMD). The method includes operations for identifying a generic spectator within the video game and determining virtual coordinates associated with the generic spectator. The method also provides receiving data associated with an avatar of a real world person that is usable to display the avatar within the video game in three-dimensional (3D) form. The method provides removing the generic spectator from view within the video game and inserting the avatar into the video game at the virtual coordinates associated with the generic spectator. The method further provides rendering a virtual reality (VR) presentation of the video game for the user having a view of the avatar and sending the VR presentation to an HMD of the user for display.
Col 7, line 54-67, col 8, line 1-4, FIG. 5 shows an overall flow of a method for augmenting a VR video game with an avatar associated with a real world spectator, the avatar changing in appearance in real-time based on real world states of the real world spectator. The method proceeds in operation 500 by executing a video game at a server for a user of an HMD, wherein the executing the video game includes augmentation of the video game with an avatar of a spectator. In operation 510, the method receives real-time sensor data relating to facial expression and body positioning of the spectator via a plurality of sensors associated with the spectator. The method then flows to operation 520, which serves to map the real-time sensor data to virtual states of the avatar related to facial expression and body position of the avatar. In operation 530, the method then modifies an appearance of the avatar of the spectator in real-time according to the virtual states of the avatar relating to facial expression and body positioning of the avatar.).
Perez, Kipman and Khan are analogous art because they all teach methods of tracking/capturing user characteristics and applying them to an avatar. Khan further teaches updating the avatar based on changes of the corresponding user in real-time. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the avatar modification method based on captured user mannerism (taught in Perez and Kipman), to further use the real-time modification method to keep avatar behavior in sync with the corresponding user (taught in Khan), so as to provide the user with more realistic and lifelike video game scenes (Khan, col 1, line 22-31).
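
For illustration only, the flow quoted from Khan's FIG. 5 (receive sensor data, map it to virtual states, then modify the avatar) can be sketched as follows. This is a minimal sketch under stated assumptions: the `AvatarState` type, the sensor field names, and the thresholds are all hypothetical and are not drawn from Khan, and the sketch does not address the suppressing or overriding recited in claim 10.

```python
from dataclasses import dataclass


@dataclass
class AvatarState:
    """Hypothetical virtual state of an avatar."""
    facial_expression: str = "neutral"
    body_position: str = "standing"


def map_sensor_data(sample):
    """Map one sensor sample to a virtual state (compare operation 520).

    The keys and thresholds below are illustrative assumptions.
    """
    expression = "smiling" if sample.get("mouth_curve", 0.0) > 0.5 else "neutral"
    wrists_above_head = sample.get("wrist_height", 0.0) > sample.get("head_height", 1.0)
    position = "arms_raised" if wrists_above_head else "standing"
    return AvatarState(expression, position)


def update_avatar(avatar, sample):
    """Apply the mapped state to the avatar (compare operation 530)."""
    new_state = map_sensor_data(sample)
    avatar.facial_expression = new_state.facial_expression
    avatar.body_position = new_state.body_position
    return avatar


# One hypothetical sample, standing in for the data received in operation 510.
avatar = update_avatar(
    AvatarState(),
    {"mouth_curve": 0.8, "wrist_height": 1.9, "head_height": 1.7},
)
```

Calling `update_avatar` once per incoming sample would keep the avatar's state in step with the tracked person, which is the real-time behavior the rejection relies on.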

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Perez et al (US20110007079) in view of Kipman et al (WO2010088033), Zhang et al (CN114967937) further in view of Beith et al (US20240078731).

Regarding Claim 18. The combination of Perez, Kipman and Zhang fails to explicitly teach, however, Beith teaches The computer-implemented method of claim 17, wherein the data model includes an artificial neural network, wherein the artificial neural network is configured to use machine learning to determine the user mannerism from the input data, optionally wherein the artificial neural network is a convolutional neural network (Beith, abstract, the invention describes a device includes a memory and one or more processors configured to process image data corresponding to a user's face to generate face data. The one or more processors are configured to process sensor data to generate feature data and to generate a representation of an avatar based on the face data and the feature data. The one or more processors are also configured to generate an audio output for the avatar based on the sensor data.
[0090] The face data adjuster 130 is configured to generate the adjusted face data 134 based on the feature data 124 and further based on the face data 132. For example, the face data adjuster 130 can include a deep learning architecture neural network. In an illustrative, non-limiting example, the face data adjuster 130 corresponds to a skin U-Net that includes a convolutional neural network contracting path or encoder followed by a convolutional network expanding path or decoder. The contracting path or encoder can include repeated applications (e.g., layers) of convolution, each followed by a rectified linear unit (ReLU) and a max pooling operation, which reduces spatial information while increasing feature information. The expanding path or decoder can include repeated applications (e.g., layers) of up-convolution and concatenations with high-resolution features from the contracting path, from the feature data 124, or both.).
Perez, Kipman, Zhang and Beith are analogous art because they all teach methods of tracking/capturing user characteristics and applying them to an avatar. Both Zhang and Beith further teach using machine learning to detect user mannerism. Beith further teaches using a convolutional neural network in feature data extraction. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the avatar modification method based on captured user mannerism (taught in Perez, Kipman and Zhang), to further use the convolutional neural network in feature extraction (taught in Beith), so as to improve the accuracy with which the avatar conveys such expressions and emotions of the user (Beith, [0001-0004]).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Perez et al (US20110007079) in view of Kipman et al (WO2010088033) further in view of Beith et al (US20240078731).

Regarding Claim 20. The combination of Perez and Kipman fails to explicitly teach, however, Beith teaches The computing device of claim 19, wherein the computing device comprises or is in communication with an artificial neural network, wherein the artificial neural network is configured to determine the user mannerism (Beith, abstract, the invention describes a device includes a memory and one or more processors configured to process image data corresponding to a user's face to generate face data. The one or more processors are configured to process sensor data to generate feature data and to generate a representation of an avatar based on the face data and the feature data. The one or more processors are also configured to generate an audio output for the avatar based on the sensor data.
[0005] According to one implementation of the present disclosure, a device includes a memory configured to store instructions. The device also includes one or more processors configured to process image data corresponding to a user's face to generate face data. The one or more processors are configured to process sensor data to generate feature data. The one or more processors are also configured to generate a representation of an avatar based on the face data and the feature data. The one or more processors are also configured to generate an audio output for the avatar based on the sensor data. 
[0090] The face data adjuster 130 is configured to generate the adjusted face data 134 based on the feature data 124 and further based on the face data 132. For example, the face data adjuster 130 can include a deep learning architecture neural network. In an illustrative, non-limiting example, the face data adjuster 130 corresponds to a skin U-Net that includes a convolutional neural network contracting path or encoder followed by a convolutional network expanding path or decoder. The contracting path or encoder can include repeated applications (e.g., layers) of convolution, each followed by a rectified linear unit (ReLU) and a max pooling operation, which reduces spatial information while increasing feature information. The expanding path or decoder can include repeated applications (e.g., layers) of up-convolution and concatenations with high-resolution features from the contracting path, from the feature data 124, or both.).
Perez, Kipman and Beith are analogous art because they all teach method of tracking/capturing user characteristics and apply to avatar. Beith further teaches using machine learning to detect user mannerism and using convolutional neural network in feature data extraction. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the avatar modification method based on captured user mannerism (taught in Perez and Kipman), to further use the convolutional neural network in feature extraction (taught in Beith), so as to improve the accuracy with which the avatar conveys such expressions and emotions of the user (Beith, [0001-0004]).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN SHENG whose telephone number is (571)272-5734. The examiner can normally be reached M-F 9:30AM-3:30PM 6:00PM-8:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Chan can be reached at 5712723022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Xin Sheng/Primary Examiner, Art Unit 2619
Prosecution Timeline

Dec 07, 2023: Application Filed
Nov 05, 2025: Non-Final Rejection mailed (§103)
Feb 05, 2026: Response Filed
Apr 08, 2026: Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/173,623
Patent 12626326
IMAGE STITCHING WITH AN ADAPTIVE THREE-DIMENSIONAL BOWL MODEL OF THE SURROUNDING ENVIRONMENT FOR SURROUND VIEW VISUALIZATION
3y 2m to grant Granted May 12, 2026
18/367,119
Patent 12620165
SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR POPULATING ENVIRONMENT MODELS
2y 7m to grant Granted May 05, 2026
18/367,115
Patent 12614341
SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR POPULATING ENVIRONMENT MODELS
2y 7m to grant Granted Apr 28, 2026
18/490,458
Patent 12614337
SYSTEM AND METHODS FOR CUSTOMIZING 3D MODELS
2y 6m to grant Granted Apr 28, 2026
18/796,576
Patent 12614366
AUTOMATIC POINT CLOUD BUILDING ENVELOPE SEGMENTATION (AUTO-CuBES) USING MACHINE LEARNING
1y 8m to grant Granted Apr 28, 2026
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 72%
With Interview (+17.2%): 90%
Median Time to Grant: 2y 4m (~0m remaining)
PTA Risk: Moderate
Based on 404 resolved cases by this examiner. Grant probability derived from career allowance rate.