DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
Color photographs and color drawings are not accepted in utility applications unless a petition filed under 37 CFR 1.84(a)(2) is granted. Any such petition must be accompanied by the appropriate fee set forth in 37 CFR 1.17(h), one set of color drawings or color photographs, as appropriate, if submitted via the USPTO patent electronic filing system or three sets of color drawings or color photographs, as appropriate, if not submitted via the USPTO patent electronic filing system, and, unless already present, an amendment to include the following language as the first paragraph of the brief description of the drawings section of the specification:
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Color photographs will be accepted if the conditions for accepting color drawings and black and white photographs have been satisfied. See 37 CFR 1.84(b)(2).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6, 11-14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Shuster et al (U.S. Patent Application Publication 2009/0128567 A1) in view of Vasylyev (U.S. Patent Application Publication 2024/0412720 A1).
Regarding claim 1, Shuster discloses a method for in-vehicle interaction, comprising:
receiving commands (Paragraph [0045], system 400 receives incoming user commands 402 and incoming chat data 404), by an assistant system (Paragraph [0008], systems and apparatus for managing multi-user, multi-instance animation for interactive play enhance communication between participants in the animation. As shown in FIGS. 1 to 6), wherein each of the commands contains a set of person state indicators characterizing a person’s state (Paragraph [0050], the parser 412 may parse incoming text data to identify the occurrence of key words, phrases, non-verbal character combinations, or any other character strings that are defined in a database 414 ... selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression or other bodily movement) at a given time (Paragraph [0030], text or other data, chat data as used herein means data that expresses a verbal (i.e., word-based), dialogue between multiple participants in a real-time or near real-time computing process);
parsing, by the assistant system, each of the commands to obtain the set of person state indicators (Paragraph [0050], chat parser 412 may be configured to perform different functions, including a first function of identifying words, phrases, abbreviations, intonations, punctuation, or other chat data indicative of a proscribed automated animated response. In some implementations, the parser 412 may parse incoming text data to identify the occurrence of key words, phrases, non-verbal character combinations, or any other character strings that are defined in a database 414 or other suitable data structure as associated with an animation command or low-level identifier for an animation sequence. The identifying function may use fuzzy logic to identify key words or phrases as known for language filtering in chat and other editing applications, or may require an exact match ... Generally, selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression or other bodily movement);
constructing, by the assistant system, a plurality of keyframes based on the set of person state indicators (Paragraphs [0054]-[0055], the chat parser 412 or analogous process may provide animation commands or identifiers for animation sequences to a command interface process 416 ... The command interface 416 may therefore perform a process of integrating separate command streams to provide a single command stream for each avatar. Integration may include prioritization and selection of commands based on priority, adding animation sequences together, spacing initiation of animation sequences at appropriate intervals, or combinations of the foregoing);
animating, by the assistant system, the plurality of keyframes to form an animated visual presentation (Paragraphs [0056]-[0057], the animation and aggregation process 418 may function to receive animation command streams originating from the different command input processes 412, 416, and process the streams for output to remote system clients ... The host animation process 418 may also perform selection of identifiers for animation sequences, retrieving animation sequences, or both, based on incoming command data, avatar data, and environmental rules or states of the avatar environment. In the alternative, these steps may be performed at the client level, with the host process 418 operating on command data only. In addition, process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames, of an action sequence); and
displaying, by the assistant system, the animated visual presentation (Paragraph [0058], an output control process 424 may be used to direct and control output animation data 406 to each client at appropriate intervals; FIG. 6; paragraph [0066], rendered scene data may be presented on an output device of each client, for example on a display monitor or screen. Rendered output may be formatted as video output depicting each visible avatar in the scene, which is animated according to commands determined from chat data and optionally from user-specified commands).
It is noted that Shuster does not specifically disclose an assistant system. However, the claim merely recites “an assistant system” and does not set forth any elements involved in the system. In addition, Vasylyev discloses an assistant system (Paragraph [0085], FIG. 1 schematically shows an embodiment of an assistant system 2 ...; paragraph [0613], an assistant system 2 may be configured to process each input modality independently. For example, speech recognition module 302 may transcribe the user's spoken command into text, using acoustic and language models trained on home automation vocabulary and grammar ... detect the user's pointing gesture ... parses the user's typed command and extracts the relevant entities (living room, warm white, 72 degrees) and actions (set, adjust) using named entity recognition and semantic parsing techniques).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster to incorporate the teachings of Vasylyev, applying the artificial intelligence (AI) assistant system taught by Vasylyev to implement the assistant system for hosting a multiple-participant animation, in order to receive and parse the commands indicating the state of the user and then generate and output the animated frames on the display screen. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster according to the relied-upon teachings of Vasylyev to obtain the invention as specified in the claim.
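For illustration only, the following is a minimal Python sketch of the kind of chat-driven animation pipeline Shuster describes in paragraphs [0045]-[0058] (parser 412, command interface 416, animation process 418). All identifiers, table entries, and data values are hypothetical assumptions added for readability and are not code from either reference.

```python
# Hypothetical sketch of Shuster's chat-driven animation pipeline; all names
# and table contents are illustrative assumptions, not code from the reference.

# Database 414 analogue: chat strings associated with animation commands.
ANIMATION_DB = {"lol": "laughter", "hi": "handwave", ":-(": "frown"}

def parse_chat(chat_text):
    """Parser 412 analogue: identify key words or character strings that are
    defined as associated with an animation command (paragraph [0050])."""
    return [ANIMATION_DB[t] for t in chat_text.lower().split() if t in ANIMATION_DB]

def integrate_commands(user_commands, chat_commands):
    """Command interface 416 analogue: merge separate command streams into a
    single stream per avatar (paragraphs [0054]-[0055]); here, explicit user
    commands are simply given priority over chat-derived commands."""
    return user_commands + chat_commands

def animate(avatar, commands):
    """Animation process 418 analogue: apply the selected sequences to produce
    keyframe output data (paragraphs [0056]-[0057])."""
    return [{"avatar": avatar, "sequence": cmd, "keyframe": i}
            for i, cmd in enumerate(commands)]

# Output data 406 analogue: frames that would be directed to each client.
frames = animate("avatar_322", integrate_commands(["wave"], parse_chat("hi LOL")))
print(frames)
```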
Regarding claim 2, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 1), and Shuster further discloses wherein the set of person state indicators comprises a facial expression indicator for characterizing the person’s facial expression (Paragraph [0010], chat text input by each user may be uploaded and parsed by the central server. Certain words or characters may be associated with different facial expressions. For example "LOL," sometimes used in chat as an abbreviation for "laugh out loud," may be associated with a "laughter" animation sequence for the avatar), and each of the keyframes comprises a facial component (Paragraph [0039], FIG. 3 shows an enlarged view of a face 328 belonging to avatar 322, showing an angry expression, and a face 330 belonging to avatar 324 that is laughing),
wherein constructing, by the assistant system, the plurality of keyframes based on the set of person state indicators (Paragraphs [0054]-[0055], the chat parser 412 or analogous process may provide animation commands or identifiers for animation sequences to a command interface process 416 ... The command interface 416 may therefore perform a process of integrating separate command streams to provide a single command stream for each avatar. Integration may include prioritization and selection of commands based on priority, adding animation sequences together, spacing initiation of animation sequences at appropriate intervals, or combinations of the foregoing) comprises:
generating the facial component on each of the keyframes (Paragraphs [0056]-[0057], the animation and aggregation process 418 may function to receive animation command streams originating from the different command input processes 412, 416, and process the streams for output to remote system clients ... The host animation process 418 may also perform selection of identifiers for animation sequences, retrieving animation sequences, or both, based on incoming command data, avatar data, and environmental rules or states of the avatar environment. In the alternative, these steps may be performed at the client level, with the host process 418 operating on command data only. In addition, process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames, of an action sequence) based on the facial expression indicator (Paragraph [0050], selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression or other bodily movement; paragraph [0052], the chat parser or analogous process operates to detect any suitable chat data or other collected input that is indicative of a particular emotion, expression or sexual state or arousal to be conveyed by an avatar using an animated facial expression or other animated action).
Regarding claim 3, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 2), and Shuster further discloses wherein generating the facial component on each of the keyframes based on the facial expression indicator comprises:
determining a particular facial element from a set of facial elements (FIG. 4; paragraph [0050], the parser 412 may parse incoming text data to identify the occurrence of key words, phrases, non-verbal character combinations, or any other character strings that are defined in a database 414 or other suitable data structure as associated with an animation command or low-level identifier for an animation sequence ... Generally, selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression), wherein the particular facial element correlates to the facial expression indicator (Paragraph [0010], Chat text input by each user may be uploaded and parsed by the central server. Certain words or characters may be associated with different facial expressions. For example "LOL," sometimes used in chat as an abbreviation for "laugh out loud," may be associated with a "laughter" animation sequence for the avatar); and
generating the facial component using the particular facial element (FIG. 3; paragraph [0039], avatars 322, 324 may be modeled as jointed articulated figures capable of predetermined movements or animation sequences ... For example, FIG. 3 shows an enlarged view of a face 328 belonging to avatar 322, showing an angry expression, and a face 330 belonging to avatar 324 that is laughing).
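For illustration only, a minimal Python sketch of the claim 3 mapping as read on Shuster: a facial expression indicator selects a particular facial element from a stored set (cf. the "LOL"-to-"laughter" association of paragraph [0010]), and that element is used to generate the facial component of a keyframe. The dictionary contents and function names are hypothetical.

```python
# Hypothetical sketch of the claim 3 limitation as read on Shuster; the
# element set and names are invented for illustration.

FACIAL_ELEMENTS = {          # the "set of facial elements"
    "laughter": "open_mouth_smile",
    "anger": "furrowed_brow",
    "neutral": "relaxed_face",
}

def generate_facial_component(facial_expression_indicator):
    # Determine the particular facial element that correlates to the
    # indicator, falling back to a neutral element if none is stored.
    element = FACIAL_ELEMENTS.get(facial_expression_indicator,
                                  FACIAL_ELEMENTS["neutral"])
    # Generate the facial component of a keyframe using that element.
    return {"component": "face", "element": element}

print(generate_facial_component("laughter"))  # e.g., derived from chat token "LOL"
```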
Regarding claim 4, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 1), and Shuster further discloses wherein the set of person state indicators comprises a hand gesture indicator for characterizing the person’s hand gesture (FIG. 4; paragraph [0050], generally, selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression or other bodily movement. Avatar actions, for example laughing, leaping for joy, clenching a fist or other gestures may also be indicated and automatically selected; paragraph [0059], FIG. 5A shows an exemplary data table 500 for relating chat data 502 to animation command data 504 ... And a fourth entry shows that "hi" with various marks or a trailing space is related to a hand waving action), and each of the keyframes (Paragraph [0057], process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames) comprises a hand component (Paragraph [0060], FIG. 5B shows an exemplary second table 550 such as may be used at a parser or downstream process to select a particular animation sequence for a given animation command, using at least one additional selection criteria. In this example, a first column 552 contains an animation command entry "handwave" ... A first entry 558 indicates a "right-handed humanoid" avatar type. For a "handwave" command, an animation sequence identified by the first entry 560 in the third column 556 may be selected),
wherein constructing, by the assistant system, the plurality of keyframes based on the set of person state indicators (Paragraphs [0056]-[0057], the animation and aggregation process 418 may function to receive animation command streams originating from the different command input processes 412, 416, and process the streams for output to remote system clients ... The host animation process 418 may also perform selection of identifiers for animation sequences, retrieving animation sequences, or both, based on incoming command data, avatar data, and environmental rules or states of the avatar environment. In the alternative, these steps may be performed at the client level, with the host process 418 operating on command data only. In addition, process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames, of an action sequence) comprises:
generating the hand component on each of the keyframes based on the hand gesture indicator (FIG. 6; paragraph [0061], chat data received at step 604 may be parsed 610 to identify indicators of expressive content to be animated. At step 614, animation sequence data may be selected using chat-animation associations stored in an appropriate data structure 616. If no auto-animate feature is enabled 608, chat data is not parsed and user commands are used to direct character animation according to a command-driven control process 612 ...; paragraph [0065], portal output data may then be generated at step 622 ... At step 628, each client may animate and render avatars present in the scene according to the received scene control data. In the alternative, step 628 may be performed at a host level, in which case low-level scene data would be received at step 626).
It is noted that Shuster does not specifically describe the hand gesture.
In addition, Vasylyev discloses the hand gesture (FIG. 1; paragraph [0613], an assistant system 2 may be configured to process each input modality independently ... Gesture recognition module 306 interprets the user’s hand gesture as a dimming command).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster to incorporate the teachings of Vasylyev, applying the artificial intelligence (AI) assistant system taught by Vasylyev to parse the received commands indicating the hand gesture. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster according to the relied-upon teachings of Vasylyev to obtain the invention as specified in the claim.
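For illustration only, a minimal Python sketch of the two-stage lookup Shuster describes in FIGS. 5A-5B: table 500 relates chat data to an animation command, and table 550 selects a concrete animation sequence for that command using an additional criterion such as the avatar type. Entries beyond the quoted "hi"-to-handwave and "right-handed humanoid" examples are invented.

```python
# Hypothetical sketch of the FIG. 5A/5B two-stage lookup; entries beyond the
# quoted examples are invented for illustration.
from typing import Optional

TABLE_500 = {"hi": "handwave"}   # table 500 analogue: chat data -> command
TABLE_550 = {                    # table 550 analogue: (command, avatar type) -> sequence
    ("handwave", "right-handed humanoid"): "sequence_rh_wave",
    ("handwave", "left-handed humanoid"): "sequence_lh_wave",
}

def select_sequence(chat_token: str, avatar_type: str) -> Optional[str]:
    # Stage 1: relate the chat data (e.g., "hi" with a trailing space) to an
    # animation command.
    command = TABLE_500.get(chat_token.strip().lower())
    if command is None:
        return None
    # Stage 2: select the animation sequence using the avatar type criterion.
    return TABLE_550.get((command, avatar_type))

print(select_sequence("hi ", "right-handed humanoid"))  # -> 'sequence_rh_wave'
```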
Regarding claim 6, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 1), and Shuster further discloses wherein each of the keyframes comprises an accessory component, wherein constructing, by the assistant system, the plurality of keyframes based on the set of person state indicators comprises:
generating the accessory component on each of the keyframes (FIG. 4; paragraph [0057], the host animation process 418 may also perform selection of identifiers for animation sequences, retrieving animation sequences, or both, based on incoming command data, avatar data, and environmental rules or states of the avatar environment ... process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames) using a particular accessory element independently selected from a set of accessory elements (Paragraph [0053], the chat parser 412 or analogous process may perform a second function of selecting an identifier for an animation sequence, or an animation command, based on characteristics of detected chat data. The chat parser may use database 414, which may store associations between particular chat data or classes of chat data and particular animation commands or sequence identifiers. Associations between detected chat data and animation commands or sequence identifiers may vary based on a user profile or user preference for the user submitting the chat data).
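For illustration only, a minimal Python sketch of the per-user accessory selection suggested by paragraph [0053], where associations between detected chat data and animation elements may vary based on a user profile or user preference. The profile structure, element names, and values are hypothetical.

```python
# Hypothetical sketch of accessory selection that varies by user profile
# (cf. paragraph [0053]); profiles and element names are invented.

ACCESSORY_ELEMENTS = {"hat", "glasses", "scarf"}   # the "set of accessory elements"

USER_PROFILES = {
    "user_a": {"preferred_accessory": "hat"},
    "user_b": {"preferred_accessory": "glasses"},
}

def generate_accessory_component(user_id):
    # The accessory element is selected independently of the facial and hand
    # components, based on the profile of the user submitting the chat data.
    profile = USER_PROFILES.get(user_id, {})
    element = profile.get("preferred_accessory", "scarf")
    return {"component": "accessory", "element": element}

print(generate_accessory_component("user_a"))  # -> {'component': 'accessory', 'element': 'hat'}
```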
Regarding claim 11, Shuster discloses an in-vehicle interactive system comprising an assistant system (Paragraph [0008], systems and apparatus for managing multi-user, multi-instance animation for interactive play enhance communication between participants in the animation. As shown in FIGS. 1 to 6) including a screen (FIG. 2; paragraph [0034], display 226), a hardware portion (FIG. 1; paragraph [0028], servers 114 and any or all of clients 104, 106, 108 and 110; paragraph [0029], a system 200 for providing a VRU process may be considered to be comprised of server-side components (to the left of dashed line 222) and client-side components (to the right of dashed line 222)), an assistant storage device (Paragraph [0028], a computer-readable media, such as, for example, a magnetic disk (116, 118) ...), and an assistant processor (Paragraph [0029], chat processor 224), the assistant storage device storing instructions (Paragraph [0028], store executable code and data used in the performance of methods as described herein on a computer-readable media, such as, for example, a magnetic disk (116, 118)) which, when executed by the assistant processor (Paragraph [0028], servers 114 and any or all of clients 104, 106, 108 and 110 may store executable code and data used in the performance of methods as described herein on a computer-readable media, such as, for example, a magnetic disk (116, 118), optical disk, electronic memory device, or other magnetic, optical, or electronic storage media. Software and data 120 for use in performing the method may be provided to any or all client devices via a suitable communication signal for network 102; paragraph [0030], portal 220 may also interact with a chat processor 224, passing chat data from multiple clients to the chat processor, and session data from the chat processor to multiple clients ...), causes the assistant system to:
receive commands (Paragraph [0045], system 400 receives incoming user commands 402 and incoming chat data 404), wherein each of the commands contains a set of person state indicators characterizing a person’s state (Paragraph [0050], the parser 412 may parse incoming text data to identify the occurrence of key words, phrases, non-verbal character combinations, or any other character strings that are defined in a database 414 ... selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression or other bodily movement) at a given time (Paragraph [0030], text or other data, chat data as used herein means data that expresses a verbal (i.e., word-based), dialogue between multiple participants in a real-time or near real-time computing process);
parse each of the commands to obtain the set of person state indicators (Paragraph [0050], chat parser 412 may be configured to perform different functions, including a first function of identifying words, phrases, abbreviations, intonations, punctuation, or other chat data indicative of a proscribed automated animated response. In some implementations, the parser 412 may parse incoming text data to identify the occurrence of key words, phrases, non-verbal character combinations, or any other character strings that are defined in a database 414 or other suitable data structure as associated with an animation command or low-level identifier for an animation sequence. The identifying function may use fuzzy logic to identify key words or phrases as known for language filtering in chat and other editing applications, or may require an exact match ... Generally, selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression or other bodily movement);
construct a plurality of keyframes based on the set of person state indicators (Paragraphs [0054]-[0055], the chat parser 412 or analogous process may provide animation commands or identifiers for animation sequences to a command interface process 416 ... The command interface 416 may therefore perform a process of integrating separate command streams to provide a single command stream for each avatar. Integration may include prioritization and selection of commands based on priority, adding animation sequences together, spacing initiation of animation sequences at appropriate intervals, or combinations of the foregoing);
animate the plurality of keyframes to form an animated visual presentation (Paragraphs [0056]-[0057], the animation and aggregation process 418 may function to receive animation command streams originating from the different command input processes 412, 416, and process the streams for output to remote system clients ... The host animation process 418 may also perform selection of identifiers for animation sequences, retrieving animation sequences, or both, based on incoming command data, avatar data, and environmental rules or states of the avatar environment. In the alternative, these steps may be performed at the client level, with the host process 418 operating on command data only. In addition, process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames, of an action sequence); and
display the animated visual presentation on the screen of the assistant system (Paragraph [0058], an output control process 424 may be used to direct and control output animation data 406 to each client at appropriate intervals; FIG. 6; paragraph [0066], rendered scene data may be presented on an output device of each client, for example on a display monitor or screen. Rendered output may be formatted as video output depicting each visible avatar in the scene, which is animated according to commands determined from chat data and optionally from user-specified commands).
It is noted that Shuster does not specifically disclose an assistant system. However, the claim merely recites “an assistant system” with a screen, a hardware portion, a storage and a processor. In addition, Vasylyev discloses an assistant system (Paragraph [0085], FIG. 1 schematically shows an embodiment of an assistant system 2 ...; paragraph [0613], an assistant system 2 may be configured to process each input modality independently. For example, speech recognition module 302 may transcribe the user's spoken command into text, using acoustic and language models trained on home automation vocabulary and grammar ... detect the user's pointing gesture ... parses the user's typed command and extracts the relevant entities (living room, warm white, 72 degrees) and actions (set, adjust) using named entity recognition and semantic parsing techniques).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster to incorporate the teachings of Vasylyev, applying the artificial intelligence (AI) assistant system taught by Vasylyev to implement the assistant system for hosting a multiple-participant animation, in order to receive and parse the commands indicating the state of the user and then generate and output the animated frames on the display screen. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster according to the relied-upon teachings of Vasylyev to obtain the invention as specified in the claim.
Regarding claim 12, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 11), and Shuster further discloses wherein the set of person state indicators comprises a facial expression indicator for characterizing the person’s facial expression (Paragraph [0010], chat text input by each user may be uploaded and parsed by the central server. Certain words or characters may be associated with different facial expressions. For example "LOL," sometimes used in chat as an abbreviation for "laugh out loud," may be associated with a "laughter" animation sequence for the avatar), and each of the keyframes comprises a facial component (Paragraph [0039], FIG. 3 shows an enlarged view of a face 328 belonging to avatar 322, showing an angry expression, and a face 330 belonging to avatar 324 that is laughing),
wherein constructing the plurality of keyframes based on the set of person state indicators (Paragraphs [0054]-[0055], the chat parser 412 or analogous process may provide animation commands or identifiers for animation sequences to a command interface process 416 ... The command interface 416 may therefore perform a process of integrating separate command streams to provide a single command stream for each avatar. Integration may include prioritization and selection of commands based on priority, adding animation sequences together, spacing initiation of animation sequences at appropriate intervals, or combinations of the foregoing) comprises:
generating the facial component on each of the keyframes (Paragraphs [0056]-[0057], the animation and aggregation process 418 may function to receive animation command streams originating from the different command input processes 412, 416, and process the streams for output to remote system clients ... The host animation process 418 may also perform selection of identifiers for animation sequences, retrieving animation sequences, or both, based on incoming command data, avatar data, and environmental rules or states of the avatar environment. In the alternative, these steps may be performed at the client level, with the host process 418 operating on command data only. In addition, process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames, of an action sequence) based on the facial expression indicator (Paragraph [0050], selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression or other bodily movement; paragraph [0052], the chat parser or analogous process operates to detect any suitable chat data or other collected input that is indicative of a particular emotion, expression or sexual state or arousal to be conveyed by an avatar using an animated facial expression or other animated action).
Regarding claim 13, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 12), and Shuster further discloses wherein generating the facial component on each of the keyframes based on the facial expression indicator comprises:
determining a particular facial element from a set of facial elements (FIG. 4; paragraph [0050], the parser 412 may parse incoming text data to identify the occurrence of key words, phrases, non-verbal character combinations, or any other character strings that are defined in a database 414 or other suitable data structure as associated with an animation command or low-level identifier for an animation sequence ... Generally, selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression), wherein the particular facial element correlates to the facial expression indicator (Paragraph [0010], Chat text input by each user may be uploaded and parsed by the central server. Certain words or characters may be associated with different facial expressions. For example "LOL," sometimes used in chat as an abbreviation for "laugh out loud," may be associated with a "laughter" animation sequence for the avatar); and
generating the facial component using the particular facial element (FIG. 3; paragraph [0039], avatars 322, 324 may be modeled as jointed articulated figures capable of predetermined movements or animation sequences ... For example, FIG. 3 shows an enlarged view of a face 328 belonging to avatar 322, showing an angry expression, and a face 330 belonging to avatar 324 that is laughing).
Regarding claim 14, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 12), and Shuster further discloses wherein the set of person state indicators comprises a hand gesture indicator for characterizing the person’s hand gesture (FIG. 4; paragraph [0050], generally, selected textual data may be regarded as indicative of an emotional state or idea that is, in the natural world, often expressed by a facial expression or other bodily movement. Avatar actions, for example laughing, leaping for joy, clenching a fist or other gestures may also be indicated and automatically selected; paragraph [0059], FIG. 5A shows an exemplary data table 500 for relating chat data 502 to animation command data 504 ... And a fourth entry shows that "hi" with various marks or a trailing space is related to a hand waving action), and each of the keyframes (Paragraph [0057], process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames) comprises a hand component (Paragraph [0060], FIG. 5B shows an exemplary second table 550 such as may be used at a parser or downstream process to select a particular animation sequence for a given animation command, using at least one additional selection criteria. In this example, a first column 552 contains an animation command entry "handwave" ... A first entry 558 indicates a "right-handed humanoid" avatar type. For a "handwave" command, an animation sequence identified by the first entry 560 in the third column 556 may be selected), wherein constructing the plurality of keyframes based on the set of person state indicators (Paragraphs [0056]-[0057], the animation and aggregation process 418 may function to receive animation command streams originating from the different command input processes 412, 416, and process the streams for output to remote system clients ... The host animation process 418 may also perform selection of identifiers for animation sequences, retrieving animation sequences, or both, based on incoming command data, avatar data, and environmental rules or states of the avatar environment. In the alternative, these steps may be performed at the client level, with the host process 418 operating on command data only. In addition, process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames, of an action sequence) comprises:
generating the hand component on each of the keyframes based on the hand gesture indicator (FIG. 6; paragraph [0061], chat data received at step 604 may be parsed 610 to identify indicators of expressive content to be animated. At step 614, animation sequence data may be selected using chat-animation associations stored in an appropriate data structure 616. If no auto-animate feature is enabled 608, chat data is not parsed and user commands are used to direct character animation according to a command-driven control process 612 ...; paragraph [0065], portal output data may then be generated at step 622 ... At step 628, each client may animate and render avatars present in the scene according to the received scene control data. In the alternative, step 628 may be performed at a host level, in which case low-level scene data would be received at step 626).
It is noted that Shuster does not specifically describe the hand gesture.
In addition, Vasylyev discloses the hand gesture (FIG. 1; paragraph [0613], an assistant system 2 may be configured to process each input modality independently ... Gesture recognition module 306 interprets the user’s hand gesture as a dimming command).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster to incorporate the teachings of Vasylyev, applying the artificial intelligence (AI) assistant system taught by Vasylyev to parse the received commands indicating the hand gesture. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster according to the relied-upon teachings of Vasylyev to obtain the invention as specified in the claim.
Regarding claim 16, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 11), and Shuster further discloses wherein each of the keyframes comprises an accessory component, wherein constructing the plurality of keyframes based on the set of person state indicators comprises:
generating the accessory component on each of the keyframes (FIG. 4; paragraph [0057], the host animation process 418 may also perform selection of identifiers for animation sequences, retrieving animation sequences, or both, based on incoming command data, avatar data, and environmental rules or states of the avatar environment ... process 418 may apply selected animation sequences to avatar model data to prepare output data for every frame, or key frames) using a particular accessory element independently selected from a set of accessory elements (Paragraph [0053], the chat parser 412 or analogous process may perform a second function of selecting an identifier for an animation sequence, or an animation command, based on characteristics of detected chat data. The chat parser may use database 414, which may store associations between particular chat data or classes of chat data and particular animation commands or sequence identifiers. Associations between detected chat data and animation commands or sequence identifiers may vary based on a user profile or user preference for the user submitting the chat data).
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Shuster et al (U.S. Patent Application Publication 2009/0128567 A1) in view of Vasylyev (U.S. Patent Application Publication 2024/0412720 A1) and further in view of Nietfeld et al (U.S. Patent Application Publication 2018/0361234 A1).
Regarding claim 5, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 4).
However, Shuster does not specifically disclose wherein generating the hand component on each of the keyframes based on the hand gesture indicator comprises:
determining a particular hand element from a set of hand elements, wherein the particular hand element correlates to the hand gesture indicator; and
generating the hand component using the particular hand element.
In addition, Nietfeld discloses (Abstract, a controller includes a body having a handle, and an array of proximity sensors spatially distributed on, in, beneath, or near the outer surface of the handle, responsive to a proximity of a user's fingers to that outer surface. A finger tracker converts the output of the array of proximity sensors to a set of joint angles corresponding to a plurality of the user's fingers. The controller may include a renderer for processing the joint angles to deform a hand mesh that is rendered for display ...) wherein generating the hand component on each of the keyframes based on the hand gesture indicator (Paragraph [0059], as shown in FIG. 13 ... The finger tracking algorithm 1300 outputs a set of joint angles (1350), corresponding to each of a user's finger being tracked. On initialization, the raw sensor data (1315) is monitored for a specific hand gesture using gesture detection techniques (1320) ...; paragraph [0071], the finger is fully wrapped around the object when the curl is one. In practice, this is achieved by creating an animation containing one keyframe when the finger is fully extended ...) comprises:
determining a particular hand element from a set of hand elements (Paragraph [0060], each finger being tracked has a corresponding linear array of capacitive sensors aligned roughly along the length of the finger in certain embodiments), wherein the particular hand element correlates to the hand gesture indicator (FIG. 16; paragraph [0074], the user viewing this scene (1600) receive visual information corresponding to the position and orientation of the other user's fingers (and/or hand gestures)); and
generating the hand component using the particular hand element (Paragraph [0075], the user viewing the scene (1600) may receive visual representations corresponding to his or her own hand gestures by implementing finger tracking techniques in accordance with the present invention. For example, a portion of the left hand (1640) of the user viewing the scene (1600) appears in the scene with the thumb and index fingers extended).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of Nietfeld, applying the finger tracking techniques taught by Nietfeld to determine a particular hand element that correlates to the hand gesture and to provide visual representations corresponding to the hand gesture. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of Nietfeld to obtain the invention as specified in the claim.
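For illustration only, a minimal Python sketch of the finger-tracking idea relied upon from Nietfeld: per-finger proximity sensor readings are reduced to a curl value in [0, 1], which blends between a fully extended keyframe pose (curl = 0) and a fully wrapped pose (curl = 1) to yield joint angles (cf. paragraphs [0059] and [0071]). The sensor model, the joint-angle values, and the linear blend are assumptions, not the reference's algorithm.

```python
# Hypothetical sketch of finger tracking via a curl parameter; sensor model,
# joint-angle values, and the linear blend are illustrative assumptions.
EXTENDED_ANGLES = (0.0, 0.0, 0.0)     # joint angles with the finger fully extended
WRAPPED_ANGLES = (80.0, 100.0, 60.0)  # joint angles with the finger fully wrapped

def curl_from_sensors(readings):
    """Collapse one finger's proximity readings (0..1) into a curl value."""
    return max(0.0, min(1.0, sum(readings) / len(readings)))

def joint_angles(curl):
    """Blend between the extended keyframe (curl = 0) and the fully wrapped
    pose (curl = 1), in the spirit of Nietfeld's paragraph [0071]."""
    return tuple(e + curl * (w - e) for e, w in zip(EXTENDED_ANGLES, WRAPPED_ANGLES))

readings = [0.9, 0.8, 0.7]            # finger held close to the handle surface
print(joint_angles(curl_from_sensors(readings)))  # approximately (64.0, 80.0, 48.0)
```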
Regarding claim 15, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 14).
However, Shuster does not specifically disclose wherein generating the hand component on each of the keyframes based on the hand gesture indicator comprises:
determining a particular hand element from a set of hand elements, wherein the particular hand element correlates to the hand gesture indicator; and
generating the hand component using the particular hand element.
In addition, Nietfeld discloses (Abstract, a controller includes a body having a handle, and an array of proximity sensors spatially distributed on, in, beneath, or near the outer surface of the handle, responsive to a proximity of a user's fingers to that outer surface. A finger tracker converts the output of the array of proximity sensors to a set of joint angles corresponding to a plurality of the user's fingers. The controller may include a renderer for processing the joint angles to deform a hand mesh that is rendered for display ...) wherein generating the hand component on each of the keyframes based on the hand gesture indicator (Paragraph [0059], as shown in FIG. 13 ... The finger tracking algorithm 1300 outputs a set of joint angles (1350), corresponding to each of a user's finger being tracked. On initialization, the raw sensor data (1315) is monitored for a specific hand gesture using gesture detection techniques (1320) ...; paragraph [0071], the finger is fully wrapped around the object when the curl is one. In practice, this is achieved by creating an animation containing one keyframe when the finger is fully extended ...) comprises:
determining a particular hand element from a set of hand elements (Paragraph [0060], each finger being tracked has a corresponding linear array of capacitive sensors aligned roughly along the length of the finger in certain embodiments), wherein the particular hand element correlates to the hand gesture indicator (FIG. 16; paragraph [0074], the user viewing this scene (1600) receive visual information corresponding to the position and orientation of the other user's fingers (and/or hand gestures)); and
generating the hand component using the particular hand element (Paragraph [0075], the user viewing the scene (1600) may receive visual representations corresponding to his or her own hand gestures by implementing finger tracking techniques in accordance with the present invention. For example, a portion of the left hand (1640) of the user viewing the scene (1600) appears in the scene with the thumb and index fingers extended).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of Nietfeld, applying the finger tracking techniques taught by Nietfeld to determine a particular hand element that correlates to the hand gesture and to provide visual representations corresponding to the hand gesture. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of Nietfeld to obtain the invention as specified in the claim.
Claims 7-8 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Shuster et al (U.S. Patent Application Publication 2009/0128567 A1) in view of Vasylyev (U.S. Patent Application Publication 2024/0412720 A1) and further in view of SATOI et al (U.S. Patent Application Publication 2017/0231544 A1).
Regarding claim 7, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 1).
However, Shuster does not specifically disclose wherein the set of person state indicators comprises a head movement indicator for characterizing the person’s head movement, and the method further comprises:
causing physical movement of a head of the assistant system based on the head movement indicator.
In addition, SATOI discloses (FIGS. 1 and 3; paragraphs [0116]-[0117], FIG. 4A illustrates image information obtained by the light detector 140 at time T1. The calculation circuit 200 has a face recognition function, and detects whether a human face is included in the image information outputted from the light detector 140. The calculation circuit 200 determines the forehead area (area within a dashed line frame in FIG. 4A) ... FIG. 4B illustrates image information obtained by the light detector 140 at time T2. During an interval from time T1 to time T2, the test portion of the subject only moves substantially parallel to an image obtaining surface (image capture surface) of the light detector 140, and only rotates around an axis substantially perpendicular to the image obtaining surface. The distance between the light detector 140 and the test portion has not changed. Thus, the calculation circuit 200 detects a change in the position of the forehead in the obtained image, and changes the position of the forehead area for which biological information is measured) wherein the set of person state indicators comprises a head movement indicator for characterizing the person’s head movement (FIG. 6; paragraph [0124], the calculation circuit 200 detects the position of the test portion (forehead) of the subject O ...; paragraph [0131], it is presumed that the head of the subject O, specifically, the forehead as a test portion moves during measurement. Thus, the calculation circuit 200 monitors all the time whether or not the subject (particularly, the head) has moved. For instance, the calculation circuit 200 calculates a motion vector between the consecutive frame images ...), the method further comprises:
causing physical movement of a head of the assistant system (Paragraph [0207], FIG. 16 is an illustration schematically depicting a robot 500 and a conversation partner O as a subject according to the sixth embodiment. FIG. 17 is a diagram illustrating a configuration example of the robot 500; paragraph [0209], the robot 500 includes at least one motor 520 that drives each part including the head; paragraph [0208], the robot 500 can adjust the irradiation position of light by moving its head while keeping track of the movement of the subject O. Since the robot 500 faces in the direction of the subject O during a conversation, adjusting the irradiation position of light by moving the head is a natural action) based on the head movement indicator (paragraph [0131], it is presumed that the head of the subject O, specifically, the forehead as a test portion moves during measurement; paragraph [0208], keeping track of the movement of the subject O).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of SATOI, applying the information detection device taught by SATOI to provide the system with the capability of detecting the movement of the person’s head and causing physical movement of the head of a machine based on the detected motion vector of the person’s head. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of SATOI to obtain the invention as specified in the claim.
Regarding claim 8, the combination of Shuster in view of Vasylyev in view of SATOI discloses everything claimed as applied above (see claim 7).
However, Shuster does not specifically disclose wherein causing physical movement of the head of the assistant system based on the head movement indicator comprises:
determining a particular motion vector from a set of motion vectors, wherein the particular motion vector correlates to the head movement indicator; and
controlling rotation of motors mounted on the assistant system according to the particular motion vector.
In addition, SATOI discloses wherein causing physical movement of the head of the assistant system based on the head movement indicator (see claim 7) comprises:
determining a particular motion vector from a set of motion vectors (FIG. 17; paragraph [0209], the robot 500 includes at least one motor 520 that drives each part including the head ... The control circuit 510 then generates a control signal for controlling an element such as the motor 520 ...), wherein the particular motion vector correlates to the head movement indicator (FIG. 1; paragraph [0075], the biological information detection device 100 includes a light source 110, a light detector 140, and a calculation circuit 200 electrically connected to the light detector 140; paragraph [0131], it is presumed that the head of the subject O, specifically, the forehead as a test portion moves during measurement. Thus, the calculation circuit 200 monitors all the time whether or not the subject (particularly, the head) has moved. For instance, the calculation circuit 200 calculates a motion vector between the consecutive frame images. When the magnitude of the motion vector is greater than or equal to a threshold value, the calculation circuit 200 determines that the subject O has moved); and
controlling rotation of motors mounted on the assistant system according to the particular motion vector (FIG. 16; paragraph [0208], the robot 500 in this embodiment has a head that includes the same components as in the biological information detection device 100 in any of the first to third embodiments ... The robot 500 can adjust the irradiation position of light by moving its head while keeping track of the movement of the subject O. Since the robot 500 faces in the direction of the subject O during a conversation, adjusting the irradiation position of light by moving the head is a natural action).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of SATOI, applying the information detection device taught by SATOI to provide the system with the capability of detecting the movement of the person’s head and causing physical movement of the head of a machine based on the detected motion vector of the person’s head. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of SATOI to obtain the invention as specified in the claim.
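For illustration only, a minimal Python sketch of the head-tracking loop relied upon from SATOI: a motion vector of the tracked forehead is computed between consecutive frames, the subject is deemed to have moved when the magnitude of that vector meets a threshold (paragraph [0131]), and a motor command is issued to move the head accordingly (paragraphs [0208]-[0209]). The threshold value, the proportional gain, and the motor interface are assumptions, not content from the reference.

```python
# Hypothetical sketch of motion-vector-based head tracking; the threshold,
# gain, and motor interface are illustrative assumptions.
import math

MOVE_THRESHOLD = 5.0  # pixels; a threshold as in paragraph [0131], value invented

def motion_vector(prev_pos, cur_pos):
    """Motion vector of the tracked forehead between consecutive frames."""
    return (cur_pos[0] - prev_pos[0], cur_pos[1] - prev_pos[1])

def head_motor_command(prev_pos, cur_pos, gain=0.1):
    """Return a (pan, tilt) command when the subject is deemed to have moved,
    or None when the motion vector magnitude is below the threshold."""
    vx, vy = motion_vector(prev_pos, cur_pos)
    if math.hypot(vx, vy) < MOVE_THRESHOLD:
        return None  # subject treated as stationary; the head is not moved
    # Proportional control: rotate the head toward the new forehead position.
    return (gain * vx, gain * vy)

print(head_motor_command((100.0, 80.0), (112.0, 86.0)))  # approximately (1.2, 0.6)
```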
Regarding claim 17, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 11).
However, Shuster does not specifically disclose wherein the set of person state indicators comprises a head movement indicator for characterizing the person’s head movement, and wherein execution of the instructions further causes the assistant system to:
cause physical movement of the hardware portion based on the head movement indicator.
In addition, SATOI discloses (FIGS. 1 and 3; paragraphs [0116]-[0117], FIG. 4A illustrates image information obtained by the light detector 140 at time T1. The calculation circuit 200 has a face recognition function, and detects whether a human face is included in the image information outputted from the light detector 140. The calculation circuit 200 determines the forehead area (area within a dashed line frame in FIG. 4A) ... FIG. 4B illustrates image information obtained by the light detector 140 at time T2. During an interval from time T1 to time T2, the test portion of the subject only moves substantially parallel to an image obtaining surface (image capture surface) of the light detector 140, and only rotates around an axis substantially perpendicular to the image obtaining surface. The distance between the light detector 140 and the test portion has not changed. Thus, the calculation circuit 200 detects a change in the position of the forehead in the obtained image, and changes the position of the forehead area for which biological information is measured) wherein the set of person state indicators comprises a head movement indicator for characterizing the person’s head movement (FIG. 6; paragraph [0124], the calculation circuit 200 detects the position of the test portion (forehead) of the subject O ...; paragraph [0131], it is presumed that the head of the subject O, specifically, the forehead as a test portion moves during measurement. Thus, the calculation circuit 200 monitors all the time whether or not the subject (particularly, the head) has moved. For instance, the calculation circuit 200 calculates a motion vector between the consecutive frame images ...), and wherein execution of the instructions further causes the assistant system to:
cause physical movement of the hardware portion (Paragraph [0207], FIG. 16 is an illustration schematically depicting a robot 500 and a conversation partner O as a subject according to the sixth embodiment. FIG. 17 is a diagram illustrating a configuration example of the robot 500; paragraph [0209], the robot 500 includes at least one motor 520 that drives each part including the head; paragraph [0208], the robot 500 can adjust the irradiation position of light by moving its head while keeping track of the movement of the subject O. Since the robot 500 faces in the direction of the subject O during a conversation, adjusting the irradiation position of light by moving the head is a natural action) based on the head movement indicator (paragraph [0131], it is presumed that the head of the subject O, specifically, the forehead as a test portion moves during measurement; paragraph [0208], keeping track of the movement of the subject O).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of SATOI, applying the information detection device taught by SATOI to provide the system with the capability of detecting the movement of the person’s head and causing physical movement of the head of a machine based on the detected motion vector of the person’s head. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of SATOI to obtain the invention as specified in the claim.
Regarding claim 18, the combination of Shuster in view of Vasylyev in view of SATOI discloses everything claimed as applied above (see claim 17).
However, Shuster does not specifically disclose wherein causing physical movement of the hardware portion based on the head movement indicator comprises:
determining a particular motion vector from a set of motion vectors, wherein the particular motion vector correlates to the head movement indicator; and
controlling rotation of motors mounted on the hardware portion based on the particular motion vector.
In addition, SATOI discloses wherein causing physical movement of the hardware portion based on the head movement indicator (see claim 17) comprises:
determining a particular motion vector from a set of motion vectors (FIG. 17; paragraph [0209], the robot 500 includes at least one motor 520 that drives each part including the head ... The control circuit 510 then generates a control signal for controlling an element such as the motor 520 ...), wherein the particular motion vector correlates to the head movement indicator (FIG. 1; paragraph [0075], the biological information detection device 100 includes a light source 110, a light detector 140, and a calculation circuit 200 electrically connected to the light detector 140; paragraph [0131], it is presumed that the head of the subject O, specifically, the forehead as a test portion moves during measurement. Thus, the calculation circuit 200 monitors all the time whether or not the subject (particularly, the head) has moved. For instance, the calculation circuit 200 calculates a motion vector between the consecutive frame images. When the magnitude of the motion vector is greater than or equal to a threshold value, the calculation circuit 200 determines that the subject O has moved); and
controlling rotation of motors mounted on the hardware portion based on the particular motion vector (FIG. 16; paragraph [0208], the robot 500 in this embodiment has a head that includes the same components as in the biological information detection device 100 in any of the first to third embodiments ... The robot 500 can adjust the irradiation position of light by moving its head while keeping track of the movement of the subject O. Since the robot 500 faces in the direction of the subject O during a conversation, adjusting the irradiation position of light by moving the head is a natural action).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of SATOI, applying the biological information detection device taught by SATOI to provide the system with the capability of detecting the movement of the person's head and causing physical movement of the head of a machine based on the detected motion vector of the person's head. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of SATOI to obtain the invention as specified in the claim.
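For illustration of the mechanism relied upon above, and not as a characterization of any reference of record: the following Python sketch shows one way the cited logic could be arranged, computing a motion vector between consecutive frames, comparing its magnitude against a threshold (cf. SATOI paragraph [0131]), and rotating head-mounted motors to follow the subject (cf. SATOI paragraphs [0208]-[0209]). All identifiers and numeric values (estimate_motion_vector, set_motor_angles, the 5.0-pixel threshold, the 0.1 gain) are hypothetical and appear in none of the cited references.

import numpy as np

MOTION_THRESHOLD = 5.0  # hypothetical threshold in pixels; SATOI specifies no value

def estimate_motion_vector(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Estimate a single (dx, dy) motion vector between consecutive frames
    from the shift of their intensity centroids; a crude stand-in for the
    frame-to-frame comparison attributed to calculation circuit 200."""
    def centroid(img: np.ndarray) -> np.ndarray:
        img = img.astype(float)
        total = img.sum() + 1e-9          # avoid division by zero on blank frames
        ys, xs = np.indices(img.shape)    # row and column index grids
        return np.array([(xs * img).sum() / total, (ys * img).sum() / total])
    return centroid(curr_frame) - centroid(prev_frame)

def update_head_motors(prev_frame, curr_frame, set_motor_angles):
    """If the subject's head moved at least the threshold amount, rotate the
    head motors to keep tracking (cf. robot 500 and motor 520)."""
    v = estimate_motion_vector(prev_frame, curr_frame)
    if np.linalg.norm(v) >= MOTION_THRESHOLD:  # "the subject O has moved"
        pan, tilt = 0.1 * v[0], 0.1 * v[1]     # hypothetical pixel-to-degree gain
        set_motor_angles(pan, tilt)            # hypothetical motor interface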
Claims 9-10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shuster et al (U.S. Patent Application Publication 2009/0128567 A1) in view of Vasylyev (U.S. Patent Application Publication 2024/0412720 A1) in view of KATZ (U.S. Patent Application Publication 2024/0362931 A1).
Regarding claim 9, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 1). Shuster further discloses:
sending, by the control system (FIG. 2; paragraph [0029], a system 200 for providing a VRU process may be considered to be comprised of server-side components (to the left of dashed line 222) and client-side components (to the right of dashed line 222) ...; paragraph [0030], portal 220 may also interact with a chat processor 224, passing chat data from multiple clients to the chat processor ...; paragraph [0034], at the client level, a player interface module 224 may be installed to receive player inputs from one or more user input devices 228, such as a keyboard, mouse or other pointer, or microphone, and provide data to the VRU engine 218 via portal 222 in response to the input), commands to the assistant system (FIG. 4; paragraph [0045], system 400 receives incoming user commands 402 and incoming chat data 404), wherein each of the commands contains the set of person state indicators (Paragraph [0050], the parser 412 may parse incoming text data to identify the occurrence of key words, phrases, non-verbal character combinations, or any other character strings that are defined in a database 414 ... selected textual data may be regarded as indicative of an emotional state or idea that is, is the natural world, often expressed by a facial expression or other bodily movement).
However, Shuster does not specifically disclose:
receiving, by a control system, a plurality of images of the person, wherein each of the plurality of images comprises visual information regarding the person’s state;
processing, by the control system, each of the plurality of images to obtain the set of state indicators characterizing the person’s states.
In addition, KATZ discloses:
receiving, by a control system (Paragraph [0005], systems for determining driver control over a vehicle ...), a plurality of images of the person (FIG. 12; paragraph [0062], at step 1202, at which at least one processor of the system receives, from at least one sensor in a vehicle, first information associated with an interior area of the vehicle. For example, the at least one sensor may be at least one image sensor such as at least one camera in the vehicle ...; FIG. 11; paragraph [0238], camera 1100 may face either further toward the driver, further away from the driver, further upward, further downward relative to the driver, or a combination thereof, resulting in a camera angle or perspective that is different than what the one or more processors expect in the captured image information ...), wherein each of the plurality of images comprises visual information regarding the person’s state (Paragraph [0062], the information may be image information associated with a position of the driver's hand(s) on a steering wheel of the vehicle or a relative position of the driver's hand(s) to the steering wheel);
processing, by the control system, each of the plurality of images to obtain the set of state indicators characterizing the person’s states (Paragraph [0062], at step 1204, the processor may be configured to detect, using the received first information, at least one location of the driver's hand ... At step 1206, based on the received first information, the processor may be configured to determine a level of control of the driver over the vehicle ... At step 1208, the processor may be configured to generate a message or command based on the determined level of control).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of KATZ, applying the system for determining driver control over a vehicle taught by KATZ to provide a camera for capturing images of a person in the vehicle and to determine the person's state based on the captured images in order to generate the commands. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of KATZ to obtain the invention as specified in the claim.
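For illustration of the mapping relied upon above, and not as a characterization of any reference of record: the following Python sketch shows one possible arrangement of the cited steps, receiving a plurality of in-cabin images and deriving a set of numeric person state indicators from each (cf. KATZ paragraph [0062], steps 1202-1208). The class, function, and field names are hypothetical, and the detector is a stand-in for whatever vision model scores each cue.

from dataclasses import dataclass

@dataclass
class PersonStateIndicators:
    """Hypothetical container for the claimed set of person state indicators."""
    facial_expression: float
    hand_gesture: float
    head_movement: float

def process_images(images, detector):
    """Derive one indicator set per image; 'detector' is assumed to return a
    dict mapping cue names to numeric scores (step 1204 analog)."""
    states = []
    for img in images:
        scores = detector(img)
        states.append(PersonStateIndicators(
            facial_expression=scores.get("facial_expression", 0.0),
            hand_gesture=scores.get("hand_gesture", 0.0),
            head_movement=scores.get("head_movement", 0.0),
        ))
    return states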
Regarding claim 10, the combination of Shuster in view of Vasylyev in view of KATZ discloses everything claimed as applied above (see claim 9).
However, Shuster does not specifically disclose wherein the set of person state indicators includes a facial expression indicator, a hand gesture indicator, a head movement indicator, or any combination thereof, the method further comprises:
storing a predetermined numeric threshold corresponding to each of the person state indicators in the set of person state indicators; and
determining that at least one person state indicator in the set of person state indicators has a numeric value that equals to or is greater than the predetermined numeric threshold corresponding to the at least one person state indicator.
In addition, KATZ discloses wherein the set of person state indicators includes a facial expression indicator, a hand gesture indicator, a head movement indicator, or any combination thereof (Paragraph [0036], a driver monitoring system (DMS) may be configured to monitor driver behavior. DMS may comprise a system that tracks the driver and acts accordingly to the driver's detected state, physical condition, emotional condition, cognitive load, actions, behaviors, driving performance, attentiveness, alertness, drowsiness ... DMS may include modules that detect or predict gestures, motion, body posture, features associated with user alertness, driver alertness, fatigue, attentiveness to the road, distraction, features associated with expressions or emotions of a user, features associated with gaze direction of a user, driver or passenger, showing signs of sudden sickness, or the like.), the method further comprises:
storing a predetermined numeric threshold corresponding to each of the person state indicators in the set of person state indicators (Paragraph [0106], The machine learning algorithms may also configure the processor to predict that the driver's gaze will shift back towards the road after a second period of time after the driver's gaze has shifted towards the object. The first, and/or second period of time may be values saved in the memory, values that were detected in previous similar event of that driver, or values that represent a statistical value. As a non-limiting example, when a driver begins a gesture toward a multimedia device (such as changing a radio station or selecting an audio track), the processor may predict that the driver's gaze will shift downward and to the side toward the multimedia device for 2 seconds, and then will shift back to the road after another 600 milliseconds ...); and
determining that at least one person state indicator in the set of person state indicators has a numeric value that equals to or is greater than the predetermined numeric threshold corresponding to the at least one person state indicator (Paragraph [0109], the processor may be configured to generate an audible or visual message after detecting that the driver's gaze has shifted towards an object for a period of time greater than a predetermined threshold).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of KATZ, applying the system for determining driver control over a vehicle taught by KATZ to store a predetermined threshold for determining the driver's level of control in order to generate the command based on the person's state. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of KATZ to obtain the invention as specified in the claim.
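For illustration of the threshold logic relied upon above, and not as a characterization of any reference of record: the following Python sketch stores a predetermined numeric threshold per indicator and reports whether at least one indicator equals or exceeds its threshold (cf. KATZ paragraph [0109], where a message issues once a gaze shift exceeds a predetermined duration). The dictionary keys and threshold values are hypothetical.

# Hypothetical thresholds; neither the claims nor KATZ fix concrete values.
THRESHOLDS = {
    "facial_expression": 0.8,
    "hand_gesture": 0.6,
    "head_movement": 0.7,
}

def any_indicator_triggered(indicators, thresholds=THRESHOLDS):
    """Return True if at least one person state indicator has a numeric value
    equal to or greater than its stored predetermined threshold."""
    return any(indicators.get(name, 0.0) >= limit
               for name, limit in thresholds.items())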
Regarding claim 19, the combination of Shuster in view of Vasylyev discloses everything claimed as applied above (see claim 11). Shuster further discloses a control system communicatively coupled with the assistant system, the control system comprising a control storage device and a control processor, the control storage device storing instructions which, when executed by the control processor (FIG. 2; paragraph [0029], a system 200 for providing a VRU process may be considered to be comprised of server-side components (to the left of dashed line 222) and client-side components (to the right of dashed line 222) ...; paragraph [0030], portal 220 may also interact with a chat processor 224, passing chat data from multiple clients to the chat processor ...; paragraph [0034], at the client level, a player interface module 224 may be installed to receive player inputs from one or more user input devices 228, such as a keyboard, mouse or other pointer, or microphone, and provide data to the VRU engine 218 via portal 222 in response to the input), cause the control system to:
send commands to the assistant system (FIG. 4; paragraph [0045], system 400 receives incoming user commands 402 and incoming chat data 404), wherein each of the commands contains the set of person state indicators (Paragraph [0050], the parser 412 may parse incoming text data to identify the occurrence of key words, phrases, non-verbal character combinations, or any other character strings that are defined in a database 414 ... selected textual data may be regarded as indicative of an emotional state or idea that is, is the natural world, often expressed by a facial expression or other bodily movement).
However, Shuster does not specifically disclose: receive a plurality of images of the person, wherein each of the plurality of images comprises visual information regarding the person’s state;
process each of the plurality of images to obtain the set of person state indicators characterizing the person’s states.
In addition, KATZ discloses (Paragraph [0005], systems for determining driver control over a vehicle ...) receive a plurality of images of the person (FIG. 12; paragraph [0062], at step 1202, at which at least one processor of the system receives, from at least one sensor in a vehicle, first information associated with an interior area of the vehicle. For example, the at least one sensor may be at least one image sensor such as at least one camera in the vehicle ...; FIG. 11; paragraph [0238], camera 1100 may face either further toward the driver, further away from the driver, further upward, further downward relative to the driver, or a combination thereof, resulting in a camera angle or perspective that is different than what the one or more processors expect in the captured image information ...), wherein each of the plurality of images comprises visual information regarding the person’s state (Paragraph [0062], the information may be image information associated with a position of the driver's hand(s) on a steering wheel of the vehicle or a relative position of the driver's hand(s) to the steering wheel);
process each of the plurality of images to obtain the set of person state indicators characterizing the person’s states (Paragraph [0062], at step 1204, the processor may be configured to detect, using the received first information, at least one location of the driver's hand ... At step 1206, based on the received first information, the processor may be configured to determine a level of control of the driver over the vehicle ... At step 1208, the processor may be configured to generate a message or command based on the determined level of control).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of KATZ, applying the system for determining driver control over a vehicle taught by KATZ to provide a camera for capturing images of a person in the vehicle and to determine the person's state based on the captured images in order to generate the commands. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of KATZ to obtain the invention as specified in the claim.
Regarding claim 20, the combination of Shuster in view of Vasylyev in view of KATZ discloses everything claimed as applied above (see claim 19).
However, Shuster does not specifically disclose wherein the set of person state indicators includes a facial expression indicator, a hand gesture indicator, a head movement indicator, or any combination thereof, and wherein execution of the instructions further causes the control system to:
store a predetermined numeric threshold corresponding to each of the person state indicators in the set of person state indicators; and
determine that at least one person state indicator in the set of person state indicators has a numeric value that equals to or is greater than the predetermined numeric threshold corresponding to the at least one person state indicator.
In addition, KATZ discloses wherein the set of person state indicators includes a facial expression indicator, a hand gesture indicator, a head movement indicator, or any combination thereof (Paragraph [0036], a driver monitoring system (DMS) may be configured to monitor driver behavior. DMS may comprise a system that tracks the driver and acts accordingly to the driver's detected state, physical condition, emotional condition, cognitive load, actions, behaviors, driving performance, attentiveness, alertness, drowsiness ... DMS may include modules that detect or predict gestures, motion, body posture, features associated with user alertness, driver alertness, fatigue, attentiveness to the road, distraction, features associated with expressions or emotions of a user, features associated with gaze direction of a user, driver or passenger, showing signs of sudden sickness, or the like.), and wherein execution of the instructions further causes the control system to:
store a predetermined numeric threshold corresponding to each of the person state indicators in the set of person state indicators (Paragraph [0106], The machine learning algorithms may also configure the processor to predict that the driver's gaze will shift back towards the road after a second period of time after the driver's gaze has shifted towards the object. The first, and/or second period of time may be values saved in the memory, values that were detected in previous similar event of that driver, or values that represent a statistical value. As a non-limiting example, when a driver begins a gesture toward a multimedia device (such as changing a radio station or selecting an audio track), the processor may predict that the driver's gaze will shift downward and to the side toward the multimedia device for 2 seconds, and then will shift back to the road after another 600 milliseconds ...); and
determine that at least one person state indicator in the set of person state indicators has a numeric value that equals to or is greater than the predetermined numeric threshold corresponding to the at least one person state indicator (Paragraph [0109], the processor may be configured to generate an audible or visual message after detecting that the driver's gaze has shifted towards an object for a period of time greater than a predetermined threshold).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system for hosting a multiple-participant animation taught by Shuster in view of Vasylyev to incorporate the teachings of KATZ, applying the system for determining driver control over a vehicle taught by KATZ to store a predetermined threshold for determining the driver's level of control in order to generate the command based on the person's state. Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Shuster in view of Vasylyev according to the relied-upon teachings of KATZ to obtain the invention as specified in the claim.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Xilin Guo whose telephone number is (571)272-5786. The examiner can normally be reached Monday - Friday 9:00 AM-5:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Hajnik, can be reached at 571-272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XILIN GUO/Primary Examiner, Art Unit 2616