Last updated: May 29, 2026
Application No. 18/211,001
INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM

Non-Final OA §103
Filed: Jun 16, 2023
Priority: Jun 17, 2022 (JP 2022-097871)
Examiner: JONES, CARISSA ANNE
Art Unit: 2691
Tech Center: 2600 (Communications)
Assignee: Gree, Inc.
OA Round: 4 (Non-Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has an 85% grant rate with a +23.5% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 26 resolved cases, 2023–2026
Examiner Intelligence

JONES, CARISSA ANNE
Career Allowance Rate: 85% (22 granted / 26 resolved), above average
+22.6% vs TC avg
Interview Lift: +23.5% (resolved cases with interview)
Avg Prosecution: 2y 7m (typical timeline)
19 currently pending
Statute-Specific Performance

§101: 0.9% (-39.1% vs TC avg)
§103: 95.5% (+55.5% vs TC avg)
§102: 0.9% (-39.1% vs TC avg)
Based on career data from 26 resolved cases
Office Action

DETAILED ACTION
This action is in response to the remarks filed 04/06/2026. Claims 1 – 12 and 16 – 19 are pending and have been examined.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claims 1 – 12 and 16 – 19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Response to Amendment
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 4 and 17 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Sepulveda et al. (U.S. Pub. No. 2022/0392132, hereinafter “Sepulveda”) in view of Marlow et al. (U.S. Pub. No. 2017/0332044, hereinafter “Marlow”), Sheaffer et al. (U.S. Pub. No. 2020/0105291, hereinafter “Sheaffer”) and Grossinger et al. (U.S. Patent No. 11,276,215, hereinafter “Grossinger”).
Regarding Claim 1, Sepulveda teaches 
An information processing system (see Sepulveda Paragraph [0006], computer system that is in communication with a display generation component and one or more input devices) comprising:
one or more processors (see Sepulveda Paragraph [0007], one or more programs configured to be executed by one or more processors) programmed to:
receive information for generating a video, including information related to movement of a user, information related to sound, and information related to a character object, that is sent from a user terminal of a user (see Sepulveda Paragraph [0176] and Figure 5B, personal electronic device has I/O and input mechanism 508 is, optionally, a microphone, in some examples. Personal electronic device 500 optionally includes various sensors, such as GPS sensor 532, accelerometer 534, directional sensor 540 (e.g., compass), gyroscope 536, motion sensor 538, and/or a combination thereof, all of which can be operatively connected to I/O section 514, Paragraph [0026], the animations of the avatar [video] provide the user with visual feedback about what inputs are being received at the device and about the state of the device);
execute a video chat between a plurality of users using character objects, based on the received information for generating the video (see Sepulveda Paragraph [0096], video conference module 139 includes executable instructions to initiate, conduct, and terminate a video conference between a user and one or more other participants in accordance with user instructions, Paragraph [0035], The device typically supports a variety of applications, such as one or more of the following: a video conferencing application, and Paragraph [0026], the animations of the avatar provide the user with visual feedback about what inputs are being received at the device and about the state of the device);
specify a state of the user terminal (see Sepulveda Paragraph [0223], Accordingly, the animations of the first avatar are based on the user input and different user inputs cause different animations of the first avatar. In some embodiments, the animation includes smooth transitions among multiple poses and/or the animation includes animated movement of the first avatar to transition among the multiple poses. In some embodiments, the second avatar transitions (concurrently with the first avatar) through a plurality of poses (e.g., different from the first avatar). Animating the first avatar differently based on the received input provides the user with visual feedback that the input has been received and about the effect of the input on the state of the computer system, which provides improved visual feedback, Paragraph [0221], Changing a shape and/or rotation of the first avatar provides the user with visual feedback that the system is active and that inputs will be processed. Further, inputs that affect the animation of the first avatar provides the user with visual feedback that the input has been received and about the effect of the input on the state of the computer system, which provides improved visual feedback);

Sepulveda does not expressly teach
change a display of the character object in the video chat corresponding to the user terminal according to the specified state of the user terminal, wherein
when a volume of a sound other than speaking by the user included in the received information related to the sound is greater than or equal to a first value, the one or more processors specify that the user terminal is in a fourth state,
when the one or more processors specify that the user terminal is in the fourth state, as a change in the display of the character object, the one or more processors attach a fourth specific object to the character object and/or apply a fourth specific movement to the character object,
the fourth specific object is an object to indicate that the character object feels that sound of the video chat is difficult to hear, and
the fourth specific movement is a movement to indicate that the character object feels that the sound of the video chat is difficult to hear.

However, Marlow teaches
change a display of the character object in the video chat corresponding to the user terminal according to the specified state of the user terminal (see Marlow Paragraph [0022], In video conferencing systems, the participants may be participating in the video conference in different ways. For example, some users may be participating on a mobile phone with a zoomed view on their face via the phone camera. For such users, sometimes the communication connection may be unstable between the mobile phone and the devices of the other user, which may result in the portrait of the user captured from the phone camera being used as a static avatar. Sometimes, the phone camera is disabled to increase throughput of the mobile phone. To the video conferencing system and the other participants, the static avatar or disabled video may be interpreted as the user not being a participant in the meeting, even though the user may have technical reasons that prevent the video streaming. To address this misinterpretation, the example implementations detect the situations where the static avatar or disabled video may occur and the user does not have a full video stream, and replace the video with a generated animated avatar to indicate that the user is an active participant in the video conference, and Paragraph [0071], the apparatus can detect an interruption in the video stream from the user device and then select the animated avatar based on the connection of the user device. For example, if the user device is still connected to the video conference but the video stream connection is inconsistent, then an animated avatar can be selected to indicate that the user is still active as indicated in FIG. 1, therefore the avatar of the user can be created or changed according to the state of the user terminal, such as an unstable connection),

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a custom video conference avatar generated using user terminal data and specifying a state of the user terminal (as taught in Sepulveda), with changing a display of the character object in the video chat corresponding to the user terminal according to the specified state of the user terminal (as taught in Marlow), the motivation being to provide a visual cue in a video conference that quickly identifies the status of a user to express and display to other users (see Marlow Paragraphs [0002] – [0003]).
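For illustration only, the avatar selection described in the quoted Marlow passages (Paragraphs [0022] and [0071]) can be sketched in Python as follows. This is a minimal sketch and not Marlow's implementation: the function name, the boolean inputs, and the returned labels are hypothetical, and the handling of a fully disconnected device is an assumption because the quoted passages do not address it.

```python
def select_participant_display(connected: bool,
                               video_enabled: bool,
                               stream_stable: bool) -> str:
    """Pick what other participants see for one user terminal.

    All names and labels here are hypothetical illustrations.
    """
    if not connected:
        # Assumption: the quoted passages do not cover a dropped device.
        return "absent"
    if video_enabled and stream_stable:
        # Full video stream is available, so show it unchanged.
        return "live_video"
    # Still connected, but the camera is disabled or the stream is
    # inconsistent: show an animated avatar to signal active participation.
    return "animated_avatar"
```

Under these assumptions, a terminal that stays connected with an unstable video stream maps to the animated avatar, which corresponds to the "state of the user terminal" reading applied in the rejection.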

Sepulveda in view of Marlow does not expressly teach
when a volume of a sound other than speaking by the user included in the received information related to the sound is greater than or equal to a first value, the one or more processors specify that the user terminal is in a fourth state,
when the one or more processors specify that the user terminal is in the fourth state, as a change in the display of the character object, the one or more processors attach a fourth specific object to the character object and/or apply a fourth specific movement to the character object,
the fourth specific object is an object to indicate that the character object feels that sound of the video chat is difficult to hear, and
the fourth specific movement is a movement to indicate that the character object feels that the sound of the video chat is difficult to hear.

However, Sheaffer teaches
when a volume of a sound other than speaking by the user included in the received information related to the sound is greater than or equal to a first value, the one or more processors specify that the user terminal is in a fourth state (see Sheaffer Paragraph [0075], The audio analytics module 510 may include a level-distortion detector 520. The level-distortion detector 520 may measure the input level of the audio data. When a microphone is too close to a sound source, the audio input level may be too high. When a microphone is too far away from a sound source, the audio input level may be too low relative to a background noise. The level-distortion detector 520 may compare the input level to one or more threshold levels. When the input level exceeds or falls below a threshold, the level distortion detector 520 may identify a distortion impairment in the audio input. For example, the level-distortion detector 520 may output a detection signal that indicates that the input level is too high or that the input level is too low. The detection signal may also indicate by how much the input level is above or below the threshold),
when the one or more processors specify that the user terminal is in the fourth state, as a change in the display of the character object, the one or more processors attach a fourth specific object to the character object and/or apply a fourth specific movement to the character object (see Sheaffer Figure 12B, textual message object and alert icon are displayed stating that speech is not intelligible, and Paragraph [0006], when executed by the processor, cause the apparatus to emit a user-perceptible alert responsive to the measure of perceptual sound quality passing the corresponding threshold sound quality in real-time; and modify the user-perceptible alert when the measure of perceptual sound quality changes),
the fourth specific object is an object to indicate that the character object feels that sound of the video chat is difficult to hear (see Sheaffer Figure 12B, textual message object and alert icon are displayed stating that speech is not intelligible),

It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a custom video conference avatar generated using user terminal data and specifying a state of the user terminal in a video chat (as taught in Sepulveda in view of Marlow), with determining that a sound other than a user speaking reaches a certain level, in which a system determines that a user’s device is in a specific state and in response, changes a character object displayed on a screen showing that the sound is hard to hear (as taught in Sheaffer), the motivation being to implement corrective measures/notices to improve quality of audio (see Sheaffer Paragraphs [0050] and [0091]).
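For illustration only, the threshold comparison described in the quoted Sheaffer passage (Paragraph [0075]) and the claimed "fourth state" condition can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of Sheaffer or of the application: the decibel thresholds, the class and function names, and the `hard_to_hear_icon` label are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LevelDetection:
    """Hypothetical detection signal: direction and distance past the threshold."""
    direction: str      # "too_high" or "too_low"
    margin_db: float    # how far the level is above or below the threshold


def detect_level_distortion(level_db: float,
                            low_db: float = -40.0,
                            high_db: float = -3.0) -> Optional[LevelDetection]:
    """Compare an input level with two thresholds (default values are assumptions)."""
    if level_db > high_db:
        return LevelDetection("too_high", level_db - high_db)
    if level_db < low_db:
        return LevelDetection("too_low", low_db - level_db)
    return None  # level is within the acceptable range


def specify_fourth_state(non_speech_db: float, first_value_db: float) -> bool:
    """Claim 1 condition: non-speech volume is greater than or equal to a first value."""
    return non_speech_db >= first_value_db


def character_display(fourth_state: bool) -> dict:
    """Attach a hypothetical 'hard to hear' object when the fourth state is specified."""
    display = {"overlay": None, "movement": "default"}
    if fourth_state:
        display["overlay"] = "hard_to_hear_icon"
    return display
```

The sketch keeps the detector and the state decision separate because the quoted Sheaffer passage measures the overall input level, while the claim language conditions the fourth state on the volume of sound other than the user's speech.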

Sepulveda in view of Marlow and Sheaffer teaches indicating difficulty hearing, but does not teach an avatar movement that reflects audio.

However, Grossinger teaches avatar control using captured audio signals (see Grossinger Column 1, line 15).

It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a custom video conference avatar generated using user terminal data and specifying a state of the user terminal in a video chat, in which a sound other than a user may reach a certain level, in which a system determines that a user’s device is in a specific state and in response, changes a character object displayed on a screen showing that the sound is hard to hear (as taught in Sepulveda in view of Marlow and Sheaffer), with avatar control using captured audio signals (as taught in Grossinger), the motivation being to additionally provide a visual signal to other users to reflect the audio, which improves the flow of the conversation and prevents disruptions (see Grossinger Abstract and Column 1, lines 40 – 45).

Regarding Claim 2, Sepulveda in view of Marlow, Sheaffer and Grossinger teaches 
The information processing system according to claim 1, wherein
when the one or more processors have not received the information related to the movement of the user from the user terminal, or when the received information related to the movement of the user satisfies a first condition, the one or more processors specify that the user terminal is in a first state (see Sepulveda Paragraph [0233], In some embodiments, (e.g., while the computer system is in the locked state) in response to a determination that user input has not been received (at the computer system) for a first predetermined period of time, the computer system (e.g., 600) displays, via the display generation component, an inactivity animation (e.g., as shown in FIG. 6F) of the first avatar (e.g., 610) (and, optionally, a second avatar (e.g., 612 or 614)) (e.g., an animation indicating inactivity for the first predetermined period of time has elapsed, an animation of the first (and/or second) avatar sleeping (e.g., eyes closed), and/or an animation of the first (and/or second) avatar being bored). In some embodiments, in response to a determination that user input has not been received for a predetermined period of time that is longer than the first predetermined period of time, changing (e.g., subsequent to displaying the inactivity animation of the first avatar) a display state of the display generation component (e.g., dimming the display and/or turning off the display). Animating the first avatar to indicate that no activity has been detected for the first predetermined period of time provides the user with visual feedback about the state of the computer system and that no inputs have been received. The animation of the first avatar that no activity has been detected for the first predetermined period of time also provides the user with a warning that the computer system will take further action (e.g., dim or turn off display) if no activity continues to be detected, which provides improved visual feedback).

Regarding Claim 3, Sepulveda in view of Marlow, Sheaffer and Grossinger teaches
The information processing system according to claim 2, wherein
the first condition is that the one or more processors continue to receive information related to a same movement for a predetermined period of time, or do not receive, for a predetermined period of time, information related to an amount of change in movement that is sent only when the movement changes (see Sepulveda Paragraph [0233], In some embodiments, (e.g., while the computer system is in the locked state) in response to a determination that user input has not been received (at the computer system) for a first predetermined period of time, the computer system (e.g., 600) displays, via the display generation component, an inactivity animation (e.g., as shown in FIG. 6F) of the first avatar (e.g., 610) (and, optionally, a second avatar (e.g., 612 or 614)) (e.g., an animation indicating inactivity for the first predetermined period of time has elapsed, an animation of the first (and/or second) avatar sleeping (e.g., eyes closed), and/or an animation of the first (and/or second) avatar being bored). In some embodiments, in response to a determination that user input has not been received for a predetermined period of time that is longer than the first predetermined period of time, changing (e.g., subsequent to displaying the inactivity animation of the first avatar) a display state of the display generation component (e.g., dimming the display and/or turning off the display). Animating the first avatar to indicate that no activity has been detected for the first predetermined period of time provides the user with visual feedback about the state of the computer system and that no inputs have been received. The animation of the first avatar that no activity has been detected for the first predetermined period of time also provides the user with a warning that the computer system will take further action (e.g., dim or turn off display) if no activity continues to be detected, which provides improved visual feedback).

Regarding Claim 4, Sepulveda in view of Marlow, Sheaffer and Grossinger teaches
The information processing system according to claim 2, wherein
when the one or more processors specify that the user terminal is in the first state, as a change in the display of the character object, the one or more processors attach a first specific object to the character object and/or apply a first specific movement to the character object (see Sepulveda Figure 6F, avatar representations of users are shown “sleeping”, and Paragraph [0233], In some embodiments, (e.g., while the computer system is in the locked state) in response to a determination that user input has not been received (at the computer system) for a first predetermined period of time, the computer system (e.g., 600) displays, via the display generation component, an inactivity animation (e.g., as shown in FIG. 6F) of the first avatar (e.g., 610) (and, optionally, a second avatar (e.g., 612 or 614)) (e.g., an animation indicating inactivity for the first predetermined period of time has elapsed, an animation of the first (and/or second) avatar sleeping (e.g., eyes closed), and/or an animation of the first (and/or second) avatar being bored). In some embodiments, in response to a determination that user input has not been received for a predetermined period of time that is longer than the first predetermined period of time, changing (e.g., subsequent to displaying the inactivity animation of the first avatar) a display state of the display generation component (e.g., dimming the display and/or turning off the display). Animating the first avatar to indicate that no activity has been detected for the first predetermined period of time provides the user with visual feedback about the state of the computer system and that no inputs have been received. The animation of the first avatar that no activity has been detected for the first predetermined period of time also provides the user with a warning that the computer system will take further action (e.g., dim or turn off display) if no activity continues to be detected, which provides improved visual feedback).
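For illustration only, the two-stage inactivity behavior described in the quoted Sepulveda passage (Paragraph [0233]) can be sketched in Python as follows. This is a minimal sketch, not Sepulveda's implementation: the function name, the stage labels, and the default periods of 30 and 60 seconds are hypothetical, since the quoted passage refers only to a "first predetermined period of time" and a longer period.

```python
def inactivity_stage(seconds_since_input: float,
                     first_period: float = 30.0,
                     second_period: float = 60.0) -> str:
    """Map time without user input to a display stage (periods are assumptions)."""
    if second_period <= first_period:
        raise ValueError("second_period must be longer than first_period")
    if seconds_since_input >= second_period:
        # Longer period elapsed: change the display state (e.g., dim or turn off).
        return "dim_display"
    if seconds_since_input >= first_period:
        # First period elapsed: show the inactivity (e.g., sleeping) animation.
        return "inactivity_animation"
    return "active"
```

Under these assumptions, the "inactivity_animation" stage corresponds to the first state and first specific movement that the rejection maps to Figure 6F.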

Regarding Claim 17, it is rejected similarly to Claim 1. The method can be found in Sepulveda (Paragraph [0006], method).

Regarding Claim 18, it is rejected similarly to Claim 1. The device can be found in Sepulveda (Figure 1A, device).

Claims 5 – 7 and 9 – 10 are rejected under 35 U.S.C. 103 as being unpatentable over Sepulveda et al. (U.S. Pub. No. 2022/0392132, hereinafter “Sepulveda”) in view of Marlow et al. (U.S. Pub. No. 2017/0332044, hereinafter “Marlow”), Sheaffer et al. (U.S. Pub. No. 2020/0105291, hereinafter “Sheaffer”), Grossinger et al. (U.S. Patent No. 11,276,215, hereinafter “Grossinger”) and Binder et al. (U.S. Patent No. 12,267,623, hereinafter "Binder").
Regarding Claim 5, Sepulveda in view of Marlow, Sheaffer and Grossinger teaches all the limitations of claim 4, but does not expressly teach
The information processing system according to claim 4, wherein
the first specific object is an object to indicate that the character object is not looking screen of the video chat, and the first specific movement is a movement to indicate that the character object is not looking at the screen of the video chat.

However, Binder teaches
The information processing system according to claim 4, wherein
the first specific object is an object to indicate that the character object is not looking screen of the video chat, and the first specific movement is a movement to indicate that the character object is not looking at the screen of the video chat (see Binder Column 37, lines 51 – 66, In some embodiments, rendering module 712 prevents device 604 from rendering certain types of visual features of avatar 800. In some embodiments, rendering module 712 predetermines such types of visual features. An example of such type of visual feature includes a predetermined type of pose of avatar 800. For example, rendering module 712 determines whether the pose data represents the predetermined type of pose, e.g., a pose corresponding to user 802/avatar 800 looking down at the floor, or any other pose other participant(s) in a communication session may perceive as rude or inattentive. In accordance with a determination that the pose data represents the predetermined type of pose, rendering module 712 causes device 604 to render avatar 800 in a modified manner using the pose data, e.g., such that rendered avatar 800 does not have the predetermined type of pose).

It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a custom video conference avatar generated using user terminal data and specifying a state of the user terminal in a video chat, in which a sound other than a user may reach a certain level, in which a system determines that a user’s device is in a specific state and in response, changes a character object displayed on a screen and avatar movement showing that the sound is hard to hear (as taught in Sepulveda in view of Marlow, Sheaffer and Grossinger), with specifying a state of a user to be reflected in a video conference avatar as not looking at their screen (as taught in Binder), the motivation being to quickly identify the status of a user to express and display to other users, to ensure a realistic and immersive avatar experience (see Binder Column 37, lines 51 – 66).

Regarding Claim 6, Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder teaches 
The information processing system according to claim 1, wherein
when the one or more processors have not received the information related to the sound from the user terminal, or when the received information related to the sound satisfies a second condition, the one or more processors specify that the user terminal is in a second state (see Binder Column 32, lines 1 - 23, visual feature module determines, based on data, one or more sets of data respectively representing different types of visual features of the avatar (such as types of facial features of the avatar). Example types of facial features include mouth movements and/or mouth features corresponding to user speech and facial movements and/or facial features corresponding to non-speech sounds, e.g., yawning, sneezing, coughing, laughing, crying, and the like. Such visual features of the avatar may correspond to respective visual features of the user of device 604. In particular, the sensor(s) of sensor unit 606 may detect data from which visual feature module 702 determines the user's facial features. Accordingly, by animating the avatar according to the determined visual feature(s), the avatar provides a semi-realistic live depiction of the user, e.g., has mouth movement analogous to the user's mouth movement, and has facial features representing the user's emotional state. Therefore, if the user is detected to not have mouth movements, and does not have corresponding speech, then the user would be determined to be in a non-speaking state).

Regarding Claim 7, Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder teaches
The information processing system according to claim 6, wherein
when the one or more processors specify that the user terminal is in the second state, as a change in the display of the character object, the one or more processors attach a second specific object to the character object and/or apply a second specific movement to the character object (see Binder Column 37, lines 27 - 36, in accordance with speech detection module 714 determining that user 802 is not speaking, device 604 renders avatar 800 using the pose data, but without using the mouth movement data, the non-speech movement data, and/or the emotion data (data determined based on the audio data stream). This can prevent, for instance, incorrectly rendering avatar 800's mouth movement and/or emotional state consistent with background speech that does not correspond to user 802's mouth movement and/or emotional state).

Regarding Claim 9, Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder teaches
The information processing system according to claim 1, wherein
when the one or more processors receive information indicating that a specific application is running or being displayed at the user terminal, the one or more processors specify that the user terminal is in a third state (see Binder Figure 4A and Column 6, lines 34 – 44, device supports a variety of applications, including a digital music player application and Column 37, lines 8 – 20, Determining whether user 802 is speaking may distinguish user 802's speech and sounds from background noise and speech. For example, if the audio data stream indicates user 802's speech or sound, but the vibration data stream does not, speech detection module 714 determines that user 802 is not speaking or making sound. Specifically, user 802's speech or sounds may cause both the audio data stream and the vibration data stream to indicate speech or sound, e.g., by causing detectable vibrations of user 802's skull bones (e.g., of a certain degree and/or type). Accordingly, if the audio data stream indicates user speech or sound, but the vibration data stream does not, the audio may be background speech or noise).

Regarding Claim 10, Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder teaches
The information processing system according to claim 9, wherein
when the one or more processors receive information indicating that a music playback application is running at the user terminal, the one or more processors specify that the user terminal is in the third state (see Binder Figure 4A and Column 6, lines 34 – 44, device supports a variety of applications, including a digital music player application and Column 37, lines 8 – 20, Determining whether user 802 is speaking may distinguish user 802's speech and sounds from background noise and speech. For example, if the audio data stream indicates user 802's speech or sound, but the vibration data stream does not, speech detection module 714 determines that user 802 is not speaking or making sound. Specifically, user 802's speech or sounds may cause both the audio data stream and the vibration data stream to indicate speech or sound, e.g., by causing detectable vibrations of user 802's skull bones (e.g., of a certain degree and/or type). Accordingly, if the audio data stream indicates user speech or sound, but the vibration data stream does not, the audio may be background speech or noise).
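For illustration only, the speech detection logic in the quoted Binder passages (Column 37, lines 8 – 20 and 27 – 36) can be sketched in Python as follows. This is a minimal sketch, not Binder's implementation: the function names and labels are hypothetical, and treating a vibration signal without an audio signal as "not speaking" is an assumption because the quoted passages do not address that case.

```python
from typing import List


def classify_audio(audio_indicates_sound: bool,
                   vibration_indicates_sound: bool) -> str:
    """Separate the user's own speech or sound from background audio."""
    if audio_indicates_sound and vibration_indicates_sound:
        return "user_speaking"
    if audio_indicates_sound:
        # Audio without matching vibration may be background speech or noise.
        return "background"
    # Assumption: no audio signal means the user is not speaking.
    return "not_speaking"


def render_streams(classification: str) -> List[str]:
    """Choose which data streams drive the avatar rendering."""
    streams = ["pose"]
    if classification == "user_speaking":
        # Audio-derived data is used only when the user is speaking.
        streams += ["mouth_movement", "non_speech_movement", "emotion"]
    return streams
```

Under these assumptions, background audio leaves the avatar rendered from pose data alone, which is the behavior the rejection relies on for the second state.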

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Sepulveda et al. (U.S. Pub. No. 2022/0392132, hereinafter “Sepulveda”) in view of Marlow et al. (U.S. Pub. No. 2017/0332044, hereinafter “Marlow”), Sheaffer et al. (U.S. Pub. No. 2020/0105291, hereinafter “Sheaffer”), Grossinger et al. (U.S. Patent No. 11,276,215, hereinafter “Grossinger”), Binder et al. (U.S. Patent No. 12,267,623, hereinafter "Binder") and Chavez et al. (U.S. Patent No. 11,082,465, hereinafter "Chavez").
Regarding Claim 8, Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder teaches
The information processing system according to claim 7, wherein
the second specific movement is a movement to indicate that the character object is not speaking (see Binder Column 37, lines 27 - 36, in accordance with speech detection module 714 determining that user 802 is not speaking, device 604 renders avatar 800 using the pose data, but without using the mouth movement data, the non-speech movement data, and/or the emotion data (data determined based on the audio data stream). This can prevent, for instance, incorrectly rendering avatar 800's mouth movement and/or emotional state consistent with background speech that does not correspond to user 802's mouth movement and/or emotional state).

Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder does not expressly teach
the second specific object is an object to indicate that the character object is not speaking

However, Chavez teaches
the second specific object is an object to indicate that the character object is not speaking (see Chavez Figure 6A, in which Participant 102D does have background noise which is shown as a symbol in respective user tile (item 602D), and the avatar has a straight face indicating they are not speaking because the mouth is not moving)

It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a custom video conference avatar generated using user terminal data and specifying a state of the user terminal in a video chat, in which a sound other than a user may reach a certain level, in which a system determines that a user’s device is in a specific state and in response, changes a character object displayed on a screen and avatar movement showing that the sound is hard to hear, and specifying an additional state of a user to be reflected in a video conference avatar as not looking at their screen by displaying a corresponding object and avatar movement (as taught in Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder), with a specific object displayed indicating a user is not speaking (as taught in Chavez), the motivation being to clarify to other participants that any audio originating from said user is background and erroneous audio to therefore avoid interruption and confusion (see Chavez Column 1, lines 36 - 47).

Claims 11 – 12 are rejected under 35 U.S.C. 103 as being unpatentable over Sepulveda et al. (U.S. Pub. No. 2022/0392132, hereinafter “Sepulveda”) in view of Marlow et al. (U.S. Pub. No. 2017/0332044, hereinafter “Marlow”), Sheaffer et al. (U.S. Pub. No. 2020/0105291, hereinafter “Sheaffer”), Grossinger et al. (U.S. Patent No. 11,276,215, hereinafter “Grossinger”), Binder et al. (U.S. Patent No. 12,267,623, hereinafter "Binder") and Jolliff et al. (U.S. Pub. No. 2009/0300525, hereinafter "Jolliff").
Regarding Claim 11, Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder teaches all the limitations of Claim 10, but does not expressly teach
The information processing system according to claim 10, wherein
when the one or more processors specify that the user terminal is in the third state, as a change in the display of the character object, the one or more processors attach a third specific object to the character object and/or apply a third specific movement to the character object.

However, Jolliff teaches
The information processing system according to claim 10, wherein
when the one or more processors specify that the user terminal is in the third state, as a change in the display of the character object, the one or more processors attach a third specific object to the character object and/or apply a third specific movement to the character object (see Jolliff Paragraph [0059], As another example, background noise may be monitored (e.g., using the mobile device's microphone) for music and other sounds that may be used to infer the user's mood. For example, if the background noise includes music with added up-tempo beat, an avatar expressing a happy mood may be selected. As this example illustrates, by increasing the number of sensors used and the variety of information considered, a system can better infer the user's current status).

It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a custom video conference avatar generated using user terminal data and specifying a state of the user terminal in a video chat, in which a sound other than a user may reach a certain level, in which a system determines that a user’s device is in a specific state and in response, changes a character object displayed on a screen and avatar movement showing that the sound is hard to hear, and specifying an additional state of a user to be reflected in a video conference avatar as indicating a user terminal is running a music playback application (as taught in Sepulveda in view of Marlow, Sheaffer, Grossinger and Binder), with a representative object and/or movement to signify a user has background noise such as music (as taught in Jolliff), the motivation being to render an avatar's movements and/or expressions that graphically depicts the status of the user (see Jolliff, Paragraphs [0058] - [0059]).

Regarding Claim 12, Sepulveda in view of Marlow, Sheaffer, Grossinger, Binder and Jolliff teach
The information processing system according to claim 11, wherein
the third specific object is an object to indicate that the character object is listening to music (see Jolliff Paragraph [0059], As another example, background noise may be monitored (e.g., using the mobile device's microphone) for music and other sounds that may be used to infer the user's mood. For example, if the background noise includes music with added up-tempo beat, an avatar expressing a happy mood may be selected (object indicator being a smiling face). As this example illustrates, by increasing the number of sensors used and the variety of information considered, a system can better infer the user's current status), and
the third specific movement is a movement to indicate that the character object is listening to music (see Jolliff Paragraph [0059], As another example, background noise may be monitored (e.g., using the mobile device's microphone) for music and other sounds that may be used to infer the user's mood. For example, if the background noise includes music with added up-tempo beat, an avatar expressing a happy mood may be selected (movement indicator being the avatar's facial movement changing to a happy face). As this example illustrates, by increasing the number of sensors used and the variety of information considered, a system can better infer the user's current status).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Sepulveda et al. (U.S. Pub. No. 2022/0392132, hereinafter “Sepulveda”) in view of Marlow et al. (U.S. Pub. No. 2017/0332044, hereinafter “Marlow”), Sheaffer et al. (U.S. Pub. No. 2020/0105291, hereinafter “Sheaffer”), Grossinger et al. (U.S. Patent No. 11,276,215, hereinafter “Grossinger”) and Chavez et al. (U.S. Patent No. 11,082,465, hereinafter "Chavez").
Regarding Claim 16, Sepulveda in view of Marlow, Sheaffer and Grossinger teach all the limitations of claim 1, but do not expressly teach
The information processing system according to any of claim 1, wherein
when the one or more processors specify that the user terminal is in the fourth state, the one or more processors generate the video without including information related to the sound when the volume of the other sound is greater than or equal to a second value.

However, Chavez teaches
The information processing system according to any of claim 1, wherein
when the one or more processors specify that the user terminal is in the fourth state, the one or more processors generate the video without including information related to the sound when the volume of the other sound is greater than or equal to a second value (see Chavez Column 4, lines 35 – 42, the conference server may determine if a participant's endpoint should automatically be muted or automatically notified to go on mute in response to determining that the audio portion from an endpoint is extraneous to the video conference (e.g., the participant's speech is not intended for the video conference, speech is indiscernible, audio comprises background noise, etc.) and Column 5, lines 66 – 67 and Column 6, lines 1 – 3, Aspects of any one or more of the foregoing embodiments include the video conference server to automatically mute an endpoint associated with the corresponding audio portion when a confidence score is above a threshold, therefore sound is not included from user since they are muted).

It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a custom video conference avatar generated using user terminal data and specifying a state of the user terminal in a video chat, in which a sound other than a user may reach a certain level, in which a system determines that a user’s device is in a specific state and in response, changes a character object displayed on a screen and avatar movement showing that the sound is hard to hear (as taught in Sepulveda in view of Marlow, Sheaffer and Grossinger), with generating a video without including information related to sound when the volume of the other sound is greater than or equal to a second value (as taught in Chavez), the motivation being to avoid erroneous and background audio by preventing it from being transmitted with a user’s video to therefore avoid interruption and confusion (see Chavez Column 1, lines 36 - 47).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Sepulveda et al. (U.S. Pub. No. 2022/0392132, hereinafter “Sepulveda”) in view of Marlow et al. (U.S. Pub. No. 2017/0332044, hereinafter “Marlow”), Sheaffer et al. (U.S. Pub. No. 2020/0105291, hereinafter “Sheaffer”), Grossinger et al. (U.S. Patent No. 11,276,215, hereinafter “Grossinger”) and Sangberg et al. (U.S. Pub. No. 2009/0002479, hereinafter "Sangberg").
Regarding Claim 19, Sepulveda in view of Marlow, Sheaffer and Grossinger teach all the limitations of claim 1, but do not expressly teach
The information processing system according to claim 1, wherein the fourth specific object is attached to the ears of the character object, and the fourth specific movement is covering the ears of the character object.

However, Sangberg teaches
The information processing system according to claim 1, wherein the fourth specific object is attached to the ears of the character object, and the fourth specific movement is covering the ears of the character object (see Sangberg Paragraph [0063], The image processor 132 displays (Block 316) the avatar on the display 134. During the ongoing videoconference, one or more portrait commands are received from the first communication terminal 110 via the transceiver 142 and the communication controller 140. The operations database 156 contains groups of operations that are configured to carry out different avatar modifications, such as to modify the mouth on the avatar from smiling to frowning, to open and close the avatar's mouth, to blink the avatar's eyes, to cover the avatar's ears with hands, to add/remove computer-generated sunglasses to the avatar, etc.).

It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a custom video conference avatar generated using user terminal data and specifying a state of the user terminal in a video chat, in which a sound other than a user may reach a certain level, in which a system determines that a user’s device is in a specific state and in response, changes a character object displayed on a screen and avatar movement showing that the sound is hard to hear (as taught in Sepulveda in view of Marlow, Sheaffer and Grossinger), with a representative object in a video conference being attached, and covering, the ears of an avatar (as taught in Sangberg), the motivation being to communicate an emotion, status, or other signals through an avatar’s movements to provide a visual cue to other users (see Sangberg Paragraphs [0064] – [0065]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Refer to PTO-892, Notice of References Cited for a listing of analogous art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARISSA A JONES whose telephone number is (703)756-1677. The examiner can normally be reached Telework M-F 6:30 AM - 4:00 PM CT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached on 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CARISSA A JONES/               Examiner, Art Unit 2691     

/DUC NGUYEN/               Supervisory Patent Examiner, Art Unit 2691
Prosecution Timeline

Jul 14, 2025: Response Filed
Sep 18, 2025: Non-Final Rejection mailed — §103
Dec 17, 2025: Response Filed
Jan 06, 2026: Final Rejection mailed — §103
Mar 19, 2026: Response after Non-Final Action
Apr 06, 2026: Request for Continued Examination
Apr 07, 2026: Response after Non-Final Action
Apr 20, 2026: Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/299,777, Patent 12598267: IMAGE CAPTURE APPARATUS AND CONTROL METHOD (2y 12m to grant, granted Apr 07, 2026)
18/354,967, Patent 12598354: INFORMATION PROCESSING SERVER, RECORD CREATION SYSTEM, DISPLAY CONTROL METHOD, AND NON-TRANSITORY RECORDING MEDIUM (2y 8m to grant, granted Apr 07, 2026)
18/124,682, Patent 12593004: DISPLAY METHOD, DISPLAY SYSTEM, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM STORING PROGRAM (3y 0m to grant, granted Mar 31, 2026)
18/163,371, Patent 12556468: QUALITY TESTING OF COMMUNICATIONS FOR CONFERENCE CALL ENDPOINTS (3y 0m to grant, granted Feb 17, 2026)
18/297,357, Patent 12556655: Efficient Detection of Co-Located Participant Devices in Teleconferencing Sessions (2y 10m to grant, granted Feb 17, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 4-5
Grant Probability: 85%
With Interview (+23.5%): 99%
Median Time to Grant: 2y 7m (~0m remaining)
PTA Risk: High
Based on 26 resolved cases by this examiner. Grant probability derived from career allowance rate.