Prosecution Insights
Last updated: April 19, 2026
Application No. 18/055,591

Adaptively Muting Audio Transmission of User Speech for Assistant Systems

Status: Final Rejection (§103)
Filed: Nov 15, 2022
Examiner: SCHMIEDER, NICOLE A K
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Meta Platforms Inc.
OA Round: 4 (Final)

Grant Probability: 68% (Favorable)
Expected OA Rounds: 5-6
Estimated Time to Grant: 2y 10m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 68% (113 granted / 167 resolved; +5.7% vs TC avg, above average)
Interview Lift: +34.0% (allow rate among resolved cases with an interview vs. without)
Typical Timeline: 2y 10m average prosecution; 25 applications currently pending
Career History: 192 total applications across all art units

Statute-Specific Performance

§101: 21.9% (-18.1% vs TC avg)
§103: 46.7% (+6.7% vs TC avg)
§102: 13.0% (-27.0% vs TC avg)
§112: 13.9% (-26.1% vs TC avg)

Tech Center averages are estimates. Figures are based on career data from 167 resolved cases.
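As a sanity check, the arithmetic these cards imply can be reproduced directly. The sketch below is a hypothetical reconstruction: the variable names and the back-out of the Tech Center baseline are assumptions, not the platform's documented methodology. One consistency check falls out of it: all four statute deltas back out to the same 40.0% Tech Center estimate.

```python
# Hypothetical reconstruction of the dashboard arithmetic above.
# Variable names and formulas are assumptions for illustration only,
# not the analytics platform's documented methodology.

granted, resolved = 113, 167
career_allow_rate = granted / resolved      # 0.6766... -> displayed as 68%
tc_career_avg = career_allow_rate - 0.057   # "+5.7% vs TC avg" implies a ~62% baseline

# Statute-specific allowance rates and their displayed deltas vs the TC average:
statutes = {
    "§101": (0.219, -0.181),
    "§103": (0.467, +0.067),
    "§102": (0.130, -0.270),
    "§112": (0.139, -0.261),
}
for name, (rate, delta) in statutes.items():
    baseline = rate - delta                 # back out the TC average estimate
    print(f"{name}: examiner {rate:.1%}, implied TC average {baseline:.1%}")

print(f"Career allow rate: {career_allow_rate:.1%} (TC career avg ≈ {tc_career_avg:.1%})")
```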

Office Action

§103
DETAILED ACTION

This communication is in response to the Amendments and Arguments filed on 11/25/2025. Claims 1-21 are pending and have been examined. All previous objections/rejections not mentioned in this Office Action have been withdrawn by the examiner.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments with respect to claims 1, 20, and 21 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Please see the updated mappings below, citing Kumar as teaching the determination that a user has completed speaking to the assistant system based on the content of the subsequent speech, in combination with Voight teaching that audio transmissions are sent from the user to second users in response to determining that the user has completed speaking to the assistant system.

Claim Objections

Claims 1, 20, and 21 are objected to because of the following informalities: claims 1, 20, and 21 recite "one or more of the second users" in lines 15, 16, and 17, respectively. The Examiner suggests amending the claims to recite --the one or more of the second users-- in order to maintain clear antecedent basis. Appropriate correction is required.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

    A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8, 11-14, and 17-21 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (U.S. PG Pub No. 2021/0120206), as found in the IDS, hereinafter Liu, in view of Kumar (U.S. PG Pub No. 2022/0383871), hereinafter Kumar, in view of Voight et al. (U.S. Patent No. 10,250,973), hereinafter Voight, and further in view of Lovitt et al. (U.S. PG Pub No. 2018/0366118), hereinafter Lovitt.

Regarding claims 1, 20, and 21, Liu teaches (claim 1) a method comprising, by one or more computing systems (a computer system performing methods [0153]); (claim 20) one or more computer-readable non-transitory storage media embodying software that is operable when executed to (the system includes memory, i.e. storage media, for storing instructions for the processor to execute, i.e. embodying software that is operable when executed to [0155-7]); and (claim 21) a system comprising (computer system [0153]) one or more processors (system includes a processor [0155]) and a non-transitory memory coupled to the processors comprising first instructions executable by the processors, the processors operable when executing the first instructions to (the system includes memory, i.e. a non-transitory memory, for storing instructions for the processor to execute, i.e. first instructions executable by the processors, the processors operable when executing the first instructions to [0155-7]):

receiving, from a client system associated with a user during a dialog session, a first speech input from the user, wherein the user is in an audio communication with one or more second users (a user in a video call with other users, i.e. the first user is in an audio communication with one or more second users; the user has a first client system, and the assistant system receives from the first client system a spoken input from the user, i.e. receiving from a client system associated with a user during a dialog session a first speech input from the user [0099-100]);

determining, based on contextual information associated with the first speech input during the dialog session, an intent of the user to speak to an assistant system (the assistant system may receive and detect a wake-word for the assistant, i.e. determining based on contextual information associated with the first speech input during the dialog session, indicating the user is about to make a request of the assistant, i.e. determining…an intent of the user to speak to an assistant system [0099-100]); and

in response to determining, based on the contextual information, that the intent of the user is to speak to the assistant system without speaking to the one or more second users: sending, to the client system, instructions for muting audio transmission of subsequent speech inputs from the user during the dialog session to the one or more of the second users in the audio communication (in response to the assistant system receiving the wake-word from the first client system in addition to the explicit instruction "mute me", i.e. in response to determining based on the contextual information that the intent of the user is to speak to the assistant system without speaking to the one or more second users, the assistant system sends instructions to the first client system, i.e. sending to the first client system instructions, to mute the video call at the first client system so that other users participating in the video call on their respective client systems cannot hear the first user's request, i.e. instructions for muting audio transmission of subsequent speech inputs from the user during the dialog session to the one or more of the second users in the audio communication [0099-100]).

While Liu provides detecting a wake-word to start an interaction with an assistant, Liu does not specifically teach determining that the user has completed speaking to the assistant system based on the content of subsequent speech inputs, and thus does not teach in response to determining, based on content of the subsequent speech inputs, that the user has completed speaking to the assistant system without speaking to the one or more second users….

Kumar, however, teaches in response to determining, based on content of the subsequent speech inputs, that the user has completed speaking to the assistant system without speaking to the one or more second users… (the user may invoke the virtual assistant by speaking an invocation input, where the voice input of the first user to the communication session is muted based on receiving the invocation input so as to not interrupt the audio of the communication session, i.e. the user…speaking to the assistant system without speaking to the one or more second users, and the virtual assistant remains active during the communication session to receive additional voice commands until the user speaks a revocation input, such as "revoke virtual assistant", i.e. in response to determining based on content of the subsequent speech inputs that the user has completed speaking to the assistant system without speaking to the one or more second users [0025],[0029-30],[0044-5]).

Liu and Kumar are analogous art because they are from a similar field of endeavor: managing audio and interactions with voice assistants during calls with multiple participants. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Liu's detection of a wake-word to start an interaction with an assistant with the detection of a revocation input to deactivate the virtual assistant, as taught by Kumar. It would have been obvious to combine the references to enable the virtual assistant to be deactivated to maintain privacy and abide by privacy laws by preventing the virtual assistant from having access to content and/or metadata associated with the communication session (Kumar [0027],[0045]).

While Liu in view of Kumar provides revoking a virtual assistant using a spoken input, Liu in view of Kumar does not specifically teach sending instructions for audio transmission of subsequent speech inputs to the second users after the user completes speaking with the assistant system, and thus does not teach in response to determining…that the user has completed speaking to the assistant system without speaking to the one or more second users, sending, to the client system, instructions for audio transmission of further subsequent speech inputs from the user to one or more of the second users in the audio communication.

Voight, however, teaches in response to determining…that the user has completed speaking to the assistant system without speaking to the one or more second users, sending, to the client system, instructions for audio transmission of further subsequent speech inputs from the user to one or more of the second users in the audio communication (when the user is facing the direction of the VPA and not the call, the outbound call audio is muted and the microphone output is routed to the VPA, i.e. the user…speaking to the assistant system without speaking to the one or more second users, and when the user is looking in a direction of an active call and not the VPA, i.e. in response to determining…that the user has completed speaking to the assistant system without speaking to the one or more second users, the mic output audio is not muted and is routed to the call associated with that direction (3:3-18),(3:56-4:3)).

Liu, Kumar, and Voight are analogous art because they are from a similar field of endeavor: providing in-call virtual assistant help to users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the spoken revocation of a virtual assistant taught by Liu, as modified by Kumar, with muting and unmuting microphone output depending on whether or not the user is engaging the VPA, as taught by Voight. It would have been obvious to combine the references to enable the user to direct their speech to different audiences by changing where they are looking (Voight (3:3-37)).

While Liu in view of Kumar and Voight provides muting the user so that the user can maintain privacy, Liu in view of Kumar and Voight does not specifically teach determining that the user wishes to speak to the assistant while continuing to speak to the other users, and thus does not teach in response to determining, based on the contextual information, that the intent of the user is to speak to the assistant system while continuing to speak to the one or more second users, sending, to the client system, instructions for audio transmission of the subsequent speech inputs from the user during the dialog session to the one or more of the second users in the audio communication.

Lovitt, however, teaches in response to determining, based on the contextual information, that the intent of the user is to speak to the assistant system while continuing to speak to the one or more second users, sending, to the client system, instructions for audio transmission of the subsequent speech inputs from the user during the dialog session to the one or more of the second users in the audio communication (a rendering policy may be defined that automatically directs certain types of information only to a requestor, i.e. the intent of the user is to speak to the assistant system without speaking to the one or more second users, or to include others, i.e. the intent of the user is to speak to the assistant system while continuing to speak to the one or more second users, such as when a user says "tell me" versus "tell us", i.e. in response to determining, based on the contextual information, where the virtual assistant may allow user utterances and responses to be presented to the other participants, i.e. sending, to the client system, instructions for audio transmission of the subsequent speech inputs from the user during the dialog session to the one or more of the second users in the audio communication, or may provide instructions to mute the user, present different audio to the other participants, or establish a secondary communication channel, depending on the rendering policy and the target indicators in the utterances [0027],[0043],[0056-60]).

Liu, Kumar, Voight, and Lovitt are analogous art because they are from a similar field of endeavor: managing audio and interactions with voice assistants during calls with multiple participants. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the user-muting teachings of Liu, as modified by Kumar and Voight, with adjusting how utterances and responses are presented to different participants based on rendering policies and information related to the queries, as taught by Lovitt. It would have been obvious to combine the references to enable the use of rendering policies for different types of information that can be customized by the user and to enable compartmentalization of sensitive information (Lovitt [0043]).
Regarding claim 2, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Liu further teaches receiving, from the client system, a non-speech input from the user, wherein: the non-speech input is associated with the first speech input; and the non-speech input comprises one or more of a pose, a gaze, a touch input, or a gesture… (the assistant system may detect a gaze of the first user during the video call, i.e. receiving…a non-speech input from the first user…wherein the non-speech input comprises one or more of a pose, a gaze, a touch input, or a gesture, where nonverbal user input is received from a client system associated with the user, i.e. receiving from the client system a non-speech input from the first user, and where the context engine indicates properties of a scene gathered during the video call, as well as the first user, i.e. wherein the non-speech input is associated with the first portion of the speech input [0049],[0101],[0107-8]); and the contextual information includes the non-speech input (where the request of a user may be inferred based on a gaze, i.e. the contextual information includes the non-speech input). Lovitt further teaches wherein the non-speech input is configured to activate the assistant system (the user can press a hardware button or activate a software UI element to identify when the user is presenting a spoken utterance to a virtual assistant, i.e. the non-speech input is configured to activate the assistant system [0054]). The motivation to combine is the same as previously presented.

Regarding claim 3, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Liu further teaches the first speech input includes a wake-word associated with the assistant system (the assistant system may receive and detect a wake-word for the assistant, i.e. a wake-word associated with the assistant system, indicating the user is about to make a request of the assistant, i.e. first speech input [0099-100]).

Regarding claim 4, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Lovitt further teaches instructions for providing blank audio data replacing the subsequent speech inputs from the user during the dialog session (blocking of a command or query, such as not providing audio after the "Hey Cortana" to the other participants, i.e. subsequent speech inputs from the user during the dialog session, may include instructions to present different audio, such as silence, to the other participants, i.e. instructions for providing blank audio data replacing the subsequent speech inputs from the user [0060]). The motivation to combine is the same as previously presented.

Regarding claim 5, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Liu further teaches the audio communication is based on a first application running on the client system having access to audio signals captured by a microphone of the client system (user input received by microphones of the client system, i.e. audio signals captured by a microphone of the first client system, may be sent by an RTC module to a video communication system, i.e. the audio communication is based on a first application running on the first client system having access to audio signals [0089]), wherein the audio signals captured by the microphone of the client system are accessible to one or more second applications running on the client system (user input received by microphones of the client system, i.e. audio signals captured by a microphone of the first client system, are sent to an RTC module, where the RTC module may deliver audio packets to the video communication system and may further call a relevant app, such as the client-assistant service module, to analyze the user input, i.e. accessible to one or more second applications running on the first client system [0089-91]).

Regarding claim 6, Liu in view of Kumar, Voight, and Lovitt teaches claim 5, and Lovitt further teaches in response to the determining, based on the contextual information, that the intent of the user is to speak to the assistant system without speaking to the one or more second users, sending, to the client system, instructions for suspending access of the one or more second applications to subsequent audio signals corresponding to the subsequent speech inputs from the user during the dialog session (a rendering policy may be defined that automatically directs certain types of information only to a requestor, i.e. the intent of the user is to speak to the assistant system without speaking to the one or more second users, or to include others, such as when a user says "tell me" versus "tell us", i.e. in response to determining, based on the contextual information, where the virtual assistant may provide instructions to mute the user, present different audio to the other participants, or establish a secondary communication channel, depending on the rendering policy and the target indicators in the utterances, i.e. sending, to the client system, instructions for suspending access of the one or more second applications to subsequent audio signals corresponding to the subsequent speech inputs from the user during the dialog session [0027],[0043],[0056-60]). The motivation to combine is the same as previously presented.

Regarding claim 7, Liu in view of Kumar, Voight, and Lovitt teaches claim 5, and Lovitt further teaches in response to the determining, based on the contextual information, that the intent of the user is to speak to the assistant system without speaking to the one or more second users, sending, to the client system, instructions for providing blank audio data replacing the subsequent speech inputs from the user during the dialog session to the one or more second applications (a rendering policy may be defined that automatically directs certain types of information only to a requestor, i.e. the intent of the user is to speak to the assistant system without speaking to the one or more second users, or to include others, such as when a user says "tell me" versus "tell us", i.e. in response to determining based on the contextual information, where the virtual assistant may provide instructions to mute the user or present different audio, such as silence, to the other participants, i.e. providing blank audio data replacing the subsequent speech inputs from the user during the dialog session, or establish a secondary communication channel, depending on the rendering policy and the target indicators in the utterances, i.e. sending to the client system instructions for providing blank audio data…to the one or more second applications [0027],[0043],[0056-60]). The motivation to combine is the same as previously presented.

Regarding claim 8, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Liu further teaches the contextual information comprises one or more of (the context engine may analyze properties of a scene of a video call and enter the context data into a chart for later use in responding to a user query, i.e. contextual information [0107-9]): a virtual location associated with the user; a real-world location associated with the user (the user location [0107-9]); a volume associated with the first speech input; a tone associated with the first speech input; an attention state of the user associated with the first speech input (eye tracking/gaze estimation of the user during the video call [0107-9]); an eye gaze of the user associated with the first speech input (eye tracking/gaze estimation of the user during the video call [0107-9]); a pose of the first user associated with the first portion of the speech input; or a degree of separation between the first user and each of the one or more second users on a social network.

Regarding claim 11, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Liu further teaches generating, by the assistant system, a response to one or more of the first speech input and the subsequent speech inputs (the assistant system may generate a personalized communication content for the user, i.e. generating…a response, that comprises the retrieved information or status of services from the user request, i.e. a response to one or more of the first speech input and the subsequent speech inputs [0037-8],[0100-1]); and sending, to the client system, instructions for presenting the response to the user (the assistant system may present the response to the user in text and/or images on a display, i.e. sending…instructions for presenting the response to the user, of the client system, i.e. to the client system [0037-8]). Voight further teaches sending, to the client system, instructions for muting audio transmission from the one or more second users to the user in the audio communication during the dialog session (when the user is facing the direction of the VPA and not the call, the microphone output is routed to the VPA, i.e. during the dialog session, and the call audio from the call is ducked by having a decreased volume or being muted, i.e. muting audio transmission from the one or more second users to the user in the audio communication, which is controlled by the VUI, i.e. sending, to the client system, instructions (3:3-18),(3:56-4:3)). The motivation to combine is the same as previously presented.

Regarding claim 12, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Liu further teaches generating, by the assistant system, another response to one or more of the first speech input and the subsequent speech inputs (the assistant system may generate a personalized communication content for the user, i.e. generating…a response, that comprises the retrieved information or status of services from the user request, i.e. a response to one or more of the first speech input and the subsequent speech inputs [0037-8],[0100-1]); and sending, to the first client system, instructions for presenting the other response to the user (the assistant system may present the response to the user in text and/or images on a display, i.e. sending…instructions for presenting the response to the first user, of the client system, i.e. to the first client system [0037-8]). Voight further teaches sending, to the client system, instructions for reducing a volume of one or more audio transmissions from the one or more second users to the user in the audio communication to a predetermined volume during the dialog session (when the user is facing the direction of the VPA and not the call, the microphone output is routed to the VPA, i.e. during the dialog session, and the call audio from the call is ducked by having a decreased volume or being muted, i.e. reducing a volume of one or more audio transmissions from the one or more second users to the user in the audio communication, where the volume is adjusted to compensate for the direction the user is looking, such as when mixing VPA responses into the call audio, i.e. predetermined volume, which is controlled by the VUI, i.e. sending, to the client system, instructions (3:3-18),(3:56-4:24)). The motivation to combine is the same as previously presented.

Regarding claim 13, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Liu further teaches wherein determining, based on the content of the subsequent speech inputs, that the user has completed speaking to the assistant system without speaking to the one or more second users includes a detection of an endpoint of the subsequent speech inputs (the ASR module includes an end-pointing model that may detect when the end of an utterance is reached, i.e. detection of an endpoint of the subsequent speech inputs, to select the text that corresponds to the audio input, i.e. based on the content of the subsequent speech inputs [0052]). Kumar teaches that the system uses speech recognition techniques and recognizes a spoken revocation input for the virtual assistant (see [0025],[0029-30],[0044-5],[0064]). The motivation to combine is the same as previously presented.

Regarding claim 14, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Kumar further teaches determining that the first user has completed speaking to the assistant system without speaking to the one or more second users is further based on one or more of: …a touch input performed by the user… (the user may provide a revocation input to deactivate the virtual assistant, i.e. determining that the first user has completed speaking to the assistant system without speaking to the one or more second users, in the same manner as invoking the virtual assistant, such as input from a clickable input element of a graphical user interface, i.e. a touch input performed by the user [0023]). Voight further teaches a change of a pose of the user and a change of a gaze of the user (when the user is facing the direction of the VPA, the VUI mutes the mic output to the call, and when the user returns to the conversation, i.e. determining that the first user has completed speaking to the assistant, by turning their head to look in the direction of the call, i.e. based on one or more of a change of a pose, a gaze…of the user (3:3-36),(3:56-4:24)). Liu further teaches that activation of an assistant system may be based on a user's gaze or a user gesture [0049]. Kumar [0023] and Voight (3:3-36),(3:56-4:24) both teach ending an interaction with a virtual assistant based on non-verbal input from a user. The motivation to combine is the same as previously presented.

Regarding claim 17, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Liu further teaches the audio communication is based on a first application having access to audio signals captured by a microphone of the client system (user input received by microphones of the client system, i.e. audio signals captured by a microphone of the client system, may be sent by an RTC module to a video communication system, i.e. the audio communication is based on a first application running on the first client system having access to audio signals [0089]), wherein the audio signals captured by the microphone of the client system are accessible to one or more second applications running on the client system (user input received by microphones of the client system, i.e. audio signals captured by a microphone of the client system, are sent to an RTC module, where the RTC module may deliver audio packets to the video communication system and may further call a relevant app, such as the client-assistant service module, to analyze the user input, i.e. accessible to one or more second applications running on the client system [0089-91]).

Regarding claim 18, Liu in view of Kumar, Voight, and Lovitt teaches claim 17, and Voight further teaches in response to the determining that the user has completed speaking to the assistant system without speaking to the one or more second users, sending, to the client system, instructions for resuming access of the one or more second applications to further subsequent audio signals corresponding to the further subsequent speech inputs to be captured by the microphone of the client system (different configured directions in which the user can look are associated with different voice-based interfaces, i.e. second applications; when a user looks in the direction of the VPA, the utterance is routed by the VUI to the VPA, and the mic output to the other directions associated with the other voice-based interfaces is muted; and when the user returns to the conversation by looking in the direction of the conversation, i.e. in response to the determining that the user has completed speaking to the assistant system without speaking to the one or more second users, the microphone signals are routed to the call, i.e. sending, to the client system, instructions for resuming access of the one or more second applications to further subsequent audio signals corresponding to the further subsequent speech inputs to be captured by the microphone of the client system (3:3-37),(3:56-4:3)). The motivation to combine is the same as previously presented.

Regarding claim 19, Liu in view of Kumar, Voight, and Lovitt teaches claim 17, and Voight further teaches in response to the determining that the user has completed speaking to the assistant system without speaking to the one or more second users, sending, to the client system, instructions for providing live audio data of the further subsequent speech inputs from the user to the one or more second applications (different configured directions in which the user can look are associated with different voice-based interfaces, i.e. second applications; when a user looks in the direction of the VPA, the utterance is routed by the VUI to the VPA, and the mic output to the other directions associated with the other voice-based interfaces is muted; and when the user returns to the conversation by looking in the direction of the conversation, i.e. in response to the determining that the user has completed speaking to the assistant system without speaking to the one or more second users, the microphone signals are routed to the call, i.e. sending to the first client system…instructions for providing live audio data of the further subsequent speech input from the user to the one or more second applications (3:3-37),(3:56-4:3)). The motivation to combine is the same as previously presented.

Claims 9, 10, 15, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Liu, in view of Kumar, in view of Voight, in view of Lovitt, and further in view of Hassan et al. (U.S. PG Pub No. 2024/0031533), hereinafter Hassan.

Regarding claim 9, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Lovitt further teaches in response to the determining that the intent of the user is to speak to the assistant system without speaking to the one or more second users, sending, to the client system, instructions for presenting an indication…that the audio transmission to the one or more second users in the audio communication is --altered-- (a rendering policy may be defined that automatically directs certain types of information only to a requestor, i.e. the intent of the user is to speak to the assistant system without speaking to the one or more second users, or to include others, such as when a user says "tell me" versus "tell us", i.e. in response to determining based on the contextual information, where the virtual assistant may provide instructions to mute the user and present different audio to the other participants, such as a spoken indication that the requester is interacting with the virtual assistant, i.e. instructions for presenting an indication…that the audio transmission to the one or more second users in the audio communication is altered, or establish a secondary communication channel, depending on the rendering policy and the target indicators in the utterances, i.e. sending to the client system instructions [0027],[0043],[0056-60]). While Liu in view of Kumar, Voight, and Lovitt provides an indication to the other participants that the user is interacting with a virtual assistant, Liu in view of Kumar, Voight, and Lovitt does not specifically teach presenting an indication to the user that the user is muted, and thus does not teach sending, to the client system, instructions for presenting an indication to the user that the audio transmission to the one or more second users in the audio communication is muted. Hassan, however, teaches sending, to the first client system, instructions for presenting an indication to the first user that the audio transmission to one or more of the second users in the multi-channel audio communication is muted (the GUI shows the user, i.e. sending to the first client system instructions for presenting an indication, when they are muted, such that their voice is not transmitted to other client devices, i.e. an indication to the first user that the audio transmission to one or more of the second users in the multi-channel audio communication is muted; Fig. 5C, 5E, [0037-8],[0043],[0059]). Liu, Kumar, Voight, Lovitt, and Hassan are analogous art because they are from a similar field of endeavor: managing audio data during communications between multiple users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Liu, as modified by Kumar, Voight, and Lovitt, of indicating that the user is interacting with a virtual assistant with providing a visual notification to the user that they are muted, as taught by Hassan. It would have been obvious to combine the references to enable a system that can notify a participant when they are muted while speaking and allow for automatically unmuting the user (Hassan [0038],[0059]).
Regarding claim 10, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Lovitt further teaches in response to the determining that the intent of the user is to speak to the assistant system without speaking to the one or more second users, sending, to one or more second client systems associated with the one or more second users, respectively, instructions for presenting an indication to a corresponding second user that the audio transmission from the user is --altered-- (a rendering policy may be defined that automatically directs certain types of information only to a requestor, i.e. the intent of the user is to speak to the assistant system without speaking to the one or more second users, or to include others, such as when a user says "tell me" versus "tell us", i.e. in response to the determining, where the virtual assistant may provide instructions to mute the user and present different audio to the other participants, such as a spoken indication that the requester is interacting with the virtual assistant, i.e. instructions for presenting an indication to a corresponding second user that the audio transmission from the user is --altered--, or establish a secondary communication channel, depending on the rendering policy and the target indicators in the utterances, which can be performed at the requester and participant devices, i.e. sending, to one or more second client systems associated with the one or more second users, respectively [0027],[0043],[0056-60]). While Liu in view of Kumar, Voight, and Lovitt provides an indication to the other participants that the user is interacting with a virtual assistant, Liu in view of Kumar, Voight, and Lovitt does not specifically teach presenting an indication that a user is muted, and thus does not teach sending, to one or more second client systems associated with the one or more second users, respectively, instructions for presenting an indication to a corresponding second user that the audio transmission from the user is muted. Hassan, however, teaches sending, to one or more second client systems associated with the one or more second users, respectively, instructions for presenting an indication to a corresponding second user that the audio transmission from the user is muted (the user interface module of the client devices for the different participants, i.e. one or more second client systems associated with the one or more second users, respectively, can receive a signal to update the display of each meeting interface to change symbols, i.e. sending…instructions for presenting an indication to the corresponding second user, when a participant is muted, such that their voice is not transmitted to other client devices, i.e. an indication to the corresponding second user that the audio transmission from the user is muted; Fig. 5C, 5E, [0037-8],[0043],[0059]). Liu, Kumar, Voight, Lovitt, and Hassan are analogous art because they are from a similar field of endeavor: managing audio data during communications between multiple users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Liu, as modified by Kumar, Voight, and Lovitt, of indicating that the user is interacting with a virtual assistant with providing a visual notification to the users of which participants are muted, as taught by Hassan. It would have been obvious to combine the references to enable a system that can notify a participant when they are muted while speaking and allow for automatically unmuting the user (Hassan [0038],[0059]).

Regarding claim 15, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Voight further teaches in response to determining that the user has completed speaking to the assistant system without speaking to the one or more second users, sending, to the client system, instructions…that the audio transmission of the further subsequent speech inputs from the user to the one or more second users in the audio communication is unmuted (when the user is facing the direction of the VPA and not the call, the outbound call audio is muted and the microphone output is routed to the VPA, i.e. the user…speaking to the assistant system without speaking to the one or more second users, and when the user is looking in a direction of an active call and not the VPA, i.e. in response to determining that the user has completed speaking to the assistant system without speaking to the one or more second users, the mic output audio is not muted and is routed to the call associated with that direction, i.e. sending to the client system instructions…that the audio transmission of the further subsequent speech inputs from the user to the one or more second users in the audio communication is unmuted (3:3-18),(3:56-4:3)). While Liu in view of Kumar, Voight, and Lovitt provides the muting and unmuting of audio during a video call, Liu in view of Kumar, Voight, and Lovitt does not specifically teach presenting an indication that a user is unmuted, and thus does not teach sending, to the client system, instructions for presenting an indication to the user that the audio transmission of the further subsequent speech inputs from the user to the one or more second users in the audio communication is unmuted. Hassan, however, teaches sending, to the client system, instructions for presenting an indication to the user that the audio transmission of the further subsequent speech inputs from the user to the one or more second users in the audio communication is unmuted (the GUI shows the user, i.e. sending to the client system instructions for presenting an indication, when they are unmuted, such that their voice is transmitted to other client devices, i.e. an indication to the user that the audio transmission of the further subsequent speech input from the user to the one or more second users in the audio communication is unmuted; Fig. 5C, 5E, [0037-8],[0043],[0059]). Liu, Kumar, Voight, Lovitt, and Hassan are analogous art because they are from a similar field of endeavor: managing audio data during communications between multiple users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the muting and unmuting of audio during a video call taught by Liu, as modified by Kumar, Voight, and Lovitt, with providing a visual notification to the user that they are muted or unmuted, as taught by Hassan. It would have been obvious to combine the references to enable a system that can notify a participant when they are muted while speaking and allow for automatically unmuting the user (Hassan [0038],[0059]).
Regarding claim 16, Liu in view of Kumar, Voight, and Lovitt teaches claim 1, and Voight further teaches in response to determining that the user has completed speaking to the assistant system without speaking to the one or more second users, sending…instructions…that the audio transmission from the user is unmuted (when the user is facing the direction of the VPA and not the call, the outbound call audio is muted and the microphone output is routed to the VPA, i.e. the user…speaking to the assistant system without speaking to the one or more second users, and when the user is looking in a direction of an active call and not the VPA, i.e. in response to determining that the user has completed speaking to the assistant system without speaking to the one or more second users, the mic output audio is not muted and is routed to the call associated with that direction, i.e. sending…instructions…that the audio transmission from the user is unmuted (3:3-18),(3:56-4:3)). While Liu in view of Kumar, Voight, and Lovitt provides the muting and unmuting of audio during a video call, Liu in view of Kumar, Voight, and Lovitt does not specifically teach presenting an indication that a user is unmuted, and thus does not teach sending, to one or more second client systems associated with the one or more second users, respectively, instructions for presenting an indication to a corresponding second user that the audio transmission from the user is unmuted. Hassan, however, teaches sending, to one or more second client systems associated with the one or more second users, respectively, instructions for presenting an indication to a corresponding second user that the audio transmission from the user is unmuted (the user interface module of the client devices for the different participants, i.e. one or more second client systems associated with the one or more second users, respectively, can receive a signal to update the display of each meeting interface to change symbols, i.e. sending…instructions for presenting an indication to the corresponding second user, when a participant is muted, such that their voice is not transmitted to other client devices, or unmuted, i.e. an indication to the corresponding second user that the audio transmission from the user is unmuted; Fig. 5C, 5E, [0037-8],[0043],[0059]). Liu, Kumar, Voight, Lovitt, and Hassan are analogous art because they are from a similar field of endeavor: managing audio data during communications between multiple users. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the muting and unmuting of audio during a video call taught by Liu, as modified by Kumar, Voight, and Lovitt, with providing a visual notification to the users of which participants are muted or unmuted, as taught by Hassan. It would have been obvious to combine the references to enable a system that can notify a participant when they are muted while speaking and allow for automatically unmuting the user (Hassan [0038],[0059]).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICOLE A K SCHMIEDER, whose telephone number is (571) 270-1474. The examiner can normally be reached 8:00-5:00 M-F. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NICOLE A K SCHMIEDER/
Primary Examiner, Art Unit 2659
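Stepping back from the claim mappings, the office action describes one coherent control flow: wake-word detection that mutes outbound call audio (Liu), content-based detection that the assistant interaction has ended (Kumar's revocation input), re-routing of speech back to the call (Voight), and an inclusive mode that keeps the other participants in the loop (Lovitt's "tell me" versus "tell us"). The toy sketch below illustrates that flow only; every name and the routing model are hypothetical, drawn from the rejection's characterization rather than from the application or any cited reference.

```python
# Toy state machine illustrating the adaptive-muting flow characterized in
# the rejection. All names and routing decisions are hypothetical; nothing
# here is taken verbatim from the application or the cited references.

WAKE_WORD = "hey assistant"               # Liu: wake-word signals intent to address the assistant
REVOCATION = "revoke assistant"           # Kumar: spoken revocation input ends the interaction
INCLUSIVE_CUES = ("tell us", "show us")   # Lovitt: "tell us" keeps the other participants included

class CallAudioRouter:
    """Routes the user's microphone audio between an assistant and a call."""

    def __init__(self) -> None:
        self.muted_to_call = False  # True while speech is withheld from second users

    def on_speech(self, utterance: str) -> str:
        text = utterance.lower()
        if WAKE_WORD in text:
            if any(cue in text for cue in INCLUSIVE_CUES):
                # Intent to speak to the assistant while continuing to speak
                # to the other users: do not mute (Lovitt-style rendering policy).
                self.muted_to_call = False
                return "route to assistant AND call"
            # Intent to speak to the assistant only: mute subsequent speech
            # to the second users (Liu-style muting on wake-word detection).
            self.muted_to_call = True
            return "route to assistant only (call muted)"
        if self.muted_to_call and REVOCATION in text:
            # Content of subsequent speech shows the assistant interaction is
            # over (Kumar); resume transmission to the call participants (Voight).
            self.muted_to_call = False
            return "unmute; route to call"
        return "route to assistant" if self.muted_to_call else "route to call"

router = CallAudioRouter()
print(router.on_speech("Hey assistant, what's on my calendar?"))  # call muted
print(router.on_speech("Revoke assistant"))                       # unmuted again
print(router.on_speech("Hey assistant, tell us the weather"))     # both audiences
```

The actual claims additionally distinguish "subsequent" from "further subsequent" speech inputs and deliver mute/unmute instructions through the client system; the sketch collapses those details into a single boolean for readability.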

Prosecution Timeline

Nov 15, 2022: Application Filed
Nov 08, 2024: Non-Final Rejection — §103
Feb 13, 2025: Interview Requested
Mar 11, 2025: Applicant Interview (Telephonic)
Mar 11, 2025: Examiner Interview Summary
Mar 19, 2025: Response Filed
May 06, 2025: Final Rejection — §103
Aug 08, 2025: Request for Continued Examination
Aug 11, 2025: Response after Non-Final Action
Aug 21, 2025: Non-Final Rejection — §103
Nov 24, 2025: Examiner Interview Summary
Nov 24, 2025: Applicant Interview (Telephonic)
Nov 25, 2025: Response Filed
Feb 04, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12572751: ELECTRONIC DEVICE AND CONTROLLING METHOD OF ELECTRONIC DEVICE (granted Mar 10, 2026; 2y 5m to grant)
Patent 12567408: MULTI-MODAL SMART AUDIO DEVICE SYSTEM ATTENTIVENESS EXPRESSION (granted Mar 03, 2026; 2y 5m to grant)
Patent 12554930: TRANSFORMER-BASED TEXT ENCODER FOR PASSAGE RETRIEVAL (granted Feb 17, 2026; 2y 5m to grant)
Patent 12542131: SYSTEM AND METHOD FOR COMMUNICATING WITH A USER WITH SPEECH PROCESSING (granted Feb 03, 2026; 2y 5m to grant)
Patent 12531071: PACKET LOSS CONCEALMENT METHOD AND APPARATUS, STORAGE MEDIUM, AND COMPUTER DEVICE (granted Jan 20, 2026; 2y 5m to grant)
Based on this examiner's 5 most recent grants; studying what changed in each of those cases indicates how applicants got past this examiner.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 68%
Grant Probability With Interview: 99% (+34.0%)
Median Time to Grant: 2y 10m
PTA Risk: High

Based on 167 resolved cases by this examiner. Grant probability is derived from the career allow rate.
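Read one way, the projection cards compose by simple addition: the +34.0% interview lift applied to the 68% baseline would overshoot 100%, so the displayed 99% suggests a cap. A minimal sketch of that reading follows; it is an assumption about the cards, not the tool's published model.

```python
# Back-of-the-envelope reading of the projection cards. Assumes the
# interview lift adds percentage points to the baseline and the result
# is capped; this is an assumption, not the tool's published model.

baseline = 0.68          # grant probability derived from career allow rate
interview_lift = 0.34    # displayed "+34.0%" lift for cases with an interview

with_interview = min(baseline + interview_lift, 0.99)   # card displays 99%
print(f"baseline {baseline:.0%} -> with interview {with_interview:.0%}")
```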

Free tier: 3 strategy analyses per month