Prosecution Insights
Last updated: April 19, 2026
Application No. 18/478,866

SYSTEMS AND METHODS FOR AUTOMATICALLY UNMUTING CONFERENCE PARTICIPANTS

Non-Final OA — §103, §112
Filed: Sep 29, 2023
Examiner: MATAR, AHMAD
Art Unit: 2693
Tech Center: 2600 — Communications
Assignee: RingCentral, Inc.
OA Round: 3 (Non-Final)
Grant Probability: 38% (At Risk)
OA Rounds: 3-4
To Grant: 2y 7m
With Interview: 50%

Examiner Intelligence

Career allow rate: 38% (5 granted of 13 resolved cases; -23.5% vs TC avg)
Interview lift: +11.5% (moderate), measured across resolved cases with an interview
Typical timeline: 2y 7m average prosecution; 6 applications currently pending
Career history: 19 total applications across all art units
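
As a quick sanity check of how these figures relate, here is a small sketch. It assumes, per the projections note later in this report, that the grant probability is simply the examiner's career allow rate and that an interview adds the observed lift; the variable names are illustrative.

```python
# Sanity check of the headline figures, assuming grant probability is the
# examiner's career allow rate and an interview adds the +11.5% lift.

granted, resolved = 5, 13
allow_rate = granted / resolved                 # 0.3846... -> shown as 38%
interview_lift = 0.115                          # +11.5% lift with interview

print(f"career allow rate: {allow_rate:.1%}")                   # 38.5%
print(f"with interview:    {allow_rate + interview_lift:.1%}")  # 50.0%
```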

Statute-Specific Performance

§101: 3.9% (-36.1% vs TC avg)
§103: 46.2% (+6.2% vs TC avg)
§102: 23.1% (-16.9% vs TC avg)
§112: 23.1% (-16.9% vs TC avg)
Tech Center averages are estimates; based on career data from 13 resolved cases.

Office Action

§103, §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination

A request for continued examination (RCE) under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/23/2025 has been entered.

Response to Amendment

This is in response to Applicant's amendment filed on 10/23/2025, which has been entered. Claims 1, 12, 15, and 20 have been amended. Claims 18 and 19 have been cancelled. Claim 22 has been added. Claims 1-17 and 20-22 are pending in this application, with claims 1, 15, and 20 being independent. The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claim Rejections - 35 USC § 112

Claim 22 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. The claim recites "the fourth association" (line 19). There is insufficient antecedent basis for this limitation in the claim; the term is previously introduced as "fourth associated" (line 12).

Claim Rejections - 35 USC § 103

Claims 1-7, 11, 15-18, and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Rose et al. (US Patent 9936078 B2, hereinafter "Rose") in view of Roper (US 12192018 B1).

Regarding Claim 1, Rose discloses a method and a system for conference call unmuting and specifically teaches a method for automatically controlling audio in a conference, comprising: receiving, at a conference system (telephone system 104, Fig. 1), at least a first audio stream and a second audio stream of the conference (the voice of the second participant in Rose, device 112, reads on the claimed "first audio stream," and the voice of the first participant in Rose, device 110, reads on the claimed "second audio stream"; see abstract). In Rose, the first participant's audio (which reads on the claimed "second audio stream") is muted after the conference starts; see, for example, col. 8, lines 62-64. Rose also teaches retrieving the spoken context that includes the identifier in an audio of the first audio stream: the second participant in Rose may speak a command associated with unmuting a participant along with the identifier of the first participant, where the identifier may be "fourth caller," a nickname, a code name, or "John Doe"; see col. 8, line 65 - col. 9, line 4.
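
For illustration, a minimal sketch of the explicit-command flow the rejection maps onto Rose: a muted participant is unmuted when another participant speaks a command ("unmute") together with an identifier ("John Doe," "fourth caller"). The class and method names are illustrative assumptions, not code from Rose.

```python
# Minimal sketch of the explicit-command unmute flow described above.
# All names here are hypothetical.

from dataclasses import dataclass


@dataclass
class AudioStream:
    stream_id: int
    identifiers: set[str]       # name, nickname, code name, number, etc.
    muted: bool = True          # participants are muted once the call starts


class ConferenceSystem:
    def __init__(self, streams: list[AudioStream]):
        self.streams = streams

    def find_by_identifier(self, text: str) -> AudioStream | None:
        """Return the stream whose identifier is mentioned in the text."""
        lowered = text.lower()
        for stream in self.streams:
            if any(ident.lower() in lowered for ident in stream.identifiers):
                return stream
        return None

    def handle_utterance(self, text: str) -> None:
        """Unmute a stream when an utterance names it and requests an unmute."""
        target = self.find_by_identifier(text)
        if target is not None and "unmute" in text.lower():
            target.muted = False


conf = ConferenceSystem([AudioStream(1, {"John Doe", "fourth caller"})])
conf.handle_utterance("Please unmute John Doe")
assert conf.streams[0].muted is False
```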
Rose further teaches classifying, by execution of the conference system, a context with which the identifier is mentioned in the audio of the first audio stream (see "unmute John Doe," which serves as one type of request or expression of "desire," as taught by Rose, to unmute), and unmuting, by execution of the conference system, the second audio stream (the first participant in Rose) in response to the identifier (e.g., "John Doe") being linked to the second audio stream and further in response to the context (which may be "unmute" or other words expressing the desire for the participant to unmute) from the audio of the first audio stream (the voice of the second participant in Rose) comprising a request that a user associated with the second audio stream (the first participant) speak. Rose teaches speaking a command or words to express the desire to unmute John Doe, which serves as the request to unmute, and teaches that once the communication device 110 of the first participant is unmuted, the first participant is able to speak and be heard by the other participants on the conference call; see col. 2, lines 29-39. Note that Rose teaches (col. 9, lines 5-15 and col. 11, lines 21-23) that the method includes unmuting the voice communication device of the first participant (read as the claimed second audio stream) in response to receiving the request to unmute from the second participant (read as the claimed first audio stream). The request to unmute from the second participant includes speaking an identifier of the first participant and speaking a command or words that signal the desire associated with unmuting a participant.

Rose does not explicitly teach unmuting based on "classifying ... the context associated with the identifier as a statement or question that is directed to a user that is identified by the identifier and that invokes a response from a user that is identified by the identifier without including a command or signal to unmute the user or the second audio stream." On one hand, Rose also teaches that the request to unmute includes the second participant speaking a command associated with unmuting a particular/identified participant. For example, the command may be "unmute" or some other word or group of words that signal to the apparatus 200 a desire for the first participant to unmute; see col. 9, lines 5-10. Thus, Rose does suggest the use of "some other word or group of words" for signaling the desire for the first participant to unmute. A word or group of words that signal a desire for the first participant to unmute may obviously read on the claimed "statement or question that invokes a response from a user identified by the identifier" to unmute. Numerous obvious examples would be within the teachings of Rose, such as "would John Doe tell us about his project" and "could the fourth caller speak to us about the new product." One may obviously express the desire to have a participant unmute and speak without issuing a "command." The context may obviously be spoken before or after the "identifier," following the natural language used by the conference participants. It is important to note that, in analyzing natural language, one of ordinary skill in the art may obviously equate "inviting one to unmute" with "inviting one to speak": one would naturally unmute in order to speak, and a conferencing system which analyzes a request to unmute is also expected to analyze a request to speak.

On the other hand, Roper discloses a conferencing system analogous to that of Rose and teaches (col. 29, lines 25-40) the use of a machine learning (ML) technique to receive a sequence of words and determine whether an attention cue was intended for one particular participant. For example, in an embodiment where an attention cue is invoked audibly by a conference participant, if the participant says "Do you have any thoughts on Project White House," the ML technique may semantically analyze the words and determine that the speaker is requesting the attention of (invoking a response from) participant A (one specific participant) for which Project White House is relevant. This parallels using a word or group of words in Rose to signal the desire for one particular participant to unmute and speak. That is, "White House" is specifically associated with participant A and reads on the claimed "identifier." The context as taught by Roper may be a mention of words around the identifier (White House). The above examples clearly show that the attention of one particular participant is requested by asking a relevant question without issuing a command to unmute. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to analyze the spoken words in Rose as taught by Roper so that the one relevant participant will be unmuted when the current speaker is requesting the attention of that participant (e.g., requesting that the particular participant speak). This offers more flexibility, as the attention of the participant may be requested using different spoken words (natural language) which will be analyzed. The use of machine learning in a conference as suggested by Roper adds a well-known advantage. Claims 15 and 20 are rejected for the same reasons.

Regarding claims 2 and 16, Rose teaches (col. 8, line 58 - col. 9, line 15) that the request to unmute may include the second participant speaking the name or other identifier of the first participant. For example, where the first participant is a fourth caller, the identifier may be "fourth caller" and the second participant may say "fourth caller." The identifier may be a nickname, a code name, one or more assigned words, and the like. See also that the statement or question in the combination of Rose and Roper may be a word or group of words that signal the desire for the first participant to unmute. Rose does not explicitly teach that the retrieved context (e.g., "unmute" or another group of words) comes after the identifier (e.g., "John Doe"). Since Rose teaches the use of a "code name, one or more assigned words or the like," it would have been an obvious variation to use a statement or a question such as "John Doe may now discuss the new plan for project Y" or "Can John Doe inform us about the upcoming changes in management?"

Regarding claims 3 and 17, in Rose, participants may be identified by a name or nickname (different identifiers for the plurality of participants 114, Fig. 1). One participant, such as the first participant in Rose, may obviously be identified by a name, a number, a nickname, and/or a code name such as "fourth caller," "John Doe," or "fourth caller, Mr. John Doe." The above obvious variations discussed in claims 2, 3, 16, and 17 offer more flexibility and provide options (e.g., use the name and/or nickname for the same conference participant, such as Jim, Jim Smith, Jimmy).

Regarding claim 4, Rose teaches determining that the identifier corresponds to a name or value (code or number) that is associated with the second audio stream (e.g., "John Doe").
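
To make the classification step the combination relies on concrete (an attention cue versus a casual mention, decided without any literal "unmute" command), here is a hedged sketch. A real system would use a trained ML model as Roper describes; the cue list and names below are illustrative assumptions only.

```python
# Sketch of attention-cue classification: the context around a detected
# identifier is labeled either an attention cue (a statement or question
# that invokes a response) or a casual mention. A crude keyword heuristic
# stands in for the trained ML technique; all names are hypothetical.

QUESTION_CUES = ("do you", "would", "could", "can you", "tell us",
                 "any thoughts", "what", "how", "?")


def classify_context(utterance: str, identifier: str) -> str:
    """Return 'attention_cue' or 'casual_mention' for a detected identifier."""
    lowered = utterance.lower()
    if identifier.lower() not in lowered:
        raise ValueError("identifier not present in utterance")
    if any(cue in lowered for cue in QUESTION_CUES):
        return "attention_cue"      # "Would John Doe tell us about his project?"
    return "casual_mention"         # "I met John Doe last week."


assert classify_context("Do you have any thoughts on Project White House?",
                        "Project White House") == "attention_cue"
assert classify_context("I liked the movie with the aliens and the White House",
                        "White House") == "casual_mention"
```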
Regarding claims 5, 6, and 18, the claimed "detecting one or more questioning words, instructions, or directions ..." reads on detecting the words which signal the desire for the participant to speak, as taught by Rose and Roper (discussed above).

Regarding claim 7, as discussed above, when a participant joins the conference, such as the third participant, the participant is muted. The claimed limitation simply reads on the second participant speaking a participant's name or nickname, such as "third caller" or "John Smith," in a conference conversation without using words which signal the desire to have the participant unmute. In this case, the third caller will remain muted because the second participant simply mentions the third participant without requesting an unmute (without a request to speak). Roper teaches (col. 29, lines 25-40) that the ML technique may receive the sequence of words and determine whether an attention cue was intended. For example, in an embodiment where an attention cue is invoked audibly by a participant, if the participant says "Do you have any thoughts on Project White House," the ML technique may semantically analyze the words and determine that the speaker is requesting the attention of participant A, for which Project White House is relevant. Alternatively, if the speaker says "I liked the movie with the aliens and the White House," the ML technique may determine that no attention cue was intended and refrain from alerting the video conference provider 310 of the attention cue. That is, "White House" is associated with participant A and reads on the claimed "identifier associated with the third audio." The context as taught by Roper may be a mere mention of the identifier (White House), as in "I liked the movie with the aliens and the White House," without the need for participant A (the third participant) to speak. Thus, it would have been obvious to use the teachings of Roper in Rose so that a casual mention of John Doe will not result in unmuting John Doe, to avoid unintentional unmuting which may result in background noise, etc.

Regarding claim 11, Rose teaches the use of a name, nickname, code name, number, etc. Obviously (if not inherently), one participant (e.g., associated with the second audio stream) may be identified by more than one of the above identifiers, such as a name ("John Doe"), a number, or a code name. Obviously, a participant may be identified by first name, last name (or both), nickname, and/or title.

Regarding claim 21, Roper teaches (col. 28, lines 45-60) that the keywords in the keyword database 822 may be gathered from participant A or from the client device 340a. Participant A may be prompted to input personally identifiable keywords that the attention assistance module 805 should monitor the virtual meeting for (e.g., a nickname); this is read as the assistance module "receiving the identifier in textual format in response to the user joining the conference." The attention assistance module 805 may gather keywords from folders, calendars, emails, or other applications running on the client device 340a (another way of receiving the identifier in a textual format). Roper also teaches (col. 29, lines 10-17) that, to identify a context, the attention assistance module 805 may employ a trained machine learning ("ML") technique to semantically analyze the speech or transcript associated with the identified keyword 815 to determine additional related keywords and/or descriptors (this is read as performing text-to-speech conversion on the keywords). To perform the analysis, the trained ML technique may be provided the keywords. In col. 29, lines 25-32, the ML technique may receive the sequence of words and determine whether an attention cue was intended. For example, in an embodiment where an attention cue is invoked audibly by a participant, if the participant says "Do you have any thoughts on Project White House," the ML technique may semantically analyze the words and determine that the speaker is requesting the attention of participant A, for which Project White House is relevant.

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Rose et al. in view of Roper as applied to claim 1 above, and further in view of Jorasch et al. (US 20210399911 A1). Regarding claim 8, as discussed above, the combination of Rose and Roper teaches that the second audio stream (the first participant in Rose) is unmuted and the participant begins to speak. The combination of Rose and Roper does not teach muting the second audio stream when the second audio stream ends. Jorasch et al. teaches (PP 1780) that if the central controller took the participant off mute (unmuted them), once they stop speaking ... the central controller could put the user back on mute. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of this application to incorporate the teachings of Jorasch et al. in the method of Rose so that the unmuted second audio stream (which corresponds to the first participant of Rose) would be muted again when the stream ends (the participant stops speaking). Muting a participant after he/she stops speaking has obvious advantages, such as preventing background noise and/or accidental audio from that participant; one participant speaks at a time.
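
A hedged sketch of this re-mute-on-silence behavior follows. Silence detection is reduced to a timestamp comparison; the class name and threshold are illustrative assumptions, not Jorasch's implementation.

```python
# Sketch of the Jorasch-style behavior cited for claim 8: a stream that was
# automatically unmuted is put back on mute once its participant stops
# speaking.

import time


class AutoRemute:
    """Re-mutes an auto-unmuted stream after a period of silence."""

    def __init__(self, silence_threshold_s: float = 2.0):
        self.silence_threshold_s = silence_threshold_s
        self.last_voice_activity: float | None = None
        self.muted = False                      # stream was just auto-unmuted

    def on_voice_activity(self) -> None:
        """Called whenever the participant's voice is detected."""
        self.last_voice_activity = time.monotonic()

    def poll(self) -> None:
        """Mute again once silence has lasted past the threshold."""
        if self.muted or self.last_voice_activity is None:
            return
        if time.monotonic() - self.last_voice_activity >= self.silence_threshold_s:
            self.muted = True                   # participant stopped speaking


remuter = AutoRemute(silence_threshold_s=0.0)   # zero threshold for the demo
remuter.on_voice_activity()
remuter.poll()
assert remuter.muted is True
```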
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Rose et al. in view of Roper as applied to claim 1 above, and further in view of O'Connell (US 20230125307 A1). Regarding claim 9, as discussed above, Rose teaches that the second audio stream (the first participant in Rose) is unmuted/enabled by the command of the participant associated with the first audio stream (the second participant in Rose), without user input. Rose does not teach that the second audio stream is enabled when the speaker in the first audio stream (the second participant in Rose) stops speaking; that is, it does not teach delaying the unmuting of the second audio stream until a speaker speaking in the first audio stream finishes speaking. O'Connell (PP 0076) teaches that a linguistic model may capture how a user speaks, i.e., their intonation when ending in a question, and the user's mute status. For example, linguistically, some people may frequently ask follow-up questions after an initial interrogatory sentence. The disclosed linguistic module may track a user's history and how the pattern compares to this type of speech pattern. In an example, a user may say, "So where are we with the Q1 results? . . . Do we have those numbers?" or "What time are you planning on sending the RFP? I want to do some review of it before the deadline." In PP 0077, O'Connell teaches providing an indicator to a user (such as the claimed participant associated with the second audio stream) to speak if it is likely that all speaking or participating parties (such as the claimed participant associated with the first audio stream) are finished speaking. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of this application to include the teachings of O'Connell in the method of Rose and Roper so as to unmute/enable audio from the first participant in Rose (read as the claimed second audio stream) in response to determining that the second participant in Rose (read as the claimed first participant) finishes speaking. This provides a more organized conference wherein only one participant speaks at a time.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Rose et al. in view of Roper and O'Connell, as applied to claim 9 above, and further in view of Jorasch et al. Regarding claim 10, the combination of Rose, Roper, and O'Connell as discussed above with respect to claim 9 does not teach muting the claimed first audio stream without user input when the speaker in the first audio stream stops speaking. Jorasch et al. teaches (PP 1780) that if the central controller took the participant off mute (unmuted them), once they stop speaking ... the central controller could put the user back on mute. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of this application to incorporate the teachings of Jorasch et al. in the method disclosed by the combination of Rose, Roper, and O'Connell so that the unmuted first audio stream (which corresponds to the second participant of Rose) would be muted when the stream ends (the participant stops speaking). Muting a participant after he/she stops speaking has obvious advantages, such as preventing background noise and/or accidental audio from that participant. Only one participant speaks at a time.

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Rose et al. in view of Roper as applied to claim 1 above, and further in view of Singh (US 20230129467 A1). Regarding claim 12, the combination of Rose and Roper, as discussed above, does not teach the scenario wherein the identifier is mentioned in an audio of a third audio stream, and associating the identifier with the second audio stream in response to the second audio stream becoming active after the identifier is mentioned in the third audio stream. However, the above scenario is taught by Singh, which teaches (PP 0128) an example wherein an instant sentence being processed may be "Alex, can you talk about the project?" where the sentence is associated with "speaker 1" (read as the claimed audio of a third audio stream), and the NLP engine 630 may determine the intent data 632 to be "question intent" and the entity data 634 to be {entity name: "Alex"; entity type: person}. According to the routine 1000, the name identification engine 640 may select the next sentence associated with "speaker 2" (read as the second audio stream), which may be "Yes, let me give a summary of the project," and may determine the speaker's name for "speaker 2" to be "Alex" for the foregoing example sentence. Responding by stating "Yes, let me give a summary of the project" reads on the claimed limitation of "the second audio stream becomes active." See also "self-introduction" in PP 43. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the scenario taught by Singh in the method of Rose and Roper so that the identifier of a newly introduced participant, such as Alex in the example, may be automatically associated with that participant: Alex responded, and Alex's name will be associated with that speaker/second audio stream.
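
A hedged sketch of this Singh-style association: when one stream asks a question that names a participant and a different stream becomes active next, the mentioned name is associated with the newly active stream. This is a simplified, hypothetical stand-in for Singh's NLP intent/entity pipeline.

```python
# Sketch of name-to-speaker association per the claim 12 discussion above.
# The question detection and entity extraction are crude illustrations.

def associate_name(sentences: list[tuple[str, str]]) -> dict[str, str]:
    """sentences: ordered (speaker_label, text) pairs.
    Returns a mapping from speaker labels to inferred names."""
    names: dict[str, str] = {}
    pending_name: str | None = None
    asking_speaker: str | None = None
    for speaker, text in sentences:
        if pending_name and speaker != asking_speaker:
            names[speaker] = pending_name       # the responder takes the name
            pending_name = None
        if "?" in text:
            # crude "question intent" + entity extraction: word before a comma
            head = text.split(",", 1)[0].strip()
            if head.istitle() and " " not in head:
                pending_name, asking_speaker = head, speaker
    return names


result = associate_name([
    ("speaker 1", "Alex, can you talk about the project?"),
    ("speaker 2", "Yes, let me give a summary of the project"),
])
assert result == {"speaker 2": "Alex"}
```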
Regarding claim 13, the combination of Rose and Roper, as discussed above, does not teach detecting the identifier in an "introduction" in the second audio stream and associating the identifier with the second audio stream. Singh discloses a method to analyze audio data to identify different speakers. It teaches (PP 43) that, in an example of a conversation, the first portion of the data may be the first words, including "Hello my name is Alex." The speaker name identification system 100 may determine the speaker's intent to be a self-introduction intent, and may determine that the name of the first speaker who spoke the first words 162 is "Alex." The name "Alex" is read as the claimed "identifier" in the introduction. In PP 135, Singh teaches that other types of identifiers, such as "Lead Engineer," may be used to identify the speaker. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to use the teachings of Singh (i.e., to detect an identifier such as "Alex" and associate it with the speaker) in the method taught by Rose so that the claimed participant associated with the second audio stream (the first participant in Rose) may provide a self-introduction with an identifier such as a name (John Doe in Rose, or Alex in Singh). This is beneficial when a new participant joins, or when a participant has a new title (e.g., Lead Engineer) or works on a new project (the Boston project in Singh).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Rose et al. in view of Roper as applied to claim 1 above, and further in view of AI et al. (US 20150154960 A1) or, alternatively, in view of Beccay (US 20130143539 A1). The claim recites detecting that the first audio stream is associated with a conference host, and muting the second audio stream and other audio streams of the conference other than the first audio stream in response to determining that the conference host starts the conference. Rose teaches muting the voice of the participant of voice device 110 (read as the claimed "second audio stream") and other participants 114 (Fig. 1) prior to receiving the request to unmute (see abstract and PP 09), but it does not teach that the muting of the participants is in response to detecting that the conference host starts the conference. However, AI et al. teaches (PP 0016) that the host may have special speaking privileges such that the host is the only one who may talk at a certain time while all other user attendees of the communication session are muted. The host may also select one or more particular individuals within the communication session to talk while all others are muted (teaching that the host also has the ability to unmute). AI et al. also teaches (PP 0024, at step S302) that the system server 100 identifies one or more meeting attendees who are currently speaking in the communication session. Alternatively, Beccay et al. teaches that the moderator can have additional controls that are not available to other participants of the conference, for example, muting one or more participants and speaking priority (i.e., when the moderator speaks, all participants are muted); see PP 0027. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the teachings of AI et al. or Beccay in order to mute all conference participants when the host (e.g., the second participant in Rose) starts the conference. This is beneficial to avoid background noise from other participants while the host is talking. This is the well-known broadcast type of conferencing (the host speaks, others are muted).
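
A minimal sketch of this claim 14 scenario, assuming hypothetical names rather than any reference's actual structures:

```python
# Sketch of host-start muting: when the detected host opens the conference,
# every non-host stream is muted.

from dataclasses import dataclass


@dataclass
class Stream:
    stream_id: int
    is_host: bool = False
    muted: bool = False


def on_conference_start(streams: list[Stream]) -> None:
    """Mute all non-host streams when the host starts the conference."""
    for s in streams:
        s.muted = not s.is_host     # host keeps the floor; others are muted


streams = [Stream(1, is_host=True), Stream(2), Stream(3)]
on_conference_start(streams)
assert [s.muted for s in streams] == [False, True, True]
```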
Claims 1, 15, 20, and 21 are also rejected under 35 U.S.C. 103 as being unpatentable over Rose et al., as discussed above, in view of Singh (US 20230129467 A1). For claims 1, 15, and 20, Rose has been discussed above in detail. Rose does not specifically teach the use of a statement or question that is not a command, direction, or instruction to unmute the second audio stream. On one hand, Rose also teaches that the request to unmute includes the second participant speaking a command associated with unmuting a particular/identified participant. For example, the command may be "unmute" or some other word or group of words that signal to the apparatus 200 a desire for the first participant to unmute; see col. 9, lines 5-10. Thus, Rose does suggest the use of "some other word or group of words" for signaling the desire for the first participant to unmute. A word or group of words that signal a desire for the first participant to unmute may obviously read on the claimed "statement or question that invokes a response from a user identified by the identifier" to unmute, without using an unmute command. Numerous examples would be within the teachings of Rose, such as "would John Doe tell us about his project" and "could the fourth caller speak to us about the new product." The context may obviously be spoken before or after the "identifier," following the natural language used by the conference participants. On the other hand, Singh teaches (PP 0128) an example wherein an instant sentence being processed may be a question such as "Alex, can you talk about the project?" where the statement is associated with "speaker 1" (read as the claimed audio of the first audio stream), and the NLP engine 630 may determine the intent data 632 to be "question intent" and the entity data 634 to be {entity name: "Alex"; entity type: person}. Further, for "Hi Joe. How are you?" where the sentence is associated with "speaker 1," the NLP engine 630 may determine the intent data 632 to be "intent to engage" and the entity data 634 to be {entity name: "Joe"; entity type: person}. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the scenario taught by Singh in the method of Rose so that the second audio stream would be unmuted in response to a question or a statement mentioned in the first audio stream. This offers an additional convenient scenario to fully benefit from analyzing the words provided in the audio of the first audio stream and unmute the desired participant when a question is directed to that participant.
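
For illustration, a simplified, hypothetical stand-in for the Singh-style intent/entity determination relied on in this alternative ground: each sentence is tagged with intent data and entity data, and a "question intent" naming a muted participant is what would trigger the unmute in the combination.

```python
# Crude sketch of intent/entity tagging per Singh's PP 0128 examples.
# The regex and clause heuristic are illustrative assumptions only.

import re


def analyze(sentence: str) -> dict:
    """Return crude intent data and entity data for one sentence."""
    name_match = re.match(r"(?:Hi |Hello )?([A-Z][a-z]+)[,.]", sentence)
    entity = ({"entity name": name_match.group(1), "entity type": "person"}
              if name_match else None)
    first_clause = sentence.split(". ")[0]      # a question mark in the
    if name_match and "?" in first_clause:      # opening clause => a directed
        intent = "question intent"              # question: "Alex, can you ...?"
    elif name_match:
        intent = "intent to engage"             # greeting: "Hi Joe. How are you?"
    else:
        intent = "other"
    return {"intent data": intent, "entity data": entity}


assert analyze("Alex, can you talk about the project?")["intent data"] == "question intent"
assert analyze("Hi Joe. How are you?")["intent data"] == "intent to engage"
```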
For claim 21, Singh discloses a method to analyze audio data to identify different speakers. It teaches (PP 43) that, in an example of a conversation, the first portion of the data may be the first words, including "Hello my name is Alex." Based on the self-introduction intent, the speaker name identification system 100 may determine that the name of the first speaker 150 who spoke the first words 162 is "Alex." The name "Alex" is read as the claimed "identifier." Based on this determination, the speaker name identification system 100 may output an indication of the first speaker name, for example, a text indication 166 representing "First Speaker = Alex," and may associate the text indication 166 with the first words 162 spoken by the first speaker. In PP 0100, in some implementations, the speaker name identification system 100 may include the diarization engine 620, which may be configured to transcribe an audio recording and generally identify different speakers. The diarization engine 620 may use one or more speech-to-text techniques and/or speech recognition techniques to transcribe the audio recording. The speech-to-text techniques may involve using one or more machine learning models (e.g., acoustic models, language models, neural network models, etc.). That is, in Singh, the speech "Hello my name is Alex" is converted to the text "First Speaker = Alex" using speech-to-text techniques. It would have been obvious to utilize the speech-to-text (or text-to-speech) conversion in the system of Rose, which provides additional convenient options for the participants to introduce themselves via spoken words or text.
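
A hedged sketch of the self-introduction detection discussed for Singh (and, below, for Hewinson's "My name is ___" phrase): a predetermined introductory phrase in a transcribed stream yields an identifier to associate with that stream. The regex and function name are illustrative assumptions.

```python
# Sketch of self-introduction detection over a transcribed audio stream.

import re

INTRO_PATTERN = re.compile(r"\b(?:hello[, ]+)?my name is (\w+)", re.IGNORECASE)


def detect_self_introduction(transcript: str) -> str | None:
    """Return the introduced name if the transcript contains an intro phrase."""
    match = INTRO_PATTERN.search(transcript)
    return match.group(1) if match else None


assert detect_self_introduction("Hello my name is Alex") == "Alex"
assert detect_self_introduction("My name is John") == "John"
assert detect_self_introduction("Let's get started") is None
```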
Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Rose in view of Roper, as applied to claim 1 above, and further in view of Hewinson (US 8515025 B1). Claim 22 recites what is well known in the art, namely having more than one conference participant utilize, and be associated with, one terminal (one stream). The combination of Rose and Roper does not teach the feature of sharing one terminal (one stream) by two conference participants. Hewinson teaches "conference call voice-to-name matching" and the use of voice recognition module 22 (Fig. 1) to recognize certain spoken keywords, such as the names of the participants 12a-12e, from utterances of the participants such as "My name is John" (see col. 5, lines 7-15). Each participant has a particular identifier such as a first name, surname, codename, etc. (see abstract and col. 5, lines 50-65). Fig. 1 shows participant 12a (John) using terminal 16a, participant 12b (Jane) using terminal 16b, and two participants, 12c (Tom) and 12d (Mike), using/sharing terminal 16c (stated differently, the telephone 24c can be a speakerphone, and both participants 12c and 12d can communicate within the teleconference via the same telephone 24c); see col. 5, lines 50-65. Hewinson teaches that system 10 can allow teleconferencing between any number of participants 12a-12e, between any number of client terminals 16a-16d, and between any number of telephones 24a-24d. It further teaches (PP 18) that the voice recognition module 22 can detect the names or identifiers of the participants 12a-12e from the context, syntax, etc. of the utterances during the teleconference. For instance, the voice recognition module 22 can be operable to specifically detect the phrase "My name is --------" or another predetermined introductory phrase when uttered during the teleconference, and to identify the speaker by recognizing that introductory phrase; see col. 6, lines 7-24.

Hewinson thus teaches receiving identifying information (such as a first name, a surname, a codename, etc.; "my name is John") for each of the first audio stream (from terminal 16c, Fig. 1) and the second audio stream (from terminal 16a, Fig. 1); creating a first association between the identifying information for the first audio stream and the first audio stream (participant 12a (John) will be associated with terminal 16a) and a second association between the identifying information for the second audio stream and the second audio stream (participant 12c (Tom) will be associated with terminal 16c); and monitoring the audio from the first audio stream and the audio from the second audio stream during an introductory phase of the conference that precedes a start of the conference (see "My name is John ..."; col. 6, lines 7-24). Creating a third association between the first audio stream and a first user identifier that is mentioned during the introductory phase, as an introduction by a speaker of the first audio stream or as a reply by the speaker of the first audio stream to a greeting in a different audio stream that precedes the reply, reads on the scenario where another participant, such as Pauline, joins participant 12a (John) in his office: both John and Pauline would share terminal 16a and a speakerphone, which is within the teachings of Hewinson, and Pauline using terminal 16a would read on the claimed "first user." A fourth association between the second audio stream and a second user identifier that is mentioned during the introductory phase, as an introduction by the speaker of the second audio stream or as a reply by the speaker of the second audio stream to a greeting in a different audio stream that precedes the reply, reads on participant 12d (Mike), who is sharing terminal 16c (the claimed "second user"). It would have been obvious to one of ordinary skill in the art before the effective filing date of the current invention to utilize the feature taught by Hewinson in the combination of Rose and Roper, which allows for more versatility in the conference setting and includes the well-known feature of having two conference participants utilizing/sharing one terminal and being associated with one stream. In the combination, detecting the identifier that is mentioned in the audio of the first audio stream would result in comparing the identifier against the second association and the fourth association (two participants sharing one terminal) and determining that the identifier is associated with the second audio stream. That is, in the combination, the stream associated with the two participants would be unmuted if either participant is mentioned. Again, allowing sharing adds versatility and allows a natural office setting (a shared office) to be included.

Claim 22 is also rejected under 35 U.S.C. 103 as being unpatentable over Rose in view of Singh, as applied to claim 1 above, and further in view of Hewinson (US 8515025 B1). Similar to the rejection above, the combination of Rose and Singh does not teach the feature of sharing one terminal (one stream) by two conference participants. Hewinson teaches "conference call voice-to-name matching" and the use of voice recognition module 22 (Fig. 1) to recognize certain spoken keywords, such as the names of the participants 12a-12e, from utterances of the participants such as "My name is John" (see col. 5, lines 7-15). Each participant has a particular identifier such as a first name, surname, codename, etc. (see abstract and col. 5, lines 50-65). It would have been obvious to one of ordinary skill in the art before the effective filing date of the current invention to utilize the feature taught by Hewinson in the combination of Rose and Singh, which allows for more versatility in the conference setting and includes the well-known feature of having two conference participants utilizing/sharing one terminal and being associated with one stream.
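
A minimal sketch of the shared-terminal mapping described for claim 22: one stream carries two identifier associations, so a mention of either identifier resolves to the same stream. The data structures are hypothetical.

```python
# Sketch of identifier-to-stream resolution when two participants share one
# terminal (one stream), per the Hewinson-based discussion above.

# stream_id -> identifiers introduced on that stream during the intro phase
associations: dict[int, set[str]] = {
    16: {"john", "pauline"},        # two participants sharing one terminal
    17: {"jane"},
}


def resolve_stream(identifier: str) -> int | None:
    """Compare a mentioned identifier against every stored association."""
    for stream_id, names in associations.items():
        if identifier.lower() in names:
            return stream_id
    return None


# mentioning either sharer resolves to (and would unmute) the shared stream
assert resolve_stream("Pauline") == resolve_stream("John") == 16
```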
Response to Arguments

Applicant's arguments filed 10/23/2025 have been fully considered, but they are not persuasive. Applicant argues (page 16 of the remarks) that Rose uses an unmute command such as "unmute John Doe." The Examiner agrees that Rose teaches the use of an unmute command (as one option). Applicant also argues that Rose does not disclose or suggest performing the automatic unmuting of an audio stream by classifying context around a user identifier as a statement or question that invokes a response from a user that is identified by the identifier. The Examiner respectfully disagrees. This limitation, as discussed in the above rejection, reads on the teaching in Rose of using "words to signal the desire" of the speaking conference participant to unmute the identified participant. Examples of possible obvious questions or statements within the teachings of Rose are "would John Doe tell us about his project?" and "could the fourth caller speak to us about the new product?" Words to signal the desire for the identified participant to unmute and speak do not necessarily have to include the word "unmute." One may obviously signal the desire by making a statement or asking a question, and one of ordinary skill in the art may incorporate numerous obvious words or expressions to signal the desire to have a conference participant respond/speak. Obviously and evidently, Rose did not intend to limit and restrict the invention to the specific utterance of an "unmute" command.

Applicant argues (page 15) that in Roper an attention cue is given because a topic of interest or content that is relevant to a user is mentioned (e.g., "Project White House"), and argues that none of the examples in Roper involve a user identifier and classifying the context associated with the user. The Examiner respectfully disagrees. In Roper, "Project White House" is intended for one particular participant (participant A), and the context "do you have any thoughts" classifies the context as a specific attention cue invoking a response specifically from participant A (and not from any participant, as argued by Applicant). Applicant also argues (page 15) that "How is the weather in Denver?" may be directed to any conference participant and does not specify any particular user identifier that the context is directed to. The Examiner respectfully disagrees. The question "How is the weather in Denver?" is not directed to "any conference participant" because, according to Roper, the ML technique can semantically analyze the words and determine that the speaker is intending an attention cue (to get the attention of a particular participant). The word "Denver" is an identifier (a location) for one particular conference participant. Roper teaches the following: an attention cue may include personally identifiable information of a multi-meeting participant. For example, an attention cue may include the participant's name, a topic of interest to the multi-meeting participant, a project or topic that the participant is involved in, a location corresponding to the participant (e.g., office location or location of the multi-meeting participant), or a statement otherwise involving the multi-meeting participant (col. 21, line 57 - col. 22, line 2). The keywords may include words relating to the identity of participant A or personally identifiable information about participant A, such as the first and last name of participant A. In some cases, the keywords may include an office or location of participant A. Other keywords may include the names of projects, areas, or topics that participant A is involved in (col. 28, lines 36-43). The attention cue comprises personally identifiable information corresponding to a participant associated with the first client device and comprises at least one of: a name of the participant; a project involving the participant; a company corresponding to the participant; or a location corresponding to the participant (col. 37, lines 21-28).

Again, Applicant's argument (page 15) that "How is the weather in Denver" in Roper may be directed to any participant, and that "Do you have any thoughts on Project White House" may also be intended for any participant, is contrary to the explicit teachings of Roper. This is a highly unusual interpretation, since one would normally not expect a participant who lives in Boston to be asked about the weather in Denver, or a participant who does not work on, or may be totally unfamiliar with, "Project White House" to answer. Further, this interpretation goes against the explicit teaching of Roper, which states: "(138) The ML technique may receive the sequence of words and determine whether an attention cue was intended. For example, in an embodiment where an attention cue is invoked audibly by a participant, if the participant says 'Do you have any thoughts on Project White House,' the ML technique may semantically analyze the words and determine that the speaker is requesting the attention of the participant A [one particular participant] for which Project White House is relevant. Similarly, if the speaker says 'How is the weather in Denver?,' the ML technique can semantically analyze and determine that the speaker is intending an attention cue. Alternatively, if the speaker says 'I liked the movie with the aliens and the White House' the ML technique may determine that no attention cue was intended and refrain from alerting the video conference provider 310 of the attention cue." Roper also states: "(21) When the attention assistant identifies an attention cue for a multi-meeting participant [one single participant], the attention assistant may alert the multi-meeting participant. For example, when an attention cue is identified, the multi-meeting participant may be prompted by a visual or audible alert. Upon the alert, the multi-meeting participant [one single participant] can switch his or her attention to the relevant meeting, as desired. In some embodiments, the attention assistant may automatically switch between meeting instances to direct the multi-meeting participant's attention to the relevant content."
Roper further states: "(82) An attention cue for the participant A may include content that is relevant to the participant A, such as, for example, the participant A's name [this is exactly parallel to the example provided in Applicant's specification, 'Jim, do you have the updated data'], company name, job title, projects that the participant A is involved in, or other topics that relate to the participant A. If the video conference provider 310 determines an attention cue is being exchanged in the first meeting 350, the video conference provider 310 may alert the participant A of the attention cue." Furthermore, it would be chaotic if any or all participants were to respond to "Do you have any thoughts on Project White House" or "How is the weather in Denver?" That is not what is intended by the teachings of Roper. In view of the above, it is abundantly clear that, based on the explicit teachings of Roper, the question "How is the weather in Denver" is not directed to any participant, as argued by Applicant. On the contrary, it is meant as an attention cue for a particular conference participant whose location is Denver. Again, it is very unlikely that a question about the weather in Denver would normally be directed to "any participant" (e.g., participants from New York or California).

Applicant argues that Roper (the secondary reference) does not teach unmuting. The unmuting feature is taught by the primary reference, Rose. The test for obviousness is not whether the features of a secondary reference (Roper) may be bodily incorporated into the structure of the primary reference (Rose); nor is it that the claimed invention must be expressly suggested in any one or all of the references. Rather, the test is what the combined teachings of the Rose and Roper references would have suggested to those of ordinary skill in the art. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHMAD F. MATAR, whose telephone number is (571) 272-7488. The examiner can normally be reached M-F, 9:00-5:30. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/AHMAD F. MATAR/
Supervisory Patent Examiner, Art Unit 2693

Prosecution Timeline

Sep 29, 2023
Application Filed
Jun 01, 2025
Non-Final Rejection — §103, §112
Jul 22, 2025
Interview Requested
Jul 31, 2025
Interview Requested
Aug 19, 2025
Applicant Interview (Telephonic)
Aug 19, 2025
Examiner Interview Summary
Aug 25, 2025
Response Filed
Sep 12, 2025
Final Rejection — §103, §112
Oct 23, 2025
Request for Continued Examination
Nov 03, 2025
Response after Non-Final Action
Jan 20, 2026
Non-Final Rejection — §103, §112
Feb 04, 2026
Interview Requested
Feb 18, 2026
Applicant Interview (Telephonic)
Feb 18, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12574458
METHOD FOR TRANSMITTING CALL AUDIO DATA AND APPARATUS
2y 5m to grant • Granted Mar 10, 2026
Patent 12563143
Pre-Authentication for Interactive Voice Response System
2y 5m to grant • Granted Feb 24, 2026
Patent 12549669
System and method to evaluate microservices integrated in Interactive Voice Response (IVR) operations
2y 5m to grant • Granted Feb 10, 2026
Patent 12462816
AUDIO ENCODING METHOD AND CODING DEVICE
2y 5m to grant • Granted Nov 04, 2025
Patent 9137370
Call center input/output agent utilization arbitration system
2y 5m to grant • Granted Sep 15, 2015
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 38%
With Interview: 50% (+11.5%)
Median Time to Grant: 2y 7m
PTA Risk: High
Based on 13 resolved cases by this examiner. Grant probability derived from career allow rate.
