DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 of the instant application 18/757,952 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent 12,052,391. Although the claims at issue are not identical, they are not patentably distinct from each other as seen by a representative sample of comparative analysis between the instant claim and the issued claim 1.
The instant application: 18/757,952
U.S. Patent 12,052,391
1. (Currently Amended) A computer-implemented method, the computer-implemented method comprising: receiving call data associated with a conference call; analyzing the call data to identify a first participant, a second participant, and a call topic; tracking the conference call including whether the first participant and the second participant are speaking at the same time; muting the second participant based on the tracking and the call data; displaying an element on a graphical user interface (GUI) associated with the conference call to indicate the first participant and the second participant were speaking at the same time and that the second participant is muted; and unmuting the second participant based on one or more topic modeling neural network models determining that the call topic has shifted.
1. A computer-implemented method, the computer-implemented method comprising: receiving call data associated with a conference call; analyzing the call data to identify a plurality of participants on the conference call and a call topic; determining whether two or more participants of the plurality of participants are speaking at a same time; based upon a determination that the two or more participants are speaking at the same time, tracking the two or more participants to identify a first participant that continues speaking and a second participant that voluntarily yields to the first participant; determining that the yielding second participant is more relevant to the call topic than the first participant; muting the first participant based on determining that the yielding second participant is more relevant to the call topic; displaying a queuing element on a graphical user interface (GUI) associated with the conference call to indicate the two or more participants are speaking at the same time and that the first participant is in a queue to speak after the second participant has finished; determining when the second participant has stopped speaking by: extracting an audio portion of the call data; processing the audio portion to determine text by a speech-to-text function; processing the text to form text feature vectors; and determining the second participant shifted from the first call topic to a second call topic based on one or more topic modeling neural network models; muting audio input of devices of all participants other than the first participant; and displaying an indicator that it is the first participant's turn to speak.
From the above evidence, it is clear that the claim limitations of the instant application are covered by the limitations of the claimed invention of the issued patent. It would have been obvious to the ordinary artisan before the effective filing date to broaden the claims in order to obtain greater coverage and protection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4, 6-14 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gorti et al. (US 2013/0176910) in view of Abuelsaad et al. (US 2019/0324712), Billigmeier et al. (US 2021/0065203), and Kotri et al. (US 2019/0066663).
Claim 1. Gorti teaches a computer-implemented method, the computer-implemented method comprising:
receiving call data associated with a conference call, (Gorti: incoming audio signal, [0032]);
analyzing the call data to identify a first participant, a second participant and a call topic (Gorti teaches identifying speakers by voice/speaker recognition, PIN, user name and/or phone number, [0032], to discuss a relevant question/statement/keyword, [0030]);
tracking the conference call including whether the first participant and the second participant are speaking at the same time; (Gorti: Fig. 3A 314, [0041])
muting the second participant based on the tracking and the call data; (Gorti: To determine whether one participant should be blocked from interrupting another participant, the interrupt handler 206 obtains and/or references identifying information stored by the participant recorder 202, which, as described above, has access to the preference settings 216 and, thus, the priority rankings of the participants, [0029]);
displaying an element on a graphical user interface (GUI) associated with the conference call to indicate the first participant and the second participant were speaking at the same time and that the second participant is muted; (Gorti: Fig. 4 shows the activities during a conference displaying 402 of all participants with ranking, displaying 420 of current speaker based on the relevancy, displaying 410 of blocked participants who attempted to interrupt and indicator 404 and a warning symbol 404 is displayed (e.g., as a flashing image) in a section 406 dedicated to informing the participant of a blocked attempt to interrupt another participant) and
Gorti teaches “unmuting the second participant”, [0043, 0053]. Gorti does not teach “based on one or more topic modeling neural network models determining that the call topic has shifted”.
Abuelsaad teaches, “Since the context “music” is not contextually relevant to the current topic of discussion “venues,” teleconference management program 101 will determine that John's set of utterances are not relevant to the current discussion, [0033]… program 101 dynamically mutes an audio signal corresponding to a non-relevant set of utterances, [0037]”.
Gorti and Abuelsaad do not utilize the neural network model.
Billigmeier teaches, “configured to identify potential new topical key words, such a machine-learning model may be, for example, a clustering model trained via supervised or unsupervised learning utilizing a plurality of historical customer-service interaction transcripts as training data to flag particular words/phrases within one or more transcripts as potential topical keywords and/or to map those potential topical keywords to particular topics reflected within the topics data store 211 and/or to suggest potential new topics for storage in the topics data store 211, [0062]”.
Similarly, Kotri uses a neural network learning model, i.e., an RNN, to identify/generate a topic label based on the words (written or spoken) contained in the corresponding thematic segment, [0014, 0019, 0026]. The examiner further notes that Kotri, in view of the determination that the call topic has shifted as required by the claim, also teaches that “In operation 640, the topic module 250 determines that a subset of the plurality of sentences relate to a first topic based on an output of the RNN module 240. For example, an output of the RNN corresponding to a sentence of the long content may exceed a predetermined threshold, indicating that the sentence and those preceding it relate to a first topic and the following sentence relates to a different topic”, [0041-0043].
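For illustration only, the threshold-based topic-boundary determination Kotri describes (an RNN output exceeding a predetermined threshold marking the end of one topic segment) can be sketched as follows. This is the examiner-independent, hypothetical code; the scores and the threshold value are stand-ins for actual RNN outputs and are not taken from any cited reference.

```python
# Hypothetical sketch: flag a topic boundary wherever a per-sentence score
# (assumed to come from a topic-segmentation RNN) exceeds a threshold,
# indicating the sentences up to that point form one topic segment.

def find_topic_boundaries(scores, threshold=0.8):
    """Return indices of sentences whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical per-sentence outputs of a topic-segmentation model.
scores = [0.1, 0.2, 0.15, 0.9, 0.1, 0.3, 0.85]
print(find_topic_boundaries(scores))  # boundaries after sentences 3 and 6
```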
Claim 11. (Currently Amended) A system, the system comprising: a memory storing instructions; and a processor executing the instructions to perform a process including: receiving call data associated with a conference call; analyzing the call data to identify a plurality of participants on the conference call and a call topic; tracking the conference call including whether a first participant and a second participant are speaking at a same time; based on the first participant and second participant speaking at the same time, continue tracking the conference call including whether the second participant has stopped speaking on the call topic by: processing an extracted audio portion to determine text by a speech-to-text function; processing the text to form text feature vectors; and processing the text feature vectors through one or more topic modeling neural network models to determine the second participant is no longer speaking on the call topic; muting audio device input of all participants other than the first participant; and displaying an indicator on a graphical user interface (GUI) that the first participant may speak.
Claim 20. (Currently Amended) A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform a method, the method comprising: receiving call data associated with a conference call; analyzing the call data to identify a plurality of participants on the conference call and a call topic; tracking the conference call including whether a first participant and a second participant are speaking at a same time; based on the first participant and second participant speaking at the same time, continue tracking the conference call including whether the second participant has stopped speaking by: processing an extracted audio portion to determine text by a speech-to-text function; processing the text to form text feature vectors; and processing the text feature vectors through one or more topic modeling neural network models to determine the second participant is no longer speaking on the call topic; muting audio device input of all participants other than the first participant; and displaying an indicator that the first participant may speak.
Both independent claims 11 and 20 are rejected in the same manner as claim 1 with further clarification on the “processing an extracted audio portion to determine text by a speech-to-text function; processing the text to form text feature vectors; and processing the text feature vectors through one or more topic” limitations. Abuelsaad utilizes Speech-To-Text (STT) software where each set of utterances is transcribed into text through STT software and keywords are identified within the transcribed text document. In some embodiments, teleconference management program 101 determines a context through the use of speech analytics software (i.e., audio mining software) to spot keywords and phrases from a set of utterances. Here, phoneme sequences corresponding to the set of utterances are matched with phoneme sequences of words, [0030]. Similarly, Billigmeier teaches, “the automatic transcriptions models may comprise standardized speech-to-text models, machine-learning based models (e.g., trained via supervised learning training algorithms) and/or the like. The generated transcription may identify speech from the customer-service representative (e.g., which may be identified based on metadata associated with audio generated from a microphone associated with the customer-service representative; identified based on voice-recognition models, and/or the like) and speech from the customer, so as to distinguish between speakers within the transcript, [0061]”.
Though Abuelsaad and Billigmeier do not use the term “vector”, the detailed description of the STT role and functionality suggests/implies the use of feature vectors. Furthermore, Kotri teaches, “The vector module 230 converts sentences from the long content into vectors suitable for use as input to the RNN module 240. For example, each word in a sentence may be converted to a vector using pre-trained word vectors generated by GloVe: Global Vectors for Word Representation, Pennington et al. (2014) and the sentence vector may be created by averaging the word vectors for the sentence, [0026-0027]”.
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to modify the teaching of Gorti to include the teaching of Abuelsaad, allowing the muted participant to speak when a new topic of greater relevance arises, and also to include the teaching of Billigmeier or Kotri, utilizing a well-suited neural network training model to identify the keyword/topic for discussion.
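As a purely illustrative sketch of the sentence-vector construction Kotri describes (averaging pre-trained word vectors, e.g., GloVe, over the words of a sentence), consider the following hypothetical Python fragment. The tiny embedding table is invented for demonstration and does not come from any cited reference.

```python
# Hypothetical sketch: build a sentence vector by averaging per-word
# embedding vectors, as Kotri describes for GloVe word vectors.

def sentence_vector(sentence, embeddings, dim=3):
    """Average the embedding vectors of the in-vocabulary words."""
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vecs:
        return [0.0] * dim  # no known words: zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 3-dimensional embedding table (illustrative values only).
embeddings = {
    "mute": [1.0, 0.0, 0.0],
    "the":  [0.0, 1.0, 0.0],
    "call": [0.0, 0.0, 1.0],
}
print(sentence_vector("Mute the call", embeddings))  # each dimension ≈ 1/3
```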
Claim 2, wherein analyzing the call data to identify a first participant, a second participant, and a call topic further includes: extracting an audio portion of the call data; processing the audio portion to form a feature vector; processing the feature vector through the one or more topic modeling neural network models to map utterances to one or more entities; and mapping the one or more entities to the first participant or the second participant. (See the independent claims or at least Kotri: database of vectors corresponding to words for use in generating inputs to the RNN module 240 by the vector module 230, a set of weights for use in the LSTM cells of the RNN module 240, threshold values for use by the topic module 250 in evaluating the output of the RNN module 240, files containing long content, files containing thematic segments generated from the long content, or any suitable combination thereof may be stored by the storage module 270, [0030]).
Claims 3 and 13. wherein analyzing the call data to identify a first participant, a second participant, and a call topic further includes: identifying one or more devices connected to the conference call; extracting an audio portion of the call data; processing the audio portion to associate utterances to the one or more devices; and mapping the one or more devices to the first participant or the second participant. (See the independent claims or at least Gorti: Communication sessions may also be used to send and/or exchange any number and/or type(s) of participant information (e.g., a conference call participant identification number, a password, a personal identification number (PIN), a user name, and/or a phone number)… of VoIP-enabled devices 108, 110, 112 and 114, [0014]; Abuelsaad: In accordance with FIG. 2, the following exemplary scenario will be used. In this exemplary scenario, a teleconference is currently being conducted to discuss planning for an automotive tradeshow. Currently, four conference lines (Line A, Line B, Line C, and Line D) (independently shown in FIG. 1) have joined the teleconference. Line A is a mobile phone with a single participant currently sitting in a waiting area of an airport. Line B is a cordless phone with a single participant sitting in an office. Line C is a speaker phone with a group of participants located in a conference room. Line D is a mobile tablet with a single participant located at home, [0021]).
Claim 4, further comprising: placing the second participant in a queue to speak after the call topic has shifted; and displaying an indication on the graphical user interface (GUI) that the second participant is to speak after the call topic has shifted or when the first participant stops speaking. (See the independent claims).
Claims 6 and 16. (Previously Presented) The computer-implemented method of claim 1, wherein unmuting the second participant is based on the one or more topic modeling neural network models determining a transition point. (See the independent claims or at least Abuelsaad: a discussion transitions from one topic to another, [0034]).
Claim 7. (Previously Presented) The computer-implemented method of claim 1, wherein determining when the call topic has shifted includes: tracking the call data to determine keywords of the conference call; and determining the first participant has stopped speaking based on the keywords. (See the independent claims or at least Kotri: A point of change in audio content is a change of voice (e.g., a first presenter stops talking and a second presenter begins talking), a pause of at least a predetermined duration, [0058]).
Claim 8, wherein determining the first participant and the second participant are speaking at the same time includes: determining whether different audio inputs from different devices are input at the same time or determining that there are two or more voices being input at the same time; and based upon a determination that the different audio inputs are input from the different devices at the same time or that there are two or more voices being input at the same time, determining two or more participants are speaking at the same time. (Abuelsaad: See Fig. 1 of various lines and [0028] At step S203, teleconference management program 101 identifies an identity associated with each set of utterances. In an embodiment, teleconference management program 101 uses speaker verification software to verify an identity of a speaker associated with a set of utterances. Here, a speech sample (i.e., utterance) is compared against a previously created voice signature (i.e., voice print, template, or model). In an embodiment, teleconference management program 101 uses voice recognition software (i.e., speaker identification software) to identify an identity of a speaker associated with a set of utterances. Speaker identification software identifies a speaker based on unique characteristics included within a speech sample, [0028]. Gorti: when two speakers attempt to speak at substantially the same time, [0029]).
Claims 9-10, wherein determining the first participant and the second participant are speaking at the same time further includes, before determining the two or more participants are speaking at the same time: determining whether an audio input of the different audio inputs is a background noise; and based on a determination that the audio input of the different audio inputs is not background noise, determining two or more participants are speaking at the same time. (Abuelsaad: teleconference management program 101 identifies a plurality of utterances from the plurality of audio signals. An utterance may generally be understood as a verbal communication (e.g., word or statement), non-lexical communication (e.g., exclamations, sighs, laughs, cries, and shouts), or background noises detected by teleconference management program 101 during a teleconference, [0023]) wherein determining the first participant and the second participant are speaking at the same time further includes, before determining two or more participants are speaking at the same time: determining whether a voice of the two or more voices is a non-verbal utterance; and based on a determination that the voice is not a non-verbal utterance, determining two or more participants are speaking at the same time. (Abuelsaad: teleconference management program 101 dynamically mutes an audio signal corresponding to a non-relevant set of utterances, [0037]).
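The simultaneous-speaking determination addressed in claims 8-10 (counting concurrently active inputs after filtering out background noise and non-verbal utterances) can be sketched, for illustration only, as follows. The data structure and field names are hypothetical and are not drawn from Gorti or Abuelsaad.

```python
# Hypothetical sketch: decide that two or more participants are speaking
# at the same time by counting simultaneously active device inputs in one
# time window, after discarding inputs classified as background noise or
# non-verbal utterances.

def simultaneous_speakers(active_inputs):
    """Return the speaking participants if two or more overlap, else []."""
    speaking = [a["participant"] for a in active_inputs
                if not a["is_noise"] and not a["is_nonverbal"]]
    return speaking if len(speaking) >= 2 else []

window = [
    {"participant": "A", "is_noise": False, "is_nonverbal": False},
    {"participant": "B", "is_noise": False, "is_nonverbal": False},
    {"participant": "C", "is_noise": True,  "is_nonverbal": False},
]
print(simultaneous_speakers(window))  # ['A', 'B']: C is filtered as noise
```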
Claim 12. (Previously Presented) The system of claim 11, wherein analyzing the call data to identify a plurality of persons on the conference call includes: extracting a second audio portion of the call data; processing the second audio portion to form a second feature vector; processing the second feature vector through the one or more topic modeling neural network models to map utterances to one or more entities; and mapping the one or more entities to the plurality of persons. (See claim 2).
Claim 14. (Previously Presented) The system of claim 11, wherein the received call data includes predetermined words; and determining that the second participant is more relevant to the call topic than the first participant includes comparing the speaking of the first and second participant to the predetermined words. (Abuelsaad: Based on keywords and/or phrases associated with utterances from John, John's utterances are determined to be associated with the context “music,” [0033]).
Claim 17. (Previously Presented) The system of claim 11, wherein determining when the second participant has stopped speaking on the call topic includes: determining whether the text includes keywords; and based upon a determination that the text includes the keywords, determining the second participant has stopped speaking on the call topic. (See the independent claims or at least Kotri: A point of change in audio content is a change of voice (e.g., a first presenter stops talking and a second presenter begins talking), a pause of at least a predetermined duration, [0058]).
Claims 18-19, wherein determining the first participant and the second participant are speaking at the same time includes: determining whether different audio inputs from different devices are input at the same time or determining that there are two or more voices being input at the same time; and based upon a determination that the different audio inputs are input from the different devices at the same time or that there are two or more voices being input at the same time, determining two or more participants are speaking at the same time; (Abuelsaad: teleconference management program 101 identifies a plurality of utterances from the plurality of audio signals. An utterance may generally be understood as a verbal communication (e.g., word or statement), non-lexical communication (e.g., exclamations, sighs, laughs, cries, and shouts), or background noises detected by teleconference management program 101 during a teleconference, [0023]), wherein determining the first participant and the second participant are speaking at the same time further includes, before determining two or more participants are speaking at the same time: determining whether an audio input of the different audio inputs is a background noise; and based on a determination that the audio input of the different audio inputs is not background noise, determining the two or more participants are speaking at the same time. (Abuelsaad: teleconference management program 101 dynamically mutes an audio signal corresponding to a non-relevant set of utterances, [0037]).
Claim(s) 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Gorti in view of Abuelsaad, Billigmeier and Kotri and further in view of Sarris (US 2016/0255126).
Claims 5 and 15, Gorti does not teach “the GUI, as displayed on a device associated with the second participant, includes an opt-out element that when selected by input of the second participant removes the second participant from the queue”.
Sarris teaches a leave queue button (2027) that, when pressed, will remove the participant (301) from the waiting queue (2023), [0167].
Therefore, it would have been obvious to the ordinary artisan before the effective filing date to incorporate the teaching of Sarris into the teaching of Gorti to provide a waiting participant an opportunity to leave the waiting queue if he or she deems waiting no longer necessary.
Inquiry
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUNG-HOANG J. NGUYEN whose telephone number is (571)270-1949. The examiner can normally be reached on a regular schedule, 6:00-3:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHUNG-HOANG J NGUYEN/Primary Examiner, Art Unit 2691