DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Introduction
This Office action is in response to communications filed 10/30/2025. Claims 1-7, 9-14, and 16-20 are pending and have been examined.
Response to Amendment
The amendment filed 07/30/2025 has been fully considered by the Examiner. The objection to Claim 7 has been withdrawn. The 112(f) interpretation of elements of independent Claim 1 and its depending claims 2-13 has been maintained.
Response to Arguments
Applicant's arguments (see Remarks, pages 7-11, filed 10/30/2025) have been fully considered but are not persuasive.
Examiner believes the prior art of record teaches the claimed limitations.
Applicant argues that the audio of Munoz is transcribed and then delivered to the moderator, rather than transmitting the user utterance in audio format as input through the microphone.
Examiner argues that Munoz provides options for sending either audio or text to the moderator manager (Para [0013], Ln 1-22, moderator manager may begin to receive audio and video streams from one or more participants present in the call), this audio being originally captured by a microphone (Para [0038], Ln 1-7, conference application A (105) may receive audio and video input streams from cameras and microphones of the client device A).
Applicant argues that Munoz only teaches communications to the moderator when the user takes an action, and does not teach using the communication channel without direct action from the user.
Examiner argues that the claims are not limited to the degree that Applicant asserts. The claims do not exclude some action being taken by the user. Munoz provides examples in Para [0013], Ln 1-22 (A synthetic moderator may be invoked during a call with one or more participants present in the call. Invocation may involve one or more participants pronouncing a phrase (e.g., common activation phrase, user-defined phrase, keyword, any phrase with microphone muted in software, etc.) and interacting with an activation element in the user interface (e.g., activation button, activation slider, toggle, checkbox, interactive image and/or animation, etc.)). Moreover, even if the claims were limited to that extent, the activation phrase could be considered a trigger. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
For these reasons Examiner believes that the prior art of record teaches the claimed limitations.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a controller configured to” on line 4 of claim 1, “a communicator configured to” on line 6 of claim 1, “a dialog system that is configured to” on line 8 of claim 1, “the controller” on line 10 of claim 1, “the dialog system” on line 11 of claim 1, “the communicator” on line 12 of claim 1, “the dialog system” on line 14 of claim 1, “a storage configured to store” on line 1 of claim 2, “the storage” on line 1 of claim 4, “a speech recognition module configured to” on line 1 of claim 5, “the storage” on line 1 of claim 6, “the controller” on line 1 of claim 9, “the controller” on line 1 of claim 11, “the dialog system” on line 4 of claim 11, “the controller” on line 1 of claim 12, “the communicator” on line 2 of claim 12, “the controller” on line 1 of claim 13, “the communicator” on line 2 of claim 13.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-7 and 10-13 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Munoz et al. (US 20230059158 A1).
Regarding Claim 1:
Munoz teaches a user terminal comprising: a microphone through which a speech of a user is input (Para [0038], Ln 1-7, conference application A (105) may receive audio and video input streams from cameras and microphones of the client device A);
a speaker through which a speech of a counterpart is output during a call (Para [0151], Ln 1-4, data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device. Para [0013], Ln 1-4, conference calls for the participants of the calls);
a controller configured to activate a speech recognition function upon receiving a trigger signal during the call (Para [0064], Ln 1-10, participant of the client device A (304) may pronounce a keyword phrase triggering the activation (322) that may engage a synthetic moderator operating on the client device A (304). The moderator manager may receive the notification message about synthetic moderator being engaged and moderator manager may activate transcription of audio from each of the call participants);
and a communicator configured to transmit information related to the speech of the user which is input through the microphone after the trigger signal is input and information related to content of the call to a dialogue system that is configured to perform the speech recognition function (Para [0052], Ln 1-13, by receiving an audio stream with speech and transcribing the speech from the audio stream to text … In one embodiment, a moderator manager includes the transcription program and process the audio stream to generate utterance text. Para [0064], Ln 1-10, participant of the client device A (304) may pronounce a keyword phrase triggering the activation (322) that may engage a synthetic moderator operating on the client device A (304). The moderator manager may receive the notification message about synthetic moderator being engaged and moderator manager may activate transcription of audio from each of the call participants),
wherein the controller is further configured to control the speaker to output a system response transmitted from the dialogue system (Para [0151], Ln 1-4, data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device. Para [0072], Ln 1-13, moderator manager on the client device (304) produces a result that may identify the resolved scheduled call date and time … The moderator manager may have each of the current call participants hear the response on their devices. Para [0021], Ln 1-11, moderator manager may produce a response to one or more participants before, during, or after the intent is executed … In one embodiment, the response may be presented with synthesized and recorded speech),
and wherein the communicator is further configured to transmit the speech of the user which is input by the microphone to the counterpart through a first channel and to transmit the speech of the user to the dialogue system through a second channel (Para [0015], Ln 1-12, the first participant may have muted microphone and invoke the synthetic moderator by speaking. The first participant may hear a synthesized audio message and the second and third participants may not receive indication of the synthetic moderator being engaged by the first participant. The first participant may interact with the synthetic moderator while second and third participants are not aware of the interaction).
Regarding Claim 2:
Munoz teaches the user terminal of claim 1, further including a storage configured to store the information related to the content of the call (Para [0023], Ln 1-7, moderator manager may record the intent, gather information and context, store a response to persistent storage … and use at least some of the recorded information when extracting intents).
Regarding Claim 3:
Munoz teaches the user terminal of claim 2, wherein the information related to the content of the call includes the speech of the user and the speech of the counterpart that are input during the call (Para [0013], Ln 13-18, moderator manager may begin to receive textual representation of audio stream (e.g., transcription). For example, the first participant may pronounce activation phrase. Transcription of audio streams of the first, second, and third participants may be sent to moderator manager. Para [0052], Ln 1-13, by receiving an audio stream with speech and transcribing the speech from the audio stream to text … In one embodiment, a moderator manager includes the transcription program and process the audio stream to generate utterance text).
Regarding Claim 4:
Munoz teaches the user terminal of claim 3, wherein the storage is configured to store the information related to the content of the call in a form of an audio signal (Para [0052], Ln 1-13, by receiving an audio stream with speech and transcribing the speech from the audio stream to text … In one embodiment, a moderator manager includes the transcription program and process the audio stream to generate utterance text. Para [0130], Ln 1-10, computing system (800) may include one or more computer processor(s) (802), non-persistent storage).
Regarding Claim 5:
Munoz teaches the user terminal of claim 3, further including a speech recognition module configured to convert the speech of the user and the speech of the counterpart that are input during the call into text (Para [0052], Ln 1-13, by receiving an audio stream with speech and transcribing the speech from the audio stream to text … In one embodiment, a moderator manager includes the transcription program and process the audio stream to generate utterance text. Para [0064], Ln 1-10, participant of the client device A (304) may pronounce a keyword phrase triggering the activation (322) that may engage a synthetic moderator operating on the client device A (304). The moderator manager may receive the notification message about synthetic moderator being engaged and moderator manager may activate transcription of audio from each of the call participants. The moderator manager executes on the client device A (304). In one embodiment, the moderator manager may execute on the server (302)).
Regarding Claim 6:
Munoz teaches the user terminal of claim 5, wherein the storage is configured to store the information related to the content of the call in a form of text (Para [0023], Ln 1-7, moderator manager may record the intent, gather information and context, store a response to persistent storage … and use at least some of the recorded information when extracting intents. Para [0052], Ln 1-13, by receiving an audio stream with speech and transcribing the speech from the audio stream to text … In one embodiment, a moderator manager includes the transcription program and process the audio stream to generate utterance text).
Regarding Claim 7:
Munoz teaches the user terminal of claim 2, wherein the system response is generated based on the information related to the content of the call and the speech of the user which is input through the microphone after the trigger signal is input (Para [0016], Ln 1-5, moderator manager may process the video, audio and text received from one or more participants to extract a requested or expected action (e.g., intent). Para [0118], Ln 1-9, activation (724) is triggered by the first participant pronouncing a keyword phrase that may engage a synthetic moderator on the client device A (704) … and moderator manager may activate transcription of audio from one or more call participants).
Regarding Claim 10:
Munoz teaches the user terminal of claim 1, wherein the trigger signal includes a predetermined specific word spoken by the user to the counterpart during the call (Para [0013], Ln 1-10, synthetic moderator may be invoked during a call with one or more participants present in the call. Invocation may involve one or more participants pronouncing a phrase (e.g., common activation phrase, user-defined phrase, keyword, any phrase)).
Regarding Claim 11:
Munoz teaches the user terminal of claim 3, wherein the controller is further configured to transmit the information related to the content of the call stored within a predetermined time period based on a time point at which the speech recognition function is activated to the dialogue system through the communicator (Para [0103], Ln 1-14, agenda functionality may have an intent to be executed at one or more execution moments during the call (e.g., after one quarter of the scheduled call time passes, during an interruption and/or a moment of silence, ten minutes before the call is scheduled to end, etc.) … The moderator manager may create a delayed intent with execution moment set to the next audio silence and/or one or more participants may pronounce a keyword and/or a phrase that moderator manager may recognize as request to engage the agenda functionality. Para [0105], Ln 1-8, third participant (using the client device E (612)) may request the agenda functionality to perform the updated intent (634) by speaking the agenda points that were covered since the last time agenda functionality intent was executed. The moderator manager may process the transcript of third participant's speech, extract the intent to mark some of the agenda points as completed, and execute the intent).
Regarding Claim 12:
Munoz teaches the user terminal of claim 1, wherein the controller is further configured to control the communicator to transmit the system response to the counterpart in a case in which the system response is related to the content of the call (Para [0022], Ln 1-8, the first participant may initiate an intent to schedule a call. The moderator manager may gather context and information, execute the intent, and initiate playback of pre-recorded synthesized utterance that may be perceived by one or more connected call participants that the call was scheduled).
Regarding Claim 13:
Munoz teaches the user terminal of claim 1, wherein the controller is further configured to control the communicator to transmit the system response to the counterpart according to selection of the user (Para [0026], Ln 1-6, the first participant may initiate an intent for spoken call points to be summarized and distributed to one or more call participants before and after the call ends).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 9, 14, and 16-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Munoz as applied to claim 1 above, and further in view of Campbell (US 20240233728 A1).
Regarding Claim 9:
Munoz teaches the user terminal of claim 1, but does not teach wherein the controller is further configured to close the first channel so that the speech of the user input through the microphone is not transmitted to the counterpart in response to receiving the trigger signal.
In the same field of conference call assistants, Campbell teaches wherein the controller is further configured to close the first channel so that the speech of the user input through the microphone is not transmitted to the counterpart in response to receiving the trigger signal (Para [0028], Ln 1-12, In response to detecting the user pose, method 200 mutes the internet call to receive a voice command, at 202. For example, a user may be on a conference call when they decide to initiate the voice commands. However, the user may not want to interrupt the call or allow the other users on the conference call to hear that the user is initiating the voice commands. The user may also not want to have to click on the screen or keyboard to activate the voice command feature. Therefore, the user can use the gesture to initiate the voice command and in response, the conference call may be muted. The user would then be able to provide the voice commands to navigate or control an application).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Munoz with the mute functionality of Campbell, as it improves user convenience (Para [0028], Ln 1-12).
Regarding Claim 14:
Munoz teaches a method of controlling a user terminal, the method comprising: receiving a speech of a user through a microphone (Para [0038], Ln 1-7, conference application A (105) may receive audio and video input streams from cameras and microphones of the client device A);
outputting, through a speaker, a speech of a counterpart during a call (Para [0151], Ln 1-4, data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device. Para [0013], Ln 1-4, conference calls for the participants of the calls);
storing information related to content of the call (Para [0023], Ln 1-7, moderator manager may record the intent, gather information and context, store a response to persistent storage … and use at least some of the recorded information when extracting intents);
activating, by a controller, a speech recognition function upon receiving a trigger signal during the call (Para [0064], Ln 1-10, participant of the client device A (304) may pronounce a keyword phrase triggering the activation (322) that may engage a synthetic moderator operating on the client device A (304). The moderator manager may receive the notification message about synthetic moderator being engaged and moderator manager may activate transcription of audio from each of the call participants);
transmitting information related to the speech of the user which is input through the microphone after the trigger signal is input and information related to the content of the call to a dialogue system that is configured to perform the speech recognition function (Para [0052], Ln 1-13, by receiving an audio stream with speech and transcribing the speech from the audio stream to text … In one embodiment, a moderator manager includes the transcription program and process the audio stream to generate utterance text. Para [0064], Ln 1-10, participant of the client device A (304) may pronounce a keyword phrase triggering the activation (322) that may engage a synthetic moderator operating on the client device A (304). The moderator manager may receive the notification message about synthetic moderator being engaged and moderator manager may activate transcription of audio from each of the call participants);
and controlling, by the controller, the speaker to output a system response transmitted from the dialogue system (Para [0151], Ln 1-4, data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device. Para [0072], Ln 1-13, moderator manager on the client device (304) produces a result that may identify the resolved scheduled call date and time … The moderator manager may have each of the current call participants hear the response on their devices. Para [0021], Ln 1-11, moderator manager may produce a response to one or more participants before, during, or after the intent is executed … In one embodiment, the response may be presented with synthesized and recorded speech).
Munoz does not specifically teach wherein the method further includes transmitting the speech of the user input through the microphone during the call to the counterpart through a first channel of the communicator, wherein the transmitting of the information to the dialogue system that is configured to perform the speech recognition function includes closing the first channel and transmitting the speech of the user to the dialogue system through a second channel of the communicator.
In the same field of conference call assistants, Campbell teaches wherein the method further includes transmitting the speech of the user input through the microphone during the call to the counterpart through a first channel of the communicator, wherein the transmitting of the information to the dialogue system that is configured to perform the speech recognition function includes closing the first channel and transmitting the speech of the user to the dialogue system through a second channel of the communicator (Para [0028], Ln 1-12, In response to detecting the user pose, method 200 mutes the internet call to receive a voice command, at 202. For example, a user may be on a conference call when they decide to initiate the voice commands. However, the user may not want to interrupt the call or allow the other users on the conference call to hear that the user is initiating the voice commands. The user may also not want to have to click on the screen or keyboard to activate the voice command feature. Therefore, the user can use the gesture to initiate the voice command and in response, the conference call may be muted. The user would then be able to provide the voice commands to navigate or control an application).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Munoz with the mute functionality of Campbell, as it improves user convenience (Para [0028], Ln 1-12).
Regarding Claim 16:
The combination of Munoz and Campbell teaches the method of claim 14, and Munoz teaches further including, in a case in which the system response is related to the content of the call, transmitting the system response to the counterpart through the communicator (Para [0022], Ln 1-8, the first participant may initiate an intent to schedule a call. The moderator manager may gather context and information, execute the intent, and initiate playback of pre-recorded synthesized utterance that may be perceived by one or more connected call participants that the call was scheduled).
Regarding Claim 17:
The combination of Munoz and Campbell teaches the method of claim 14, and Munoz teaches further including: receiving a selection of the user as to whether to transmit the system response to the counterpart; and transmitting the system response to the counterpart through the communicator based on the selection of the user (Para [0026], Ln 1-6, the first participant may initiate an intent for spoken call points to be summarized and distributed to one or more call participants before and after the call ends).
Claim(s) 18-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Munoz et al. (US 20230059158 A1) in view of Rodriguez Bravo et al. (US 20230290348 A1), and further in view of Campbell (US 20240233728 A1).
Regarding Claim 18:
Munoz teaches a dialogue management method comprising: receiving, from a user terminal, information related to content of a call between a user and a counterpart (Para [0052], Ln 1-13, by receiving an audio stream with speech and transcribing the speech from the audio stream to text … In one embodiment, a moderator manager includes the transcription program and process the audio stream to generate utterance text. Para [0064], Ln 1-10, participant of the client device A (304) may pronounce a keyword phrase triggering the activation (322) that may engage a synthetic moderator operating on the client device A (304). The moderator manager may receive the notification message about synthetic moderator being engaged and moderator manager may activate transcription of audio from each of the call participants);
predicting an intention of the user based on the information related to the content of the call (Para [0016], Ln 1-5, moderator manager may process the video, audio and text received from one or more participants to extract a requested or expected action (e.g., intent));
proactively generating a system response corresponding to the predicted intention of the user (Para [0072], Ln 1-13, moderator manager on the client device (304) produces a result that may identify the resolved scheduled call date and time … The moderator manager may have each of the current call participants hear the response on their devices. Para [0021], Ln 1-11, moderator manager may produce a response to one or more participants before, during, or after the intent is executed … In one embodiment, the response may be presented with synthesized and recorded speech);
transmitting the system response to the user terminal (Para [0021], Ln 1-11, moderator manager may produce a response to one or more participants before, during, or after the intent is executed … In one embodiment, the response may be presented with synthesized and recorded speech. Para [0064], Ln 1-10, The moderator manager executes on the client device A (304). In one embodiment, the moderator manager may execute on the server (302)).
Munoz does not teach in response to receiving a speech of the user related to the system response from the user terminal after the call ends, generating a new system response corresponding to the received speech of the user; and transmitting the new system response to the user terminal.
In the same field of conference call assistants, Rodriguez Bravo teaches in response to receiving a speech of the user related to the system response from the user terminal after the call ends, generating a new system response corresponding to the received speech of the user (Para [0081], Ln 1-27, The Primary AI Assistant listens for the wake keywords or phrases that indicate that the participants need to perform a scheduling action … primary user's command is ambiguous, the Primary AI Assistant generates the expected or draft item in block 823 but waits for completion of the call to resolve any ambiguities … Once the call has ended, the Primary AI Assistant provides a call summary to the primary user in block 819. The primary user can give the Primary AI Assistant additional commands and retrieve a history of calls for similar meetings or group of users. Once the call has ended, the Primary AI Assistant queries the primary user to resolve any ambiguities with the saved items, in block 825. Once the schedule and task items are complete, the Primary AI Assistant provides the items to participants' user PDAs either directly or via electronic communication as discussed above);
and transmitting the new system response to the user terminal (Para [0081], Ln 1-27, Once the schedule and task items are complete, the Primary AI Assistant provides the items to participants' user PDAs either directly or via electronic communication as discussed above).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Munoz with the disambiguation of Rodriguez Bravo, as it improves the quality of the assistant's output (Para [0081], Ln 1-27).
The combination of Munoz and Rodriguez Bravo does not specifically teach wherein the method further includes transmitting, by the user terminal, the speech of the user which is input through the microphone during the call to the counterpart through a first channel; wherein the receiving, from a user terminal, information related to content of a call between a user and a counterpart includes closing the first channel, and receiving the speech of the user which is input through the microphone through a second channel of the user terminal.
In the same field of conference call assistants, Campbell teaches wherein the method further includes transmitting, by the user terminal, the speech of the user which is input through the microphone during the call to the counterpart through a first channel; wherein the receiving, from a user terminal, information related to content of a call between a user and a counterpart includes closing the first channel, and receiving the speech of the user which is input through the microphone through a second channel of the user terminal (Para [0028], Ln 1-12, In response to detecting the user pose, method 200 mutes the internet call to receive a voice command, at 202. For example, a user may be on a conference call when they decide to initiate the voice commands. However, the user may not want to interrupt the call or allow the other users on the conference call to hear that the user is initiating the voice commands. The user may also not want to have to click on the screen or keyboard to activate the voice command feature. Therefore, the user can use the gesture to initiate the voice command and in response, the conference call may be muted. The user would then be able to provide the voice commands to navigate or control an application).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combination of Munoz and Rodriguez Bravo with the mute functionality of Campbell, as it improves user convenience (Para [0028], Ln 1-12).
Regarding Claim 19:
The combination of Munoz, Rodriguez Bravo and Campbell teaches the dialogue management method of claim 18, but does not teach wherein, upon ending the call, the user terminal is configured to activate a speech recognition function.
In the same field of conference call assistants, Rodriguez Bravo teaches wherein, upon ending the call, the user terminal is configured to activate a speech recognition function (Para [0081], Ln 1-27, Once the call has ended, the Primary AI Assistant provides a call summary to the primary user in block 819. The primary user can give the Primary AI Assistant additional commands and retrieve a history of calls for similar meetings or group of users. Once the call has ended, the Primary AI Assistant queries the primary user to resolve any ambiguities with the saved items, in block 825. Once the schedule and task items are complete, the Primary AI Assistant provides the items to participants’ user PDAs either directly or via electronic communication as discussed above. Para [0065], Ln 14-23, voice recognition module 411 may use a trained natural language processing (NLP) and/or acoustic model to identify a current speaker and respond to commands and instructions from the primary user).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combination of Munoz, Rodriguez Bravo and Campbell with the disambiguation of Rodriguez Bravo, as it improves the quality of the assistant's output (Para [0081], Ln 1-27).
Claim(s) 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Munoz, Rodriguez Bravo and Campbell as applied to claim 19 above, and further in view of B M S et al. (US 20210390144 A1).
Regarding Claim 20:
The combination of Munoz, Rodriguez Bravo and Campbell teaches the dialogue management method of claim 19, but does not teach further including, after the call ends, determining whether the speech of the user received from the user terminal is related to the system response.
In the same field of conference call assistants, B M S teaches further including, after the call ends, determining whether the speech of the user received from the user terminal is related to the system response (Para [0149], Ln 1-10, The conferencing server 116 may send the response feedback request as part of a message at the end of conference meeting. Para [0102], Ln 1-22, The participants may provide feedback in the form of a message sent from a respective conference client device 108 to the conferencing server 116 (step S410). Upon receiving the message, and based on the feedback, the AI/ML engine 140 may update the responses in the response database 220 and/or the AI-bot query response database 224, and/or train one or more of the recommendation engine 216, the query engine 228, and the response generator 236 (step S411). This update and/or training may be performed, at least in part, by analyzing the feedback, determining that a response to a query previously provided was accurate, inaccurate, off-topic, provided quickly, provided with a significant delay, acceptable, unacceptable… The training of the AI/ML engine 140, based on the feedback, may allow future SME selections (e.g., for consideration as candidates for consultation) to provide better and/or quicker responses).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the combination of Munoz, Rodriguez Bravo and Campbell with the feedback system of B M S, as it improves the quality of the system responses (Para [0102], Ln 1-22).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEXANDER G MARLOW whose telephone number is (571)272-4536. The examiner can normally be reached Monday - Thursday 10:00 am - 8:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALEXANDER G MARLOW/Assistant Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658