DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-17 and 19-20 are pending, and claims 1, 8 and 15 are independent claims.
Response to Arguments
Applicant’s arguments with respect to claims 1-17 and 19-20, filed on 10/30/2025 (page 8, paras. 7-8), have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 5, 8, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Geppert et al., U.S. Patent Application Publication No. US 2010/0251142 A1 (Geppert), in view of Mai et al., U.S. Patent Application Publication No. US 2013/0329866 A1 (Mai), and further in view of Basu et al., U.S. Patent Application Publication No. US 2008/0300872 A1 (Basu).
Regarding Claim 1, Geppert discloses a system comprising:
a non-transitory computer-readable medium (Geppert, para 0022; a non-transitory computer-readable medium);
a communications interface (Geppert, para 0024; communications interface); and
a processor communicatively coupled to the non-transitory computer-readable medium and the communications interface (Geppert, para 0022; a non-transitory computer-readable medium in connection with the necessary hardware components, such as the processor), the processor configured to execute processor-executable instructions (Geppert, para 0066; computer-executable instructions) stored in the non-transitory computer-readable medium (Geppert, para 0022; stored in a non-transitory computer-readable medium) to:
join a video conference hosted by a video conference provider (Geppert, para 0004, video conference sessions; Geppert, para 0040, For example, the user can click a call icon, a video conference icon, an IM icon, an email icon, or a social media icon to invite another user to join the communication session… The system 100 then automatically contacts that person in their desired mode, a sender preferred mode, a currently available mode based on presence information, or in a common available mode between the participants and joins that person to the conference call);
receive a request to join a sidebar meeting (Geppert, para 0030, a sidebar communication session (i.e., sidebar meeting) may occur; Geppert, para 0050, if the host were to simply speak up ("Are you ready to join us again?"), then the sidebar groups could hear and respond).
Geppert does not specifically disclose receive from the video conference provider: a first set of audio and video streams corresponding to a main meeting of the video conference, a second set of audio and video streams corresponding to the sidebar meeting of the video conference, identify, by a sidebar assistant, one or more keywords in an audio stream from the second set of audio and video streams, generating, by the sidebar assistant, a note based on the one or more keywords identified in the audio stream from the second set of audio and video streams, determine, by the sidebar assistant, an instruction to perform one or more operations based on semantic meaning of the one or more keywords, determining at least one of the first or second sets of audio and video streams based on the one or more keywords, and perform the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams.
However, Mai, in the same field of endeavor, discloses:
receive from the video conference provider (Mai, para 0016, the host or participant initiating the breakout session and/or the host of the main meeting, may configure the system to alert a member or members of the sub-group):
a first set of audio and video streams corresponding to a main meeting of the video conference (Mai, Figure 1, para 0022, Output 116 can be an audio stream, a video stream, or an audiovisual stream; [i.e., “audiovisual stream” from the main conference, Figure 1, Element 116 in Figure 1]);
a second set of audio and video streams corresponding to the sidebar meeting of the video conference (Mai, Figure 1, para 0022, output 118 can be an audio stream, a video stream, or an audiovisual stream; [i.e., audio and video streams from the breakout (sidebar) session; Element 118 in Figure 1]),
identify, by a sidebar assistant, one or more keywords in an audio stream from the second set of audio and video streams corresponding to the sidebar meeting (Mai, para 0016, a member of the sub-group, such as the host or participant initiating the breakout session and/or the host of the main meeting, may configure the system to alert a member or members of the sub-group to return to the main meeting upon detecting certain predefined keywords. As used herein, a keyword may be single word or may also be a plurality of words such as a phrase. In particular embodiments, keywords may include the names of the meeting participants attending the breakout session and/or groups affiliated with a meeting participant (e.g., the participant's department such as marketing). For example, the system can be configured to alert members of a sub-group if the host of the main meeting states "alright everyone, let's re-convene." In an example embodiment, speech recognition technology is employed to detect when keywords or phrases are spoken in the main meeting. A signal can be communicated to members of the sub-group (for example, the video of a sidebar session can flicker or flash red in the background when key words in the main session are detected));
Geppert in view of Mai does not specifically disclose determine, by the sidebar assistant, an instruction to perform one or more operations based on a semantic meaning of the one or more keywords, determining which of at least one of the first or second sets of audio and video streams based on the one or more keywords to perform the one or more operations on, and perform the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams.
However, Basu, in the same field of endeavor, discloses:
determine, by the sidebar assistant, an instruction to perform one or more operations based on a semantic meaning of the one or more keywords (Basu, para 0053-0054, FIG. 5 depicts a system 500 that can provide additional context for a hierarchical display of keywords forming a scalable summary in accord with various aspects of the subject innovation… Speech recognition component 506 can receive, parse, and translate audio information associated with or descriptive of content 504 into text. Summarization component 508 can receive such text and generate keywords descriptive of content 504, and assign a keyphrase relevance rank to each keyword… zoom component 510 can control a density, font size, etc. of keywords presented within an available space to modify a level of detail associated with a summary and zoom factor. System 500 can further provide additional context to keywords presented on browsing interface 502 (e.g., as generated by summarization component 508 and populated by zoom component 510… System 500 can enable a user to control display of keywords and additional words presented in association with context component 512);
determining which of at least one of the first or second sets of audio and video streams based on the one or more keywords to perform the one or more operations on (Basu, para 0067-0068, FIG. 9 depicts a sample methodology 900 for presenting scalable summaries of content in accord with aspects of the subject disclosure. At 902, content is analyzed to identify distinctive patterns of speech contained therein. Such speech can be in the form of a commentary (e.g., broadcast news), discussion (e.g., professional lecture), overview, etc., associated with some audio and/or video content… The relevance rank(s) can indicate a likelihood of occurrence of a keyword and/or how representative a keyword is of a topic of discussion or other aspect of content. The relevance rank can be established at least in part on non-verbal cues (pitch, tone, loudness, and/or pauses of a speaker's voice), speaker turn information including a number of occurrences of a keyword in a speaker turn, visual cues, a TFIDF factor associated with a keyword, or combinations thereof. At 908, portions of recorded content are mapped to the keywords. Such mapping can, for example, allow the portions of recorded content to be accessed and/or played back by a user by selecting the keyword. As a more specific example, each keyword can be a link (e.g., hyperlink HTML link, XML link, and the like) to a local or remote data store containing the recorded content (see, for instance, FIG. 13 infra). Selecting the keyword can begin playback of the content at a point related to the keyword. For example, selection of a keyword can cause a recording to begin playing at a point in which the selected keyword occurs in the recording. At 910, a number of keywords are presented based on the relevance scale and a zoom factor. The zoom factor can be based, for instance, on an amount of graphical space available to render keywords, and a threshold level established by a user, or a default value. 
The zoom factor can be compared to the relevance scale associated with each keyword to determine whether a particular keyword is to be rendered or not. Consequently, by adjusting the zoom factor a user can increase and decrease a number of keywords presented, thereby transitioning from a broad overview to a detailed description of content in accord with aspects disclosed herein.); and
perform the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams (Basu, para 0056-0059, FIG. 6 illustrates a further example system 600 that provides scalable summaries of audio and/or video content in accord with aspects of the subject innovation. Content 602 can include any suitable auditory and/or visual information that includes or can be associated with a speech, text, and/or conversation based description or document (e.g., described by text, or speech, or discussed in conversation, etc. such that aspects of the audio and/or video information can be distinguished from other aspects and articulated via such speech, text, and/or conversation; examples could include closed caption text information broadcast with news, played with movies, etc.) … Summarization component 610 can generate a plurality of keywords associated with content 602 and associate a keyword rank with each keyword, as described supra. Additionally, keywords can be grouped at least in regard to a topic of conversation(s) associated with a keyword and a speaker turn(s) articulating a keyword, as described above. Zoom component 612 can display a number of keywords as a function of keyword rank and a zoom factor, such that particular topics can be selected and display of a number of keywords associated with those topics can be increased or decreased. Additionally, zoom component 612 can display larger or fewer numbers of keywords associated with particular speaker turns in order to give a user varied control of the display of information associated with content 602. Mapping component 614 can associate one or more keywords with recorded portions of content 602. Such association can enable a user to access and play a portion (e.g., on a media player device, electronic video and/or audio playback device, etc.) the portion of content 602 related to a selected keyword. 
For example, a bigram "lion charges" associated with a summary of a jungle safari film can initiate playback of an audio/video recording where a commentator is discussing a lion charging prey, and/or where a video portion of the recording is depicting such events).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Basu into the method of Geppert in view of Mai because this would enable keywords to be organized as a function of occurrence within a summary presentation, where keywords appearing before and after each other are displayed in a distinct manner indicating such sequence (e.g., keywords occurring earlier in time can appear above, to the left of, etc., keywords that occur later in time), and a quick visual scan of keywords as a function of timeline can indicate to a viewer a manner in which a conversation, discussion, etc. progresses over time (Basu, para 0051).
Regarding Claim 5, Geppert in view of Mai and Basu discloses the system of claim 4, including all of its limitations, wherein the processor is configured to execute further processor-executable instructions (Geppert, para 0066; computer-executable instructions) stored in the non-transitory computer-readable medium (Geppert, para 0022; stored in a non-transitory computer-readable medium).
Mai further teaches:
display a sidebar note, wherein the sidebar note comprises the note (Mai, para 0052, At 904, a breakout session (or sidebar) is initiated by a sub-group of participants of the main meeting. A separate audio, and in some embodiments video and/or text, stream is provided to members of the breakout session; [“sidebar text” as “sidebar note”]).
Regarding Claim 8, Geppert discloses a method comprising:
joining a video conference hosted by a video conference provider (Geppert, para 0004, video conference sessions; Geppert, para 0040, For example, the user can click a call icon, a video conference icon, an IM icon, an email icon, or a social media icon to invite another user to join the communication session… The system 100 then automatically contacts that person in their desired mode, a sender preferred mode, a currently available mode based on presence information, or in a common available mode between the participants and joins that person to the conference call);
receiving a request to join a sidebar meeting (Geppert, para 0030, a sidebar communication session (i.e., sidebar meeting) may occur; Geppert, para 0050, if the host were to simply speak up ("Are you ready to join us again?"), then the sidebar groups could hear and respond);
Geppert does not specifically disclose receiving from the video conference provider: a first set of audio and video streams corresponding to a main meeting of the video conference, a second set of audio and video streams corresponding to the sidebar meeting of the video conference, identifying, by a sidebar assistant, one or more keywords in an audio stream from the second set of audio and video streams, determining, by the sidebar assistant, an instruction to perform one or more operations based on a semantic meaning of the one or more keywords, determining which of at least one of the first or second sets of audio and video streams based on the one or more keywords to perform the one or more operations on, and performing the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams.
However, Mai, in the same field of endeavor, teaches:
a first set of audio and video streams corresponding to a main meeting of the video conference (Mai, Figure 1, para 0022, Output 116 can be an audio stream, a video stream, or an audiovisual stream; [i.e., “audiovisual stream” from the main conference, Figure 1, Element 116 in Figure 1]); and
a second set of audio and video streams corresponding to the sidebar meeting of the video conference (Mai, Figure 1, para 0022, output 118 can be an audio stream, a video stream, or an audiovisual stream; [i.e., audio and video streams from the breakout (sidebar) session; Element 118 in Figure 1]);
identifying, by a sidebar assistant, one or more keywords in an audio stream from the second set of audio and video streams (Mai, para 0016, a member of the sub-group, such as the host or participant initiating the breakout session and/or the host of the main meeting, may configure the system to alert a member or members of the sub-group to return to the main meeting upon detecting certain predefined keywords. As used herein, a keyword may be single word or may also be a plurality of words such as a phrase. In particular embodiments, keywords may include the names of the meeting participants attending the breakout session and/or groups affiliated with a meeting participant (e.g., the participant's department such as marketing). For example, the system can be configured to alert members of a sub-group if the host of the main meeting states "alright everyone, let's re-convene." In an example embodiment, speech recognition technology is employed to detect when keywords or phrases are spoken in the main meeting. A signal can be communicated to members of the sub-group (for example, the video of a sidebar session can flicker or flash red in the background when key words in the main session are detected));
Geppert in view of Mai does not specifically disclose determining, by the sidebar assistant, an instruction to perform one or more operations based on a semantic meaning of the one or more keywords, determining which of at least one of the first or second sets of audio and video streams based on the one or more keywords to perform the one or more operations on, and performing the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams.
However, Basu, in the same field of endeavor, discloses:
determining, by the sidebar assistant, an instruction to perform one or more operations based on a semantic meaning of the one or more keywords (Basu, para 0053-0054, FIG. 5 depicts a system 500 that can provide additional context for a hierarchical display of keywords forming a scalable summary in accord with various aspects of the subject innovation… Speech recognition component 506 can receive, parse, and translate audio information associated with or descriptive of content 504 into text. Summarization component 508 can receive such text and generate keywords descriptive of content 504, and assign a keyphrase relevance rank to each keyword … zoom component 510 can control a density, font size, etc. of keywords presented within an available space to modify a level of detail associated with a summary and zoom factor. System 500 can further provide additional context to keywords presented on browsing interface 502 (e.g., as generated by summarization component 508 and populated by zoom component 510… System 500 can enable a user to control display of keywords and additional words presented in association with context component 512);
determining which of at least one of the first or second sets of audio and video streams based on the one or more keywords to perform the one or more operations on (Basu, para 0067-0068, FIG. 9 depicts a sample methodology 900 for presenting scalable summaries of content in accord with aspects of the subject disclosure. At 902, content is analyzed to identify distinctive patterns of speech contained therein. Such speech can be in the form of a commentary (e.g., broadcast news), discussion (e.g., professional lecture), overview, etc., associated with some audio and/or video content… The relevance rank(s) can indicate a likelihood of occurrence of a keyword and/or how representative a keyword is of a topic of discussion or other aspect of content. The relevance rank can be established at least in part on non-verbal cues (pitch, tone, loudness, and/or pauses of a speaker's voice), speaker turn information including a number of occurrences of a keyword in a speaker turn, visual cues, a TFIDF factor associated with a keyword, or combinations thereof. At 908, portions of recorded content are mapped to the keywords. Such mapping can, for example, allow the portions of recorded content to be accessed and/or played back by a user by selecting the keyword. As a more specific example, each keyword can be a link (e.g., hyperlink HTML link, XML link, and the like) to a local or remote data store containing the recorded content (see, for instance, FIG. 13 infra). Selecting the keyword can begin playback of the content at a point related to the keyword. For example, selection of a keyword can cause a recording to begin playing at a point in which the selected keyword occurs in the recording. At 910, a number of keywords are presented based on the relevance scale and a zoom factor. The zoom factor can be based, for instance, on an amount of graphical space available to render keywords, and a threshold level established by a user, or a default value. 
The zoom factor can be compared to the relevance scale associated with each keyword to determine whether a particular keyword is to be rendered or not. Consequently, by adjusting the zoom factor a user can increase and decrease a number of keywords presented, thereby transitioning from a broad overview to a detailed description of content in accord with aspects disclosed herein); and
perform the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams (Basu, para 0056-0059, FIG. 6 illustrates a further example system 600 that provides scalable summaries of audio and/or video content in accord with aspects of the subject innovation. Content 602 can include any suitable auditory and/or visual information that includes or can be associated with a speech, text, and/or conversation based description or document (e.g., described by text, or speech, or discussed in conversation, etc. such that aspects of the audio and/or video information can be distinguished from other aspects and articulated via such speech, text, and/or conversation; examples could include closed caption text information broadcast with news, played with movies, etc.) … Summarization component 610 can generate a plurality of keywords associated with content 602 and associate a keyword rank with each keyword, as described supra. Additionally, keywords can be grouped at least in regard to a topic of conversation(s) associated with a keyword and a speaker turn(s) articulating a keyword, as described above. Zoom component 612 can display a number of keywords as a function of keyword rank and a zoom factor, such that particular topics can be selected and display of a number of keywords associated with those topics can be increased or decreased. Additionally, zoom component 612 can display larger or fewer numbers of keywords associated with particular speaker turns in order to give a user varied control of the display of information associated with content 602. Mapping component 614 can associate one or more keywords with recorded portions of content 602. Such association can enable a user to access and play a portion (e.g., on a media player device, electronic video and/or audio playback device, etc.) the portion of content 602 related to a selected keyword. 
For example, a bigram "lion charges" associated with a summary of a jungle safari film can initiate playback of an audio/video recording where a commentator is discussing a lion charging prey, and/or where a video portion of the recording is depicting such events).
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Basu into the method of Geppert in view of Mai because this would enable keywords to be organized as a function of occurrence within a summary presentation, where keywords appearing before and after each other are displayed in a distinct manner indicating such sequence (e.g., keywords occurring earlier in time can appear above, to the left of, etc., keywords that occur later in time), and a quick visual scan of keywords as a function of timeline can indicate to a viewer a manner in which a conversation, discussion, etc. progresses over time (Basu, para 0051).
Regarding Claim 15, Geppert discloses a non-transitory computer-readable medium comprising processor-executable instructions configured to cause one or more processors to:
join a video conference hosted by a video conference provider (Geppert, para 0004, video conference sessions; Geppert, para 0040, For example, the user can click a call icon, a video conference icon, an IM icon, an email icon, or a social media icon to invite another user to join the communication session… The system 100 then automatically contacts that person in their desired mode, a sender preferred mode, a currently available mode based on presence information, or in a common available mode between the participants and joins that person to the conference call);
receive a request to join a sidebar meeting (Geppert, para 0030, a sidebar communication session (i.e., sidebar meeting) may occur; Geppert, para 0050, if the host were to simply speak up ("Are you ready to join us again?"), then the sidebar groups could hear and respond).
Geppert does not specifically disclose a first set of audio and video streams corresponding to a main meeting of the video conference, a second set of audio and video streams corresponding to the sidebar meeting of the video conference, identify, by a sidebar assistant, one or more keywords in an audio stream from the second set of audio and video streams, generating, by the sidebar assistant, a note based on the one or more keywords identified in the audio stream from the second set of audio and video streams, determine, by the sidebar assistant, an instruction to perform one or more operations based on semantic meaning of the one or more keywords, determining which of at least one of the first or second sets of audio and video streams based on the one or more keywords to perform the one or more operations on, and perform the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams.
However, Mai, in the same field of endeavor, discloses:
a first set of audio and video streams corresponding to a main meeting of the video conference (Mai, Figure 1, para 0022, Output 116 can be an audio stream, a video stream, or an audiovisual stream; [i.e., “audiovisual stream” from the main conference, Figure 1, Element 116 in Figure 1]);
a second set of audio and video streams corresponding to the sidebar meeting of the video conference (Mai, Figure 1, para 0022, output 118 can be an audio stream, a video stream, or an audiovisual stream; [i.e., audio and video streams from the breakout (sidebar) session; Element 118 in Figure 1]);
identify, by a sidebar assistant, one or more keywords in an audio stream from the second set of audio and video streams (Mai, para 0016, a member of the sub-group, such as the host or participant initiating the breakout session and/or the host of the main meeting, may configure the system to alert a member or members of the sub-group to return to the main meeting upon detecting certain predefined keywords. As used herein, a keyword may be single word or may also be a plurality of words such as a phrase. In particular embodiments, keywords may include the names of the meeting participants attending the breakout session and/or groups affiliated with a meeting participant (e.g., the participant's department such as marketing). For example, the system can be configured to alert members of a sub-group if the host of the main meeting states "alright everyone, let's re-convene." In an example embodiment, speech recognition technology is employed to detect when keywords or phrases are spoken in the main meeting. A signal can be communicated to members of the sub-group (for example, the video of a sidebar session can flicker or flash red in the background when key words in the main session are detected));
Geppert in view of Mai does not specifically disclose determine, by the sidebar assistant, an instruction to perform one or more operations based on a semantic meaning of the one or more keywords, determining which of at least one of the first or second sets of audio and video streams based on the one or more keywords to perform the one or more operations on, and perform the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams.
However, Basu, in the same field of endeavor, discloses:
determine, by the sidebar assistant, an instruction to perform one or more operations based on a semantic meaning of the one or more keywords (Basu, para 0053-0054, FIG. 5 depicts a system 500 that can provide additional context for a hierarchical display of keywords forming a scalable summary in accord with various aspects of the subject innovation… Speech recognition component 506 can receive, parse, and translate audio information associated with or descriptive of content 504 into text. Summarization component 508 can receive such text and generate keywords descriptive of content 504, and assign a keyphrase relevance rank to each keyword … zoom component 510 can control a density, font size, etc. of keywords presented within an available space to modify a level of detail associated with a summary and zoom factor. System 500 can further provide additional context to keywords presented on browsing interface 502 (e.g., as generated by summarization component 508 and populated by zoom component 510… System 500 can enable a user to control display of keywords and additional words presented in association with context component 512);
determining which of at least one of the first or second sets of audio and video streams based on the one or more keywords to perform the one or more operations on (Basu, para 0067-0068, FIG. 9 depicts a sample methodology 900 for presenting scalable summaries of content in accord with aspects of the subject disclosure. At 902, content is analyzed to identify distinctive patterns of speech contained therein. Such speech can be in the form of a commentary (e.g., broadcast news), discussion (e.g., professional lecture), overview, etc., associated with some audio and/or video content… The relevance rank(s) can indicate a likelihood of occurrence of a keyword and/or how representative a keyword is of a topic of discussion or other aspect of content. The relevance rank can be established at least in part on non-verbal cues (pitch, tone, loudness, and/or pauses of a speaker's voice), speaker turn information including a number of occurrences of a keyword in a speaker turn, visual cues, a TFIDF factor associated with a keyword, or combinations thereof. At 908, portions of recorded content are mapped to the keywords. Such mapping can, for example, allow the portions of recorded content to be accessed and/or played back by a user by selecting the keyword. As a more specific example, each keyword can be a link (e.g., hyperlink HTML link, XML link, and the like) to a local or remote data store containing the recorded content (see, for instance, FIG. 13 infra). Selecting the keyword can begin playback of the content at a point related to the keyword. For example, selection of a keyword can cause a recording to begin playing at a point in which the selected keyword occurs in the recording. At 910, a number of keywords are presented based on the relevance scale and a zoom factor. The zoom factor can be based, for instance, on an amount of graphical space available to render keywords, and a threshold level established by a user, or a default value. 
The zoom factor can be compared to the relevance scale associated with each keyword to determine whether a particular keyword is to be rendered or not. Consequently, by adjusting the zoom factor a user can increase and decrease a number of keywords presented, thereby transitioning from a broad overview to a detailed description of content in accord with aspects disclosed herein); and
perform the one or more operations based on the one or more keywords and the determined at least one of the first or second sets of audio and video streams (Basu, para 0056-0059, FIG. 6 illustrates a further example system 600 that provides scalable summaries of audio and/or video content in accord with aspects of the subject innovation. Content 602 can include any suitable auditory and/or visual information that includes or can be associated with a speech, text, and/or conversation based description or document (e.g., described by text, or speech, or discussed in conversation, etc. such that aspects of the audio and/or video information can be distinguished from other aspects and articulated via such speech, text, and/or conversation; examples could include closed caption text information broadcast with news, played with movies, etc.) … Summarization component 610 can generate a plurality of keywords associated with content 602 and associate a keyword rank with each keyword, as described supra. Additionally, keywords can be grouped at least in regard to a topic of conversation(s) associated with a keyword and a speaker turn(s) articulating a keyword, as described above. Zoom component 612 can display a number of keywords as a function of keyword rank and a zoom factor, such that particular topics can be selected and display of a number of keywords associated with those topics can be increased or decreased. Additionally, zoom component 612 can display larger or fewer numbers of keywords associated with particular speaker turns in order to give a user varied control of the display of information associated with content 602. Mapping component 614 can associate one or more keywords with recorded portions of content 602. Such association can enable a user to access and play a portion (e.g., on a media player device, electronic video and/or audio playback device, etc.) the portion of content 602 related to a selected keyword. 
For example, a bigram "lion charges" associated with a summary of a jungle safari film can initiate playback of an audio/video recording where a commentator is discussing a lion charging prey, and/or where a video portion of the recording is depicting such events).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Basu in the method of Geppert in view of Mai because this will enable keywords to be organized as a function of occurrence within a summary presentation, where keywords appearing before and after each other are displayed in a distinct manner indicating such sequence (e.g., keywords occurring earlier in time can appear above, to the left of, etc., keywords that occur later in time) and a quick visual scan of keywords as a function of timeline can indicate to a viewer a manner in which a conversation, discussion etc. progresses over time (Basu, para 0051).
Claims 2-4, 6-7, 9-14, 16-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Geppert in view of Mai, further in view of Basu, and further in view of McQuiston et al. Pat Pub. No. US 2020/0258525 A1 (McQuiston).
Regarding Claim 2, Geppert in view of Mai and Basu disclose the system of all the limitations of claim 1, wherein the processor is configured to execute further processor-executable instructions (Geppert, para 0066; computer-executable instructions) stored in the non-transitory computer-readable medium (Geppert, para 0022; stored in a non-transitory computer-readable medium).
Geppert teaches sidebar sessions or meetings (Geppert, para 0030, a sidebar session (i.e., sidebar meeting) may occur; para 0048, sidebar discussions have various levels of connection with the main conference).
Geppert in view of Mai and Basu does not specifically disclose transcribing the audio stream and analyzing the transcription by the sidebar (i.e., virtual) assistant.
However, McQuiston, in the same field of endeavor, teaches transcribing the audio stream and analyzing the transcription by the sidebar (i.e., virtual) assistant (McQuiston, para 0014; the virtual assistant transcribing the audio feed using a speech-recognition algorithm; the virtual assistant providing the transcription to at least one of the plurality of attendees; the virtual assistant receiving an edited transcription; and the virtual assistant updating the speech recognition algorithm based on the edited transcription. McQuiston, para 0016, the virtual assistant analyzing the transcription or the edited transcription for action items; McQuiston, para 0017; the virtual assistant analyzing the transcription or the edited transcription for attendee sentiment; McQuiston, para 0018; the virtual assistant analyzing the transcription or the edited transcription and generating a summary of the electronic meeting).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable the virtual assistant to automatically edit the action items of the meeting and generate a summary of the electronic meeting (McQuiston, para 0017 and 0018).
Regarding Claim 3, Geppert in view of Mai and Basu discloses the system of all the limitations of claim 1, wherein the processor is configured to execute further processor-executable instructions (Geppert, para 0066; computer-executable instructions) stored in the non-transitory computer-readable medium (Geppert, para 0022; stored in a non-transitory computer-readable medium) to:
identify, based on one or more recognized words, the one or more keywords (Geppert, para 0058, detects certain words or instructions).
Geppert in view of Mai and Basu do not specifically disclose performing speech recognition on the audio stream from the second set of audio and video streams.
However, McQuiston, in the same field of endeavor, teaches performing speech recognition on audio stream from a set of audio and video streams (McQuiston, para 0007; The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable the virtual assistant to provide the transcription to at least one of the plurality of attendees, to receive an edited transcription, and also to update the speech recognition algorithm based on the edited transcription (McQuiston, para 0007).
Regarding Claim 4, Geppert in view of Mai and Basu discloses the system of all the limitations of claim 1, wherein the processor is configured to execute further processor-executable instructions (Geppert, para 0066; computer-executable instructions) stored in the non-transitory computer-readable medium (Geppert, para 0022; stored in a non-transitory computer-readable medium) to:
determine, based on the one or more keywords, an instruction during the sidebar meeting (Mai, para 0016, keywords may include the names of the meeting participants attending the breakout session and/or groups affiliated with a meeting participant (e.g., the participant's department such as marketing). For example, the system can be configured to alert members of a sub-group if the host of the main meeting states "alright everyone, let's re-convene." In an example embodiment, speech recognition technology is employed to detect when keywords or phrases are spoken in the main meeting. A signal can be communicated to members of the sub-group (for example, the video of a sidebar session can flicker or flash red in the background when key words in the main session are detected))
the second set of audio and video streams (Mai, Figure 1, para 0022, output 118 can be an audio stream, a video stream, or an audiovisual stream; [i.e., audio and video streams from the breakout (sidebar) session; Element 118 in Figure 1])
Geppert in view of Mai and Basu do not disclose determine based on the one or more keywords an instruction to take notes, and generate a note.
However, McQuiston, in the same field of endeavor, teaches determine based on the one or more keywords an instruction to take notes (McQuiston, paras 0067 – 0071, One of ordinary skill in the art may readily appreciate that the disclosed virtual assistant embodiments may advantageously reduce lost time and efficiency caused by conventional meetings and note-taking . … a virtual assistant may free up participants to participate without the distraction of note-taking . … The system of the embodiments or portions of the system of the embodiments may be in the form of a “processing machine,”… As noted above, the processing machine executes the instructions that are stored in the memory or memories to process data. This processing of data may be in response to commands by a user or users of the processing machine, in response to previous processing, in response to a request by another processing machine and/or any other input, for example; [i.e., note-taking is just one of the commands/instructions “in response to commands by a user or users of the processing machine” disclosed in this portion of the reference]); and
generate a note (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant may free up participants to participate without the distraction of note-taking; McQuiston, para 0035, audio and/or video from a meeting (received as a stream); McQuiston, paras 0005-0007, An individual may take notes of the meeting to help remember and record, but that individual may not be able to participate or listen as effectively in the meeting as a result. Existing technology for meetings suffers from lost information and lost productivity of meeting participants. A need therefore exists for a means to automatically transcribe, record, and analyze discussions during a meeting. The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm, may provide the transcription to at least one of the plurality of attendees).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable the virtual assistant to provide the transcription to at least one of the plurality of attendees, to receive an edited transcription, and also to update the speech recognition algorithm based on the edited transcription (McQuiston, para 0007).
Regarding Claim 6, Geppert in view of Mai and Basu discloses the system of all the limitations of claim 1 (Geppert, para 0066; computer-executable instructions; Geppert, para 0022; stored in a non-transitory computer-readable medium), wherein the one or more operations comprises one or more of:
Geppert in view of Mai and Basu do not specifically teach a recording function, a note function, or a task function.
However, McQuiston, in the same field of endeavor, discloses:
a recording function (McQuiston, para 0049; the virtual assistant may initiate a function to record; McQuiston, para 0050; cause the virtual assistant to call a function to record);
a note function (McQuiston, para 0049; the virtual assistant may initiate a function to record or designate an action item); or
a task function (McQuiston, para 0050; the virtual assistant may call pre-defined functions).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this would enable the user to employ the virtual assistant to perform different functions (McQuiston, para 0049).
Regarding Claim 7, Geppert in view of Mai and Basu discloses the system of all the limitations of claim 6, wherein the one or more operations comprises a recording function, and wherein the processor is configured to execute further processor-executable instructions (Geppert, para 0066; computer-executable instructions) stored in the non-transitory computer-readable medium (Geppert, para 0022; stored in a non-transitory computer-readable medium).
Geppert teaches recording a segment of the first set of audio and video streams from the main meeting (Geppert, para 0030; An audio recording and transcription of the conference can be recorded) and also teaches sidebar meetings (Geppert, para 0048, Figure 4A; These different groups can be characterized as sidebar discussions and have various levels of connection with the main conference).
Geppert in view of Mai and Basu do not specifically disclose sidebar assistants, or virtual assistants, invoking the recording function.
However, McQuiston, in the same field of endeavor, teaches the use of virtual assistants (i.e., sidebar assistants) invoking the recording function (McQuiston, para 0035; Virtual assistant may record the audio and/or video from a meeting (received as a stream)), responsive to determining, by the virtual (i.e., sidebar) assistant, to invoke the recording function based on the one or more keywords (McQuiston, para 0049; the virtual assistant may initiate a function to record; the virtual assistant may call pre-defined functions in response to certain trigger words spoken during the meeting).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this would enable the virtual assistant to call pre-defined functions in response to certain trigger words spoken during the meeting (McQuiston, para 0049).
Regarding Claim 9, Geppert in view of Mai and Basu discloses the method of all the limitations of claim 8, wherein identifying, by the sidebar assistant, the one or more keywords (Geppert, para 0058, detects certain words or instructions) in the audio stream from the second set of audio and video streams (Geppert, para 0035; feed a live stream of images from a camera or video camera) comprises:
Geppert in view of Mai and Basu does not specifically disclose performing speech recognition on the audio stream from the second set of audio and video streams to identify one or more recognized words.
However, McQuiston, in the same field of endeavor, teaches performing speech recognition on the audio stream from the second set of audio and video streams to identify one or more recognized words (McQuiston, para 0007; The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable the user to implement the virtual assistant to provide the transcription to at least one of the plurality of attendees, to receive an edited transcription, and also to update the speech recognition algorithm based on the edited transcription (McQuiston, para 0007; The virtual assistant may provide the transcription to at least one of the plurality of attendees, may receive an edited transcription, and may update the speech recognition algorithm based on the edited transcription).
Regarding Claim 10, Geppert in view of Mai and Basu discloses the method of all the limitations of claim 9, wherein:
identifying, by the sidebar assistant, the one or more keywords (Geppert, para 0058, detects certain words or instructions) in the audio stream from the second set of audio and video streams (Geppert, para 0035; feed a live stream of images from a camera or video camera) corresponding to the sidebar meeting comprises identifying, by the sidebar assistant, the one or more keywords from the transcription (Geppert, para 0058, detects certain words or instructions).
Geppert in view of Mai and Basu do not specifically disclose that performing speech recognition on the audio stream from the second set of audio and video streams comprises transcribing the audio stream from the second set of audio and video streams to generate a transcription of the audio stream.
However, McQuiston, in the same field of endeavor, teaches performing speech recognition on the audio stream from the second set of audio and video streams comprises transcribing the audio stream from the second set of audio and video streams to generate a transcription of the audio stream (McQuiston, para 0007; The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable the user to implement the virtual assistant to provide the transcription to at least one of the plurality of attendees, to receive an edited transcription, and also to update the speech recognition algorithm based on the edited transcription (McQuiston, para 0007).
Regarding Claim 11, Geppert in view of Mai and Basu discloses the method of all the limitations of claim 8, further comprising:
determine, based on the one or more keywords, an instruction to take notes during the sidebar meeting (Mai, para 0016, keywords may include the names of the meeting participants attending the breakout session and/or groups affiliated with a meeting participant (e.g., the participant's department such as marketing). For example, the system can be configured to alert members of a sub-group if the host of the main meeting states "alright everyone, let's re-convene." In an example embodiment, speech recognition technology is employed to detect when keywords or phrases are spoken in the main meeting. A signal can be communicated to members of the sub-group (for example, the video of a sidebar session can flicker or flash red in the background when key words in the main session are detected)); and
the second set of audio and video streams (Mai, Figure 1, para 0022, output 118 can be an audio stream, a video stream, or an audiovisual stream; [i.e., audio and video streams from the breakout (sidebar) session; Element 118 in Figure 1])
Geppert in view of Mai and Basu does not disclose generate a note.
However, McQuiston, in the same field of endeavor, teaches generate a note (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant [i.e., a sidebar assistant] may free up participants to participate without the distraction of note-taking; McQuiston, para 0035, audio and/or video from a meeting (received as a stream); McQuiston, paras 0005-0007, An individual may take notes of the meeting to help remember and record, but that individual may not be able to participate or listen as effectively in the meeting as a result. Existing technology for meetings suffers from lost information and lost productivity of meeting participants. A need therefore exists for a means to automatically transcribe, record, and analyze discussions during a meeting. The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm, may provide the transcription to at least one of the plurality of attendees).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable the virtual assistant to provide the transcription to at least one of the plurality of attendees, to receive an edited transcription, and also to update the speech recognition algorithm based on the edited transcription (McQuiston, para 0007).
Regarding Claim 12, Geppert in view of Mai and Basu discloses the method of all the limitations of claim 11, wherein the one or more operations comprises a recording function (Geppert, para 0048, Figure 4A; These different groups can be characterized as sidebar discussions and have various levels of connection with the main conference; Geppert, para 0030; An audio recording and transcription of the conference can be recorded), and the method further comprises:
Geppert in view of Mai and Basu do not specifically disclose recording, by the sidebar assistant, a segment of the first set of audio and video streams from the main meeting, responsive to determining, by the sidebar assistant, to invoke the recording function based on the one or more keywords. Geppert also does not specifically disclose generating, by the sidebar assistant, a recording note based on the segment of the first set of audio and video streams recorded.
However, McQuiston, in the same field of endeavor, teaches recording, by the sidebar assistants (i.e., virtual assistants) a segment of the first set of audio and video streams from the main meeting (McQuiston, para 0035; Virtual assistant may record the audio and/or video from a meeting (received as a stream)), responsive to determining, by the sidebar (i.e., virtual) assistant, to invoke the recording function based on the one or more keywords (McQuiston, para 0049; the virtual assistant may initiate a function to record; the virtual assistant may call pre-defined functions in response to certain trigger words spoken during the meeting). McQuiston also teaches generating a recording note (i.e., taking notes automatically) based on the segment of the first set of audio and video streams recorded (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant may free up participants to participate without the distraction of note-taking; McQuiston, para 0035, audio and/or video from a meeting (received as a stream); McQuiston, para 0064, keyword matching may be used to identify).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable the virtual assistant to call pre-defined functions in response to certain trigger words spoken during the meeting (McQuiston, para 0049), and also because this will enable an attendee, or an absentee, to review the notes or transcript of the meeting at a later date, thus avoiding information loss (McQuiston, para 0067).
Regarding Claim 13, Geppert in view of Mai and Basu discloses the method of all the limitations of claim 11, wherein the one or more operations comprises a note function, and the method further comprises:
Mai further discloses:
a segment of the audio stream from the second set of audio and video streams corresponding to the sidebar meeting (Mai, Figure 1, para 0022, output 118 can be an audio stream, a video stream, or an audiovisual stream; [i.e., audio and video streams from the breakout (sidebar) session; Element 118 in Figure 1]).
Geppert in view of Mai and Basu do not specifically disclose transcribing, by the sidebar (i.e., virtual) assistant, responsive to determining, by the sidebar assistant, to invoke the note function based on the one or more keywords, and also do not specifically disclose generating, by the sidebar assistant, the note based on the segment of the audio stream from the second set of audio and video streams transcribed.
However, McQuiston, in the same field of endeavor, teaches:
transcribing, by the sidebar (i.e., virtual) assistant, responsive to determining, by the sidebar assistant, to invoke the note function based on the one or more keywords (McQuiston, para 0014; the virtual assistant transcribing the audio feed using a speech-recognition algorithm; the virtual assistant providing the transcription to at least one of the plurality of attendees; the virtual assistant receiving an edited transcription; and the virtual assistant updating the speech recognition algorithm based on the edited transcription. McQuiston, para 0016, the virtual assistant analyzing the transcription or the edited transcription for action items; McQuiston, para 0017; the virtual assistant analyzing the transcription or the edited transcription for attendee sentiment; McQuiston, para 0018; the virtual assistant analyzing the transcription or the edited transcription and generating a summary of the electronic meeting; McQuiston, para 0049; the virtual assistant may call pre-defined functions in response to certain trigger words (i.e., keywords) spoken during the meeting).
generating, by the virtual (i.e., sidebar) assistant, the note based on the segment of the audio stream from the second set of audio and video streams transcribed (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant may free up participants to participate without the distraction of note-taking; McQuiston, para 0035, audio and/or video from a meeting (received as a stream); McQuiston, para 0064, keyword matching may be used to identify).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this would enable an attendee, or an absentee, to review the notes or transcript of the meeting at a later date, thus avoiding information loss (McQuiston, para 0067) which would also enable the virtual assistant to automatically edit the action items of the meeting and generate a summary of the electronic meeting (McQuiston, para 0017 and 0018).
Regarding Claim 14, Geppert in view of Mai and Basu discloses the method of all the limitations of claim 11, wherein the one or more operations comprises a task function, and the method further comprises:
Geppert in view of Mai and Basu do not specifically disclose generating, by the sidebar assistant, a task note, responsive to determining, by the sidebar assistant, to invoke the task function based on the one or more keywords.
However, McQuiston, in the same field of endeavor, teaches generating, by the sidebar assistant, a task note, responsive to determining, by the sidebar assistant, to invoke the task function based on the one or more keywords (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant may free up participants to participate without the distraction of note-taking; McQuiston, para 0035, audio and/or video from a meeting (received as a stream); McQuiston, para 0064, keyword matching may be used to identify; McQuiston, para 0049; the virtual assistant may call pre-defined functions in response to certain trigger words (i.e., keywords) spoken during the meeting).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable an attendee, or an absentee, to review the notes or transcript of the meeting at a later date, thus avoiding information loss (McQuiston, para 0067).
Regarding Claim 16, Geppert in view of Mai and Basu discloses the non-transitory computer-readable medium of claim 15, further comprising processor-executable instructions configured to cause the one or more processors to:
Mai further teaches:
display, on the first client device, a sidebar note, wherein the sidebar note comprises the notes (Mai, para 0052, At 904, a breakout session (or sidebar) is initiated by a sub-group of participants of the main meeting. A separate audio, and in some embodiments video and/or text, stream is provided to members of the breakout session; [“sidebar text” as “sidebar note”]).
Geppert in view of Mai and Basu does not specifically disclose displaying, in response to generating the note, a sidebar note, wherein the sidebar note comprises notes generated by the sidebar assistant.
However, McQuiston, in the same field of endeavor, teaches:
determine, based on the one or more keywords, an instruction to take notes during the sidebar meeting (McQuiston, paras 0067 – 0071, One of ordinary skill in the art may readily appreciate that the disclosed virtual assistant embodiments may advantageously reduce lost time and efficiency caused by conventional meetings and note-taking. … a virtual assistant may free up participants to participate without the distraction of note-taking. … The system of the embodiments or portions of the system of the embodiments may be in the form of a “processing machine,”… As noted above, the processing machine executes the instructions that are stored in the memory or memories to process data. This processing of data may be in response to commands by a user or users of the processing machine, in response to previous processing, in response to a request by another processing machine and/or any other input, for example; [i.e., note-taking is just one of the commands/instructions “in response to commands by a user or users of the processing machine” disclosed in this portion of the reference]);
generate a note based on the second set of audio and video streams (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant [i.e., a sidebar assistant] may free up participants to participate without the distraction of note-taking; McQuiston, para 0035, audio and/or video from a meeting (received as a stream); McQuiston, paras 0005-0007, An individual may take notes of the meeting to help remember and record, but that individual may not be able to participate or listen as effectively in the meeting as a result. Existing technology for meetings suffers from lost information and lost productivity of meeting participants. A need therefore exists for a means to automatically transcribe, record, and analyze discussions during a meeting. The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm, may provide the transcription to at least one of the plurality of attendees); and
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable an attendee, or an absentee, to review the notes or transcript of the meeting at a later date, thus avoiding information loss (McQuiston, para 0067).
Regarding Claim 17, Geppert in view of Mai and Basu discloses the non-transitory computer-readable medium of all the limitations of claim 15, wherein the one or more operations comprises a task function, and further comprising processor-executable instructions configured to cause the one or more processors to:
Geppert in view of Mai and Basu does not specifically disclose generate, by the sidebar assistant, a task note, responsive to determining, by the sidebar assistant, to invoke the task function based on the one or more keywords.
However, McQuiston, in the same field of endeavor, teaches generate, by the sidebar assistant, a task note, responsive to determining, by the sidebar assistant, to invoke the task function based on the one or more keywords (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant (i.e., a sidebar assistant) may free up participants to participate without the distraction of note-taking; McQuiston, para 0035, audio and/or video from a meeting (received as a stream); McQuiston, para 0064, keyword matching may be used to identify; McQuiston, para 0049, In one embodiment, the virtual assistant may call pre-defined functions in response to certain trigger words spoken during the meeting. For example, if a user says the words “take action item,” the virtual assistant may initiate a function to record or designate an action item. The virtual assistant application may also be connected directly to the meeting software, such that it may adjust settings and take action in the meeting software itself. For example, if a user says “share screen,” the virtual assistant application may cause the meeting software to share that user's screen with the rest of the attendees; [“trigger words” as “keywords”; “initiate a function” as “invoke the task function”; “initiate/call” as “invoke”; also “application” as “invoking/invocation”; “call pre-defined functions in response to certain trigger words spoken” as “invoke the task function based on the one or more keywords”]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable the virtual assistant to provide the transcription to at least one of the plurality of attendees, to receive an edited transcription, and also to update the speech recognition algorithm based on the edited transcription (McQuiston, para 0007).
Regarding Claim 19, Geppert in view of Mai and Basu discloses the non-transitory computer-readable medium of all the limitations of claim 15, further comprising processor-executable instructions configured to cause the one or more processors to:
Mai further discloses:
determine based on the one or more keywords an instruction during the sidebar meeting (Mai, para 0016, keywords may include the names of the meeting participants attending the breakout session and/or groups affiliated with a meeting participant (e.g., the participant's department such as marketing). For example, the system can be configured to alert members of a sub-group if the host of the main meeting states "alright everyone, let's re-convene." In an example embodiment, speech recognition technology is employed to detect when keywords or phrases are spoken in the main meeting. A signal can be communicated to members of the sub-group (for example, the video of a sidebar session can flicker or flash red in the background when key words in the main session are detected)); and
the second set of audio and video streams (Mai, Figure 1, para 0022, output 118 can be an audio stream, a video stream, or an audiovisual stream; [i.e., audio and video streams from the breakout (sidebar) session; Element 118 in Figure 1]).
Geppert in view of Mai and Basu does not specifically disclose being responsive to generating, by the sidebar assistant, the note based on the one or more keywords, record, by the sidebar assistant, a segment of the first set of audio and video streams corresponding to the main meeting, wherein the segment of the first set of audio and video streams recorded corresponds to a time that the one or more keywords were identified by the sidebar assistant.
However, McQuiston, in the same field of endeavor, teaches
based on the one or more keywords an instruction to take notes (McQuiston, paras 0067-0071, One of ordinary skill in the art may readily appreciate that the disclosed virtual assistant embodiments may advantageously reduce lost time and efficiency caused by conventional meetings and note-taking. … a virtual assistant may free up participants to participate without the distraction of note-taking. … This processing of data may be in response to commands by a user or users of the processing machine, in response to previous processing, in response to a request by another processing machine and/or any other input, for example);
generate a note (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant [i.e., a sidebar assistant] may free up participants to participate without the distraction of note-taking; McQuiston, para 0035, audio and/or video from a meeting (received as a stream); McQuiston, para 0007, The virtual assistant may receive at least an audio feed and a video feed of the electronic meeting in real-time, may transcribe the audio feed using a speech-recognition algorithm, may provide the transcription to at least one of the plurality of attendees); and
being responsive to generating, by the sidebar assistant, the note based on the one or more keywords, record, by the sidebar assistant, a segment of the first set of audio and video streams corresponding to the main meeting (McQuiston, para 0067; by automatically taking notes for a meeting, a virtual assistant (i.e., a sidebar assistant) may free up participants to participate without the distraction of note-taking; McQuiston, para 0035; audio and/or video from a meeting (received as a stream); McQuiston, para 0064; keyword matching may be used to identify), wherein the segment of the first set of audio and video streams recorded corresponds to a time that the one or more keywords were identified by the sidebar assistant (McQuiston, para 0020; the virtual assistant identifying a trigger word in the audio feed, and executing a function in response to the trigger word (i.e., keyword)).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this will enable an attendee, or an absentee, to review the notes or transcript of the meeting at a later date, thus avoiding information loss (McQuiston, para 0067).
Regarding Claim 20, Geppert in view of Mai and Basu discloses the non-transitory computer-readable medium of all the limitations of claim 19, further comprising processor-executable instructions configured to cause the one or more processors to:
Geppert in view of Mai and Basu does not specifically disclose providing, by the sidebar assistant, a timestamp based on the note, wherein the timestamp corresponds to a time in the main meeting when the one or more keywords were identified.
However, McQuiston, in the same field of endeavor, teaches providing, by the sidebar assistant, a timestamp based on the note, wherein the timestamp corresponds to a time in the main meeting when the one or more keywords were identified (McQuiston, para 0062; the virtual (i.e., sidebar) assistant may analyze the transcript and/or audio to generate one or more of a summary of the meeting, time stamps for the transcript).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of McQuiston in the method of Geppert in view of Mai and Basu because this would enable the meeting organizer to specify the types of reminders, indicators, and/or announcements for the virtual assistant as the meeting progresses at various timestamps of the meeting (McQuiston, para 0051).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kumar (US 10540971 B2) discloses systems and methods for in-meeting group assistance using a virtual assistant. During a video conference, a virtual assistant may provide in-meeting assistance responsive to verbalized group intent. The virtual assistant may automatically perform actions, such as recording notes, creating calendar events, and obtaining information derived from previous meetings, in response to the detected group intent. The virtual assistant may also provide pre-meeting assistance.
Shen et al. (US 20190228380 A1) discloses systems and methods for logging and reviewing a meeting. The system may include a memory storing computer-executable instructions and a processor. The processor may be configured to execute the instructions to perform operations. The operations may include receiving audio of the meeting captured by at least one microphone device and determining an arriving angle of speech from at least one attendee of the meeting based on the captured audio. The operations may also include generating a data stream based at least in part on the arriving angle of speech, determining an identification of the at least one attendee, and matching the identification to the data stream.
Asthana et al. (US 7830408 B2) discloses a system and method for providing captioning in a conference. The method includes establishing a conference between a first participant and a second participant. A user option is provided to augment the conference with a second type of media corresponding to the first type of media. The second type of media is then generated based on one or more conference parameters in response to the signal.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MULUGETA T. DUGDA whose telephone number is (703)756-1106. The examiner can normally be reached Mon - Fri, 4:30am - 7:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras D. Shah can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MULUGETA TUJI DUGDA/Examiner, Art Unit 2653
/Paras D Shah/Supervisory Patent Examiner, Art Unit 2653
01/19/2026