DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s 3/10/2026 Amendment and remarks have been considered. Claims 1-20 are pending.
Claim Objections. The objections to claims 1 and 16-20 are withdrawn in view of Applicant’s amendments to such claims.
Response to Arguments
On page 11 of Applicant’s 3/10/2026 Amendment, Applicant asserts that paras. 0092-0099 provide sufficient written description support for the claim amendments.
The examiner agrees that the portions of the disclosure identified by Applicant provide sufficient written description support for the claim amendments.
On pages 11-13 of Applicant’s 3/10/2026 Amendment, with respect to the rejections under 35 U.S.C. 101, Applicant argues that as amended, the claims include new limitations that integrate the judicial exception into a practical application. In particular, Applicant argues:
[Image omitted: media_image1.png, excerpt of Applicant’s argument]
The examiner respectfully disagrees. Determining the frequency of how often a person participates in the meeting is a mental step that a human assistant, taking minutes, could perform, as explained in the detailed rejections below. No specific machine is needed, and no improvement to computer functionality or other technical field is provided by simply determining a frequency of how often a participant is participating.
On pages 13-15 of Applicant’s 3/10/2026 Amendment, with respect to the rejections of the independent claims under 35 U.S.C. 102, Applicant argues that as amended, the HILLELI reference does not teach the newly-added limitations.
The examiner agrees. The previous rejections of the independent claims under 35 U.S.C. 102 are hereby withdrawn. However, new grounds of rejection under 35 U.S.C. 103 in view of the HILLELI and LEE references are provided herein, where such new grounds of rejection are necessitated by Applicant’s amendments to the independent claims.
On pages 15-18 of Applicant’s 3/10/2026 Amendment, with respect to the rejection of claims 4, 11, and 18 under 35 U.S.C. 103, Applicant argues that as amended, the HILLELI and WALKER references do not teach the newly-added limitation that “the notification is displayed to each of the participants in the meeting”.
The examiner respectfully disagrees. Para. 0099 of WALKER specifically says that “each participant’s user notifications 331 are adjusted for time of day,” meaning that the notification is provided to “each participant.” Moreover, para. 0128 discloses the displaying of notifications for each participant. While the claim language requires that a notification be displayed to each participant, the claim language does not require the notification to be displayed “at the same time” or “before the meeting” as argued by Applicant. There is no temporal requirement in the claim language, and WALKER discloses that “each participant” receives such notifications.
On pages 18-19 of Applicant’s 3/10/2026 Amendment, with respect to the rejection of claims 6-7, 13-14, and 20 under 35 U.S.C. 103, Applicant argues that HILLELI does not teach “detect[ing] ... a time duration of a spoken sentence.”
The examiner respectfully disagrees. HILLELI at para. 0045 explicitly teaches that meeting transcripts are timestamped, so it is straightforward to determine a time duration of a single sentence from such timestamps, particularly for a response that is a single sentence.
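For illustration only, the arithmetic involved can be sketched as follows. This is a minimal sketch under stated assumptions, not HILLELI's implementation: the function name `sentence_duration_seconds` and the `HH:MM:SS` timestamp format are hypothetical, chosen solely to show that a duration follows from subtracting a start timestamp from an end timestamp.

```python
from datetime import datetime

# Hypothetical timestamp format; the reference's actual transcript
# format is not reproduced here.
TIMESTAMP_FORMAT = "%H:%M:%S"


def sentence_duration_seconds(start: str, end: str) -> float:
    """Return the elapsed seconds between two same-day timestamps."""
    started = datetime.strptime(start, TIMESTAMP_FORMAT)
    ended = datetime.strptime(end, TIMESTAMP_FORMAT)
    return (ended - started).total_seconds()


# A single-sentence response that starts at 10:15:02 and ends at 10:15:09.
duration = sentence_duration_seconds("10:15:02", "10:15:09")
```

The sketch assumes both timestamps fall on the same day and that the end is not earlier than the start.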
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Step 1 of the Alice/Mayo framework, Claims 1-7 are directed to a method (a process), Claims 8-14 are directed to a computing device (a machine), and Claims 15-20 are directed to a non-transitory computer readable storage medium (an article of manufacture), each of which falls within one of the four statutory categories of inventions.
Regarding Claim 1
Step 2A, prong 1 (Is the claim directed to a law of nature, a natural phenomenon or an abstract idea).
Claim 1 recites the following limitations relating to the judicial exception sub-grouping of “managing personal behavior or relationships or interactions between people” that includes social activities, teaching, and following rules or instruction. See MPEP 2106.04 II.C.
A method for providing a (under the broadest reasonable interpretation, this limitation can be interpreted, for example, as assigning a human assistant to appear in a meeting for the purpose of taking notes, which merely assigns a role to an individual during the social activity of a meeting, or as another example, appointing someone to be Secretary of the Board of Directors for a BoD meeting)
analyzing ... the set of data to provide the (under the broadest reasonable interpretation, this limitation can be interpreted, for example, as assigning a human assistant to appear in a meeting for the purpose of taking notes, which merely assigns a role to an individual during the social activity of a meeting, where such assignment is based on analyzing data and is a mental process for a meeting participant to determine who should be the assistant, akin to the “mental process that a neurologist should follow when testing a patient for nervous system malfunctions” as set forth in example (iii) in MPEP 2106.04 II.C)
assisting, ... the participants in the meeting based on the analysis of the set of data (under the broadest reasonable interpretation, this limitation can be interpreted, for example, as having the assigned human assistant providing assistance to meeting participants, such as by analyzing data and creating a set of meeting minutes, or answering questions about what had previously been discussed at the meeting and at what time, in each case which are social activities between meeting participants)
wherein the assisting of the participants in the meeting comprises: (under the broadest reasonable interpretation, this limitation can be interpreted, for example, as having the assigned human assistant providing assistance to meeting participants, such as by analyzing data and creating a set of meeting minutes, or answering questions about what had previously been discussed at the meeting and at what time, in each case which are social activities between meeting participants)
analyzing, ... the first audio data to detect a frequency of at least one of the participants, the frequency being numerically expressible; (under the broadest reasonable interpretation, this limitation can be interpreted, for example, as having the assigned human assistant providing assistance to meeting participants, such as by analyzing an audio set to determine how often a participant participates in the meeting, which can just be a numeric count divided by the total number of speech sessions)
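For illustration only, the count-and-divide computation referred to in the last interpretation above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the list-of-speaker-names input, and the example threshold of 0.5 are hypothetical and are not taken from the claims or the cited references.

```python
from collections import Counter


def participation_frequency(speech_sessions: list[str]) -> dict[str, float]:
    """Map each speaker to (number of sessions spoken) / (total sessions)."""
    total = len(speech_sessions)
    if total == 0:
        return {}
    counts = Counter(speech_sessions)
    return {speaker: n / total for speaker, n in counts.items()}


def speakers_above_threshold(
    frequencies: dict[str, float], threshold: float
) -> list[str]:
    """Return, in sorted order, speakers whose frequency exceeds the threshold."""
    return sorted(s for s, f in frequencies.items() if f > threshold)


# Hypothetical tally of who spoke in each of five speech sessions.
sessions = ["Ann", "Bob", "Ann", "Cy", "Ann"]
frequencies = participation_frequency(sessions)
flagged = speakers_above_threshold(frequencies, 0.5)
```

In this hypothetical tally, Ann spoke in three of five sessions, so her frequency is 3/5 and she is the only speaker above the assumed threshold.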
Step 2A, prong 2 (Does the claim recite additional elements that integrate the judicial exception into a practical application?).
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements (e.g., “virtual assistant”, “processor” and “trained model”) which are recited at a high-level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Regarding the “virtual assistant” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a virtual assistant. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a virtual assistant). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “receiving, at a processor, a set of data associated with the meeting” limitation, such additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
Regarding the “by the processor” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a processor. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “by the processor using a trained model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a processor and trained model. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor and trained model). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “receiving, at the processor, first audio data associated with the participants in the meeting” limitation, such additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
Regarding the “displaying, by the processor via a display, a notification associated with at least one event for which the frequency of the at least one of the participants is higher than a predefined numerical threshold level based on the analysis of the first audio data” limitation, such limitation amounts to extra solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).
Accordingly, at Step 2A, prong two, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not integrate the judicial exception into a practical application.
Step 2B (Does the claim recite additional elements that amount to significantly more than the judicial exception?)
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional elements (e.g., “virtual assistant”, “processor” and “trained model”) are recited at a high-level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Regarding the “virtual assistant” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “receiving, at a processor, a set of data associated with the meeting” limitation, as discussed above, the additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Regarding the “by the processor” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “by the processor using a trained model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “receiving, at the processor, first audio data associated with the participants in the meeting” limitation, as discussed above, the additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Regarding the “displaying, by the processor via a display, a notification associated with at least one event for which the frequency of the at least one of the participants is higher than a predefined numerical threshold level based on the analysis of the first audio data” limitation, this limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)). The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), "presenting offers and gathering statistics" and “determining an estimated outcome and setting a price”).
Accordingly, at Step 2B, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not amount to significantly more than the judicial exception.
Regarding Claim 2
Step 2A, Prong 2
Regarding the “wherein the set of data comprises meeting details, participant details, discussion details, audio details, and video details” limitation, this limitation merely describes the types of data being considered and processed, and therefore such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate a judicial exception into a practical application.
Step 2B
Regarding the “wherein the set of data comprises meeting details, participant details, discussion details, audio details, and video details” limitation, such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which does not amount to significantly more than the judicial exception. MPEP 2106.05(h).
Regarding Claim 3
Step 2A, Prong 1
transcribing, ... the received second audio data into raw textual data (under the broadest reasonable interpretation, this is a mental process (see MPEP 2106.04 III) that a human can perform mentally using a physical aid (e.g., pencil and paper), and is met by a human assistant listening to audio data and transcribing the words by writing them on a piece of paper)
processing, ... the raw textual data into processed textual data, wherein the textual data comprises a name of the speaker, corresponding text, a start time at which the speaker begins speaking, and an end time at which the speaker stops speaking data (under the broadest reasonable interpretation, this is a mental process (see MPEP 2106.04 III) that a human can perform mentally using a physical aid (e.g., pencil and paper), and is met by a human assistant listening to audio data and transcribing the words by writing them on a piece of paper, and then processing the written information to specifically identify the speaker of certain text, the corresponding text, and the start/stop time for the certain text)
Step 2A, Prong 2
Regarding the “receiving, at the processor, second audio data associated with a speaker that is participating in the meeting” limitation, such additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
Regarding the “by the processor” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a processor. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, the processed textual data based on a requirement of the participants” limitation, such limitation amounts to extra solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).
Step 2B
Regarding the “receiving, at the processor, second audio data associated with a speaker that is participating in the meeting” limitation, as discussed above, the additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Regarding the “by the processor” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, the processed textual data based on a requirement of the participants” limitation, this limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)). The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), "presenting offers and gathering statistics" and “determining an estimated outcome and setting a price”).
Regarding Claim 4
Step 2A, Prong 1
analyzing, ... the geographic details and the time zone details of the participants to determine a variation among the geographic details and the time zone details of the participants in the meeting (under the broadest reasonable interpretation, this is a mental process (see MPEP 2106.04 III) that a human can perform mentally using a physical aid (e.g., pencil and paper), and is met by a human assistant taking attendance and mentally determining that participants are in different geographic locations and time zones, e.g., to determine the largest difference between time zones)
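For illustration only, the "largest difference between time zones" determination mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the representation of each participant's time zone as a UTC offset in hours are hypothetical, and the sketch deliberately ignores wrap-around at the international date line.

```python
def max_time_zone_spread(utc_offsets_hours: list[float]) -> float:
    """Return the largest difference, in hours, between any two UTC offsets."""
    if not utc_offsets_hours:
        return 0.0
    return max(utc_offsets_hours) - min(utc_offsets_hours)


# Hypothetical attendance sheet: participants at UTC-8, UTC-5, and UTC+5:30.
spread = max_time_zone_spread([-8.0, -5.0, 5.5])
```

For the hypothetical attendance sheet shown, the spread is the difference between UTC+5:30 and UTC-8, i.e., 13.5 hours.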
Step 2A, Prong 2
Regarding the “receiving, by the processor, geographic details and time zone details of the participants” limitation, such additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
Regarding the “by the processor” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a processor. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, a notification that relates to the variation among the geographic details and the time zone details of the participants in the meeting” limitation, such limitation amounts to extra solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).
Regarding the “wherein the notification is displayed to each of the participants in the meeting” limitation, such limitation amounts to extra solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).
Step 2B
Regarding the “receiving, by the processor, geographic details and time zone details of the participants” limitation, as discussed above, the additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Regarding the “by the processor” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, a notification that relates to the variation among the geographic details and the time zone details of the participants in the meeting” limitation, this limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)). The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), "presenting offers and gathering statistics" and “determining an estimated outcome and setting a price”).
Regarding the “wherein the notification is displayed to each of the participants in the meeting” limitation, this limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)). The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), "presenting offers and gathering statistics" and “determining an estimated outcome and setting a price”).
Regarding Claim 5
Step 2A, Prong 1
analyzing, ... the third audio data in real-time to detect unconscious bias words used by the at least one speaker (under the broadest reasonable interpretation, this is a mental process (see MPEP 2106.04 III) that a human can perform mentally using a physical aid (e.g., pencil and paper), and is met by a human assistant listening to audio data in real time and noting detected unconscious bias words (e.g., using the pronoun “he” instead of “he or she or they”))
suggesting, ... a replacement of the unconscious bias words to the at least one speaker (under the broadest reasonable interpretation, this is a mental process (see MPEP 2106.04 III) that a human can perform mentally using a physical aid (e.g., pencil and paper), and is met by a human assistant listening to audio data in real time, noting detected unconscious bias words (e.g., using the pronoun “he”), and providing a less biased alternative (e.g., “he or she or they”))
Step 2A, Prong 2
Regarding the “receiving, at the processor, third audio data associated with the at least one speaker that is participating in the meeting” limitation, such additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
Regarding the “by the processor using the trained model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a processor using a trained model. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor using a trained model). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, a notification to the at least one speaker that relates to the unconscious bias words” limitation, such limitation amounts to extra solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).
Regarding the “by the processor” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a processor. This additional element is recited at a high-level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Step 2B
Regarding the “receiving, at the processor, third audio data associated with the at least one speaker that is participating in the meeting” limitation, as discussed above, the additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Regarding the “by the processor using the trained model” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, a notification to the at least one speaker that relates to the unconscious bias words” limitation, this limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)). The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), "presenting offers and gathering statistics" and “determining an estimated outcome and setting a price”).
Regarding the “by the processor” limitation, such limitation is recited at a high-level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding Claim 6
Step 2A, Prong 1
analyzing, ... the audio data to detect a pitch of one of the participants, text spoken by the one of the participants, and a time duration of a spoken sentence (under the broadest reasonable interpretation, this is a mental process (see MPEP 2106.04 III) that a human can perform mentally using a physical aid (e.g., pencil and paper), and is met by a human assistant listening to audio data in real time to detect a pitch (e.g., either high or low) of the participants, the text actually spoken by the participants, and the time duration of a specific sentence)
Step 2A, Prong 2
Regarding the “by the processor using the trained model” limitation, such limitation is recited at a high level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a processor using a trained model. This additional element is recited at a high level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor using a trained model). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, a notification associated with at least one event” limitation, such limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).
Regarding the “for which the pitch of the one of the participants is higher than a predefined threshold level based on the analysis of the first audio data” limitation, this limitation merely describes a particular criterion for when a notification is triggered, and therefore such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate a judicial exception into a practical application.
Step 2B
Regarding the “by the processor using the trained model” limitation, such limitation is recited at a high level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, a notification associated with at least one event” limitation, this limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)). The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “presenting offers and gathering statistics” and “determining an estimated outcome and setting a price”).
Regarding the “for which the pitch of the one of the participants is higher than a predefined threshold level based on the analysis of the first audio data” limitation, such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which does not amount to significantly more than the judicial exception. MPEP 2106.05(h).
Regarding Claim 7
Step 2A, Prong 1
analyzing, ... the video data to detect a gesture and an emotion of one of the participants, text spoken by the one of the participants, and a time duration of a spoken sentence (under the broadest reasonable interpretation, this is a mental process (see MPEP 2106.04 III) that a human can perform mentally using a physical aid (e.g., pencil and paper), and is met by a human assistant watching the video data to detect gestures and emotions by the participants, the text actually spoken by the participants (using lip reading, for example), and the time duration of a specific sentence (also using lip reading))
Step 2A, Prong 2
Regarding the “receiving, at the processor, video data associated with the participants in the meeting” limitation, such additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process (see MPEP 2106.05(g)).
Regarding the “by the processor using the trained model” limitation, such limitation is recited at a high level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception. In particular, the claim only recites the additional element of a processor using a trained model. This additional element is recited at a high level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component (a processor using a trained model). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, a notification associated with at least one event” limitation, such limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)).
Regarding the “for which the emotion of the one of the participants is identified as being an unfavorable emotion” limitation, this limitation merely describes a particular criterion for when a notification is triggered, and therefore such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use. As explained by the Supreme Court, a claim directed to a judicial exception cannot be made eligible "simply by having the applicant acquiesce to limiting the reach of the patent for the formula to a particular technological use." Diamond v. Diehr, 450 U.S. 175, 192 n.14, 209 USPQ 1, 10 n. 14 (1981). Thus, limitations that amount to merely indicating a field of use or technological environment in which to apply a judicial exception do not integrate a judicial exception into a practical application.
Step 2B
Regarding the “receiving, at the processor, video data associated with the participants in the meeting” limitation, as discussed above, the additional element of a data gathering step is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e. pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory").
Regarding the “by the processor using the trained model” limitation, such limitation is recited at a high level of generality and amounts to no more than adding the words “apply it” (or an equivalent) with the judicial exception, because the limitation merely provides instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. Accordingly, this additional element does not add significantly more than the judicial exception. (See MPEP 2106.05(f)).
Regarding the “displaying, by the processor via a display, a notification associated with at least one event” limitation, this limitation amounts to extra-solution activity because it is a mere nominal or tangential addition to the claim, amounting to mere data output (see MPEP 2106.05(g)). The courts have similarly found limitations directed to displaying a result, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “presenting offers and gathering statistics” and “determining an estimated outcome and setting a price”).
Regarding the “for which the emotion of the one of the participants is identified as being an unfavorable emotion” limitation, such limitation amounts to no more than generally linking the use of a judicial exception to a particular technological environment or field of use as explained above, which does not amount to significantly more than the judicial exception. MPEP 2106.05(h).
Regarding Claim 8
Step 2A, Prong 1
Claim 8 recites a computing device that corresponds to the method of claim 1, and therefore the analysis under Step 2A, Prong 1 with respect to claim 1 also applies to this claim 8. While claim 8 recites additional generic computing components (“virtual assistant”, “processor”, “memory”, and “communication interface”), such additional generic computing components do not change the analysis under Step 2A, Prong 1.
Step 2A, Prong 2
Claim 8 recites a computing device that corresponds to the method of claim 1, and therefore the analysis under Step 2A, Prong 2 with respect to claim 1 also applies to this claim 8. While claim 8 recites additional generic computing components (“virtual assistant”, “processor”, “memory”, and “communication interface”), such additional generic computing components do not change the analysis under Step 2A, Prong 2.
Step 2B
Claim 8 recites a computing device that corresponds to the method of claim 1, and therefore the analysis under Step 2B with respect to claim 1 also applies to this claim 8. While claim 8 recites additional generic computing components (“virtual assistant”, “processor”, “memory”, and “communication interface”), such additional generic computing components do not change the analysis under Step 2B.
Claims 9-14 depend from claim 8 and correspond to the methods recited in claims 2-7, respectively, and are therefore rejected for the same reasons explained above with respect to claim 8 and claims 2-7, respectively.
Regarding Claim 15
Step 2A, Prong 1
Claim 15 recites a non-transitory computer readable storage medium that corresponds to the method of claim 1, and therefore the analysis under Step 2A, Prong 1 with respect to claim 1 also applies to this claim 15. While claim 15 recites additional generic computing components (“virtual assistant”, “processor”, and “non-transitory computer readable storage medium”), such additional generic computing components do not change the analysis under Step 2A, Prong 1.
Step 2A, Prong 2
Claim 15 recites a non-transitory computer readable storage medium that corresponds to the method of claim 1, and therefore the analysis under Step 2A, Prong 2 with respect to claim 1 also applies to this claim 15. While claim 15 recites additional generic computing components (“virtual assistant”, “processor”, and “non-transitory computer readable storage medium”), such additional generic computing components do not change the analysis under Step 2A, Prong 2.
Step 2B
Claim 15 recites a non-transitory computer readable storage medium that corresponds to the method of claim 1, and therefore the analysis under Step 2B with respect to claim 1 also applies to this claim 15. While claim 15 recites additional generic computing components (“virtual assistant”, “processor”, and “non-transitory computer readable storage medium”), such additional generic computing components do not change the analysis under Step 2B.
Claims 16-19 depend from claim 15 and correspond to the methods recited in claims 2-5, respectively, and are therefore rejected for the same reasons explained above with respect to claim 15 and claims 2-5, respectively.
Claim 20 depends from claim 15 and corresponds to the methods recited in claims 6-7 and is therefore rejected for the same reasons explained above with respect to claim 15 and claims 6-7.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 8-9, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over US 20210097502 A1, hereinafter referenced as HILLELI, in view of US 20180241882 A1, hereinafter referenced as LEE.
Regarding Claim 1
HILLELI teaches:
A method for providing a virtual assistant to participants in a meeting, the method comprising: (HILLELI, para. 0019: “Turning now to scene 850 of FIG. 8, aspects of the example embodiment are depicted. In particular, this example embodiment is implemented using a virtual assistant, such as the Cortana® assistant by Microsoft® Corporation, operating in connection with a meeting or communications application, such as Microsoft Teams®. As shown in scene 850, computer display 820 depicts a representation of a virtual assistant 860, such as the Cortana virtual assistant. The virtual assistant 860 has automatically determined personalized action item for each meeting attendee, based on the meeting discussion and related contextual information, as described herein. Virtual assistant 860 then states, at statement 865, “I have sent each meeting attendee their personal action items.” As described herein, each meeting participant (or any individual who is responsible for an action item) may be provided the action item(s) from the meeting for which they are responsible or action items that are relevant to them.”;
Examiner’s Note: As depicted in Fig. 8, HILLELI discloses a virtual assistant (such as Cortana) being provided to meeting attendees (corresponding to recited “participants in a meeting”) where the virtual assistant provides action items to relevant meeting participants)
receiving, at a processor, a set of data associated with the meeting; (HILLELI, para. 0043: “Continuing with FIG. 2, example system 200 includes a meeting monitor 250. Meeting monitor 250 is generally responsible for determining and/or detecting meeting features from online meetings and/or in-person meetings and making the meeting features available to the other components of the system 200. For example, such monitored activity can be meeting location (e.g., as determined by geo-location of user devices), topic of the meeting, invitees of the meeting, whether the meeting is recurring, related deadlines, projects, and the like.”;
HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like. In some embodiments, the meeting activity monitor 252 outputs transcripts or activity that happens during a meeting. For example, activity or content may be timestamped or otherwise correlated with meeting transcripts.”;
HILLELI, para. 0114: “Turning now to FIG. 5, which depicts a process 500 for determining and providing personalized action items from a meeting, in accordance within embodiments of this disclosure. Process 500 (and/or any of the functionality described herein may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions run on a processor to perform hardware simulation), firmware, or a combination thereof.”;
HILLELI, para. 0115: “Per block 510, meeting content is determined (e.g., by the meeting monitor 250). For example, the meeting activity monitor 252 can use one or more sensors or other components to monitor chats, presented context, or portions of a transcript. The contextual information extractor/determiner 254 can determine the contextual information of an event, such as who is present or invited to a meeting, the topic of the meeting, location of the meeting, or other context or character sequences within a transcript or meeting content itself. Then the meeting content assembler 256 can generate an enriched meeting-activity timeline, such as tags or structured data that includes a timeline of each conversation and a timestamp indicating when the conversation started/stopped. ... In some embodiments, block 510 comprises monitoring a meeting to determine a set of meeting content. In particular, in one embodiment, a meeting monitor may receive indications of each user input in a chat or other online forum, which is indicative of a live meeting.”
HILLELI, para. 0139: “With reference to FIG. 7, computing device 700 includes ... memory 12, one or more processors 14, ....”;
Examiner’s Note: HILLELI discloses a process 500, carried out using a processor, where the process receives (using sensors) meeting content, such as chats and presented content, where such meeting content corresponds to the recited “set of data associated with the meeting”)
analyzing, by the processor using a trained model, the set of data to provide the virtual assistant to the participants in the meeting; (HILLELI, para. 0027: “Various embodiments improve these virtual assistants because they can parse a meeting transcript or audio input (e.g., in near real-time) to determine what input is an action item.”;
HILLELI, para. 0052: “The action item generator 260 identifies likely action items from event content. In some embodiments, the input includes the output of the meeting monitor 250 (e.g., user data and meeting-related data from sensors (microphones, video, user activity, and the like)), the user-data collection component 210, and from user profile 240 of users. In some embodiments, the output is a list of likely action items and related corresponding information, such as relevant files, who the action item is attributed to or who has to complete the action item, the date, and the like. In some embodiments, the output of the action item generator 260 is a structured data record (e.g., a database record) that includes various attributes, such as action item name, attribution (who has to complete the action item), action item category/type, related files, and/or content to be provided to remind the user to complete an action item.”
HILLELI, para. 0057: “In some embodiments, the action item candidate classifier 264 uses a machine learning model, such as a deep learning classification neural network (e.g., a Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), or Transformers). In certain embodiments, labels, categories, or target classifications can first be identified, such as “action item” or “not an action item.” These labels, categories, or target classifications may either be hard (e.g., membership of a class is a binary “yes” or “no”) or soft (e.g., there is a probability or likelihood attached to the labels). Alternatively or additionally, transfer learning may occur. Transfer learning is the concept of re-utilizing a pre-trained model for a new related problem. For example, confidence levels obtained to detect action items can be used to detect non-action items.”;
HILLELI, para. 0117: “Per block 530, candidate action items determined at block 520 are classified. For example, a CNN or other machine learning model may be used to classify whether the action item candidates are action items or are not action items, which may both be labels, for example, in a classification model. In this way action item candidates can be ruled out or actually become action items. The classification can be based on the meeting content or other contextual information in some embodiments, such as when a model learns contextual information that a user always indicates that a particular phrase is an action item. Accordingly, when the phrase is stated, it can be classified with high probability that it is an action item. Embodiments of step 530 may be performed as described in connection with action item candidate classifier 264 (FIG. 2).”;
Examiner’s Note: A trained machine learning model, such as a CNN or RNN, is used on the meeting data in order for the virtual assistant (Cortana) to provide action items to individual meeting participants, where the use of the virtual assistant to provide action items (or other assistance) is the provision of the virtual assistant to the meeting participants)
assisting, by the processor, the participants in the meeting based on the analysis of the set of data. (HILLELI, para. 0027: “Various embodiments improve these virtual assistants because they can parse a meeting transcript or audio input (e.g., in near real-time) to determine what input is an action item.”;
HILLELI, para. 0120: “ Embodiments of block 560 provide the action items assembled in step 550. Embodiments of block 560 may provide the action items intended to be performed by a specific person to that person. The action items may also include contextual information such as due dates, and/or other related context such as background information, explanatory information, supervisory or responsibility information about who or whom the action item is to be prepared for or who is supervising the performance of the action item, which may be determined or extracted during the meeting (or from previous related meetings). In this way, the action items may be personalized to that specific person.”;
HILLELI, para. 0121: “Some embodiments of block 560 may provide a personalized set of action items to the users responsible for completing and/or stating those action items via a communication message, such as email or within an application, such as a communications application, calendar application, task list/to-do application, or an online meeting application. Embodiments of block 560 may be performed as described in connection with action item assembler 266 (FIG. 2). Additional details of step 540 are described in connection to action item generator 260 in FIG. 2.”;
Examiner’s Note: HILLELI discloses providing individualized action items to each participant, corresponding to the recited “assisting ... the participants in the meeting based on the analysis of the set of data” limitation).
wherein the assisting of the participants in the meeting comprises: (HILLELI, para. 0027: “Various embodiments improve these virtual assistants because they can parse a meeting transcript or audio input (e.g., in near real-time) to determine what input is an action item.”;
HILLELI, para. 0120: “ Embodiments of block 560 provide the action items assembled in step 550. Embodiments of block 560 may provide the action items intended to be performed by a specific person to that person. The action items may also include contextual information such as due dates, and/or other related context such as background information, explanatory information, supervisory or responsibility information about who or whom the action item is to be prepared for or who is supervising the performance of the action item, which may be determined or extracted during the meeting (or from previous related meetings). In this way, the action items may be personalized to that specific person.”;
HILLELI, para. 0121: “Some embodiments of block 560 may provide a personalized set of action items to the users responsible for completing and/or stating those action items via a communication message, such as email or within an application, such as a communications application, calendar application, task list/to-do application, or an online meeting application. Embodiments of block 560 may be performed as described in connection with action item assembler 266 (FIG. 2). Additional details of step 540 are described in connection to action item generator 260 in FIG. 2.”;
Examiner’s Note: HILLELI discloses providing individualized action items to each participant, corresponding to the recited “assisting of the participants in the meeting” limitation).
receiving, at the processor, first audio data associated with the participants in the meeting; (HILLELI, para. 0027: “Various embodiments improve these virtual assistants because they can parse a meeting transcript or audio input (e.g., in near real-time) to determine what input is an action item.”
HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like. In some embodiments, the meeting activity monitor 252 outputs transcripts or activity that happens during a meeting. For example, activity or content may be timestamped or otherwise correlated with meeting transcripts.”;
HILLELI, para. 0072: “In some embodiments, the event content can alternatively or additionally include audio content of everything that was said during an event. In some embodiments, the natural language sequence normalizer 312 processes event content in near-real time (e.g., as each statement is stated during a meeting or shortly thereafter each statement is stated).”
Examiner’s Note: HILLELI discloses using a microphone to record meeting content, including statements said during a meeting by meeting participants)
However, HILLELI fails to explicitly teach:
analyzing, by the processor, the first audio data to detect a frequency of at least one of the participants, the frequency being numerically expressible; and
displaying, by the processor via a display, a notification associated with at least one event for which the frequency of the at least one of the participants is higher than a predefined numerical threshold level based on the analysis of the first audio data.
In a related field of endeavor (teleconferencing with video and audio data, see para. 0004), LEE teaches and makes obvious:
analyzing, by the processor, the first audio data to detect a frequency of at least one of the participants, the frequency being numerically expressible; and (LEE, para. 0085: “In some embodiments, a participant quality signal, and the corresponding metric, is the amount of time spoken by each speaker. This signal may be determined from the audio data of the teleconference by processing the audio data to determine the amount of time each user-participant speaks. In some embodiments, more talking by a user-participant indicates a more dominant user-participant.”
LEE, para. 0131 and Fig. 5: “For example, a summary user interface 500 may be displayed post-teleconference for review of the quality signals and metrics after the teleconference is over. The after-session feedback module 250 or 334 generates the elements (e.g., elements 502, 504, etc.) in the summary user interface 500 based on the signals and metrics for display in the summary user interface 500.”
[Figure: media_image2.png, reproduced from LEE, Fig. 5]
Examiner’s Note: LEE discloses tracking the amount of time that each speaker has been speaking (corresponding to the recited “frequency of at least one of the participants”). This is a measurement of frequency because it indicates how frequently a participant speaks over a period of time; Fig. 5, for example, shows one person accounting for 33% of the overall speaking time. The HILLELI-LEE combination modifies the meeting system of HILLELI to track the amount of time a speaker speaks and to report that metric on an interface, so that it is numerically expressible)
displaying, by the processor via a display, a notification associated with at least one event for which the frequency of the at least one of the participants is higher than a predefined numerical threshold level based on the analysis of the first audio data. (LEE, para. 0117: “In some embodiments, a prompt may be presented in one or more modalities, and the modalities used may depend on the situation and the context of the user-participant. For example, a prompt may be presented via audio rather than visually in the case that there is a disruption in the video stream or if presenting a visual prompt would interfere with the communication. Modalities may be visual (textual, iconic, graphical), audio (verbal, tones, sounds), vibro-tactile (buzz, pulses), or other modalities of actuators connected to the client device.”;
LEE, para. 0120: “For example, when a user-participant's speaking contribution exceeds a predefined threshold based on the number of other user-participants in the teleconference, a prompt that instructs the user to speak less and listen more may be presented to the user. A desired consequence of the change is that the user does not overly dominate the teleconference so that other users in the teleconference can contribute more.”;
Examiner’s Note: LEE discloses visually providing a prompt instructing a user to speak less when the user’s speaking contribution exceeds a predefined threshold; the HILLELI-LEE combination modifies the meeting system of HILLELI to track the amount of time a speaker speaks, and to prompt the user to speak less so as not to overly dominate the conversation)
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of HILLELI and LEE as explained above. As disclosed by LEE, one of ordinary skill would have been motivated to do so in order to prevent a particular participant from dominating the meeting “so that other users in the teleconference can contribute more.” (para. 0120).
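For illustration only, the threshold comparison described in LEE, paras. 0085 and 0120, can be sketched as follows. This is a minimal sketch and not code from either reference: the segment format, the function names speaking_shares and dominant_speakers, and the rule that the threshold equals a multiple of an equal share among the participants who spoke are all assumptions made for the example.

```python
from collections import defaultdict


def speaking_shares(segments):
    """Return each speaker's fraction of the total speaking time.

    segments: iterable of (speaker, start_seconds, end_seconds) tuples.
    This input format is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        # Ignore malformed segments whose end precedes their start.
        totals[speaker] += max(0.0, end - start)
    overall = sum(totals.values())
    if overall == 0:
        return {speaker: 0.0 for speaker in totals}
    return {speaker: total / overall for speaker, total in totals.items()}


def dominant_speakers(segments, factor=1.5):
    """List speakers whose share exceeds a participant-count-based threshold.

    Hypothetical rule: threshold = factor * (1 / number of speakers).
    Only participants who spoke at least once are counted here.
    """
    shares = speaking_shares(segments)
    if not shares:
        return []
    threshold = factor / len(shares)
    return sorted(s for s, share in shares.items() if share > threshold)


# Example: A speaks 120 s of 160 s total (75%); threshold is 1.5 / 3 = 50%.
segments = [("A", 0, 60), ("B", 60, 80), ("C", 80, 100), ("A", 100, 160)]
print(dominant_speakers(segments))  # prints ['A']
```

In this sketch, the list returned by dominant_speakers would be the trigger for the displayed prompt; how the prompt is rendered is outside the scope of the example.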
Regarding Claim 2
HILLELI and LEE disclose the method of claim 1 as explained above. HILLELI further teaches:
wherein the set of data comprises meeting details, participant details, discussion details, audio details, and video details. (HILLELI, para. 0043: “Continuing with FIG. 2, example system 200 includes a meeting monitor 250. Meeting monitor 250 is generally responsible for determining and/or detecting meeting features from online meetings and/or in-person meetings and making the meeting features available to the other components of the system 200. For example, such monitored activity can be meeting location (e.g., as determined by geo-location of user devices), topic of the meeting, invitees of the meeting, whether the meeting is recurring, related deadlines, projects, and the like.”;
HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like. In some embodiments, the meeting activity monitor 252 outputs transcripts or activity that happens during a meeting. For example, activity or content may be timestamped or otherwise correlated with meeting transcripts.”;
HILLELI, para. 0052: “The action item generator 260 identifies likely action items from event content. In some embodiments, the input includes the output of the meeting monitor 250 (e.g., user data and meeting-related data from sensors (microphones, video, user activity, and the like)), the user-data collection component 210, and from user profile 240 of users.”;
HILLELI, para. 0115: “Per block 510, meeting content is determined (e.g., by the meeting monitor 250). For example, the meeting activity monitor 252 can use one or more sensors or other components to monitor chats, presented context, or portions of a transcript. The contextual information extractor/determiner 254 can determine the contextual information of an event, such as who is present or invited to a meeting, the topic of the meeting, location of the meeting, or other context or character sequences within a transcript or meeting content itself. Then the meeting content assembler 256 can generate an enriched meeting-activity timeline, such as tags or structured data that includes a timeline of each conversation and a timestamp indicating when the conversation started/stopped. ... In some embodiments, block 510 comprises monitoring a meeting to determine a set of meeting content. In particular, in one embodiment, a meeting monitor may receive indications of each user input in a chat or other online forum, which is indicative of a live meeting.”
HILLELI, para. 0139: “With reference to FIG. 7, computing device 700 includes ... memory 12, one or more processors 14, ....”;
Examiner’s Note: HILLELI discloses that the meeting content collected by the meeting monitor 250/meeting activity monitor 252 includes the meeting location and topic (corresponding to recited “meeting details”), who is present and invited to the meeting (corresponding to recited “participant details”), projects and deadlines related to the meeting (corresponding to recited “discussion details”), and that such data is collected using microphones (to collect recited “audio details”) and video sensors (to collect recited “video details”))
Regarding Claim 8:
HILLELI teaches:
A computing device configured to implement an execution of a method for providing a virtual assistant to participants in a meeting, the computing device comprising: (HILLELI, para. 0019: “Turning now to scene 850 of FIG. 8, aspects of the example embodiment are depicted. In particular, this example embodiment is implemented using a virtual assistant, such as the Cortana® assistant by Microsoft® Corporation, operating in connection with a meeting or communications application, such as Microsoft Teams®. As shown in scene 850, computer display 820 depicts a representation of a virtual assistant 860, such as the Cortana virtual assistant. The virtual assistant 860 has automatically determined personalized action item for each meeting attendee, based on the meeting discussion and related contextual information, as described herein. Virtual assistant 860 then states, at statement 865, “I have sent each meeting attendee their personal action items.” As described herein, each meeting participant (or any individual who is responsible for an action item) may be provided the action item(s) from the meeting for which they are responsible or action items that are relevant to them.”;
Examiner’s Note: As depicted in Fig. 8, HILLELI discloses a virtual assistant (such as Cortana) being provided to meeting attendees (corresponding to recited “participants in a meeting”) where the virtual assistant provides action items to relevant meeting participants)
a processor; (Fig. 7, processor 14, see para. 0139)
a memory; and (Fig. 7, memory 12, see para. 0139)
a communication interface coupled to each of the processor and the memory (Fig. 7, I/O ports 18 and I/O components 20, see para. 0139)
The remaining limitations correspond to limitations in claim 1, and therefore claim 8 is rejected for the same reasons previously explained with respect to claim 1.
Claim 9 depends from claim 8 and claims a computing device that corresponds to the method of claim 2, and is therefore rejected for the same reasons explained with respect to claims 2 and 8.
Regarding Claim 15:
HILLELI teaches:
A non-transitory computer readable storage medium (Fig. 7, memory 12, see para. 0139) storing instructions for providing a virtual assistant to participants in a meeting, the instructions comprising executable code which, when executed by a processor, (Fig. 7, processor 14, see para. 0139) causes the processor to: (HILLELI, para. 0019: “Turning now to scene 850 of FIG. 8, aspects of the example embodiment are depicted. In particular, this example embodiment is implemented using a virtual assistant, such as the Cortana® assistant by Microsoft® Corporation, operating in connection with a meeting or communications application, such as Microsoft Teams®. As shown in scene 850, computer display 820 depicts a representation of a virtual assistant 860, such as the Cortana virtual assistant. The virtual assistant 860 has automatically determined personalized action item for each meeting attendee, based on the meeting discussion and related contextual information, as described herein. Virtual assistant 860 then states, at statement 865, “I have sent each meeting attendee their personal action items.” As described herein, each meeting participant (or any individual who is responsible for an action item) may be provided the action item(s) from the meeting for which they are responsible or action items that are relevant to them.”;
Examiner’s Note: As depicted in Fig. 8, HILLELI discloses a virtual assistant (such as Cortana) being provided to meeting attendees (corresponding to recited “participants in a meeting”) where the virtual assistant provides action items to relevant meeting participants)
The remaining limitations correspond to limitations in claim 1, and therefore claim 15 is rejected for the same reasons previously explained with respect to claim 1.
Claim 16 depends from claim 15 and claims a non-transitory computer readable storage medium that corresponds to the method of claim 2, and is therefore rejected for the same reasons explained with respect to claims 2 and 15.
Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over HILLELI in view of LEE and further in view of US 20230005495 A1, hereinafter referenced as KUKDE.
Regarding Claim 3:
HILLELI and LEE disclose the method of claim 1 as explained above. HILLELI further teaches:
receiving, at the processor, second audio data associated with a speaker that is participating in the meeting; (HILLELI, para. 0060: “The action item attributor 266 can map content character sequences to the identity of the speaker or person responsible for completing the action item in any suitable manner. For example, in some embodiments, a voice-recognition component can be used on audio content input to map phonemes of the input to a library of known or predetermined phonemes of particular users (e.g., as found within the participant behavior history 346). Accordingly, a voice-recognition component can record each user's voice in the user profile 240 (e.g., each user that can potentially attend a meeting). In this way, a prediction can be made that a particular parsed character sequence was said by a particular user.”;
HILLELI, para. 0072: “The natural language sequence normalizer 312 parses or tokenizes event content and/or other external information (e.g., information received by the user-data collection component 310) and re-structures the information. In some embodiments, the event content is or includes documents or transcripts of the order and content of everything that was said in an event written in natural language. For example, the event content can be a written transcript of everything that was said during an entire duration of a meeting. In some embodiments, the event content can alternatively or additionally include audio content of everything that was said during an event.”;
Examiner’s Note: HILLELI discloses that audio content from all meeting participants is collected, and further that a voice-recognition component can associate the audio with particular meeting participants)
transcribing, by the processor, the received second audio data into raw textual data; (HILLELI, para. 0124: “In some embodiments, an entire transcript or copy of the event can be detected when the event is completed. For example, a device may record an entire meeting event and an administrator can upload the meeting event on a computing device, which causes a natural language text to be outputted (e.g., via speech-to-text), at which point the transcript can be detected.”;
HILLELI, para. 0125: “ In some embodiments, block 604 includes tokenizing, via natural language processing, a transcript of the meeting event to clean or otherwise provide insight for prediction action items (e.g., by the natural language sequence normalizer 312).”;
Examiner’s Note: HILLELI discloses transcribing audio (using speech-to-text) to output a transcript that can be cleaned (corresponding to recited “raw textual data”))
processing, by the processor, the raw textual data into processed textual data, wherein the processed textual data comprises a name of the speaker, corresponding text, a start time at which the speaker begins speaking, and an end time at which the speaker stops speaking; and (HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like. In some embodiments, the meeting activity monitor 252 outputs transcripts or activity that happens during a meeting. For example, activity or content may be timestamped or otherwise correlated with meeting transcripts.”;
HILLELI, para. 0115: “Per block 510, meeting content is determined (e.g., by the meeting monitor 250). For example, the meeting activity monitor 252 can use one or more sensors or other components to monitor chats, presented context, or portions of a transcript. The contextual information extractor/determiner 254 can determine the contextual information of an event, such as who is present or invited to a meeting, the topic of the meeting, location of the meeting, or other context or character sequences within a transcript or meeting content itself. Then the meeting content assembler 256 can generate an enriched meeting-activity timeline, such as tags or structured data that includes a timeline of each conversation and a timestamp indicating when the conversation started/stopped. In certain embodiments of block 510, content is determined from a meeting, which may be determined by monitoring the meeting receive information about the meeting, such as transcript information, or other information about the meeting such as the attendees, meeting topic, and/or related contextual information.”;
HILLELI, para. 0128: “In some embodiments, block 604 includes processing character sequences of the transcript through a word embedding vector model or semantic model. ... In this way, language is cleaned or otherwise added, removed, or replaced based on semantic context. For example, using the example transcript, “Set a goal for [telephone rings, pause] set a goal for X amount in profit next quarter” can be modified as “PERSON will try to sell X amount in profit next quarter.” This restructuring can be completed in various embodiments for all text (or only the identified action item candidates) of the example transcript described above to clean the text.”;
Examiner’s Note: HILLELI discloses cleaning the transcript (where the cleaned transcript corresponds to the recited “processed textual data”) and where such transcript includes participant names (corresponding to recited “name of the speaker” and “corresponding text”) and that the transcript is timestamped (corresponding to recited “a start time at which the speaker begins speaking, and an end time at which the speaker stops speaking”))
However, HILLELI and LEE fail to explicitly teach:
displaying, by the processor via a display, the processed textual data based on a requirement of the participants.
However, in a related field of endeavor (virtual meetings, see para. 0001), KUKDE teaches:
displaying, by the processor via a display, the processed textual data based on a requirement of the participants. (KUKDE, para. 0073: “ In some embodiments, upon receiving a request sent from client device 112A, 112B, the transcripts or other speaker information is retrieved from database 136 and sent to the client device 112A, 112B for subsequent display.”;
Examiner’s Note: the HILLELI-LEE-KUKDE combination now modifies the meeting system of HILLELI so that if a participant, using their own device, requests to view the transcript as in KUKDE (corresponding to recited “based on a requirement of the participants”), the cleaned-up transcript of HILLELI is displayed on the device of a user having a display as in KUKDE)
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the virtual meeting system of HILLELI, with the teachings of LEE and KUKDE as explained above. As disclosed by KUKDE, one of ordinary skill would have been motivated to do so because KUKDE’s system provides “solutions... for generating and displaying information that users otherwise would not have had.” (para. 0036). One of ordinary skill would further be motivated to do so, for example, in order to provide a deaf meeting participant with a meeting transcript so that they can visually follow the meeting.
Claim 10 depends from claim 8 and claims a computing device that corresponds to the method of claim 3, and is therefore rejected for the same reasons explained with respect to claims 3 and 8.
Claim 17 depends from claim 15 and claims a non-transitory computer readable storage medium that corresponds to the method of claim 3, and is therefore rejected for the same reasons explained with respect to claims 3 and 15.
Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over HILLELI in view of LEE and further in view of US 20160350722 A1, hereinafter referenced as WALKER.
Regarding Claim 4:
HILLELI and LEE disclose the method of claim 1 as explained above. HILLELI further teaches:
receiving, by the processor, geographic details ... of the participants; (HILLELI, para. 0084: “Event location 374 corresponds to the geographical location or type of event. For example, event location 374 can indicate the physical address of the meeting or building/room identifier of the meeting location. The event location 374 can alternatively or additionally indicate that the meeting is a virtual or online meeting or in-person meeting.”)
However, HILLELI and LEE fail to explicitly teach:
receiving, ... time zone details of the participants;
analyzing, by the processor, the geographic details and the time zone details of the participants to determine a variation among the geographic details and the time zone details of the participants in the meeting; and
displaying, by the processor via a display, a notification that relates to the variation among the geographic details and the time zone details of the participants in the meeting.
where the notification is displayed to each of the participants in the meeting
However, in a related field of endeavor (event scheduling for meeting management, see paras. 0001, 0325), WALKER discloses:
receiving, ... time zone details of the participants; (WALKER, para. 0099: “For academic schedules, the system is able to track the location of the users fixed and mobile devices 170. Based upon the location and time zone attributes of an event 210 along with other system and publicly available or licensed travel time data, the rule engine 120 and recommendation engine 182 can incorporate travel time and adjust user notifications to dynamically process user notifications 331, to ensure that all participants are able to attend on time at an agreed location or at the same time. In a similar manner, a phone conference can be scheduled for participants in different time zones, ensuring that each participant's user notifications 331 are adjusted for time of day across the varying time zones.”;
Examiner’s Note: WALKER teaches that for a virtual meeting (a telephone conference), each individual participant’s time zone information is accounted for; the HILLELI-LEE-WALKER combination now utilizes the time zone information of WALKER when determining scheduling of the virtual meetings of HILLELI)
analyzing, by the processor, the geographic details and the time zone details of the participants to determine a variation among the geographic details and the time zone details of the participants in the meeting; and (WALKER, para. 0099: “For academic schedules, the system is able to track the location of the users fixed and mobile devices 170. Based upon the location and time zone attributes of an event 210 along with other system and publicly available or licensed travel time data, the rule engine 120 and recommendation engine 182 can incorporate travel time and adjust user notifications to dynamically process user notifications 331, to ensure that all participants are able to attend on time at an agreed location or at the same time. In a similar manner, a phone conference can be scheduled for participants in different time zones, ensuring that each participant's user notifications 331 are adjusted for time of day across the varying time zones.”;
Examiner’s Note: WALKER teaches that for a virtual meeting (a telephone conference), each individual participant’s time zone information and location information are accounted for; the HILLELI-LEE-WALKER combination now utilizes the location and time zone information of WALKER when determining scheduling of the virtual meetings of HILLELI, and differences in time zones are determined and reflected in the different user notifications, which are adjusted based on time zone differences)
displaying, by the processor via a display, a notification that relates to the variation among the geographic details and the time zone details of the participants in the meeting. (WALKER, para. 0042: “Optionally, the method 600 may comprise modifying the timing of event notifications to cater for travel time based on a current location of a user, and time zone and location of events.”;
WALKER, para. 0054: “For mobile devices, iCalendar notifications 414 are also provided with links to create user session and provide user responses 412 via standard electronic calendar interfaces.”;
WALKER, para. 0099: “For academic schedules, the system is able to track the location of the users fixed and mobile devices 170. Based upon the location and time zone attributes of an event 210 along with other system and publicly available or licensed travel time data, the rule engine 120 and recommendation engine 182 can incorporate travel time and adjust user notifications to dynamically process user notifications 331, to ensure that all participants are able to attend on time at an agreed location or at the same time. In a similar manner, a phone conference can be scheduled for participants in different time zones, ensuring that each participant's user notifications 331 are adjusted for time of day across the varying time zones.”;
Examiner’s Note: WALKER teaches that for a virtual meeting (a telephone conference), each individual participant’s time zone information and location information are accounted for; the HILLELI-LEE-WALKER combination now utilizes the location and time zone information of WALKER when determining scheduling of the virtual meetings of HILLELI, and differences in time zones are determined and displayed in the different user notifications, which are adjusted based on time zone differences)
where the notification is displayed to each of the participants in the meeting (WALKER, para. 0099: “For academic schedules, the system is able to track the location of the users fixed and mobile devices 170. Based upon the location and time zone attributes of an event 210 along with other system and publicly available or licensed travel time data, the rule engine 120 and recommendation engine 182 can incorporate travel time and adjust user notifications to dynamically process user notifications 331, to ensure that all participants are able to attend on time at an agreed location or at the same time. In a similar manner, a phone conference can be scheduled for participants in different time zones, ensuring that each participant's user notifications 331 are adjusted for time of day across the varying time zones.”;
WALKER, para. 0128: “ Event relationships are illustrated by displaying configuration fields that define start time, end time, records, and rules that define interactions between events 305, 315, 325 and the types of alert and smart-alert notifications 513.”;
Examiner’s Note: WALKER teaches that for a virtual meeting (a telephone conference), notifications are displayed for “each participant”; the HILLELI-LEE-WALKER combination now utilizes the location and time zone information of WALKER when determining scheduling of the virtual meetings of HILLELI, and differences in time zones are determined and displayed in the different user notifications, which are adjusted based on time zone differences, where such notifications are displayed for “each participant”)
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the virtual meeting system of HILLELI, with the teachings of LEE and WALKER as explained above. As disclosed by WALKER, one of ordinary skill would have been motivated to do so in order to adjust user notifications to “ensure that all participants are able to attend on time ... or at the same time.” (para. 0099).
Claim 11 depends from claim 8 and claims a computing device that corresponds to the method of claim 4, and is therefore rejected for the same reasons explained with respect to claims 4 and 8.
Claim 18 depends from claim 15 and claims a non-transitory computer readable storage medium that corresponds to the method of claim 4, and is therefore rejected for the same reasons explained with respect to claims 4 and 15.
Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over HILLELI in view of LEE and further in view of US 20180341637 A1, hereinafter referenced as GAUR.
Regarding Claim 5:
HILLELI and LEE disclose the method of claim 1 as explained above. HILLELI further teaches:
receiving, at the processor, third audio data associated with at least one speaker that is participating in the meeting; (HILLELI, para. 0060: “The action item attributor 266 can map content character sequences to the identity of the speaker or person responsible for completing the action item in any suitable manner. For example, in some embodiments, a voice-recognition component can be used on audio content input to map phonemes of the input to a library of known or predetermined phonemes of particular users (e.g., as found within the participant behavior history 346). Accordingly, a voice-recognition component can record each user's voice in the user profile 240 (e.g., each user that can potentially attend a meeting). In this way, a prediction can be made that a particular parsed character sequence was said by a particular user.”;
HILLELI, para. 0072: “The natural language sequence normalizer 312 parses or tokenizes event content and/or other external information (e.g., information received by the user-data collection component 310) and re-structures the information. In some embodiments, the event content is or includes documents or transcripts of the order and content of everything that was said in an event written in natural language. For example, the event content can be a written transcript of everything that was said during an entire duration of a meeting. In some embodiments, the event content can alternatively or additionally include audio content of everything that was said during an event.”;
Examiner’s Note: HILLELI discloses that audio content from all meeting participants is collected, and further that a voice-recognition component can associate the audio with particular meeting participants)
analyzing, by the processor using the trained model, the third audio data in real-time (HILLELI, para. 0027: “Various embodiments improve these virtual assistants because they can parse a meeting transcript or audio input (e.g., in near real-time) to determine what input is an action item.”;
HILLELI, para. 0085: “ In some embodiments, the names or other identifiers of participants of an event are determined automatically or in near-real-time as users speak (e.g., based on voice recognition algorithms)”;
HILLELI, para. 0100: “Alternatively or additionally, the layer 404 ingests portions of the transcript 402 at different time instances (e.g., in near-real-time as each utterance or participant speech occurs).”;
HILLELI, para. 0124: “For example, a device may record an entire meeting event and an administrator can upload the meeting event on a computing device, which causes a natural language text to be outputted (e.g., via speech-to-text), at which point the transcript can be detected.”;
Examiner’s Note: HILLELI discloses analyzing audio input from a meeting, where such analysis includes near-real-time analysis by voice recognition algorithms and a neural network model (see para. 0100))
displaying, by the processor via a display, a notification to the at least one speaker ... (HILLELI, para. 0064: “In some embodiments, presentation component 220 generates user interface features associated with the clarification and/or feedback request. Such features can include interface elements (such as graphics buttons, sliders, menus, audio prompts, alerts, alarms, vibrations, pop-up windows, notification-bar or status-bar items, in-app notifications, or other similar features for interfacing with a user), queries, and prompts.”)
However, HILLELI and LEE fail to explicitly teach:
... to detect unconscious bias words used by the at least one speaker;
... that relates to the unconscious bias words; and
suggesting, by the processor, a replacement of the unconscious bias words to the at least one speaker.
However, in a related field of endeavor (a user interface for modifying a user’s communications, see para. 0009), GAUR teaches:
... to detect unconscious bias words used by the at least one speaker; (GAUR, para. 0008: “Research indicates that all people have an unconscious bias influenced by background, cultural environment, and/or personal experiences. This unconscious bias tends to be reflected in a user's writing and other activities without the user even realizing it.”;
GAUR, para. 0009: “Stated another way, the present concepts can be manifest as an unconscious bias detection service that can be applied to text from any source.”;
Examiner’s Note: the HILLELI-LEE-GAUR combination now modifies the virtual meeting system of HILLELI to transcribe the audio using near-real-time speech-to-text technologies (as in HILLELI) and to detect unconscious bias in the resulting text using the unconscious bias detection service of GAUR as applied to the transcribed text)
... that relates to the unconscious bias words; and (GAUR, para. 0012: “FIG. 1C shows a subsequent screenshot 100C with unconscious bias detection employed responsive to the user selection of FIG. 1B. Screenshot 100C shows instances of potential unconscious bias words (e.g., unconscious bias candidates) highlighted in bold at 114(1) and 114(2). In this case, the highlighted potential unconscious bias words include the word “he” at 114(1) and the word “manpower” at 114(2). Screenshot 100C also includes a listing of the detected potential unconscious bias words at 116 and a suggested alternative language listing at 118.”;
Examiner’s Note: the HILLELI-LEE-GAUR combination now modifies the virtual meeting system of HILLELI to transcribe the audio using near-real-time speech-to-text technologies (as in HILLELI) and to detect unconscious bias in the resulting text using the unconscious bias detection service of GAUR as applied to the transcribed text, and then to send the user a notification as depicted in Fig. 1C (see section 116 and 118) (and as in HILLELI para. 0064) regarding the detection of unconscious bias words)
suggesting, by the processor, a replacement of the unconscious bias words to the at least one speaker. (GAUR, para. 0012: “FIG. 1C shows a subsequent screenshot 100C with unconscious bias detection employed responsive to the user selection of FIG. 1B. Screenshot 100C shows instances of potential unconscious bias words (e.g., unconscious bias candidates) highlighted in bold at 114(1) and 114(2). In this case, the highlighted potential unconscious bias words include the word “he” at 114(1) and the word “manpower” at 114(2). Screenshot 100C also includes a listing of the detected potential unconscious bias words at 116 and a suggested alternative language listing at 118.”;
Examiner’s Note: the HILLELI-LEE-GAUR combination now modifies the virtual meeting system of HILLELI to transcribe the audio using near-real-time speech-to-text technologies (as in HILLELI) and to detect unconscious bias in the resulting text using the unconscious bias detection service of GAUR as applied to the transcribed text, and then to send the user a notification (as in HILLELI) regarding the detection of unconscious bias words and suggested alternative language as shown in GAUR, Fig. 1C)
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the virtual meeting system of HILLELI, with the teachings of LEE and GAUR as explained above. As disclosed by GAUR, one of ordinary skill would have been motivated to do so in order to bring unconscious bias to the user’s attention so that the user can avoid offending others. (see paras. 0007-0008).
Claim 12 depends from claim 8 and claims a computing device that corresponds to the method of claim 5, and is therefore rejected for the same reasons explained with respect to claims 5 and 8.
Claim 19 depends from claim 15 and claims a non-transitory computer readable storage medium that corresponds to the method of claim 5, and is therefore rejected for the same reasons explained with respect to claims 5 and 15.
Claims 6-7, 13-14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over HILLELI in view of LEE and further in view of US 20130039483 A1, hereinafter referenced as WOLFELD.
Regarding Claim 6:
HILLELI and LEE disclose the method of claim 1 as explained above. HILLELI further teaches:
analyzing, by the processor using the trained model, the first audio data to detect ... text spoken by the one of the participants, and a time duration of a spoken sentence; and (HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like. In some embodiments, the meeting activity monitor 252 outputs transcripts or activity that happens during a meeting. For example, activity or content may be timestamped or otherwise correlated with meeting transcripts.”;
HILLELI, para. 0115: “Per block 510, meeting content is determined (e.g., by the meeting monitor 250). For example, the meeting activity monitor 252 can use one or more sensors or other components to monitor chats, presented context, or portions of a transcript. The contextual information extractor/determiner 254 can determine the contextual information of an event, such as who is present or invited to a meeting, the topic of the meeting, location of the meeting, or other context or character sequences within a transcript or meeting content itself. Then the meeting content assembler 256 can generate an enriched meeting-activity timeline, such as tags or structured data that includes a timeline of each conversation and a timestamp indicating when the conversation started/stopped. In certain embodiments of block 510, content is determined from a meeting, which may be determined by monitoring the meeting receive information about the meeting, such as transcript information, or other information about the meeting such as the attendees, meeting topic, and/or related contextual information.”;
Examiner’s Note: HILLELI discloses creating a transcript, identifying when each speaker speaks and the duration of that speaker’s speech (which can be a single sentence))
displaying, by the processor via a display, a notification associated with at least one event ... (HILLELI, para. 0064: “In some embodiments, presentation component 220 generates user interface features associated with the clarification and/or feedback request. Such features can include interface elements (such as graphics buttons, sliders, menus, audio prompts, alerts, alarms, vibrations, pop-up windows, notification-bar or status-bar items, in-app notifications, or other similar features for interfacing with a user), queries, and prompts.”)
However, HILLELI and LEE fail to explicitly teach:
... to detect a pitch of one of the participants ...
... for which the pitch of the one of the participants is higher than a predefined threshold level based on the analysis of the first audio data.
However, in a related field of endeavor (voice communications in a call center environment, see para. 0001), WOLFELD teaches:
analyzing, by the processor using the trained model, the first audio data to detect a pitch of one of the participants (WOLFELD, para. 0017: “For example, the communication session detection applications 26 can include any one or more detection software features including, without limitation: ... (2) voice volume and pitch detection software to detect when any person associated with a communication session may have a change in emotion such as becoming upset or angry (e.g., when volume and/or pitch levels increase above one or more threshold values, where such threshold value(s) can be determined based upon historical audio data associated with the communication session and/or any other communication sessions)”;
Examiner’s Note: the HILLELI-LEE-WOLFELD combination now modifies the virtual meeting system of HILLELI to analyze the pitch of each speaker as in WOLFELD)
displaying, by the processor via a display, a notification associated with at least one event for which the pitch of the one of the participants is higher than a predefined threshold level based on the analysis of the first audio data. (WOLFELD, para. 0004: “FIG. 1 is a schematic block diagram of an example call center communication system that facilitates multiple communications between call agents and customers simultaneously and further facilitates automatic monitoring of communications as well as notification to a supervisor of a scoring of problematic communications.”;
WOLFELD, para. 0017: “For example, the communication session detection applications 26 can include any one or more detection software features including, without limitation: ... (2) voice volume and pitch detection software to detect when any person associated with a communication session may have a change in emotion such as becoming upset or angry (e.g., when volume and/or pitch levels increase above one or more threshold values, where such threshold value(s) can be determined based upon historical audio data associated with the communication session and/or any other communication sessions)”;
WOLFELD, para. 0019: “Thus, the communication session detection applications 26 provide an automatic indication of whether anyone (i.e., call agent and/or customer) associated with a particular communication session may be angry, upset or agitated which in turn provides an indication of a potentially problematic communication session.”
WOLFELD, para. 0022: “For example, it may be automatically determined that a first on-going communication session qualifies as a potentially problematic call based upon indicators provided by two or more communication session detection applications 26 (e.g., based upon a threshold frequency being established for identified problematic words/phrases occurring within the communication session, based upon a volume/pitch level being greater than a threshold level for the communication session, and/or based upon any other indicators determined by other communication session detection applications 26), whereas a second on-going communication session either does not qualify as a potentially problematic call (e.g., no indicators as determined by the communication session detection applications 26) or qualifies as a potentially problematic call but with fewer indicators or with indicators having smaller indicator values (e.g., a volume/pitch level or value of the second on-going communication session is less than the volume/pitch level or value of the first on-going communication session) as provided by one or more communication session detection applications 26 in comparison to the first on-going communication session.”
Examiner’s Note: the HILLELI-LEE-WOLFELD combination now modifies the virtual meeting system of HILLELI to analyze the pitch of each speaker as in WOLFELD, and if such pitch is above a threshold as in WOLFELD, flagging the instance and issuing a notification as in both WOLFELD (because a pitch level above a threshold indicates an angry customer, which is a “problematic communication” that creates a notification to the supervisor) and HILLELI (see para. 0064)).
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the virtual meeting system of HILLELI with the teachings of LEE and WOLFELD as explained above. As disclosed by WOLFELD, one of ordinary skill would have been motivated to do so in order to identify a “potential problem associated with the corresponding communication session.” (para. 0009). For example, the pitch detection can determine if a speaker is “becoming upset or angry.” (para. 0017).
Regarding Claim 7
HILLELI and LEE disclose the method of claim 1 as explained above. HILLELI further teaches:
receiving, at the processor, video data associated with the participants in the meeting; (HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like.”;
Examiner’s Note: HILLELI discloses using video data (including audiovisual data) from a meeting)
analyzing, by the processor using the trained model, the video data to detect ... text spoken by the one of the participants, and a time duration of a spoken sentence; and (HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like. In some embodiments, the meeting activity monitor 252 outputs transcripts or activity that happens during a meeting. For example, activity or content may be timestamped or otherwise correlated with meeting transcripts.”;
HILLELI, para. 0115: “Per block 510, meeting content is determined (e.g., by the meeting monitor 250). For example, the meeting activity monitor 252 can use one or more sensors or other components to monitor chats, presented context, or portions of a transcript. The contextual information extractor/determiner 254 can determine the contextual information of an event, such as who is present or invited to a meeting, the topic of the meeting, location of the meeting, or other context or character sequences within a transcript or meeting content itself. Then the meeting content assembler 256 can generate an enriched meeting-activity timeline, such as tags or structured data that includes a timeline of each conversation and a timestamp indicating when the conversation started/stopped. In certain embodiments of block 510, content is determined from a meeting, which may be determined by monitoring the meeting receive information about the meeting, such as transcript information, or other information about the meeting such as the attendees, meeting topic, and/or related contextual information.”;
Examiner’s Note: the examiner notes that the broadest reasonable interpretation of “video data” includes audiovisual data because a camera can include both audio and video feeds; HILLELI discloses creating a transcript using audiovisual data, identifying when each speaker speaks and the duration of that speaker’s speech (which can be a single sentence), where the transcript is created from the audio portion of the audiovisual data)
displaying, by the processor via a display, a notification (HILLELI, para. 0064: “In some embodiments, presentation component 220 generates user interface features associated with the clarification and/or feedback request. Such features can include interface elements (such as graphics buttons, sliders, menus, audio prompts, alerts, alarms, vibrations, pop-up windows, notification-bar or status-bar items, in-app notifications, or other similar features for interfacing with a user), queries, and prompts.”)
However, HILLELI and LEE fail to explicitly teach
to detect a gesture and an emotion of one of the participants,
associated with at least one event for which the emotion of the one of the participants is identified as being an unfavorable emotion.
However, in a related field of endeavor (voice communications in a call center environment, see para. 0001), WOLFELD teaches:
analyzing, by the processor using the trained model, the video data to detect a gesture and an emotion of one of the participants, (WOLFELD, para. 0017: “For example, the communication session detection applications 26 can include any one or more detection software features including, without limitation: ... (3) video detection software that can identify a human face associated with the customer and/or call agent and an identifiable change in facial expressions of the detected face (e.g., smile detection, frown detection and/or other facial expression detection) or color/tone of the detected face (e.g., a detection of a flush face of the call agent or the customer) that may occur during the communication session; (4) video detection software that can identify a frequency of movement or spatial displacement of the head and/or other body parts (e.g., arms or hands) of the call agent and/or the customer during the communication session that can be interpreted as agitated gestures indicative of an upset, angry or irritated person”; the HILLELI-LEE-WOLFELD combination now modifies the virtual meeting system of HILLELI to detect facial expressions and gestures indicating a person’s emotion (such as being upset, angry, or irritated))
displaying, by the processor via a display, a notification associated with at least one event for which the emotion of the one of the participants is identified as being an unfavorable emotion. (WOLFELD, para. 0004: “FIG. 1 is a schematic block diagram of an example call center communication system that facilitates multiple communications between call agents and customers simultaneously and further facilitates automatic monitoring of communications as well as notification to a supervisor of a scoring of problematic communications.”;
WOLFELD, para. 0017: “For example, the communication session detection applications 26 can include any one or more detection software features including, without limitation: ... (3) video detection software that can identify a human face associated with the customer and/or call agent and an identifiable change in facial expressions of the detected face (e.g., smile detection, frown detection and/or other facial expression detection) or color/tone of the detected face (e.g., a detection of a flush face of the call agent or the customer) that may occur during the communication session; (4) video detection software that can identify a frequency of movement or spatial displacement of the head and/or other body parts (e.g., arms or hands) of the call agent and/or the customer during the communication session that can be interpreted as agitated gestures indicative of an upset, angry or irritated person”;
WOLFELD, para. 0019: “Thus, the communication session detection applications 26 provide an automatic indication of whether anyone (i.e., call agent and/or customer) associated with a particular communication session may be angry, upset or agitated which in turn provides an indication of a potentially problematic communication session.”
Examiner’s Note: the HILLELI-LEE-WOLFELD combination now modifies the virtual meeting system of HILLELI to detect facial expressions and gestures indicating a person’s emotion (such as being upset, angry, or irritated), and if a person is identified as being upset, angry, or irritated (corresponding to recited “unfavorable emotion”) issuing a notification as in both WOLFELD (because a gesture indicates an angry customer, which is a “problematic communication” that creates a notification to the supervisor) and HILLELI (see para. 0064)).
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the virtual meeting system of HILLELI with the teachings of LEE and WOLFELD as explained above. As disclosed by WOLFELD, one of ordinary skill would have been motivated to do so in order to identify a “potential problem associated with the corresponding communication session.” (para. 0009). For example, video detection software can detect if someone is “upset, angry or irritated” (para. 0017).
Claim 13 depends from claim 8 and claims a computing device that corresponds to the method of claim 6, and is therefore rejected for the same reasons explained with respect to claims 6 and 8.
Claim 14 depends from claim 8 and claims a computing device that corresponds to the method of claim 7, and is therefore rejected for the same reasons explained with respect to claims 7 and 8.
Regarding Claim 20
HILLELI and LEE disclose the non-transitory computer readable storage medium of claim 15 as explained above. HILLELI further teaches:
receiving video data associated with the participants in the meeting; (HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like.”;
HILLELI, para. 0060: “The action item attributor 266 can map content character sequences to the identity of the speaker or person responsible for completing the action item in any suitable manner. For example, in some embodiments, a voice-recognition component can be used on audio content input to map phonemes of the input to a library of known or predetermined phonemes of particular users (e.g., as found within the participant behavior history 346). Accordingly, a voice-recognition component can record each user's voice in the user profile 240 (e.g., each user that can potentially attend a meeting). In this way, a prediction can be made that a particular parsed character sequence was said by a particular user.”;
HILLELI, para. 0072: “The natural language sequence normalizer 312 parses or tokenizes event content and/or other external information (e.g., information received by the user-data collection component 310) and re-structures the information. In some embodiments, the event content is or includes documents or transcripts of the order and content of everything that was said in an event written in natural language. For example, the event content can be a written transcript of everything that was said during an entire duration of a meeting. In some embodiments, the event content can alternatively or additionally include audio content of everything that was said during an event.”;
Examiner’s Note: HILLELI discloses that video and audio content from all meeting participants is collected, and further that a voice-recognition component can associate the audio with particular meeting participants)
analyzing the first audio data and the video data using the trained model to detect ... text spoken by the one of the participants, and a time duration of a spoken sentence; and (HILLELI, para. 0045: “The meeting activity monitor 252 monitors user activity via one or more sensors, (e.g., microphones, video), devices, chats, presented content, and the like. In some embodiments, the meeting activity monitor 252 outputs transcripts or activity that happens during a meeting. For example, activity or content may be timestamped or otherwise correlated with meeting transcripts.”;
HILLELI, para. 0115: “Per block 510, meeting content is determined (e.g., by the meeting monitor 250). For example, the meeting activity monitor 252 can use one or more sensors or other components to monitor chats, presented context, or portions of a transcript. The contextual information extractor/determiner 254 can determine the contextual information of an event, such as who is present or invited to a meeting, the topic of the meeting, location of the meeting, or other context or character sequences within a transcript or meeting content itself. Then the meeting content assembler 256 can generate an enriched meeting-activity timeline, such as tags or structured data that includes a timeline of each conversation and a timestamp indicating when the conversation started/stopped. In certain embodiments of block 510, content is determined from a meeting, which may be determined by monitoring the meeting receive information about the meeting, such as transcript information, or other information about the meeting such as the attendees, meeting topic, and/or related contextual information.”;
Examiner’s Note: HILLELI discloses creating a transcript, identifying when each speaker speaks and the duration of that speaker’s speech (which can be a single sentence))
displaying a notification associated with at least one event (HILLELI, para. 0064: “In some embodiments, presentation component 220 generates user interface features associated with the clarification and/or feedback request. Such features can include interface elements (such as graphics buttons, sliders, menus, audio prompts, alerts, alarms, vibrations, pop-up windows, notification-bar or status-bar items, in-app notifications, or other similar features for interfacing with a user), queries, and prompts.”)
However, HILLELI and LEE fail to explicitly teach
receiving the first audio data and the video data associated with the participants in the meeting to detect a pitch of one of the participants, a gesture of the one of the participants, an emotion of the one of the participants,
displaying a notification associated with at least one event for which the pitch of the one of the participants is higher than a predefined threshold level.
However, in a related field of endeavor (voice communications in a call center environment, see para. 0001), WOLFELD teaches:
receiving the first audio data and the video data associated with the participants in the meeting to detect a pitch of one of the participants, a gesture of the one of the participants, an emotion of the one of the participants, (WOLFELD, para. 0017: “For example, the communication session detection applications 26 can include any one or more detection software features including, without limitation: ... (2) voice volume and pitch detection software to detect when any person associated with a communication session may have a change in emotion such as becoming upset or angry (e.g., when volume and/or pitch levels increase above one or more threshold values, where such threshold value(s) can be determined based upon historical audio data associated with the communication session and/or any other communication sessions); (3) video detection software that can identify a human face associated with the customer and/or call agent and an identifiable change in facial expressions of the detected face (e.g., smile detection, frown detection and/or other facial expression detection) or color/tone of the detected face (e.g., a detection of a flush face of the call agent or the customer) that may occur during the communication session; (4) video detection software that can identify a frequency of movement or spatial displacement of the head and/or other body parts (e.g., arms or hands) of the call agent and/or the customer during the communication session that can be interpreted as agitated gestures indicative of an upset, angry or irritated person”; the HILLELI-LEE-WOLFELD combination now modifies the virtual meeting system of HILLELI to analyze speaker pitch and to analyze facial expressions and gestures indicating a person’s emotion (such as being upset, angry, or irritated), and if a person is identified as being upset, angry, or irritated (corresponding to recited “unfavorable emotion”) issuing a notification as in both WOLFELD and HILLELI).
Examiner’s Note: the HILLELI-LEE-WOLFELD combination now modifies the virtual meeting system of HILLELI to analyze the pitch of each speaker as in WOLFELD.
displaying a notification associated with at least one event for which the pitch of the one of the participants is higher than a predefined threshold level. (WOLFELD, para. 0004: “FIG. 1 is a schematic block diagram of an example call center communication system that facilitates multiple communications between call agents and customers simultaneously and further facilitates automatic monitoring of communications as well as notification to a supervisor of a scoring of problematic communications.”;
WOLFELD, para. 0017: “For example, the communication session detection applications 26 can include any one or more detection software features including, without limitation: ... (2) voice volume and pitch detection software to detect when any person associated with a communication session may have a change in emotion such as becoming upset or angry (e.g., when volume and/or pitch levels increase above one or more threshold values, where such threshold value(s) can be determined based upon historical audio data associated with the communication session and/or any other communication sessions)”;
WOLFELD, para. 0019: “Thus, the communication session detection applications 26 provide an automatic indication of whether anyone (i.e., call agent and/or customer) associated with a particular communication session may be angry, upset or agitated which in turn provides an indication of a potentially problematic communication session.”
Examiner’s Note: the HILLELI-LEE-WOLFELD combination now modifies the virtual meeting system of HILLELI to analyze the pitch of each speaker as in WOLFELD, and if such pitch is above a threshold as in WOLFELD, flagging the instance and issuing a notification as in both WOLFELD (because a pitch above a threshold indicates an angry customer, which is a “problematic communication” that creates a notification to the supervisor) and HILLELI (see para. 0064)).
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the virtual meeting system of HILLELI with the teachings of LEE and WOLFELD as explained above. As disclosed by WOLFELD, one of ordinary skill would have been motivated to do so in order to identify a “potential problem associated with the corresponding communication session.” (para. 0009). For example, the pitch detection can determine if a speaker is “becoming upset or angry.” (para. 0017).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20140164501 A1 (Herger). “At tracking step 108, an amount of participation by the first identity is tracked. By way of example, if the shared media session comprises a teleconference, the amount the first identity speaks during the teleconference is tracked. After the tracking step 108 is completed, the method continues to displaying step 110. At displaying step 110, the amount of participation by the first identity is displayed on an electronic calendar for the shared media session to the plurality of participants. After the displaying step 110 is completed, the method continues to translating step 112.” (emphasis added).
US 20160142674 A1 (Travis). “For each teleconference, teleconference server 250 can allocate resources such as memory, input/output slots, processing, and security. The teleconference server will associate an identifier with each attendee in a teleconference, and provide that identifier to each attendee. As will be discussed in more detail below, the teleconference server will identify and track those attendees who are active participants in the teleconference. The teleconference server can also rank attendees as attendees, active participants, and current presenters, based upon a level of participation in the teleconference. For example, if an attendee begins to speak, and the time or frequency of speaking crosses a threshold, then the attendee can move from an attendee status to an active participant. Further, should the attendee continue to talk or talk and also provide gesture input (e.g., through mouse movement), that attendee can be further promoted to current presenter status should they cross a higher threshold.” (emphasis added).
US 20190147367 A1 (Bellamy). “For example, meeting software 114 can determine if one of the meeting participants is becoming a detractor to the progress in the online meeting as depicted in FIG. 8A. For example, in analyzing the audio stream data of the online meeting, meeting software 114 determines that the participation level (e.g., the number of times the user of computer 304 speaks) has become more frequent over a period of time, and repeatedly directs comments to the user of computer 306.” (para. 0059).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 12:00 pm - 8:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL C. LEE/Examiner, Art Unit 2128