DETAILED ACTION
This communication is a Final Office Action rejection on the merits. Claims 1, 3-8, and 10-19 are currently pending and have been addressed below.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 01/27/2026 (related to the 101 Rejection) have been fully considered but they are not persuasive.
Applicant states, on pages 2-3, that the claims are directed to a computer-implemented technical solution that uses real-time machine-learning analysis to automatically control how live interactions are handled in a multichannel contact-center system, rather than merely identifying or reporting information. Applicant respectfully submits that the claims already include a system action, i.e., forwarding the interaction during runtime, which is a machine-executed control signal, not mere notification. The claims do not recite a mental process or a method of organizing human activity. Instead, they recite a specific, machine-executed pipeline for computing an Interaction Friction Score (IFS) in real time using trained NLP models, vector embeddings, timing normalization, and sequence-weighted aggregation across utterances. These operations are performed on streaming interaction data across multiple channels and cannot be practically performed by the human mind.
Examiner respectfully disagrees with Applicant. In this case, claim 1 recites a social activity since the system-level control action may include providing information to a person when the interaction score exceeds a threshold (see Applicant’s specification, Paragraphs 0123-0124, pointing out the problematic segments to users such as supervisors). This is similar to passing a note to a person (see MPEP 2106.04(a)(2), social activity). Also, the limitations of “receiving a transcript” and “calculating a score by calculating a relative offset between sentenceₙ and sentenceₙ₋₁” are still considered to be abstract ideas because they are directed to “mathematical concepts,” which include “mathematical calculations.” If a claim limitation, under its broadest reasonable interpretation, covers “managing interactions between people” and/or “mathematical calculations,” then the claim recites an abstract idea.
Applicant further states, on pages 3-7, that consistent with Example 47 of the USPTO 2024 AI Guidance, the claims use the computed result of the machine-learning analysis to control system operation, not merely to identify a condition or recommend human action. This is precisely the distinction drawn in Example 47 between ineligible claims that only identify information and eligible claims that apply computed results to automatically control or alter system behavior. To clarify how the claimed forwarding operates as an automated system control action, when the calculated IFS exceeds the IFT, the system emits a machine-generated control signal that is consumed by one or more runtime components of the contact-center platform while the interaction is active. By way of example, and not limitation, such control actions may include automatically transferring the live interaction to a specialist skill queue without disconnecting the customer, programmatically initiating a supervisor join or whisper session, temporarily gating or pausing automated agent-assist or bot outputs and injecting a clarification prompt, or dynamically raising processing priority or scaling resources for the IFS microservice instance handling the interaction. In each case, the computed IFS is used directly to control system operation during the live session, consistent with Example 47's distinction between merely identifying a condition and automatically altering system behavior. The claims further recite enabling a user to intervene in real time following the system-initiated forwarding. This human involvement is secondary to, and enabled by, the automated system action, and does not negate the fact that the claimed invention applies the computed IFS to control system operation. When viewed as a whole, the claims apply machine-learning analysis to automatically control live interaction handling in a contact-center system, thereby improving computerized contact-center operations. 
The claims therefore integrate any alleged abstract idea into a practical application under Step 2A and are patent-eligible under 35 U.S.C. §101. Withdrawal of the §101 rejection is respectfully requested.
Examiner respectfully disagrees with Applicant. The main functions of the additional elements recited in claim 1 are merely used to: collect data (e.g., a transcript and interaction metadata of the interaction between the customer and the agent), analyze the data (e.g., calculate an interaction friction score by calculating a relative offset between sentenceₙ and sentenceₙ₋₁), and display certain results of the collection and analysis (e.g., send a notification to another user when the score is above a threshold). Those are functions that the courts have described as merely indicating a field of use or technological environment in which to apply a judicial exception (see MPEP 2106.05(h)).
Further, the additional element of a Natural Language Processing (NLP) turn-talking model is merely used to yield probability vector prediction of sentences (Paragraph 0089). Although claim 1 further discloses wherein the NLP turn-talking model is trained using sentence samples which are classified as neutral, claim 1 does not provide any specific details about how the NLP turn-talking model operates and/or how the model is improved over previous NLP turn-talking models. Therefore, the machine learning is recited at a high level of generality, which results in “apply it” (MPEP 2106.05(f)). See 2024 AI Guidance, Example 47.
Although claim 1 further recites the steps of “automatically initiating a system-level control action during the interaction based on the calculated IFS exceeding the IFT” and “having a user intervene the interaction in real-time,” Examiner notes that “automatically initiating a system-level control action” may include engaging a user in the interaction or pointing out the problematic segments to users such as supervisors (see Applicant’s specification, Paragraphs 0123-0124). In this case, “providing problematic segments to a user in real-time” is considered a conventional computer function of “receiving and transmitting over a network” (MPEP 2106.05(d)). Lastly, claim 1 does not recite any specific details of how the user intervention is implemented to solve a particular problem (see MPEP 2106.05(f), how a solution to a problem is accomplished). Therefore, the last step does not integrate a judicial exception into a practical application or provide significantly more because this type of recitation is equivalent to the words “apply it.”
Claim 1 fails to recite any improvements to another technology or technical field, improvements to the functioning of the computer itself, use of a particular machine, effecting a transformation or reduction of a particular article to a different state or thing, adding unconventional steps that confine the claim to a particular useful application, and/or meaningful limitations beyond generally linking the use of an abstract idea to a particular environment. See 84 Fed. Reg. 55. Viewed individually or as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent eligible application of the abstract idea such that the claim amounts to significantly more than the abstract idea itself. The claim is not patent eligible.
Examiner recommends following Example 47, claim 3, of the 2024 AI Guidance (if supported by the specification).
Claim 19 recites similar limitations and therefore is rejected for the same reasons as claim 1. Claims 3-8 and 10-18 are rejected because of their dependency from independent claim 1.
Applicant's arguments filed 05/03/2025 (related to the 103 Rejection) have been fully considered and are persuasive. The combination of Cattaneo and Ouchi does not teach or suggest measuring and accumulating friction throughout the interaction and assigning per-utterance friction scores that penalize consecutive negativity more than scattered negativity. Therefore, claim 1 has potential allowable subject matter. Claim 19 recites similar limitations and therefore has potential allowable subject matter for the same reasons as claim 1. Claims 3-8 and 10-18 have potential allowable subject matter because of their dependency from independent claim 1.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-8, and 10-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without reciting significantly more.
Independent Claim 1
Step One - First, pursuant to step 1 in the January 2019 Revised Patent Subject Matter Eligibility Guidance (“2019 PEG”), 84 Fed. Reg. 53, claim 1 is directed to a method, which is a statutory category.
Step 2A, Prong One - Claim 1 recites: A method for calculating a level of friction within a customer and agent interaction, for quality improvement thereof, in a multichannel contact center, said method comprising: to operate, for each interaction between the customer and the agent, in each channel, an Interaction Friction Score (IFS) calculation, said IFS calculation comprising: retrieving a transcript and interaction metadata of the interaction between the customer and the agent, wherein the transcript includes 'N' sentences, calculating a friction score for sentenceₙ comprising: (i) receiving a sentenceₙ, a person one-hot vector of sentenceₙ, and interaction time-offset of the sentenceₙ, a sentenceₙ₋₁, a person one-hot vector of sentenceₙ₋₁, and interaction time-offset of the sentenceₙ₋₁; (ii) calculating a friction-score for the sentence by: providing the person one-hot vector of sentenceₙ and the person one-hot vector of sentenceₙ₋₁ to a Natural Language Processing (NLP) Turn-Talking model to yield probability vector prediction of sentenceₙ; (iii) calculating a vector-distance between a provided probability vector prediction of sentenceₙ and the person one-hot vector of sentenceₙ; (iv) providing the sentenceₙ₋₁ to a next-sentence-prediction model to yield a predicted-sentenceₙ; (v) embedding the predicted-sentenceₙ and sentenceₙ using an NLP-embedding-engine module to yield a predicted-sentenceₙ embedding and a sentenceₙ embedding; (vi) calculating a distance between the yielded predicted-sentenceₙ embedding and the yielded sentenceₙ embedding; (vii) providing the interaction time-offset of the sentenceₙ₋₁ and interaction time-offset of the sentenceₙ to a normalized-relative-offset-module to calculate a relative offset between sentenceₙ and sentenceₙ₋₁ by dividing a difference between sentenceₙ and sentenceₙ₋₁ by the time-offset of sentenceₙ; and (viii) calculating a weighted average of the vector-distance, the distance, and the relative offset, wherein the weighted average has a value between '0' and '1', and wherein the weighted average is the Sₙ of the sentenceₙ, and wherein the NLP Turn-Talking model and the next-sentence-prediction model are trained on sentence samples which are classified as 'neutral', the NLP Turn-Talking model is trained by using conversations to statistically model the relationship between conversational speakers and the next-sentence-prediction model is trained by using conversations to statistically model the relationship between conversational sentences, calculating an IFS of the interaction between the customer and the agent by formula I: (I) IFS =
[Formula I appears in the original document as two grayscale images, media_image1.png and media_image2.png.]
whereby: Sₙ is the friction score for sentence n, and 'a' is a hyperparameter that represents a value that is attributed to a high level of friction; and forwarding each interaction between the customer and the agent having a calculated IFS above a calculated Interaction Friction Threshold (IFT) for an intervention, wherein the forwarding comprises automatically initiating an action during the interaction based on the calculated IFS exceeding the IFT, wherein the intervention comprises having a user intervene in the interaction when the IFS is operated in real time. These claim elements are considered to be abstract ideas because they are directed to “certain methods of organizing human activity,” which include “managing interactions between people.” In this case, claim 1 recites a social activity since the intervention may include providing information to a person (e.g., a supervisor and/or agent) when the interaction score exceeds a threshold. Also, the limitations of “receiving a transcript” and “calculating a score by calculating a relative offset between sentenceₙ and sentenceₙ₋₁” are still considered to be abstract ideas because they are directed to “mathematical concepts,” which include “mathematical calculations.” If a claim limitation, under its broadest reasonable interpretation, covers “managing interactions between people” or “mathematical calculations,” then the claim recites an abstract idea.
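For illustration only (this sketch is not part of the claim or the prosecution record), the recited per-sentence calculation of steps (iii) and (vi)-(viii) could be expressed as follows; the distance metric, the weights, and all function and parameter names are assumptions, since the claim does not fix them:

```python
import math

def sentence_friction_score(turn_prob_vec, person_onehot,
                            pred_embedding, sent_embedding,
                            offset_prev, offset_curr,
                            weights=(0.4, 0.4, 0.2)):
    """Illustrative weighted average of the three recited quantities.

    All inputs and the weight values are hypothetical; the claim does
    not specify the distance metrics or the weighting scheme.
    """
    # (iii) vector-distance between the predicted probability vector
    # and the actual person one-hot vector
    turn_dist = math.dist(turn_prob_vec, person_onehot)
    # (vi) distance between predicted-sentence and actual-sentence embeddings
    embed_dist = math.dist(pred_embedding, sent_embedding)
    # (vii) relative offset: difference of time-offsets divided by the
    # time-offset of the current sentence
    rel_offset = (offset_curr - offset_prev) / offset_curr
    # (viii) weighted average; stays in [0, 1] only if the inputs are normalized
    w1, w2, w3 = weights
    return (w1 * turn_dist + w2 * embed_dist + w3 * rel_offset) / (w1 + w2 + w3)
```

The recited bound of '0' to '1' on the weighted average presumes each component is itself normalized, which the sketch does not enforce.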
Step 2A, Prong Two - The judicial exception is not integrated into a practical application. Claim 1 includes the following additional elements: one or more processors; a friction datastore and a database of interactions transcripts and metadata; a memory to store the plurality of databases; an IFS calculation module; a Natural Language Processing (NLP) turn-talking model; and a system-level control.
The processor is merely used to operate, for each interaction between the customer and the agent, in each channel, an Interaction Friction Score (IFS) calculation module (Paragraph 0006). The friction datastore and the database of interactions transcripts and metadata are merely used to retrieve a transcript and interaction metadata of the interaction between the customer and the agent (Paragraph 0008). The memory is merely used to store the plurality of databases (Paragraph 0007). The calculation module is merely used to calculate an IFS of the interaction between the customer and the agent (Paragraph 0008). The NLP turn-talking model is merely used to yield a probability vector prediction of sentences (Paragraph 0089). The system-level control is merely used to automatically engage a user in the interaction or point out the problematic segments to users such as supervisors (Paragraphs 0123-0124). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)). These elements of “processor,” “databases,” “memory,” “calculation module,” “NLP turn-talking model,” and “system-level control” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer elements. Also, the database is considered “field of use” since the database is not improved and the data is just placed there (MPEP 2106.05(h)). Accordingly, alone and in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Step 2B - The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the claims describe how to generally “apply” the concept of calculating a score and displaying information to a user when the score exceeds a predefined threshold. The specification shows that the processor is merely used to operate, for each interaction between the customer and the agent, in each channel, an Interaction Friction Score (IFS) calculation module (Paragraph 0006). The friction datastore and the database of interactions transcripts and metadata are merely used to retrieve a transcript and interaction metadata of the interaction between the customer and the agent (Paragraph 0008). The memory is merely used to store the plurality of databases (Paragraph 0007). The calculation module is merely used to calculate an IFS of the interaction between the customer and the agent (Paragraph 0008). The NLP turn-talking model is merely used to yield a probability vector prediction of sentences (Paragraph 0089). The system-level control is merely used to automatically engage a user in the interaction or point out the problematic segments to users such as supervisors (Paragraphs 0123-0124). Although claim 1 further discloses wherein the NLP turn-talking model is trained using sentence samples which are classified as neutral, claim 1 does not provide any specific details about how the NLP turn-talking model operates and/or how the model is improved over previous NLP turn-talking models (see 2024 AI Guidance, Example 47, claim 2). Also, “initiating an action such as providing problematic segments to a user in real-time” is considered a conventional computer function of “receiving and transmitting over a network” (MPEP 2106.05(d)).
The database is also considered a conventional computer function of “receiving and transmitting over a network” and “storing information in a memory” (see MPEP 2106.05(d)). Lastly, claim 1 does not recite any specific details of how the user intervention is implemented to solve a particular problem (see MPEP 2106.05(f), how a solution to a problem is accomplished). Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Independent claim 19 is directed to a system at Step One, which is a statutory category. Claim 19 recites similar limitations as claim 1 and is rejected for the same reasons at Step 2A, Prong One; Step 2A, Prong Two; and Step 2B. Thus, the claim is ineligible.
Dependent claims 3-4 are not directed to any additional claim elements. Rather, these claims offer further descriptive limitations of elements found in the independent claims and addressed above, such as: wherein the distance is selected from: a Euclidean distance; or a Cosine distance; and wherein the model provides historical sentences. These limitations are similar to the abstract idea noted in the independent claim because they further limit the limitations of the independent claim, which are directed to “mathematical concepts,” which include “mathematical calculations.” In addition, no additional elements are integrated into the abstract idea. Therefore, the claims still recite an abstract idea that can be grouped into mathematical concepts.
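For illustration only, the two recited distance options could be sketched as follows (function names are assumptions; the claims merely select between the two metrics):

```python
import math

def euclidean_distance(u, v):
    # Straight-line distance between two embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    # One minus cosine similarity: 0 for parallel vectors, 2 for opposite
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)
```

Cosine distance depends only on vector direction, while Euclidean distance also reflects magnitude, which is why embedding comparisons commonly offer both.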
Dependent claims 5-6 are directed to an additional element: an open-source artificial intelligence (e.g., a Generative Pre-trained Transformer 2 (GPT2)). The GPT2 is merely used to implement the next-sentence-prediction model (Paragraph 0157). In this case, the claim does not provide any details about how the GPT2 operates, which results in “apply it” on a computer (MPEP 2106.05(f)) being applicable at both Step 2A, Prong Two, and Step 2B. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Dependent claims 7-8 and 13 are directed to additional descriptive limitations. These claims offer further descriptive limitations of the abstract idea mentioned above, such as: wherein the embedding of the predicted-sentences and the sentences is a learned vector representation for text; wherein the IFT is calculated for each channel based on historic interaction scores operated in this channel in a preconfigured period; and wherein the transcript is a transcript of a voice interaction or a text interaction. These limitations are still considered to be abstract ideas because they are directed to “mathematical concepts,” which include “mathematical calculations.” The additional limitation of the Natural Language Processing (NLP) is merely used to describe wherein the embedding of the predicted-sentences and the sentences is a learned vector representation for text (Paragraph 0158). In this case, the claim does not provide any details about how the NLP operates or how the NLP learns the representation for text, which results in “apply it” on a computer (MPEP 2106.05(f)) being applicable at both Step 2A, Prong Two, and Step 2B. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, nothing in the claim adds significantly more to the abstract idea. The claims are ineligible.
Dependent claims 10-12 are directed to additional elements: a user interface and a Quality Management (QM) platform. The user interface is merely used to: present segments of the interaction, wherein each segment is presented with the calculated IFS; and present a visualization of 'neutral' segments and 'negative' segments, wherein the application is a supervised dashboard (Paragraphs 0017-0018). The quality management platform is merely used to distribute the interaction for evaluation and send the transcript of the interaction to an application (Paragraphs 0017-0018). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)) being applicable at both Step 2A, Prong Two, and Step 2B. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, instructions to display and/or arrange information in a graphical user interface may not be sufficient to show an improvement in computer functionality (MPEP 2106.05(a)). Thus, nothing in the claim adds significantly more to the abstract idea. The claims are ineligible.
Dependent claim 14 is not directed to any additional claim elements. Rather, this claim offers further descriptive limitations of elements found in the independent claims and addressed above, such as: wherein the IFS calculation module is a microservice having one or more instances thereof that operate in parallel. These limitations are still considered to be abstract ideas because they are directed to “mathematical concepts,” which include “mathematical calculations.” The additional step of “processing instances in parallel” merely accelerates the processing of tasks, but the increased speed comes solely from the capabilities of a general-purpose computer (see MPEP 2106.05(a)). Merely stating that the step is performed by a computer component results in “apply it” on a computer (MPEP 2106.05(f)) being applicable at both Step 2A, Prong Two, and Step 2B. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Thus, nothing in the claim adds significantly more to the abstract idea. The claim is ineligible.
Dependent claims 15-18 are not directed to any additional claim elements. Rather, these claims offer further descriptive limitations of elements found in the independent claims and addressed above, such as: wherein the friction datastore comprises two parts: a cache for ongoing interactions and a database to store the classification of the interactions as 'neutral' or 'negative' and the IFS score of interactions classified as 'negative'; calculating an agent-IFS based on preconfigured interactions from the database in the friction datastore in a preconfigured time for each agent, wherein the agent-IFS of the agent is used to categorize the agent based on one or more agent-preconfigured-thresholds; wherein the agent categorization is used in a Workforce Management system when generating shift-schedules; and wherein the agent categorization is selected from: (i) 'low'; (ii) 'medium'; and (iii) 'high'. These limitations are still considered to be abstract ideas because they are directed to “certain methods of organizing human activity,” which include “managing interactions between people.” The additional functions of the datastore merely describe wherein the datastore comprises two parts. In this case, the first part is merely used to evaluate ongoing interactions and the second part is merely used to store classifications. However, using a datastore is considered “field of use” (MPEP 2106.05(h)) at Step 2A, Prong Two, since the database is not improved and the data is just placed there. At Step 2B, this remains a conventional computer function of storing information in a memory (see MPEP 2106.05(d)). Thus, nothing in the claim adds significantly more to the abstract idea. The claims are ineligible.
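For illustration only, the recited categorization of an agent-IFS against the agent-preconfigured-thresholds could be sketched as follows; the threshold values are assumptions, as the claims leave them to configuration:

```python
def categorize_agent(agent_ifs, thresholds=(0.33, 0.66)):
    """Map an agent-IFS to the recited 'low'/'medium'/'high' categories.

    The cut-off values are hypothetical; the claims recite only that one
    or more agent-preconfigured-thresholds are used.
    """
    low_cut, high_cut = thresholds
    if agent_ifs < low_cut:
        return 'low'
    if agent_ifs < high_cut:
        return 'medium'
    return 'high'
```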
Potential Allowable Subject Matter
The closest prior art is Cattaneo et al. (US 2024/0211960 A1). Cattaneo et al. discloses a computerized-method for calculating a level of friction within a customer and agent interaction, for quality improvement thereof, in a multichannel contact center, said computerized-method comprising (Paragraph 0003, Aspects of the present disclosure relates to automatically evaluating an agent-customer interaction utilizing aspects of machine learning to score the quality of the interaction; Paragraph 0019, For example, the agent-customer interaction may occur as a voice call where the customer and agent are talking to each other, as a textual record (e.g., a transcript, an instant messaging chat in a chat window, an email exchange, etc.), as a combination of both voice and text, as a video conference where the agent and customer may see and speak to each other, and/or by some other means where the agent and customer may interact with each other over the network 150; Paragraph 0021, The dimension scoring engine 124 analyzes the content based on one or more dimension metrics to generate a dimension score for each of one or more dimensions considered. A dimension is a perspective from which the content may be analyzed and the agent-customer interaction may be evaluated. As such, a dimension encapsulates an aspect of the agent-customer interaction that is integral to evaluating the quality of an agent's performance. There could be many different dimensions included in the evaluation. Examples of dimensions include fluency, relevance, appropriateness, informativeness, assurance, responsiveness, empathy, compliance, and/or sentiment among many others which could be selected based on business and/or industry preferences. Each dimension is further broken down into one or more metrics which can be used to generate the relevant dimension score.
A metric is a sub-component of a dimension further relating to how the dimension relates to the quality of the interaction; Examiner interprets the “quality score” as the “interaction friction score” since the score is based on metrics that take into consideration friction between the customer and the agent, i.e., abrupt verbal expressions, sentiment, and/or one person interrupting the other);
in a computerized system comprising one or more processors, a friction datastore and a database of interactions transcripts and metadata (see Figure 1 and related text in Paragraph 0020, In some embodiments, in order to determine an interaction quality score one or more of the dimension scoring engine 124, the conversation score module 126, service score model 128, and interaction quality score module 130 may need to analyze a text-based transcript labeled for speaker. In some embodiments, the content processor 122 may utilize a speech recognition engine, one or more a large language models, natural language processing, and/or other machine learning methods to identify utterances from a piece of content. The transcript remains connected to the piece of content where the content is the source document for the transcript. Once generated by the content processor 122, the transcript may be stored on data store 108. A transcript is a text-based record based on an agent-customer interaction which includes the spoken and written utterances of both the customer and agent labeled for who made the utterance, including a time stamp for when the utterance occurred. An utterance is an expression of something in speech or text from an individual which may occur as a statement, sentence, or any other segment of speech or text of varying length. The utterance does not need to be a complete sentence or thought. Abrupt verbal expressions, one person interrupting the other, slang and other colloquialisms may be considered utterances and be included in the transcript);
and a memory to store the plurality of databases, said one or more processors are configured to operate, for each interaction between the customer and the agent, in each channel, an Interaction Friction Score (IFS) calculation module (Paragraph 0076, According to an embodiment of the present disclosure, a system is disclosed comprising at least one processor, and memory storing instructions that, when executed by the at least one processor, causes the system to perform a set of operations, the set of operations comprising receive a piece of content, wherein the piece of content is a record of an interaction between an agent and a customer, pre-process the piece of content into a labeled text-based transcript, receive one or more dimensions to utilize in determining an interaction quality score, wherein a dimension is comprised of one or more metrics, and determine an interaction quality score; Examiner interprets the “quality score” as the “interaction friction score” since the score is based on metrics that take into consideration friction between the customer and the agent, i.e., abrupt verbal expressions, sentiment, and/or one person interrupting the other), said IFS calculation module comprising: retrieving a transcript and interaction metadata of the interaction between the customer and the agent from the friction datastore and the database of interactions transcripts and metadata, wherein the transcript includes 'N' sentences (see Figure 1 and related text in Paragraph 0020, The transcript remains connected to the piece of content where the content is the source document for the transcript. Once generated by the content processor 122, the transcript may be stored on data store 108. A transcript is a text-based record based on an agent-customer interaction which includes the spoken and written utterances of both the customer and agent labeled for who made the utterance, including a time stamp for when the utterance occurred. 
An utterance is an expression of something in speech or text from an individual which may occur as a statement, sentence, or any other segment of speech or text of varying length. The utterance does not need to be a complete sentence or thought. Abrupt verbal expressions, one person interrupting the other, slang and other colloquialisms may be considered utterances and be included in the transcript), calculating a friction score for sentenceₙ of each sentence by a sentence-score module, said sentence-score module comprising: (i) receiving a sentenceₙ, a person … vector of sentenceₙ, and interaction … offset of the sentenceₙ, a sentenceₙ₋₁, a person … vector of sentenceₙ₋₁, and interaction … offset of the sentenceₙ₋₁; (ii) calculating a friction-score for the sentence by: providing the person … vector of sentenceₙ and the person … vector of sentenceₙ₋₁ to a Natural Language Processing (NLP) [prediction] model to yield … vector prediction of sentenceₙ; … (Paragraph 0028, The relevance dimension analyzes if the topic or topics of the interaction are relevant to the issue or issues presented by the customer. The dimension scoring engine 124 may use one or more ADMs to analyze the interaction in a turn-by-turn basis and determine the relevancy of the agent's responses for the customer's queries or statements. Some metrics that may be considered using FED and GRADE. Of the 18 measurements generated from a FED analysis, the relevance score is used by the dimension scoring engine 124. The relevance score is generated using the difference in below positive and negative next sentence prediction likelihood. A positive next sentence utterance would be “that's what I meant” and “you have understood what I asked”. A negative next sentence utterance would be “that's not what I meant”, “that's not even related to what I said”, “don't change the topic”, and “why are you changing the topic”.
GRADE may be used in an interaction where an agent query is followed by a customer response to evaluate the relevance of the agent's next response to the round of conversation. For each round of agent query followed by a customer response a score would be generated. Once all scores are generated for each round a combined value of all rounds is calculated. In some embodiments, an average value for all rounds is calculated; Paragraph 0056, At operation 302, the content may be converted into a text-based transcript by a content processor (e.g., content processor 122). The content processor may use one or more natural language processing tools and/or other machine learning methods to review the piece of content and convert it into a text-based transcript; Paragraph 0057, The content processor may use one or more of a speech recognition engine, one or more a large language models, natural language processing, and/or other machine learning methods to identify utterances from the transcript. In embodiments where the interaction includes more participants than a single agent and a single customer (e.g., multiple agents, a supervisor, etc.) then the content processor will identify utterances associated with each participant; Examiner notes that Cattaneo et al. predicts whether next statement is going to be positive or negative); calculating an IFS of the interaction between the customer and the agent by formula …: whereby: Sn is the friction score for sentence n (Paragraph 0021, The dimension scoring engine 124 analyzes the content based on one or more dimension metrics to generate a dimension score for each of one or more dimensions considered. A dimension is a perspective from which the content may be analyzed and the agent-customer interaction may be evaluated. As such, a dimension encapsulates an aspect of the agent-customer interaction that is integral to evaluating the quality of an agent's performance. 
There could be many different dimensions included in the evaluation. Examples of dimensions include fluency, relevance, appropriateness, informativeness, assurance, responsiveness, empathy, compliance, and/or sentiment among many others which could be selected based on business and/or industry preferences), … parameter that represents a value that is attributed to a high level of friction (Paragraph 0029, In some embodiments, a weighted penalty will be applied to the appropriateness dimension score if the agent uses one or more swear words during the interaction);
and forwarding each interaction between the customer and the agent having a calculated IFS above a calculated Interaction Friction Threshold (IFT) for an intervention, wherein the forwarding comprises automatically initiating a system-level control action during the interaction based on the calculated IFS exceeding the IFT (Paragraph 0016, Ultimately, the interaction quality score provides assurances to the business that interaction quality is high across all agents, and that any agents with instances of low-quality interaction will be identified and addressed through coaching and additional training; Paragraph 0029, The swear words metric is a measure of if the agent uttered any swear words during the interaction. In some embodiments, there could be a target value which is desired for the swear words, that may be set at a value greater than, equal to, and/or less than zero. In instances, a response could be scored off this with either a positive or negative result based on the comparison. For example, if the target value was set for zero swear words, a value above zero could be worse (e.g., one, two, or more swear words), and a value of zero could be better. In some embodiments, a weighted penalty will be applied to the appropriateness dimension score if the agent uses one or more swear words during the interaction; Paragraph 0053, At operation 210, the interaction quality score may be reported to a supervisor device (e.g., supervisor device 106) and/or agent device (e.g., agent device 104) by a scoring engine (e.g., scoring engine 120). The interaction quality score may be reported with information relating to how the score was generated as well as recommendations for additional training to improve agent performance, as required. 
Operation 210 is an optional step, as indicated by the dashed box), wherein the intervention comprising having a user intervene … when the IFS module is operated in real-time (Paragraph 0003, In some embodiments, one or more machine learning models are utilized to generate an interaction quality score which is a comprehensive evaluation of agent performance during the interaction; Paragraph 0016, Ultimately, the interaction quality score provides assurances to the business that interaction quality is high across all agents, and that any agents with instances of low-quality interaction will be identified and addressed through coaching and additional training; Paragraph 0039, In further embodiments, the interaction quality score is determined using a ranking system to rank the scores, and/or by using a threshold-based system to combine scores above a certain threshold. The scoring engine 120 may store one or more of the metric scores, dimension scores, conversation score, service score, and/or interaction quality score in the data store 108. The scoring engine 120 may generate a report detailing the dimensions utilized and scoring process performed to generate the interaction quality score. The report may be stored on data store 108 and/or sent to one or both of the supervisor device 106 and agent device 104 for review; Examiner interprets “intervene” as the “supervisor providing additional coaching and training”).
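To illustrate the threshold comparison recited in the forwarding limitation discussed above, the following is a minimal Python sketch. The function and parameter names are hypothetical and do not appear in the record; it shows only the claimed comparison of a calculated IFS against a calculated IFT to trigger a system-level control action.

```python
# Illustrative sketch (hypothetical names, not from the record): forwarding an
# interaction for intervention when its Interaction Friction Score (IFS)
# exceeds a calculated Interaction Friction Threshold (IFT).

def forward_if_frictional(interaction_id: str, ifs: float, ift: float) -> bool:
    """Return True (and initiate a system-level control action) when IFS > IFT."""
    if ifs > ift:
        # In a real-time deployment this step would notify a supervisor device
        # so a user can intervene in the ongoing interaction.
        print(f"Interaction {interaction_id}: IFS {ifs:.2f} > IFT {ift:.2f} -> forwarded")
        return True
    return False
```

As a usage example, an interaction scored at 0.8 against a threshold of 0.5 would be forwarded, while one scored at 0.3 would not.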
Although Cattaneo et al. discloses using a trained NLP model for calculating a friction score for the sentence (e.g., negative/positive next sentence prediction likelihood), Cattaneo et al. does not specifically disclose wherein the NLP is an NLP Turn-Talking model.
Roddy (Roddy, M., Skantze, G. and Harte, N., 2018. Investigating speech features for continuous turn-taking prediction using LSTMs. arXiv preprint arXiv:1806.11461) discloses (i) receiving a sentencen, a person one-hot vector of sentencen, and interaction time-offset of the sentencen, a sentencen-1, a person one-hot vector of sentencen-1, and interaction time-offset of the sentencen-1; (ii) calculating a friction… for the sentence by: providing the person one-hot vector of sentencen and the person one-hot vector of sentencen-1 to a Natural Language Processing (NLP) Turn-Talking model to yield probability vector prediction of sentencen; (iii) calculating a vector-distance between a provided probability vector prediction of sentencen and the person one-hot vector of sentencen; (iv) providing the sentencen-1 to a next-sentence-prediction model to yield a predicted-sentencen; (v) embedding the predicted-sentencen and sentencen using an NLP-embedding-engine module to yield a predicted-sentencen embedding and a sentencen embedding; (vi) calculating a distance between the yielded predicted-sentencen embedding and the yielded sentencen embedding; (vii) providing the interaction time-offset of the sentencen-1 and interaction time-offset of the sentencen to normalized-relative-offset-module to calculate a relative offset between sentencen and sentencen-1 … (Figure 2, Decisions at overlap; Page 1, Abstract, Traditional end-of-turn models, where decisions are made at utterance end-points, are limited in their ability to model fast turn-switches and overlap. A more flexible approach is to model turn-taking in a continuous manner using RNNs, where the system predicts speech probability scores for discrete frames within a future window. The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection; Pages 1-2, 2.1. Model Overview, Fig.
1 shows how LSTM networks are applied to make continuous turn-taking predictions. The main objective is to predict the future speech activity annotations of one of the speakers in a dyadic conversation using input speech features from both speakers (S0, S1). At each frame (n) of size 50ms, speech features are extracted and used to predict the future speech activity of one of the speakers. The future speech activity is a 3 second window comprising of 60 frames of the binary annotations for frames n + 1 to n + 60. The output layer of the network uses a sigmoid activation to predict a probability score for the target speaker’s speech activity at each future frame. The network uses a single LSTM layer with a variable number of hidden nodes. The features are concatenated into a single feature vector, with the exception of the linguistic features which use an embedding layer that is discussed in section 3.2. Each conversation in our data is used twice, with the positions of S0 and S1 swapped. The networks were trained to minimize binary cross entropy (BCE) loss; Examiner interprets “overlapping of a conversation” as the “interaction time-offset of the sentencen to normalized-relative-offset-module to calculate a relative offset between sentencen and sentencen-1” since overlapping is merely the difference between predicted speech and actual speech).
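The continuous turn-taking prediction Roddy describes can be sketched at a high level as follows. This toy substitutes a single fixed linear layer with a sigmoid for the trained LSTM, and all names are illustrative; it shows only the cited structure of mapping per-frame features from both speakers to one probability per future 50 ms frame in a 3-second (60-frame) window.

```python
import math

# Toy sketch (not Roddy's actual network): continuous turn-taking prediction.
# At each 50 ms frame, features from both speakers (S0, S1) are mapped to a
# probability of the target speaker being active at each of the next 60 frames.
# A trained LSTM with BCE loss is replaced here by a fixed linear layer.

FUTURE_FRAMES = 60  # 3-second prediction window of 50 ms frames


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_speech_activity(features: list[float],
                            weights: list[list[float]]) -> list[float]:
    """Return one speech-activity probability per future frame.

    `features` is the concatenated feature vector for both speakers at the
    current frame; `weights` holds one weight vector per future frame.
    """
    return [
        sigmoid(sum(w * f for w, f in zip(frame_weights, features)))
        for frame_weights in weights
    ]
```

With zero-sum inputs each output probability is 0.5, reflecting maximal uncertainty about the target speaker's future activity.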
Although the combination of Cattaneo et al. and Roddy discloses using a trained (NLP) Turn-Talking model for calculating a friction score for the sentence (e.g., friction may be interpreted as: a negative sentiment such as angry or upset; and/or overlapping in a conversation such as speaking for a longer time than predicted), the combination of Cattaneo et al. and Roddy does not specifically disclose wherein the friction score is calculated using Formula I:
[Formula I: image media_image3.png (519 × 61, greyscale)].
Selfridge (Selfridge, E.O., 2013. Importance-Driven Turn-taking for Spoken Dialogue Systems (Doctoral dissertation, Oregon Health & Science University)) discloses …, a … parameter that represents a value that is attributed to a high level of friction (Page 21, 2.6 Overlapping Turn-Taking Approaches, In contrast to smooth turn-taking methods that penalize overlap, some approaches have been created with overlapping behavior in mind. While these approaches are certainly more flexible than those previously described, results have been mixed. We speculate that this is due to a lack of underlying motivation, which is the primary subject of this thesis).
Haikin et al. (US 2024/0211701 A1) discloses …, α is a hyper-parameter that represents a value that is attributed to a [score] (Paragraph 0098, A language model is often used during the iterative process, and for each beam that ends with a space, a language-model score may be computed for the n-gram represented by the beam. This score may be added to the beam score multiplied by some hyper-parameter alpha (i.e., a normalization factor). In some embodiments, a sentence length penalty hyper-parameter may also be added).
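The role of the alpha hyper-parameter described by Haikin et al. can be sketched as follows, under the assumption that the language-model score scaled by alpha is added to the beam score; the names here are illustrative and not taken from the reference.

```python
# Minimal sketch (illustrative names): combining a beam score with a
# language-model score scaled by the hyper-parameter alpha, with an optional
# sentence-length penalty term, as Haikin et al. Paragraph 0098 describes.

def combined_beam_score(beam_score: float, lm_score: float, alpha: float,
                        length_penalty: float = 0.0) -> float:
    """Return the beam score plus alpha times the LM score plus any penalty."""
    return beam_score + alpha * lm_score + length_penalty
```

For example, a beam score of 1.0 with an LM score of 2.0 and alpha of 0.5 yields a combined score of 2.0; alpha thus acts as a normalization factor weighting the language model's contribution.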
However, the cited art, alone or in any combination, fails to teach or suggest at least: a computerized-method for calculating a level of friction within a customer and agent interaction, for quality improvement thereof, in a multichannel contact center, said computerized-method comprising: in a computerized system comprising one or more processors, a friction datastore and a database of interactions transcripts and metadata; and a memory to store the plurality of databases, said one or more processors are configured to operate, for each interaction between the customer and the agent, in each channel, an Interaction Friction Score (IFS) calculation module, said IFS calculation module comprising: retrieving a transcript and interaction metadata of the interaction between the customer and the agent from the friction datastore and the database of interactions transcripts and metadata, wherein the transcript includes 'N' sentences, calculating a friction score for sentence n of each sentence by a sentence-score module, said sentence-score module comprising: (i) receiving a sentencen, a person one-hot vector of sentencen, and interaction time-offset of the sentencen, a sentencen-1, a person one-hot vector of sentencen-1, and interaction time-offset of the sentencen-1; (ii) calculating a friction-score for the sentence by: providing the person one-hot vector of sentencen and the person one-hot vector of sentencen-1 to a Natural Language Processing (NLP) Turn-Talking model to yield probability vector prediction of sentencen; (iii) calculating a vector-distance between a provided probability vector prediction of sentencen and the person one-hot vector of sentencen; (iv) providing the sentencen-1 to a next-sentence-prediction model to yield a predicted-sentencen; (v) embedding the predicted-sentencen and sentencen using an NLP-embedding-engine module to yield a predicted-sentencen embedding and a sentencen embedding; (vi) calculating a distance between the yielded predicted-sentencen embedding and the yielded sentencen embedding; (vii) providing the interaction time-offset of the sentencen-1 and interaction time-offset of the sentencen to normalized-relative-offset-module to calculate a relative offset between sentencen and sentencen-1 by dividing a difference between sentencen and sentencen-1 by time-offset of sentencen; and (viii) calculating a weighted average of the vector-distance, distance and the relative offset, wherein the weighted average has a value between '0' and '1', and wherein the weighted average is the Sn of the sentencen, and wherein the NLP Turn-Talking model and the next-sentence-prediction model are trained on sentence samples which are classified as 'neutral', the NLP Turn-Talking model is trained by using conversations to statistically model the relationship between conversational speakers and the next-sentence-prediction model is trained by using conversations to statistically model the relationship between conversational sentences, calculating an IFS of the interaction between the customer and the agent by Formula I:
[Formula I: image media_image3.png (519 × 61, greyscale)]
whereby: Sn is the friction score for sentence n, α is a hyper-parameter that represents a value that is attributed to a high level of friction; and forwarding each interaction between the customer and the agent having a calculated IFS above a calculated Interaction Friction Threshold (IFT) for an intervention, wherein the forwarding comprises automatically initiating a system-level control action during the interaction based on the calculated IFS exceeding the IFT, wherein the intervention comprising having a user intervene the interaction when the IFS module is operated in real-time.
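A minimal Python sketch of steps (iii) and (vi)-(viii) of the sentence-score computation recited above follows. The helper names are hypothetical, the NLP Turn-Talking and next-sentence-prediction models are stubbed out (their outputs are taken as inputs), and the equal default weights are an assumption, since the claim does not fix them.

```python
import math

# Illustrative sketch (hypothetical names) of the claimed sentence-score steps:
# (iii)/(vi) distances between vectors, (vii) the normalized relative offset,
# and (viii) the weighted average Sn in [0, 1]. Model outputs are stubbed.

def euclidean(a: list[float], b: list[float]) -> float:
    """Distance between two equal-length vectors (steps (iii) and (vi))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_offset(offset_prev: float, offset_curr: float) -> float:
    """Step (vii): difference between the two time-offsets, divided by the
    time-offset of the current sentence."""
    return (offset_curr - offset_prev) / offset_curr


def sentence_friction_score(turn_distance: float, embed_distance: float,
                            rel_offset: float,
                            weights: tuple[float, float, float] = (1/3, 1/3, 1/3)) -> float:
    """Step (viii): weighted average of the three quantities, clamped to [0, 1].

    The result is Sn, the per-sentence friction score fed into Formula I.
    """
    s = sum(w * v for w, v in zip(weights, (turn_distance, embed_distance, rel_offset)))
    return min(max(s, 0.0), 1.0)
```

For instance, distances of 0.6 and 0.3 with a relative offset of 0.3 and equal weights give Sn = 0.4; the clamp keeps Sn within the claimed '0' to '1' range even for unnormalized inputs.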
Nor does the remaining prior art of record remedy the deficiencies found in the cited prior art. Furthermore, neither the prior art, the nature of the problem, nor knowledge of a person having ordinary skill in the art provides for any predictable or reasonable rationale to combine prior art teachings.
Claim 19 recites similar limitations and therefore has Potential Allowable Subject Matter for the same reasons as claim 1. Claims 3-8 and 10-18 have Potential Allowable Subject Matter because of their dependency from independent claim 1.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Ouchi (Ouchi, H. and Tsuboi, Y., 2016, November. Addressee and response selection for multi-party conversation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2133-2143)) – discloses, for capturing multi-party conversational streams, jointly encoding who is speaking what at each time step. Each agent and its utterance are integrated into the hidden states of an RNN. Two multi-party modeling frameworks are presented: (i) static modeling and (ii) dynamic modeling, both of which jointly utilize agent and utterance representation for encoding multi-party conversation. What distinguishes the models is that while the agent representation in the static modeling framework is fixed, the one in the dynamic modeling framework changes along with each time step t in a conversation (see at least Page 2135, Multi-Party Encoder Models).
Martin et al. (US 2021/0342554 A1) – discloses to generate metadata for each participant (925). The participant metadata may represent aggregate values calculated based on the participant identifiers of the scoring units. Participant metadata structure 1015 of FIG. 10 is an example of the kind of participant metadata the system may generate (see at least Paragraph 0087).
Jin et al. (US 2022/0399006 A1) – discloses a recurrent neural network, such as the long short term memory (LSTM) recurrent neural network or the gated recurrent units (GRU) recurrent neural network, is used to generate a more accurate translation of the customer's spoken words based on the recurrent neural network predicting the probability of the next word in the sequence based on the words already observed in the sequence. In one embodiment, the recurrent neural network uses a distributed representation where different words with similar meanings have similar representation and uses a large context of recently observed words when making predictions (predicting the next word in sequence) (see at least Paragraph 0114).
Aoki et al. (US 8,463,600 B2) – discloses quantitative measures corresponding to the measurement of a particular “feature.” For example, one feature used by the turn-taking analysis is the amount of overlapping speech produced by speakers A and B over a specified time window (see Column 16, lines 30-34).
Chowdhury (Chowdhury, S.A., Stepanov, E.A. and Riccardi, G., 2016, September. Predicting User Satisfaction from Turn-Taking in Spoken Conversations. In Interspeech (pp. 2910-2914)) - discloses how the organizational structure of a conversation, such as turn-taking, contributes to the prediction of user satisfaction along with other more common levels of conversation description such as lexical and prosodic (see Page 2910, I. Introduction).
Salammagari (CA 3112204 A1) – discloses a method and apparatus for facilitating training of agents is disclosed. Raw transcripts representing textual form of interactions between the agents and customers of the enterprise are transformed to generate transformed transcripts. An interaction summary is generated in relation to each transformed transcript. A plurality of intent- based interaction clusters are derived using the interaction summary generated in relation to each transformed transcript. The plurality of interactions are classified based on the plurality of intent-based interaction clusters and an interaction flow map is generated for each intent-based interaction cluster based on the interactions classified into the respective intent-based interaction cluster. The generated interaction flow map is capable of facilitating training of agents for interacting with the customers of the enterprise (see at least Abstract).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARJORIE PUJOLS-CRUZ whose telephone number is (571)272-4668. The examiner can normally be reached Mon-Thur 7:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patricia H Munson can be reached at (571)270-5396. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.P./Examiner, Art Unit 3624
/HAMZEH OBAID/Primary Examiner, Art Unit 3624