Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Status of the Application
The following is a Final Office Action.
In response to the Examiner's communication of 4/9/2025, Applicant responded on 6/25/2025, amending claims 1, 12, and 17.
The IDS filed on 7/29/2025 is acknowledged and has been considered by the Examiner.
Claims 1-3, 7, 8, 10-13, 15, 17, 18 and 20-27 are pending in this application and have been examined.
Response to Amendment
Applicant's amendments to claims 1, 12, and 17 are sufficient to overcome the 35 U.S.C. § 101 rejections set forth in the previous action.
Applicant's amendments to claims 1, 12, and 17 are not sufficient to overcome the prior art rejections set forth in the previous action.
Response to Arguments – 35 USC § 101
Applicant’s arguments with respect to the rejections have been fully considered, but they are not persuasive. However, Applicant’s amendments to claims 1, 12, and 17 recite additional elements that, in combination, integrate the abstract idea into a practical application under Step 2A, Prong Two; therefore, the 35 U.S.C. § 101 rejection of claims 1-3, 7, 8, 10-13, 15, 17, 18 and 20-27 is hereby withdrawn.
The amendments present additional elements with a high degree of specificity that, when viewed as a whole, integrate any recited abstract idea into a practical application under Step 2A, Prong Two (see 84 Fed. Reg. 54-55, available at https://www.govinfo.gov/content/pkg/FR-2019-01-07/pdf/2018-28282.pdf). When viewed as an ordered combination, the specifically recited combination of additional elements applies any alleged abstract idea in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception. These additional elements, which are sufficient to integrate any abstract idea into a practical application beyond generally linking the abstract idea to a particular technological environment, include at least the following elements recited in claims 1, 12, and 17:
receiving a set of raw data records associated with one or more user interactions of a specific user with one or more entities, wherein at least some of the one or more user interactions take place in a respective domain, the set of raw data records comprising textual information, audio information, and video information;
performing a first analysis of the set of raw user data records to analyze their content for at least one of sentiments, emotions and intent to generate a set of first analysis results, the first analysis comprising:
video analysis processing, the video analysis processing configured to perform at least one of facial recognition and facial sentiment analysis to build one or more facial expression recognition models associated with the specific user;
voice analysis processing, the voice analysis processing configured to use one or more Mel Frequency Cepstral Coefficients (MFCC) to extract one or more features from the audio information for analysis to detect emotions of one or more speakers whose voices are present in the audio information; and
text analysis processing, the text analysis processing configured to use natural language processing and one or more neural networks to analyze the textual information to determine at least one of sentiments, emotions, and intent in the textual information;
performing a second analysis on the set of raw data records, the second analysis configured to access the set of raw data records and to segment the set of raw data records to identify one or more attributes, including the respective domain, within the set of raw data records to generate a set of second analysis results;
training a convolution neural network (CNN) to generate a trained CNN model based on the set of first analysis results and the set of second analysis results, where the trained CNN model is configured to learn at least one of sentiments, emotions, and intent about the specific user, wherein training of the CNN is configured to build expertise in the trained CNN model that is specific to the specific user based on the respective domain;
performing, based on the set of first analysis results and on the set of second analysis results, a third analysis of the set of raw data records, the third analysis configured to use the trained CNN model to perform at least one of interpreting, summarizing and classifying, of information associated with the set of first analysis results and the set of second analysis results, to generate a set of processed data records that are stored in an encrypted, computer-readable format in a repository configured to accumulate a set of accumulated information about the specific user, wherein the repository is configured with one or more predetermined entities configured to accumulate information about the specific user, the predetermined entities comprising at least one of a general-purpose entity and a domain-specific entity, wherein the domain-specific entity accumulates at least one of the set of raw data records and the set of processed user data which are associated with the respective domain, wherein the set of raw data records, set of first analysis results, set of second analysis results, and set of processed data records all become part of the set of accumulated user information about the specific user;
performing a fourth analysis to determine, based on the set of accumulated user information and using the trained CNN model, at least one recommended action to assist the specific user;
generating an output signal on behalf of the specific user based on the fourth analysis, the output signal comprising information configured to automatically perform the at least one recommended action for the specific user without requiring intervention from the specific user; and
storing the at least one recommended action and information relating to the output signal to the specific user, in the encrypted, computer-readable format, as part of the set of accumulated user information about the specific user.
These additional elements describe a specific method of training a convolutional neural network by analyzing video, audio, and textual data using specific technologies such as MFCC, natural language processing, and neural networks, to perform an automatic action for a user, where the action is stored in an encrypted format for future aggregation with future convolutional neural network output results. The training of a specific convolutional neural network using these specific technologies is necessarily rooted in computing technology. These additional elements apply any abstract idea in a meaningful way that is beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception; thus, these additional elements integrate any recited abstract idea into a practical application under the second prong of Step 2A.
The 35 U.S.C. § 101 rejection is hereby withdrawn.
Response to Arguments – Prior Art
Applicant’s arguments with respect to the prior art rejections have been fully considered but are moot in light of the new grounds of rejection necessitated by Applicant’s amendments.
Claim Rejections – 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
Determining the scope and contents of the prior art.
Ascertaining the differences between the prior art and the claims at issue.
Resolving the level of ordinary skill in the pertinent art.
Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 7, 8, 10-13, 15, 17, 18 and 20-27 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Publication No. US 2020/0211532 A1 to Rule et al. (hereinafter referred to as “Rule”) in view of U.S. Patent Publication No. US 2021/0272040 A1 to Johnson et al. (hereinafter referred to as “Johnson”).
As per Claim 1, Rule teaches: (Currently Amended) A computer-implemented method, comprising:
receiving a set of raw data records associated with one or more user interactions of a specific user with one or more entities, wherein at least some of the one or more user interactions take place in a respective domain, the set of raw data records comprising textual information, audio information, and video information; (in at least [0032] a user with a smartphone may interact with the system via a software application. In some cases, the software application may transmit audio, video, text, and graphics data between the user and the system [0045] data handling module 110 may interact with a user via audio, video, graphical or text data using front end interface module 220. [0070] The data received by data handling module 110 and analyzed by data analysis module 130 may be submitted for processing to user profile module 120. User profile module 120 may store characteristics associated with the user and the user request in a computer readable storage medium. For example, user profile module 120 may transmit the data associated with the user to database 170 of server system 135. In some embodiments, user profile module 120 may check if a user profile exists for the user communicating with interaction system 105, and create the user profile if it does not exist. If the user profile is already established for the user communicating with interaction system 105, user profile module 120 may update the user profile. User profile module 120 may identify the user profile based on one or more meta-data characteristics associated with the user obtained from data handling module 110 communicating with user profile module 120. For example, user profile module 120 may compare the phone number of the user making the request with the phone numbers stored in database 170 to identify the user profile. 
Additionally, or alternatively, the user profile module may identify the user profile based on the user name, user identification numbers such as social security number, driver license, password, account number, and the like. [0071] User profile module 120 may be used to process and store data associated with the user request. In various embodiments, user profile module 120 may store stationary data 510 and a history of stationary data 512. Stationary data 510 may be meta-data for the user that does not change or slowly changes with time. For example, the stationary data may include personable identifiable data (e.g., social security number, driver license, user id, password, account number, user name, user phone number, user address, user IP address) as well as user age, accent, language, dialect, gender, personality type [0072] user profile module 120 may store variable data 520 and a history of variable data 522. Variable data 520 may be related to meta-data for the user that changes from one communication to another. Variable data 520 may include the type of request from the user, the location of the user, user device 103A-103C used for interaction with the system 105, the type of network 115 used for communication, the type of communication (e.g., audio, video, or text), the environment of the user during the communication with interaction system 105 [0075] the request type may include “lost or stolen credit card,” “opening checking account,” “checking account balance,” and/or the like. FIG. 5 shows that configuration of a VA by user profile module 120 may involve selecting from among a plurality of types of pre-configured VAs (for the same user) depending on the type of request associated with the user communication.)
performing a first analysis of the set of raw user data records to analyze their content for at least one of sentiments, emotions and intent to generate a set of first analysis results, the first analysis comprising: ([0045][0061][0069])
video analysis processing, the video analysis processing configured to perform at least one of facial recognition and facial sentiment analysis to build one or more facial expression recognition models associated with the specific user; (in at least [0066] audio/video/image module 424 may identify a gender of the user, an age of the user, a dialect of the user, an accent of the user speech, a tone of voice, an emotional content of the speech, or any other aspects that uniquely identify the user based on the audio video or image characteristics of the receiving data. In some embodiments, when the user is communicating with interaction system 105 using video data, audio/video/image module 424 may identify the characteristics of the user (e.g., age, gender, ethnicity, hand gestures, facial expressions, body movements, and the like) associated with the video streaming data. [0069] to process audio video and image data to extract characteristics associated with the user or with the user request. In some embodiments, audio/video/image module 424 may include neural networks, recurrent neural networks (RNN) or convolutional neural networks (CNN) to process the audio data to identify various characteristics of the user. In some embodiments, a specific computer model (CM) (e.g., recurrent neural network) may be trained to identify a specific feature of the audio, video or image data associated with the user request. For example, a first CM may be used to identify the age of the user, and a second CM may be used to identify the emotional content of the user. The first CM may, for example, take a sample of the audio data associated with the user request and output the most probable range for the age of the user, while the second CM may take a sample of the audio data associated with the user request and output a set of emotions that are most likely experienced by the user during the request. 
The set of emotions, for example, may be {“happy”, “content”, “patient”} or {“sad”, “impatient”, “aggressive”, “scared”}. In some embodiments, a CM may be trained to recognize human emotions using a combination of verbal expressions and facial expressions obtained from the video data associated with the user request. In some embodiments, a CM may be trained to recognize a dialect of a user. In various embodiments, multiple CMs may be employed by audio/video/image module 424 to obtain characteristics associated with the user request. [0111] FIG. 13 shows elements of a video call 1321 that can be analyzed by data analysis module 130. For example, data analysis module 130 may analyze expressions 1340A and 1340B. Expressions 1340A and 1340B may be associated with different parts of the face as shown in FIG. 13. Data analysis module 130 may analyze video and audio background of the video call. For example, data analysis module 130 may detect that the user is in an airport by analyzing the background (such as, for example, detecting an airplane 1342 in the background of the user). Data analysis module 130 may correlate the audio streaming data associated with user request 1310, the video data associated with the user video, and the visual response of the user to the answers provided by the VA of interaction system 105. The correlational data may be used to identify the satisfaction of the user during communication with interaction system 105.)
voice analysis processing, the voice analysis processing configured to use … to extract one or more features from the audio information for analysis to detect emotions of one or more speakers whose voices are present in the audio information; and (in at least [0066] audio/video/image module 424 may identify a gender of the user, an age of the user, a dialect of the user, an accent of the user speech, a tone of voice, an emotional content of the speech, or any other aspects that uniquely identify the user based on the audio video or image characteristics of the receiving data. [0069] to process audio video and image data to extract characteristics associated with the user or with the user request. In some embodiments, audio/video/image module 424 may include neural networks, recurrent neural networks (RNN) or convolutional neural networks (CNN) to process the audio data to identify various characteristics of the user. In some embodiments, a specific computer model (CM) (e.g., recurrent neural network) may be trained to identify a specific feature of the audio, video or image data associated with the user request. For example, a first CM may be used to identify the age of the user, and a second CM may be used to identify the emotional content of the user. The first CM may, for example, take a sample of the audio data associated with the user request and output the most probable range for the age of the user, while the second CM may take a sample of the audio data associated with the user request and output a set of emotions that are most likely experienced by the user during the request. The set of emotions, for example, may be {“happy”, “content”, “patient”} or {“sad”, “impatient”, “aggressive”, “scared”}. In some embodiments, a CM may be trained to recognize human emotions using a combination of verbal expressions and facial expressions obtained from the video data associated with the user request. 
In some embodiments, a CM may be trained to recognize a dialect of a user. In various embodiments, multiple CMs may be employed by audio/video/image module 424 to obtain characteristics associated with the user request.)
text analysis processing, the text analysis processing configured to use natural language processing and one or more neural networks to analyze the textual information to determine at least one of sentiments, emotions, and intent in the textual information; (in at least [0049] Transcription module 222 may process other user communication data. For example, transcription module 222 may obtain text data associated with the user communication and transmit the text data to data analysis module 130 via back end interface module 228 for analysis of text data. [0061] Comprehension module 422 may be used to analyze transcribed data associated with the user request. For example, the comprehension module may be used to analyze the type of request associated with the user communication. The system may, for example, analyze the user speech and determine the reason for the user's request. In some embodiments, the system may use natural language processing to determine the subject matter of the request. For example, the comprehension module may determine that the user request may be associated with the theft of a credit card, or that the user request is related to opening a bank account or completing a purchase transaction. The comprehension module may determine the user request by analyzing the key words found in transcribed text data. Additionally, or alternatively, the comprehension module may determine the user request by providing a user with questions, such as multiple-choice questions, using data handling system 110. In some embodiments, the comprehension system may attempt to determine the type of the user request based on natural language processing using various algorithms developed for natural language processing that may include regular expressions, artificial neural networks, n-gram language models, logic regression, vector semantics, part-of-speech tagging, recurrent neural networks, and/or the like. 
For example, the comprehension module may use key phrases to attempt to determine the type of the user request.)
performing a second analysis on the set of raw data records, the second analysis configured to access the set of raw data records and to segment the set of raw data records to identify one or more attributes, including the respective domain, within the set of raw data records to generate a set of second analysis results; (in at least [0046] front end interface module 220 may be configured to prompt the user to provide personable identifiable information such as a name of the user, an account number of the user, a telephone number of the user, a social security number of the user, a driver license of the user, or any other user-specific information. The user may provide information as audio data, video data, graphics data, text data, [0048] transcription module 222 may include special alphanumerical characters or symbols (or combinations of alphanumerical characters and/or symbols) in the transcribed text data indicating other characteristics of a user speech. For example, similar to musical notes, the characters or symbols may record the sound volume of the user speech, as well as pitch and tempo of the speech, and related characters and/or symbols that are used to record music. In some embodiments, transcribed text data may contain text characters, words, and phrases containing “tags.” In various embodiments, the tags may include special alphanumerical characters or symbols (or combinations of alphanumerical characters and/or symbols) indicating other characteristics of a user speech [0065] comprehension module 422 may be configured to process transcribed text data to extract characteristics associated with the user or with the user request. In some embodiments, comprehension module 422 may include neural networks, recurrent neural networks (RNN) or convolutional neural networks (CNN) to process the text data to identify various characteristics of the user. 
In some embodiments, a specific computer model (CM) (e.g., recurrent neural network) may be trained to identify a specific feature of the text data associated with the user request. [0075] the request type may include “lost or stolen credit card,” “opening checking account,” “checking account balance,” and/or the like. FIG. 5 shows that configuration of a VA by user profile module 120 may involve selecting from among a plurality of types of pre-configured VAs (for the same user) depending on the type of request associated with the user communication. For example, FIG. 5 shows that for the type of request “Type A”, user profile module 120 may select either VAA1 or VAA2. The choice of VAA1 or VAA2 may be decided based on other data associated with the user request, such as other variable data 520, or streaming data 530. Similarly, FIG. 5 shows that for the type of request “Type B”, user profile module 120 may select either VAB1 or VA2B, and for the type of request :Type C″ user profile module 120 may select VAC. It should be noted, that describe embodiments for selecting or configuring VAs based on the user profile data are only illustrative and other embodiments may be considered. For example, VA may be selected and configured based on stationary data 510 without regard to the type of request associated with the user communication. [0079] data analysis module 130 may evaluate satisfaction of the user and may indicate that the user is not satisfied with a VA. The term “user satisfaction” refers to one of the characteristics of the emotional state of the user. Data analysis module 130 may, in real-time, dynamically communicate to virtual assistant module 140 to change attributes of a VA. In some embodiments, virtual assistant module 140 may be instructed to mirror the rate of speech of the user, or the cadence of the speech. In some embodiments, the cadence of the speech can be selected to reduce user dissatisfaction. 
The term “user dissatisfaction” refers to lack of user satisfaction. [0104] VA attributes may be selected depending on the topic assigned for discussion for a given VA. For example, in a video or audio conference regarding middle school education curriculum, the first VA may have a female voice and discuss the math curriculum, and the second VA may have a male voice and discuss music curriculum. In some embodiments, a live agent (for example a principle) for the middle school may exercise control over the VAs such as control over their topic of presentation as well as VAs attributes.)
training a convolution neural network (CNN) to generate a trained CNN model based on the set of first analysis results and the set of second analysis results, where the trained CNN model is configured to learn at least one of sentiments, emotions, and intent about the specific user, wherein training of the CNN is configured to build expertise in the trained CNN model that is specific to the specific user based on the respective domain; (in at least [0065] comprehension module 422 may be configured to process transcribed text data to extract characteristics associated with the user or with the user request. In some embodiments, comprehension module 422 may include neural networks, recurrent neural networks (RNN) or convolutional neural networks (CNN) to process the text data to identify various characteristics of the user. In some embodiments, a specific computer model (CM) (e.g., recurrent neural network) may be trained to identify a specific feature of the text data associated with the user request [0069] audio/video/image module 424 may include neural networks, recurrent neural networks (RNN) or convolutional neural networks (CNN) to process the audio data to identify various characteristics of the user. In some embodiments, a specific computer model (CM) (e.g., recurrent neural network) may be trained to identify a specific feature of the audio, video or image data associated with the user request. For example, a first CM may be used to identify the age of the user, and a second CM may be used to identify the emotional content of the user. The first CM may, for example, take a sample of the audio data associated with the user request and output the most probable range for the age of the user, while the second CM may take a sample of the audio data associated with the user request and output a set of emotions that are most likely experienced by the user during the request. 
The set of emotions, for example, may be {“happy”, “content”, “patient”} or {“sad”, “impatient”, “aggressive”, “scared”}. In some embodiments, a CM may be trained to recognize human emotions using a combination of verbal expressions and facial expressions obtained from the video data associated with the user request. In some embodiments, a CM may be trained to recognize a dialect of a user. In various embodiments, multiple CMs may be employed by audio/video/image module 424 to obtain characteristics associated with the user request. [0080] data analysis module 130 may assign a satisfaction score, or value, to the user satisfaction based on a function that maps the user speech characteristics pertaining to the user emotional state to numerical values. Such function may, for example, include a neural network, recurrent neural network or convolutional neural network. In an illustrative embodiment, the positive satisfaction values of this function may correspond to the user satisfaction and the negative satisfaction values of this function may correspond to the user dissatisfaction. [0083] FIG. 8 is a flowchart of an illustrative process 800 for configuring a VA for a user based, at least in part, on previous user interactions with interaction system 105. In step 811A, the user may interact with interaction system 105 for example, for the first time. The data associated with interaction history (interaction history data A) may be recorded via user profile module 120 in database 170 of server system 135. In various embodiments, the interaction history data A may include the data related to attributes of the VA (VAA) used for interaction with the user in step 811A as well as changes in the attributes schematically described by graphs (also referred to as timetables in some cases) 801A. For example, the changes in the attributes may be related to the changes in runtime modifiable attributes 620 such as sound volume, a tone of voice, or emotional content of the VAA. 
The changes in the runtime modifiable attributes are schematically graphed as a function of time of communication in 801, and can be stored as a part of interaction history data A. The user may interact with interaction system 105 in steps 811B-811M and the data associated with interaction history (interaction history data B through M and related graphs 801A-801M) may be recorded via user profile module 120. In an example embodiment, the user may interact with interaction system 105 in steps 811N and interaction system 105 may configure VAN for the interaction in step 811N, based on the audio, video, graphics and text data received in step 811N during the user communication with interaction system 105 as well as interaction history data A-M and associated VAA-VAM. [0084] in step 818, data analysis module 130 may evaluate if the attributes for the VAN need to be updated. Data analysis module 130 may compare the user interaction data with the interaction history data A-M and configure attributes for VAN that match the attributes of one of the VAA-VAM, related to the interaction history data A-M that closely matches the data associated with the user interaction data in step 811N. In an example embodiment, data analysis module 130 may further compare the user interaction data with the interaction history data A-M and select the timetable for changes 801N in runtime modifiable attributes of the VAN that matches one of the timetables 801A-801M that is related to one of the interaction history data A-M that closely matches the interaction data associated with user communication in step 811N. It should be noted, that the process of selecting attributes for the VA described above in step 818 is only illustrative and other approaches for selecting attributes for the VA may be considered. 
For example, some of the runtime modifiable attributes of the VAN may be obtained as a weighted average of the attributes of VAA-VAM where the weights WA-WM may be related to a “measure function” between the data associated with user interaction in step 811N and interaction history data A-M. [0085] FIG. 8 shows that in step 818, if VAN needs to be updated (step 818, Yes), process 800 proceeds to step 820 of updating VAN. After completing the update in step 820, process 800 proceeds to step 811N for continuing interacting with the user and acquiring user characteristics related to the user communication with interaction system 105. [0092] system 105 may chose to update VA attributes to result in an increased running average satisfaction score for the user. In some embodiments, system 105 may analyzes changes in average values of user characteristics based on historic data for one or more users, and select an appropriate update to VA attributes according to historic updates to VA attributes that led to improved results, such as, for example, increased running average satisfaction score for a user.)
performing, based on the set of first analysis results and on the set of second analysis results, a third analysis of the set of raw data records, the third analysis configured to use the trained CNN model to perform at least one of interpreting, summarizing and classifying, of information associated with the set of first analysis results and the set of second analysis results, to generate a set of processed data records that are stored in an encrypted, computer-readable format in a repository configured to accumulate a set of accumulated information about the specific user, wherein the repository is configured with one or more predetermined entities configured to accumulate information about the specific user, the predetermined entities comprising at least one of a general-purpose entity and a domain-specific entity, wherein the domain-specific entity accumulates at least one of the set of raw data records and the set of processed user data which are associated with the respective domain, wherein the set of raw data records, set of first analysis results, set of second analysis results, and set of processed data records all become part of the set of accumulated user information about the specific user; (in at least [0045] front end interface module 220 may initially use a generic VA for interaction with the user prior to collecting additional information from the user. The term “generic VA” refers to a default VA that may be configured based on expected preferences for an average user. For example, the default VA may have a female voice and communicate using American English. In some embodiments, for cases when the user has an associated user profile with an associated user VA, front end interface module 220 may configure an associated user VA for initial interaction with the user. [0050] Transcription module 222 may transmit the transcribed data to server system 135 and store both audio and transcribed data in database 170. 
In various embodiments, the transcribed data may be parsed by a parsing module 224 shown in FIG. 2. The transcribed text data may be transmitted to parsing module 224 from database 170. Parsing module 224 may parse transcribed text data using a language parser to produce identified data objects. The language parser may assign labels to data objects of the transcribed text data, including labels identifying parts of speech. For example, the labels identifying parts of speech for the transcribed text data objects may be used as additional information when transcribed text data is processed by data analysis module 130. [0052] in order to verify meta-data associated with a user, data handling module 110 may ask the user a set of verifiable questions. For example, data handling module 110 may prompt the user to communicate her/his credit card number, social security number, address, name, security questions, user id, pin number, password and/or account number. Data handling module 110 may use transcription module 222 to transcribe the user audio data. Data handling module 110 may parse the transcribed stream data using parsing module 224 to retrieve the valuable information. Data handling module 110 may then use verification module 226 to verify the data obtained from parsing the transcribed data of the information communicated by the user against the user profile and other available information. [0061] the comprehension module may be used to analyze the type of request associated with the user communication. The system may, for example, analyze the user speech and determine the reason for the user's request. In some embodiments, the system may use natural language processing to determine the subject matter of the request. For example, the comprehension module may determine that the user request may be associated with the theft of a credit card, or that the user request is related to opening a bank account or completing a purchase transaction. 
The comprehension module may determine the user request by analyzing the key words found in transcribed text data. Additionally, or alternatively, the comprehension module may determine the user request by providing a user with questions, such as multiple-choice questions, using data handling system 110. In some embodiments, the comprehension system may attempt to determine the type of the user request based on natural language processing using various algorithms developed for natural language processing that may include regular expressions, artificial neural networks, n-gram language models, logistic regression, vector semantics, part-of-speech tagging, recurrent neural networks, and/or the like. For example, the comprehension module may use key phrases to attempt to determine the type of the user request. [0062] Comprehension module 422 may be configured to verify with the user the determined type of request by asking the user (via data handling module 110) to confirm the request type determined by comprehension module 422. In cases when comprehension module 422 cannot correctly identify the type of request from the user, the comprehension module may be configured to present the user with questions designed to narrow down the type of request associated with the user communication. The questions may require the user to answer “yes” or “no”. For example, comprehension module 422 may prompt the user to answer “Are you calling to report a loss of the credit card? Please answer yes or no.” After receiving a reply “yes” from the user, comprehension module 422 may proceed to ask the user “Would you like to suspend the card, or would you like to cancel the card?” In various embodiments, comprehension module 422 may select questions for the user resulting in a set of expected answers.
For example, for a question “would you like to suspend the card, or would you like to cancel the card?” the set of expected answers may be “suspend” or “cancel.” In some embodiments, the user may answer “the card was lost yesterday, and I would like to suspend it for now” and the comprehension module may be configured to determine that the card needs to be suspended. In some embodiments, the user may answer “I am reporting a theft of the credit card” and comprehension module 422 may be configured to determine that the card needs to be canceled. [0065] comprehension module 422 may be configured to process transcribed text data to extract characteristics associated with the user or with the user request. In some embodiments, comprehension module 422 may include neural networks, recurrent neural networks (RNN) or convolutional neural networks (CNN) to process the text data to identify various characteristics of the user. In some embodiments, a specific computer model (CM) (e.g., recurrent neural network) may be trained to identify a specific feature of the text data associated with the user request. [0075] the variable data may include the type of the request associated with a user communication. For example, the request type may include “lost or stolen credit card,” “opening checking account,” “checking account balance,” and/or the like. FIG. 5 shows that configuration of a VA by user profile module 120 may involve selecting from among a plurality of types of pre-configured VAs (for the same user) depending on the type of request associated with the user communication. For example, FIG. 5 shows that for the type of request “Type A”, user profile module 120 may select either VAA1 or VAA2. The choice of VAA1 or VAA2 may be decided based on other data associated with the user request, such as other variable data 520, or streaming data 530. Similarly, FIG.
5 shows that for the type of request “Type B”, user profile module 120 may select either VAB1 or VAB2, and for the type of request “Type C”, user profile module 120 may select VAC. It should be noted that the described embodiments for selecting or configuring VAs based on the user profile data are only illustrative and other embodiments may be considered. For example, a VA may be selected and configured based on stationary data 510 without regard to the type of request associated with the user communication. [0092] if the averaged value of a user characteristic is changing significantly at a point in time (or during a time interval) during a user communication session, system 105 may be configured to update VA attributes. For example, system 105 may choose to update VA attributes to result in an increased running average satisfaction score for the user. In some embodiments, system 105 may analyze changes in average values of user characteristics based on historic data for one or more users, and select an appropriate update to VA attributes according to historic updates to VA attributes that led to improved results, such as, for example, increased running average satisfaction score for a user. [0095] FIG. 10 is a flowchart of an illustrative process 1000 for updating a VA for a user in communication with system 105 based on the level of user satisfaction or emotional state. The level of user satisfaction or emotional state may be inferred from language of the user speech, emotional content of the user speech, or other factors, such as how quickly the user receives the information that he/she is requesting. In an example embodiment, data analysis module 130 may be configured to evaluate user satisfaction or emotional state throughout the user communication with interaction system 105 and to store the timetable of user satisfaction or emotional state in the user profile.
In some embodiments, data analysis module 130 may evaluate a base level satisfaction or a target emotional state for a user depending on user characteristics. For example, some users may, in general, be satisfied when communicating with VAs of the system 105, and some users may generally be irritated. In various embodiments, data analysis module 130 may store the base level satisfaction and/or target emotional state in the user profile classified by type of request from the user. [0104] in a video or audio conference regarding middle school education curriculum, the first VA may have a female voice and discuss the math curriculum, and the second VA may have a male voice and discuss music curriculum. In some embodiments, a live agent (for example, a principal) for the middle school may exercise control over the VAs such as control over their topic of presentation as well as VA attributes. [0116] Data analysis module 130 may be configured to identify the features of the user request. For example, the features may include a high volume of the speech of the user, high rate of the speech, and presence of key words/phrases that help identify the type of the request. For example, the key words/phrases may be “urgent”, “emergency”, “theft of the card”, “flat tire”, “account balance” and/or the like. Data analysis module 130 may classify the request by a type based on the request features. For example, data analysis module 130 may identify the request as type “urgent”, “important”, “complex”, “request that may require a lot of explanation”, “request that may require a lot of information from the user”, and/or the like. It should be noted that various other request types may be identified by data analysis module 130. In various embodiments, based on the request type, data analysis module 130 instructs virtual assistant module 140 to provide a VA having attributes configured for the request type.
Additionally, the provided VA may have attributes configured to take into account user characteristics identified within the user request. [0117] data analysis module 130 may determine a type of the request and identify profile features for the user. Data analysis module 130 may select the first set of attributes for the virtual assistant corresponding to the request type and select a second set of attributes for the virtual assistant corresponding to the user profile features. For example, for request type “urgent” data analysis module 130 may select attributes of the VA that include concise language and clear pointed questions. For the user profile describing the user as a female from Rhode Island, data analysis module 130 may select a female voice with Rhode Island dialect. Data analysis module 130 may additionally modify the first and second set of attributes of the virtual assistant based on the user characteristics, such as user rate of speech, pitch, a cadence of speech and/or the like.)
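For illustration only, the two-stage attribute selection described in the cited portions of [0116]-[0117] (classify the request by key phrases, then combine request-type attributes with user-profile attributes) could be sketched as below. This is a hypothetical, non-limiting sketch: the keyword lists, attribute values, and names such as classify_request and configure_va are invented here and are not taken from the specification.

```python
# Hypothetical sketch: keyword-based request-type classification followed by
# merging a first attribute set (request type) with a second attribute set
# (user profile). All keyword lists and attribute values are assumptions.

REQUEST_KEYWORDS = {
    "urgent": ["urgent", "emergency", "theft of the card", "flat tire"],
    "informational": ["account balance"],
}

TYPE_ATTRIBUTES = {
    "urgent": {"language": "concise", "questions": "pointed"},
    "informational": {"language": "detailed", "questions": "open"},
}

def classify_request(transcript: str) -> str:
    """Return the first request type whose key phrases appear in the text."""
    text = transcript.lower()
    for req_type, phrases in REQUEST_KEYWORDS.items():
        if any(p in text for p in phrases):
            return req_type
    return "informational"  # assumed default when no key phrase matches

def configure_va(transcript: str, profile_attrs: dict) -> dict:
    """Merge request-type attributes with user-profile attributes."""
    attrs = dict(TYPE_ATTRIBUTES[classify_request(transcript)])
    attrs.update(profile_attrs)  # second set layered over the first
    return attrs

va = configure_va("I need to report a theft of the card, it is urgent!",
                  {"voice": "female", "dialect": "Rhode Island"})
print(va)
```

In this sketch the profile attributes are simply layered over the request-type attributes; a real system would presumably resolve conflicts between the two sets in a more deliberate way.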
performing a fourth analysis to determine, based on the set of accumulated user information and using the trained CNN model, at least one recommended action to assist the specific user; (in at least [0048] transcribed text data may contain text characters, words, and phrases containing “tags.” In various embodiments, the tags may include special alphanumerical characters or symbols (or combinations of alphanumerical characters and/or symbols) indicating other characteristics of a user speech. [0061] the comprehension module may determine that the user request may be associated with the theft of a credit card, or that the user request is related to opening a bank account or completing a purchase transaction. The comprehension module may determine the user request by analyzing the key words found in transcribed text data. [0066] Data analysis module 130 may further include an audio/video/image module 424 for processing the audio, video and image data, and for analyzing audio characteristics associated with the user request. In various embodiments, audio/video/image module 424 may analyze a pitch, a tone, a cadence of the user speech, volume and a rate of a user speech to extract various characteristics from the speech. For example, audio/video/image module 424 may identify a gender of the user, an age of the user, a dialect of the user, an accent of the user speech, a tone of voice, an emotional content of the speech, or any other aspects that uniquely identify the user based on the audio, video or image characteristics of the received data. In some embodiments, when the user is communicating with interaction system 105 using video data, audio/video/image module 424 may identify the characteristics of the user (e.g., age, gender, ethnicity, hand gestures, facial expressions, body movements, and the like) associated with the video streaming data.
[0069] audio/video/image module 424 may include neural networks, recurrent neural networks (RNN) or convolutional neural networks (CNN) to process the audio data to identify various characteristics of the user. In some embodiments, a specific computer model (CM) (e.g., recurrent neural network) may be trained to identify a specific feature of the audio, video or image data associated with the user request. For example, a first CM may be used to identify the age of the user, and a second CM may be used to identify the emotional content of the user. The first CM may, for example, take a sample of the audio data associated with the user request and output the most probable range for the age of the user, while the second CM may take a sample of the audio data associated with the user request and output a set of emotions that are most likely experienced by the user during the request. The set of emotions, for example, may be {“happy”, “content”, “patient”} or {“sad”, “impatient”, “aggressive”, “scared”}. In some embodiments, a CM may be trained to recognize human emotions using a combination of verbal expressions and facial expressions obtained from the video data associated with the user request. In some embodiments, a CM may be trained to recognize a dialect of a user. In various embodiments, multiple CMs may be employed by audio/video/image module 424 to obtain characteristics associated with the user request. [0080] data analysis module 130 may assign a satisfaction score, or value, to the user satisfaction based on a function that maps the user speech characteristics pertaining to the user emotional state to numerical values. Such a function may, for example, include a neural network, recurrent neural network or convolutional neural network. In an illustrative embodiment, the positive satisfaction values of this function may correspond to the user satisfaction and the negative satisfaction values of this function may correspond to the user dissatisfaction.
[0081] data analysis module 130 may select a reply for the user that may reduce the user dissatisfaction. For example, data analysis module 130 may include phrases indicating that VA understands the request from the user and is currently pursuing a solution. For example, data analysis module 130 may include phrases such as “we understand that there is a problem with your authentication.” Similarly, if data analysis module 130 indicates that the user's sentiment is becoming more negative, data analysis module 130 may instruct interaction system 105 to transfer the communication to a live agent. The term “live assistant” or “live agent” refers to an actual human being that may assist a user during the communication with the interaction system. [0112] a user might require the assistance of interaction system 105 during an interaction with a software application or a Webpage.)
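For illustration only, the satisfaction scoring and live-agent escalation described in the cited portions of [0080]-[0081] and [0092] could be sketched as follows, assuming a numeric per-utterance satisfaction score (positive values for satisfaction, negative for dissatisfaction) is already available from some upstream model. The class name, window size, and escalation threshold below are illustrative assumptions, not taken from the specification.

```python
# Hypothetical sketch: maintain a running average satisfaction score over a
# sliding window of utterances and flag the session for transfer to a live
# agent when the average trends sufficiently negative. The window size and
# threshold are arbitrary illustrative choices.

from collections import deque

class SatisfactionTracker:
    def __init__(self, window: int = 5, threshold: float = -0.5):
        self.scores = deque(maxlen=window)  # keeps only the last `window` scores
        self.threshold = threshold

    def add(self, score: float) -> None:
        """Record the satisfaction score for the latest utterance."""
        self.scores.append(score)

    def running_average(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def should_transfer_to_live_agent(self) -> bool:
        """Escalate when the running average falls below the threshold."""
        return self.running_average() < self.threshold

tracker = SatisfactionTracker()
for score in (0.2, -0.4, -0.9, -1.0):  # sentiment trending negative
    tracker.add(score)
print(tracker.running_average(), tracker.should_transfer_to_live_agent())
```

The same tracker could equally drive a VA attribute update (per [0092]) rather than an escalation; the sliding window simply makes the decision respond to recent sentiment instead of the whole session.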