Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawing submitted on 12/29/2023 has been considered by the examiner.
Response to Amendment
Claims 59-78 are currently pending in the application. Among them, claims 59, 74, and 77 are independent claims and have been amended.
Response to Arguments
Applicant’s arguments with respect to claims 59-78 have been considered but are moot in view of the new grounds of rejection.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 59 recites the limitation "the set of known prompt candidates" in line 7. There is insufficient antecedent basis for this limitation in the claim; the limitation should be corrected to "the set of vector embeddings for known prompt candidates".
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 59-78 are rejected under 35 U.S.C. 103 as being unpatentable over Raghavan et al. (US 8761373 B1) in view of Berisha et al. (WO 2022/076891 A1).
Regarding Claim 59, Raghavan et al. teach: A computer-assisted method of identifying prompts from a batch (audio files of a plurality of calls) of recorded interactive voice response (IVR) calls, comprising (Abstract: The method involves automatically classifying each segment with an interactive voice response (IVR) state using a semantic classifier that classifies the segment with predefined IVR states based on a text in the segment and a statistical semantic model for IVR prompts, to obtain an IVR state sequence for a call. Col 2, lines 41-43, …a computer system processes the audio files of a plurality of calls and identifies the state sequence for each call (step 100).): processing the batch of IVR calls, including for each call, identifying, within a call under analysis, a plurality of discrete utterances (speech segments) (Col 2, lines 40-53, …a computer system processes the audio files of a plurality of calls and identifies the state sequence for each call (step 100)…the method for determining the state sequence of a single call (step 110 of Fig. 1). The system must obtain an audio recording of the call within the IVR system (step 210). Col 2, lines 65-66, …the speech segments from the audio recording of the IVR portion are extracted from the audio files…); electronically comparing (semantic similarity, i.e., semantic classifier) a selected utterance (text segment) to a set of known prompt candidates (plurality of predefined IVR states or IVR prompts), wherein the known prompt candidates are candidates for being IVR prompts (Col 3, lines 3-57, The extracted speech segments are then converted into text…Each text segment is then automatically classified with one of a plurality of predefined IVR states (step 230)…may include a “Welcome” state, a “Main Menu” state, a “Payment Option” state or many more…If the confidence level is above the minimum threshold for each classified segment, the sequence of IVR states generated in step 230 (as a result of classifying the segments with predefined IVR states) is used as the IVR state sequence for the call (step 260)…a semantic classifier to automatically classify IVR prompts with an IVR state…prompts are then clustered together into groups based on semantic meaning, such that prompts with similar meaning or intent are grouped together (step 350)…IVR prompts are broken down into words and grouped into clusters of IVR prompts.); based on the comparing, determining that the selected utterance clusters with a known prompt candidate cluster (prompts are then clustered together into groups based on semantic meaning), and accumulating a counter (starting with a single cluster containing all the extracted IVR prompts and then splitting the clusters progressively into smaller clusters until a desired number of clusters is reached) for the known prompt candidate cluster (Col 3, lines 3-57, …Each text segment is then automatically classified with one of a plurality of predefined IVR states (step 230)…After automatically classifying each text segment with an IVR state, the system will compute the confidence level for each classification (step 240) and determine if the confidence level is above a minimum threshold for each of the classified segments (step 250)…a semantic classifier to automatically classify IVR prompts with an IVR state…prompts are then clustered together into groups based on semantic meaning, such that prompts with similar meaning or intent are grouped together (step 350)…IVR prompts are broken down into words and grouped into clusters of IVR prompts. Col 3, lines 57-59, …IVR prompts are broken down into words and grouped into clusters of IVR prompts…Frequently-used words are assigned a lower weight than infrequently-used words… Col 4, lines 1-6, …IVR prompts are grouped into clusters using a hierarchical, bisecting K-Means clustering algorithm…starting with a single cluster containing all the extracted IVR prompts and then splitting the clusters progressively into smaller clusters until a desired number of clusters is reached.); and after processing the batch (audio files of a plurality of calls or state sequences of multiple calls), identifying a set of good prompt candidates (the best candidate from each cluster, i.e., the state sequence closest to the center of the cluster), wherein good prompt candidates have a counter above a threshold (confidence level is above a minimum threshold) (Col 2, lines 40-53, …a computer system processes the audio files of a plurality of calls and identifies the state sequence for each call (step 100). Col 3, lines 17-21, After automatically classifying each text segment with an IVR state, the system will compute the confidence level for each classification (step 240) and determine if the confidence level is above a minimum threshold for each of the classified segments (step 250). Col 4, lines 7-30, …After clustering the prompts into groups, the system preliminarily labels each group with an IVR state (step 355)…The system groups the state sequences of multiple calls into N clusters, wherein calls with similar state sequences are grouped together (step 410)…a hierarchical, bisecting K-Means clustering algorithm is used to group the call state sequences into groups…After the call state sequences have been grouped into clusters, the best candidate from each cluster (i.e., the state sequence closest to the center of the cluster) is selected to represent a single dominant path for the IVR application call flow (step 420).).
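For context on the clustering scheme quoted above (Col 3, line 57 through Col 4, line 6), the following is a minimal illustrative sketch, not taken from Raghavan et al.'s actual implementation, of term weighting in which frequently-used words receive lower weight (via TF-IDF) combined with a bisecting K-Means style split that starts from a single cluster of all prompts; the prompt strings, the desired cluster count, and the use of scikit-learn are hypothetical assumptions.

```python
# Illustrative sketch only: TF-IDF term weighting (frequently used words get
# lower weight) plus a bisecting K-Means style split that begins with a single
# cluster of all prompts, approximating the clustering scheme quoted above.
# The prompt strings and desired cluster count are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def bisecting_kmeans(texts, desired_clusters=2, seed=0):
    vectors = TfidfVectorizer().fit_transform(texts).toarray()
    clusters = [list(range(len(texts)))]  # start: one cluster containing all prompts
    while len(clusters) < desired_clusters:
        # split the largest cluster into two until the desired count is reached
        largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(largest)
        labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(vectors[members])
        clusters.append([m for m, lab in zip(members, labels) if lab == 0])
        clusters.append([m for m, lab in zip(members, labels) if lab == 1])
    return clusters

prompts = ["welcome to acme bank",
           "thank you for calling acme bank welcome",
           "main menu press one for payments",
           "for payment options press two"]
for group in bisecting_kmeans(prompts, desired_clusters=2):
    print([prompts[i] for i in group])
```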
Raghavan et al. do not explicitly teach: "…calculating vector embeddings for the discrete utterances", "…comparing vector embeddings for the selected utterance to a set of vector embeddings…", and "…cluster based on vector embeddings", the limitations expressly recited in the claim.
Berisha et al. teach: "…calculating vector embeddings for the discrete utterances", "…comparing vector embeddings for the selected utterance to a set of vector embeddings…", and "…cluster based on their vector embeddings" ([0046] Semantic coherence: Refers to the semantic relatedness of the participant’s response to the assessor’s prompt. These features are computed from numerical sentence embeddings and validated similarity measures between the prompt and each response. [0071] In NLP, semantics are computationally modeled with word or sentence embeddings, typically a high-dimensional vector representation of a body of text. Words or phrases used in similar semantic contexts are often represented closer together as measured by their cosine similarity… [0073] Therefore, a cosine similarity can have a maximum value of 1 if the vectors are perfectly aligned, indicating identical semantics. [0131] The speech or audio analysis can be performed using speech recognition and/or speaker diarization algorithms. Speaker diarization is the process of segmenting or partitioning the audio stream based on the speaker’s identity. In some embodiments, the diarization algorithm detects changes in the audio (e.g. acoustic spectrum) to determine changes in the speaker, and/or identifies the specific speakers during the conversation. An algorithm may be configured to detect the change in speaker, which can rely on various features corresponding to acoustic differences between individuals. The speaker change detection algorithm may partition the speech/audio stream into segments. These partitioned segments may then be analyzed using a model configured to map segments to the appropriate speaker. The model can be a machine learning model such as a deep learning neural network. Once the segments have been mapped (e.g. mapping to an embedding vector), clustering can be performed on the segments so that they are grouped together with the appropriate speaker(s). [0214] When computing the variables, semantic coherence was defined as a cosine similarity score between sentence embeddings computed between participant utterances and the interviewer prompts.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Raghavan et al. to include the teaching of Berisha et al. above, measuring semantic coherence between utterances and prompts by mapping utterance word segments to prompt embedding vectors, in order to cluster the segments into groups of appropriate prompts.
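As an illustration of the comparing and counting steps discussed above, the following hedged sketch shows one way an utterance embedding could be compared against embeddings of known prompt candidates by cosine similarity, with a counter accumulated for the matched cluster; the toy vectors, the cluster names, and the 0.8 threshold are hypothetical placeholders, not the applicant's or either reference's actual implementation.

```python
# Illustrative sketch only: compare an utterance embedding to embeddings of known
# prompt candidates by cosine similarity and accumulate a counter for the matched
# cluster. The vectors, cluster names, and 0.8 threshold are hypothetical.
from collections import Counter
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_to_cluster(utterance_vec, candidate_vecs, counts, threshold=0.8):
    # candidate_vecs maps cluster id -> embedding of a known prompt candidate
    best_id, best_sim = None, -1.0
    for cluster_id, vec in candidate_vecs.items():
        sim = cosine(utterance_vec, vec)
        if sim > best_sim:
            best_id, best_sim = cluster_id, sim
    if best_id is not None and best_sim >= threshold:
        counts[best_id] += 1   # utterance clusters with a known prompt candidate
        return best_id
    return None                # no match; could be designated a new prompt candidate

counts = Counter()
candidates = {"welcome": np.array([0.9, 0.1]), "main_menu": np.array([0.1, 0.9])}
assign_to_cluster(np.array([0.85, 0.15]), candidates, counts)
print(counts)  # Counter({'welcome': 1})
```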
Regarding Claims 60, 75, and 78, Raghavan et al. teach: The computer-assisted method of claim 59, further comprising determining that the selected utterance does not cluster with a known prompt candidate (a set of prompts may be similar except for having unique customer information), and designating the selected utterance as a prompt candidate (See rejection of claim 59 and Col 3, lines 47-52, …a set of prompts may be similar except for having unique customer information, such as names or numbers. In this situation, the system would “normalize” or remove the unique text and replace the text with a static value such that the similar set of prompts may be clustered together for increased efficiency.).
Regarding Claim 61, Raghavan et al. teach: The computer-assisted method of claim 59, wherein the threshold is a scalar (confidence level or weight) threshold (See rejection of claim 59 and Col 3, lines 17-21, After automatically classifying each text segment with an IVR state, the system will compute the confidence level for each classification (step 240) and determine if the confidence level is above a minimum threshold for each of the classified segments (step 250). Col 3, lines 57-59, …IVR prompts are broken down into words and grouped into clusters of IVR prompts…Frequently-used words are assigned a lower weight than infrequently-used words… Col 4, lines 1-6, …IVR prompts are grouped into clusters using a hierarchical, bisecting K-Means clustering algorithm…starting with a single cluster containing all the extracted IVR prompts and then splitting the clusters progressively into smaller clusters until a desired number of clusters is reached.).
Regarding Claim 62, Raghavan et al. teach: The computer-assisted method of claim 59, wherein the threshold is a prevalence threshold (See rejection of claim 59).
Regarding Claim 63, Raghavan et al. teach: The computer-assisted method of claim 59, wherein the discrete utterances are delimited by silence (See rejection of claim 59 and Col 2, line 65 to Col 3, line 1, …the speech segments from the audio recording of the IVR portion are extracted from the audio files by identifying the speech portions and discarding any silences, noise, and other non-speech sections of the audio streams (step 220).).
Regarding Claims 64 and 76, Raghavan et al. teach: The computer-assisted method of claim 59, wherein the discrete utterances (speech segments) are delimited by silence above a threshold length (longest prompt) (See rejection of claim 59 and Col 2 line 65 to Col 3 line 1, …the speech segments from the audio recording of the IVR portion are extracted from the audio files by identifying the speech portions and discarding any silences, noise, and other non-speech sections of the audio streams (step 220). Col 3, lines 45-46, …the system would extract the unique longest prompt from the set of prompts…).
Regarding Claim 65, Raghavan et al. teach: The computer-assisted method of claim 59, wherein electronically comparing comprises comparing audio waveforms (See rejection of claim 59 and Col 2, lines 40-58, …a computer system processes the audio files of a plurality of calls and identifies the state sequence for each call (step 100)…the system can make use of uncompressed audio files (e.g., WAV)…).
Regarding Claim 66, Raghavan et al. teach: The computer-assisted method of claim 59, wherein electronically comparing comprises comparing call transcripts (See rejection of claim 59 and Col 1, lines 34-35, An IVR state sequence is identified for each of a plurality of calls from recorded audio files of the call. Col 2, lines 40-58, …a computer system processes the audio files of a plurality of calls and identifies the state sequence for each call (step 100)…the system can make use of uncompressed audio files (e.g., WAV)…).
Regarding Claim 67, Raghavan et al. teach: comparing call transcripts (See rejection of Claim 66).
Raghavan et al. do not teach: wherein comparing call transcripts comprises comparing average word embeddings for utterances.
Berisha et al. teach: comparing average word embeddings for utterances ([0046] Semantic coherence: Refers to the semantic relatedness of the participant’s response to the assessor’s prompt. These features are computed from numerical sentence embeddings and validated similarity measures between the prompt and each response. [0071] …the semantic relationships between the dialogue context and each spoken utterance for a given participant. In NLP, semantics are computationally modeled with word or sentence embeddings, typically a high-dimensional vector representation of a body of text. Words or phrases used in similar semantic contexts are often represented closer together as measured by their cosine similarity… [0131] Once the segments have been mapped (e.g. mapping to an embedding vector), clustering can be performed on the segments so that they are grouped together with the appropriate speaker(s). [0214] When computing the variables, semantic coherence was defined as a cosine similarity score between sentence embeddings computed between participant utterances and the interviewer prompts.).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Raghavan et al. to include the teaching of Berisha et al. above, measuring semantic coherence between utterances and prompts by mapping utterance word segments to prompt embedding vectors, in order to cluster the segments into groups of appropriate prompts.
Regarding Claim 68: The computer-assisted method of claim 67, wherein comparing call transcripts comprises comparing for cosine similarity (See rejection of Claim 67).
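For claims 67 and 68, the following is a minimal illustrative sketch, under the assumption of a toy word-vector lookup standing in for a trained embedding model, of representing each utterance by the average of its word embeddings and comparing utterances by cosine similarity; it is not drawn from either reference.

```python
# Illustrative sketch only: represent each utterance as the average of its word
# embeddings and compare utterances by cosine similarity. The toy word_vectors
# lookup is a hypothetical stand-in for a trained embedding model.
import numpy as np

word_vectors = {
    "please":  np.array([0.2, 0.7]), "enter":   np.array([0.3, 0.6]),
    "your":    np.array([0.1, 0.2]), "account": np.array([0.8, 0.1]),
    "number":  np.array([0.7, 0.2]),
}

def average_embedding(utterance):
    vecs = [word_vectors[w] for w in utterance.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

u1 = average_embedding("please enter your account number")
u2 = average_embedding("enter your account")
print(round(cosine_similarity(u1, u2), 3))  # near 1.0 for semantically similar utterances
```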
Regarding Claim 69, Raghavan et al. in view of Berisha et al. teach: The computer-assisted method of claim 59, further comprising outputting a matrix of the good prompt candidates (See rejection of Claim 59 and Berisha et al. [0025]: FIG. 11 shows a table with the results from the two upstream logistic regression classification experiments performed on the language samples collected from the SSPA task. The results are reported with the confusion matrix, receiver operating characteristic area-under-curve (AUC), and a weighted average of precision, recall, and F1 score for each class prediction.).
Regarding Claim 70, Raghavan et al. teach: The computer-assisted method of claim 69, further comprising cutting short initial snippets for the good prompt candidates (See Raghavan et al. teaching of Claim 59 along with teaching of Col 3, lines 57-59, …IVR prompts are broken down into words and grouped into clusters of IVR prompts… Col 4, lines 7-30, …After clustering the prompts into groups, the system preliminarily labels each group with an IVR state (step 355)…The system groups the state sequences of multiple calls into N clusters, wherein calls with similar state sequences are grouped together (step 410)…a hierarchical, bisecting K-Means clustering algorithm is used to group the call state sequences into groups…After the call state sequences have been grouped into clusters, the best candidate from each cluster (i.e., the state sequence closest to the center of the cluster) is selected to represent a single dominant path for the IVR application call flow (step 420).).
Regarding Claim 72, Raghavan et al. teach: The computer-assisted method of claim 59, further comprising tagging a set of call recordings with identified IVR prompts within the call recordings (See rejection of claim 59 and Col 2, lines 59-61, The entire call may be recorded, including both the IVR prompts and the customer response. Col 4, lines 7-30, …After clustering the prompts into groups, the system preliminarily labels each group with an IVR state (step 355)…The system groups the state sequences of multiple calls into N clusters, wherein calls with similar state sequences are grouped together (step 410).).
It is understood that call transcripts are generally derived from call recordings with timestamps; however, Raghavan et al. do not explicitly teach: call recordings with timestamps.
Berisha et al. teach: recordings with timestamps ([0101] Independent training datasets may comprise audio data captured at different time points from an individual, or a plurality of individuals. [0119] The record may further comprise a location identifier, a time stamp…).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Raghavan et al. to include the teaching of Berisha et al. above, capturing audio data at different time points from an individual or a plurality of individuals and measuring semantic coherence between utterances and prompts by mapping utterance word segments to prompt embedding vectors, in order to cluster the segments into groups of appropriate prompts from audio files of recordings with timestamps.
Regarding Claim 73, Raghavan et al. teach: The computer-assisted method of claim 59, further comprising supplementing prompt identification with additional human analysis (See rejection of claim 59 and Col 3, lines 17-32, After automatically classifying each text segment with an IVR state, the system will compute the confidence level for each classification (step 240) and determine if the confidence level is above a minimum threshold for each of the classified segments (step 250). If the confidence level is above the minimum threshold for each classified segment, the sequence of IVR states generated in step 230 (as a result of classifying the segments with predefined IVR states) is used as the IVR state sequence for the call (step 260). If, however, the confidence level of the classification is below the minimum threshold for one or more segments, the call recording is flagged as requiring manual supervision and labeling in accordance with FIG. 3 (step 270).).
Regarding Claims 74 and 77, Raghavan et al. teach: One or more tangible, non-transitory computer-readable storage media having stored thereon executable instructions to instruct a processor circuit to (Col 5, lines 17-24, The methods described with respect to FIGS. 1-4 are embodied in software and performed by a computer system executing the software. A person skilled in the art would understand that a computer system has a memory or other physical storage medium for storing software instructions and one or more processors for executing the software instructions. The computer system may have access to a database which stores audio and text files.): receive a batch of interactive voice response (IVR) call recordings (Col 2, lines 40-53, …a computer system processes the audio files of a plurality of calls and identifies the state sequence for each call (step 100)…the method for determining the state sequence of a single call (step 110 of Fig. 1). The system must obtain an audio recording of the call within the IVR system (step 210).); identify, within a selected call from the batch, a plurality of utterances, comprising calculating vector embeddings for the discrete utterances; electronically compare vector embeddings for a selected utterance to a set of vector embeddings for previously-identified IVR prompt candidates; based on the comparing, determine that the selected utterance clusters into a previously-identified prompt candidate cluster based on their vector embeddings, and increasing a count for the candidate cluster; and after processing the batch, identify a set of good prompt candidates, based on counts for the candidate clusters (See rejection of claim 59).
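The batch-level limitation of claims 74 and 77 (identifying good prompt candidates based on counts for the candidate clusters) can be illustrated by the short hedged sketch below; the cluster names and the minimum count of 5 are hypothetical assumptions, not taken from the claims or the references.

```python
# Illustrative sketch only: after the batch is processed, keep the prompt-candidate
# clusters whose counts meet a threshold as "good" prompt candidates. The cluster
# names and the minimum count of 5 are hypothetical.
from collections import Counter

def good_prompt_candidates(cluster_counts, min_count=5):
    return [cid for cid, n in cluster_counts.items() if n >= min_count]

cluster_counts = Counter({"welcome": 120, "main_menu": 95, "stray_noise": 2})
print(good_prompt_candidates(cluster_counts))  # ['welcome', 'main_menu']
```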
Allowable Subject Matter
Claim 71 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art of record Gan et al. (US 2019/0243900 A1) teach: AUTOMATIC QUESTIONING AND ANSWERING PROCESSING METHOD AND AUTOMATIC QUESTIONING AND ANSWERING SYSTEM.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MOHAMMAD K ISLAM whose telephone number is (571)270-5878. The examiner can normally be reached Monday-Friday, EST (IFP).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras Shah can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MOHAMMAD K ISLAM/Primary Examiner, Art Unit 2653