Prosecution Insights
Last updated: April 19, 2026
Application No. 18/723,458

METHOD AND APPARATUS FOR ASSESSING PARTICIPATION IN A MULTI-PARTY COMMUNICATION

Status: Non-Final Office Action (§102)
Filed: Jun 23, 2024
Examiner: TRAN, QUOC DUC
Art Unit: 2691
Tech Center: 2600 (Communications)
Assignee: Uniphore Technologies, Inc.
OA Round: 1 (Non-Final)

Grant Probability: 86% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 7m
Grant Probability with Interview: 90%

Examiner Intelligence

Career Allow Rate: 86% (above average; 720 granted / 841 resolved; +23.6% vs TC avg)
Interview Lift: +4.8% (minimal, roughly +5%), comparing resolved cases with and without an interview
Typical Timeline: 2y 7m average prosecution; 17 applications currently pending
Career History: 858 total applications across all art units
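
The headline figures above follow from simple arithmetic on the career counts shown on this page. The sketch below reproduces them; treating the +4.8% interview lift as a simple additive adjustment is an assumption about how the dashboard combines the numbers, not a documented formula.

```python
# Illustrative arithmetic behind the examiner-intelligence figures above.
# Assumption: the interview lift is applied as an additive percentage-point adjustment.

granted = 720           # career grants (from the page)
resolved = 841          # career resolved cases (from the page)
interview_lift = 0.048  # +4.8% lift reported for cases with an interview

allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")  # -> 85.6%, displayed as 86%

with_interview = allow_rate + interview_lift
print(f"Estimated allow rate with interview: {with_interview:.1%}")  # -> ~90.4%, displayed as 90%
```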

Statute-Specific Performance

§101: 5.0% (-35.0% vs TC avg)
§103: 43.3% (+3.3% vs TC avg)
§102: 30.5% (-9.5% vs TC avg)
§112: 5.3% (-34.7% vs TC avg)
Comparisons are against the Tech Center average estimate; based on career data from 841 resolved cases.
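
The "vs TC avg" deltas imply a Tech Center baseline for each statute: rate minus delta. A quick check, assuming each delta is a simple percentage-point difference from that baseline:

```python
# Back out the implied Tech Center baseline from each (examiner rate, delta) pair.
# Assumption: each delta is a percentage-point difference from the TC average.

rates = {            # examiner rate (%), delta vs TC avg (percentage points)
    "§101": (5.0, -35.0),
    "§103": (43.3, +3.3),
    "§102": (30.5, -9.5),
    "§112": (5.3, -34.7),
}

for statute, (rate, delta) in rates.items():
    implied_tc_avg = rate - delta
    print(f"{statute}: examiner {rate:.1f}%, implied TC average {implied_tc_avg:.1f}%")
# All four back out to 40.0%, which suggests a single flat TC-average estimate.
```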

Office Action

Non-Final Rejection under 35 U.S.C. §102
Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . DETAILED ACTION Claim Rejections - 35 USC § 102 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention. (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention. Claims 1-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Reece et al (2021/0264162). Consider claim 1, Reece et al teach a computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor (par. 0224), configure the apparatus to: receive, at an analytics server, from a plurality of multimedia devices corresponding to a plurality of participants of a multimedia event comprising multi-party communication, multi-modal data for each of the plurality of participants, the multi-modal data comprising video data and audio data (par. 0033; “machine learning system for determining conversation analysis indicators using acoustic, video, and text data of a multiparty conversation are provided. In the example implementation, the machine learning system analyzes multiple data modalities of a multiparty conversation to determine conversation analysis indicators”; par. 0036-0038; “The conversation analytics system coordinates the application of these machine learning algorithms between multiple participants for higher-level analysis of the conversation”), wherein the plurality of multimedia devices are remote to the analytics server (par. 0051-0052; “Client computing devices 205 can operate in a networked environment using logical connections through network 230 to one or more remote computers, such as a server computing device”); and send, from the analytics server, to a user device or at least one of the plurality of multimedia devices, at least one of vision information, tonal information, text information, or a representative participation score (RPS) (par. 0043; “The conversation analytics system provides a user interface for users to interrogate the conversation analysis indicators”; par. 
0071; “conversation analytics system 400 generates multiple conversation analysis indicators (e.g., conversation scores, conversation analysis indicators, openness scores, engagement scores, ownership scores, goal scores, interruptions scores, “time spent listening” scores, and emotional labels) at block 406”), wherein, for each of the plurality of participants, vision information is extracted from the video data, at least one of tonal information or text information from the audio data, and wherein the RPS is determined based on the vision information and the at least one of the tonal information or the text information (par. 0058-0061; “Video processing component 346 can extract the video data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Acoustic processing component 348 can extract the audio data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Textual processing component 350 can transcribe the audio from a conversation or utterance from a conversation into text and encapsulate it as a conversation feature for use by a machine learning system”), for each of the plurality of participants, for a predefined time interval (par. 0069; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”; par. 0201; “User interface 2200 may further include progress score chart 2208, illustrating values of the performance score at preset intervals or after each conversation”). Consider claim 2, Reece et al teach wherein the RPS comprises a sentiment score and an engagement score (par. 0198; “conversation score 2202 and/or multiple sub-scores (e.g., an engagement score, an ownership score, a goal score, an interruptions score, a “time spent listening” score, an openness score, etc.) are generated by one or more machine learning systems based on identified conversation features (e.g., audio and video from a conversation, facial expressions, voice tone, key phrases, etc.) of the recorded conversation”; par. 0219; “generate a particular conversation analysis indicator (e.g., emotional score, emotional labels, ownership score, etc.)”). Consider claim 3, Reece et al teach wherein the vision information comprises at least one of sentiments, head nods, or disapprovals, wherein the tonal information comprise s at least one of sentiments, empathy, politeness, speak rate, talk ratio, or talk over ratio, wherein the text information comprises at least one of sentiment, or hyper-relevant text keyphrases (HRTKs) (par. 
0058-0061; “Video processing component 346 can extract the video data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Acoustic processing component 348 can extract the audio data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Textual processing component 350 can transcribe the audio from a conversation or utterance from a conversation into text and encapsulate it as a conversation feature for use by a machine learning system”; par. 0112; “the video data may be labeled with facial expressions (e.g., smiling, grimacing, crying, nodding) and/or emotional labels (happy, sad, aggressive, surprised, disappointed). These labels may further each have confidence scores, indicating the relative confidence of the video processing part in that particular label (e.g., decimal score between zero and one)”). Consider claim 4, Reece et al teach wherein the RPS is determined by combining the vision information and at least one of tonal information or text information for a given time interval (par. 0069; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”). Consider claim 5, Reece et al teach wherein the instructions further configure the apparatus to send, from the analytics server, to the user device or the at least one multimedia device, an aggregated RPS for multiple participants for a first time interval (par. 0108; “Conversation analysis indicators (e.g., 706, 708, 710) may be stored with timestamps, for correlation with the source utterance and acoustic/video data of the conversation. In other words, conversation analysis indicators may be stored in a series, based on a series (i.e., sequence) of utterances and/or concatenated speaker data. The stored conversation analysis indicators may be graphed, visualized, and analyzed in aggregate by conversation analytics system 400”). Consider claim 6, Reece et al teach wherein the instructions further configure the apparatus to send, from the analytics server, to the user device or the at least one multimedia device, aggregated RPS for a single participant for a plurality of consecutive time intervals (par. 0071; “These conversation analysis indicators are collected into conversation analysis indicators 411 and passed on to block 408. In other words, conversation analysis indicator 411 can include a single score or multiple scores/indicators for a particular conversation”). Consider claim 7, Reece et al teach a computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor (par. 0224), configure the apparatus to: receive, at an analytics server, from a plurality of multimedia devices corresponding to a plurality of participants of a multimedia event comprising a multi-party communication, multi-modal data for each of the plurality of participants, the multi-modal data comprising video data and audio data (par. 
0033; “machine learning system for determining conversation analysis indicators using acoustic, video, and text data of a multiparty conversation are provided. In the example implementation, the machine learning system analyzes multiple data modalities of a multiparty conversation to determine conversation analysis indicators”; par. 0036-0038; “The conversation analytics system coordinates the application of these machine learning algorithms between multiple participants for higher-level analysis of the conversation”), wherein the plurality of multimedia devices are remote to the analytics server (par. 0051-0052; “Client computing devices 205 can operate in a networked environment using logical connections through network 230 to one or more remote computers, such as a server computing device”); extract, at the analytics server, for each of the plurality of participants, vision information from the video data, at least one of tonal information or text information from the audio data (par. 0058-0061; “Video processing component 346 can extract the video data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Acoustic processing component 348 can extract the audio data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Textual processing component 350 can transcribe the audio from a conversation or utterance from a conversation into text and encapsulate it as a conversation feature for use by a machine learning system”; par. 0157; 0161; “Additional conversation features may be extracted from data modalities (e.g., video, audio, text, etc.). Examples of conversation features from the video data modality 1602 may include smiles, nods, laughter, posture (e.g., slanted, forward, backward, open, closed, expanded, deflated), head position, gestures, etc. Examples of conversation features from the audio data modality 1604 may include listener feedback (e.g., “ah-ha,” “um?”, “uh-huh,” etc.), paralanguage, vocal traffic signals (e.g., “go on,” “um . . . ,” “but!”, etc.), turn length, conversation percentage (e.g., percent of total conversation during which a speaker was active), etc.”); determine, at the analytics server, an aggregated representative participation score (RPS) based on the vision information and the at least one of the tonal information or the text information, for each of the plurality of participants, for a plurality of consecutive time intervals (par. 0069; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”; par. 0201; “User interface 2200 may further include progress score chart 2208, illustrating values of the performance score at preset intervals or after each conversation”; par. 0198; “conversation score 2202 and/or multiple sub-scores (e.g., an engagement score, an ownership score, a goal score, an interruptions score, a “time spent listening” score, an openness score, etc.) 
are generated by one or more machine learning systems based on identified conversation features (e.g., audio and video from a conversation, facial expressions, voice tone, key phrases, etc.) of the recorded conversation”; par. 0219; “generate a particular conversation analysis indicator (e.g., emotional score, emotional labels, ownership score, etc.)”); determine, at the analytics server, at least one of a first plurality of time intervals from the plurality of consecutive time intervals, the first plurality of time intervals comprising pronounced RPS for at least one of the plurality of participants, a second plurality of time intervals from the plurality of consecutive time intervals, the second plurality of time intervals comprising pronounced RPS for at least a predefined duration for at least one of the plurality of participants, a third plurality of time intervals from the plurality of consecutive time intervals, the third plurality of time intervals comprising pronounced RPS for at least two of the plurality of participants, or a fourth plurality of time intervals from the plurality of consecutive time intervals, the fourth plurality of time intervals comprising at least one phrase from a predefined set of phrases for at least one of the plurality of participants (par. 0067-0069; “the conversation features can include an embedding of the audio, video, or textual versions of the audio, tone, sound level, emotional characteristics (e.g., supportive, agreeable, combative, engaged, enthusiastic, passionate, uncertainty, etc.), effectiveness ratings, physical reactions or movements (e.g., eye gaze directions, participant postures, participant gestures, participant head positions, laughter, nodding, facial expressions, etc.); and/or identify particular significant phrases or word choices (e.g., mm-hmm, yes, yah, oh my god, huh, uhh, etc.)”; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”; par. 0108; “Conversation analysis indicators (e.g., 706, 708, 710) may be stored with timestamps, for correlation with the source utterance and acoustic/video data of the conversation. In other words, conversation analysis indicators may be stored in a series, based on a series (i.e., sequence) of utterances and/or concatenated speaker data. The stored conversation analysis indicators may be graphed, visualized, and analyzed in aggregate by conversation analytics system 400”); determine, at the analytics server, a ranked list of a plurality of a group of consecutive time intervals comprised in the plurality of consecutive time intervals, based on the aggregated RPS, the at least one of the first plurality of time intervals, the second plurality of time intervals, the third plurality of time intervals, or the fourth plurality of time intervals (par. 
0069; 0071; “conversation analytics system 400 generates multiple conversation analysis indicators (e.g., conversation scores, conversation analysis indicators, openness scores, engagement scores, ownership scores, goal scores, interruptions scores, “time spent listening” scores, and emotional labels) at block 406”; par. 0174; “process 1900 optimally leverages previously generated conversation features to generate higher level (e.g., 2nd order, 3rd order, 4th order, etc.) synthesized conversation features”); and send, from the analytics server, to a user device or at least one of the plurality of multimedia devices, at least one of the vision information, the tonal information, the text information, the RPS, or the ranked list (par. 0043; “The conversation analytics system provides a user interface for users to interrogate the conversation analysis indicators”; par. 0071; “conversation analytics system 400 generates multiple conversation analysis indicators (e.g., conversation scores, conversation analysis indicators, openness scores, engagement scores, ownership scores, goal scores, interruptions scores, “time spent listening” scores, and emotional labels) at block 406”). Consider claim 8, Reece et al teach wherein the plurality of consecutive time intervals comprises all time intervals since the beginning of the multimedia event to a current time “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”), and wherein the sending is performed in real time at the end of the plurality of consecutive time intervals (par. 0064; “Conversation analytics system 400 may record the conversation in real time or retrieve a stored conversation. Conversation analytics system 400 may be configured to retrieve and/or automatically generate a transcription of the conversation, based on the acoustic and video recordings”). Consider claim 9, Reece et al teach wherein the instructions further configure the apparatus to: generate, at the analytics server, a baseline for each of the plurality of participants based on the aggregated RPS (par. 0078; “In some implementations, one or more of these scores can be provided in a mentee report with corresponding explanations and/or a baseline or comparison value so the mentee can interpret the values in terms of their progress, goals, or as a comparison to other mentees”); and adjust, at the analytics server, the at least one of the first plurality of time intervals, the second plurality of time intervals, the third plurality of time intervals, or the fourth plurality of time intervals using the baseline (par. 0129; “conversation analytics system 400 may transmit identified utterances to annotator 405, and further receive utterances modified by a user (e.g., start/stop times of the utterance modified) in response”). 
Consider claim 10, Reece et al teach wherein the vision information comprises at least one of sentiments, head nods, or disapprovals, wherein the tonal information comprises at least one of sentiments, empathy, politeness, speak rate, talk ratio, or talk over ratio, wherein the text information comprises at least one of sentiment, or hyper-relevant text keyphrases (HRTKs) (par. 0058-0061; “Video processing component 346 can extract the video data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Acoustic processing component 348 can extract the audio data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Textual processing component 350 can transcribe the audio from a conversation or utterance from a conversation into text and encapsulate it as a conversation feature for use by a machine learning system”; par. 0112; “the video data may be labeled with facial expressions (e.g., smiling, grimacing, crying, nodding) and/or emotional labels (happy, sad, aggressive, surprised, disappointed). These labels may further each have confidence scores, indicating the relative confidence of the video processing part in that particular label (e.g., decimal score between zero and one)”). Consider claim 11, Reece et al teach wherein the instructions further configure the apparatus to: identify, at the analytics server, the fourth plurality of time intervals associated with the first plurality of time intervals; send, from the analytics server, to a user device or at least one of the plurality of multimedia devices, the at least one phrase, and the first plurality of time intervals (par. 0067-0069; “the conversation features can include an embedding of the audio, video, or textual versions of the audio, tone, sound level, emotional characteristics (e.g., supportive, agreeable, combative, engaged, enthusiastic, passionate, uncertainty, etc.), effectiveness ratings, physical reactions or movements (e.g., eye gaze directions, participant postures, participant gestures, participant head positions, laughter, nodding, facial expressions, etc.); and/or identify particular significant phrases or word choices (e.g., mm-hmm, yes, yah, oh my god, huh, uhh, etc.)”; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”; par. 0043; “The conversation analytics system provides a user interface for users to interrogate the conversation analysis indicators”; par. 0071; “conversation analytics system 400 generates multiple conversation analysis indicators (e.g., conversation scores, conversation analysis indicators, openness scores, engagement scores, ownership scores, goal scores, interruptions scores, “time spent listening” scores, and emotional labels) at block 406”). Consider claim 12, Reece et al teach s computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor (par. 
0224), configure the apparatus to: receive, at an analytics server, a first and a second recording of a first and a second multimedia event respectively, each multi-media event comprising multi-party communication, each recording comprising multi-modal data for each of the plurality of participants, the multi-modal data comprising video data and audio data (par. 0033; “machine learning system for determining conversation analysis indicators using acoustic, video, and text data of a multiparty conversation are provided. In the example implementation, the machine learning system analyzes multiple data modalities of a multiparty conversation to determine conversation analysis indicators”; par. 0036-0038; “The conversation analytics system coordinates the application of these machine learning algorithms between multiple participants for higher-level analysis of the conversation”), wherein the plurality of multimedia devices are remote to the analytics server (par. 0051-0052; “Client computing devices 205 can operate in a networked environment using logical connections through network 230 to one or more remote computers, such as a server computing device”); and extract, at the analytics server, for each of the plurality of participants, vision information from the video data, at least one of tonal information or text information from the audio data (par. 0058-0061; “Video processing component 346 can extract the video data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Acoustic processing component 348 can extract the audio data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Textual processing component 350 can transcribe the audio from a conversation or utterance from a conversation into text and encapsulate it as a conversation feature for use by a machine learning system”; par. 0157; 0161; “Additional conversation features may be extracted from data modalities (e.g., video, audio, text, etc.). Examples of conversation features from the video data modality 1602 may include smiles, nods, laughter, posture (e.g., slanted, forward, backward, open, closed, expanded, deflated), head position, gestures, etc. Examples of conversation features from the audio data modality 1604 may include listener feedback (e.g., “ah-ha,” “um?”, “uh-huh,” etc.), paralanguage, vocal traffic signals (e.g., “go on,” “um . . . ,” “but!”, etc.), turn length, conversation percentage (e.g., percent of total conversation during which a speaker was active), etc.”); determine, at the analytics server, an aggregated representative participation score (RPS) based on the vision information and the at least one of the tonal information or the text information, for each of the plurality of participants, for a plurality of consecutive time intervals (par. 0069; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”; par. 
0201; “User interface 2200 may further include progress score chart 2208, illustrating values of the performance score at preset intervals or after each conversation”; par. 0198; “conversation score 2202 and/or multiple sub-scores (e.g., an engagement score, an ownership score, a goal score, an interruptions score, a “time spent listening” score, an openness score, etc.) are generated by one or more machine learning systems based on identified conversation features (e.g., audio and video from a conversation, facial expressions, voice tone, key phrases, etc.) of the recorded conversation”; par. 0219; “generate a particular conversation analysis indicator (e.g., emotional score, emotional labels, ownership score, etc.)”); determine, at the analytics server: a first time interval from the first record, the first time interval comprising at least one of pronounced RPS for at least one of the plurality of participants, pronounced RPS for at least a predefined duration for at least one of the plurality of participants, or pronounced RPS for at least two of the plurality of participants, and a second time interval from the second record, the second time interval comprising at least one of pronounced RPS for at least one of the plurality of participants, pronounced RPS for at least a predefined duration for at least one of the plurality of participants, or pronounced RPS for at least two of the plurality of participants (par. 0067-0069; “the conversation features can include an embedding of the audio, video, or textual versions of the audio, tone, sound level, emotional characteristics (e.g., supportive, agreeable, combative, engaged, enthusiastic, passionate, uncertainty, etc.), effectiveness ratings, physical reactions or movements (e.g., eye gaze directions, participant postures, participant gestures, participant head positions, laughter, nodding, facial expressions, etc.); and/or identify particular significant phrases or word choices (e.g., mm-hmm, yes, yah, oh my god, huh, uhh, etc.)”; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”; par. 0108; “Conversation analysis indicators (e.g., 706, 708, 710) may be stored with timestamps, for correlation with the source utterance and acoustic/video data of the conversation. In other words, conversation analysis indicators may be stored in a series, based on a series (i.e., sequence) of utterances and/or concatenated speaker data. The stored conversation analysis indicators may be graphed, visualized, and analyzed in aggregate by conversation analytics system 400”); identify, from the first and the second time intervals, at least one phrase associated with at least one type of participation flow, wherein participation flow types include at least one of positive sentiment, neutral sentiment, negative sentiment, positive engagement, neutral engagement or negative engagement (par. 
0067-0069; “the conversation features can include an embedding of the audio, video, or textual versions of the audio, tone, sound level, emotional characteristics (e.g., supportive, agreeable, combative, engaged, enthusiastic, passionate, uncertainty, etc.), effectiveness ratings, physical reactions or movements (e.g., eye gaze directions, participant postures, participant gestures, participant head positions, laughter, nodding, facial expressions, etc.); and/or identify particular significant phrases or word choices (e.g., mm-hmm, yes, yah, oh my god, huh, uhh, etc.)”; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”); and send, from the analytics server, to a user device or at least one of the plurality of multimedia devices, the at least one phrase for display (par. 0043; “The conversation analytics system provides a user interface for users to interrogate the conversation analysis indicators”; par. 0071; “conversation analytics system 400 generates multiple conversation analysis indicators (e.g., conversation scores, conversation analysis indicators, openness scores, engagement scores, ownership scores, goal scores, interruptions scores, “time spent listening” scores, and emotional labels) at block 406”). Consider claim 13, Reece et al teach wherein the instructions further configure the apparatus to: generate, at the analytics server, for each of the first recording and the second recording, a baseline for each of the plurality of participants based on the aggregated RPS (par. 0078; “In some implementations, one or more of these scores can be provided in a mentee report with corresponding explanations and/or a baseline or comparison value so the mentee can interpret the values in terms of their progress, goals, or as a comparison to other mentees”); and adjust, at the analytics server, the at first time interval using the baseline of the first recording, and the second time interval using the baseline of the second recording (par. 0129; “conversation analytics system 400 may transmit identified utterances to annotator 405, and further receive utterances modified by a user (e.g., start/stop times of the utterance modified) in response”). Consider claim 14, Reece et al teach wherein the vision information comprises at least one of sentiments, head nods, or disapprovals, wherein the tonal information comprises at least one of sentiments, empathy, politeness, speak rate, talk ratio, or talk over ratio, wherein the text information comprises at least one of sentiment, or hyper-relevant text keyphrases (HRTKs) (par. 
0058-0061; “Video processing component 346 can extract the video data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Acoustic processing component 348 can extract the audio data from a conversation or utterance from a conversation and can encapsulate it as a conversation feature for use by a machine learning system”; “Textual processing component 350 can transcribe the audio from a conversation or utterance from a conversation into text and encapsulate it as a conversation feature for use by a machine learning system”; par. 0112; “the video data may be labeled with facial expressions (e.g., smiling, grimacing, crying, nodding) and/or emotional labels (happy, sad, aggressive, surprised, disappointed). These labels may further each have confidence scores, indicating the relative confidence of the video processing part in that particular label (e.g., decimal score between zero and one)”). Consider claim 15, Reece et al teach wherein the RPS is determined by combining the vision information and at least one of tonal information or text information for a given time interval (par. 0069; “conversation analytics system 400 determines conversation analysis indicators. In some implementations, conversation analysis indicators can be one or more scores for the entire conversation or parts of the conversation, such as an overall effectiveness or quality rating for the conversation or parts of the conversation…The output for the sequences of utterances in a time window (e.g., where utterances overlap or for a set time period such as 5 seconds) can be combined into combined speaker features for the time window of the conversation”). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Any response to this action should be mailed to: Mail Stop ____(explanation, e.g., Amendment or After-final, etc.) Commissioner for Patents P.O. Box 1450 Alexandria, VA 22313-1450 Facsimile responses should be faxed to: (571) 273-8300 Hand-delivered responses should be brought to: Customer Service Window Randolph Building 401 Dulany Street Alexandria, VA 22314 Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUOC DUC TRAN whose telephone number is (571) 272-7511. The examiner can normally be reached Monday-Friday 8:30am - 5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached on (571) 272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /Quoc D Tran/ Primary Examiner, Art Unit 2691 March 5, 2026
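
For orientation, the rejected independent claims describe a pipeline that receives per-participant video and audio, extracts vision, tonal, and text information, combines them into a representative participation score (RPS) per participant per time interval, and ranks intervals with pronounced scores. The sketch below is a hypothetical, simplified illustration of that claimed flow, not the applicant's implementation and not anything disclosed in Reece; the feature values, the unweighted averaging, and the threshold are placeholders.

```python
# Hypothetical sketch of the claimed flow: multi-modal features -> per-interval RPS -> ranked intervals.
# The scoring combination and threshold are placeholders, not the applicant's or Reece's method.
from dataclasses import dataclass
from statistics import mean

@dataclass
class IntervalFeatures:
    participant: str
    interval: int          # index of a fixed-length time interval
    vision_score: float    # e.g., derived from head nods / sentiment in video
    tonal_score: float     # e.g., derived from speech rate, politeness, talk ratios
    text_score: float      # e.g., derived from transcript sentiment / keyphrases

def representative_participation_score(f: IntervalFeatures) -> float:
    # Placeholder combination: an unweighted mean of the available modalities.
    return mean([f.vision_score, f.tonal_score, f.text_score])

def rank_pronounced_intervals(features: list[IntervalFeatures], threshold: float = 0.7):
    # Keep intervals whose RPS is "pronounced" (above a hypothetical threshold),
    # then rank them from highest to lowest score.
    scored = [(f.participant, f.interval, representative_participation_score(f)) for f in features]
    pronounced = [s for s in scored if s[2] >= threshold]
    return sorted(pronounced, key=lambda s: s[2], reverse=True)

if __name__ == "__main__":
    features = [
        IntervalFeatures("alice", 0, 0.9, 0.8, 0.7),
        IntervalFeatures("alice", 1, 0.4, 0.5, 0.3),
        IntervalFeatures("bob",   0, 0.6, 0.9, 0.8),
    ]
    for participant, interval, rps in rank_pronounced_intervals(features):
        print(f"{participant} interval {interval}: RPS={rps:.2f}")
```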

Prosecution Timeline

Jun 23, 2024: Application Filed
Jul 17, 2025: Response after Non-Final Action
Mar 05, 2026: Non-Final Rejection under §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598268: STAGE USER REPLACEMENT TECHNIQUES FOR ONLINE VIDEO CONFERENCES (granted Apr 07, 2026; 2y 5m to grant)
Patent 12598251: PREVENTING DEEP FAKE VOICEMAIL SCAMS (granted Apr 07, 2026; 2y 5m to grant)
Patent 12592989: DETECTING A SPOOFED CALL (granted Mar 31, 2026; 2y 5m to grant)
Patent 12593011: APPARATUS AND METHODS FOR VISUAL SUMMARIZATION OF VIDEOS (granted Mar 31, 2026; 2y 5m to grant)
Patent 12581033: ENFORCING A LIVENESS REQUIREMENT ON AN ENCRYPTED VIDEOCONFERENCE (granted Mar 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 86% (90% with interview, +4.8% lift)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 841 resolved cases by this examiner; grant probability is derived from the career allow rate.
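
One rough way to read the median-time-to-grant figure against the calendar, assuming the 2y 7m median is measured from the filing date (the page does not state the reference point):

```python
# Rough projection of a likely grant window, assuming the 2y 7m median pendency
# runs from the filing date (an assumption; the page does not say).
from datetime import date

def add_months(d: date, months: int) -> date:
    # Minimal month arithmetic; day 23 never overflows a month, so no clamping is needed here.
    y, m = divmod(d.month - 1 + months, 12)
    return date(d.year + y, m + 1, d.day)

filed = date(2024, 6, 23)             # filing date from the page
median_pendency_months = 2 * 12 + 7   # "2y 7m"

print(add_months(filed, median_pendency_months))  # -> 2027-01-23
```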
