Prosecution Insights
Last updated: April 19, 2026
Application No. 18/415,024

DETERMINING VIDEO CALL EFFECTIVENESS SCORES FOR ACCOMPLISHING TARGET GOALS

Final Rejection — §103, §DP
Filed: Jan 17, 2024
Examiner: NGUYEN, PHUNG HOANG JOSEPH
Art Unit: 2691
Tech Center: 2600 — Communications
Assignee: Dropbox Inc.
OA Round: 2 (Final)
Grant Probability: 79% (Favorable)
Expected OA Rounds: 3-4
Median Time to Grant: 2y 9m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 79% — above average (694 granted / 877 resolved; +17.1% vs TC avg)
Interview Lift: +32.1% — strong, measured across resolved cases with an interview
Typical Timeline: 2y 9m average prosecution; 32 applications currently pending
Career History: 909 total applications across all art units

Statute-Specific Performance

§101: 5.6% (-34.4% vs TC avg)
§103: 56.8% (+16.8% vs TC avg)
§102: 15.2% (-24.8% vs TC avg)
§112: 8.2% (-31.8% vs TC avg)
Deltas are relative to the Tech Center average estimate. Based on career data from 877 resolved cases.

Office Action

Grounds: §103, §DP
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting

The previous double patenting rejection remains outstanding, since applicant did not address this issue in the latest Remarks (12/18/25).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Peters in view of Vasylyev OR Van Rensburg, and further in view of Agley (from IDS) OR Takiel.

Regarding claims 1, 8, and 15, Peters teaches a method, a system, and a medium comprising:

a. extracting, utilizing a video call prediction model, natural language from an agenda for a video call of a user account within a content management system (Peters, Fig. 1, [0169]: a video conference call where each of multiple remote endpoint devices can provide media streams over a communication network, and where content on a plurality of topics (i.e., audio and/or visual characteristics) is shared among the participants to achieve a target result of common interest and objective. Furthermore, via Fig. 19E, Peters uses various types of machine learning models to make predictions and recommendations that steer communication sessions toward desirable emotional and cognitive states that promote desirable outcomes or avoid undesirable outcomes, [0460-0463], which suggests comprising natural language. The examiner notes that Peters does not use the term "agenda". An agenda is, however, a well-known practice in the art of conferencing communication, where an invitation identifying the issues to be discussed is formed and extended to the participants. Specifically, Peters does teach "different topics, specific items of content" [0421] and time of day and meeting duration [0456] for the efficiency and effectiveness of a communication session [0003]. The examiner therefore provides additional references to support this well-known practice. For example, Vasylyev provides a meeting agenda that an AI assistant can analyze to track the progress of the meeting [0443], including topics for discussion [0184], ensuring that all critical agenda items are given necessary attention [0456]; OR Van Rensburg teaches a conference invitation with an agenda including topics of discussion and a timestamp for each topic [0080]);

b. generating, utilizing the video call prediction model, a target goal embedding from the natural language from the agenda that encodes a target goal for the video call (Peters: "metadata embedded in or accompanying the video data" [0478, 0509] for communication on a specific or certain topic [0235] to achieve goals and objectives [0149, 0163, 0368]);

c. generating, utilizing the video call prediction model, from video call data captured from multiple video streams of participating client devices during the video call, a plurality of topic discussion embeddings that encode topics discussed during the video call (see limitations a and b); and

d. generating, utilizing the video call prediction model to compare the plurality of topic discussion embeddings and the target goal embedding in an embedding space of the video call prediction model, a video call effectiveness score indicating a probability of accomplishing the target goal during the video call by determining a distance in the embedding space between a topic discussion embedding of the plurality of topic discussion embeddings and the target goal embedding (Peters: machine learning model 2060 can be trained to output predictions of emotional/cognitive states [0460] leading to a desired outcome, encouraging development of proper states [0464] to improve engagement and emotion among the participants and facilitate more effective communication [0405], based on the score of the best/most effective communication/communicator [0456-0457]. Furthermore, via Fig. 19E, Peters uses various types of machine learning models to make predictions and recommendations that steer communication sessions toward desirable outcomes, [0460-0463]. As evidenced above, Peters teaches "metadata embedded in or accompanying the video data" [0477, 0509] for communication on a specific or certain topic [0235] to achieve goals and objectives [0149, 0163, 0368]).

Peters does not teach "determining a distance in the embedding space between a topic discussion embedding of the plurality of topic discussion embeddings and the target goal embedding." Agley teaches "utilize several threshold distances to identify content items which are highly relevant to the request, ... creation of embedding vectors representing each content item in the first knowledge space, and calculating a center of the first knowledge space by averaging the embedding vectors" [0083, 0084, 0087]; OR Takiel, via Fig. 3, teaches that in some implementations, nodes representing topics raised in the current dialog turn (whether newly added or updated) may be identified, e.g., as nodes A. A measure of relevance of a given topic may be determined during a current dialog turn by calculating, for each node in directed graph 364, the shortest distance d_i to any node belonging to A. Additionally, a difference a_i between the current dialog turn number and the node's turn number may also be calculated. The node's (and hence, the topic's) measure of relevance may be set to the minimum of d_i and a_i, [0074-0078].
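For orientation, the distance comparison at the heart of limitation (d) reduces to a few lines of code. The sketch below is illustrative only: neither the claims as recited here nor the cited references disclose a specific embedding model, distance metric, or score mapping, so the cosine metric, 384-dimensional vectors, and linear rescaling are all assumptions.

```python
# Minimal sketch of limitation (d): compare a target-goal embedding against
# topic-discussion embeddings and map the nearest distance to a score.
# All choices below (cosine distance, 384 dims, linear rescaling) are
# illustrative assumptions, not disclosed implementation details.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # 0.0 for identical directions, up to 2.0 for opposite directions.
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def effectiveness_score(goal_emb: np.ndarray, topic_embs: list) -> float:
    # A smaller distance between any discussed topic and the target goal is
    # read as a higher probability of accomplishing the goal.
    nearest = min(cosine_distance(goal_emb, t) for t in topic_embs)
    return max(0.0, 1.0 - nearest / 2.0)  # rescale [0, 2] distance to [0, 1]

# Toy usage with random stand-in embeddings:
rng = np.random.default_rng(0)
goal = rng.normal(size=384)                      # hypothetical embedding width
topics = [rng.normal(size=384) for _ in range(5)]
print(f"effectiveness ~ {effectiveness_score(goal, topics):.2f}")
```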
It would have been obvious to the ordinary artisan before the effective filing date to make a minor modification to Peters' teaching: to include the explicit and precise use of the term/language in Vasylyev or Van Rensburg, for the purpose of eliminating any potential dispute or unnecessary dialogue; to further include the teaching of Agley, for the purpose of identifying a plurality of content items in a second knowledge space relevant to the concept and receiving a selection of one or more of the content items for addition to the first knowledge space, thus advantageously assisting users in evaluating content for purchase; and to further include Takiel's teaching of utilizing the context of an ongoing human-to-computer dialog to enhance the ability of an automated assistant to interpret and respond, by determining the shortest distance and the relevance of a node/topic.

Claim 2. The method of claim 1, wherein extracting the target goal embedding comprises utilizing the video call prediction model to generate a vector representation of the agenda from a plurality of meeting items included in the agenda. (Peters: the analysis process may include machine learning, including training of predictive models to learn relationships among these items; the system then uses the results of the analysis to generate and provide recommendations to participants to improve communication sessions, [0416]. Van Rensburg: an agenda including topics of discussion and a timestamp for each topic, [0080]. Vasylyev: the respective representations of the ongoing conversation may be provided with multiple timestamps. In the context of storing audio data and its transcripts and other forms of representation, timestamps may refer to the specific times at which certain words or sounds occur in the recorded audio file or stream. The timestamps may be configured to provide a link between the transcribed text, its tokenized form, and its occurrence in the audio stream. For example, in the recorded audio data, each sample can have a corresponding timestamp which refers to the point in time at which that sample occurs, [0295]).

Claim 3. The method of claim 1, wherein generating the plurality of topic discussion embeddings comprises: utilizing the video call prediction model to extract a first vector representation of a first topic discussed at a first timestamp during the video call; and utilizing the video call prediction model to extract a second vector representation of a second topic discussed at a second timestamp during the video call. (Vasylyev: [0295], timestamps linking the transcribed text, its tokenized form, and its occurrence in the audio stream, as quoted for claim 2 above.)

Claim 4. The method of claim 1, wherein generating the video call effectiveness score comprises comparing the target goal embedding with the plurality of topic discussion embeddings to determine a probability of accomplishing the target goal during the video call. (Peters: the scores may be expressed in any appropriate way. Examples of types of scores include (1) a binary score (e.g., indicating whether or not an attribute is present with at least a threshold level); (2) a classification (e.g., indicating that an attribute is in a certain range, such as low happiness, medium happiness, or high happiness); and (3) a numerical value indicating a level or degree of an attribute (e.g., a numerical value along a range, such as a score for happiness of 62 on a scale from 0 to 100), [0383]).

Claims 5 and 19. The method of claim 1, further comprising: determining, from the video call data captured from the multiple video streams, a plurality of user accounts attending the video call; and generating the video call effectiveness score based on the plurality of user accounts attending the video call. (Peters: the user state data is generated by the respective endpoint devices, each processing video data that it captured during the communication session, [0026], during a conference with a score for the total group of participants, [0521]).

Claims 6, 12, and 20. Peters does not teach "detecting a discussion dissonance by determining that fewer than a threshold number of topic discussion embeddings are within a threshold distance of each other within the embedding space; and generating the video call effectiveness score based on the discussion dissonance". Takiel, via Fig. 3, teaches that in some implementations, nodes representing topics raised in the current dialog turn (whether newly added or updated) may be identified, e.g., as nodes A. A measure of relevance of a given topic may be determined during a current dialog turn by calculating, for each node in directed graph 364, the shortest distance d_i to any node belonging to A. Additionally, a difference a_i between the current dialog turn number and the node's turn number may also be calculated. The node's (and hence, the topic's) measure of relevance may be set to the minimum of d_i and a_i, [0074-0078]. Each grammar may be associated both with a topic and with a threshold relevance score for that topic; if the topic is persisted in the contextual data structure but its relevance score does not satisfy the threshold, the grammar may not be selected. This enables fine-tuning of when grammars will and will not be applied, [0010].
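The dissonance test recited in claims 6, 12, and 20 is likewise compact to state in code. In the sketch below, the Euclidean metric and both threshold values are hypothetical placeholders; the claims leave the metric and the thresholds open.

```python
# Hedged sketch of the claims 6/12/20 "discussion dissonance" check: the
# discussion is treated as scattered when fewer than `min_cluster` topic
# embeddings sit within `max_distance` of one another. Both thresholds and
# the Euclidean metric are assumptions made for illustration.
import numpy as np

def is_dissonant(topic_embs: list,
                 max_distance: float = 0.5,
                 min_cluster: int = 3) -> bool:
    for i, a in enumerate(topic_embs):
        close = sum(1 for j, b in enumerate(topic_embs)
                    if i != j and np.linalg.norm(a - b) <= max_distance)
        if close + 1 >= min_cluster:  # this topic plus its close neighbors
            return False              # a coherent cluster of topics exists
    return True                       # no cluster reaches the threshold size
```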
Claims 7, 14, and 17. The method of claim 1, further comprising generating, for display during the video call on a client device associated with the user account, a notification comprising a prompt for improving the video call effectiveness score. (Peters: the analysis results can characterize the state of the communication session and its participants, as well as provide recommendations to improve the efficiency and effectiveness of the communication session; the techniques described herein can provide various improvements to traditional video conferences, [0003]).

Claim 9. The system of claim 8, further comprising instructions that, when executed by the at least one processor, cause the system to: generate agenda item embeddings from items in the agenda utilizing the video call prediction model; generate the plurality of topic discussion embeddings from the video call data in real time as topics are discussed during the video call; and compare the plurality of topic discussion embeddings with the agenda item embeddings to predict accomplishment of the items in the agenda. (Peters: the analysis results can show how the participants are responding to the lecture, e.g., overall engagement level, average levels of emotion across the participants, distribution of participants in different classifications or categories (e.g., classifications for high, moderate, and low engagement), and how engagement and emotion levels compare to prior lectures involving the same or different people, [0284]; the system 1510 can compare the real-time monitored emotional and cognitive states of students in the class with the profile or range of emotional and cognitive states predicted to result in good learning outcomes. When the system determines that the students' emotional and cognitive states are outside a desired range for good results, the system 1510 can generate a recommendation for an action to improve the emotional and cognitive states of the students, and thus better facilitate the desired educational outcomes. The action can be selected by the system 1510 based on scores for outcomes, based on output of a machine learning model, or by another technique. The system 1510 then sends the recommendation for presentation on the teacher's client device, [0437]).

Claim 10. The system of claim 8, further comprising instructions that, when executed by the at least one processor, cause the system to utilize the video call prediction model to: classify, based on the agenda, the video call into a video call category from among a set of video call categories comprising a decision category, an information gathering category, and an information sharing category; and generate the video call effectiveness score based on determining, from the video call data captured from the multiple video streams of the video call, one or more category parameters met for the video call category. (Peters: as distance learning and remote educational interactions become more common, the system provides recommendations for specific actions, customized or selected for the particular emotional context and state of the audience, that allow the presenter to act in an emotionally intelligent way. In other words, the system guides the presenter to appropriately respond to and address the needs of the audience arising from its emotions and experience at the current time, even if the presenter does not have the information or capability to perceive and address those needs, [0376]).

Claims 11 and 18. The system of claim 8, further comprising instructions that, when executed by the at least one processor, cause the system to: generate the video call effectiveness score at a first timestamp of the video call by comparing the target goal embedding with the plurality of topic discussion embeddings to determine a probability of accomplishing the target goal during the video call (Peters: for each of the items measured (e.g., emotion levels, sentiment, attention, engagement, collaboration, detected gestures, and so on), the server 1510 can store the progression of values over time, with the scores for different endpoints 2202a-2202c or participants aligned and synchronized by timestamps, [0493]); and update the video call effectiveness score at a second timestamp of the video call based on comparing the target goal embedding with new topic discussion embeddings for new topic discussions after the first timestamp of the video call (Peters: repeatedly (i) obtaining updated participant scores for the participants as additional image data or video data is captured for the respective participants during the communication session, (ii) generating an updated aggregate representation of the emotional states or levels of engagement of the set of multiple participants based on the updated participant scores, and (iii) providing updated output data indicative of the updated aggregate representation, [0097]).

Claims 13 and 16. Instructions that, when executed by the at least one processor, cause the at least one processor to generate the video call effectiveness score by: determining a trajectory for the video call based on a number of agenda items accomplished up to a timestamp within the video call; and determining a predicted percentage of agenda items that will be accomplished by a scheduled end of the video call based on the trajectory. (Vasylyev: Assistant system 2 may further employ machine learning models with predictive caching and prefetching to predict which parts of the context are likely to be needed in the near future based on the conversation's trajectory and the user's behavior, [0244]; the system can more accurately anticipate the direction of the conversation and adjust its pre-fetching strategies accordingly, [0236]).
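The trajectory limitation of claims 13 and 16 amounts to a rate extrapolation. A minimal sketch, assuming a simple linear pace (the claims do not require linearity):

```python
# Illustrative linear extrapolation for the claims 13/16 trajectory: project
# the percentage of agenda items likely to be accomplished by the scheduled
# end from the pace observed so far. Linearity is an assumption.
def predicted_completion(items_done: int, total_items: int,
                         elapsed_min: float, scheduled_min: float) -> float:
    if elapsed_min <= 0 or total_items == 0:
        return 0.0
    rate = items_done / elapsed_min      # agenda items accomplished per minute
    projected = rate * scheduled_min     # items expected by the scheduled end
    return min(100.0, 100.0 * projected / total_items)

# E.g., 2 of 6 items done 20 minutes into a 60-minute call projects to 100%.
print(predicted_completion(2, 6, 20, 60))
```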
Response to Arguments

Applicant's arguments filed 12/18/25 have been considered but are moot, because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant argues that Peters does not teach the newly amended independent claims; for example, that Peters fails to disclose all of the limitations of currently amended independent claims 1, 8, and 15, such as "generating, utilizing the video call prediction model, a target goal embedding from the natural language from the agenda that encodes a target goal for the video call; generating, utilizing the video call prediction model, from video call data captured from multiple video streams of participating client devices during the video call, a plurality of topic discussion embeddings that encode topics discussed during the video call," or "generating, utilizing the video call prediction model to compare the plurality of topic discussion embeddings and the target goal embedding in an embedding space of the video call prediction model, a video call effectiveness score indicating a probability of accomplishing the target goal during the video call by determining a distance in the embedding space between a topic discussion embedding of the plurality of topic discussion embeddings and the target goal embedding," as recited by currently amended independent claim 1 and as similarly recited by independent claims 8 and 15 (see Remarks for more detail). The examiner respectfully disagrees, having produced additional references addressing the applicant's remarks and concerns.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHUNG-HOANG J. NGUYEN, whose telephone number is (571) 270-1949. The examiner can normally be reached on a regular schedule, 6:00-3:00. Examiner interviews are available via telephone, in person, and by video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PHUNG-HOANG J NGUYEN/
Primary Examiner, Art Unit 2691

Prosecution Timeline

Jan 17, 2024 — Application Filed
Oct 30, 2025 — Non-Final Rejection (§103, §DP)
Nov 19, 2025 — Interview Requested
Nov 21, 2025 — Applicant Interview (Telephonic)
Nov 21, 2025 — Examiner Interview Summary
Dec 18, 2025 — Response Filed
Feb 11, 2026 — Final Rejection (§103, §DP) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598256
DISRUPTED-SPEECH MANAGEMENT ENGINE FOR A MEETING MANAGEMENT SYSTEM
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12591408
DISPLAY APPARATUS AND METHOD INCORPORATING INTEGRATED SPEAKERS WITH ADJUSTMENTS
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12587612
Method and Device for Invoking Public or Private Interactions during a Multiuser Communication Session
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12587705
LIVESTREAMING AUDIO PROCESSING METHOD AND DEVICE
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12587700
GROUPING IN A SYSTEM WITH MULTIPLE MEDIA PLAYBACK PROTOCOLS
Granted Mar 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 79%
With Interview: 99% (+32.1% lift)
Median Time to Grant: 2y 9m
PTA Risk: Moderate
Based on 877 resolved cases by this examiner. Grant probability is derived from the career allow rate.
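The headline figure reduces to plain arithmetic. The snippet below reproduces the career allow rate from the granted/resolved counts shown above; the vendor's exact rounding and with-interview formula are not published, so only the base rate is computed.

```python
# Grant probability as stated above: the examiner's career allow rate,
# i.e., granted / resolved. The with-interview figure (99%) is reported
# separately; its exact derivation is the vendor's and is not reproduced here.
granted, resolved = 694, 877
allow_rate = granted / resolved
print(f"grant probability ~ {allow_rate:.1%}")   # -> grant probability ~ 79.1%
```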
