Prosecution Insights
Last updated: April 19, 2026
Application No. 18/634,588

GENERATING IMAGE SCENARIOS BASED ON LLM PROMPTS

Latest Office Action: Final Rejection (§103)
Filed: Apr 12, 2024
Examiner: WANG, JIN CHENG
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: Snap Inc.
OA Round: 2 (Final)

Grant Probability: 59% (Moderate)
Projected OA Rounds: 3-4
Projected Time to Grant: 3y 7m
Grant Probability with Interview: 69%

Examiner Intelligence

Career Allow Rate: 59% (492 granted / 832 resolved; -2.9% vs TC avg)
Interview Lift: +10.3% on resolved cases with an interview (moderate, ~+10%)
Typical Timeline: 3y 7m average prosecution; 40 applications currently pending
Career History: 872 total applications across all art units

Statute-Specific Performance

§101: 11.8% (-28.2% vs TC avg)
§102: 7.6% (-32.4% vs TC avg)
§103: 62.7% (+22.7% vs TC avg)
§112: 15.5% (-24.5% vs TC avg)

Note: deltas are relative to a Tech Center average estimate. Based on career data from 832 resolved cases.
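The figures above are internally consistent. A quick sanity check in Python (a sketch using only the numbers shown on this page, and assuming the interview lift is additive in percentage points) reproduces the career allow rate, the with-interview estimate, and the statute mix:

```python
# Sanity-check the dashboard figures using only the numbers shown above.

granted, resolved = 492, 832
allow_rate = granted / resolved * 100          # career allow rate
interview_lift = 10.3                          # percentage-point lift with interview

print(round(allow_rate, 1))                    # ~59.1, shown as 59%
print(round(allow_rate + interview_lift))      # ~69, the "with Interview" figure

# Statute-specific shares should cover most of the rejection mix.
shares = {"101": 11.8, "102": 7.6, "103": 62.7, "112": 15.5}
print(round(sum(shares.values()), 1))          # 97.6 -- roughly the full mix
```

The +10.3% lift matching the 59% → 69% jump suggests the with-interview figure is simply the base rate plus the examiner's historical interview lift.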

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

Applicant's submission filed 12/11/2025 has been entered. Claims 1, 3, 8, 9, 18, 19, and 20 have been amended. Claims 1-20 are pending in the current application.

Response to Arguments

Applicant's arguments filed 12/11/2025 have been fully considered but they are not persuasive. In the Remarks, applicant repeated the new claim limitation and made a general allegation that neither Lu nor Mikutel teaches the new claim limitation "comprising automatically adding the individual content item to a sharable content item feed for presentation to a user using the interaction application." The examiner cannot concur.

Lu teaches the claim limitation: presenting an individual content item corresponding to an individual scenario of the plurality of scenarios comprising automatically adding the individual content item to a sharable content item feed for presentation to a user using the interaction application.

Lu teaches at Paragraph 0071 that responses may be interpreted by the copilot component 1328 to determine if additional prompts are to be issued; additional prompts may be issued and responses 1338 received. A first prompt may obtain the list of ideas, and the LLM 1330 may be queried for each idea to find the pros and cons of that idea. Multiple prompts may be issued to produce a summary, and then further prompts may be utilized to apply that summary to a particular query.

Lu teaches at Paragraph 0086 that the system may construct a third prompt to the machine-learned large language model, the third prompt including the response and second response and third instructions to summarize the response and second response. At operation 1630, the copilot submits the third prompt to the machine-learned large language model.
At operation 1635, the copilot receives a third response from the machine-learned large language model, the third response being a summary of the entire communication session up to the point the request for the summary was received. At operation 1640, the copilot determines the result based upon the third response. In some examples, the third response may be the result. In other examples, the system may utilize the third result in another prompt to the LLM, for example, issuing a query on the summary, and the response to that prompt may be the result.

Lu teaches at FIG. 14 and Paragraph 0077 that the collaborative copilot is a shared experience, with all participants able to see and interact with a common copilot. The collaborative copilot may have a summary pane 1410 that shows a summary of goals, key topics, varying opinions, and other information about the communication session. The collaborative copilot may also include selectable graphical elements (in interaction application) in another pane 1412 that allow participants to ask the copilot questions such as "what's missing from the discussion?", "suggest new ideas related to topics discussed," "what topics need more clarification," "what are unresolved questions," and other options to explore. In some examples, the functionality of the collaborative view (of sharable content item) may be a restricted functionality of the private copilot. For example, queries about people in the meeting may not be allowed in the collaborative view.

Lu teaches at FIG. 5 and Paragraph 0059 that the GUI 510 shows a prompt "who is speaking now" at 540, and the response 540 includes a name as well as information about the current speaker, wherein the response 540 also includes the time (today's date) of the response; at FIG. 6 and Paragraph 0060 that GUI 600 is a continuation of GUI 600 (of interaction application), where the user has asked "what questions can I ask Nicholas" and the copilot has responded at box 642 with possible questions (ideas): "what are the specific dates and milestones that affect the software routine?" and "what are the common themes or overlaps among the different modules?"; and at FIG. 8 and Paragraph 0062 that GUI 800 is a continuation of GUI 800, where the user has asked "capture action items" at 840 and the copilot has responded at box 842 with action items (ideas).

Mikutel teaches the claim limitation of presenting an individual content item corresponding to an individual scenario of the plurality of scenarios comprising automatically adding the individual content item to a sharable content item feed for presentation to a user using an interaction application.

Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response … when a participant (using an interaction application) selects to insert ideas received from the AI-LLM on a canvas, and at Paragraph 0022 that responses from the AI-LLM 122 to the combined AI requests 316 are received by the application 112 in the server 110 by an AI response receiver module 320 and then sent to the visual collaboration applications 134A-134N of the participants 132A-132N as text and/or visual image responses via an AI response transmitter module 322.

Mikutel teaches at FIG. 5 that the idea can be selected in the dropdown menu 530 (of interaction application) to show a plurality of scenarios (meetings) that are relevant to the selected idea, and each of the meetings describes a scene that depicts one or more participants by processing the selected idea "Shoe Slogan Ideation".

Mikutel teaches at Paragraph 0027 that, in FIG. 5, the project that is the subject of the collaboration is identified in a collaboration subject window 530 (in this example, "Shoe slogan ideation"). A meeting chat area 535, separate from the brainstorming canvas 515, can also be provided for communication between the collaboration participants separate from the collaboration using the brainstorming canvas.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 5-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al., US-PGPUB No. 2024/0290331 (hereinafter Lu), in view of Mikutel et al., US-PGPUB No. 2024/0311576 (hereinafter Mikutel); Graham et al., US-PGPUB No. 2024/0346731 (hereinafter Graham); White, US-PGPUB No. 2024/0411906 (hereinafter White); Inkpen et al., US-PGPUB No. 2025/0157103 (hereinafter Inkpen); Rivera-Rodriguez, US Patent No. 12,063,123 (hereinafter Rodriguez); and Hassan et al., US-PGPUB No. 2025/0260770 (hereinafter Hassan).

Re Claim 1: Lu suggests a method comprising: generating, by a device of a user, a first prompt comprising a demographic of a person and a date.

Lu teaches at Paragraph 0055 that a text box 120 allows users to enter custom language queries, and custom free text prompts to the LLM may allow participants to ask the copilot any question about the meeting in progress or past meetings.
Lu teaches at Paragraph 0068 a current live transcript 1310 of the communication session; user data 1312; other communication session metadata 1314 (participant locations (demographic), number of participants, time of the session, current duration of the session, communication session agenda, participant information, session title, and the like); transcripts of past relevant sessions 1316 (e.g., if the communication session is a recurring communication session, other past communication sessions of the series; other communication sessions with similar or a same subject; other communication sessions with similar or a same title and/or agenda; or the like); a conversation history between the user and the copilot in this session and/or other relevant sessions 1318; media shared during the communication session 1319 (including files, screen sharing, videos, communication session chat history between participants, and the like); and the like.

Lu teaches at FIG. 5 and Paragraph 0059 that the GUI 510 shows a prompt "who is speaking now" at 540, and the response 540 includes a name as well as information about the current speaker, wherein the response 540 also includes the time (today's date) of the response; and at FIG. 6 and Paragraph 0060 that GUI 600 is a continuation of GUI 600, where the user has asked "what questions can I ask Nicholas" and the copilot has responded at box 642 with possible questions (ideas): "what are the specific dates and milestones that affect the software routine?" and "what are the common themes or overlaps among the different modules?"

Processing the first prompt by a large language model (LLM) to generate a plurality of ideas relevant to the person on that date, each idea comprising a respective description and vibe:

Lu teaches at FIG. 5 and Paragraph 0059 that the GUI 510 shows a prompt "who is speaking now" at 540 and the response 540 includes a name as well as information about the current speaker, wherein the response 540 also includes the time (today's date) of the response; at FIG. 6 and Paragraph 0060 that GUI 600 is a continuation of GUI 600, where the user has asked "what questions can I ask Nicholas" and the copilot has responded at box 642 with possible questions (ideas); at FIG. 8 and Paragraph 0062 that GUI 800 is a continuation of GUI 800, where the user has asked "capture action items" at 840 and the copilot has responded at box 842 with action items (ideas); at FIG. 9 and Paragraph 0063 that the GUI 900 shows a table of pros and cons 940 for each idea expressed during the communication session; and at FIG. 11 that each idea shows the description of an agenda and vibe (pros).

Generating a second prompt comprising a selected idea from the plurality of ideas and a request for a plurality of scenarios that are relevant to the selected idea, each of the plurality of scenarios describing a scene that depicts one or more participants:

Lu teaches at Paragraph 0071 that multiple rounds of prompts and responses may be issued by the copilot component 1328 to the LLM 1330 for a single user query 1320, depending on the query. For example, to generate the pros and cons for each idea, a first prompt may obtain the list of ideas, and the LLM 1330 may be queried for each idea to find the pros and cons (scenarios) of that idea. Once the copilot component 1328 has the requisite information to answer the query, the copilot component 1328 may send a response 1322 to the participant.
Lu teaches at Paragraph 0072 that if the user asks one of those follow-up questions, the answer may be served from the response cache 1352 of the copilot component 1328.

Lu teaches at Paragraph 0076 that participants may provide user feedback 1326, which may be edits to the responses and/or suggested questions, and the like. This feedback may be used by the copilot component 1328 to update the intermediate model 1350 and/or refine the LLM 1330. When a user edits a response, the system stores the changes and then uses the edits as additional input to the LLM. The edits are given in a different section of the prompt, and the model is then asked to take these changes into account when answering the next question. This way the copilot can fix erroneous utterances/words in the transcription, fix errors, or emphasize specific details.

Lu teaches at Paragraph 0082 that, to create the pros and cons list, the copilot may first query the LLM for the ideas expressed. Then, for each idea, the copilot may query the pros and cons discussed for each idea.

Lu teaches at Paragraph 0035 that a communication session transcript may include chronologically ordered content items, with each content item representing a communication (e.g., a spoken statement) of a meeting participant during a meeting … each content item may also include information identifying the person who made the communication and the time during the meeting at which the communication was made.

Lu teaches a request for a plurality of communication sessions that are relevant to the selected idea, each of the plurality of communication sessions describing a scene that depicts one or more participants.

Lu teaches at Paragraph 0027 that the copilot may utilize video of previous communication sessions that are related to the current communication session, and the copilot may utilize the communication session transcript, chats, files, and/or any audio or video shared during the current communication session.
Lu teaches at Paragraph 0071 that responses may be interpreted by the copilot component 1328 to determine if additional prompts are to be issued; additional prompts may be issued and responses 1338 received. A first prompt may obtain the list of ideas, and the LLM 1330 may be queried for each idea to find the pros and cons of that idea. Multiple prompts may be issued to produce a summary, and then further prompts may be utilized to apply that summary to a particular query.

Lu teaches at Paragraph 0077 that the collaborative copilot may also include selectable graphical elements in another pane 1412 that allow participants to ask the copilot questions such as "what's missing from the discussion?", "suggest new ideas related to topics discussed," "what topics need more clarification," "what are unresolved questions," and other options to explore. In some examples, the functionality of the collaborative view may be a restricted functionality of the private copilot. For example, queries about people in the meeting may not be allowed in the collaborative view.

Lu teaches at Paragraph 0086 that the system may construct a third prompt to the machine-learned large language model, the third prompt including the response and second response and third instructions to summarize the response and second response. At operation 1630, the copilot submits the third prompt to the machine-learned large language model. At operation 1635, the copilot receives a third response from the machine-learned large language model, the third response being a summary of the entire communication session up to the point the request for the summary was received. At operation 1640, the copilot determines the result based upon the third response. In some examples, the third response may be the result. In other examples, the system may utilize the third result in another prompt to the LLM, for example, issuing a query on the summary, and the response to that prompt may be the result.

Lu teaches at FIG. 5 and Paragraph 0059 that the GUI 510 shows a prompt "who is speaking now" at 540 and the response 540 includes a name as well as information about the current speaker, wherein the response 540 also includes the time (today's date) of the response; and at FIG. 6 and Paragraph 0060 that GUI 600 is a continuation of GUI 600, where the user has asked "what questions can I ask Nicholas" and the copilot has responded at box 642 with possible questions (ideas): "what are the specific dates and milestones that affect the software routine?" and "what are the common themes or overlaps among the different modules?"

Automatically processing the second prompt by the LLM to generate the plurality of scenarios that are relevant to the selected idea:

Lu teaches at Paragraph 0071 that responses may be interpreted by the copilot component 1328 to determine if additional prompts are to be issued; additional prompts may be issued and responses 1338 received. A first prompt may obtain the list of ideas, and the LLM 1330 may be queried for each idea to find the pros and cons of that idea. Multiple prompts may be issued to produce a summary, and then further prompts may be utilized to apply that summary to a particular query.

Lu teaches at Paragraph 0086 that the system may construct a third prompt to the machine-learned large language model, the third prompt including the response and second response and third instructions to summarize the response and second response. At operation 1630, the copilot submits the third prompt to the machine-learned large language model.
At operation 1635, the copilot receives a third response from the machine-learned large language model, the third response being a summary of the entire communication session up to the point the request for the summary was received. At operation 1640, the copilot determines the result based upon the third response. In some examples, the third response may be the result. In other examples, the system may utilize the third result in another prompt to the LLM, for example, issuing a query on the summary, and the response to that prompt may be the result.

Lu teaches at FIG. 5 and Paragraph 0059 that the GUI 510 shows a prompt "who is speaking now" at 540 and the response 540 includes a name as well as information about the current speaker, wherein the response 540 also includes the time (today's date) of the response; at FIG. 6 and Paragraph 0060 that GUI 600 is a continuation of GUI 600, where the user has asked "what questions can I ask Nicholas" and the copilot has responded at box 642 with possible questions (ideas); and at FIG. 8 and Paragraph 0062 that GUI 800 is a continuation of GUI 800, where the user has asked "capture action items" at 840 and the copilot has responded at box 842 with action items (ideas).

And presenting an individual content item corresponding to an individual scenario of the plurality of scenarios comprising automatically adding the individual content item to a sharable content item feed for presentation to a user using the interaction application:

Lu teaches at Paragraph 0071 that responses may be interpreted by the copilot component 1328 to determine if additional prompts are to be issued; additional prompts may be issued and responses 1338 received. A first prompt may obtain the list of ideas, and the LLM 1330 may be queried for each idea to find the pros and cons of that idea. Multiple prompts may be issued to produce a summary, and then further prompts may be utilized to apply that summary to a particular query.

Lu teaches at Paragraph 0086 that the system may construct a third prompt to the machine-learned large language model, the third prompt including the response and second response and third instructions to summarize the response and second response. At operation 1630, the copilot submits the third prompt to the machine-learned large language model. At operation 1635, the copilot receives a third response from the machine-learned large language model, the third response being a summary of the entire communication session up to the point the request for the summary was received. At operation 1640, the copilot determines the result based upon the third response. In some examples, the third response may be the result. In other examples, the system may utilize the third result in another prompt to the LLM, for example, issuing a query on the summary, and the response to that prompt may be the result.

Lu teaches at FIG. 14 and Paragraph 0077 that the collaborative copilot is a shared experience, with all participants able to see and interact with a common copilot. The collaborative copilot may have a summary pane 1410 that shows a summary of goals, key topics, varying opinions, and other information about the communication session. The collaborative copilot may also include selectable graphical elements (in interaction application) in another pane 1412 that allow participants to ask the copilot questions such as "what's missing from the discussion?", "suggest new ideas related to topics discussed," "what topics need more clarification," "what are unresolved questions," and other options to explore. In some examples, the functionality of the collaborative view (of sharable content item) may be a restricted functionality of the private copilot. For example, queries about people in the meeting may not be allowed in the collaborative view.

Lu teaches at FIG. 5 and Paragraph 0059 that the GUI 510 shows a prompt "who is speaking now" at 540 and the response 540 includes a name as well as information about the current speaker, wherein the response 540 also includes the time (today's date) of the response; at FIG. 6 and Paragraph 0060 that GUI 600 is a continuation of GUI 600 (of interaction application), where the user has asked "what questions can I ask Nicholas" and the copilot has responded at box 642 with possible questions (ideas): "what are the specific dates and milestones that affect the software routine?" and "what are the common themes or overlaps among the different modules?"; and at FIG. 8 and Paragraph 0062 that GUI 800 is a continuation of GUI 800, where the user has asked "capture action items" at 840 and the copilot has responded at box 842 with action items (ideas).
Mikutel in view of Rodriguez/Graham/White teaches a method comprising: generating, by a device of a user, a first prompt [comprising a demographic of a person and a date]:

Graham teaches at Paragraph 0014 that one or more AI models may be trained to receive a prompt and to generate output data in response to the received prompt.

White teaches at Paragraph 0041 that input, as used herein, invokes processing by an ML model (e.g., according to an associated prompt template) to process a given conference input (e.g., as may be received from a user or as may be intermediate output). In examples, context is processed as part of the ML model evaluation. For example, an input may include an indication as to context or define a context that is provided to the ML model and/or a chain orchestrator.

Rodriguez teaches at FIG. 7 and column 13, lines 9-45, that the text-based transcript includes multiple textual elements, with each text-based element including the time (date) at which the spoken message was recorded, an identifier for the meeting participant, and a name of the meeting participant.

Mikutel teaches at Paragraph 0022 that the server 110 includes a receiver module 312 to receive natural language requests from the client devices, and at Paragraph 0026 that natural language requests are received from the client devices 130A-130N via the receiver module 312 of FIG. 3. In step 430 the received natural language requests are combined in the AI request generator 314 (see FIG. 3) with context prompts from the prompt generation system 114 of FIG. 1 to produce the combined AI request 316. This combined AI request 316 is sent to the AI-LLM 122 in step 440. Following this, the AI response from the AI-LLM 122 is received by the application 112 and then sent to the client devices 130A-130N via the receiver module 320 and transmitter module 322 of FIG. 3.
Mikutel teaches at Paragraph 0025 that a first step 410 of the application 112 is to send a collaboration template, including a brainstorming canvas, via a collaboration template generator 310 to the client devices 130A-130N for the participants 132A-132N to collaborate with one another via the application 112 in the server 110, and to communicate with the AI-LLM 122 via the server 110 and the network 140. As will be discussed further below, the template includes a selection element, such as a dialog box, configured to activate an artificial intelligence (AI) chat interface to receive natural language commands from at least one of the participants to be transmitted to the AI-LLM (i.e., an AI system).

Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response … when a participant selects to insert ideas received from the AI-LLM on a canvas.

Processing the first prompt by a large language model (LLM) to generate a plurality of ideas relevant to the person on that date, each idea comprising a respective description:

Graham teaches at Paragraph 0015 that a user can prompt the AI model by describing a subject/person and the prompted AI model generates output data representing synthetic content; at Paragraph 0016 that the AI model is prompted with body part data representing a source body part of an actor exhibiting facial expressions, and/or the AI-generated synthetic face exhibiting facial expressions can be fed back into the AI model; and at Paragraph 0019 that the director of a movie might prompt a trained AI model to de-age a famous actor who is performing a scene and can speak into a microphone to prompt the AI model, saying "make John look 35 years old," and the content displayed in real time can be generated entirely by the AI model.
White teaches at Paragraph 0041 that input, as used herein, invokes processing by an ML model (e.g., according to an associated prompt template) to process a given conference input (e.g., as may be received from a user or as may be intermediate output). In examples, context is processed as part of the ML model evaluation. For example, an input may include an indication as to context or define a context that is provided to the ML model and/or a chain orchestrator.

Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response … when a participant selects to insert ideas received from the AI-LLM on a canvas, and at Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types "Give me 5 marketing slogans for new running shoes product launch?" in the AI text box 545. As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used, such as audio commands.

Mikutel teaches at FIG. 7 and Paragraph 0030 that the AI response is provided, using the steps discussed above with regard to FIGS. 1-4, giving the requesting participant 132A five marketing slogans. As noted in the overlay area 540, the AI response window interface 710 allows users to edit prompts and pick favorite responses to insert in the brainstorming area 515.
In other words, the participant 132A can choose to place all five marketing slogans suggested in the AI response in the brainstorming area 515 (or in one or more of the theme areas 520-526), or only to place selected ones of the five in one of these areas.

Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response … when a participant selects to insert ideas received from the AI-LLM on a canvas, and at Paragraph 0022 that responses from the AI-LLM 122 to the combined AI requests 316 are received by the application 112 in the server 110 by an AI response receiver module 320 and then sent to the visual collaboration applications 134A-134N of the participants 132A-132N as text and/or visual image responses via an AI response transmitter module 322.

Mikutel teaches at FIG. 5 that the idea can be selected in the dropdown menu 530 to show a plurality of scenarios (themes) that are relevant to the selected idea, and each of the themes describes a scene that depicts one or more participants.

Generating a second prompt comprising a selected idea from the plurality of ideas and a request for a plurality of scenarios that are relevant to the selected idea, each of the plurality of scenarios describing a scene that depicts one or more participants:

White teaches at Paragraph 0054 that FIG. 3C depicts a user interface 300C of conferencing session 302 at time T3. As illustrated, a brainstorming session is being conducted by the participants during the conferencing session 302. As will be appreciated, based on the confidentiality level of the second participant 306, at least some content shared by the first participant 304 and the third participant 308 during the brainstorming session may not be available to the second participant 306. For example, text boxes 324 have been overwritten with pseudo text to obscure confidential content from disclosure to the second participant 306.
Similarly, text box 328 has been overwritten with wavy lines to obscure confidential content from disclosure to the second participant 306. In some aspects, as the third participant 308 ("Phil") is typing content into text box 326, the text may be reflected by thinking bubbles until confirmation that the text can be disclosed to the second participant 306. In still other examples, text box 330 entered by the first participant ("Sean") has been rewritten from "the Thunderstorm Project" (not shown) to "Project" to prevent disclosure of a project name to the second participant 306.

White teaches at FIGS. 3A-3C and Paragraph 0054 that text boxes are presented as prompts from one or more participants.

White teaches at Paragraph 0052 that the spoken text 322A uttered by the third participant has been modified to eliminate the term "Thunderstorm," such that the spoken text 322B transmitted to the second participant 306 comprises, "Oh, you mean the . . . project?"

White teaches at Paragraph 0092 that generative model package 504 is pre-trained according to a variety of inputs (e.g., a variety of human languages, a variety of programming languages, and/or a variety of content types) and therefore need not be fine-tuned or trained for a specific scenario. Rather, generative model package 504 may be more generally pre-trained, such that conference input 502 includes a prompt that is generated, selected, or otherwise engineered to induce generative model package 504 to produce certain generative model output 506. For example, a prompt includes a context and/or one or more completion prefixes that thus preload generative model package 504 accordingly. As a result, generative model package 504 is induced to generate output based on the prompt that includes a predicted sequence of tokens (e.g., up to a token limit of generative model package 504) relating to the prompt.
White teaches at Paragraph 0030 that machine learning framework 120 and/or machine learning interface 128 manages the evaluation of the conference input (e.g., generating subsequent requests to machine learning service 102 for subsequent detection and/or modification of confidential content) according to pre-designated samples of confidential content and at Paragraph 0033 that natural language input may be provided via the user interface (e.g., as text input and/or as voice input), which may be processed according to aspects described herein and used to detect and/or modify confidential content in a conference input accordingly. Graham teaches at Paragraph 0040 that video content 108 featuring the synthetic content 102 may be viewable by the user 110 who provided the prompt and/or by one or more other users. Graham teaches at Paragraph [0048] that, based at least in part on the user-provided prompt data 212(A) and on the body part data 220 (i.e., additional prompt data relating to a face(s) (or other body part(s))), the trained machine learning model(s) 200(A) may be used to generate output data 204(A) representing a synthetic face 102(1) and/or a synthetic body part (e.g., synthetic body 102(2)). Graham teaches at Paragraph [0084] that additional user-provided prompt data 112, 212 may be received by the processor(s) after one or more iterations through the process 600, and in this scenario, the video content 108 displayed at block 616 may iteratively feature additional types of synthetic content 102 prompted by the user 110, 210, 310 (e.g., a synthetic face 102(1), followed by a synthetic body 102(2), followed by a synthetic background 102(3), etc.). In this manner, the additional prompts may be provided by the user 110, 210, 310 to build upon one or more previously-provided prompts. 
As such, the user 110, 210, 310 can continue to prompt the AI model(s) 100, 200, 300 with additional prompts in order to iteratively build upon an original synthetic manipulation of a source performance and/or a real-world scene that is being, or was, captured by a video capture device 202. Graham teaches at Paragraph 0020 that the AI model can iteratively refine the generative output it is providing in response to prompts and at Paragraph 0030 that the predefined set of prompts can be updated periodically as new prompts are discovered. Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response…when a participant selects to insert ideas received from the AI-LLM on a canvas and at Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Mikutel teaches at FIG. 7 and Paragraph 0030 that the AI response is provided, using the steps discussed above with regard to FIGS. 1-4, giving the requesting participant 132A five marketing slogans. As noted in the overlay area 540, the AI response window interface 710 allows users to edit prompts and pick favorite responses to insert in the brainstorming area 515. 
In other words, the participant 132A can choose to place all five marketing slogans suggested in the AI response in the brainstorming area 515 (or in one or more of the theme areas 520-526), or only to place selected ones of the five in one of these areas. Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response…when a participant selects to insert ideas received from the AI-LLM on a canvas and at Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Mikutel teaches at FIGS. 5-8 that each scene includes a theme that depicts one or more participants. Mikutel teaches at FIG. 7 and Paragraph 0030 that the AI response is provided, using the steps discussed above with regard to FIGS. 1-4, giving the requesting participant 132A five marketing slogans (scenarios). As noted in the overlay area 540, the AI response window interface 710 allows users to edit prompts and pick favorite responses to insert in the brainstorming area 515. 
In other words, the participant 132A can choose to place all five marketing slogans suggested in the AI response in the brainstorming area 515 (or in one or more of the theme areas 520-526), or only to place selected ones of the five in one of these areas. Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response…when a participant selects to insert ideas received from the AI-LLM on a canvas and at Paragraph 0022 that responses from the AI-LLM 122 to the combined AI requests 316 are received by the application 112 in the server 110 by an AI response receiver module 320 and then sent to the visual collaboration applications 134A-134N of the participants 132A-132N as text and/or visual image responses via an AI response transmitter module 322. Mikutel teaches at FIG. 5 that the idea can be selected in the dropdown menu 530 to show a plurality of scenarios (meeting scenes/images/videos) that are relevant to the selected idea and each of the meeting scenes/images/videos describes a scene that depicts one or more participants. Mikutel teaches at Paragraph 0027 that In FIG. 5, the project that is the subject of the collaboration is identified in a collaboration subject window 530 (in this example as “Shoe slogan ideation”). A meeting chat area 535, separate from the brainstorming canvas 515, can also be provided for communication between the collaboration participants separate from the collaboration using the brainstorming canvas); Automatically processing the second prompt by the LLM to generate the plurality of scenarios that are relevant to the selected idea ( White teaches at Paragraph [0054] FIG. 3C depicts a user interface 300C of conferencing session 302 at time T3. As illustrated, a brainstorming session is being conducted by the participants during the conferencing session 302. 
As will be appreciated, based on the confidentiality level of the second participant 306, at least some content shared by the first participant 304 and the third participant 308 during the brainstorming session may not be available to the second participant 306. For example, text boxes 324 have been overwritten with pseudo text to obscure confidential content from disclosure to the second participant 306. Similarly, text box 328 has been overwritten with wavy lines to obscure confidential content from disclosure to the second participant 306. In some aspects, as the third participant 308 (“Phil”) is typing content into text box 326, the text may be reflected by thinking bubbles until confirmation that the text can be disclosed to the second participant 306. In still other examples, text box 330 entered by first participant (“Sean”) has been rewritten from “the Thunderstorm Project” (not shown) to “Project” to prevent disclosure of a project name to the second participant 306. White teaches at FIGS. 3A-3C and Paragraph 0054 that text boxes are presented as prompts from one or more participants wherein the text box 330 entered by first participant has been rewritten from “the Thunderstorm project” to “project” to prevent disclosure of a project name to the second participant 306. Graham teaches at Paragraph 0030 that the predefined set of prompts can be updated periodically as new prompts are discovered. Graham teaches at Paragraph 0040 that video content 108 featuring the synthetic content 102 may be viewable by the user 110 who provided the prompt and/or by one or more other users. Graham teaches at Paragraph [0048] that, based at least in part on the user-provided prompt data 212(A) and on the body part data 220 (i.e., additional prompt data relating to a face(s) (or other body part(s))), the trained machine learning model(s) 200(A) may be used to generate output data 204(A) representing a synthetic face 102(1) and/or a synthetic body part (e.g., synthetic body 102(2)). 
Graham teaches at Paragraph [0084] that additional user-provided prompt data 112, 212 may be received by the processor(s) after one or more iterations through the process 600, and in this scenario, the video content 108 displayed at block 616 may iteratively feature additional types of synthetic content 102 prompted by the user 110, 210, 310 (e.g., a synthetic face 102(1), followed by a synthetic body 102(2), followed by a synthetic background 102(3), etc.). In this manner, the additional prompts may be provided by the user 110, 210, 310 to build upon one or more previously-provided prompts. As such, the user 110, 210, 310 can continue to prompt the AI model(s) 100, 200, 300 with additional prompts in order to iteratively build upon an original synthetic manipulation of a source performance and/or a real-world scene that is being, or was, captured by a video capture device 202. Graham teaches at Paragraph 0021 that the AI model generates an actor performing a scene and output data for a sparse set of frames and Paragraph 0022 that the system may be implemented to generate synthetic audio content based on a live prompt from a user and at Paragraphs 0027-0028 that the model 100 is prompted to generate face images of a famous actor at various ages based on the user-provided prompt data 112 and the user provided prompt 112 can represent one or more of these types of prompts. Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response…when a participant selects to insert ideas received from the AI-LLM on a canvas and at Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. 
As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Mikutel teaches at FIG. 7 and Paragraph 0030 that the AI response is provided, using the steps discussed above with regard to FIGS. 1-4, giving the requesting participant 132A five marketing slogans. As noted in the overlay area 540, the AI response window interface 710 allows users to edit prompts and pick favorite responses to insert in the brainstorming area 515. In other words, the participant 132A can choose to place all five marketing slogans suggested in the AI response in the brainstorming area 515 (or in one or more of the theme areas 520-526), or only to place selected ones of the five in one of these areas. Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response…when a participant selects to insert ideas received from the AI-LLM on a canvas and at Paragraph 0022 that responses from the AI-LLM 122 to the combined AI requests 316 are received by the application 112 in the server 110 by an AI response receiver module 320 and then sent to the visual collaboration applications 134A-134N of the participants 132A-132N as text and/or visual image responses via an AI response transmitter module 322. Mikutel teaches at FIG. 
5 that the idea can be selected in the dropdown menu 530 to show a plurality of scenarios (meeting scenes/images/videos) that are relevant to the selected idea and each of the meeting scenes/images/videos describes a scene that depicts one or more participants by processing the selected idea “Shoe Slogan Ideation”. Mikutel teaches at Paragraph 0027 that In FIG. 5, the project that is the subject of the collaboration is identified in a collaboration subject window 530 (in this example as “Shoe slogan ideation”). A meeting chat area 535, separate from the brainstorming canvas 515, can also be provided for communication between the collaboration participants separate from the collaboration using the brainstorming canvas); and presenting an individual content item corresponding to an individual scenario of the plurality of scenarios comprising automatically adding the individual content item to a sharable content item feed for presentation to a user using an interaction application ( White teaches at Paragraph 0052 that the spoken text 322A uttered by the third participant 306 has been modified to eliminate the term “Thunderstorm,” such that the spoken text 322B transmitted to the second participant 306 comprises, “Oh, you mean the . . . project?”. White teaches at Paragraph 0092 generative model package 504 is pre-trained according to a variety of inputs (e.g., a variety of human languages, a variety of programming languages, and/or a variety of content types) and therefore need not be finetuned or trained for a specific scenario. Rather, generative model package 504 may be more generally pre-trained, such that conference input 502 includes a prompt that is generated, selected, or otherwise engineered to induce generative model package 504 to produce certain generative model output 506. For example, a prompt includes a context and/or one or more completion prefixes that thus preload generative model package 504 accordingly. 
As a result, generative model package 504 is induced to generate output based on the prompt that includes a predicted sequence of tokens (e.g., up to a token limit of generative model package 504) relating to the prompt. White teaches at Paragraph 0030 that machine learning framework 120 and/or machine learning interface 128 manages the evaluation of the conference input (e.g., generating subsequent requests to machine learning service 102 for subsequent detection and/or modification of confidential content) according to pre-designated samples of confidential content and at Paragraph 0033 that natural language input may be provided via the user interface (e.g., as text input and/or as voice input), which may be processed according to aspects described herein and used to detect and/or modify confidential content in a conference input accordingly. White teaches at FIGS. 3A-3C and Paragraph 0054 that text boxes are presented as prompts from one or more participants wherein the text box 330 entered by first participant has been rewritten from “the Thunderstorm project” to “project” to prevent disclosure of a project name to the second participant 306. Graham teaches at Paragraph [0034] that the subject may not capture a sufficient amount of data to make these determinations. For example, the sparse data pertaining to the subject (e.g., the user 110) may include images and/or videos of the subject's face with certain facial expressions. Graham teaches at Paragraph 0072 that the user-provided prompt data 112, 212 may be received at block 502 based on an interaction of the user 110, 210, 310 with any suitable type of input device. Graham teaches at Paragraph 0040 that video content 108 featuring the synthetic content 102 may be viewable by the user 110 who provided the prompt and/or by one or more other users. 
Graham teaches at Paragraph [0048] that, based at least in part on the user-provided prompt data 212(A) and on the body part data 220 (i.e., additional prompt data relating to a face(s) (or other body part(s))), the trained machine learning model(s) 200(A) may be used to generate output data 204(A) representing a synthetic face 102(1) and/or a synthetic body part (e.g., synthetic body 102(2)). Graham teaches at Paragraph [0084] that additional user-provided prompt data 112, 212 may be received by the processor(s) after one or more iterations through the process 600, and in this scenario, the video content 108 displayed at block 616 may iteratively feature additional types of synthetic content 102 prompted by the user 110, 210, 310 (e.g., a synthetic face 102(1), followed by a synthetic body 102(2), followed by a synthetic background 102(3), etc.). In this manner, the additional prompts may be provided by the user 110, 210, 310 to build upon one or more previously-provided prompts. As such, the user 110, 210, 310 can continue to prompt the AI model(s) 100, 200, 300 with additional prompts in order to iteratively build upon an original synthetic manipulation of a source performance and/or a real-world scene that is being, or was, captured by a video capture device 202. Graham teaches at Paragraph 0030 that the predefined sets of prompts can be updated periodically as new prompts are discovered to modify existing ones of the predefined set of prompts and at Paragraph 0024 that pairs of the multiple different models 100 may be synchronized and/or configured to interact with each other and at FIG. 5 and Paragraph 0033 that the machine learning model 110 can be tuned to the likeness of the user 110 using a camera and at Paragraph 0072 that the user provided prompt data is provided by the user based on an interaction of the user 110, 210, 310. 
Mikutel teaches at Paragraph 0019 that a response from the AI-LLM can include recommendations for actual placement of new notes generated based on the AI response…when a participant (using an interaction application) selects to insert ideas received from the AI-LLM on a canvas and at Paragraph 0022 that responses from the AI-LLM 122 to the combined AI requests 316 are received by the application 112 in the server 110 by an AI response receiver module 320 and then sent to the visual collaboration applications 134A-134N of the participants 132A-132N as text and/or visual image responses via an AI response transmitter module 322. Mikutel teaches at FIG. 5 that the idea can be selected in the dropdown menu 530 (of interaction application) to show a plurality of scenarios (meetings) that are relevant to the selected idea and each of the meetings describes a scene that depicts one or more participants by processing the selected idea “Shoe Slogan Ideation”. Mikutel teaches at Paragraph 0027 that In FIG. 5, the project that is the subject of the collaboration is identified in a collaboration subject window 530 (in this example as “Shoe slogan ideation”). A meeting chat area 535, separate from the brainstorming canvas 515, can also be provided for communication between the collaboration participants separate from the collaboration using the brainstorming canvas). Rodriguez teaches at FIG. 7 and column 13, lines 9-45 that the text-based transcript includes multiple textual elements with each text-based element including the time (date) at which the spoken message was recorded, an identifier for the meeting participant and a name of the meeting participant. 
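For orientation only, the Rodriguez-style transcript element described above (a recording time, a participant identifier, and a participant name attached to each spoken message) can be sketched as a simple data structure. This is an illustrative sketch, not code from any cited reference; the names `TranscriptElement` and `to_prompt_line` are hypothetical.

```python
# Hypothetical sketch of a Rodriguez-style transcript element: each
# textual element carries the recording time/date, an identifier for the
# meeting participant, and the participant's name, so a downstream
# text-based prompt can carry person and date identification.
from dataclasses import dataclass


@dataclass
class TranscriptElement:
    recorded_at: str       # time/date the spoken message was recorded
    participant_id: str    # identifier for the meeting participant
    participant_name: str  # display name of the meeting participant
    text: str              # the transcribed spoken message


def to_prompt_line(el: TranscriptElement) -> str:
    """Fold the identification fields into a single attributed prompt line."""
    return f"[{el.recorded_at}] {el.participant_name} ({el.participant_id}): {el.text}"
```

Folding elements like this into an LLM prompt is one way the person/date identification of Rodriguez could accompany the text-based prompt of Mikutel, as the combination below proposes.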
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Rodriguez/Graham’s transcript, with person identification and date identification for each spoken message, into Mikutel’s text-based prompt, so as to provide person identification and date identification for the text-based prompt from the audio-to-text transcript. One of ordinary skill in the art would have been motivated to identify the date/time of the person who provided the spoken message. Mikutel in view of Inkpen teaches the claim limitation: generating a second prompt comprising a selected idea from the plurality of ideas and a request for a plurality of scenarios that are relevant to the selected idea, each of the plurality of scenarios describing a scene that depicts one or more participants ( Mikutel teaches at FIG. 6 and Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Inkpen teaches at FIG. 
2 and Paragraphs 0038-0041 generating a second prompt comprising a selected idea from the plurality of ideas and a request for a plurality of scenarios 260-270-280 that are relevant to the selected idea, each of the plurality of scenarios describing a scene that depicts one or more participants. Mikutel FIG. 6 when combined with the feature of Inkpen FIG. 2 allows the plurality of scenarios to be generated based on a selected idea in the transcript); Automatically processing the second prompt by the LLM to generate the plurality of scenarios that are relevant to the selected idea ( Mikutel teaches at Paragraph 0014 that the AI server 120 can include an AI large language model (LLM) 122, which is a subset of artificial intelligence that has been trained on vast quantities of text data to produce human-like responses to dialogue or other natural language inputs. As will be discussed below, the visual collaboration and AI copilot application 112 of the server 110 communicates with the AI LLM 122 to provide AI responses to commands made on the visual collaboration applications 134A-134N. Mikutel teaches at FIG. 6 and Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Inkpen teaches at FIG. 
2 and Paragraphs 0038-0041 generating a second prompt comprising a selected idea from the plurality of ideas and a request for a plurality of scenarios 260-270-280 that are relevant to the selected idea, each of the plurality of scenarios describing a scene that depicts one or more participants. Mikutel FIG. 6 when combined with the feature of Inkpen FIG. 2 allows the plurality of scenarios to be generated based on a selected idea in the transcript); and presenting an individual content item corresponding to an individual scenario of the plurality of scenarios comprising automatically adding the individual content item to a sharable content item feed for presentation to a user using an interaction application (Mikutel teaches at FIG. 6 and Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Inkpen teaches at FIG. 2 and Paragraphs 0038-0041 generating a second prompt comprising a selected idea from the plurality of ideas and a request for a plurality of scenarios 260-270-280 that are relevant to the selected idea, each of the plurality of scenarios describing a scene that depicts one or more participants. Mikutel FIG. 6 when combined with the feature of Inkpen FIG. 
2 allows the plurality of scenarios to be generated based on a selected idea in the transcript). When incorporating the transcript-based query of Inkpen as a part of the dialogue or natural language inputs of Mikutel, it would have been obvious to have generated a second prompt comprising the natural language inputs for the selected ideas in Mikutel FIGS. 5-8, modifying the natural language input 545 of FIG. 6 of Mikutel to incorporate the transcript of Inkpen and to generate a plurality of scenarios, in the same manner as Inkpen FIG. 2, that are relevant to the selected idea. One of ordinary skill in the art would have presented scenarios describing one or more participants. Rodriguez teaches at FIG. 7 and column 13, lines 9-45 that the text-based transcript includes multiple textual elements with each text-based element including the time (date) at which the spoken message was recorded, an identifier for the meeting participant and a name of the meeting participant. It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated Rodriguez’s transcript, with person identification and date identification for each spoken message, into Mikutel and Inkpen’s text-based transcript, so that each message segment given by each participant includes the person identification and date identification, thereby providing person identification and date identification for the text-based prompt based on the audio-to-text transcript. One of ordinary skill in the art would have been motivated to identify the date/time of the person who provided each spoken message. 
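As a rough illustration of the Mikutel-style exchange quoted above (a participant types an idea request into an AI text box and the LLM returns a numbered set of ideas to pick from), the sketch below builds the initial request and parses a numbered response into selectable ideas. It is a minimal sketch under stated assumptions: the helper names and the numbered response format are hypothetical, not drawn from Mikutel.

```python
# Illustrative sketch only: build the initial idea-request prompt and
# split a numbered LLM response into individual ideas a participant
# could then select for a follow-on scenario request.

def build_idea_prompt(topic: str, n: int = 5) -> str:
    """Compose the initial idea-generation request, as typed in the AI text box."""
    return f"Give me {n} marketing slogans for {topic}?"


def parse_ideas(response: str) -> list[str]:
    """Split a numbered LLM response ('1. ...') into individual idea strings."""
    ideas = []
    for line in response.splitlines():
        line = line.strip()
        if line and line[0].isdigit() and "." in line:
            ideas.append(line.split(".", 1)[1].strip())  # drop the "N." prefix
    return ideas
```

A selected idea from `parse_ideas` could then be folded into a second prompt requesting scenarios relevant to that idea, which is the shape of the claim limitation at issue.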
Mikutel in view of Hassan teaches the claim limitation: generating a second prompt comprising a selected idea from the plurality of ideas and a request for a plurality of scenarios that are relevant to the selected idea, each of the plurality of scenarios describing a scene that depicts one or more participants ( Mikutel teaches at FIG. 6 and Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Hassan teaches at Paragraph 0008 that text can be sent to a large language model (LLM) with a prompt (second prompt comprising the second presenter’s idea) to instruct the LLM to interpret the conversation. The prompt can associate users with different sections of text. The prompt can also instruct the LLM to determine if a first person is interrupting a second person. If the LLM determines that a first person is being interrupted by a second person while that first person is selected as an active speaker, the system can generate a notification to the second person to let them know they are interrupting the first person. Hassan teaches at Paragraph [0049] that FIG. 5 shows an embodiment of the system 100 that can be used to identify an interjecting participant using a large language model. 
This embodiment includes a selector module 105 that can be used to identify portions of a live meeting transcript 104. The selector module identifies portions of the transcript based on the activity of the meeting participants. For example, if a meeting includes a person who is a designated presenter, and the system detects that a second person's live audio signal indicates that the second person, who is not a designated presenter, has started to speak, the selector can identify a set of segments 116 from the transcript 104 to be sent as select content 106 to a large language model (LLM) 109. The select content 106 can be sent to the LLM with parameters 107 such as query prompts and policies. The query prompts and policies can include instructions that cause the LLM to determine if one person's statements are actually interrupting another person's statements. The LLM can then generate notification instructions 1101, which may cause the system to generate a notification 120 if the LLM determines that one person's statements interrupted a designated presenter. Hassan teaches at FIGS. 3A-3E a plurality of scenarios in the user interface 110. Hassan shows at FIGS. 2A-2D and 3A-3E and Paragraph 0034 scene images are generated corresponding to a selected idea where the system detects that Miguel provides an input indicating an interest in becoming a designated presenter. Hassan teaches at Paragraph 0037 that the system can start with an input from User 2 10B indicating that they raised their hand. Another graphical indicator 140 can be displayed to show that User 2 10B has raised their hand. Then, as shown in FIG. 2B, the system transitions to a state where User 2 10B is added to a speaker queue, and the system designates User 2 10B as a designated presenter. Then, as shown in FIG. 2C, the system detects that another user, User 4 10D, starts to speak before User 2. 
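The selector-to-LLM flow Hassan describes (select transcript segments when a non-presenter speaks, send them with a query prompt that attributes each segment to its speaker, and let the LLM decide whether the designated presenter was interrupted) can be sketched roughly as below. All function names are hypothetical, and a naive stub stands in for the real LLM purely so the flow is runnable.

```python
# Illustrative sketch of a Hassan-style interruption check; names and
# prompt wording are hypothetical, not taken from the reference.

def build_query_prompt(segments, presenter):
    """Attach a query prompt attributing each transcript segment to its speaker."""
    header = (f"The designated presenter is {presenter}. "
              "Decide if anyone interrupted the presenter. Answer YES or NO.\n")
    body = "\n".join(f"{speaker}: {text}" for speaker, text in segments)
    return header + body


def detect_interruption(segments, presenter, llm):
    """Submit the prompt; return True if the model reports an interruption."""
    return llm(build_query_prompt(segments, presenter)).strip().upper() == "YES"


def stub_llm(prompt):
    """Stand-in for the real LLM: flags any speaker other than the presenter.
    This is a deliberately naive heuristic, for demonstration only."""
    lines = prompt.splitlines()
    presenter = lines[0].split(" is ", 1)[1].split(".", 1)[0]
    speakers = {l.split(":", 1)[0] for l in lines[1:] if ":" in l}
    return "YES" if any(s != presenter for s in speakers) else "NO"
```

In Hassan's terms, `segments` plays the role of the select content 106 and the query prompt the role of the parameters 107; a YES decision would drive the notification instructions that produce the graphical element 120.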
In response to detecting that User 4 started to speak before User 2, the system generates a graphical element 120 that is displayed in association with User 4. This can indicate that User 4 is speaking out of turn. Hassan teaches at FIGS. 1-5 and Paragraph 0049-0052 that the system may identify select segments 116 in response to the detection of predetermined event and the select segments 116 can be identified when statements are detected by two or more individuals. Hassan teaches at Paragraph 0031 that the user interface can include a participant region 110 and at Paragraph 0036 that in response to detecting that the participant 10D is speaking while the participant 10B is assigned with the speaker permissions, the system can cause a computing device 11D associated with the participant 10D to generate a graphical element 120 indicating that the second participant 10D is speaking out of turn); Automatically processing the second prompt by the LLM to generate the plurality of scenarios that are relevant to the selected idea ( Mikutel teaches at Paragraph 0014 that his AI server 120 can include an AI large language model (LLM) 122, which is a subset of artificial intelligence that has been trained on vast quantities of text data to produce human-like responses to dialogue or other natural language inputs. As will be discussed below, the visual collaboration and AI copilot application 112 of the server 110 communicates with the AI LLM 122 to provide AI responses to commands made on the visual collaboration applications 134A-134N. Mikutel teaches at FIG. 6 and Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. As noted in the overlay area 540 of FIG. 
6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Hassan teaches at Paragraph 0008 that text can be sent to a large language model (LLM) with a prompt (second prompt comprising the second presenter’s idea) to instruct the LLM to interpret the conversation. The prompt can associate users with different sections of text. The prompt can also instruct the LLM to determine if a first person is interrupting a second person. If the LLM determines that a first person is being interrupted by a second person while that first person is selected as an active speaker, the system can generate a notification to the second person to let them know they are interrupting the first person. Hassan teaches at Paragraph [0049] that FIG. 5 shows an embodiment of the system 100 that can be used to identify an interjecting participant using a large language model. This embodiment includes a selector module 105 that can be used to identify portions of a live meeting transcript 104. The selector module identifies portions of the transcript based on the activity of the meeting participants. For example, if a meeting includes a person who is a designated presenter, and the system detects that a second person's live audio signal indicates that the second person, who is not a designated presenter, has started to speak, the selector can identify a set of segments 116 from the transcript 104 to be sent as select content 106 to a large language model (LLM) 109. The select content 106 can be sent to the LLM with parameters 107 such as query prompts and policies.
The query prompts and policies can include instructions that cause the LLM to determine if one person's statements are actually interrupting another person's statements. The LLM can then generate notification instructions 1101, which may cause the system to generate a notification 120 if the LLM determines that one person's statements interrupted a designated presenter. Hassan teaches at FIGS. 3A-3E a plurality of scenarios in the user interface 110. Hassan teaches at Paragraph 0039 that during the meeting, the system continually analyzes the voice of each user and identifies each person's voice. At first, User B and User C start conversing. User A then performs a hand raise gesture that causes the system to select User A as a designated presenter. Then, User D, who is not in a presenter role, starts talking out of turn. The system can take one or more actions based on a level associated with the detected behavior. Hassan shows at FIGS. 2A-2D and 3A-3E and Paragraph 0034 that scene images are generated corresponding to a selected idea where the system detects that Miguel provides an input indicating an interest in becoming a designated presenter. Hassan teaches at Paragraph 0037 that the system can start with an input from User 2 10B indicating that they raised their hand. Another graphical indicator 140 can be displayed to show that User 2 10B has raised their hand. Then, as shown in FIG. 2B, the system transitions to a state where User 2 10B is added to a speaker queue, and the system designates User 2 10B as a designated presenter. Then, as shown in FIG. 2C, the system detects that another user, User 4 10D, starts to speak before User 2. In response to detecting that User 4 started to speak before User 2, the system generates a graphical element 120 that is displayed in association with User 4. This can indicate that User 4 is speaking out of turn. Hassan teaches at FIGS.
1-5 and Paragraphs 0049-0052 that the system may identify select segments 116 in response to the detection of a predetermined event and the select segments 116 can be identified when statements are detected by two or more individuals. Hassan teaches at Paragraph 0031 that the user interface can include a participant region 110 and at Paragraph 0036 that in response to detecting that the participant 10D is speaking while the participant 10B is assigned with the speaker permissions, the system can cause a computing device 11D associated with the participant 10D to generate a graphical element 120 indicating that the second participant 10D is speaking out of turn); and presenting an individual content item corresponding to an individual scenario of the plurality of scenarios comprising automatically adding the individual content item to a sharable content item feed for presentation to a user using an interaction application (Mikutel teaches at FIG. 6 and Paragraph 0029 that the AI aspect of this disclosure begins with one of the participants (in this example, participant 132A) typing in a prompt in the AI text box 545 to request information from the AI-LLM 122. In the example shown, the participant 132A types “Give me 5 marketing slogans for new running shoes product launch?” in the AI text box 545. As noted in the overlay area 540 of FIG. 6, the typing of the request in the AI text box 545 is an initial step in an AI idea generation operation. Of course, the actual details of the request entered in the AI text box 545 depend completely on what the participant making the request is interested in. It is noted that, although the AI request is shown in this example as being entered as typed text, other forms of entry could be used such as audio commands. Hassan teaches at Paragraph 0008 that text can be sent to a large language model (LLM) with a prompt (second prompt comprising the second presenter’s idea) to instruct the LLM to interpret the conversation.
The prompt can associate users with different sections of text. The prompt can also instruct the LLM to determine if a first person is interrupting a second person. If the LLM determines that a first person is being interrupted by a second person while that first person is selected as an active speaker, the system can generate a notification to the second person to let them know they are interrupting the first person. Hassan teaches at Paragraph [0049] that FIG. 5 shows an embodiment of the system 100 that can be used to identify an interjecting participant using a large language model. This embodiment includes a selector module 105 that can be used to identify portions of a live meeting transcript 104. The selector module identifies portions of the transcript based on the activity of the meeting participants. For example, if a meeting includes a person who is a designated presenter, and the system detects that a second person's live audio signal indicates that the second person, who is not a designated presenter, has started to speak, the selector can identify a set of segments 116 from the transcript 104 to be sent as select content 106 to a large language model (LLM) 109. The select content 106 can be sent to the LLM with parameters 107 such as query prompts and policies. The query prompts and policies can include instructions that cause the LLM to determine if one person's statements are actually interrupting another person's statements. The LLM can then generate notification instructions 1101, which may cause the system to generate a notification 120 if the LLM determines that one person's statements interrupted a designated presenter. Hassan teaches at FIGS. 3A-3E a plurality of scenarios in the user interface 110. Hassan shows at FIGS. 2A-2D and 3A-3E and Paragraph 0034 that scene images are generated corresponding to a selected idea where the system detects that Miguel provides an input indicating an interest in becoming a designated presenter.
Hassan teaches at Paragraph 0037 that the system can start with an input from User 2 10B indicating that they raised their hand. Another graphical indicator 140 can be displayed to show that User 2 10B has raised their hand. Then, as shown in FIG. 2B, the system transitions to a state where User 2 10B is added to a speaker queue, and the system designates User 2 10B as a designated presenter. Then, as shown in FIG. 2C, the system detects that another user, User 4 10D, starts to speak before User 2. In response to detecting that User 4 started to speak before User 2, the system generates a graphical element 120 that is displayed in association with User 4. This can indicate that User 4 is speaking out of turn. Hassan teaches at FIGS. 1-5 and Paragraphs 0049-0052 that the system may identify select segments 116 in response to the detection of a predetermined event and the select segments 116 can be identified when statements are detected by two or more individuals. Hassan teaches at Paragraph 0031 that the user interface can include a participant region 110 and at Paragraph 0036 that in response to detecting that the participant 10D is speaking while the participant 10B is assigned with the speaker permissions, the system can cause a computing device 11D associated with the participant 10D to generate a graphical element 120 indicating that the second participant 10D is speaking out of turn). When incorporating a second prompt from the second presenter, based on the natural language audio transcript of Hassan, as part of the dialog or natural language inputs of Mikutel, it would have been obvious to generate a second prompt comprising the natural language inputs for the selected ideas in Mikutel FIGS. 5-8, to modify the natural language input 545 of FIG. 6 of Mikutel to incorporate the transcript of Hassan, and to generate a plurality of scenarios in the same manner as Hassan FIGS.
3A-3E that are relevant to the selected idea, based on the selector module identifying portions of the transcript according to the activity of the meeting participants by identifying a set of segments 116 from a plurality of segments (ideas). One of ordinary skill in the art would have presented scenarios describing one or more participants. Re Claim 5: The claim 5 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that each of the plurality of scenarios comprises information about who is in a respective scenario, a pose or activity performed by each person present in the respective scenario, an expression of each person present in the respective scenario, and a description of a background of the respective scenario. Inkpen further teaches the claim limitation that each of the plurality of scenarios comprises information about who is in a respective scenario, a pose or activity performed by each person present in the respective scenario, an expression of each person present in the respective scenario, and a description of a background of the respective scenario (Inkpen teaches at Paragraph [0056] that the segment image generator 308 may use one or more templates for generation of the segment images. For example, a four panel template may be used to generate the segment image 260, where the template includes upper left, upper right, lower left, and lower right panels that may be populated with avatars or image portions. As another example, a three panel template may be used to generate the segment image 280, where the template includes a left panel, an upper right panel, and a lower right panel. The templates may also specify font styles, colors, background images, label locations (e.g., to be populated by segment labeler 306), dialog bubble locations (e.g., to be populated by dialog bubble generator 310), links or links locations (e.g., to be populated by link generator 312), etc.
Other variations of templates will be apparent to those skilled in the art. Inkpen teaches at Paragraph 0058 that the segment image generator 308 is configured to use a facial recognition algorithm (e.g., an instance of the neural network model 126) to extract a suitable image from the media stream. In one example, the segment image generator 308 extracts an image when the facial recognition algorithm indicates that the participant is smiling, laughing, frowning, or making another expression that represents the content of the corresponding segment. Inkpen teaches at Paragraph [0041] that the transcript 200 includes dialog content during a meeting among users Alice, Bob, Cher, and Katie as they discuss a location for an annual retreat. It is clearly understood that Alice, Bob, Cher and Katie are friends or best friends. Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”. Inkpen teaches at Paragraph 0040 that the media stream processor 122 identifies three segments within the media stream, generates segment labels for the segments, and generates segment images 260, 270, and 280 for the segments. The media stream processor 122 may generate the segment images to include the corresponding segment label, such as the segment labels 262, 272, and 282. Inkpen teaches at FIG. 2 and Paragraph 0055 that each individual scenario comprises first and second participants (Alice and Cher), the first participant (Alice) corresponding to the user of the device; selecting a subset of participants (friends) associated with the user Alice to be depicted in an image. The other participants are friends of the user Alice).
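For context, the scenario structure recited in claim 5 (the participants, a pose or activity and an expression for each person, and a background description) could be modeled as a simple record. The sketch below is purely illustrative: the class and field names are hypothetical and are not drawn from Hassan, Mikutel, Inkpen, or the application itself.

```python
from dataclasses import dataclass


@dataclass
class PersonDetail:
    """Per-person details within one scenario (hypothetical structure)."""
    name: str
    pose_or_activity: str
    expression: str


@dataclass
class Scenario:
    """One generated scenario: who is present, what each person is doing
    and expressing, and a description of the background."""
    people: list       # list of PersonDetail
    background: str    # description of the scenario's background
    caption: str = ""  # optional message for a caption (claim 6)


# Example record mirroring the kind of labeled segment Inkpen describes.
scenario = Scenario(
    people=[PersonDetail("Alice", "standing at a whiteboard", "smiling")],
    background="conference room",
    caption="Choices for conference location",
)
```

A record of this shape could, under these assumptions, be serialized into the second prompt or used to index previously captured content items; nothing here asserts how the cited references actually structure their data.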
Re Claim 6: The claim 6 encompasses the same scope of invention as that of the claim 5 except additional claim limitation that one or more of the plurality of scenarios includes a message for a caption. Inkpen further teaches the claim limitation that one or more of the plurality of scenarios includes a message for a caption (Inkpen teaches at FIG. 2 and Paragraph 0041 that the plurality of scenarios 260-270-280 includes a message for a caption including “1. Choices for conference location”, “2. Travel cost and living expenses” and “3. Climate in London”. Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”). Re Claim 7: The claim 7 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that the plurality of scenarios comprises: a first scenario that includes a first scenario description, a first set of details about a pose and expression of only a first person, and a first message; and a second scenario that includes a second scenario description, a second set of details about a pose and expression of the first person and a pose and expression of a second person, and a second message. Inkpen further teaches the claim limitation that the plurality of scenarios comprises: a first scenario that includes a first scenario description, a first set of details about a pose and expression of only a first person, and a first message (Inkpen teaches at FIG. 2 and Paragraph 0041 that the plurality of scenarios 260-270-280 includes a message for a caption including “1. Choices for conference location”, “2. Travel cost and living expenses” and “3. Climate in London”. 
Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”. Inkpen teaches at Paragraph 0057 that the segment image generator 308 may generate an animated avatar for a participant based on dialog from the participant, for example, so that the avatar's mouth appears to match the dialog. Animations may be generated to highlight facial expressions of a participant, actions taken by the participant. Inkpen teaches at Paragraph 0058 that the segment image generator 308 is configured to use a facial recognition algorithm (e.g., an instance of the neural network model 126) to extract a suitable image from the media stream. In one example, the segment image generator 308 extracts an image when the facial recognition algorithm indicates that the participant is smiling, laughing, frowning, or making another expression that represents the content of the corresponding segment. Inkpen teaches at Paragraph 0042 that each of Alice, Bob, Cher, and Katie contributed by providing at least a city name and the media stream processor 122 generates the segment image 260 for the first segment to include an avatar for each user, along with dialog provided by the corresponding user. The dialog may be displayed within a dialog bubble, such as dialog bubble 264, in a separate text box, or other suitable manner.); and a second scenario that includes a second scenario description, a second set of details about a pose and expression of the first person and a pose and expression of a second person, and a second message (Inkpen teaches at FIG. 2 and Paragraph 0041 that the plurality of scenarios 260-270-280 includes a message for a caption including “1. Choices for conference location”, “2. 
Travel cost and living expenses” and “3. Climate in London”. Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”. Inkpen teaches at Paragraph 0057 that the segment image generator 308 may generate an animated avatar for a participant based on dialog from the participant, for example, so that the avatar's mouth appears to match the dialog. Animations may be generated to highlight facial expressions of a participant, actions taken by the participant. Inkpen teaches at Paragraph 0058 that the segment image generator 308 is configured to use a facial recognition algorithm (e.g., an instance of the neural network model 126) to extract a suitable image from the media stream. In one example, the segment image generator 308 extracts an image when the facial recognition algorithm indicates that the participant is smiling, laughing, frowning, or making another expression that represents the content of the corresponding segment). Re Claim 8: The claim 8 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that randomly selecting a first scenario from the plurality of scenarios; after automatically processing the second prompt by the LLM to generate the individual content item: identifying one or more faces in the individual content item; replacing a first identified face with a real-world face of the user; replacing a second identified face with a real-world face of a friend associated with the user, the friend being selected from best friends of the user or friends with whom the user recently exchanged messages within a predetermined time period; and overlaying a message on the individual content item after replacing the faces. 
Graham teaches the claim limitation that randomly selecting a first scenario from the plurality of scenarios; after automatically processing the second prompt by the LLM to generate the individual content item: identifying one or more faces in the individual content item; replacing a first identified face with a real-world face of the user; replacing a second identified face with a real-world face of a friend associated with the user, the friend being selected from best friends of the user or friends with whom the user recently exchanged messages within a predetermined time period; and overlaying a message on the individual content item after replacing the faces (Graham teaches at Paragraph [0038] that once the model(s) 100 is tuned to the likeness of the subject (e.g., the user 110), as described above, the user 110, or another person, can prompt the model(s) 100 (e.g., with a voice prompt 116) to generate output data 104 that corresponds to synthetic content 102 requested in the prompt. For instance, the user 110 can prompt the model(s) 100 to feature the user 110 in a movie starring a famous actor(s), and the model(s) 100 can, based on this prompt, generate output data 104 that corresponds to synthetic content 102 featuring the user's 110 face (e.g., the user's 110 face may be overlaid on a character's face in the movie). This allows for synthesizing the subject (e.g., the user 110) in various different contexts, by generating novel, synthetic faces of the subject from any desired angle, pose, etc., and/or with any desired expression (e.g., smiling, eyes closed, frowning, etc.). In some examples, latent space manipulation (or editing) and neural animation techniques can be used to generate the synthetic content 102 (e.g., a synthetic face 102(1)) pertaining to the subject to achieve a desired angle, pose, expression.
Graham teaches at Paragraph 0027 that the body part data 106(2) includes face images of multiple different subjects (e.g., people) to allow for generating synthetic faces 102(1) of the different subjects (e.g., people) and/or for swapping faces on-demand, when the model(s) 100 is prompted to do so. Graham teaches at Paragraph 0036 that an AI model(s) (e.g., a tuned diffusion model(s)) can be used to generate a synthetic version of the subject's (e.g., the user's 110) face using the sparse data (e.g., a single face image, a few face images, etc.) that the subject (e.g., the user 110) captured using a camera(s) of an electronic device, and this synthetic face of the subject (e.g., the user 110) can be overlaid on the face of the stand-in actor featured in the captured footage. For example, the synthetic face of the subject can be generated by a tuned diffusion model(s) to mirror the facial expressions of the stand-in actor, from the angles at which the stand-in actor was filmed in the original footage. Accordingly, video data can be generated of a synthetic version of the subject (e.g., the user 110) doing things (e.g., making facial expressions, body movements, etc.) that the stand-in actor did in the original footage. Graham teaches at Paragraph 0039 that the user 110 can prompt the model(s) 100 to feature the user 110 in a movie starring a famous actor(s), and the model(s) 100 can, based on this prompt, generate output data 104 that corresponds to synthetic content 102 featuring the user's 110 face (e.g., the user's 110 face may be overlaid on a character's face in the movie), which can make it appear as though the user 110 acted in the movie with the famous actor(s). Graham teaches at Paragraph 0040 that video content 108 featuring the synthetic content 102 may be viewable by the user 110 who provided the prompt and/or by one or more other users). 
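The claim 8 flow mapped to Graham above (randomly select a scenario, identify faces in the generated item, replace the first face with the user's and the second with a friend's, then overlay a message) can be sketched abstractly as follows. All function names and data shapes are hypothetical placeholders introduced for illustration only; this is not Graham's implementation nor any real face-swap API.

```python
import random


def select_friend(best_friends, recent_contacts):
    """Pick a friend from the user's best friends or from friends with
    whom the user recently exchanged messages (hypothetical rule
    paraphrasing the claim language)."""
    pool = list(best_friends) + list(recent_contacts)
    return random.choice(pool) if pool else None


def compose_content_item(scenarios, user_face, friend_face, message):
    """Sketch of the claim 8 steps: randomly select a scenario, walk the
    faces detected in it, swap in the user's and friend's faces, then
    attach an overlay message."""
    scenario = random.choice(scenarios)   # random scenario selection
    faces = scenario.get("faces", [])     # stand-in for face detection
    replaced = []
    for i, face in enumerate(faces):
        if i == 0:
            replaced.append(user_face)    # first identified face -> user
        elif i == 1:
            replaced.append(friend_face)  # second identified face -> friend
        else:
            replaced.append(face)         # remaining faces left untouched
    return {"faces": replaced, "overlay": message}  # overlay added last


item = compose_content_item(
    [{"faces": ["generated_face_a", "generated_face_b"]}],
    user_face="user_face",
    friend_face="friend_face",
    message="Generated by LLM",
)
```

The ordering in the sketch (replace faces first, overlay the message last) mirrors the "after replacing the faces" sequence recited in the claim; the actual references may order these operations differently.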
Re Claim 9: The claim 9 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that determining that the first scenario corresponds to the user; and searching a collection of previously captured content items that depict only the user based on the first scenario to provide the individual content item that depicts the user having a pose and expression matching a first set of details; the searching comprising: accessing a plurality of content items that were previously captured using the interaction application, each of the plurality of content items associated with a respective timestamp indicating when the respective content item in the plurality of content items was captured, the collection of the plurality of content items comprises one or more content items associated with respective timestamps that precede a current time by a specified threshold. Graham further teaches the claim limitation that determining that the first scenario corresponds to the user; and searching a collection of previously captured content items that depict only the user based on the first scenario to provide the individual content item that depicts the user having a pose and expression matching a first set of details; the searching comprising: accessing a plurality of content items that were previously captured using the interaction application, each of the plurality of content items associated with a respective timestamp indicating when the respective content item in the plurality of content items was captured, the collection of the plurality of content items comprises one or more content items associated with respective timestamps that precede a current time by a specified threshold (Graham teaches at Paragraph 0027 that the trained machine learning model 100 can be trained to change an angle of the face based on prompts from a user 110 and at Paragraph 0028 that a user 110 can prompt the trained machine learning model by describing a person
and the prompted model 100 generates output data 104 representing synthetic content 102 based on user-provided prompt data 112 and the user 110 prompts the model 100 with a voice prompt 116 by saying “Show me 35-year old John Smith walking outside on a sunny day” and the model 100 generates output data 104 and a suitable speech-to-text technology can be used to convert audio data into text data representing a voice prompt and at Paragraph 0030 that the training data 106 may include a sample dataset that is within the prescribed boundary and at Paragraph 0031 that the training data 106 pertaining to a particular subject may be sparse and the only data pertaining to the subject that is available may be image data representing a single image, a few images, a short video of the subject and a subject may wish to have a face or another body part. It is noted that the image frames of the short video captured previously are time-stamped). Re Claim 10: The claim 10 encompasses the same scope of invention as that of the claim 9 except additional claim limitation that appending to a front portion of the first message a graphical element that indicates that the first message was generated by the LLM; appending to an end portion of the first message the graphical element that indicates that the first message was generated by the LLM; and overlaying the first message with the graphical element in the front and end portions on the individual content item to generate the individual content item that is presented.
Inkpen further teaches the claim limitation that appending to a front portion of the first message a graphical element that indicates that the first message was generated by the LLM; appending to an end portion of the first message the graphical element that indicates that the first message was generated by the LLM; and overlaying the first message with the graphical element in the front and end portions on the individual content item to generate the individual content item that is presented (Inkpen teaches at Paragraph 0047 that the media stream processor 300 comprises a transcript generator 302, a segment identifier 304, a segment labeler 306, a segment image generator 308 and a dialog bubble generator 310 and at Paragraph 0050 that the segment identifier 304 may process the transcript 200 and identify the first, second, and third segments. In some examples, the segment identifier 304 is implemented as a large language model. Inkpen teaches at FIGS. 1-2 and Paragraphs 0041-0043 that the media stream processor 122, executing the language model, appends the graphical element (a bounding box) to the first message, bounding the front portion and the end portion of the first message). Re Claim 11: The claim 11 encompasses the same scope of invention as that of the claim 9 except additional claim limitation that determining that the collection of previously captured content items fails to include content items that depict the user having a pose and expression matching the first set of details; and in response to determining that the collection of previously captured content items fails to include content items that depict the user having the pose and expression matching the first set of details, generating an additional prompt with instructions for the LLM to generate a new image that depicts the first scenario.
Hassan and Inkpen further teach the claim limitation of determining that the collection of previously captured content items fails to include content items that depict the user having a pose and expression matching the first set of details; and in response to determining that the collection of previously captured content items fails to include content items that depict the user having the pose and expression matching the first set of details, generating an additional prompt with instructions for the LLM to generate a new image that depicts the first scenario (Hassan shows at FIG. 4 a scenario associated with each of FIGS. 3A-3E wherein the interrupting user 102L’s hand gesture is shown and the hand gesture is dynamically changed. Hassan teaches at Paragraph 0038 that as shown in FIG. 3B through FIG. 3D, the notification 120 increases over time. This can include causing the graphical element 120 to increase in size over time while the subsequent input indicates that User 4 10D is speaking over User 2 who is the designated presenter. The rate of the size increase and the size of the graphical element is based on a priority of first participant or a characteristic of the subsequent input. For example, if the designated presenter, User 2, is a higher ranking employee over User 4, and/or User 4 has spoken for more than a threshold period of time or is speaking over a threshold volume, the system can cause the notification 120 to increase at a faster rate or to a larger size versus a scenario where User 4 does not rank higher and/or does not meet the speaking criteria. Hassan teaches at Paragraph 0058 that the user 1 raises their hand. This can be done with a hand-raise button, gesture captured by a camera indicating that the person raised their hand or mentioned they would like to speak, or by the use of an AI model analyzing a transcript and configured to grant a speaker role when the person raises an interest to speak.
Hassan teaches at Paragraph [0049] that FIG. 5 shows an embodiment of the system 100 that can be used to identify an interjecting participant using a large language model. This embodiment includes a selector module 105 that can be used to identify portions of a live meeting transcript 104. The selector module identifies portions of the transcript based on the activity of the meeting participants. For example, if a meeting includes a person who is a designated presenter, and the system detects that a second person's live audio signal indicates that the second person, who is not a designated presenter, has started to speak, the selector can identify a set of segments 116 from the transcript 104 to be sent as select content 106 to a large language model (LLM) 109. The select content 106 can be sent to the LLM with parameters 107 such as query prompts and policies. The query prompts and policies can include instructions that cause the LLM to determine if one person's statements are actually interrupting another person's statements. The LLM can then generate notification instructions 1101, which may cause the system to generate a notification 120 if the LLM determines that one person's statements interrupted a designated presenter. Hassan teaches at Paragraph 0065 that FIGS. 3A-3D show an example where an enhancement could be an animation, e.g., a bigger size, or other transformation of the UI element. The operations can include a graphical element 120 that increases in size over time while the subsequent input indicates that the second participant 10D is speaking, wherein a rate of size increase and the size of the graphical element is based on a priority of first participant or a characteristic of the subsequent input. Inkpen teaches at FIG. 2 and Paragraph 0041 that the plurality of scenarios 260-270-280 includes a message for a caption including “1. Choices for conference location”, “2. Travel cost and living expenses” and “3. Climate in London”.
Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”. Inkpen teaches at Paragraph 0057 that the segment image generator 308 may generate an animated avatar for a participant based on dialog from the participant, for example, so that the avatar's mouth appears to match the dialog. Animations may be generated to highlight facial expressions of a participant or actions taken by the participant. Inkpen teaches at Paragraph 0058 that the segment image generator 308 is configured to use a facial recognition algorithm (e.g., an instance of the neural network model 126) to extract a suitable image from the media stream. In one example, the segment image generator 308 extracts an image when the facial recognition algorithm indicates that the participant is smiling, laughing, frowning, or making another expression that represents the content of the corresponding segment. Inkpen teaches at Paragraph 0042 that each of Alice, Bob, Cher, and Katie contributed by providing at least a city name, and the media stream processor 122 generates the segment image 260 for the first segment to include an avatar for each user, along with dialog provided by the corresponding user. The dialog may be displayed within a dialog bubble, such as dialog bubble 264, in a separate text box, or other suitable manner). Re Claim 12: The claim 12 encompasses the same scope of invention as that of the claim 11 except the additional claim limitation that the additional prompt comprises an image of a face of the user, and wherein the new image depicts the face of the user in the pose and expression matching the first set of details.
Hassan and Inkpen further teach the claim limitation that the additional prompt comprises an image of a face of the user, and wherein the new image depicts the face of the user in the pose and expression matching the first set of details ( Hassan shows at FIG. 4 a scenario associated with each of FIGS. 3A-3E wherein the interrupting user 102L’s hand gesture is shown and the hand gesture is dynamically changed. Hassan teaches at Paragraph 0038 that as shown in FIG. 3B through FIG. 3D, the notification 120 increases over time. This can include causing the graphical element 120 to increase in size over time while the subsequent input indicates that User 4 10D is speaking over User 2, who is the designated presenter. The rate of the size increase and the size of the graphical element is based on a priority of first participant or a characteristic of the subsequent input. For example, if the designated presenter, User 2, is a higher-ranking employee than User 4, and/or User 4 has spoken for more than a threshold period of time or is speaking over a threshold volume, the system can cause the notification 120 to increase at a faster rate or to a larger size versus a scenario where User 4 does not rank higher and/or does not meet the speaking criteria. Hassan teaches at Paragraph 0058 that the user 1 raises their hand. This can be done with a hand-raise button, a gesture captured by a camera indicating that the person raised their hand or mentioned they would like to speak, or by the use of an AI model analyzing a transcript and configured to grant a speaker role when the person raises an interest to speak. Hassan teaches at Paragraph [0049] that FIG. 5 shows an embodiment of the system 100 that can be used to identify an interjecting participant using a large language model. This embodiment includes a selector module 105 that can be used to identify portions of a live meeting transcript 104.
The selector module identifies portions of the transcript based on the activity of the meeting participants. For example, if a meeting includes a person who is a designated presenter, and the system detects that a second person's live audio signal indicates that the second person, who is not a designated presenter, has started to speak, the selector can identify a set of segments 116 from the transcript 104 to be sent as select content 106 to a large language model (LLM) 109. The select content 106 can be sent to the LLM with parameters 107 such as query prompts and policies. The query prompts and policies can include instructions that cause the LLM to determine if one person's statements are actually interrupting another person's statements. The LLM can then generate notification instructions 1101, which may cause the system to generate a notification 120 if the LLM determines that one person's statements interrupted a designated presenter. Hassan teaches at Paragraph 0065 that FIGS. 3A-3D show an example where an enhancement could be an animation, e.g., a bigger size, or other transformation of the UI element. The operations can include a graphical element 120 that increases in size over time while the subsequent input indicates that the second participant 10D is speaking, wherein a rate of size increase and the size of the graphical element is based on a priority of first participant or a characteristic of the subsequent input. Inkpen teaches at FIG. 2 and Paragraph 0041 that the plurality of scenarios 260-270-280 includes a message for a caption including “1. Choices for conference location”, “2. Travel cost and living expenses” and “3. Climate in London”. 
Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”. Inkpen teaches at Paragraph 0057 that the segment image generator 308 may generate an animated avatar for a participant based on dialog from the participant, for example, so that the avatar's mouth appears to match the dialog. Animations may be generated to highlight facial expressions of a participant or actions taken by the participant. Inkpen teaches at Paragraph 0058 that the segment image generator 308 is configured to use a facial recognition algorithm (e.g., an instance of the neural network model 126) to extract a suitable image from the media stream. In one example, the segment image generator 308 extracts an image when the facial recognition algorithm indicates that the participant is smiling, laughing, frowning, or making another expression that represents the content of the corresponding segment. Inkpen teaches at Paragraph 0042 that each of Alice, Bob, Cher, and Katie contributed by providing at least a city name, and the media stream processor 122 generates the segment image 260 for the first segment to include an avatar for each user, along with dialog provided by the corresponding user. The dialog may be displayed within a dialog bubble, such as dialog bubble 264, in a separate text box, or other suitable manner). Re Claim 13: The claim 13 encompasses the same scope of invention as that of the claim 8 except the additional claim limitation of generating an additional prompt with instructions for the LLM to generate a new image that depicts the first scenario.
Hassan and Inkpen further teach the claim limitation of generating an additional prompt with instructions for the LLM to generate a new image that depicts the first scenario ( Hassan shows at FIG. 4 a scenario associated with each of FIGS. 3A-3E wherein the interrupting user 102L’s hand gesture is shown and the hand gesture is dynamically changed. Hassan teaches at Paragraph 0038 that as shown in FIG. 3B through FIG. 3D, the notification 120 increases over time. This can include causing the graphical element 120 to increase in size over time while the subsequent input indicates that User 4 10D is speaking over User 2, who is the designated presenter. The rate of the size increase and the size of the graphical element is based on a priority of first participant or a characteristic of the subsequent input. For example, if the designated presenter, User 2, is a higher-ranking employee than User 4, and/or User 4 has spoken for more than a threshold period of time or is speaking over a threshold volume, the system can cause the notification 120 to increase at a faster rate or to a larger size versus a scenario where User 4 does not rank higher and/or does not meet the speaking criteria. Hassan teaches at Paragraph 0058 that the user 1 raises their hand. This can be done with a hand-raise button, a gesture captured by a camera indicating that the person raised their hand or mentioned they would like to speak, or by the use of an AI model analyzing a transcript and configured to grant a speaker role when the person raises an interest to speak. Hassan teaches at Paragraph [0049] that FIG. 5 shows an embodiment of the system 100 that can be used to identify an interjecting participant using a large language model. This embodiment includes a selector module 105 that can be used to identify portions of a live meeting transcript 104. The selector module identifies portions of the transcript based on the activity of the meeting participants.
For example, if a meeting includes a person who is a designated presenter, and the system detects that a second person's live audio signal indicates that the second person, who is not a designated presenter, has started to speak, the selector can identify a set of segments 116 from the transcript 104 to be sent as select content 106 to a large language model (LLM) 109. The select content 106 can be sent to the LLM with parameters 107 such as query prompts and policies. The query prompts and policies can include instructions that cause the LLM to determine if one person's statements are actually interrupting another person's statements. The LLM can then generate notification instructions 1101, which may cause the system to generate a notification 120 if the LLM determines that one person's statements interrupted a designated presenter. Hassan teaches at Paragraph 0065 that FIGS. 3A-3D show an example where an enhancement could be an animation, e.g., a bigger size, or other transformation of the UI element. The operations can include a graphical element 120 that increases in size over time while the subsequent input indicates that the second participant 10D is speaking, wherein a rate of size increase and the size of the graphical element is based on a priority of first participant or a characteristic of the subsequent input. Inkpen teaches at FIG. 2 and Paragraph 0041 that the plurality of scenarios 260-270-280 includes a message for a caption including “1. Choices for conference location”, “2. Travel cost and living expenses” and “3. Climate in London”. Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”. 
Inkpen teaches at Paragraph 0057 that the segment image generator 308 may generate an animated avatar for a participant based on dialog from the participant, for example, so that the avatar's mouth appears to match the dialog. Animations may be generated to highlight facial expressions of a participant or actions taken by the participant. Inkpen teaches at Paragraph 0058 that the segment image generator 308 is configured to use a facial recognition algorithm (e.g., an instance of the neural network model 126) to extract a suitable image from the media stream. In one example, the segment image generator 308 extracts an image when the facial recognition algorithm indicates that the participant is smiling, laughing, frowning, or making another expression that represents the content of the corresponding segment. Inkpen teaches at Paragraph 0042 that each of Alice, Bob, Cher, and Katie contributed by providing at least a city name, and the media stream processor 122 generates the segment image 260 for the first segment to include an avatar for each user, along with dialog provided by the corresponding user. The dialog may be displayed within a dialog bubble, such as dialog bubble 264, in a separate text box, or other suitable manner). Re Claim 14: The claim 14 encompasses the same scope of invention as that of the claim 1 except the additional claim limitation of processing the first prompt that includes safety criteria. White and Graham teach the claim limitation of processing the first prompt that includes safety criteria ( White teaches at Paragraph 0052 that the mathematical formula 318B has been blurred to prevent disclosure. In other aspects, mathematical formula 318A may be redacted entirely from whiteboard 316 (not shown). Further, the title 320B of the book has been redacted and infilled to blend with the book binding. In other aspects, the title 320A may be blurred or otherwise obscured to prevent disclosure.
As further illustrated, the spoken text 322A uttered by the third participant 306 has been modified to eliminate the term “Thunderstorm,” such that the spoken text 322B transmitted to the second participant 306 comprises, “Oh, you mean the . . . project?” In aspects, the term “Thunderstorm” may be replaced by a slight pause or a beep, for instance (represented by the ellipsis). If the term is muted resulting in a slight pause, the second participant 306 may be unaware of the deletion of the term; whereas if the term is replaced by another sound (e.g., a beep or bell), the second participant 306 may be aware of the deletion of the term. In some aspects, if the second participant 306 records the conferencing session, the content captured by the recording may correspond to the content authorized for viewing by the second participant 306 during the conferencing session. In other aspects, a recording of a conferencing session may be evaluated retrospectively. That is, the recording may be analyzed based on a user confidentiality level associated with a recipient of the forwarded recording rather than the participant of the meeting who recorded it. In this way, when a recipient has a lower confidentiality level than the participant who recorded the meeting, the multimodal ML model may be triggered to further modify detected confidential content in the recording to prevent disclosure to the recipient. In aspects, the multimodal ML model may apply modifications to either an original unmodified recording or to a modified copy. Graham teaches at Paragraph 0030 that the predefined set of prompts can be updated periodically as new prompts are discovered or to delete or otherwise modify existing ones of the predefined set of prompts. In some examples, the model(s) 100 can be constrained by the prompt data 106(3) including the predefined set of prompts, as opposed to allowing any and all prompts to trigger generative output from the model(s) 100. 
Additionally, or alternatively, the model(s) 100 can be constrained by a front-end filter that filters incoming prompts against a list of the predefined set of prompts, and if the incoming prompt is not on the list, a processor(s) may refrain from running the model(s) 100. Graham teaches at Paragraph 0076 that this iterative feedback loop 120 can aid in fine-tuning the generation of photoreal synthetic content (e.g., by the AI model(s) 100, 200, 300 learning, through iterative feedback, to generate temporally-coherent output data 104). Graham teaches at Paragraph [0084] that additional user-provided prompt data 112, 212 may be received by the processor(s) after one or more iterations through the process 600, and in this scenario, the video content 108 displayed at block 616 may iteratively feature additional types of synthetic content 102 prompted by the user 110, 210, 310 (e.g., a synthetic face 102(1), followed by a synthetic body 102(2), followed by a synthetic background 102(3), etc.). In this manner, the additional prompts may be provided by the user 110, 210, 310 to build upon one or more previously-provided prompts. As such, the user 110, 210, 310 can continue to prompt the AI model(s) 100, 200, 300 with additional prompts in order to iteratively build upon an original synthetic manipulation of a source performance and/or a real-world scene that is being, or was, captured by a video capture device 202. Graham teaches at Paragraph [0074] that at 508, in some examples, the processor(s) may generate, using the trained AI model(s) 100, 200, 300 based at least in part on the user-provided prompt data 112, 212, output data representing a synthetic face 102(1) and/or synthetic body part (e.g., synthetic body 102(2)). For example, the synthetic face 102(1) may be used to de-age an actor 206 through swapping the actor's 206 face with their own face when they were younger.
As another example, the synthetic body 102(2) may be used to artificially dress the actor 206 in different clothes/attire). Re Claim 15: The claim 15 encompasses the same scope of invention as that of the claim 13 except the additional claim limitation of generating a third prompt comprising the selected idea and the description of the scene of the individual scenario with a request to revise the description of the scene to include additional details and increase safety; and processing the third prompt by the LLM to generate a revised description of the scene, wherein the additional prompt comprises the revised description and is used to generate the new image. Hassan and Inkpen further teach the claim limitation of generating a third prompt comprising the selected idea and the description of the scene of the individual scenario with a request to revise the description of the scene to include additional details and increase safety; and processing the third prompt by the LLM to generate a revised description of the scene, wherein the additional prompt comprises the revised description and is used to generate the new image ( Hassan teaches at Paragraph [0049] that FIG. 5 shows an embodiment of the system 100 that can be used to identify an interjecting participant using a large language model. This embodiment includes a selector module 105 that can be used to identify portions of a live meeting transcript 104. The selector module identifies portions of the transcript based on the activity of the meeting participants. For example, if a meeting includes a person who is a designated presenter, and the system detects that a second person's live audio signal indicates that the second person, who is not a designated presenter, has started to speak, the selector can identify a set of segments 116 from the transcript 104 to be sent as select content 106 to a large language model (LLM) 109.
The select content 106 can be sent to the LLM with parameters 107 such as query prompts and policies. The query prompts and policies can include instructions that cause the LLM to determine if one person's statements are actually interrupting another person's statements. The LLM can then generate notification instructions 1101, which may cause the system to generate a notification 120 if the LLM determines that one person's statements interrupted a designated presenter. Hassan shows at FIG. 4 a scenario associated with each of FIGS. 3A-3E wherein the interrupting user 102L’s hand gesture is shown and the hand gesture is dynamically changed. Hassan teaches at Paragraph 0038 that as shown in FIG. 3B through FIG. 3D, the notification 120 increases over time. This can include causing the graphical element 120 to increase in size over time while the subsequent input indicates that User 4 10D is speaking over User 2, who is the designated presenter. The rate of the size increase and the size of the graphical element is based on a priority of first participant or a characteristic of the subsequent input. For example, if the designated presenter, User 2, is a higher-ranking employee than User 4, and/or User 4 has spoken for more than a threshold period of time or is speaking over a threshold volume, the system can cause the notification 120 to increase at a faster rate or to a larger size versus a scenario where User 4 does not rank higher and/or does not meet the speaking criteria. Inkpen teaches at FIG. 2 and Paragraph 0041 that the plurality of scenarios 260-270-280 includes a message for a caption including “1. Choices for conference location”, “2. Travel cost and living expenses” and “3. Climate in London”.
Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”. Inkpen teaches at Paragraph 0057 that the segment image generator 308 may generate an animated avatar for a participant based on dialog from the participant, for example, so that the avatar's mouth appears to match the dialog. Animations may be generated to highlight facial expressions of a participant or actions taken by the participant. Inkpen teaches at Paragraph 0058 that the segment image generator 308 is configured to use a facial recognition algorithm (e.g., an instance of the neural network model 126) to extract a suitable image from the media stream. In one example, the segment image generator 308 extracts an image when the facial recognition algorithm indicates that the participant is smiling, laughing, frowning, or making another expression that represents the content of the corresponding segment. Inkpen teaches at Paragraph 0042 that each of Alice, Bob, Cher, and Katie contributed by providing at least a city name, and the media stream processor 122 generates the segment image 260 for the first segment to include an avatar for each user, along with dialog provided by the corresponding user. The dialog may be displayed within a dialog bubble, such as dialog bubble 264, in a separate text box, or other suitable manner).
Re Claim 16: The claim 16 encompasses the same scope of invention as that of the claim 12 except the additional claim limitation of randomly selecting the second scenario from the plurality of scenarios; determining that the second scenario corresponds to the first and second persons, wherein the first person is the user and the second person is a friend of the user; and generating an additional prompt with instructions for the LLM to generate a new image that depicts the second scenario using first and second images of faces of the user and the friend. Hassan and Inkpen further teach the claim limitation of randomly selecting the second scenario from the plurality of scenarios; determining that the second scenario corresponds to the first and second persons, wherein the first person is the user and the second person is a friend of the user; and generating an additional prompt with instructions for the LLM to generate a new image that depicts the second scenario using first and second images of faces of the user and the friend (Hassan teaches at Paragraph [0049] that FIG. 5 shows an embodiment of the system 100 that can be used to identify an interjecting participant using a large language model. This embodiment includes a selector module 105 that can be used to identify portions of a live meeting transcript 104. The selector module identifies portions of the transcript based on the activity of the meeting participants. For example, if a meeting includes a person who is a designated presenter, and the system detects that a second person's live audio signal indicates that the second person, who is not a designated presenter, has started to speak, the selector can identify a set of segments 116 from the transcript 104 to be sent as select content 106 to a large language model (LLM) 109. The select content 106 can be sent to the LLM with parameters 107 such as query prompts and policies.
The query prompts and policies can include instructions that cause the LLM to determine if one person's statements are actually interrupting another person's statements. The LLM can then generate notification instructions 1101, which may cause the system to generate a notification 120 if the LLM determines that one person's statements interrupted a designated presenter. Hassan shows at FIG. 4 a scenario associated with each of FIGS. 3A-3E wherein the interrupting user 102L’s hand gesture is shown and the hand gesture is dynamically changed. Hassan teaches at Paragraph 0038 that as shown in FIG. 3B through FIG. 3D, the notification 120 increases over time. This can include causing the graphical element 120 to increase in size over time while the subsequent input indicates that User 4 10D is speaking over User 2, who is the designated presenter. The rate of the size increase and the size of the graphical element is based on a priority of first participant or a characteristic of the subsequent input. For example, if the designated presenter, User 2, is a higher-ranking employee than User 4, and/or User 4 has spoken for more than a threshold period of time or is speaking over a threshold volume, the system can cause the notification 120 to increase at a faster rate or to a larger size versus a scenario where User 4 does not rank higher and/or does not meet the speaking criteria. Inkpen teaches at FIG. 2 and Paragraph 0041 that the plurality of scenarios 260-270-280 includes a message for a caption including “1. Choices for conference location”, “2. Travel cost and living expenses” and “3. Climate in London”. Inkpen teaches at Paragraph 0041 that the media stream processor 122 may identify a first segment from 00:00 to 00:30 and generate a segment label of “Choices for conference location”, a second segment from 00:30 to 01:22 with segment label “Travel cost and living expenses”, and a third segment from 01:12 to 01:35 with segment label “Climate in London”.
Inkpen teaches at Paragraph 0057 that the segment image generator 308 may generate an animated avatar for a participant based on dialog from the participant, for example, so that the avatar's mouth appears to match the dialog. Animations may be generated to highlight facial expressions of a participant or actions taken by the participant. Inkpen teaches at Paragraph 0058 that the segment image generator 308 is configured to use a facial recognition algorithm (e.g., an instance of the neural network model 126) to extract a suitable image from the media stream. In one example, the segment image generator 308 extracts an image when the facial recognition algorithm indicates that the participant is smiling, laughing, frowning, or making another expression that represents the content of the corresponding segment. Inkpen teaches at Paragraph 0042 that each of Alice, Bob, Cher, and Katie contributed by providing at least a city name, and the media stream processor 122 generates the segment image 260 for the first segment to include an avatar for each user, along with dialog provided by the corresponding user. The dialog may be displayed within a dialog bubble, such as dialog bubble 264, in a separate text box, or other suitable manner).
Re Claim 17: The claim 17 encompasses the same scope of invention as that of the claim 1 except the additional claim limitation of generating a notification that the sharable content item feed includes a new sharable content item; in response to user selection of the notification, presenting the sharable content item feed including the individual content item; providing a user interface control enabling sharing of the individual content item; detecting a swipe gesture on a display presenting the individual content item; in response to detecting a swipe gesture, retrieving a next modified content item from the shareable item feed; and sequentially presenting the next modified content item, wherein the sequential presentation continues until no more content items remain in the shareable content item feed. White further teaches the claim limitation of generating a notification that the sharable content item feed includes a new sharable content item; in response to user selection of the notification, presenting the sharable content item feed including the individual content item; providing a user interface control enabling sharing of the individual content item; detecting a swipe gesture on a display presenting the individual content item; in response to detecting a swipe gesture, retrieving a next modified content item from the shareable item feed; and sequentially presenting the next modified content item, wherein the sequential presentation continues until no more content items remain in the shareable content item feed (White teaches at FIGS. 3A-3C and Paragraph 0054 that text boxes are presented as prompts from one or more participants. White teaches at Paragraph 0052 that the spoken text 322A uttered by the third participant 306 has been modified to eliminate the term “Thunderstorm,” such that the spoken text 322B transmitted to the second participant 306 comprises, “Oh, you mean the . . . project?”.
White teaches at Paragraph 0092 that generative model package 504 is pre-trained according to a variety of inputs (e.g., a variety of human languages, a variety of programming languages, and/or a variety of content types) and therefore need not be fine-tuned or trained for a specific scenario. Rather, generative model package 504 may be more generally pre-trained, such that conference input 502 includes a prompt that is generated, selected, or otherwise engineered to induce generative model package 504 to produce certain generative model output 506. For example, a prompt includes a context and/or one or more completion prefixes that thus preload generative model package 504 accordingly. As a result, generative model package 504 is induced to generate output based on the prompt that includes a predicted sequence of tokens (e.g., up to a token limit of generative model package 504) relating to the prompt. White teaches at Paragraph 0030 that machine learning framework 120 and/or machine learning interface 128 manages the evaluation of the conference input (e.g., generating subsequent requests to machine learning service 102 for subsequent detection and/or modification of confidential content) according to pre-designated samples of confidential content, and at Paragraph 0033 that natural language input may be provided via the user interface (e.g., as text input and/or as voice input), which may be processed according to aspects described herein and used to detect and/or modify confidential content in a conference input accordingly). Re Claim 18: The claim 18 is in parallel with the claim 1 in the form of an apparatus. The claim 18 is subject to the same rationale of rejection as the claim 1.
Moreover, Inkpen further teaches a system comprising: at least one processor; and at least one memory component having instructions stored thereon that, when executed by the at least one processor, cause the at least one processor to perform operations [of the claim 1] ( Inkpen teaches at FIGS. 9-10 and Paragraphs 0105-0111 that a number of program modules and data files may be stored in the system memory 904. While executing on the processing unit 902, the program modules 906 (e.g., story board generation application 920) may perform processes including, but not limited to, the aspects, as described herein. The computer readable instructions are stored in the computer storage media including RAM, ROM, EEPROM, flash memory and the processor 1060 may be configured to execute one or more application programs 1066 stored in the non-volatile storage area 1068). Re Claim 19: The claim 19 encompasses the same scope of invention as that of the claim 18 except the additional claim limitation that the second prompt includes safety criteria instructing the LLM to avoid generating scenarios that are not safe; and generating the individual content item comprises providing a third prompt to the LLM that includes one or more negative prompts to prevent generation of content items that fail to meet the safety criteria. White and Graham teach the claim limitation that the second prompt includes safety criteria instructing the LLM to avoid generating scenarios that are not safe; and generating the individual content item comprises providing a third prompt to the LLM that includes one or more negative prompts to prevent generation of content items that fail to meet the safety criteria ( White teaches at Paragraph 0052 that the mathematical formula 318B has been blurred to prevent disclosure. In other aspects, mathematical formula 318A may be redacted entirely from whiteboard 316 (not shown). Further, the title 320B of the book has been redacted and infilled to blend with the book binding.
In other aspects, the title 320A may be blurred or otherwise obscured to prevent disclosure. As further illustrated, the spoken text 322A uttered by the third participant 306 has been modified to eliminate the term “Thunderstorm,” such that the spoken text 322B transmitted to the second participant 306 comprises, “Oh, you mean the . . . project?” In aspects, the term “Thunderstorm” may be replaced by a slight pause or a beep, for instance (represented by the ellipsis). If the term is muted resulting in a slight pause, the second participant 306 may be unaware of the deletion of the term; whereas if the term is replaced by another sound (e.g., a beep or bell), the second participant 306 may be aware of the deletion of the term. In some aspects, if the second participant 306 records the conferencing session, the content captured by the recording may correspond to the content authorized for viewing by the second participant 306 during the conferencing session. In other aspects, a recording of a conferencing session may be evaluated retrospectively. That is, the recording may be analyzed based on a user confidentiality level associated with a recipient of the forwarded recording rather than the participant of the meeting who recorded it. In this way, when a recipient has a lower confidentiality level than the participant who recorded the meeting, the multimodal ML model may be triggered to further modify detected confidential content in the recording to prevent disclosure to the recipient. In aspects, the multimodal ML model may apply modifications to either an original unmodified recording or to a modified copy. Graham teaches at Paragraph 0030 that the predefined set of prompts can be updated periodically as new prompts are discovered or to delete or otherwise modify existing ones of the predefined set of prompts. 
In some examples, the model(s) 100 can be constrained by the prompt data 106(3) including the predefined set of prompts, as opposed to allowing any and all prompts to trigger generative output from the model(s) 100. Additionally, or alternatively, the model(s) 100 can be constrained by a front-end filter that filters incoming prompts against a list of the predefined set of prompts, and if the incoming prompt is not on the list, a processor(s) may refrain from running the model(s) 100. Graham teaches at Paragraph 0076 that this iterative feedback loop 120 can aid in fine-tuning the generation of photoreal synthetic content (e.g., by the AI model(s) 100, 200, 300 learning, through iterative feedback, to generate temporally-coherent output data 104). Graham teaches at Paragraph 0084 that additional user-provided prompt data 112, 212 may be received by the processor(s) after one or more iterations through the process 600, and in this scenario, the video content 108 displayed at block 616 may iteratively feature additional types of synthetic content 102 prompted by the user 110, 210, 310 (e.g., a synthetic face 102(1), followed by a synthetic body 102(2), followed by a synthetic background 102(3), etc.). In this manner, the additional prompts may be provided by the user 110, 210, 310 to build upon one or more previously-provided prompts. As such, the user 110, 210, 310 can continue to prompt the AI model(s) 100, 200, 300 with additional prompts in order to iteratively build upon an original synthetic manipulation of a source performance and/or a real-world scene that is being, or was, captured by a video capture device 202. Graham teaches at Paragraph 0074 that, at 508, in some examples, the processor(s) may generate, using the trained AI model(s) 100, 200, 300 based at least in part on the user-provided prompt data 112, 212, output data representing a synthetic face 102(1) and/or synthetic body part (e.g., synthetic body 102(2)).
For example, the synthetic face 102(1) may be used to de-age an actor 206 through swapping the actor's 206 face with their own face when they were younger. As another example, the synthetic body 102(2) may be used to artificially dress the actor 206 in different clothes/attire). Re Claim 20: The claim 20 is in parallel with the claim 1 in the form of a computer program product. The claim 20 is subject to the same rationale of rejection as the claim 1. Moreover, Inkpen further teaches a non-transitory computer-readable storage medium having stored thereon instructions that, when executed by at least one processor, cause the at least one processor to perform operations [of the claim 1] ( Inkpen teaches at FIGS. 9-10 and Paragraphs 0105-0111 that a number of program modules and data files may be stored in the system memory 904. While executing on the processing unit 902, the program modules 906 (e.g., story board generation application 920) may perform processes including, but not limited to, the aspects, as described herein. The computer readable instructions are stored in the computer storage media including RAM, ROM, EEPROM, flash memory and the processor 1060 may be configured to execute one or more application programs 1066 stored in the non-volatile storage area 1068). Claims 2-4 are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. US-PGPUB No. 2024/0290331 (hereinafter Lu) in view of Mikutel et al. US-PGPUB No. 2024/0311576 (hereinafter Mikutel); Inkpen et al. US-PGPUB No. 2025/0157103 (hereinafter Inkpen); Rivera-Rodriguez US-Patent No. 12,063,123 (hereinafter Rodriguez); Hassan et al. US-PGPUB No. 2025/0260770 (hereinafter Hassan); Dillon et al. US-PGPUB No. 2024/0289556 (hereinafter Dillon); Graham et al. US-PGPUB No. 2024/0346731 (hereinafter Graham); White US-PGPUB No. 2024/0411906 (hereinafter White); Paliarush et al. US-PGPUB No. 2025/0078323 (hereinafter Paliarush); and Pandey et al. US-PGPUB No. 2024/0427852 (hereinafter Pandey).
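The claim 19 safeguards discussed above combine two mechanisms: Graham's front-end filter, under which the processor refrains from running the model when an incoming prompt is not on a predefined list, and the claimed third prompt carrying negative prompts that exclude unsafe content. A minimal sketch of how such a pipeline might fit together is below; the function names, allowlist entries, and prompt fields are hypothetical illustrations, not code from any cited reference:

```python
# Hypothetical sketch of a front-end prompt filter (in the spirit of
# Graham, Paragraph 0030) combined with a third prompt that carries
# negative prompts per the safety criteria of claim 19.

# Predefined set of permitted prompts; anything else is rejected
# before the model ever runs.
ALLOWED_PROMPTS = {"generate a beach scene", "generate a picnic scene"}

def front_end_filter(prompt: str) -> bool:
    """Return True only if the incoming prompt is on the predefined list."""
    return prompt.lower().strip() in ALLOWED_PROMPTS

def build_safety_prompt(scenario: str, negative_prompts: list[str]) -> dict:
    """Assemble a hypothetical third prompt pairing the scenario
    description with negative prompts excluding unsafe content."""
    return {
        "prompt": scenario,
        "negative_prompt": ", ".join(negative_prompts),
    }

# Only an allowlisted prompt reaches the generation step.
if front_end_filter("Generate a beach scene"):
    request = build_safety_prompt(
        "two friends at the beach at sunset",
        ["violence", "eye masks", "face coverings"],
    )
```

Under this arrangement the safety criteria act twice: once before the model runs (the allowlist check) and once inside the generation request itself (the negative prompts).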
Re Claim 2: The claim 2 encompasses the same scope of invention as that of the claim 1 except additional claim limitation that determining that the individual scenario comprises first and second participants, the first participant corresponding to the user of the device; identifying a subset of friends associated with the user; and selecting as a friend to associate with the second participant an individual friend of the subset of friends. Mikutel in view of Inkpen and Dillon et al. US-PGPUB No. 2024/0289556 (hereinafter Dillon) and Pandey et al. US-PGPUB No. 2024/0427852 (hereinafter Pandey) teach the claim limitation that determining that the individual scenario comprises first and second participants, the first participant corresponding to the user of the device; identifying a subset of friends associated with the user; and selecting as a friend to associate with the second participant an individual friend of the subset of friends ( Pandey teaches at Paragraph 0054 that the media application may execute one or more natural language processing (NLP) algorithms (e.g., locally and/or remotely via the server 202 or another remote device) to identify the user intent or request. For example, the media application may transcribe an audio message, and the audio transcription may be analyzed in an analogous manner. In some embodiments, the media application may execute one or more audio analysis algorithms that directly analyze an audio message without transcribing the audio. In this example, the media application may generate a recommendation indicating the content item 212 for sharing (e.g., based on the second user profile and/or a context of the first message 226). As a non-limiting example, a text from Friend A may include a request, such as “Send me the photo that we took last night.” The media application, via one or more NLP techniques, may determine that one or more keywords from the text correspond to one or more metadata categories. 
In this example, the media application may determine that (i) “last night” indicates a metadata value for a time and/or date, (ii) “me” refers to Friend A, and (iii) “we” refers to a plurality of users including Friend A and the receiver. Based on the NLP analysis, the media application may generate one or more criteria for searching content having the matching metadata values (e.g., date: last night; user tags/face IDs: Friend A, first user device; content type: photo, image, picture). Inkpen teaches at Paragraph 0041 that the transcript 200 includes dialog content during a meeting among users Alice, Bob, Cher, and Katie as they discuss a location for an annual retreat. It is clearly understood that Alice, Bob, Cher and Katie are friends. Inkpen teaches at FIG. 2 and Paragraph 0055 that each individual scenario comprises first and second participants (Alice and Cher), the first participant (Alice) corresponding to the user of the device; selecting a subset of participants (friends) associated with the user Alice to be depicted in an image. The other participants are friends of the user Alice. Dillon teaches at Paragraph 0051 that for instance, the system may narrow the topic of the active communication by searching stored chat history. In another example, the system may determine the context of the active communication based on the participants (e.g., personal, family, friends, public, etc.).). It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have narrowed the topic of the transcript to the friend participants after the system determines the context of the active communication based on the friend participants. One of ordinary skill in the art would have been motivated to have modified the participants to be limited to friends only.
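Pandey's keyword-to-metadata mapping, as applied against claim 2 above, can be sketched roughly as follows; the helper name, criteria keys, and regular expressions are illustrative assumptions rather than code from Pandey:

```python
import re
from datetime import date, timedelta

# Hypothetical sketch of Pandey-style NLP criteria extraction: keywords
# in a message ("last night", "me", "we", "photo") are mapped to
# metadata search criteria (date, user tags, content type).
def extract_criteria(message: str, sender: str, receiver: str) -> dict:
    criteria: dict = {}
    if re.search(r"\blast night\b", message, re.IGNORECASE):
        # "last night" indicates a metadata value for a time and/or date.
        criteria["date"] = date.today() - timedelta(days=1)
    if re.search(r"\bme\b", message, re.IGNORECASE):
        # "me" refers to the message sender (Friend A in Pandey's example).
        criteria.setdefault("user_tags", set()).add(sender)
    if re.search(r"\bwe\b", message, re.IGNORECASE):
        # "we" refers to a plurality of users including sender and receiver.
        criteria.setdefault("user_tags", set()).update({sender, receiver})
    if re.search(r"\b(photo|picture|image)\b", message, re.IGNORECASE):
        criteria["content_type"] = "photo"
    return criteria

criteria = extract_criteria(
    "Send me the photo that we took last night.", "Friend A", "Receiver"
)
```

The resulting dictionary plays the role of Pandey's search criteria (date: last night; user tags: Friend A and the receiver; content type: photo), which would then be matched against stored content metadata.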
Re Claim 3: The claim 3 encompasses the same scope of invention as that of the claim 1 except additional claim limitation wherein generating the second prompt comprises: an instruction for the LLM to describe each of the plurality of scenarios with unique details that are relevant to the selected idea; an instruction for the LLM to keep qualifying attributes in each scenario description that hint at style or aesthetic and details of activity associated with the scenario; a character limit for each scenario description; an instruction for the LLM to include in each scenario description words describing framing, facial expressions and outfits of the one or more participants and an instruction for the LLM to exclude eye mask and face coverings from the plurality of scenarios. Mikutel in view of Graham, White, Inkpen, Paliarush, Dillon, and Pandey teach the claim limitation wherein generating the second prompt comprises: an instruction for the LLM to describe each of the plurality of scenarios with unique details that are relevant to the selected idea; an instruction for the LLM to keep qualifying attributes in each scenario description that hint at style or aesthetic and details of activity associated with the scenario; a character limit for each scenario description; an instruction for the LLM to include in each scenario description words describing framing, facial expressions and outfits of the one or more participants and an instruction for the LLM to exclude eye mask and face coverings from the plurality of scenarios ( White teaches at FIGS. 3A-3C and Paragraph 0054 that text boxes are presented as prompts from one or more participants.
White teaches at Paragraph 0052 that the spoken text 322A uttered by the third participant 306 has been modified to eliminate the term “Thunderstorm,” such that the spoken text 322B transmitted to the second participant 306 comprises, “Oh, you mean the . . . project?”. White teaches at Paragraph 0092 generative model package 504 is pre-trained according to a variety of inputs (e.g., a variety of human languages, a variety of programming languages, and/or a variety of content types) and therefore need not be finetuned or trained for a specific scenario. Rather, generative model package 504 may be more generally pre-trained, such that conference input 502 includes a prompt that is generated, selected, or otherwise engineered to induce generative model package 504 to produce certain generative model output 506. For example, a prompt includes a context and/or one or more completion prefixes that thus preload generative model package 504 accordingly. As a result, generative model package 504 is induced to generate output based on the prompt that includes a predicted sequence of tokens (e.g., up to a token limit of generative model package 504) relating to the prompt. White teaches at Paragraph 0030 that machine learning framework 120 and/or machine learning interface 128 manages the evaluation of the conference input (e.g., generating subsequent requests to machine learning service 102 for subsequent detection and/or modification of confidential content) according to pre-designated samples of confidential content and at Paragraph 0033 that natural language input may be provided via the user interface (e.g., as text input and/or as voice input), which may be processed according to aspects described herein and used to detect and/or modify confidential content in a conference input accordingly. 
Graham teaches at Paragraph 0015 that a director may provide live prompts to the trained AI model to cause the model to generate output data representing synthetic content and at Paragraph 0032 that the user 110 may capture multiple images of their face exhibiting facial expressions and at Paragraph 0033 that the machine learning model 110 may be trained using sparse data pertaining to the subject and at Paragraph 0034 that the AI model that is used to generate the synthetic data 106(4) utilizes at least some of the sparse data pertaining to the subject. Paliarush teaches at Paragraphs 0051-0058 that the image generation module 108 generates face images 140 that can be combined with object images 120 to produce synthetic images based on multiple prompts including the selected prompt “young man in the street of Paris” where geographic-specific presentations of synthetic images 110 are retrieved and displayed to users who access the online platform 124 from the relevant geographic regions. Paliarush teaches at Paragraph 0061 that the image generation AI model is configured to interpret the prompt and generate a face based on the prompt and if a face exists in an object image and the face in an object image is not protected, the image generative AI model can automatically generate a face mask for a face in the object image. The face in an object image may be protected against masking, and the face then may not be regenerated. The image generative AI model generates a face depicting a young man in Paris and if the face is not protected, the image generative AI model regenerates the face in the object image representing a young man in Paris in addition to generating the background depicting the street of Paris in the object image. Pandey teaches at Paragraph 0054 that the media application may execute one or more natural language processing (NLP) algorithms (e.g., locally and/or remotely via the server 202 or another remote device) to identify the user intent or request.
For example, the media application may transcribe an audio message, and the audio transcription may be analyzed in an analogous manner. In some embodiments, the media application may execute one or more audio analysis algorithms that directly analyze an audio message without transcribing the audio. In this example, the media application may generate a recommendation indicating the content item 212 for sharing (e.g., based on the second user profile and/or a context of the first message 226). As a non-limiting example, a text from Friend A may include a request, such as “Send me the photo that we took last night.” The media application, via one or more NLP techniques, may determine that one or more keywords from the text correspond to one or more metadata categories. In this example, the media application may determine that (i) “last night” indicates a metadata value for a time and/or date, (ii) “me” refers to Friend A, and (iii) “we” refers to a plurality of users including Friend A and the receiver. Based on the NLP analysis, the media application may generate one or more criteria for searching content having the matching metadata values (e.g., date: last night; user tags/face IDs: Friend A, first user device; content type: photo, image, picture). Inkpen teaches at Paragraph [0041] that the transcript 200 includes dialog content during a meeting among users Alice, Bob, Cher, and Katie as they discuss a location for an annual retreat. It is clearly understood that Alice, Bob, Cher and Katie are friends or best friends. Inkpen teaches at FIGS. 2 and Paragraph 0055 that each individual scenario comprises first and second participants (Alice and Cher), the first participant (Alice) corresponding to the user of the device; selecting a subset of participants (friends) associated with the user Alice to be depicted in an image. The other participants are friends of the user Alice. 
Dillon teaches at Paragraph 0051 that for instance, the system may narrow the topic of the active communication by searching stored chat history. In another example, the system may determine the context of the active communication based on the participants (e.g., personal, family, friends, public, etc.).). It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have narrowed the topic of the transcript to the friend participants after the system determines the context of the active communication based on the friend participants. One of ordinary skill in the art would have been motivated to have modified the participants to be limited to friends only. Re Claim 4: The claim 4 encompasses the same scope of invention as that of the claim 2 except additional claim limitation that accessing a chat history associated with the user; identifying, in the chat history, a set of messages that were exchanged within a specified time interval; and selecting at least a portion of the subset of friends by identifying one or more friends that were involved in the exchange of the set of messages. Mikutel in view of Inkpen, Dillon, and Pandey teach the claim limitation that accessing a chat history associated with the user; identifying, in the chat history, a set of messages that were exchanged within a specified time interval; and selecting at least a portion of the subset of friends by identifying one or more friends that were involved in the exchange of the set of messages ( Pandey teaches at Paragraph 0054 that the media application may execute one or more natural language processing (NLP) algorithms (e.g., locally and/or remotely via the server 202 or another remote device) to identify the user intent or request.
For example, the media application may transcribe an audio message, and the audio transcription may be analyzed in an analogous manner. In some embodiments, the media application may execute one or more audio analysis algorithms that directly analyze an audio message without transcribing the audio. In this example, the media application may generate a recommendation indicating the content item 212 for sharing (e.g., based on the second user profile and/or a context of the first message 226). As a non-limiting example, a text from Friend A may include a request, such as “Send me the photo that we took last night.” The media application, via one or more NLP techniques, may determine that one or more keywords from the text correspond to one or more metadata categories. In this example, the media application may determine that (i) “last night” indicates a metadata value for a time and/or date, (ii) “me” refers to Friend A, and (iii) “we” refers to a plurality of users including Friend A and the receiver. Based on the NLP analysis, the media application may generate one or more criteria for searching content having the matching metadata values (e.g., date: last night; user tags/face IDs: Friend A, first user device; content type: photo, image, picture). Inkpen teaches at Paragraph [0041] that the transcript 200 includes dialog content during a meeting among users Alice, Bob, Cher, and Katie as they discuss a location for an annual retreat. It is clearly understood that Alice, Bob, Cher and Katie are friends or best friends. Inkpen teaches at FIGS. 2 and Paragraph 0055 that each individual scenario comprises first and second participants (Alice and Cher), the first participant (Alice) corresponding to the user of the device; selecting a subset of participants (friends) associated with the user Alice to be depicted in an image. The other participants are friends of the user Alice. 
Dillon teaches at Paragraph 0051 that for instance, the system may narrow the topic of the active communication by searching stored chat history. In another example, the system may determine the context of the active communication based on the participants (e.g., personal, family, friends, public, etc.).). It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have narrowed the topic of the transcript to the friend participants after the system determines the context of the active communication based on the friend participants. One of ordinary skill in the art would have been motivated to have modified the participants to be limited to friends only. Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIN CHENG WANG whose telephone number is (571)272-7665. The examiner can normally be reached Mon-Fri 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon can be reached at 571-270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /JIN CHENG WANG/Primary Examiner, Art Unit 2617

Prosecution Timeline

Apr 12, 2024
Application Filed
Oct 31, 2025
Non-Final Rejection — §103
Dec 11, 2025
Response Filed
Mar 14, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12594883
DISPLAY DEVICE FOR DISPLAYING PATHS OF A VEHICLE
2y 5m to grant Granted Apr 07, 2026
Patent 12597086
Tile Region Protection in a Graphics Processing System
2y 5m to grant Granted Apr 07, 2026
Patent 12592012
METHOD, APPARATUS, ELECTRONIC DEVICE AND READABLE MEDIUM FOR COLLAGE MAKING
2y 5m to grant Granted Mar 31, 2026
Patent 12586270
GENERATING AND MODIFYING DIGITAL IMAGES USING A JOINT FEATURE STYLE LATENT SPACE OF A GENERATIVE NEURAL NETWORK
2y 5m to grant Granted Mar 24, 2026
Patent 12579709
IMAGE SPECIAL EFFECT PROCESSING METHOD AND APPARATUS
2y 5m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
59%
Grant Probability
69%
With Interview (+10.3%)
3y 7m
Median Time to Grant
Moderate
PTA Risk
Based on 832 resolved cases by this examiner. Grant probability derived from career allow rate.
