Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This non-final office action is in response to the Application filed on 11/14/2023.
Claims 1-20 are pending for examination. Claims 1, 12, and 17 are independent claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 8, 9, 10, 11, 12, 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Grillo; Joseph et al. US Pub. No. 2025/0063140 (Grillo) in view of Truong; Anh Lan et al. US Pub. No. 2025/0140292 (Truong).
Claim 1:
Grillo teaches:
A method for generating storyboards [¶ 0005] (generate a smart topic from one or more transcripts of video calls that relate to a label or category of subject matter within the transcripts), comprising:
providing an extraction prompt to a first generative neural network model, wherein the extraction prompt is a text-based prompt that instructs the first generative neural network model how to identify timestamps of segments having related content within transcripts according to dialog within the transcripts [¶ 0025, 43, 93] (large language model (LLM) prompt) [¶ 0041] (generative adversarial neural network) [¶ 0064, 93, 113] (identify timestamps) [¶ 0025-38] (transcript refers to a digitized text version of data captured during a phone call or a video call);
providing a transcript of a meeting as an input to the first generative neural network model [¶ 0082-83] (Fig. 3, transcript is provided to LLM);
receiving, from the first generative neural network model, segment timestamps for identified segments within the meeting based on the extraction prompt and the transcript [¶ 0064] (determine timestamps where a particular topic was mentioned or to identify all video calls where a particular user account mentioned a certain topic) (identify portions of transcript 204 that relate to the objective and process, e.g., break down or separate, transcript 204 into portions that relate to the objective); and …
Grillo does not appear to explicitly disclose “generating segment images for the identified segments using a second generative neural network model”.
However, the disclosure of Truong teaches:
generating segment images for the identified segments using a second generative neural network model, wherein each of the segment images represents segment content within a corresponding identified segment [¶ 0051, 105, 109] (video editing application causes a generative language model to select scenes from the scene captions of the visual scenes that match each sentence of the summary) [¶ 0044, 90] (examples of visual scenes with corresponding scene captions are shown in FIG. 3) [¶ 0045, 91] (FIG. 4 illustrates an example of a diarized transcript and word-level timing with corresponding frames of each visual scene).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo and the method of segmenting and augmenting videos using generative AI in Truong, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of generating segment images in Truong could be applied to the topic generation in Grillo. Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, for making video editing less tedious and challenging [Truong: ¶ 0002-03].
Claim 8:
The claim(s) above are obvious over Grillo and Truong, which teach all the elements of the claim(s) as shown above. The claim would be further obvious over the teachings of Grillo, which teaches:
The method of claim 4, further comprising:
… a link …
augmenting the segment image to include the link [¶ 0045] (link).
… a timestamp … [¶ 0064, 93, 113] (generate a list of timestamps where certain topics were discussed across one or more video calls)
Grillo does not appear to explicitly disclose “for playback of a media stream of the meeting”.
However, the disclosure of Truong teaches:
generating … for playback of a media stream of the meeting at a timestamp corresponding to the text from the dialog within the identified segment [¶ 0109] (selected video segments are assembled into a shortened video 238 of the input video 204 and output 240 to the end user for display and/or editing) [¶ 0045] (visual scenes and corresponding scene captions may be extracted from the input video and associated with an extracted diarized and timestamped transcript to generate an augmented transcript); and …
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo and the method of segmenting and augmenting videos using generative AI in Truong, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of generating segment images at a timestamp in Truong could be applied to the topic generation in Grillo. Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, for making video editing less tedious and challenging [Truong: ¶ 0002-03].
Claim 9:
The claim(s) above are obvious over Grillo and Truong, which teach all the elements of the claim(s) as shown above. The claim would be further obvious over the teachings of Truong, which teaches:
The method of claim 1, wherein the second generative neural network model is trained to generate images from text [¶ 0066, 69, 71, 125, 129, 131] (prompt a generative AI model to retrieve an image(s) and/or a video(s) from a library and/or generate an image(s) and/or a video(s) that is relevant to the identified phrase and/or words so that the video editing application can insert the retrieved and/or generated image and/or video into the video segment for additional emphasis of the identified phrase and/or words).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo and the method of segmenting and augmenting videos using generative AI in Truong, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of generating segment images at a timestamp in Truong could be applied to the topic generation in Grillo. Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, for making video editing less tedious and challenging [Truong: ¶ 0002-03].
Claim 10:
Grillo teaches:
The method of claim 1, wherein the first generative neural network model is a generative large language model [¶ 0025, 43, 93] (large language model (LLM) prompt) [¶ 0041] (generative adversarial neural network).
Claim 11:
The claim(s) above are obvious over Grillo and Truong, which teach all the elements of the claim(s) as shown above. The claim would be further obvious over the teachings of Truong, which teaches:
The method of claim 1, wherein the transcript of the meeting is extracted from a media stream of the meeting that comprises audio and video [¶ 0051, 105, 109] (video editing application causes a generative language model to select scenes from the scene captions of the visual scenes that match each sentence of the summary) [¶ 0044, 90] (examples of visual scenes with corresponding scene captions are shown in FIG. 3) [¶ 0045, 91] (FIG. 4 illustrates an example of a diarized transcript and word-level timing with corresponding frames of each visual scene).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo and the method of segmenting and augmenting videos using generative AI in Truong, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of generating segment images at a timestamp in Truong could be applied to the topic generation in Grillo. Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, for making video editing less tedious and challenging [Truong: ¶ 0002-03].
Claim(s) 2, 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Grillo; Joseph et al. US Pub. No. 2025/0063140 (Grillo) in view of Truong; Anh Lan et al. US Pub. No. 2025/0140292 (Truong) in view of Malan; Gerald et al. US Pub. No. 2024/0073368 (Malan).
Claim 2:
Grillo and Truong teach all the elements of the claims as shown above.
Truong also teaches inputting a desired length into a prompt tool [¶ 0108], which could arguably correspond to “how to identify segment labels”; however, Malan more explicitly discloses user-defined labels.
Grillo and Truong do not appear to explicitly disclose “prompt further instructs the first generative neural network model how to identify segment labels”.
However, the disclosure of Malan teaches:
The method of claim 1, wherein the extraction prompt further instructs the first generative neural network model how to identify segment labels according to content within identified segments, the method further comprising: receiving, from the first generative neural network model, segment labels for the identified segments based on the extraction prompt and the transcript [¶ 0009, 132, 146, 148] (labels are both system-default and user-defined; user-defined labels could correspond to “how to identify segment labels”) [¶ 0008, 98, 130, 135, 190] (labels are inserted into the text and metadata segments in connection with the portion of the transcript containing the trigger words. Then the labels and other embedded information are employed to extract user intent via a large language model (LLM)); and
labeling the segment images with a corresponding segment label [¶ 0128-139] (process of labeling, labeling feature is used to denote parts of a meeting that are most useful for summarization, sharing, and integrating into an organization's business processes).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo, the method of segmenting and augmenting videos using generative AI in Truong, and the method of labeling meeting transcripts in Malan, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of labeling meeting transcripts in Malan could be applied to generating segment images in Truong and the topic generation in Grillo. Malan, Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, “to make it easy to quickly find features by a meeting participant after the meeting concludes” [Malan: ¶ 0163].
Claim(s) 3-7, 14-16, 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Grillo; Joseph et al. US Pub. No. 2025/0063140 (Grillo) in view of Truong; Anh Lan et al. US Pub. No. 2025/0140292 (Truong) in view of Sen; Susanto et al. US Pub. No. 2021/0407158 (Sen).
Claim 3:
The claims above are obvious over Grillo and Truong, which teach all the elements of the claims as shown above.
Grillo and Truong do not appear to explicitly disclose “an avatar for a source user of dialog within the identified segment”.
However, the disclosure of Sen teaches:
The method of claim 1, wherein generating the segment images comprises generating, within a segment image for an identified segment, an avatar for a source user of dialog within the identified segment [¶ 0021-23] (transcribed text with a chat bubble next to avatar).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo and the method of segmenting and augmenting videos using generative AI in Truong and the method of avatar text bubbles in Sen, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of avatar chat bubbles in Sen could be applied to generating segment images in Truong and the topic generation in Grillo. Sen, Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, to easily associate speech or text with a person or avatar [Sen: ¶ 0006, 21].
Claim 4:
The claim(s) above are obvious over Grillo, Truong and Sen, which teach all the elements of the claim(s) as shown above. The claim would be further obvious over the teachings of Truong, which teaches:
The method of claim 3, further comprising augmenting the segment image for the identified segment with text from the dialog within the identified segment [¶ 0044-45, 90-91] (transcribed text with extracted scene, Figs. 3-4).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo and the method of segmenting and augmenting videos using generative AI in Truong and the method of avatar text bubbles in Sen, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of generating segment images at a timestamp in Truong could be applied to the topic generation in Grillo and the method of avatar text bubbles in Sen. Sen, Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, for making video editing less tedious and challenging [Truong: ¶ 0002-03].
Claim 5:
The claim(s) above are obvious over Grillo, Truong and Sen, which teach all the elements of the claim(s) as shown above. The claim would be further obvious over the teachings of Sen, which teaches:
The method of claim 4, wherein the text from the dialog within the identified segment is depicted within a dialog bubble for the avatar [¶ 0021-23] (transcribed text with a chat bubble next to avatar, Fig. 1).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo and the method of segmenting and augmenting videos using generative AI in Truong and the method of avatar text bubbles in Sen, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of avatar chat bubbles in Sen could be applied to generating segment images in Truong and the topic generation in Grillo. Sen, Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, to easily associate speech or text with a person or avatar [Sen: ¶ 0006, 21].
Claim 6:
The claim(s) above are obvious over Grillo, Truong and Sen, which teach all the elements of the claim(s) as shown above. The claim would be further obvious over the teachings of Sen, which teaches:
The method of claim 5, wherein the avatar is a captured image portion from a media stream of the meeting [¶ 0021-23] (transcribed text with a chat bubble next to avatar).
Claim 7:
The claim(s) above are obvious over Grillo, Truong and Sen, which teach all the elements of the claim(s) as shown above. The claim would be further obvious over the teachings of Sen, which teaches:
The method of claim 5, wherein the avatar is generated based on a likeness of the source user [¶ 0020] (Fig. 1, virtual representations of other users, the avatars are different and represent the users).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the method of topic generation for video calls using an LLM in Grillo and the method of segmenting and augmenting videos using generative AI in Truong and the method of avatar text bubbles in Sen, with a reasonable expectation of success.
The motivation for doing so would have been the use of known technique to improve similar devices (methods, or products) in the same way. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1396 (2007); MPEP § 2143(I)(C).
The known technique of avatar chat bubbles in Sen could be applied to generating segment images in Truong and the topic generation in Grillo. Sen, Truong and Grillo are similar devices because each uses an LLM to analyze meetings. One of ordinary skill in the art would have recognized that applying the known technique would improve the similar devices and result in an improved system, with a reasonable expectation of success, to easily associate speech or text with a person or avatar [Sen: ¶ 0006, 21].
Claims 12-20:
Claim(s) 12, 17 is/are substantially similar to claim 1 and is/are rejected using the same prior art and the same reason, rationale and/or motivation as used above.
Claim 1 is a “method” claim, claim 12 is a “system” claim, and claim 17 is a “method” claim with broader limitations that omit the “timestamp” limitation of claim 1, but the steps or elements of each claim are essentially the same.
Claim(s) 13 is/are substantially similar to claim 2 and is/are rejected using the same prior art and the same reason, rationale and/or motivation as used above.
Claim(s) 14, 19 is/are substantially similar to claim 3 and is/are rejected using the same prior art and the same reason, rationale and/or motivation as used above.
Claim(s) 15 is/are substantially similar to claims 4-5 and is/are rejected using the same prior art and the same reason, rationale and/or motivation as used above.
Claim(s) 16, 18 is/are substantially similar to claim 6 and is/are rejected using the same prior art and the same reason, rationale and/or motivation as used above.
Claim(s) 20 is/are substantially similar to claim 4 and is/are rejected using the same prior art and the same reason, rationale and/or motivation as used above.
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Please see PTO-892: Notice of References Cited.
Evidence of the level of ordinary skill in the art for claim 1:
Aarabi; Pegah US Pub. No. 2020/0035234: generative, transcript, timestamp, label, prompt, generative adversarial units; ¶ 0039: text transcriptions and/or metadata which indicates topics, gestures, and/or other content of the audio-video recordings, timestamp and label.
Chand; Jesse et al. US Pub. No. 2022/0284650: transcript, text bubble, avatar; a transcription is generated based on the audio data, where the transcription includes a text string, and a presentation of the text string is displayed within a messaging interface associated with the communication session or within a text bubble that extends from a 3D avatar; animating the 3D avatar can cause the 3D avatar to display facial expressions and gestures of the user, based on the mesh representation of the face of the user.
Citations to Prior Art
A reference to specific paragraphs, columns, pages, or figures in a cited prior art reference is not limited to preferred embodiments or any specific examples. It is well settled that a prior art reference, in its entirety, must be considered for all that it expressly teaches and fairly suggests to one having ordinary skill in the art. Stated differently, a prior art disclosure reading on a limitation of Applicant's claim cannot be ignored on the ground that other embodiments disclosed were instead cited. Therefore, the Examiner's citation to a specific portion of a single prior art reference is not intended to exclusively dictate, but rather, to demonstrate an exemplary disclosure commensurate with the specific limitations being addressed. In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)); Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); In re Fritch, 972 F.2d 1260, 1264, 23 USPQ2d 1780, 1782 (Fed. Cir. 1992); Merck & Co. v. Biocraft Labs., Inc., 874 F.2d 804, 807, 10 USPQ2d 1843, 1846 (Fed. Cir. 1989); In re Fracalossi, 681 F.2d 792, 794 n.1, 215 USPQ 569, 570 n.1 (CCPA 1982); In re Lamberti, 545 F.2d 747, 750, 192 USPQ 278, 280 (CCPA 1976); In re Bozek, 416 F.2d 1385, 1390, 163 USPQ 545, 549 (CCPA 1969).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN J SMITH whose telephone number is (571)270-3825. The examiner can normally be reached Monday - Friday 11:00 - 7:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ADAM QUELER can be reached at (571) 272-4140. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Benjamin Smith/
Primary Examiner, Art Unit 2172
Direct Phone: 571-270-3825
Direct Fax: 571-270-4825
Email: benjamin.smith@uspto.gov