Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 5-6, 9-13, 17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Rivera-Rodriguez (US 12,063,123) in view of Barbieri et al. (US 2023/0262293) and further in view of Chen et al. (US 2016/0014482).
Regarding claim 1, Rivera-Rodriguez teaches a computer-implemented method for automatic video generation (col. 7, lines 9-13 teaches video generation), the method comprising:
obtaining, by a computing system comprising one or more computing devices, a source video (Fig. 2, video streams being obtained through the system);
extracting, by the computing system, one or more sets of textual content associated with the source video (col. 4, lines 35-45; col. 5, line 1 through col. 7, line 47);
generating, by the computing system, an input prompt based on the one or more sets of textual content, wherein the input prompt comprises an instruction for a generative sequence processing model (col. 16, lines 1-38 teaches a similar language modeling system that takes a text-based transcript and generates a prompt to be used as an instruction for another model);
processing, by the computing system, the input prompt with the generative sequence processing model to generate, as an output of the generative sequence processing model, additional textual content for a support video (col. 16, lines 1-38 teaches a similar language modeling system that takes a text-based transcript and generates a prompt to be used as an instruction for another model, resulting in another set of textual outputs from the LLM).
However, while Rivera-Rodriguez teaches the claimed limitations as discussed above, it fails to explicitly teach the following limitation, which Barbieri et al. teaches:
inputting, by the computing system, the additional textual content to a video generation algorithm to automatically generate the support video (paragraph 27).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Barbieri into the system of Rivera-Rodriguez, such that the output of the LLM can be utilized by the system of Barbieri to generate a video output, because such an incorporation allows for the benefit of improving the user’s understanding of a long-form video by improving the robustness of the textual representation and the diversity of video generations (Barbieri: paragraph 20).
Furthermore, while Rivera-Rodriguez and Barbieri teach the claimed limitations as discussed above, they fail to teach the following, which Chen teaches in a similar text processing system:
providing, by the computing system, an interactive interface comprising a display window that displays playback of the source video and one or more user selectable chips associated with the support video (Fig. 20A and supporting disclosure, wherein, much like Rivera-Rodriguez and Barbieri, Chen utilizes textual data from a source video to generate a playlist that includes a listing of other video items related to the original source video).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Chen into the proposed combination of Rivera-Rodriguez and Barbieri, such that the related video generated via Barbieri may also be incorporated as a “chip”/playlist item of Chen, because such an incorporation allows for the benefit of increasing the user’s awareness of the relationship between an original video and another video clip (which, in the proposed combination, is the additional video generated via the generative model of Barbieri), as supported by paragraphs 62 and 63 of Chen.
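For orientation only, the following minimal Python sketch illustrates the pipeline recited in claim 1 as mapped above. It is not the disclosure of Rivera-Rodriguez, Barbieri, or Chen; every function is a hypothetical stub.

```python
# Illustrative sketch of the claim 1 pipeline; all functions are
# hypothetical stand-ins, not APIs from any cited reference.

def extract_transcript(video_path: str) -> str:
    """Stand-in for speech-to-text over the source video."""
    return "example transcript of the source video"

def generative_model(prompt: str) -> str:
    """Stand-in for a generative sequence processing model (e.g., an LLM)."""
    return "additional textual content for the support video"

def video_generation_algorithm(text: str) -> str:
    """Stand-in returning a path to an automatically generated video."""
    return "support_video.mp4"

def generate_support_video(video_path: str) -> str:
    # Obtain the source video and extract textual content from it.
    transcript = extract_transcript(video_path)
    # Generate an input prompt comprising an instruction for the model.
    prompt = "Summarize the following transcript:\n" + transcript
    # Process the prompt to produce additional textual content.
    additional_text = generative_model(prompt)
    # Input the additional text to a video generation algorithm.
    return video_generation_algorithm(additional_text)

print(generate_support_video("source_video.mp4"))
```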
Regarding claim 2, Rivera-Rodriguez teaches the claimed wherein extracting, by the computing system, the one or more sets of textual content associated with the source video comprises extracting, by the computing system, a textual transcript of speech included within the source video (col. 4, lines 1-34; col. 5, line 1 through col. 7, line 47 teaches a textual transcript).
Regarding claim 5, Rivera-Rodriguez teaches the claimed wherein processing, by the computing system, the one or more sets of textual content with the generative sequence processing model comprises processing, by the computing system, the one or more sets of textual content together with a prompt with the generative sequence processing model (col. 4, lines 35-45; col. 5, line 1 through col. 7, line 47; and col. 16, lines 1-38 teach a similar language modeling system that takes a text-based transcript and generates a prompt to be used as an instruction for another model, resulting in another set of textual outputs from the LLM).
Regarding claim 6, Rivera-Rodriguez teaches the claimed wherein the prompt comprises an instruction to summarize the one or more sets of textual content (col. 16, lines 1-38 teaches a similar language modeling system that takes a text-based transcript and generates a prompt to be used as an instruction for another model, resulting in another set of textual outputs from the LLM).
Regarding claim 9, Rivera-Rodriguez teaches the computer-implemented method of claim 1, further comprising:
analyzing, by the computing system, one or more frames of the source video to generate one or more sets of visual content data (at least Figs. 4 and 8; col. 9, lines 25-3; col. 12, lines 8-23; and col. 15, lines 15-42);
wherein inputting, by the computing system, the additional textual content to the video generation algorithm to automatically generate the support video comprises inputting, by the computing system, the additional textual content and the one or more sets of visual content data to the video generation algorithm (at least Figs. 4 and 8; col. 9, lines 25-3; col. 12, lines 8-23; and col. 15, lines 15-42).
Regarding claim 10, in the proposed combination above, Chen teaches wherein analyzing, by the computing system, the one or more frames of the source video to generate one or more sets of visual content data comprises processing, by the computing system, the one or more frames of the source video with a machine-learned face detection model to detect one or more faces in the one or more frames (paragraph 96).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Chen into the proposed combination of Rivera-Rodriguez and Barbieri because said incorporation allows for the benefit of detecting important parts of a video clip (paragraph 63).
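As a point of reference only, a face detection step of the kind recited in claim 10 could look like the sketch below, which uses OpenCV’s bundled Haar cascade (a machine-learned detector). This is an assumption for illustration, not the model of Chen’s paragraph 96.

```python
import cv2  # OpenCV; an illustrative choice, not Chen's disclosed model

def detect_faces(frame):
    """Detect faces in one video frame; returns (x, y, w, h) boxes."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detectMultiScale returns one bounding box per detected face.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```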
Regarding claim 11, in the proposed combination above, Chen teaches wherein analyzing, by the computing system, the one or more frames of the source video to generate one or more sets of visual content data comprises detecting, by the computing system, one or more video shots in the one or more frames (paragraphs 63 and 88 teach determining video segmentation boundaries for shots).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Chen into the proposed combination of Rivera-Rodriguez and Barbieri because said incorporation allows for the benefit of detecting important parts of a video clip (paragraph 63).
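For illustration of the shot detection recited in claim 11, a minimal frame-differencing heuristic is sketched below; the approach and threshold are assumptions, not the segmentation method of Chen’s paragraphs 63 and 88.

```python
import cv2  # illustrative frame-differencing heuristic, not Chen's method

def detect_shot_boundaries(video_path: str, threshold: float = 30.0):
    """Flag frame indices where the mean absolute difference from the
    previous frame exceeds a threshold, suggesting a cut between shots."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_gray, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is not None and cv2.absdiff(gray, prev_gray).mean() > threshold:
            boundaries.append(idx)
        prev_gray, idx = gray, idx + 1
    cap.release()
    return boundaries
```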
Regarding claims 12 and 13, in the proposed combination above, Chen teaches wherein analyzing, by the computing system, the one or more frames of the source video to generate one or more sets of visual content data comprises detecting, by the computing system, one or more logos or icons in the one or more frames (paragraphs 100-104 teach determining logos in the source video).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Chen into the proposed combination of Rivera-Rodriguez and Barbieri because said incorporation allows for the benefit of detecting important boundaries/parts of a video clip (paragraph 63).
Regarding claim 17, Rivera-Rodriguez teaches the claimed further comprising: associating, by the computing system, the support video with one or more timestamps of the source video, wherein the one or more timestamps correspond to the one or more sets of textual content (at least Fig. 7 and col. 13, lines 57-64).
Regarding claim 19, Rivera-Rodriguez teaches the claimed wherein the generative sequence processing model comprises a transformer language model (at least col. 7, lines 14-34).
Claim 20 is rejected for the same reasons as discussed for claim 1 above; col. 21, lines 20-51 teaches computer-readable storage media that stores instructions for executing the programming as claimed in claim 1 above.
Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Rivera-Rodriguez (US 12,063,123) in view of Barbieri et al. (US 2023/0262293) further in view of Chen et al. (US 2016/0014482) and further in view of Baughman et al. (US 2022/0027550).
Regarding claim 3, Rivera-Rodriguez teaches the claimed as discussed above and, while many textual sources for the video are implied, fails to explicitly teach the following; however, Baughman teaches wherein extracting, by the computing system, the one or more sets of textual content associated with the source video comprises extracting, by the computing system, a code snippet associated with the source video (paragraphs 63-65 teach code associated with the video that allows for identifying the text data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Baughman into the proposed combination of Rivera-Rodriguez, Barbieri and Chen because such an incorporation allows for the benefit of improving the system by providing additional methods of related text retrieval to allow improvements in relevance and completeness.
Regarding claim 4, Rivera-Rodriguez teaches the claimed as discussed above and, while many textual sources for the video are implied, fails to explicitly teach the following; however, Baughman teaches the claimed wherein extracting, by the computing system, the one or more sets of textual content associated with the source video comprises extracting, by the computing system, a linked document associated with the source video (paragraphs 63-65 teach linked documents associated with the video that allow for identifying the text data).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Baughman into the proposed combination of Rivera-Rodriguez, Barbieri and Chen because such an incorporation allows for the benefit of improving the system by providing additional methods of related text retrieval to allow improvements in relevance and completeness.
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Rivera-Rodriguez (US 12,063,123) in view of Barbieri et al. (US 2023/0262293), further in view of Chen et al. (US 2016/0014482) and further in view of Pooja (US 2020/0364463).
Regarding claim 7, the proposed combination of Rivera-Rodriguez, Barbieri and Chen teaches the claimed as discussed above, but fails to teach the following, which Pooja teaches: wherein the prompt comprises an instruction to explain one or more concepts included in the one or more sets of textual content (paragraphs 21, 51 and Figs. 2-3 teach a system utilized to explain the video using transcribed audio text in a summary/outline video, wherein the short-form text is explained in long form).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Pooja into the proposed combination of Rivera-Rodriguez, Barbieri and Chen because said incorporation allows for the benefit of improving the accuracy and flexibility of digital summarizations (paragraphs 22-27).
Regarding claim 14, the proposed combination of Rivera-Rodriguez, Barbieri and Chen teaches the claimed as discussed above, but fails to teach the following, which Pooja teaches: wherein analyzing, by the computing system, the one or more frames of the source video to generate one or more sets of visual content data comprises detecting, by the computing system, one or more sets of text or code in the one or more frames (paragraphs 21, 51 and Figs. 2-3 teach a system that uses OCR to detect text or code in the video frames).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Pooja into the proposed combination of Rivera-Rodriguez, Barbieri and Chen because said incorporation allows for the benefit of improving the accuracy and flexibility of digital summarizations (paragraphs 22-27).
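As an illustration of the text/code detection recited in claim 14, the sketch below runs off-the-shelf OCR over sampled frames; the use of pytesseract and the sampling rate are assumptions, not Pooja’s disclosed implementation.

```python
import cv2
import pytesseract  # Tesseract OCR wrapper; an illustrative choice only

def detect_text_in_frames(video_path: str, sample_every: int = 30) -> dict:
    """Run OCR on every Nth frame; returns {frame_index: detected_text}."""
    cap = cv2.VideoCapture(video_path)
    results, idx = {}, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            # Convert BGR to RGB before handing the frame to the OCR engine.
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            text = pytesseract.image_to_string(rgb).strip()
            if text:
                results[idx] = text
        idx += 1
    cap.release()
    return results
```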
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Rivera-Rodriguez (US 12,063,123) in view of Barbieri et al. (US 2023/0262293) further in view of Chen et al. (US 2016/0014482) and further in view of Gopalkrishna et al. (US 2024/0152767).
Regarding claim 8, the proposed combination of Rivera-Rodriguez, Barbieri and Chen teaches the claimed as discussed above, but fails to teach the following, which Gopalkrishna teaches: wherein the prompt comprises an instruction to generate one or more pairs of questions and answers regarding one or more concepts included in the one or more sets of textual content (paragraph 6 teaches that, in the prior art, text queries result in the generation of question and answer pairs).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Gopalkrishna into the proposed combination of Rivera-Rodriguez, Barbieri and Chen because said incorporation allows for the benefit of improving training of a model by self-generating question and answer pairs based on a text query (abstract and paragraph 6).
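For illustration of the prompt recited in claim 8, a question-and-answer instruction could be composed as in the sketch below; the wording and number of pairs are assumptions, not Gopalkrishna’s implementation.

```python
def build_qa_prompt(textual_content: str, num_pairs: int = 3) -> str:
    """Compose an instruction asking a language model to generate
    question-and-answer pairs about the extracted textual content."""
    return (
        f"Generate {num_pairs} question-and-answer pairs covering the "
        f"key concepts in the following transcript:\n\n{textual_content}"
    )

# The resulting prompt would then be processed by the generative
# sequence processing model, as in claim 1.
print(build_qa_prompt("example transcript of the source video"))
```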
Claims 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Rivera-Rodriguez (US 12,063,123) in view of Barbieri et al. (US 2023/0262293) further in view of Chen et al. (US 2016/0014482) and further in view of Spoor et al. (US 2019/0332400).
Regarding claim 15, the proposed combination of Rivera-Rodriguez, Barbieri and Chen teaches the claimed as discussed above, but fails to teach the following, which Spoor teaches: wherein the method further comprises performing, by the computing system, the video generation algorithm, wherein performing, by the computing system, the video generation algorithm comprises performing text-to-speech on the additional textual content to generate speech content for inclusion in the support video (paragraph 40 teaches a text-to-speech engine used to read out a response).
While the proposed combination teaches the generation of a support video using a generative model, Spoor would allow for a modification in which the text of Spoor’s text factoid is supplied to a text-to-speech engine to generate an audible output.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Spoor into the proposed combination of Rivera-Rodriguez, Barbieri and Chen because said incorporation allows for improving user experience by making it easier to comprehend textual content (Spoor: paragraphs 3-4).
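For illustration of the text-to-speech step recited in claim 15, the sketch below renders model-generated text to an audio file using the pyttsx3 library; the library choice and file handling are assumptions, not Spoor’s paragraph 40 engine.

```python
import pyttsx3  # offline text-to-speech library; an illustrative choice only

def synthesize_speech(additional_text: str, out_path: str = "speech.wav") -> str:
    """Render generated text as speech audio that a downstream video
    generation step could include in the support video."""
    engine = pyttsx3.init()
    engine.save_to_file(additional_text, out_path)  # queue synthesis to disk
    engine.runAndWait()  # block until the audio file is written
    return out_path
```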
Regarding claim 16, the proposed combination of Rivera-Rodriguez, Barbieri and Chen teaches the claimed as discussed above, but fails to teach the following, which Spoor teaches: wherein performing, by the computing system, the video generation algorithm further comprises generating a synthetic talking head that corresponds to the speech content (paragraph 31).
While the proposed combination teaches the generation of a support video using a generative model, Spoor would allow for a modification in which the text of Spoor’s text factoid is supplied to a text-to-speech engine to generate an audible output along with an animated talking head (see Fig. 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Spoor into the proposed combination of Rivera-Rodriguez, Barbieri and Chen because said incorporation allows for improving user experience by making it more interactive.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Rivera-Rodriguez (US 12,063,123) in view of Barbieri et al. (US 2023/0262293) further in view of Chen et al. (US 2016/0014482) and further in view of Story et al. (US 9,158,765).
Regarding claim 18, the proposed combination of Rivera-Rodriguez, Barbieri and Chen teaches the claimed as discussed above, but fails to teach the following, which Story teaches:
providing, by the computing system and during playback of the source video at the one or more timestamps, a user interface element that enables viewing of the support video (Figs. 1 and 4, wherein, when an original “content item” (a video/movie per claim 2, col. 5, lines 5-8 and col. 16, lines 33-41) is displayed at a particular point in time during playback, an option for the user to play back a shorter abridged version is displayed on the user interface display).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to incorporate the teachings of Story into the proposed combination of Rivera-Rodriguez, Barbieri and Chen because said incorporation allows for the benefit of enhancing the user experience (Story: col. 16, lines 33-50).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GELEK W TOPGYAL whose telephone number is (571)272-8891. The examiner can normally be reached M-F (9:30-6 PST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, William Vaughn can be reached at 571-272-3922. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GELEK W TOPGYAL/ Primary Examiner, Art Unit 2481