DETAILED ACTION
This action is responsive to the Application filed on 04/10/2024. Claims 1-26 are pending in the application. Claims 1, 25, and 26 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 2 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. As to claim 2, the claim recites “the training data,” which lacks antecedent basis, thereby rendering the claim indefinite. For the purposes of examination, Examiner has assumed claim 2 to recite “training data.”
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-14, 17 and 19-26 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by D'Auria et al. (US 20240232937 A1, hereinafter D'Auria).
As to claim 1, D’Auria discloses a computer-implemented method for video editing comprising:
accessing a first text template, wherein the first text template is readable by a large language model (LLM) neural network, and wherein the first text template includes one or more control parameters ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188; "FIG. 27 demonstrates the web portal rendering an outline for a new TV commercial. This is generated by the operation of a generative model within output generation 116. For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline," D'Auria paragraph 0190; “Generative model 514 takes the form of a Large Language Model (LLM) pre-trained on large volumes of public data," D'Auria paragraph 0125, displaying concepts to a user for selection where each concept has an additional guidance parameter);
populating the first text template, wherein the populating includes information from within a website ("FIG. 27 demonstrates the web portal rendering an outline for a new TV commercial. This is generated by the operation of a generative model within output generation 116. For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline," D'Auria paragraph 0190; D’Auria Figure 27 “web portal”, populating a selected concept with additional guidance info where the guidance info is entered by the user via a web portal (i.e., is populated with info from a website));
submitting a request, to the LLM neural network, wherein the request includes the first text template that was populated ("Moving to FIG. 22, the process involves a similar approach but specifically focuses on generating an optimized TV commercial script. The guidance prompt 2202, such as “generate a 15-second television commercial script predicted to yield the highest score,” is input into the generative model 2204. The model then produces an optimized multimedia output 2206, which, in this case, is a script for a television commercial. This script is devised to align with the highest scoring prediction based on the model's understanding of return on ad spend or other targeted performance metrics," D'Auria paragraph 0183; "Lastly, FIG. 29 shows the screen following a user's selection of a concept via a create button consistent with the process in FIG. 22. The user inputs like target duration for a TV commercial are provided, and upon submission, the portal displays a script generated for the selected concept," D'Auria paragraph 0192; “For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline,” D’Auria paragraph 0190);
generating, by the LLM neural network, a first video script ("The output, therefore, is not just any commercial script but one that is fine-tuned to meet specific strategic goals, reflecting the sophisticated capabilities of the generative model in processing and interpreting complex input data to produce highly targeted multimedia content," D'Auria paragraph 0183);
creating a first short-form video, wherein the first short-form video is based on the first video script that was generated ("The user inputs like target duration for a TV commercial are provided, and upon submission, the portal displays a script generated for the selected concept. Additionally, a corresponding video may be generated as part of the output, showcasing the portal's ability to render comprehensive multimedia content," D'Auria paragraph 0192);
evaluating the first short-form video, wherein the evaluating is based on one or more performance metrics ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188); and
creating a second text template based on the one or more performance metrics ("While it is feasible to conclude the process at this stage, using scores 1610 as part of output generation, the system allows for further refinement. Scores 1610 can be inputted into a subsequent generative model 1614, along with a new guidance prompt 1612. This iterative process, where the same or a different generative model (like a large language model) is used, is exemplified in FIG. 16. The guidance prompt 1612 in this iteration could be, “Here is a list of television commercial concepts and associated scores; generate new television concepts for me that are predicted to score even higher,” D'Auria paragraph 0171, iterating on the prompt template to create concepts that score higher to create better scripts and better videos).
As to claim 2, D’Auria further discloses the method of claim 1 further comprising training a machine learning model, wherein training data includes the first text template that was populated, the first short-form video that was created, the information from within the website, and the evaluating ("Model 1608, trained on datasets similar to those in FIGS. 7 to 9, may employ various model architectures such as XGBoost, LightGBM, or Random Forest. These architectures are adept at recognizing patterns between input and target variables. The output from fitted model 1608 is represented by scores 1610, which in this example are floating values bounded between zero and one. Higher values within these scores indicate television concepts with a greater predicted return on investment," D'Auria paragraph 0170; "Leveraging scores 1610 and guidance prompt 1612, the generative model 1614 outputs new concepts 1616. These concepts are then provided as inputs to fitted model 1618, In this case, fitted model 1618 is the same as fitted model 1608 with only the input differing. The output of fitted model 1618 is represented by scores 1620. Notably, the second element of scores 1620 exhibits a higher predicted value than any scores achieved in score 1610, indicating a concept with a potentially higher return on investment. For instance, the concept “dog and woman sitting inside” from concepts 1616 is predicted to perform the best," D'Auria paragraph 0172; "In the context of FIG. 16 and its associated iterative performance refinement process, it is important to note that while the example focuses on the use of concepts as inputs for scoring by fitted models, the system is not limited to this type of input alone. As previously discussed and demonstrated with earlier figures, the architecture of the system is versatile and can accommodate a wide range of input formats. This includes, but is not limited to, encoded numeric data, image captioning data, and various forms of multimedia inputs," D'Auria paragraph 0173).
As to claim 3, D’Auria further discloses the method of claim 2 wherein the creating of the second text template is accomplished by the machine learning model ("The generative model 514 is designed to describe elements within the audiovisual data 502 and can encompass various AI model forms, including a visual question answering model as seen in FIG. 2 (VQA Model 212), a captioning model as shown in FIG. 3 (Pre-Trained Generative Model 304), or any AI model capable of outputting descriptive elements of audiovisual data. In some embodiments, Generative model 514 takes the form of a Large Language Model (LLM) pre-trained on large volumes of public data," D'Auria paragraph 0125).
As to claim 4, D’Auria further discloses the method of claim 3 further comprising removing, by the machine learning model, a control parameter from the one or more control parameters ("For instance, the generative model 1604 might be a sophisticated language model, and the guidance prompt 1602 could be a request for “10 concepts for a new television commercial”. The resulting concepts 1606 are then fed into the fitted model 1608," D'Auria paragraph 0169; "While it is feasible to conclude the process at this stage, using scores 1610 as part of output generation, the system allows for further refinement. Scores 1610 can be inputted into a subsequent generative model 1614, along with a new guidance prompt 1612. This iterative process, where the same or a different generative model (like a large language model) is used, is exemplified in FIG. 16. The guidance prompt 1612 in this iteration could be, “Here is a list of television commercial concepts and associated scores; generate new television concepts for me that are predicted to score even higher," D'Auria paragraph 0171, removing the "10 concepts" parameter).
As to claim 5, D’Auria further discloses the method of claim 3 further comprising adding, by the machine learning model, a new control parameter ("For instance, the generative model 1604 might be a sophisticated language model, and the guidance prompt 1602 could be a request for “10 concepts for a new television commercial”. The resulting concepts 1606 are then fed into the fitted model 1608," D'Auria paragraph 0169; "While it is feasible to conclude the process at this stage, using scores 1610 as part of output generation, the system allows for further refinement. Scores 1610 can be inputted into a subsequent generative model 1614, along with a new guidance prompt 1612. This iterative process, where the same or a different generative model (like a large language model) is used, is exemplified in FIG. 16. The guidance prompt 1612 in this iteration could be, “Here is a list of television commercial concepts and associated scores; generate new television concepts for me that are predicted to score even higher," D'Auria paragraph 0171, adding the "generate new television concepts that are predicted to score even higher" and "associated scores" parameters).
As to claim 6, D’Auria further discloses the method of claim 3 further comprising including, by the machine learning model, at least one natural language instruction ("For instance, the generative model 1604 might be a sophisticated language model, and the guidance prompt 1602 could be a request for “10 concepts for a new television commercial”. The resulting concepts 1606 are then fed into the fitted model 1608," D'Auria paragraph 0169; "While it is feasible to conclude the process at this stage, using scores 1610 as part of output generation, the system allows for further refinement. Scores 1610 can be inputted into a subsequent generative model 1614, along with a new guidance prompt 1612. This iterative process, where the same or a different generative model (like a large language model) is used, is exemplified in FIG. 16. The guidance prompt 1612 in this iteration could be, “Here is a list of television commercial concepts and associated scores; generate new television concepts for me that are predicted to score even higher," D'Auria paragraph 0171, adding the "generate new television concepts that are predicted to score even higher" and "associated scores" natural language instructions).
As to claim 7, D’Auria further discloses the method of claim 3 wherein the populating includes the second text template, and wherein the populating is accomplished by the machine learning model ("The utility of the method disclosed in FIG. 16 lies in its ability to identify and elevate novel concepts that exhibit higher utility than any initial concept provided to fitted model 1608. This iterative refinement process enhances the predictive power of the system, enabling the identification of highly effective concepts for television commercials, thereby optimizing potential returns on investment," D'Auria paragraph 0174).
As to claim 8, D’Auria further discloses the method of claim 7 wherein the generating further comprises producing a second video script ("Moving to FIG. 22, the process involves a similar approach but specifically focuses on generating an optimized TV commercial script. The guidance prompt 2202, such as “generate a 15-second television commercial script predicted to yield the highest score,” is input into the generative model 2204. The model then produces an optimized multimedia output 2206, which, in this case, is a script for a television commercial. This script is devised to align with the highest scoring prediction based on the model's understanding of return on ad spend or other targeted performance metrics," D'Auria paragraph 0183; "Lastly, FIG. 29 shows the screen following a user's selection of a concept via a create button consistent with the process in FIG. 22. The user inputs like target duration for a TV commercial are provided, and upon submission, the portal displays a script generated for the selected concept," D'Auria paragraph 0192).
As to claim 9, D’Auria further discloses the method of claim 8 wherein the creating further comprises producing a second short-form video ("The user inputs like target duration for a TV commercial are provided, and upon submission, the portal displays a script generated for the selected concept. Additionally, a corresponding video may be generated as part of the output, showcasing the portal's ability to render comprehensive multimedia content," D'Auria paragraph 0192).
As to claim 10, D’Auria further discloses the method of claim 9 wherein the evaluating includes the second short-form video ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188).
As to claim 11, D’Auria further discloses the method of claim 10 wherein the evaluating is accomplished by the machine learning model ("Model 1608, trained on datasets similar to those in FIGS. 7 to 9, may employ various model architectures such as XGBoost, LightGBM, or Random Forest. These architectures are adept at recognizing patterns between input and target variables. The output from fitted model 1608 is represented by scores 1610, which in this example are floating values bounded between zero and one. Higher values within these scores indicate television concepts with a greater predicted return on investment," D'Auria paragraph 0170).
As to claim 12, D’Auria further discloses the method of claim 10 wherein the training data includes the second text template that was populated, the second short-form video that was created, and the one or more performance metrics ("Model 1608, trained on datasets similar to those in FIGS. 7 to 9, may employ various model architectures such as XGBoost, LightGBM, or Random Forest. These architectures are adept at recognizing patterns between input and target variables. The output from fitted model 1608 is represented by scores 1610, which in this example are floating values bounded between zero and one. Higher values within these scores indicate television concepts with a greater predicted return on investment," D'Auria paragraph 0170; "Leveraging scores 1610 and guidance prompt 1612, the generative model 1614 outputs new concepts 1616. These concepts are then provided as inputs to fitted model 1618, In this case, fitted model 1618 is the same as fitted model 1608 with only the input differing. The output of fitted model 1618 is represented by scores 1620. Notably, the second element of scores 1620 exhibits a higher predicted value than any scores achieved in score 1610, indicating a concept with a potentially higher return on investment. For instance, the concept “dog and woman sitting inside” from concepts 1616 is predicted to perform the best," D'Auria paragraph 0172; "In the context of FIG. 16 and its associated iterative performance refinement process, it is important to note that while the example focuses on the use of concepts as inputs for scoring by fitted models, the system is not limited to this type of input alone. As previously discussed and demonstrated with earlier figures, the architecture of the system is versatile and can accommodate a wide range of input formats. This includes, but is not limited to, encoded numeric data, image captioning data, and various forms of multimedia inputs," D'Auria paragraph 0173).
As to claim 13, D’Auria further discloses the method of claim 12 wherein the creating includes a third text template ("The generative model 514 is designed to describe elements within the audiovisual data 502 and can encompass various AI model forms, including a visual question answering model as seen in FIG. 2 (VQA Model 212), a captioning model as shown in FIG. 3 (Pre-Trained Generative Model 304), or any AI model capable of outputting descriptive elements of audiovisual data. In some embodiments, Generative model 514 takes the form of a Large Language Model (LLM) pre-trained on large volumes of public data," D'Auria paragraph 0125).
As to claim 14, D’Auria further discloses the method of claim 2 wherein the training is accomplished using a genetic algorithm ("Leveraging scores 1610 and guidance prompt 1612, the generative model 1614 outputs new concepts 1616. These concepts are then provided as inputs to fitted model 1618, In this case, fitted model 1618 is the same as fitted model 1608 with only the input differing. The output of fitted model 1618 is represented by scores 1620. Notably, the second element of scores 1620 exhibits a higher predicted value than any scores achieved in score 1610, indicating a concept with a potentially higher return on investment," D'Auria paragraph 0172, using a model to output the best concept for "use in a video script in order to produce the most effective short-form video" (Specification paragraph 0034)).
As to claim 17, D’Auria further discloses the method of claim 1 wherein the one or more control parameters include one or more media instructions ("An example of a guidance prompt might be, “using the input model scores and associated audiovisual content, output a descriptive guidance for a new television commercial that is likely to yield the greatest return on investment.” The output guidance 1708, as generated by the model, provides strategic insights for creating effective television commercials," D'Auria paragraph 0176).
As to claim 19, D’Auria further discloses the method of claim 17 wherein the one or more media instructions include a voice-over ("An example of a guidance prompt might be, “using the input model scores and associated audiovisual content, output a descriptive guidance for a new television commercial that is likely to yield the greatest return on investment.” The output guidance 1708, as generated by the model, provides strategic insights for creating effective television commercials," D'Auria paragraph 0176).
As to claim 20, D’Auria further discloses the method of claim 17 wherein the one or more media instructions include a number of images ("An example of a guidance prompt might be, “using the input model scores and associated audiovisual content, output a descriptive guidance for a new television commercial that is likely to yield the greatest return on investment.” The output guidance 1708, as generated by the model, provides strategic insights for creating effective television commercials," D'Auria paragraph 0176).
As to claim 21, D’Auria further discloses the method of claim 1 wherein the evaluating further comprises rendering, to one or more viewers, the first short-form video that was created, wherein the rendering includes an ecommerce environment (D'Auria Figure 29 "Watch Video").
As to claim 22, D’Auria further discloses the method of claim 21 wherein the one or more performance metrics include an engagement metric ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188).
As to claim 23, D’Auria further discloses the method of claim 22 wherein the engagement metric is used to update the one or more control parameters within the first text template ("While it is feasible to conclude the process at this stage, using scores 1610 as part of output generation, the system allows for further refinement. Scores 1610 can be inputted into a subsequent generative model 1614, along with a new guidance prompt 1612. This iterative process, where the same or a different generative model (like a large language model) is used, is exemplified in FIG. 16. The guidance prompt 1612 in this iteration could be, “Here is a list of television commercial concepts and associated scores; generate new television concepts for me that are predicted to score even higher,” D'Auria paragraph 0171, iterating on the prompt template to create concepts that score higher to create better scripts and better videos).
As to claim 24, D’Auria further discloses the method of claim 23 wherein the one or more control parameters are obtained from a library of templates ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188, displaying concepts to a user for selection).
As to claim 25, D’Auria discloses a computer program product embodied in a non-transitory computer readable medium for video editing, the computer program product comprising code which causes one or more processors to perform operations (“At the core of this system lies Processor 100, which is the primary computing unit responsible for executing instructions, processing data, and managing the operations of the system. The Processor 100 is connected to Memory 102, a storage component that retains both the instructions for the operations of the Processor 100 and the data necessary for these operations,” D’Auria paragraph 0092) of:
accessing a first text template, wherein the first text template is readable by a large language model (LLM) neural network, and wherein the first text template includes one or more control parameters ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188; "FIG. 27 demonstrates the web portal rendering an outline for a new TV commercial. This is generated by the operation of a generative model within output generation 116. For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline," D'Auria paragraph 0190; “Generative model 514 takes the form of a Large Language Model (LLM) pre-trained on large volumes of public data," D'Auria paragraph 0125, displaying concepts to a user for selection where each concept has an additional guidance parameter);
populating the first text template, wherein the populating includes information from within a website ("FIG. 27 demonstrates the web portal rendering an outline for a new TV commercial. This is generated by the operation of a generative model within output generation 116. For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline," D'Auria paragraph 0190; D’Auria Figure 27 “web portal”, populating a selected concept with additional guidance info where the guidance info is entered by the user via a web portal (i.e., is populated with info from a website));
submitting a request, to the LLM neural network, wherein the request includes the first text template that was populated ("Moving to FIG. 22, the process involves a similar approach but specifically focuses on generating an optimized TV commercial script. The guidance prompt 2202, such as “generate a 15-second television commercial script predicted to yield the highest score,” is input into the generative model 2204. The model then produces an optimized multimedia output 2206, which, in this case, is a script for a television commercial. This script is devised to align with the highest scoring prediction based on the model's understanding of return on ad spend or other targeted performance metrics," D'Auria paragraph 0183; "Lastly, FIG. 29 shows the screen following a user's selection of a concept via a create button consistent with the process in FIG. 22. The user inputs like target duration for a TV commercial are provided, and upon submission, the portal displays a script generated for the selected concept," D'Auria paragraph 0192; “For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline,” D’Auria paragraph 0190);
generating, by the LLM neural network, a first video script ("The output, therefore, is not just any commercial script but one that is fine-tuned to meet specific strategic goals, reflecting the sophisticated capabilities of the generative model in processing and interpreting complex input data to produce highly targeted multimedia content," D'Auria paragraph 0183);
creating a first short-form video, wherein the first short-form video is based on the first video script that was generated ("The user inputs like target duration for a TV commercial are provided, and upon submission, the portal displays a script generated for the selected concept. Additionally, a corresponding video may be generated as part of the output, showcasing the portal's ability to render comprehensive multimedia content," D'Auria paragraph 0192);
evaluating the first short-form video, wherein the evaluating is based on one or more performance metrics ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188); and
creating a second text template based on the one or more performance metrics ("While it is feasible to conclude the process at this stage, using scores 1610 as part of output generation, the system allows for further refinement. Scores 1610 can be inputted into a subsequent generative model 1614, along with a new guidance prompt 1612. This iterative process, where the same or a different generative model (like a large language model) is used, is exemplified in FIG. 16. The guidance prompt 1612 in this iteration could be, “Here is a list of television commercial concepts and associated scores; generate new television concepts for me that are predicted to score even higher,” D'Auria paragraph 0171, iterating on the prompt template to create concepts that score higher to create better scripts and better videos).
As to claim 26, D’Auria discloses a computer system for video editing comprising:
a memory which stores instructions (“At the core of this system lies Processor 100, which is the primary computing unit responsible for executing instructions, processing data, and managing the operations of the system. The Processor 100 is connected to Memory 102, a storage component that retains both the instructions for the operations of the Processor 100 and the data necessary for these operations,” D’Auria paragraph 0092);
one or more processors coupled to the memory wherein the one or more processors, when executing the instructions which are stored, are configured (“At the core of this system lies Processor 100, which is the primary computing unit responsible for executing instructions, processing data, and managing the operations of the system. The Processor 100 is connected to Memory 102, a storage component that retains both the instructions for the operations of the Processor 100 and the data necessary for these operations,” D’Auria paragraph 0092) to:
access a first text template, wherein the first text template is readable by a large language model (LLM) neural network, and wherein the first text template includes one or more control parameters ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188; "FIG. 27 demonstrates the web portal rendering an outline for a new TV commercial. This is generated by the operation of a generative model within output generation 116. For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline," D'Auria paragraph 0190; “Generative model 514 takes the form of a Large Language Model (LLM) pre-trained on large volumes of public data," D'Auria paragraph 0125, displaying concepts to a user for selection where each concept has an additional guidance parameter);
populate the first text template, wherein populating includes information from within a website ("FIG. 27 demonstrates the web portal rendering an outline for a new TV commercial. This is generated by the operation of a generative model within output generation 116. For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline," D'Auria paragraph 0190; D’Auria Figure 27 “web portal”, populating a selected concept with additional guidance info where the guidance info is entered by the user via a web portal (i.e., is populated with info from a website));
submit a request, to the LLM neural network, wherein the request includes the first text template that was populated ("Moving to FIG. 22, the process involves a similar approach but specifically focuses on generating an optimized TV commercial script. The guidance prompt 2202, such as “generate a 15-second television commercial script predicted to yield the highest score,” is input into the generative model 2204. The model then produces an optimized multimedia output 2206, which, in this case, is a script for a television commercial. This script is devised to align with the highest scoring prediction based on the model's understanding of return on ad spend or other targeted performance metrics," D'Auria paragraph 0183; "Lastly, FIG. 29 shows the screen following a user's selection of a concept via a create button consistent with the process in FIG. 22. The user inputs like target duration for a TV commercial are provided, and upon submission, the portal displays a script generated for the selected concept," D'Auria paragraph 0192; “For example, after a user selects a concept they find intriguing, they might click a link to generate an outline for this optimized video topic. The user is then presented with options to input additional guidance, which, along with the selected video concept, is fed into a generative model to create a video outline,” D’Auria paragraph 0190);
generate, by the LLM neural network, a first video script ("The output, therefore, is not just any commercial script but one that is fine-tuned to meet specific strategic goals, reflecting the sophisticated capabilities of the generative model in processing and interpreting complex input data to produce highly targeted multimedia content," D'Auria paragraph 0183);
create a first short-form video, wherein the first short-form video is based on the first video script that was generated ("The user inputs like target duration for a TV commercial are provided, and upon submission, the portal displays a script generated for the selected concept. Additionally, a corresponding video may be generated as part of the output, showcasing the portal's ability to render comprehensive multimedia content," D'Auria paragraph 0192);
evaluate the first short-form video, wherein evaluating is based on one or more performance metrics ("FIG. 25 displays the web portal's capability to present optimized video content recommendations ordered by model scores. This process aligns with the ranking mechanism depicted in FIG. 15, where concepts for new television commercials are generated, likely to perform best based on ROI. An example could include generating concepts for an online retail pharmacy's television commercials, utilizing a process similar to that in FIG. 16," D'Auria paragraph 0188); and
create a second text template based on the one or more performance metrics ("While it is feasible to conclude the process at this stage, using scores 1610 as part of output generation, the system allows for further refinement. Scores 1610 can be inputted into a subsequent generative model 1614, along with a new guidance prompt 1612. This iterative process, where the same or a different generative model (like a large language model) is used, is exemplified in FIG. 16. The guidance prompt 1612 in this iteration could be, “Here is a list of television commercial concepts and associated scores; generate new television concepts for me that are predicted to score even higher,”" D'Auria paragraph 0171, iterating on the prompt template to create higher-scoring concepts, which yields better scripts and better videos).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over D'Auria et al. (US 20240232937 A1, hereinafter D'Auria) in view of Grant et al. (US 20240371407 A1, hereinafter Grant).
As to claim 15, D’Auria discloses the method of claim 1; however, D’Auria does not appear to explicitly disclose a limitation wherein the one or more control parameters include a tone.
Grant teaches a limitation wherein the one or more control parameters include a tone ("By way of example, the prompt generation data may include set text components such as:," Grant paragraph 0064; "A tone component, which includes text that cues the machine learning model to generate cohesion information that has a particular tone, character or quality (e.g. happy, fun, playful, serious, business, dark, scary, or any other tone/character/quality)," Grant paragraph 0064).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of D’Auria to allow the user to input a tone control parameter as taught by Grant. One would have been motivated to make such a combination so that the user could have more control over the input parameters, thereby providing more tools for the user to more accurately create a video according to his/her vision.
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over D'Auria et al. (US 20240232937 A1, hereinafter D'Auria) in view of Orton et al. (US 20240346709 A1, hereinafter Orton).
As to claim 16, D’Auria discloses the method of claim 1; however, D’Auria does not appear to explicitly disclose a limitation wherein the one or more control parameters include a target audience.
Orton teaches a limitation wherein the one or more control parameters include a target audience ("The input parameter recommender 202 uses input from the user database 316 about the observed interactions and it also uses the prompt from the user (text, an image description, a video description, a 3D model description, a visual content item description, a target audience or a product description)," Orton paragraph 0042).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of D’Auria to allow the user to input a target audience control parameter as taught by Orton. One would have been motivated to make such a combination so that the user could have more control over the input parameters, thereby providing more tools for the user to more accurately create a video according to his/her vision.
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over D'Auria et al. (US 20240232937 A1, hereinafter D'Auria) in view of Lyons et al. (US 20240362968 A1, hereinafter Lyons).
As to claim 18, D’Auria discloses the method of claim 17; however, D’Auria does not appear to explicitly disclose a limitation wherein the one or more media instructions include a camera, an exposure, or a f-stop.
Lyons teaches a limitation wherein the one or more media instructions include a camera, an exposure, or a f-stop ("In other examples, the processor can detect prompt-related data in other ways. For example, prompt assistors can suggest well-known game designs or designer styles to select from, famous music or musician names/styles, examples of famous artwork or artist name/styles, etc. In addition, the processor can allow for specific modifiers to be used, created, customized, etc., such as certain types of photography filters, lighting settings, cinematic angles, close-ups, lengths or durations, camera movements, art type (e.g., realistic, cartoon, etc.), negative prompts (e.g., “not,” “no redundancy,” “no common answers,” “no text in images,” etc.), and so forth. The processor can also present a search feature, such as a search box that can suggest and/or auto-fill with possible matching prompts," Lyons paragraph 0064).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the method of D’Auria to allow the user to input camera-related media instructions as taught by Lyons. One would have been motivated to make such a combination so that the user could have more control over the video generation, thereby providing more tools for the user to more accurately create a video according to his/her vision.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20150143413 A1 to Hall et al. discloses a method and system for automatically generating interstitial material related to video content where a script is generated to create video content;
US 20150332666 A1 to Dayan et al. discloses a method for automatically transforming text into video where a script is generated and video content is created based on the script;
US 20160005436 A1 to Axen et al. discloses automatic generation of video from structured content where website data is used to generate a script and that script is used to generate video content;
US 20170169516 A1 to Sohoni et al. discloses methods and systems for automatic generation of medias from financial/corporate information where financial information is used to generate a script and using the generated script to generate video content;
US 20190087870 A1 to Gardyne et al. discloses a personal video commercial studio system where a script template is filled in with information and is used to create a video; and
US 20240212716 A1 to Ramesh et al. discloses a method and system for generating synthetic video advertisements where videos are generated from a generated script and where the videos are regenerated if the video does not score high enough.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL SAMWEL whose telephone number is (313)446-6549. The examiner can normally be reached Monday through Thursday 8:00-6:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kieu Vu, can be reached at (571) 272-4057. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DANIEL SAMWEL/ Primary Examiner, Art Unit 2171