DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.
Status of the Claims
Prior to entry of the amendment(s) and/or consideration of the argument(s), the status of the claims is as follows.
Claim(s) 1-20 is/are pending.
Claims 2 and 12-13 are objected to because of informalities.
Claim(s) 1-8 and 11-14 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Chepkwony (U.S. Pat. App. Pub. No. 2024/0289360, hereinafter Chepkwony).
Claims 9-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chepkwony as applied to claim 1 above, and further in view of Yang (U.S. Pat. App. Pub. No. 2024/0257420, hereinafter Yang).
Claims 15-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chepkwony as applied to claim 14 above, and further in view of Ferrucci (U.S. Pat. App. Pub. No. 2022/0261817, hereinafter Ferrucci).
Claims 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chepkwony in view of Non-patent literature to Shahinian (Shahinian, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A. and Schulman, J., 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35, pp.27730-27744., hereinafter Shahinian).
Response to Amendments
Applicant’s amendment filed on 14 January 2026 has been entered.
In view of the amendment to the claim(s), the amendment of claim(s) 1-2, 8, 10, 12-14, and 16; the cancellation of claim(s) 15 and 19-20; and the addition of claim(s) 21-23 have been acknowledged and entered.
After entry of the amendments, claims 1-14, 16-18, and 21-23 remain pending.
In view of the amendment to claim(s) 2 and 12-13, the objection to claim(s) 2 and 12-13 is withdrawn.
In view of the amendment to claim(s) 1 and 14 and the cancellation of claim(s) 15 and 19-20, the rejection of claims 1-20 under 35 U.S.C. §102 and 35 U.S.C. §103 is withdrawn.
In light of the amended and newly added claims, new grounds for rejection under 35 U.S.C. §103 are provided in the action below.
Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §102 and 103, see pages 10-11 of the Response to Non-Final Office Action dated 22 October 2025, which was received on 14 January 2026 (hereinafter Response and Office Action, respectively), have been fully considered.
With respect to the rejection(s) of claim(s) 1 and 14 under 35 U.S.C. §102 as being anticipated by Chepkwony, applicant asserts that Chepkwony fails to teach or suggest all limitations of claims 1 and 14, as amended. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 1 and 14 under 35 U.S.C. §102 are withdrawn.
Claim(s) 15 and 19-20 are cancelled in this response, rendering the rejection of claims 15 and 19-20 moot. Therefore, the rejection of claims 15 and 19-20 under 35 U.S.C. §103 is withdrawn.
Applicant further argues that the rejection(s) of dependent claims 2-13 and 16-18 should be withdrawn for at least the same reasons as independent claims 1 and 14. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 2-13 and 16-18 under 35 U.S.C. §102 and 35 U.S.C. §103 are withdrawn.
However, upon further consideration, new ground(s) of rejection under 35 U.S.C. §103 are made in light of combinations of Chepkwony, Yang, and Ferrucci, and newly cited references Yu (U.S. Pat. App. Pub. No. 2022/0180052, hereinafter Yu) and Shahinian (U.S. Pat. App. Pub. No. 2024/0126981, hereinafter Shahinian).
The Applicant has not provided any further statement; therefore, the Examiner directs the Applicant to the rationale below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8, 11-14, and 22-23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chepkwony in view of Yu.
Regarding claim 1, Chepkwony discloses A method implemented by one or more processors (Systems and methods for providing new content generation as implemented in an example system 100 including a processing device; Chepkwony, ¶ [0014]), the method comprising: receiving natural language (NL) based input associated with a client device of a user (The system receives data communication 205 from the user which includes “a prompt input field 310 is provided in the application UI 106. The prompt input field 310 receives user input (referred to as prompt input 312), such as a typed or uttered phrase or individual key words”; Chepkwony, ¶ [0023]-[0024], [0041]), the NL based input being indicative of a request for a set of slides to be generated (the user input is “corresponding to content wanted by the user for inclusion in new slides 304”; Chepkwony, ¶ [0041]); receiving a document associated with the request for the set of slides (“the preprocessor 202 further combines the context object {receiving a document...} with the received prompt input to generate a text query as input for the LLM 108” where “the context object is a data structure that includes information that can be used to understand context about the existing content of the slide presentation document 222 {...associated with the request for the set of slides}”; Chepkwony, ¶ [0025]-[0026]); generating a multi-modal response that is responsive to the NL based input (“prompt input 312 is received and combined with the extracted string content to form a text query” which is “transmitted to the LLM 108 as input,” where “LLM 108 generates a response including text output 324” and “the generated query” can further be modified to “request images”; Chepkwony, ¶ [0051], [0053]), the multi-modal response comprising a generated set of slides, (“text output 324” which is used by “the content generator 110” to generate “one or more prospective slides 322 based on the text output 324”; Chepkwony, ¶ [0054]-[0055]) wherein generating the multi-modal response that is responsive to the NL based input comprises: processing, using a large language model (LLM), LLM input to generate LLM output (To generate the prospective slides, “At operation 408, the text query is transmitted to the LLM 108 as input,” which is processed by the LLM 108 to “generate a response including text output 324 {LLM output}”; Chepkwony, ¶ [0054]), the LLM input including at least the NL based input and the document (The text input, as received by the LLM, is generated, in part, by “the preprocessor 202” which “combines the context object with the received prompt input to generate a text query as input for the LLM 108 {the LLM input...},” where the received prompt input includes the user input {...including at least the NL based input}; Chepkwony, ¶ [0026], [0054]); determining, based on the LLM output, and for each slide of the generated set of slides, textual content for inclusion in the multi-modal response (“the content generator 110 generates a one or more prospective slides 322 based on the text output 324 (and/or images) received in operation 410. In some examples, the text output 324 is parsed and separated into multiple prospective slides 322 based on one or more delimiters in the text output 324.”; Chepkwony, ¶ [0055]) and one or both of: (“In some implementations, the content generator 110 further includes a plurality of images 326 or other graphical elements in one or more prospective slides 322 where such images are included.”; Chepkwony, ¶ [0055]) a multimedia content tag that is indicative of multimedia content that is to be included in the multi-modal response (“the postprocessor 206 uses the text output from the LLM 108 to include in one or more graphical elements (e.g., images, animations, graphs) in one or more prospective slides” where, in response to the text output, “the postprocessor 206 is in communication with one or more other resources 224 to obtain or generate graphical elements for the prospective slides. As an example, the postprocessor 206 is in communication with a search engine to obtain a photograph, clip art, or other type of image relevant to the text output.” In this instance, the string content is understood as a graphical element tag {multimedia content tag} indicative of a graphical element {multimedia content} that is to be included in the prospective slides {multi-modal response}.; Chepkwony, ¶ [0033], [0055]), or a generative multimedia content prompt that is indicative of generative multimedia content that is to be included in the multi-modal response (“the postprocessor 206 uses the text output from the LLM 108 to include in one or more graphical elements (e.g., images, animations, graphs) in one or more prospective slides” where the LLM output is received by the postprocessor, and “the postprocessor 206 is in communication with a resource 224, such as an ML image generation model, where the postprocessor 206 generates a text query {a generative multimedia content prompt} and... the AI art generation model generates and provides” the graphical element “relevant to text output of the LLM 108 {...that is indicative of generative multimedia content that is to be included in the multi-modal response}.”; Chepkwony, ¶ [0033], [0035], [0055]); and obtaining, based on the multimedia content tag and/or the generative multimedia content prompt, the multimedia content for inclusion in the multi-modal response (As read in combination, “the postprocessor 206 is in communication with a search engine to obtain a photograph, clip art, or other type of image relevant to the text output,” and/or “the postprocessor 206 generates a text query and... the AI art generation model generates and provides an image relevant to text output of the LLM 108” each of which is based on one or more components in “the text output from the LLM 108” which elicited the communication with the search engine and/or communication with the “AI art generation model.”; Chepkwony, ¶ [0033], [0035], [0055]); and causing the multi-modal response to be rendered at the client device of the user, (“graphical representations of the prospective slides 322 are presented in the application UI 106.”; Chepkwony, ¶ [0056]). However, Chepkwony fails to expressly recite wherein causing the multi-modal response to be rendered at the client device of the user comprises: causing the textual content to be rendered while the multimedia content is being obtained; and causing, responsive to the multimedia content being obtained, the multimedia content to be rendered.
Yu teaches systems and methods for “management of presentation content… where presentation content interacts with live feeds…” (Yu, Abstract). Regarding claim 1, Yu teaches wherein causing the multi-modal response to be rendered at the client device of the user comprises: causing the textual content to be rendered (“Method 250 begins at processing operation 252, where a slide-based presentation deck is displayed (or rendered) in a GUI of an application or service,” which establishes that the underlying slide, which contains the aforementioned textual content, is actively rendered first.; Yu, ¶ [0066]) while the multimedia content is being obtained (“During presentation of a slide-based presentation deck, flow of method 250 may proceed to processing operation 254. At processing operation 254, the presentation feed management component may be configured to detect access to a slide, of the slide-based presentation deck, that comprises, within the slide, a GUI object configured to activate a live camera feed,” thus concurrent with the rendering of the text on the display, the multimedia content (in this example, a live camera feed) is being obtained. Of note, this can include the generation of AI content as well as retrieval of existing video content.; Yu, ¶ [0067]); and causing, responsive to the multimedia content being obtained, the multimedia content to be rendered (Using the retrieved multimedia content the system then renders “a representation of one or more live camera feeds into presentation content”; Yu, ¶ [0067]-[0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony to incorporate the teachings of Yu to include wherein causing the multi-modal response to be rendered at the client device of the user comprises: causing the textual content to be rendered while the multimedia content is being obtained; and causing, responsive to the multimedia content being obtained, the multimedia content to be rendered. The systems described in Yu allow for “dynamic management” of “presentation content including interactions with live camera feeds during a presentation that is being conducted in real-time (or near real-time),” thus allowing for rendering of content as received with simultaneous planning for positioning and size of live or on-demand content through GUI objects, which avoids the known problem of disruptions to other rendered content, as recognized by Yu. (Yu, ¶ [0067]-[0068]).
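For illustration only, the rendering order recited in claim 1 (textual content rendered while the multimedia content is being obtained, and the multimedia content rendered once obtained) can be sketched in Python as follows. This is a minimal sketch and is not drawn from Chepkwony or Yu; the names fetch_media, render_slide, and demo, the simulated delay, and the sample strings are all assumptions introduced here.

```python
import asyncio


async def fetch_media(tag: str) -> str:
    # Hypothetical stand-in for a search-engine or image-model request.
    await asyncio.sleep(0.01)
    return f"<image for {tag}>"


async def render_slide(text: str, media_tag: str, log: list) -> None:
    # Begin obtaining the multimedia content without waiting for the result.
    pending = asyncio.create_task(fetch_media(media_tag))
    # Yield once so the request is actually in flight before rendering text.
    await asyncio.sleep(0)
    # The textual content is rendered while the multimedia content is being obtained.
    log.append(f"text: {text}")
    # Responsive to the multimedia content being obtained, render it.
    log.append(f"media: {await pending}")


def demo() -> list:
    # The log records the order in which content would reach the display.
    log: list = []
    asyncio.run(render_slide("Violet: Faithfulness", "violet flower", log))
    return log
```

In this sketch the log list stands in for a display surface, so the order of its entries reflects the order in which the two kinds of content would be rendered.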
Regarding claim 2, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses further comprising: receiving configuration data associated with the set of slides to be generated, (Data communication 205 can further include “an option or constraint may be provided that allows the user to select a desired word and/or slide count. For instance, the word and/or slide count is used to determine a maximum word count property in the query that causes the LLM 108 to generate a response within the maximum word count. In other examples, the content options correspond to one or more formatting properties of the existing content.”; Chepkwony, ¶ [0023], [0029]) wherein the LLM input processed by the LLM to generate LLM output includes the configuration data (Data communication 205, as depicted in FIG. 2, includes both option and constraint selection and/or other NL based input; Chepkwony, ¶ [0023]-[0024]; FIG. 2).
Regarding claim 3, the rejection of claim 2 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein the configuration data is extracted from the NL based input (In examples where prompt input includes configuration data “a text query is generated with the following string, ‘Given the following context:’+context_Object+’Do the following:’+input_prompt (e.g., “More details,” “Make this longer,” “Summarize this,” “Complete this,” or “Insert a story”),” where each is an example of configuration data which is extracted from the uttered phrases of the prompt input (Explained as part of the continuing example “the user may type or utter a phrase such as, ‘More details,’ ‘Make this longer,’ ‘Summarize this,’ ‘Complete this,’ or ‘Insert a story,’ which is received as the prompt input.”).; Chepkwony, ¶ [0024], [0026]).
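For illustration only, the string concatenation quoted above from Chepkwony, ¶ [0026], can be sketched in Python as follows. The function name build_text_query and the sample argument values are assumptions introduced here; only the two literal instruction strings and the order of concatenation come from the quoted passage.

```python
def build_text_query(context_object: str, input_prompt: str) -> str:
    # Mirrors the quoted pattern:
    # 'Given the following context:' + context_Object + 'Do the following:' + input_prompt
    return (
        "Given the following context:"
        + context_object
        + "Do the following:"
        + input_prompt
    )


# Hypothetical usage: the context object carries existing slide content and
# the prompt input carries the user's typed or uttered phrase.
query = build_text_query("Slide 1: Flowers and their meanings", "More details")
```

Under this reading, the phrase supplied as the prompt input is carried verbatim into the text query that is transmitted to the LLM.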
Regarding claim 4, the rejection of claim 2 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein the configuration data is indicative of one or more of: a presentation duration, a number of slides to be included in the set of slides to be generated, an amount of multimedia content to include in the set of slides to be generated relative to the textual content included in the set of slides to be generated, and one or more types of multimedia content to include in the set of slides to be generated (In examples where prompt input includes configuration data “a text query is generated with the following string, ‘Given the following context:’+context_Object+’Do the following:’+input_prompt (e.g., “More details,” “Make this longer,” “Summarize this,” “Complete this,” or “Insert a story”)”; further discloses “a slide number selector 316, where the user is able to enter to select a number of desired slides”; “limiting a length of text returned and/or a length of text for each slide returned.”; Chepkwony, ¶ [0024], [0026], [0042]).
Regarding claim 5, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses further comprising: outputting, to the client device, the multi-modal response in a format suitable for opening by a presentation application (“graphical representations of the prospective slides 322 are presented in the application UI 106” and the “application UI 106 may be presented on the display screen 104 {outputting, to the client device, the multi-modal response...}” where the “content generator 110” can be “included in one or more productivity applications 112” or “a separate module that is communicatively integrated into one or more productivity applications 112 via an application programming interface (API)” and the application UI 106 is part of “productivity application 112”; Chepkwony, ¶ [0016]-[0017], [0056]).
Regarding claim 6, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses further comprising: receiving further NL based input associated with the client device (“a determination is made at decision operation 422 as to whether a [second] prompt is received” where “If a [second] prompt is received, the method 400 flows to operation 424 where a second text query is generated.”; Chepkwony, ¶ [0058]), the further NL based input indicative of a request for a modification to the generated set of slides (“As an example, selection of the expand option in association with a selected prospective slide 322 provides a prompt input to the content generator 110, such as an expand instruction” which can be received as natural language speech or text; Chepkwony, ¶ [0038], [0058]); generating a modified set of slides based on processing, using the LLM, the generated set of slides and the further NL based input (“When the expand option is selected or another prompt input is received, at operation 424, string content of the selected prospective slide 322 is extracted and included in a context object of a second text query and the prompt input is combined with the context object” where the system returns to operation 408 to repeat the listed operations from FIG. 4 including “generat[ing] one or more prospective slides 322 based on the text output 324 (and/or images) received in operation 410”; Chepkwony, ¶ [0054], [0058], FIG. 4); and causing the modified set of slides to be rendered at the client device of the user (As indicated previously, “graphical representations of the prospective slides 322” which in this case will include modified slides for the second prompt based on the generated set of slides “are presented in the application UI 106.”; Chepkwony, ¶ [0056], [0058], FIG. 4).
Regarding claim 7, the rejection of claim 6 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein the further NL based input is indicative of a request for a modification to one or more of the multimedia content items included in the generated set of slides (“If a [second] prompt is received, the method 400 flows to operation 424 where a second text query is generated” where “the generated [second] query may also be modified to also, or alternatively, request images.”; Chepkwony, ¶ [0053], [0058], FIG. 4); and wherein the modified set of slides are modified to include the modification to the one or more of the multimedia content items (The generated second query includes “string content of the selected prospective slide 322 is extracted and included in a context object of a second text query and the prompt input is combined with the context object,” where “the content generator 110 generates one or more prospective slides 322 based on the text output 324 (and/or images) received in operation 410.” As such, the selected prospective slide 322 is modified to include the requested images.; Chepkwony, ¶ [0055], [0058], FIG. 4).
Regarding claim 8, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein causing the multi-modal response to be rendered at the client device of the user comprises, for a given slide of the generated set of slides: causing textual content associated with the given slide to be visually rendered in a first portion of a graphical user interface (GUI) rendered on a display of the client device (Relying on the example provided in FIG. 3B, 322A-C represent depictions of three (3) slides generated by the method described in Chepkwony. As shown at 322b, a first section with textual content associated with the given slide is visually rendered (e.g., “Violet” “Faithfulness”) in a first portion of a GUI (the left side of the slide) on a display of a client device; Chepkwony, ¶ [0016], [0045], FIG. 3B); and causing multimedia content associated with the given slide to be rendered in a second portion of the GUI, (In the example provided in FIG. 3B, at 322b, a second section with multimedia content associated with the given slide is visually rendered (e.g., a visual depiction of a flower) in a second portion of a GUI (the right side of the slide) on a display of a client device; Chepkwony, ¶ [0016], [0045], FIG. 3B) wherein the first portion and the second portion form part of the given slide (The right side of the slide, having the multimedia content, and the left side of the slide, having the textual content, form part of said slide.; Chepkwony, ¶ [0045], FIG. 3B).
Regarding claim 11, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein generating the multi-modal response that is responsive to the NL based input further comprises: determining whether the multi-modal response should include the generated set of slides (“a determination is made as to whether a selection of an insert option 330 is received to insert the selected prospective slides 322 into the slide presentation document 222. When the insert option 330 is selected, the selected prospective slides 322 are generated and inserted into the slide presentation document 222 as new slides,” where the generated set of slides is understood as a defined object, such that, by determining to change the constituents of the generated set of slides, the system has determined whether to include the generated set of slides; Chepkwony, ¶ [0057]).
Regarding claim 12, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein generating the multi-modal response that is responsive to the NL based input further comprises: determining whether to generate a multi-modal response including both textual content and multimedia content (“the content generator 110 generates a one or more prospective slides 322 based on the text output 324 (and/or images) received in operation 410” where “In some implementations, the content generator 110 further includes a plurality of images 326 or other graphical elements in one or more prospective slides 322 where such images are included,” the inclusion of images {multimedia content} is not mandatory and said inclusion is determined based on the prompt input; Chepkwony, ¶ [0051], [0055]).
Regarding claim 13, the rejection of claim 12 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein generating the multi-modal response that is responsive to the NL based input further comprises: responsive to determining to generate a multi-modal response including both textual content and multimedia content (In the case “where such images are included,” “the content generator 110 further includes a plurality of images 326 or other graphical elements in one or more prospective slides 322”; Chepkwony, ¶ [0055]), determining whether the multimedia content should be generative multimedia content or non-generative multimedia content (“the postprocessor 206 uses the text output from the LLM 108 to include in one or more graphical elements (e.g., images, animations, graphs) in one or more prospective slides” where, in response to the text output, the postprocessor 206 determines to “obtain or generate graphical elements for the prospective slides” using “a search engine to obtain a photograph, clip art, or other type of image relevant to the text output” or using “the AI art generation model generates and provides an image relevant to text output of the LLM 108.”; Chepkwony, ¶ [0033], [0055]).
Regarding claim 14, Chepkwony discloses A method implemented by one or more processors (Systems and methods for providing new content generation as implemented in an example system 100 including a processing device; Chepkwony, ¶ [0014]), the method comprising: receiving natural language (NL) based input associated with a client device of a user (The system receives data communication 205 from the user which includes “a prompt input field 310 is provided in the application UI 106. The prompt input field 310 receives user input (referred to as prompt input 312), such as a typed or uttered phrase or individual key words”; Chepkwony, ¶ [0023]-[0024], [0041]), the NL based input being indicative of a request for assistance with completing a particular task (the user input is “corresponding to content wanted by the user for inclusion in new slides 304” where the particular task may be giving a presentation.; Chepkwony, ¶ [0041]); receiving a document including instructions associated with the particular task (“the preprocessor 202 further combines the context object {receiving a document...} with the received prompt input to generate a text query as input for the LLM 108” where “the context object is a data structure that includes information that can be used to understand context about the existing content of the slide presentation document 222 {...associated with the particular task}” which may be “prepended by an instruction string that indicates an action to be taken;” Chepkwony, ¶ [0025]-[0026], [0041]); generating a multi-modal response that is responsive to the NL based input (“prompt input 312 is received and combined with the extracted string content to form a text query” which is “transmitted to the LLM 108 as input,” where “LLM 108 generates a response including text output 324” and “the generated query” can further be modified to “request images”; Chepkwony, ¶ [0051], [0053]), the multi-modal response comprising assistive content for assisting the user in performing the particular task, (“text output 324” which is used by “the content generator 110” to generate “one or more prospective slides 322 based on the text output 324”; Chepkwony, ¶ [0054]-[0055]) wherein generating the multi-modal response that is responsive to the NL based input comprises: processing, using a large language model (LLM), LLM input to generate LLM output (To generate the prospective slides, “At operation 408, the text query is transmitted to the LLM 108 as input,” which is processed by the LLM 108 to “generate a response including text output 324 {LLM output}”; Chepkwony, ¶ [0054]), the LLM input including at least the NL based input and the document (The text input, as received by the LLM, is generated, in part, by “the preprocessor 202” which “combines the context object {the document} with the received prompt input to generate a text query as input for the LLM 108 {the LLM input...},” where the received prompt input includes the user input {...including at least the NL based input}; Chepkwony, ¶ [0026], [0054]); determining, based on the LLM output, textual content for inclusion in the multi-modal response (“the content generator 110 generates a one or more prospective slides 322 based on the text output 324 (and/or images) received in operation 410. In some examples, the text output 324 is parsed and separated into multiple prospective slides 322 based on one or more delimiters in the text output 324.”; Chepkwony, ¶ [0055]). However, Chepkwony fails to expressly recite wherein causing the multi-modal response to be rendered at the client device of the user comprises: causing the textual content to be rendered while the multimedia content is being obtained; and causing, responsive to the multimedia content being obtained, the multimedia content to be rendered.
The relevance of Yu is described above with relation to claim 1. Regarding claim 14, Yu teaches wherein causing the multi-modal response to be rendered at the client device of the user comprises: causing the textual content to be rendered (“Method 250 begins at processing operation 252, where a slide-based presentation deck is displayed (or rendered) in a GUI of an application or service,” which establishes that the underlying slide, which contains the aforementioned textual content, is actively rendered first.; Yu, ¶ [0066]) while the multimedia content is being obtained (“During presentation of a slide-based presentation deck, flow of method 250 may proceed to processing operation 254. At processing operation 254, the presentation feed management component may be configured to detect access to a slide, of the slide-based presentation deck, that comprises, within the slide, a GUI object configured to activate a live camera feed,” thus concurrent with the rendering of the text on the display, the multimedia content (in this example, a live camera feed) is being obtained. Of note, this can include the generation of AI content as well as retrieval of existing video content.; Yu, ¶ [0067]); and causing, responsive to the multimedia content being obtained, the multimedia content to be rendered (Using the retrieved multimedia content the system then renders “a representation of one or more live camera feeds into presentation content”; Yu, ¶ [0067]-[0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony to incorporate the teachings of Yu to include wherein causing the multi-modal response to be rendered at the client device of the user comprises: causing the textual content to be rendered while the multimedia content is being obtained; and causing, responsive to the multimedia content being obtained, the multimedia content to be rendered. The systems described in Yu allows for “dynamic management” of “presentation content including interactions with live camera feeds during a presentation that is being conducted in real-time (or near real-time),” thus allowing for rendering of content as received with simultaneous planning for positioning and size of live or on-demand content through GUI objects, which avoids the known problem of disruptions to other rendered content, as recognized by Yu. (Yu, ¶ [0067]-[0068]).
Regarding claim 22, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. However, Chepkwony fails to expressly recite wherein causing the multimedia content to be rendered responsive to the multimedia content being obtained comprises: causing, while the multimedia content is being obtained, a placeholder for where the multimedia content will be inserted into the multi-modal response to be rendered; and causing, responsive to the multimedia content being obtained, the multimedia content to replace the placeholder.
The relevance of Yu is described above with relation to claim 1. Regarding claim 22, Yu teaches wherein causing the multimedia content to be rendered responsive to the multimedia content being obtained comprises: causing, while the multimedia content is being obtained, a placeholder for where the multimedia content will be inserted into the multi-modal response to be rendered (“During presentation of a slide-based presentation deck” the system can “detect access to a slide, of the slide-based presentation deck, that comprises, within the slide, a GUI object configured to activate a live camera feed,” thus concurrent with the rendering of the text on the display, including the GUI object, the multimedia content (in this example, a live camera feed) is being obtained, and said multimedia content will replace the GUI object, based on the size and shape of said GUI object.; Yu, ¶ [0067]-[0068]); and causing, responsive to the multimedia content being obtained, the multimedia content to replace the placeholder (The “size and/or formatting of a live camera feed” is modified “to match the attributes of a GUI object as presented in a slide-based template for a displayed slide” and the live camera feed is presented in the slide, upon being obtained (retrieved, formatted, edited, etc.).; Yu, ¶ [0068]-[0069]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony to incorporate the teachings of Yu to include wherein causing the multimedia content to be rendered responsive to the multimedia content being obtained comprises: causing, while the multimedia content is being obtained, a placeholder for where the multimedia content will be inserted into the multi-modal response to be rendered; and causing, responsive to the multimedia content being obtained, the multimedia content to replace the placeholder. The systems described in Yu allow for “dynamic management” of “presentation content including interactions with live camera feeds during a presentation that is being conducted in real-time (or near real-time),” thus allowing for rendering of content as received with simultaneous planning for positioning and size of live or on-demand content through GUI objects, which avoids the known problem of disruptions to other rendered content, as recognized by Yu. (Yu, ¶ [0067]-[0068]).
Regarding claim 23, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein the NL based input is received via a software application accessible via the client device (The system receives data communication 205 from the user which includes “a prompt input field 310 is provided in the application UI 106. The prompt input field 310 receives user input (referred to as prompt input 312), such as a typed or uttered phrase or individual key words”; Chepkwony, ¶ [0023]-[0024], [0041]). However, Chepkwony fails to expressly recite wherein the software application differs from the presentation application.
The relevance of Yu is described above with relation to claim 1. Regarding claim 23, Yu teaches wherein the NL based input is received via a software application accessible via the client device (Discloses “natural language understanding processing including transcribing audio signals received from users/participants of a user communication” as received via the “user computing device 102” through an “application service component 104” for receiving user input.; Yu, ¶ [0030], [0045], [0075]), and wherein the software application differs from the presentation application (“a representation of presentation content” being generated using a first application (e.g., “utilizing a presentation application or service to create/design presentation content”) “may be rendered in a GUI window of another application/service such as a collaborative communication application or service that is used to execute a user communication such as an electronic meeting.”; Yu, ¶ [0027], [0066]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony to incorporate the teachings of Yu to include wherein the software application differs from the presentation application. The systems described in Yu allow for “dynamic management” of “presentation content including interactions with live camera feeds during a presentation that is being conducted in real-time (or near real-time),” specifically including the modification of the presentation in a first software application and the display of the presentation in a second application, thus allowing for rendering of content as received with simultaneous planning for positioning and size of live or on-demand content through GUI objects, which avoids the known problem of disruptions to other rendered content, while simultaneously allowing for these modifications in real-time to avoid disruption to an ongoing presentation, as recognized by Yu. (Yu, ¶ [0067]-[0068]).
Claims 9-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chepkwony and Yu as applied to claim 1 above, and further in view of Yang.
Regarding claim 9, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein the multi-modal response includes, for a given slide of the generated set of slides, given textual content and/or given multimedia content to be included on the given slide when it is presented (Relying on the example provided in FIG. 3B, 322A-C represent depictions of three (3) slides generated by the method described in Chepkwony. As shown at 322b, a first section with textual content associated with the given slide is visually rendered (e.g., “Violet” “Faithfulness”) in a first portion of a GUI (the left side of the slide), and a second section with multimedia content associated with the given slide is visually rendered (e.g., a visual depiction of a flower) in a second portion of a GUI (the right side of the slide), each of which is included on the given slide when it is presented using the described productivity application.; Chepkwony, ¶ [0016], [0037], [0045], FIG. 3B). However, Chepkwony fails to expressly recite given additional textual content which is not included on the given slide when it is presented.
Yang teaches systems and methods for “generating presentation content using AI-generated and/or non-AI-generated content.” (Yang, ¶ [0019]). Regarding claim 9, Yang teaches given additional textual content which is not included on the given slide when it is presented (“The prompts model 575 can be used to provide the presenter with prompts that can be included in the notes section of the slides, such as those shown in FIGS. 4A-4G,” where text in the notes section of a presentation application are not included on a given slide when it is presented.; Yang, ¶ [0073]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony to incorporate the teachings of Yang to include given additional textual content which is not included on the given slide when it is presented. The generated presentation content of Yang allows for automated “drafting textual content, selecting imagery, and laying out this textual content and imagery,” which can provide “a compelling and interesting content for the intended audience of the presentation,” and which reduces the prior art burdens of manual distillation of content and the known struggle to find appropriate content and imagery, as recognized by Yang. (Yang, ¶ [0001], [0016]).
Regarding claim 10, the rejection of claim 9 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein causing the multi-modal response to be rendered at the client device of the user comprises, for a given slide of the generated set of slides: causing the given textual content associated with the given slide to be visually rendered in a first portion of a graphical user interface (GUI) rendered on a display of the client device (Relying on the example provided in FIG. 3B, 322A-C represent depictions of three (3) slides generated by the method described in Chepkwony. As shown at 322b, a first section with textual content associated with the given slide is visually rendered (e.g., “Violet” “Faithfulness”) in a first portion of a GUI (the left side of the slide) on a display of a client device; Chepkwony, ¶ [0016], [0045], FIG. 3B); and causing the given multimedia content associated with the given slide to be rendered in a second portion of the GUI, (In the example provided in FIG. 3B, at 322b, a second section with multimedia content associated with the given slide is visually rendered (e.g., a visual depiction of a flower) in a second portion of a GUI (the right side of the slide) on a display of a client device; Chepkwony, ¶ [0016], [0045], FIG. 3B) wherein the first portion and the second portion form part of the given slide (The right side of the slide, having the multimedia content, and the left side of the slide, having the textual content, form part of said slide.; Chepkwony, ¶ [0045], FIG. 3B). However, Chepkwony fails to expressly recite and causing the given additional textual content associated with the given slide to be visually rendered in a third portion of the GUI, wherein the third portion is distinct from both the first portion of the GUI and the second portion of the GUI.
The relevance of Yang is described above with relation to claim 9. Regarding claim 10, Yang teaches and causing the given additional textual content associated with the given slide to be visually rendered in a third portion of the GUI, (“The prompts model 575 can be used to provide the presenter with prompts that can be included in the notes section of the slides, such as those shown in FIGS. 4A-4G,” where text in the notes section of a presentation application are not included on a given slide when it is presented.; Yang, ¶ [0073]) wherein the third portion is distinct from both the first portion of the GUI and the second portion of the GUI (As shown in FIG. 4A, using a typical presentation application layout, the third portion, labeled here as “Notes”, is a portion of the GUI which is outside of the slide itself (which corresponds to reference numeral 415). This area is both distinct from the left side and right side of the slide and is not shown in a presentation application when in presentation mode.; Yang, ¶ [0073], FIG. 4A).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony to incorporate the teachings of Yang to include and causing the given additional textual content associated with the given slide to be visually rendered in a third portion of the GUI, wherein the third portion is distinct from both the first portion of the GUI and the second portion of the GUI. The generated presentation content of Yang allows for automated “drafting textual content, selecting imagery, and laying out this textual content and imagery,” which can provide “a compelling and interesting content for the intended audience of the presentation,” and which reduces the prior art burdens of manual distillation of content and the known struggle to find appropriate content and imagery, as recognized by Yang. (Yang, ¶ [0001], [0016]).
Claims 16-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chepkwony and Yu as applied to claim 14 above, and further in view of Ferrucci (U.S. Pat. App. Pub. No. 2022/0261817, hereinafter Ferrucci).
Regarding claim 16, the rejection of claim 14 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. However, Chepkwony fails to expressly recite wherein receiving the document including the instructions associated with the particular task comprises: generating, based on the NL based input, a search query that includes a request for the document including the instructions associated with the particular task; submitting, to one or more search systems, the search query; and in response to submitting the search query to the one or more search systems: receiving the document including the instructions associated with the particular task.
Ferrucci teaches “a collaborative user support system including a user portal and domain models to receive support requests, render visual aids, and provide suggestions.” (Ferrucci, ¶ [0020]). Regarding claim 16, Ferrucci teaches wherein receiving the document including the instructions associated with the particular task comprises: generating, based on the NL based input, a search query that includes a request for the document including the instructions associated with the particular task (“The semantic search engine 120 may perform a search in an associated domain text corpus,” using a generated search query, where the search query “may include keyword(s) (e.g., the input components and/or relations between components) search in documentations and passages for terms beyond explicit keyword(s) and may include search for terms based on semantic similarity to the keyword(s)” where the documentations and passages are documents including instructions associated with the particular task; Ferrucci, ¶ [0049]); submitting, to one or more search systems, the search query (Though described in combination, it is understood that the semantic search engine 120 receives a search query and performs a search in light of said search query.; Ferrucci, ¶ [0049]); and in response to submitting the search query to the one or more search systems: receiving the document including the instructions associated with the particular task (“The semantic search engine 120 may output search results, including one or more evidentiary passages” such as a “passage in a user manual”; Ferrucci, ¶ [0034], [0049]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony, as modified by the presentation content management systems of Yu, to incorporate the teachings of Ferrucci to include wherein receiving the document including the instructions associated with the particular task comprises: generating, based on the NL based input, a search query that includes a request for the document including the instructions associated with the particular task; submitting, to one or more search systems, the search query; and in response to submitting the search query to the one or more search systems: receiving the document including the instructions associated with the particular task. The support systems of Ferrucci are “able to engage in complex problem-solving tasks,” including tasks that require “taking into account relevant information about a customer's individual goals and circumstances” which overcomes deficiencies of machine agents in providing useful assistance based on specific user circumstances, and reduces the need for secondary assistance to resolve the initial problem or achieve the initial goal, as recognized by Ferrucci. (Ferrucci, ¶ [0006], [0022]).
Regarding claim 17, the rejection of claim 16 is incorporated. Chepkwony, Yu, and Ferrucci disclose all of the elements of the current invention as stated above. However, Chepkwony fails to expressly recite wherein the LLM is associated with a first-party entity, and wherein the one or more search systems are also associated with the first-party entity.
The relevance of Ferrucci is described above with relation to claim 16. Regarding claim 17, Ferrucci teaches wherein the LLM is associated with a first-party entity (The system includes a generative model, where generative models include generative language models (e.g., LLMs) and as the generative model is included in the system, it is also associated with the first party entity; Ferrucci, ¶ [0105]), and wherein the one or more search systems are also associated with the first-party entity (In the described embodiment, the “semantic search engine 120 may perform a search in an associated domain text corpus” where the semantic search engine is part of the system which includes the generative model {associated with the first-party entity}; Ferrucci, ¶ [0049], [0105]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony, as modified by the presentation content management systems of Yu, to incorporate the teachings of Ferrucci to include wherein the LLM is associated with a first-party entity, and wherein the one or more search systems are also associated with the first-party entity. The support systems of Ferrucci are “able to engage in complex problem-solving tasks,” including tasks that require “taking into account relevant information about a customer's individual goals and circumstances” which overcomes deficiencies of machine agents in providing useful assistance based on specific user circumstances, and reduces the need for secondary assistance to resolve the initial problem or achieve the initial goal, as recognized by Ferrucci. (Ferrucci, ¶ [0006], [0022]).
Regarding claim 18, the rejection of claim 16 is incorporated. Chepkwony, Yu, and Ferrucci disclose all of the elements of the current invention as stated above. However, Chepkwony fails to expressly recite wherein the LLM is associated with a first-party entity, wherein the one or more search systems are associated with the third-party entity, and wherein the third-party entity is distinct from the first-party entity.
The relevance of Ferrucci is described above with relation to claim 16. Regarding claim 18, Ferrucci teaches wherein the LLM is associated with a first-party entity, (The system includes a generative model, where generative models include generative language models (e.g., LLMs) and as the generative model is included in the system, it is also associated with the first party entity; Ferrucci, ¶ [0105]) wherein the one or more search systems are associated with the third-party entity (“The search may include keyword(s) (e.g., the input search concept and/or relations between concepts) search in documentations and passages, web search, and embedded search for terms beyond explicit keywords” where the “web search” is not a part of or related to the described system and thus is understood as a third party entity.; Ferrucci, ¶ [0075], [0105]), and wherein the third-party entity is distinct from the first-party entity (As the third party entity is understood as all entities which are not the first party entity, the third party entity is necessarily distinct from the first party entity.; Ferrucci, ¶ [0075], [0105]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony, as modified by the presentation content management systems of Yu, to incorporate the teachings of Ferrucci to include wherein the LLM is associated with a first-party entity, wherein the one or more search systems are associated with the third-party entity, and wherein the third-party entity is distinct from the first-party entity. The support systems of Ferrucci are “able to engage in complex problem-solving tasks,” including tasks that require “taking into account relevant information about a customer's individual goals and circumstances” which overcomes deficiencies of machine agents in providing useful assistance based on specific user circumstances, and reduces the need for secondary assistance to resolve the initial problem or achieve the initial goal, as recognized by Ferrucci. (Ferrucci, ¶ [0006], [0022]).
Claim(s) 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Chepkwony and Yu as applied to claim 1, and further in view of Shahinian.
Regarding claim 21, the rejection of claim 1 is incorporated. Chepkwony and Yu disclose all of the elements of the current invention as stated above. Chepkwony further discloses wherein determining the textual content for inclusion in the multi-modal response comprises: identifying, based on processing the LLM input using the LLM, a first portion of information from the document to be included as visible textual content on a given slide (“prompt input 312 is received and combined with the extracted string content to form a text query” which is “transmitted to the LLM 108 as input,” where “the preprocessor 202 receives the prompt input and extracts existing data from the slide presentation document 222” such as “string content from the selected slides in the slide presentation document 222” which is used by “LLM 108” to “generate a response including text output 324”; Chepkwony, ¶ [0025], [0051], [0053]). However, Chepkwony fails to expressly recite identifying, based on processing the LLM input using the LLM, a second portion of information from the document to be included as speaker notes associated with the given slide wherein the speaker notes are distinct from the visible textual content.
Shahinian teaches systems and methods for “applying machine learning to presentation software applications for extracting content from slides.” (Shahinian, ¶ [0002]). Regarding claim 21, Shahinian teaches identifying, based on processing the LLM input using the LLM, a first portion of information from the document to be included as visible textual content on a given slide; and identifying, based on processing the LLM input using the LLM, a second portion of information from the document to be included as speaker notes associated with the given slide, (“The slide generation skill 1654 updates the actual slide during new presentation generation” as generated using “generative AI technology,” where generative AI is understood in the relevant art as including large language models, in response to the “user’s input {based on processing the LLM input},” which is used to “recommend improvements to the wording used in a slide” as well as “paraphrasing and grammar checks for the content on each slide” which are incorporated based on user approval, and “changes from original content to updated content are documented in the notes section {included in the speaker notes} of the new slide {...of a given slide},” where the notes section of a slide is understood in the art as the presenter or speaker notes, and the analysis of changes between the slides is a second portion of information from the document (as the changes are derived from the document).; Shahinian, ¶ [0095], [0099], [0120], [0138]) wherein the speaker notes are distinct from the visible textual content (An analysis indicating the changes to the text of a slide is inherently distinct from the actual changes made.; Shahinian, ¶ [0095], [0099], [0120], [0138]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the productivity content generation systems of Chepkwony to incorporate the teachings of Shahinian to include identifying, based on processing the LLM input using the LLM, a second portion of information from the document to be included as speaker notes associated with the given slide wherein the speaker notes are distinct from the visible textual content. Shahinian discloses systems and methods for extracting information including analysis regarding the changes made to the slide provided as speaker notes, which provides the known benefit of aiding in tracking the number of changes and provides logical context for the presenter/speaker in determining how to present the newly modified slides, as recognized by Shahinian. (Shahinian, ¶ [0137]-[0138]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Sean E Serraguard/Primary Examiner, Art Unit 2657