DETAILED ACTION
This office action is responsive to communication(s) filed on 3/1/2024.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Restriction to one of the following inventions is required under 35 U.S.C. 121:
1. Claims 1-14, drawn to a method and medium, classified in G06F40/00.
2. Claims 15-20, drawn to a system, classified in G06N3/02.
The inventions are independent or distinct, each from the other because:
Inventions 1 and 2 are directed to related methods/functions. The related inventions are distinct if: (1) the inventions as claimed are either not capable of use together or can have a materially different design, mode of operation, function, or effect; (2) the inventions do not overlap in scope, i.e., are mutually exclusive; and (3) the inventions as claimed are not obvious variants. See MPEP § 806.05(j).
In the instant case, the inventions as claimed have a different mode of operation and effect. Group 1 generates a document summary using a summarizer model, wherein the summary includes “a generated background imagery based on the text summary, and wherein the multimedia summary document includes at least a portion of the text summary which is placed within the multimedia summary document based on the structured representation of the text summary”. Group 2 generates a document summary using a machine learning model, wherein the summary includes “wherein the multimedia summary document is generated based on the summary page generation type, the image generation prompt, and the structured representation”.
The inventions are also mutually exclusive. Namely, Group 1 requires “generating, using a diffusion model” and “wherein the multimedia summary document includes a generated background imagery based on the text summary, and wherein the multimedia summary document includes at least a portion of the text summary which is placed within the multimedia summary document based on the structured representation of the text summary”, which Group 2 does not require. Group 2 requires “a user selection of a summary page generation type”, a “first” and a “second machine learning model”, and generating a summary “wherein the multimedia summary document is generated based on the summary page generation type, the image generation prompt, and the structured representation”, which Group 1 does not require.
Furthermore, the inventions as claimed do not encompass overlapping subject matter and there is nothing of record to show them to be obvious variants.
Restriction for examination purposes as indicated is proper because all the inventions listed in this action are independent or distinct for the reasons given above and there would be a serious search and/or examination burden if restriction were not required because one or more of the following reasons apply:
Separate classification thereof: As shown above, Group 1 is classified in G06F40/00 and Group 2 is classified in G06N3/02. This shows that each invention has attained recognition in the art as a separate subject for inventive effort, and also a separate field of search. Patents need not be cited to show separate classification.
A different field of search: As mentioned above, separate classifications also show a separate field of search. However, even if the inventions were classified together, examining both groups would require employing different search queries to find the mutually exclusive limitations mentioned above. Where it is necessary to search for one of the inventions in a manner that is not likely to result in finding art pertinent to the other invention(s) (e.g., searching different classes/subclasses or electronic resources, or employing different search queries), a different field of search is shown, even though the two are classified together. The indicated different field of search must in fact be pertinent to the type of subject matter covered by the claims. Patents need not be cited to show different fields of search.
Applicant is advised that the reply to this requirement, to be complete, must include (i) an election of an invention to be examined even though the requirement may be traversed (37 CFR 1.143) and (ii) identification of the claims encompassing the elected invention.
The election of an invention may be made with or without traverse. To reserve a right to petition, the election must be made with traverse. If the reply does not distinctly and specifically point out supposed errors in the restriction requirement, the election shall be treated as an election without traverse. Traversal must be presented at the time of election in order to be considered timely. Failure to timely traverse the requirement will result in the loss of right to petition under 37 CFR 1.144. If claims are added after the election, applicant must indicate which of these claims are readable upon the elected invention.
Should applicant traverse on the ground that the inventions are not patentably distinct, applicant should submit evidence or identify such evidence now of record showing the inventions to be obvious variants or clearly admit on the record that this is the case. In either instance, if the examiner finds one of the inventions unpatentable over the prior art, the evidence or admission may be used in a rejection under 35 U.S.C. 103 or pre-AIA 35 U.S.C. 103(a) of the other invention.
During a telephone conversation with attorney of record Matthew Rojanakiathavorn on 10/24/2025, a provisional election was made with traverse to prosecute the invention of Group 1, claims 1-14. Affirmation of this election must be made by applicant in replying to this Office action. Claims 15-20 are withdrawn from further consideration by the examiner, 37 CFR 1.142(b), as being drawn to a non-elected invention.
Applicant is reminded that upon the cancelation of claims to a non-elected invention, the inventorship must be corrected in compliance with 37 CFR 1.48(a) if one or more of the currently named inventors is no longer an inventor of at least one claim remaining in the application. A request to correct inventorship under 37 CFR 1.48(a) must be accompanied by an application data sheet in accordance with 37 CFR 1.76 that identifies each inventor by his or her legal name and by the processing fee required under 37 CFR 1.17(i).
Claims Status
Claims 1-20 are pending, of which Claims 1-14 are currently being examined.
Claims 1, 9 and 15 are independent.
Claims 15-20 are withdrawn for being directed to a non-elected invention.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 5, 7, 9 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gourley; Sean William-Joseph et al. (hereinafter Gourley – US 20190130031 A1) in view of Zhang; Lvmin et al. (hereinafter Zhang, Non-Patent Literature [NPL], “Adding Conditional Control to Text-to-Image Diffusion Models”, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 3836-3847).
Independent Claim 1:
Gourley teaches:
A method comprising:
receiving a text document; (an object, such as a text news story, a document, etc., about an event is obtained [receiving] by summary service 110, ¶ 17 and fig. 1),
generating, by a document summarizer model, a text summary based on the text document and a structured representation of the text summary; (a text based summary of the event is provided [generating], ¶¶ 22-24. The summary is generated based on characteristics [structured representation of the text summary] of objects that are determined to identify an event(s) that is being summarized, ¶ 27. This process involves an intermediate, underlying logical step of distilling the original text into its core components [structured representation] before rendering the final output in the desired format.)
[…];
and generating […] a multimedia summary document corresponding to the text document, (“This summary may include data points and other information derived from the data objects associated with the event, wherein the summary may comprise a text based summary, a graph based summary, an image based summary, or some other type of summary including combinations thereof”, ¶ 36. That is, the summary may include a text based summary combined with an image based summary, that is, a multimedia summary document)
wherein the multimedia summary document includes a […] background imagery […], (supplemental sources are used to provide background information for the event, ¶ 20, and this information can include images [background imagery], ¶ 60; as mentioned above, these images can be combined with the text summary)
and wherein the multimedia summary document includes at least a portion of the text summary which is placed within the multimedia summary document (¶ 36. That is, the summary may include a text based summary combined with an image based summary, that is, a multimedia summary document)
based on the structured representation of the text summary. (the textual summary is placed/inserted into the mixed output based on the identified event(s), which is/are identified based on characteristics of the objects [structured representation], ¶ 27)
Gourley does not appear to expressly teach, but Zhang teaches:
generating, by a prompt generator, an image generation prompt based on the text summary and the structured representation of the text summary (a diffusion model is used to transform text inputs [based on the text summary] into latent vectors [structured representation of the text summary] to generate state-of-the-art images, Page 3838. It was well within the capabilities of a person having ordinary skill in the art to have realized that the textual summary of Gourley may be used as the text input for generating the image based portions of the summary)
that the generating of the multimedia summary document is done “using a diffusion model and the image generation prompt” (the features are achieved by a diffusion model in a neural network that uses the vector form of the text as input [prompt] to a machine learning model for generating and outputting images, Pages 3836-3838)
that the background imagery is a “generated” background imagery “based on the text summary” (a machine learning model outputs images based on the vector representation of the text, Pages 3837-3838)
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to modify the method of Gourley to include generating, by a prompt generator, an image generation prompt based on the text summary and the structured representation of the text summary, that the generating of the multimedia summary document is done “using a diffusion model and the image generation prompt”, and that the background imagery is a “generated” background imagery “based on the text summary”, as taught by Zhang.
One would have been motivated to make such a combination in order to improve the efficiency and quality of the method by generating high-quality images and reducing the computation cost of generating the images, Zhang Page 3836.
Claim 5:
The rejection of claim 1 is incorporated. Gourley-Zhang further teaches:
wherein generating, using the diffusion model and the image generation prompt, the multimedia summary document corresponding to the text document, further comprises:
generating a canvas using the structured representation; (as explained above, Zhang teaches that the diffusion model is used to transform text inputs [based on the text summary] into latent vectors [structured representation of the text summary] to generate state-of-the-art images, Page 3838, and Gourley teaches that the summary may include a text based summary combined with an image based summary, that is, a multimedia summary document. Using a diffusion model to transform textual content into latent vectors, which are then used to generate an image based summary combined with the text based summary, reflects generating a canvas using a structured representation, e.g., a background workspace representation of a canvas in which the two are combined, because the latent vector serves as a compressed, organized blueprint of the text's meaning that systematically guides the image generation process)
and generating, by the diffusion model, the multimedia summary document using the canvas and the image generation prompt. (as explained for the preceding limitation, the diffusion model transforms textual content into latent vectors, which serve as a compressed, organized blueprint of the text's meaning and guide the generation of the image based summary on the canvas in which it is combined with the text based summary. This image generation occurs based on the image generation prompt, as explained above for claim 1)
Claim 7:
The rejection of claim 5 is incorporated. Gourley-Zhang further teaches:
wherein the diffusion model is a ControlNet diffusion model. (Zhang Page 3836)
Independent Claim 9:
Claim(s) 9 is directed to a medium for accomplishing the steps of the method in claim 1, and is rejected using similar rationale(s).
Claim 13:
The rejection of claim 9 is incorporated. Claim(s) 13 is directed to a medium for accomplishing the steps of the method in claim 5, and is rejected using similar rationale(s).
Claim(s) 2 and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gourley (US 20190130031 A1) in view of Zhang (NPL, “Adding Conditional Control to Text-to-Image Diffusion Models”), as applied to claims 1 and 9 above, and further in view of Amit; Aviel et al. (hereinafter Amit – US 20060103667 A1).
Claim 2:
The rejection of claim 1 is incorporated. Gourley-Zhang further teaches:
wherein generating, using the diffusion model and the image generation prompt, the multimedia summary document corresponding to the text document, further comprises:
generating, by the diffusion model, the generated background imagery using the image generation prompt; (efficient generating of images using the diffusion model, as explained above for claim 1)
Gourley-Zhang does not appear to expressly teach, but Amit teaches:
determining, using a genetic algorithm, a position of the portion of the text summary; (a layout [position] of visual media, ¶ 1, is determined/selected using a selection engine such as a genetic algorithm, ¶ 36, and includes dynamic templates, ¶ 154, or consistency of the design, ¶ 170)
determining a font color of the portion of the text summary; (layout templates also include font style and size information, text color information [font color], white space or gutter color information, gutter size information, background color information, ¶ 169)
determining a font style of the portion of the text summary; (layout templates also include font style [font style] and size information, text color information, white space or gutter color information, gutter size information, background color information, ¶ 169)
and determining a font size of the portion of the text summary (layout templates also include font style and size information [font size], text color information, white space or gutter color information, gutter size information, background color information, ¶ 169).
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to further modify the method of Gourley to include determining, using a genetic algorithm, a position of the portion of the text summary; determining a font color of the portion of the text summary; determining a font style of the portion of the text summary; and determining a font size of the portion of the text summary, as taught by Amit.
One would have been motivated to make such a combination in order to improve the flexibility and functionality of the method by including the ability to produce consistent, dynamic media layouts in an easy manner that preserves the creative concept of the designer, Amit ¶¶ 14, 18 and 170.
Claim 10:
The rejection of claim 9 is incorporated. Claim(s) 10 is directed to a medium for accomplishing the steps of the method in claim 2, and is rejected using similar rationale(s).
Claim(s) 3 and 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gourley (US 20190130031 A1) in view of Zhang (NPL, “Adding Conditional Control to Text-to-Image Diffusion Models”), and Amit (US 20060103667 A1), as applied to claims 2 and 10 above, and further in view of Maruo; Akito et al. (hereinafter Maruo – US 20220180210 A1).
Claim 3:
The rejection of claim 2 is incorporated. Gourley-Zhang-Amit does not appear to expressly teach, but Maruo teaches:
wherein the genetic algorithm minimizes an energy function […]. (values of parameters in a genetic algorithm are set to appropriate values that provide the lowest energy value, which represents the result of the optimization, ¶¶ 135 and 267)
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to further modify the method of Gourley to include wherein the genetic algorithm minimizes an energy function, as taught by Maruo.
One would have been motivated to make such a combination in order to improve the efficiency of the method by optimizing selection of the layout, Maruo ¶ 8 and Amit ¶¶ 36, 154 and 170.
Gourley-Zhang-Amit-Maruo does not appear to expressly teach, but Official Notice teaches:
that the parameters optimized are a visual saliency loss, an alignment loss, an overlap loss, and a reading order loss (the examiner takes Official Notice that each of these parameters of visual saliency, alignment, overlap and reading order are well-known parameters of document design. It was well within the capabilities of a person having ordinary skill in the art to have realized that the optimization discussed above could be used in combination with any known design parameter that a designer desires to optimize).
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to further modify the method of Gourley to include that the parameters optimized are a visual saliency loss, an alignment loss, an overlap loss, and a reading order loss, as taught by Official Notice.
One would have been motivated to make such a combination in order to improve the practicality and flexibility of the method to apply layout optimization/selection based on any known parameter(s) or combinations thereof.
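For illustration only, and not as part of the evidence relied upon, the kind of optimization discussed for claim 3 (a genetic algorithm minimizing an energy function built from visual saliency, alignment, overlap, and reading order losses) can be sketched as follows. This is a minimal hypothetical Python sketch: the grid-based saliency map, the loss definitions, the equal weights, and the helper names `energy` and `genetic_layout` are assumptions made for this example and are not taken from Amit, Maruo, or the application.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[int, int]           # hypothetical (width, height) of one text element, in grid cells
Layout = List[Tuple[int, int]]  # hypothetical (x, y) top-left position of each text element


def energy(layout: Layout, boxes: Sequence[Box], saliency: List[List[float]],
           weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of four illustrative layout losses; lower is better."""
    w_sal, w_align, w_overlap, w_order = weights
    n = len(boxes)
    # Visual saliency loss: mean saliency under each text box, so text is
    # pushed off the salient parts of the background imagery.
    sal = 0.0
    for (x, y), (bw, bh) in zip(layout, boxes):
        cells = [saliency[r][c] for r in range(y, y + bh) for c in range(x, x + bw)]
        sal += sum(cells) / len(cells)
    # Alignment loss: distance from each left edge to the nearest other left edge.
    align = 0.0
    if n > 1:
        align = sum(min(abs(layout[i][0] - layout[j][0]) for j in range(n) if j != i)
                    for i in range(n))
    # Overlap loss: total pairwise intersection area of the text boxes.
    overlap = 0
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (wi, hi) = layout[i], boxes[i]
            (xj, yj), (wj, hj) = layout[j], boxes[j]
            dx = min(xi + wi, xj + wj) - max(xi, xj)
            dy = min(yi + hi, yj + hj) - max(yi, yj)
            overlap += max(dx, 0) * max(dy, 0)
    # Reading order loss: pairs in which a later element (in the structured
    # representation's order) is placed above an earlier one.
    order = sum(1 for i in range(n) for j in range(i + 1, n) if layout[j][1] < layout[i][1])
    return w_sal * sal + w_align * align + w_overlap * overlap + w_order * order


def genetic_layout(boxes: Sequence[Box], saliency: List[List[float]], pop_size: int = 30,
                   generations: int = 40, mutation_rate: float = 0.3, seed: int = 0):
    """Minimal genetic algorithm: truncation selection, uniform crossover, and
    random re-placement mutation, keeping the lower-energy half each generation."""
    rng = random.Random(seed)
    height, width = len(saliency), len(saliency[0])

    def random_position(box: Box) -> Tuple[int, int]:
        bw, bh = box
        return rng.randint(0, width - bw), rng.randint(0, height - bh)

    def fitness(layout: Layout) -> float:
        return energy(layout, boxes, saliency)

    population = [[random_position(b) for b in boxes] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        elite = population[: pop_size // 2]  # elitism: best layouts survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [a[k] if rng.random() < 0.5 else b[k] for k in range(len(boxes))]
            if rng.random() < mutation_rate:
                k = rng.randrange(len(boxes))
                child[k] = random_position(boxes[k])
            children.append(child)
        population = elite + children
    population.sort(key=fitness)
    history.append(fitness(population[0]))
    return population[0], history[-1], history


# Hypothetical 20x40 saliency map with one salient region in the middle.
saliency = [[1.0 if 5 <= r < 15 and 15 <= c < 25 else 0.0 for c in range(40)] for r in range(20)]
boxes = [(12, 3), (10, 3), (8, 2)]  # e.g., title, subtitle, body, in reading order
layout, best, history = genetic_layout(boxes, saliency)
print(layout, best)
```

Because the lower-energy half of each generation is carried forward unchanged, the best energy found never increases from one generation to the next; any of the four loss terms could be reweighted or replaced with another design parameter.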
Claim 11:
The rejection of claim 10 is incorporated. Claim(s) 11 is directed to a medium for accomplishing the steps of the method in claim 3, and is rejected using similar rationale(s).
Claim(s) 4 and 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gourley (US 20190130031 A1) in view of Zhang (NPL, “Adding Conditional Control to Text-to-Image Diffusion Models”), Amit (US 20060103667 A1) and Maruo (US 20220180210 A1), as applied to claims 3 and 11 above, and further in view of Dontcheva; Lubomira A. et al. (hereinafter Dontcheva – US 20130073952 A1).
Claim 4:
The rejection of claim 3 is incorporated. Gourley-Zhang-Amit-Maruo does not appear to expressly teach, but Dontcheva teaches:
wherein the reading order loss is based on an order of text elements included in the structured representation. (a method includes applying a reading order algorithm to an arrangement of textual elements in order to provide an optimized path/order of textual elements, Dontcheva Claims 4 and 6)
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to further modify the method of Gourley to include wherein the reading order loss is based on an order of text elements included in the structured representation, as taught by Dontcheva.
One would have been motivated to make such a combination in order to further improve the usability of the method by improving the readability of the textual elements, Dontcheva Claim 4.
Claim 12:
The rejection of claim 11 is incorporated. Claim(s) 12 is directed to a medium for accomplishing the steps of the method in claim 4, and is rejected using similar rationale(s).
Claim(s) 6 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gourley (US 20190130031 A1) in view of Zhang (NPL, “Adding Conditional Control to Text-to-Image Diffusion Models”), as applied to claims 5 and 13 above, and further in view of Kneubuehler; Dario et al. (hereinafter Kneubuehler – US 20250053619 A1).
Claim 6:
The rejection of claim 5 is incorporated. Gourley-Zhang does not appear to expressly teach, but Kneubuehler teaches:
wherein the diffusion model is trained using a triplet dataset, (an optimization method, ¶¶ 19 and 144, may combine different types of classifiers, such as diffusion and triplet loss based neural network classifiers, ¶ 54, wherein the triplets are used to train the neural network, ¶ 145. The entire process described is called "triplet loss," a supervised method for training models to learn embeddings. A triplet loss is a loss function for machine learning algorithms in which a reference input (called the anchor) is compared to a matching input (called the positive) and a non-matching input (called the negative). The distance from the anchor to the positive is minimized, and the distance from the anchor to the negative is maximized. Discussion of the anchor, positive and negative is reflected in, e.g., ¶¶ 145-146 and 150-151)
wherein the triplet dataset includes a training canvas, a training prompt, and a training summary page. (it was well within the capabilities of a person having ordinary skill in the art to have realized that, in applying Kneubuehler to Gourley-Zhang, Kneubuehler ¶ 145, an algorithm may be trained based on anchor images [training canvas], a corresponding training prompt [text/keywords] or summary page [text/keywords] that is semantically related [positive] to the canvas [images], and a different, unrelated [negative] training prompt [text/keywords] or summary page [text/keywords] that is not associated with the anchor canvas [images].)
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to further modify the method of Gourley to include wherein the diffusion model is trained using a triplet dataset, wherein the triplet dataset includes a training canvas, a training prompt, and a training summary page, as taught by Kneubuehler.
One would have been motivated to make such a combination in order to improve the effectiveness of the method because, by training on triplet loss, the neural network learns robust and discriminative features of meshes of objects and is effective at generating embeddings that can be utilized for similarity measurements, Kneubuehler ¶ 152.
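For illustration only, the anchor/positive/negative mechanics described for claim 6 can be expressed as the common hinge form of the triplet loss, max(d(anchor, positive) - d(anchor, negative) + margin, 0). The Python sketch below is hypothetical: the two-dimensional embeddings, the Euclidean distance, and the margin of 1.0 are assumptions for this example and are not taken from Kneubuehler or the application. In the mapping above, the anchor would stand in for an embedded training canvas, and the positive and negative for embeddings of a matching and a non-matching training prompt or summary page.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge-form triplet loss: positive while the negative is not at least
    `margin` farther from the anchor than the positive; zero afterwards."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Hypothetical 2-D embeddings: anchor = training canvas,
# positive = matching prompt, negative = non-matching prompt.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
print(triplet_loss(anchor, positive, [3.0, 4.0]))  # 0.0 (easy negative, distance 5)
print(triplet_loss(anchor, positive, [0.0, 1.5]))  # 0.5 (hard negative, distance 1.5)
```

Minimizing this quantity during training pulls the positive toward the anchor and pushes the negative away until the margin is satisfied, which is the behavior described in the triplet loss discussion above.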
Claim 14:
The rejection of claim 13 is incorporated. Claim(s) 14 is directed to a medium for accomplishing the steps of the method in claim 6, and is rejected using similar rationale(s).
Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gourley (US 20190130031 A1) in view of Zhang (NPL, “Adding Conditional Control to Text-to-Image Diffusion Models”), as applied to claim 1 above, and further in view of Li; Ziliu et al. (hereinafter Li – US 20210374398 A1).
Claim 8:
The rejection of claim 1 is incorporated. Gourley, as explained for claim 1, teaches that the summary is generated based on characteristics [structured representation of the text summary] of objects that are determined to identify an event(s) that is being summarized, ¶ 27. Furthermore, these characteristics can be represented in the form of a webpage, see ¶ 20 and Gourley Claim 6.
Still, Gourley-Zhang does not appear to expressly teach, but Li teaches:
wherein the structured representation is an HTML format including tags corresponding to text content of the text document. (Conventionally, at least with respect to webpages, HTML tags assigned to text of a webpage have been employed in connection with identifying different sections of the webpage, ¶ 3).
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to further modify the method of Gourley to include wherein the structured representation is an HTML format including tags corresponding to text content of the text document, as taught by Li.
One would have been motivated to make such a combination in order to improve the practicality of the method by including webpages in a well-known format, including tags corresponding to text content, Li ¶ 3, Gourley ¶ 20 and Gourley Claim 6.
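For illustration only, an HTML structured representation with tags corresponding to text content (claim 8) can be read back as an ordered list of tagged text elements, and that order is the kind of ordering a reading order loss (claim 4) could rely on. The hypothetical Python sketch below uses the standard library's `html.parser.HTMLParser`; the sample markup and the `TextElementExtractor` class are assumptions for this example and are not taken from Li, Gourley, or the application.

```python
from html.parser import HTMLParser


class TextElementExtractor(HTMLParser):
    """Collects (tag, text) pairs in document order from a hypothetical HTML summary."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.elements = []   # (innermost tag, text) in the order they appear

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.elements.append((self.stack[-1], text))


summary_html = "<h1>Quarterly results</h1><h2>Revenue up</h2><p>Revenue rose in Q3.</p>"
parser = TextElementExtractor()
parser.feed(summary_html)
parser.close()
print(parser.elements)
# [('h1', 'Quarterly results'), ('h2', 'Revenue up'), ('p', 'Revenue rose in Q3.')]
```

The list index of each element gives its position in the structured representation, and the tag identifies the kind of section (heading, subheading, paragraph) the text belongs to.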
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Below is a list of these references, including why they are pertinent:
Sewak; Mohit et al. US 20200097569 A1, is pertinent to claim 1 for disclosing methods and systems for generating cognitive real-time pictorial summary scenes, Abstract.
Amid; David et al. US 20140136460 A1, is pertinent to claims 2 and 6 for disclosing a method of visualizing multi-objective designs, such as Pareto optimal solutions, which comply with a plurality of objectives in an objective space in a single presentation, according to some embodiments of the present invention, ¶ 8 and fig. 1.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GABRIEL S MERCADO whose telephone number is (408)918-7537. The examiner can normally be reached Mon-Fri 8am-5pm (Eastern Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Ell, can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Gabriel Mercado/Primary Examiner, Art Unit 2171