DETAILED ACTION
A. This action is in response to the following communications: Amendment filed 01/20/2026. This action is made Final.
B. Claims 1-20 remain pending.
C. The rejections under 35 U.S.C. 101 and 102 are withdrawn in view of the amendment and remarks.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jindal, Nipun et al. (US Pub. 2025/0061610 A1), herein referred to as “Jindal” in view of Glenn E. Sugden, (US Pub. 2025/0095222 A1), herein referred to as “Sugden”.
As for claims 1, 12 and 20, Jindal teaches a data processing system, and the corresponding method of claim 12, increasing a design template library supporting a design recommendation feature in a productivity application, comprising: a processor, and a memory storing executable instructions which, when executed by the processor, cause the processor alone or in combination with other processors to perform the following functions (par. 28 hardware and software environment for implementing text to image generation):
based on a list of design purposes, generate prompts requesting a Large Language Model (LLM) (par. 56 LLM examples) to produce corresponding prompts for input to a text-to-image model to generate a proposed design corresponding to each design purpose (par. 26 utilize both a character-level encoder and a prompt encoder. Embodiments extract scene text from a text prompt and encode the scene text at the character level, and combine this character-level encoding with the prompt encoding to condition image generation; par. 55 using Contrastive Language-Image Pre-Training (CLIP), which is a neural network that is trained to efficiently learn visual concepts from natural language supervision; thus the concepts are equivalent to design purposes; par. 76 gives examples of building/creating/storing image features or “designs”; these features are saved in the model for lookup to find designs related to user text prompts in the future: “For example, during training, guided latent diffusion model 500 may take an original image 505 in a pixel space 510 as input and apply an image encoder 515 to convert original image 505 into original image features 520 in a latent space 525. Then, a forward diffusion process 530 gradually adds noise to the original image features 520 to obtain noisy features 535 (also in latent space 525) at various noise levels.”);
submit the prompts from the LLM to the text-to-image model (par. 67 image generation through text prompts);
receive the proposed designs from the text-to-image model (par. 67 one example of proposed design is to render scene text to an image; an image that includes the scene text based on the prompt embedding and the character-level embedding);
removing text generated by the text-to-image model from within the plurality of proposed designs in an image separation pipeline by: using an Optical Character Recognition (OCR) tool to identify the text in the plurality of proposed designs, using a Segment Anything Model (SAM) to identify a text mask for the text used to remove the text, and using an inpainting tool to fill in the plurality of proposed designs where the text was removed to produce a plurality of textless designs (par. 74 image inpainting to remove/add text to images; par. 79 use of segmentation map and/or mask to remove text or other content from an image using reverse diffusion process 540); and
return the plurality of editable designs via the UI to the user; and determine, by a quality control review, a subset of the plurality of editable designs that will be added to a design template library (par. 55 adding to the CLIP model/“design template library”; a CLIP model can be applied to nearly arbitrary visual classification tasks so that the model may predict the likelihood of a text description being paired with a particular image, removing the need for users to design their own classifiers and the need for task-specific training data. For example, a CLIP model can be applied to a new task by inputting names of the task's visual concepts to the model's text encoder. The model can then output a linear classifier of CLIP's visual representations; also par. 74 diffusion models are a class of generative neural networks which can be trained to generate new data with features similar to features found in training data; the image features also serve the same function as the claimed “design”).
Examiner notes that Jindal provides different examples of claim limitations and recommends an amendment that narrows the limitations specifically with functional language that is different than what Jindal teaches.
Jindal teaches a version of compiling text in at least paragraph 72, which states that some examples further include identifying an additional scene text. Some examples further include generating an additional character-level embedding based on the additional scene text, wherein the image is generated based on the additional character-level embedding. For example, the additional scene text may be another text object that is extracted from the prompt. A text object is a set of text that may include multiple lines and is intended to be placed in a same region based on the prompt. For example, a prompt describing a scene text including text to be displayed on a handheld sign and text to be displayed on a neon billboard may include two text objects.
In an attempt to advance prosecution, and in the same field of endeavor, Sugden teaches compiling a list of design purposes received via a user interface (UI) to a design system either from a user or from user queries (par. 74 user inputs various design purposes, i.e., attributes that denote the final design of the image from the text prompt (e.g. abstract and green)); generate a prompt using a prompt generator instructing a Large Language Model (LLM) to produce a plurality of design prompts for input to a text-to-image model, the plurality of design prompts instructing the text-to-image model to generate a plurality of proposed designs corresponding to the list of design purposes (par. 75 the user may adjust and/or revise the image prompt in the image prompt reviewer 618. For example, the text box in which the image prompt is displayed may be an interactive text box. The user may add and/or remove text from the image prompt in the text box of the image prompt reviewer 618. This may allow the user to revise the prompt to fine-tune the resulting image based on the prompt);
submit the prompt from the prompt generator to the LLM; submit the plurality of design prompts generated by the LLM to the text-to-image model; receive the plurality of proposed designs from the text-to-image model (figs. 6 and 7 depict the user input into the prompt generator and passing the prompt to the LLM for image generation).
Jindal teaches input the plurality of proposed designs to an image separation pipeline to produce a plurality of textless designs; input the plurality of textless designs to a text generation and placement model to produce a plurality of editable designs (par. 74 image inpainting to remove/add text to images; par. 79 segmentation and mask used to remove text).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Sugden into Jindal because Sugden suggests in paragraph 16 methods for generating consumable content, such as digital images (or simply “images”), using a generative artificial intelligence (AI) model. The generative AI model may generate images using a set of descriptors, such as input words. Using the descriptors, a prompt engine may generate an image prompt for the generative AI model.
As for claims 2 and 13. The system of claim 1, Jindal teaches, wherein the text-to-image model is a diffusion model (par. 76 diffusion model).
As for claims 3 and 13. The system of claim 1, Jindal teaches, wherein the LLM is a Generative Pretrained Transformer (GPT) model (par. 66, LLM GPT).
As for claims 4 and 14. The system of claim 1, Jindal teaches, wherein the instructions further cause the processor to remove text generated by the text-to-image model in the plurality of proposed designs (par. 72, generating an additional scene text and means to do so; par. 77 and 79 the reverse diffusion process gradually removes noise, which includes text, through OCR recognition which saves text as encoded data; another example is par. 55-57 where the text to be rendered on the image is removed/extracted from the prompt by the text decomposer and then later rendered on the image as scene text as input to the character-level encoder). Examiner recommends a clarifying amendment, as this limitation can be interpreted multiple ways as shown by the prior art.
As for claims 5 and 15. The system of claim 4, Jindal teaches, wherein removing the text generated by the text-to-image model comprises:
using an Optical Character Recognition (OCR) tool to identify the text in the plurality of proposed designs (par. 53 and 72 using OCR component for text with images through training or generating images to create “scene text”);
using a Segment Anything Model (SAM) to identify a text mask for the text used to remove the text (par. 79 segmentation map is used to guide reverse diffusion process 540; a Segment Anything Model as known in the art is a foundational model that generates segmentation masks similar to a segmentation map); and
using an inpainting tool to fill in the proposed design where the text was removed (par. 74 image inpainting to remove/add text to images).
As for claims 6 and 16. The system of claim 4, Jindal teaches, wherein the executable instructions further cause the processor to use a text generation/placement model to add the text back to the plurality of proposed designs (par. 101 the system generates an image that includes the scene text based on the prompt embedding and the character-level embedding).
As for claims 7 and 17. The system of claim 6, Jindal teaches, wherein the text generation/placement model uses text attributes from the plurality of proposed designs as output by the text-to-image model (par. 92 user describes what the output image should include, which is scene text through guidance prompt).
As for claims 8 and 18. The system of claim 6, Jindal teaches, wherein the added text is in a text box that is editable (par. 101 support of the iterative diffusion process allows the user to update/add to the image output in the GUI; par. 98 the system may display a prompt text field to a user via a GUI, and the user may input the prompt via the prompt text field).
As for claims 9 and 19. The system of claim 6, Jindal teaches, wherein the text generation/placement model corrects typographical or other errors from text in the plurality of proposed designs as generated by the text-to-image model (par. 72 and 115 using OCR to aid in the error checking for spelling and legibility).
As for claim 10. The system of claim 1, Jindal teaches, wherein the executable instructions further cause the processor to associate metadata with the plurality of proposed designs added to the template library to facilitate retrieval of the plurality of proposed designs based on a user query (par. 55 and 72 utilization of CLIP for classification wherein classification functions similar to metadata/tagging).
As for claim 11. The system of claim 1, Jindal teaches, wherein the executable instructions further cause the processor to complete a quality control review workflow on the plurality of proposed designs before a subset of the plurality of proposed designs is added to the template library (par. 72 and 77 the reverse diffusion process gradually removes noisy features (designs) at various noise levels in a latent space; comparing versions of images to train the model is equivalent to a corrective quality control, as it is stated that “the denoised image features 545 are compared to the original image features 520 at each of the various noise levels, and parameters of the reverse diffusion process 540 of the diffusion model are updated based on the comparison.”).
(Note:) It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)).
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Inquiries
Any inquiry concerning this communication should be directed to NICHOLAS AUGUSTINE at telephone number (571)270-1056.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
/NICHOLAS AUGUSTINE/Primary Examiner, Art Unit 2178 May 5, 2026