DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This final Office action is in response to the amendment filed 7 November 2025.
Claims 1-20 are pending. Claims 9-14 are withdrawn as being directed toward a non-elected invention. Claims 1, 9, and 15 are independent claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 8, 15-16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Tran et al. (US 2022/0147769, published 12 May 2022, hereafter Tran) and further in view of Hsu et al. (US 2021/0081180, published 18 March 2021, hereafter Hsu) and further in view of Lin et al. (US 2021/0027470, published 28 January 2021, hereafter Lin) and further in view of Ramesh et al. (US 11922550, filed 30 March 2023, hereafter Ramesh) and further in view of Yi (US 2022/0044288, published 10 February 2022, provided on IDS filed 24 November 2025).
As per independent claim 1, Tran discloses a method comprising:
obtaining a prompt including a document description describing a plurality of elements (paragraph 0003: Here, an input is received of at least one or more demographic labels. These demographic labels are used to generate and provide a facial image using a model);
encoding the prompt to obtain a text embedding (paragraphs 0051-0052: Here, the provided attributes are encoded into vectors by the facial image generator to use in generating the facial image) representing the plurality of elements (paragraph 0003: Here, the image assets correspond to the provided labels such as those relating to gender, age, and ethnicity (paragraph 0043));
generating a plurality of image assets based on the prompt using a generative neural network (paragraphs 0043, 0045, and 0049-0052: Here, an initial synthesized face image is generated based on a learnable tensor. The generative adversarial network (GAN) generator learns a tensor using intermediate layers);
generating a structured document matching the document description, wherein the structured document comprises data in a computer file format (paragraph 0051: Here, the image generated by the Facial Image Generator (FIG) process is a structured document that includes a plurality of attributes. These attributes correspond to labels such as gender, age, and ethnicity of the generated face).
Tran fails to specifically disclose:
a tuple comprising a plurality of images corresponding to the plurality of image assets;
wherein the structured document includes the plurality of image assets and metadata describing a relationship between the plurality of image assets.
However, Hsu, which is analogous to the claimed invention because it is directed toward creating an interactive graphical element, discloses a structured document including image assets and metadata (paragraph 0021). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Hsu with Tran, with a reasonable expectation of success, as it would have allowed for organizing and managing assets. This would have allowed for improved editing of assets (Hsu: paragraph 0023).
Additionally, Tran fails to specifically disclose wherein the image assets comprise a foreground layer, a background layer, and an alpha channel for the foreground layer. However, Lin, which is analogous to the claimed invention because it is directed toward generating a composite image from multiple layers (Figure 2), discloses:
a tuple comprising a plurality of images corresponding to the plurality of image assets (paragraph 0057)
wherein the image assets comprise a foreground layer (Figures 2-3; paragraph 0057), a background layer (Figures 2-3; paragraph 0057), and an alpha channel for the foreground layer (paragraph 0081: Here, a composite digital image is generated by merging a foreground image layer, a background layer, and an alpha channel of the foreground image by a neural network (paragraph 0057)), or a combination thereof.
It would have been obvious to one of ordinary skill in the art at the time of the applicant’s effective filing date to have combined Lin with Tran-Hsu, with a reasonable expectation of success, as it would have allowed for a flexible, automatic, image composition system for generating image assets (Lin: paragraphs 0003 and 0005).
Tran fails to specifically disclose performing diffusion denoising to obtain a single latent representation representing the plurality of image assets and decoding the single latent representation to obtain a tuple comprising a plurality of images corresponding to the image assets.
However, Ramesh, which is analogous to the claimed invention because it is directed toward image generation, discloses performing diffusion denoising to obtain a single latent representation representing the plurality of image assets and decoding the single latent representation corresponding to the image assets (column 15, line 23 - column 16, line 7: Here, a denoising diffusion is used to output a latent representation of images. This representation includes both text and image data for reconstructing output image data). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Ramesh with Tran-Hsu-Lin, with a reasonable expectation of success, as it would have allowed for manipulations of an image based upon textual input (Ramesh: column 16, lines 3-7).
Finally, Tran fails to specifically disclose wherein a tuple is ordered or labeled according to a relationship between the plurality of image assets. However, Yi, which is analogous to the claimed invention because it is directed toward generation of composite images through machine learning technology, discloses wherein a tuple is ordered or labeled according to a relationship between the plurality of assets (Figure 3; paragraphs 0041-0042 and 0048: Here, a sentence is processed and a label is extracted. In the example presented with respect to Figure 3, the labels "beach," "seals," "boats," and "sand rocks on the beach" are identified. These labels are fed into a machine learning technology by an image generation unit. The image generation unit receives a plurality of images and selects the image that has been received/selected the most from among the submitted images. This most selected image is associated with each of these labels. The image and label constitute a tuple).
It would have been obvious to one of ordinary skill in the art at the time of the applicant’s effective filing date to have combined Yi with Tran-Hsu-Lin-Ramesh, with a reasonable expectation of success, as it would have allowed for generation of image data corresponding to a sentence prompt (Yi: paragraph 0008).
As per dependent claim 2, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 1, and the same rejection is incorporated herein. Hsu discloses wherein the encoder comprises a transformer network (paragraph 0053: Here, the image asset is transformed from one format to another. For example, the generation module transforms the object from a JPEG image to a PNG image). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Hsu with Tran, with a reasonable expectation of success, as it would have allowed for organizing and managing assets. This would have allowed for improved editing of assets (Hsu: paragraph 0023).
As per dependent claim 8, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 1, and the same rejection is incorporated herein. Tran discloses wherein the plurality of image assets includes a background image (paragraph 0046: Here, background scenery is a background image) and a foreground image (Figure 7: Here, the generated facial image is in the foreground, while the background scenery is in the background).
Tran fails to specifically disclose a relationship comprising a layer ordering of the background image and the foreground image. However, the examiner takes official notice that it was notoriously well-known in the art at the time of the applicant's effective filing date to specify layer ordering (z-order) to allow for overlaying images. It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined this well-known technique with Tran-Hsu-Lin-Ramesh-Yi, with a reasonable expectation of success, as it would have facilitated layering of images to provide an image having greater depth.
With respect to independent claim 15, the claim recites limitations substantially similar to those in claim 1. Claim 15 is similarly rejected.
Further, Tran discloses at least one memory component (paragraph 0022) and at least one processing device coupled to the at least one memory component, wherein the processing device is configured to execute instructions stored in the at least one memory component (Figure 2, item 200; paragraph 0042).
As per dependent claim 16, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations substantially similar to those in claim 15, and the same rejection is incorporated herein. Tran discloses a decoder configured to decode a latent vector generated by the generative neural network to obtain the plurality of image assets (paragraphs 0050-0052: Here, the latent vector is processed to generate the facial image by the FIG).
With respect to dependent claim 18, the claim recites limitations substantially similar to those in claim 2. Claim 18 is similarly rejected.
Claims 3-4 are rejected under 35 U.S.C. 103 as being unpatentable over Tran, Hsu, Lin, Ramesh, and Yi and further in view of Park et al. (US 11967124, filed 18 May 2021, hereafter Park).
As per dependent claim 3, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 1, and the same rejection is incorporated herein. Tran discloses:
initializing a noise vector in a latent space representing a plurality of document parts (paragraphs 0050-0052: Here, the basic generator acts as a function of the noise vector)
generating a latent vector representing the plurality of image assets based on the noise vector using the generative neural network (paragraphs 0050-0052: Here, the latent code is generated by the label embedding layer as a function of the noise vector and the list of discrete labels (paragraph 0051))
decoding the latent vector to obtain the plurality of image assets, wherein the plurality of image assets correspond to the plurality of document parts respectively (paragraphs 0050-0052: Here, the latent vector is processed to generate the facial image by the FIG)
Tran fails to specifically disclose denoising the noise vector based on using the generative neural network. However, Park, which is analogous to the claimed invention because it is directed toward classifying images, discloses denoising the noise vector based on using the generative neural network (claim 1: Here, the training uses a denoised training image for image classification). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Park with Tran-Hsu-Lin-Ramesh-Yi, with a reasonable expectation of success, as it would have allowed for improved image classification using denoised training images (Park: claim 1).
As per dependent claim 4, Tran, Hsu, Lin, Ramesh, Yi, and Park disclose the limitations similar to those in claim 3, and the same rejection is incorporated herein. Tran discloses decoding the latent vector to obtain a parameter for displaying an asset of the plurality of image assets (paragraphs 0050-0052: Here, the latent code is generated by the label embedding layer as a function of the noise vector and the list of discrete labels (paragraph 0051). The FIG generates the facial image from this data).
Claims 5-6 are rejected under 35 U.S.C. 103 as being unpatentable over Tran, Hsu, Lin, Ramesh, and Yi and further in view of Liu et al. (US 2024/0153153, filed 4 November 2022, hereafter Liu).
As per dependent claim 5, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 3, and the same rejection is incorporated herein. Tran fails to specifically disclose wherein the latent vector is generated using a denoising diffusion implicit model (DDIM) process.
However, Liu, which is analogous to the claimed invention because it is directed toward generating images using input text, discloses using a denoising diffusion implicit model process (Figure 1, item 106; paragraphs 0035-0036: Here, the diffusion model, which is a type of latent model, is used to generate the output image). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Liu with Tran-Hsu-Lin-Ramesh-Yi, with a reasonable expectation of success, as it would have allowed for back propagating the image to improve the final output image (Liu: paragraph 0032). This would have provided a user the advantage of receiving a better final output image.
As per dependent claim 6, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 1, and the same rejection is incorporated herein. Hsu discloses the structured document including a plurality of image assets (paragraph 0021). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Hsu with Tran, with a reasonable expectation of success, as it would have allowed for organizing and managing assets. This would have allowed for improved editing of assets (Hsu: paragraph 0023).
Tran fails to specifically disclose generating an additional asset by providing one or more of the plurality of image assets as input to the generative neural network. However, Liu, which is analogous to the claimed invention because it is directed toward generating images using input text, discloses generating an additional asset by providing one or more of the plurality of image assets as input to the generative neural network (paragraph 0032: Here, an intermediate image is generated and backpropagated to the generative neural network. An updated image is then generated based upon retraining the model on this generated image. This process is performed for a predetermined number of iterations to improve the final output image). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Liu with Tran-Hsu-Lin-Ramesh-Yi, with a reasonable expectation of success, as it would have allowed for back propagating the image to improve the final output image (Liu: paragraph 0032).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Tran, Hsu, Lin, Ramesh, and Yi and further in view of Bean (US 2024/0320867, filed 20 March 2023).
As per dependent claim 7, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 6, and the same rejection is incorporated herein. Tran fails to specifically disclose obtaining an additional prompt, wherein the additional asset is generated based on the additional prompt.
However, Bean, which is analogous to the claimed invention because it is directed toward generating an image based upon a plurality of prompts, discloses obtaining an additional prompt, wherein the additional asset is generated based on the additional prompt (paragraph 0004: Here, a first generated image is provided in response to the initial text prompt. A second text prompt is then provided and used to generate a modified image). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Bean with Tran-Hsu-Lin-Ramesh-Yi, with a reasonable expectation of success, as it would have allowed for iterative image generation to improve the quality of the final image provided to a user.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Tran, Hsu, Lin, Ramesh, and Yi and further in view of Raffiee et al. (US 2024/0257407, filed 27 January 2023, hereafter Raffiee).
As per dependent claim 17, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 16, and the same rejection is incorporated herein. Tran fails to specifically disclose a variational auto-encoder (VAE) model.
However, Raffiee, which is analogous to the claimed invention because it is directed toward training a model, discloses a variational auto-encoder (VAE) model (paragraph 0017). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Raffiee with Tran-Hsu-Lin-Ramesh-Yi, with a reasonable expectation of success, as it would have allowed a user to optimize image generation across data points.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Tran, Hsu, Lin, Ramesh, and Yi and further in view of Tavanaei et al. (US 12190060, filed 30 September 2022, hereafter Tavanaei).
As per dependent claim 19, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 18, and the same rejection is incorporated herein. Tran fails to specifically disclose a multimodal text encoder configured to encode text and images in a joint embedding space.
However, Tavanaei, which is analogous to the claimed invention because it is directed toward a multimodal encoder, discloses a multimodal text encoder configured to encode text and images in a joint embedding space (Figure 4; column 5, lines 32-59: Here, the encoder/decoder includes modalities for text and image data). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Tavanaei with Tran-Hsu-Lin-Ramesh-Yi, with a reasonable expectation of success, as it would have allowed for encoding/decoding both images and textual data. This would have provided the advantage of using both modalities to improve the output data.
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Tran, Hsu, Lin, Ramesh, and Yi and further in view of Kugelman et al. (A comparison of deep learning U-Net architectures for posterior segment OCT retinal layer segmentation, 1 September 2022, hereafter Kugelman).
As per dependent claim 20, Tran, Hsu, Lin, Ramesh, and Yi disclose the limitations similar to those in claim 18, and the same rejection is incorporated herein. Tran fails to specifically disclose a diffusion model based on a UNet architecture.
However, Kugelman, which is analogous to the claimed invention because it is directed toward using a UNet architecture, discloses a diffusion model based on a UNet architecture (page 2: Here, the "majority of semantic segmentation methods adopt an encoder-decoder deep neural network structure, most of which base their architectures on the U-Net"). It would have been obvious to one of ordinary skill in the art at the time of the applicant's effective filing date to have combined Kugelman with Tran-Hsu-Lin-Ramesh-Yi, with a reasonable expectation of success, as it would have allowed for improved image generation using image segmentation.
Response to Arguments
Applicant's arguments have been fully considered and are persuasive. Therefore, the previous rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Tran, Hsu, Lin, Ramesh, and Yi.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Liu et al. (US 2023/0118966): Discloses generating story text and story images from a user input prompt (Abstract). This includes using a sequence-to-sequence transformer to generate story text (Figure 4, item 404) and a diffusion model to generate images based upon the generated story text (Figure 4, item 406).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYLE R STORK whose telephone number is (571)272-4130. The examiner can normally be reached 8am - 2pm; 4pm - 6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached at 571/272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KYLE R STORK/Primary Examiner, Art Unit 2128