DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1-20 have been previously presented.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 5-8 are rejected under 35 U.S.C. 103 as being unpatentable over Ma et al. (hereinafter Ma, “Text Style Transfer With Decorative Elements”, 2021 IEEE MIPR, pg. 330-336) in view of Xie et al. (hereinafter Xie, “SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model”, pg. 1-10).
Regarding claim 1, Ma teaches a method (pg. 331 sec. III 1st para. lines 1-4) comprising: obtaining, via a user interface, an input text (pg. 335 sec. V 1st para. lines 5-8); obtaining, via a user interface, a text effect prompt that describes a text effect for the input text (pg. 335 sec. V 1st para. lines 5-8); and generating an output image depicting the input text with the text effect described by the text effect prompt (Figs. 2 & 7; pg. 331 ‘Text style transfer’ 1st para. lines 1-23). However, Ma fails to teach that the generating is performed by an image generation model. Xie teaches generating, by an image generation model (pg. 5, sec. 4.4 2nd para. lines 1-6). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the text input styles of Ma with the image generation models of Xie to enable a more controlled application of the text effect described by the text effect prompt in the generated output image.
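By way of illustration only (not relied upon in the rejection), the combined arrangement of claim 1 can be sketched in Python as follows; every name below is a hypothetical placeholder rather than an element of Ma or Xie:

    def generate_styled_text_image(image_generation_model, input_text,
                                   text_effect_prompt):
        # Both inputs are obtained via a user interface in the claim; here they
        # are passed directly to a hypothetical image generation model.
        return image_generation_model.generate(
            text=input_text,            # the text to be depicted
            effect=text_effect_prompt,  # natural-language description of the effect
        )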
Regarding claim 2, Ma teaches identifying a font for the input text, wherein the output image is generated based on the font (pg. 330 3rd para. lines 7-12).
Regarding claim 3, Ma teaches identifying a parameter that indicates a degree to which the output image adheres to a shape of the input text (pg. 331 ‘Text style transfer’ 1st para. lines 1-23). However, Ma fails to teach a fit parameter, wherein the output image is generated based on the fit parameter. Xie teaches a fit parameter, wherein the output image is generated based on the fit parameter (Xie, pg. 4, sec. 4.2 Shape Precision Control). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the text input styles of Ma with the image generation models of Xie to enable a more controlled application of the text effect described by the text effect prompt in the generated output image.
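For clarity only, a minimal sketch of a fit parameter in the sense of claim 3 follows. The mask-dilation mechanism is an assumption loosely analogous to shape precision control and is not asserted to be Xie's implementation:

    import numpy as np
    from scipy.ndimage import binary_dilation

    def shape_constraint(char_mask: np.ndarray, fit: float) -> np.ndarray:
        # fit = 1.0 keeps the exact glyph shape; lower values dilate the mask,
        # letting the generated effect extend further beyond the letterform.
        max_dilation = 20  # pixels; arbitrary illustrative bound
        radius = int(round((1.0 - fit) * max_dilation))
        if radius <= 0:
            return char_mask.astype(np.float32)
        relaxed = binary_dilation(char_mask, iterations=radius)
        return relaxed.astype(np.float32)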
Regarding claim 5, Ma teaches identifying a text color, wherein the output image is generated based on the text color (Fig. 7).
Regarding claim 6, Ma teaches generating a mask for each character of the input text (pg. 334 sec. D 1st para. lines 1-3 and Fig. 10); and generating a character image for each character of the input text based on the mask (pg. 334 sec. D 1st para. lines 1-3 and Fig. 10), wherein the output image includes the character image for each character of the input text (Fig. 7).
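For clarity only, a minimal sketch of per-character mask generation follows (not relied upon in the rejection; it approximates the recited step using the PIL imaging library, with the font path and canvas size as illustrative assumptions):

    from PIL import Image, ImageDraw, ImageFont

    def character_masks(input_text, font_path="font.ttf", size=256):
        # Render one binary (white-on-black) mask per character of the input text.
        font = ImageFont.truetype(font_path, int(size * 0.8))  # leave headroom
        masks = []
        for ch in input_text:
            canvas = Image.new("L", (size, size), 0)    # black background
            draw = ImageDraw.Draw(canvas)
            draw.text((0, 0), ch, fill=255, font=font)  # white glyph acts as the mask
            masks.append(canvas)
        return masks

Each mask would then condition generation of a character image, and the character images would be composited into the output image.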
Regarding claim 7, Ma fails to teach encoding the text effect prompt to obtain a text effect embedding, wherein the output image is generated based on the text effect embedding. Xie teaches encoding the text effect prompt to obtain a text effect embedding, wherein the output image is generated based on the text effect embedding (pg. 2 left col. 2nd para. lines 3-10). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the text input styles of Ma with the image generation models of Xie to enable a more controlled application of the text effect described by the text effect prompt in the generated output image.
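For clarity only, one common way to encode a prompt into an embedding is with a CLIP-style text encoder, sketched below; the choice of encoder and checkpoint is an assumption and is not asserted to be the method of any cited reference:

    import torch
    from transformers import CLIPTokenizer, CLIPTextModel

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

    def embed_text_effect_prompt(prompt: str) -> torch.Tensor:
        # Tokenize the prompt and return per-token embedding vectors,
        # suitable for cross-attention conditioning of an image generator.
        tokens = tokenizer(prompt, padding="max_length", truncation=True,
                           return_tensors="pt")
        with torch.no_grad():
            return text_encoder(**tokens).last_hidden_state  # (1, seq_len, dim)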
Regarding claim 8, Ma teaches obtaining, via a styling interface, one or more styling parameters, wherein the output image is generated based on the one or more styling parameters (abst. lines 1-6 and pg. 330 3rd para. lines 7-12).
Claims 4 and 11-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Xie, and further in view of Smetanin et al., U.S. Patent No. 12,205,207 (hereinafter “Smetanin”).
Regarding claim 4, Ma and Xie fail to teach identifying a background color, wherein the output image is generated based on the background color. Smetanin teaches identifying a background color, wherein the output image is generated based on the background color (col. 18, lines 59-63; Fig. 10, set background 1006). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma and Xie to incorporate the teachings of Smetanin for identifying a background color, wherein the output image is generated based on the background color. Doing so would provide the user with an additional parameter for customizing the output image.
Regarding claim 11, Ma and Xie fail to teach identifying at least a portion of the text effect prompt as a negative text; and encoding the negative text to obtain a negative text effect embedding, wherein the output image is generated based on the negative text effect embedding. However, Smetanin teaches identifying at least a portion of the text effect prompt as a negative text (col. 13, lines 56-65, content moderation engine 408); and encoding the negative text to obtain a negative text effect embedding, wherein the output image is generated based on the negative text effect embedding (col. 15, lines 41-50, modified prompt). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma and Xie to incorporate the teachings of Smetanin for identifying a negative text within the text effect prompt and encoding it to obtain a negative text effect embedding. Doing so would provide the user with an additional parameter for customizing the output image.
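For clarity only, negative text embeddings are commonly applied through classifier-free guidance, sketched below; the unet callable and guidance scale are hypothetical placeholders, and this mechanism is not asserted to be Smetanin's implementation:

    def guided_noise_prediction(unet, latents, t, effect_emb, negative_emb,
                                guidance_scale=7.5):
        # Classifier-free guidance: push the prediction toward the text effect
        # embedding and away from the negative text effect embedding.
        eps_pos = unet(latents, t, encoder_hidden_states=effect_emb)
        eps_neg = unet(latents, t, encoder_hidden_states=negative_emb)
        return eps_neg + guidance_scale * (eps_pos - eps_neg)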
Regarding claims 12 and 16, Ma teaches a method (pg. 331 sec. III 1st para. lines 1-4). However, Ma fails to teach initializing an image generation model; receiving training data including a training input text, a training image depicting the training input text, and a training text effect prompt that describes a text effect for the training input text; generating a mask for each character of the training input text; and training the image generation model. Xie teaches generating a mask for each character of the training input text (pg. 4, sec. 4.2 Shape Precision Control).
However, Ma and Xie fail to teach initializing an image generation model; receiving training data including a training input text, a training image depicting the training input text, and a training text effect prompt that describes a text effect for the training input text; and training the image generation model to generate an output image based on the mask, wherein the output image comprises the text effect based on the training text effect prompt. Smetanin teaches initializing an image generation model (col. 12, lines 43-59, text-to-image machine learning model); receiving training data including a training input text, a training image depicting the training input text, and a training text effect prompt that describes a text effect for the training input text (col. 13, lines 8-19); and training the image generation model to generate an output image based on the mask, wherein the output image comprises the text effect based on the training text effect prompt (col. 24, lines 46-46; Ma, Page 2, Fig. 2). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma and Xie to incorporate the teachings of Smetanin for initializing an image generation model, receiving the recited training data, and training the model to generate an output image based on the mask. Doing so would improve the model's ability to reproduce the text effect described by the training text effect prompt in the generated output image.
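As an illustration of the training arrangement recited in claims 12 and 16 (offered for clarity only and not relied upon in the rejection), a minimal Python sketch follows. All names (model, encode, noise_scheduler, and the batch fields) are hypothetical placeholders, and the diffusion-style noise-prediction loss is one common training objective, not asserted to be the method of any cited reference:

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, noise_scheduler, batch):
        # batch carries a training input text, a training image depicting it,
        # and a training text effect prompt describing the desired effect.
        masks = character_masks(batch["input_text"])   # per-character masks (sketch above)
        cond = encode(batch["effect_prompt"], masks)   # hypothetical conditioning encoder
        noise = torch.randn_like(batch["image"])
        t = torch.randint(0, noise_scheduler.num_steps,
                          (batch["image"].shape[0],))
        noisy = noise_scheduler.add_noise(batch["image"], noise, t)
        pred = model(noisy, t, cond)                   # model predicts the added noise
        loss = F.mse_loss(pred, noise)                 # standard denoising objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()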
Regarding claims 13 and 20, Ma teaches training a mask network to generate the mask for each character of the training input text (pg. 5, Fig. 7, [a] style, [b] content, [i] ours), wherein the output image is generated based on the mask (Fig. 7).
Regarding claims 14 and 18, Ma and Xie fail to teach training a text effect encoder to encode at least a portion of the training text effect prompt to obtain a text effect embedding, wherein the output image is generated based on the text effect embedding. Smetanin teaches training a text effect encoder to encode at least a portion of the training text effect prompt to obtain a text effect embedding (col. 13, lines 31-39), wherein the output image is generated based on the text effect embedding (col. 15, lines 41-50). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma and Xie to incorporate the teachings of Smetanin for training a text effect encoder to obtain a text effect embedding on which the output image is generated. Doing so would provide the user with an additional parameter for customizing the output image.
Regarding claim 15, Ma and Xie fail to teach training a style encoder to encode at least a portion of the training text effect prompt to obtain a style embedding, wherein the output image is generated based on the style embedding. Smetanin teaches training a style encoder to encode at least a portion of the training text effect prompt to obtain a style embedding (col. 13, lines 31-39), wherein the output image is generated based on the style embedding (col. 15, lines 41-50). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma and Xie to incorporate the teachings of Smetanin for training a style encoder to obtain a style embedding on which the output image is generated. Doing so would provide the user with an additional parameter for customizing the output image.
Regarding claim 17, Ma fails to teach wherein the image generation model comprises a diffusion model. Xie teaches wherein the image generation model comprises a diffusion model (Fig. 2). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma to incorporate the teachings of Xie wherein the image generation model comprises a diffusion model. Doing so would enable a more controlled application of the text effect described by the text effect prompt in the generated output image.
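For clarity only, a diffusion model generates an image by iteratively denoising; one simplified deterministic (DDIM-style) denoising step is sketched below, with model, cond, and alphas_cumprod as hypothetical placeholders:

    import torch

    def denoising_step(model, x_t, t, alphas_cumprod, cond):
        # One simplified deterministic (DDIM-style) reverse step.
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = model(x_t, t, cond)                         # predicted noise
        x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # implied clean image
        return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps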
Regarding claim 19, Ma and Xie fail to teach the text effect encoder comprises an aesthetic encoder configured to generate an aesthetic embedding and a style encoder configured to generate a style embedding, wherein the output image is generated based on the aesthetic embedding and style embedding. Smetanin teaches the text effect encoder comprises an aesthetic encoder configured to generate an aesthetic embedding and a style encoder configured to generate a style embedding (col. 13, lines 31-39), wherein the output image is generated based on the aesthetic embedding and style embedding (col. 15, lines 41-50). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma and Xie to incorporate the teachings of Smetanin wherein the text effect encoder comprises an aesthetic encoder and a style encoder. Doing so would provide the user with an additional parameter for customizing the output image.
Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Ma in view of Xie, in view of Smetanin, and further in view of Dehouche et al., "What is in a Text-to-Image Prompt: The Potential of Stable Diffusion in Visual Arts Education", arXiv, pages 1-11 (hereinafter "Dehouche").
Regarding claim 9, Ma, Xie and Smetanin fail to teach generating a style embedding and an aesthetic embedding based on the text effect prompt, wherein the output image is generated based on the style embedding and the aesthetic embedding. Dehouche teaches generating a style embedding and an aesthetic embedding based on the text effect prompt (Dehouche, pages 5-7, sec. 4 Data and Methods; sec. 5.1 Formalizing Stable Diffusion Prompts, Table 1, Table 2), wherein the output image is generated based on the style embedding and the aesthetic embedding (Dehouche, page 8, Fig. 4). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma, Xie, and Smetanin to incorporate the teachings of Dehouche further comprising generating a style embedding and an aesthetic embedding based on the text effect prompt, wherein the output image is generated based on the style embedding and the aesthetic embedding. Doing so would provide the user with a more intuitive method for prompt generation using text input.
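For clarity only, one way to derive separate style and aesthetic embeddings from a single text effect prompt is to partition the prompt and encode each part, sketched below; the keyword set is a hypothetical stand-in and is not Dehouche's method:

    def split_effect_prompt(prompt: str):
        # Partition a text effect prompt into style terms and aesthetic terms.
        # The tag set below is a hypothetical stand-in for learned or curated lists.
        style_tags = {"watercolor", "neon", "chrome", "graffiti"}
        words = prompt.lower().split()
        style_part = " ".join(w for w in words if w in style_tags)
        aesthetic_part = " ".join(w for w in words if w not in style_tags)
        return style_part, aesthetic_part

    # Each part would then be encoded separately (e.g., with the encoder sketched
    # for claim 7), and both embeddings would condition image generation.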
Regarding claim 10, Ma, Xie and Smetanin fail to teach the text effect prompt comprises a style tag and the style embedding is based on the style tag. Dehouche teaches wherein the text effect prompt comprises a style tag (page 5, Fig. 2) and the style embedding is based on the style tag (pages 5-7, sec. 4 Data and Methods; sec. 5.1 Formalizing Stable Diffusion Prompts, Table 1, Table 2). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Ma, Xie, and Smetanin to incorporate the teachings of Dehouche wherein the text effect prompt comprises a style tag and the style embedding is based on the style tag. Doing so would provide the user with a more intuitive method for prompt generation using text input.
Response to Arguments
The applicant argues that the prior art reference Darabi (US 2024/0127510) does not qualify as prior art in view of the limitations recited in claims 1-20. The rejection based on Darabi has accordingly been withdrawn. However, new grounds of rejection for claims 1-20 are set forth in this Office action. Applicant's arguments with respect to claims 1-20 have therefore been considered but are moot in view of the new grounds of rejection.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Said Broome whose telephone number is (571)272-2931. The examiner can normally be reached Monday - Friday 8:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Said Broome/Supervisory Patent Examiner, Art Unit 2612