Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Amendment
This is in response to applicant’s amendment/response filed on 11/26/2025, which has been entered and made of record. Claims 1 and 7-8 have been amended. Claim 3 has been cancelled. Claims 9-14 have been added. Claims 1-2 and 4-14 are pending in the application.
As an initial matter, the rejection under 35 USC 101 for claim 7 has been withdrawn in view of applicant's amendments.
Response to Arguments
Applicant’s arguments filed on 11/26/2025 have been fully considered but are moot because the arguments do not apply to any of the references being used in the current rejection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4-5, and 7-14 are rejected under 35 U.S.C. 103 as being unpatentable over Shi et al. (US Pub 2024/0355022 A1) in view of Wilson et al. (US Pub 2022/0377257 A1).
As to claim 1, Shi discloses a generation support device for supporting generation of a result image (Shi, Abstract.), the generation support device comprising:
a generation information acquisition unit configured to acquire, from a user, generation information including at least style information regarding a style of the result image and an element image constituting part of the result image (Shi, Abstract, “obtaining an input description and an input image depicting a subject” Fig. 2, ¶0021, “enable efficient generation of customized images by learning a concept from a set of representative inputs” ¶0023, “new high-quality images can be generated from text description, p, and a few images of a subject (i.e., concept), using a pre-trained text-to-image model.” ¶0043, “image generation apparatus 120 can prompt a user 105 to provide a plurality of initial images 115, where the set 112 of the plurality of images 115 contains the same subject, which can be a person, an object, a building, an animal, etc., where the subject is the primary visual element that is the focus of the image 115. The subject can be present on a background that presents a scene for the subject, where the subject and scene can present a concept. The image generation apparatus 120 can prompt a user to provide a textual description for a new image to be generated, where the requested textual description provides attributes and details about the subject and background different from the initial images obtained by the image generation apparatus 120.”); and
a result image generation unit configured to input text information generated based on the generation information into a generative model, and generate the result image based on output information output from the generative model (¶0003, “a generative network that can learn a concept from a set of images, then generate new scenes or styles of the concept from an input prompt. Personalized image synthesis may generate new images of a particular subject (e.g., person, animal, object, etc.) with different poses, backgrounds, locations, positions, orientations, dressing, lighting, styles, all while keeping the same subject's identity.” ¶0004, “encoding the input description using a text encoder of an image generation model to obtain a text embedding, and encoding the input image using a subject encoder of the image generation model to obtain a subject embedding. One or more aspects of the method, apparatus, and non-transitory computer readable medium further include generating a guidance embedding by combining the subject embedding and the text embedding, and generating an output image based on the guidance embedding using a diffusion model of the image generation model, wherein the output image depicts the subject and the input description.” ¶0049, “the final textual embeddings of the input prompt can be obtained from the Text embeddings, c.sub.s, of the modified prompt, e.g. c.sub.s=CLIP(p.sub.s). The embedding of identifier, {circumflex over (V)}, can be a word vector that can be replaced with the concept feature vector, f.sub.c, to obtain the subject injected textual embedding c. The learned subject embedding is a compact feature containing the global semantics of the input images.” ¶0052, “a new image 125 can be generated based on the subject embedding, prompt embedding, and rich patch feature tokens, where the image can be generated by an image generation component”).
Shi does not explicitly disclose that the style information includes information of a web address, or that the generation information acquisition unit is configured to: access a website designated by the web address; obtain web information from the website, the web information including at least one of text information, image information, video information, or code information of the website; and determine the style information based on the web information.
Wilson teaches the style information includes information of a web address (Wilson, ¶0106, “styles may be received (and in some instances purchased) from products or brands. For example, a makeup company, such as Hotlips may produce and make available styles for users to use to influence their personal style, as described herein. In some embodiments, the styles may comprise style information, logic, settings, or one or more file(s) that specify model- or image-related information for use by the processes described herein for transferring a style to the user, and thus producing a personalized style. In this example, according to zone settings window 568, the user will appear to be wearing lipstick color #48 from the Hotlips brand. In some instances, companies may sell such styles on their websites, make them available via social media, or include them as promotional items (or make them available such as via a code or link) with the purchase of real-world products. These acquired styles may be imported into a user's style profiles or library.”), and the generation information acquisition unit is configured to:
access a website designated by the web address (Wilson, ¶0106, “companies may sell such styles on their websites, make them available via social media, or include them as promotional items (or make them available such as via a code or link) with the purchase of real-world products. These acquired styles may be imported into a user's style profiles or library.”);
obtain web information from the website, the web information including at least one of text information, image information, video information, or code information of the website (Wilson, ¶0106, “styles may be received (and in some instances purchased) from products or brands. For example, a makeup company, such as Hotlips may produce and make available styles for users to use to influence their personal style, as described herein. In some embodiments, the styles may comprise style information, logic, settings, or one or more file(s) that specify model- or image-related information for use by the processes described herein for transferring a style to the user, and thus producing a personalized style.” ¶0110, “Responsive to selecting get the look item 580 in menu 531, style editor 503 may present window 571 showing source(s) of styles that a user may desire to acquire for influencing their personal style, as described herein. Example window 571 shows three sources of style information: celebrities 573, social media influencers 575, and your friends 577, and a link 579 to get more sources. Selecting one of these sources may navigate the user to a collection of styles corresponding to that source, such as a website, online library, repository, social media venue, or the like, where the user can browse, select, and import these styles into their style profiles or library.”); and
determine the style information based on the web information (Wilson, ¶0110, “the user can browse, select, and import these styles into their style profiles or library. In some instances, users may be required to purchase these styles before receiving them. In this way, a user may “get the look” of another user by receiving the other user's style and using it to influence their own style, according to the processes described in connection with FIGS. 3, 4, and 5A, so that the resulting style is personalized to the user.”).
Shi and Wilson are considered to be analogous art because both pertain to image generation. It would have been obvious before the effective filing date of the claimed invention to have modified Shi with the features of “the style information includes information of a web address, and the generation information acquisition unit is configured to: access a website designated by the web address; obtain web information from the website, the web information including at least one of text information, image information, video information, or code information of the website; and determine the style information based on the web information” as taught by Wilson. The suggestion/motivation would have been to navigate the user to a collection of styles corresponding to that source, such as a website, online library, repository, social media venue, or the like, where the user can browse, select, and import these styles into their style profiles or library (Wilson, ¶0110). The claim would have been obvious because a particular known technique was recognized as part of the ordinary capabilities of one skilled in the art.
As to claim 2, claim 1 is incorporated and the combination of Shi and Wilson discloses the generation information acquisition unit further acquires position information of the element image in the result image (Shi, ¶0003, “Personalized image synthesis may generate new images of a particular subject (e.g., person, animal, object, etc.) with different poses, backgrounds, locations, positions, orientations, dressing, lighting, styles, all while keeping the same subject's identity.” ¶0026, “the network architecture can be trained and used to generate a new image with the same identifiable subject as a set of input images, but with a different arrangement of object(s), location(s), position(s), attribute(s), material, expression, and style.”).
As to claim 4, claim 2 is incorporated and the combination of Shi and Wilson discloses the information acquisition unit accepts upload or selection of the element image, and acquires the position information based on layout of the element image in a frame of the result image (Shi, ¶0004, “encoding the input image using a subject encoder of the image generation model to obtain a subject embedding.” ¶0005, “a subject encoder and a diffusion model based on the training set, wherein the subject encoder is trained to encode an input image depicting a subject to obtain a subject embedding, and wherein the diffusion model is trained to generate an output image depicting the subject based on the subject embedding” ¶0026, “generate a new image with the same identifiable subject as a set of input images, but with a different arrangement of object(s), location(s), position(s), attribute(s), material, expression, and style. Because the subject of the input concept in the images may not make up a large enough portion of the image, the subject may be cropped out from each input image to obtain a set of conditional images, which can force the machine learning model to focus on the exact object.” ¶0091, “The reverse diffusion process 530 can also be guided based on a text prompt 540, or another guidance prompt, such as a text description, an image, a layout, a segmentation map, etc. The text prompt 540 can be encoded using a text encoder 550 (e.g., a multimodal encoder) to obtain guidance features 560 in guidance space 570.” ¶0112-0113).
As to claim 5, claim 1 is incorporated and the combination of Shi and Wilson discloses wherein the information acquisition unit acquires the generation information by input made by the user in a chat format (Shi, ¶0003, “generate new scenes or styles of the concept from an input prompt” ¶0019, “a descriptive text prompt, that changes aspects of the individual subject in the image” ¶0043, “The image generation apparatus 120 can prompt a user to provide a textual description for a new image to be generated” ¶0058-0060. ¶0073, “given input of several digital images (e.g., photos) of the same woman and a language (i.e., text) prompt, “a woman is eating at the Eiffel Tower,” the output image should be of the same woman eating at the Eiffel Tower.”).
As to claim 7, the combination of Shi and Wilson discloses a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: a generation information acquisition step of acquiring generation information including at least style information regarding a style of the result image and an element image constituting part of the result image; and an image generation step of inputting text information generated based on the generation information into a generative model, and generating the result image based on output information output from the generative model, wherein the style information includes information of a web address, and the generation information acquisition step comprises: accessing a website designated by the web address; obtaining web information from the website, the web information including at least one of text information, image information, video information, or code information of the website; and determining the style information based on the web information (See claim 1 for detailed analysis.).
As to claim 8, the combination of Shi and Wilson discloses a generation support method for supporting generation of a result image, the generation support method comprising causing a processor to execute: a generation information acquisition step of acquiring generation information including at least style information regarding a style of the result image and an element image constituting part of the result image; and an image generation step of inputting text information generated based on the generation information into a generative model, and generating the result image based on output information output from the generative model, wherein the style information includes information of a web address, and the generation information acquisition step comprises: accessing a website designated by the web address; obtaining web information from the website, the web information including at least one of text information, image information, video information, or code information of the website; and determining the style information based on the web information (See claim 1 for detailed analysis.).
As to claim 9, claim 1 is incorporated and the combination of Shi and Wilson discloses obtain web information from the website comprises obtaining the web information from the code information of the website (Shi, ¶0038, “image generating apparatus 120 is implemented on a server. A server provides one or more functions to users linked by way of one or more communication networks. In some cases, the server can include a single microprocessor board, which includes a microprocessor responsible for controlling aspects of the server. In some cases, a server uses one or more microprocessors and protocols to exchange data with other devices/users on one or more of the communication networks via hypertext transfer protocol (HTTP), and simple mail transfer protocol (SMTP), although other protocols such as file transfer protocol (FTP), and simple network management protocol (SNMP) may also be used. In some cases, a server is configured to send and receive hypertext markup language (HTML) formatted files (e.g., for displaying web pages).” ¶0178, “code or data is transmitted from a website”. When Shi’s image generating apparatus is implemented on a server, the communication can be via HTML format files. The claim would have been obvious because a particular known technique was recognized as part of the ordinary capabilities of one skilled in the art.)
As to claim 10, claim 9 is incorporated and the combination of Shi and Wilson discloses the code information of the website comprises hyper text markup language (HTML) or cascading style sheets (CSS) (Shi, ¶0038, “image generating apparatus 120 is implemented on a server. A server provides one or more functions to users linked by way of one or more communication networks. In some cases, the server can include a single microprocessor board, which includes a microprocessor responsible for controlling aspects of the server. In some cases, a server uses one or more microprocessors and protocols to exchange data with other devices/users on one or more of the communication networks via hypertext transfer protocol (HTTP), and simple mail transfer protocol (SMTP), although other protocols such as file transfer protocol (FTP), and simple network management protocol (SNMP) may also be used. In some cases, a server is configured to send and receive hypertext markup language (HTML) formatted files (e.g., for displaying web pages).” ¶0178, “code or data is transmitted from a website”. When Shi’s image generating apparatus is implemented on a server, the communication can be via HTML format files. The claim would have been obvious because a particular known technique was recognized as part of the ordinary capabilities of one skilled in the art. HTML or CSS file formats are common on websites and well known to one of ordinary skill in the art.)
As to claim 11, claim 7 is incorporated and the combination of Shi and Wilson discloses wherein obtain web information from the website comprises obtaining the web information from the code information of the website (See claim 9 for detailed analysis.).
As to claim 12, claim 11 is incorporated and the combination of Shi and Wilson discloses wherein the code information of the website comprises hyper text markup language (HTML) or cascading style sheets (CSS) (See claim 10 for detailed analysis.).
As to claim 13, claim 8 is incorporated and the combination of Shi and Wilson discloses obtain web information from the website comprises obtaining the web information from the code information of the website (See claim 9 for detailed analysis.).
As to claim 14, claim 13 is incorporated and the combination of Shi and Wilson discloses the code information of the website comprises hyper text markup language (HTML) or cascading style sheets (CSS) (See claim 10 for detailed analysis.).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Shi et al. (US Pub 2024/0355022 A1) in view of Maschmeyer et al. (US Pub 2024/0161258 A1).
As to claim 6, claim 5 is incorporated and Shi does not explicitly disclose when acquiring input in the chat format, the information acquisition unit presents a suggestion of information required as the generation information to the user.
Maschmeyer teaches when acquiring input in the chat format, the information acquisition unit presents a suggestion of information required as the generation information to the user (Maschmeyer, ¶0052, “A text prompt provided by a user guides the image's generation. The text prompt includes an identifier referencing the subject to be included in the image. Users may also specify the number of inference steps for a sampler associated with the image generative model 212.” ¶0080, “the image generation engine determines a modified input to the customized generative model for a next iteration of generation. In at least some embodiments, modifying the input may include modifying a text prompt. The image generation engine may be configured to automatically modify text prompts or present, to a user, suggestions for modifying text prompts between iterations of the generative process.” ¶0081, “upon detecting a defect in light projections and/or shadows in a generated sample, a corresponding modification text such as “with correct shadow of [subject]” or “with consistent light and shadow conditions” may be included in the modified text prompt. In some embodiments, the image generation engine may provide suggested modification text to a user and prompt the user for input of a modified text prompt.”).
Shi and Maschmeyer are considered to be analogous art because both pertain to image generation. It would have been obvious before the effective filing date of the claimed invention to have modified Shi with the features of “when acquiring input in the chat format, the information acquisition unit presents a suggestion of information required as the generation information to the user” as taught by Maschmeyer. The suggestion/motivation would have been to determine a modified input to the customized generative model for a next iteration of generation (Maschmeyer, ¶0080). The claim would have been obvious because a particular known technique was recognized as part of the ordinary capabilities of one skilled in the art.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YU CHEN whose telephone number is (571)270-7951. The examiner can normally be reached on M-F 8-5 PST Mid-day flex.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/YU CHEN/
Primary Examiner, Art Unit 2613