DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
2. The information disclosure statement (IDS) submitted on 06/28/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
3. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
4. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
6. Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sugden (US 2025/0095222 A1) in view of Kim et al. (US 2025/0005829 A1).
7. With reference to claim 1, Sugden teaches A method comprising: obtaining an image generation prompt, wherein the image generation prompt includes a first element and a second element, (“This disclosure generally relates to systems and methods for generating consumable content, such as digital images (or simply “images”), using a generative artificial intelligence (AI) model.” [0016] “The descriptor collector 210 may include a prompt engine 208. The prompt engine 208 may generate an image prompt to submit to a generative AI model 204. To generate the image prompt, the prompt engine 208 may include a descriptor combiner 256. The descriptor combiner 256 may combine, collate, sort, filter, or otherwise process the descriptors collected from the descriptor collector 210. The descriptor combiner 256 may generate a prompt input to send to the prompt generator. In some embodiments, the prompt engine 208 may, using the prompt input (e.g., the processed descriptors), generate the image prompt. As discussed herein, the prompt engine 208 may generate the image prompt in a prompt format using an image prompt formator 258. For example, the prompt engine 208 may generate the image prompt in the prompt format that is tailored to a particular generative AI model 204.” [0052-0053] “the prompt engine 208 may include an image prompt reviewer 218. The image prompt reviewer 218 may present the image prompt to the user. The user may review the image prompt and/or revise the image prompt.” [0055]) Sugden also teaches generating, using an image generation model, an intermediate image based on the prompt; (“the generative AI model 204 may be applied to the image prompt to generate one or more images based on the image prompt. In some embodiments, the generative AI model 204 may generate images periodically and/or episodically.” [0057]) Sugden further teaches generating, using the image generation model, an image including the first element and the second element based on the image generation prompt. (“the generative AI model 404 may generate a single image 412 for a single query and/or request. For example, the prompt engine 408 may receive the descriptors from the descriptor collector 410 and generate the image query and submit the image query to the generative AI model 404 one time. The generative AI model 404 may generate one image based on the single image query. … the prompt engine 408 may generate the image prompt based on the descriptors and the prompt engine 408 may submit the prompt to the generative AI model 404 multiple times to generate multiple images 412.” [0066-0067] “the image prompt inserted into the text box may include a list of the descriptors. In some embodiments, the image prompt inserted into the text box may include the image prompt in the prompt format that is directly transmitted to the generative AI model. … Upon review of the image prompt, the user may revise the image prompt as a revised image prompt 520. For example, the user may desire to make a change to the image prompt, but may not desire to change the descriptors. The user may review the image prompt at the image prompt reviewer 518, identify the portion of the image prompt he or she desires to change, and make the associated change. This may allow the user to fine-tune the revised image prompt 520. In this manner, the user may fine-tune the image according to his or her desires by fine-tuning the revised image prompt 520.” [0071-0072])
Sugden does not explicitly teach a reference prompt, wherein the reference prompt includes the second element; or a synthetic image based on the intermediate image. However, Kim teaches these limitations (“the electronic device 2000 may obtain a predefined prompt for image generation, based on the context information. The predefined prompt may be text containing a description of what task the generative model is to perform. For example, the predefined prompt may be text that means “generate/edit the image to match the ‘description’ and the generative model may generate/edit the image so that an output matches the description.” [0087-0088] “The generative model may generate various image variations by editing the image. For example, the generative model may generate synthetic images that change the time, weather, season, color, contrast, saturation, mood, style, etc. of the image. In one or more embodiments of the disclosure, a modified image is a concept included in the synthetic images. Among the synthetic image(s) generated by the generative model, a synthetic image modified to correspond to the context may be referred to as a modified image. For example, an image that the electronic device 2000 finally provides to the user as a result of modifying the original image may be referred to as a modified image.” [0105-0106]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
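For illustration only, and without characterizing either reference's actual implementation, the mapped two-stage flow (an intermediate image generated from the image generation prompt, then a synthetic image derived from the intermediate image) can be summarized in the following Python sketch. The function names, prompts, and the use of the intermediate image as an initialization are assumptions of the sketch, not disclosures of Sugden or Kim.

from typing import Optional

import numpy as np

rng = np.random.default_rng(0)

def generate(prompt: str, init_image: Optional[np.ndarray] = None) -> np.ndarray:
    # Stand-in for an image generation model call; a real system would run a
    # diffusion sampler conditioned on the prompt (and, optionally, an
    # initial image).
    seed = abs(hash(prompt)) % (2**32)
    base = init_image if init_image is not None else rng.random((64, 64, 3))
    return 0.5 * base + 0.5 * np.random.default_rng(seed).random((64, 64, 3))

# Image generation prompt with two elements; reference prompt with the second.
image_generation_prompt = "a corgi (first element) surfing a wave (second element)"
reference_prompt = "surfing a wave (second element)"

intermediate_image = generate(image_generation_prompt)
synthetic_image = generate(reference_prompt, init_image=intermediate_image)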
8. With reference to claim 2, Sugden teaches the first element and the second element (“To generate the image prompt, the prompt engine 208 may include a descriptor combiner 256. The descriptor combiner 256 may combine, collate, sort, filter, or otherwise process the descriptors collected from the descriptor collector 210. The descriptor combiner 256 may generate a prompt input to send to the prompt generator.” [0052] “the prompt engine 208 may include an image prompt reviewer 218. The image prompt reviewer 218 may present the image prompt to the user. The user may review the image prompt and/or revise the image prompt.” [0055])
Sugden does not explicitly teach a subject and an action performed by the subject. However, Kim teaches these limitations (“the context information may detect a facial expression of a subject as context information. For example, based on the facial expression, the context information may be correlated with a person's mood such as happy, sad, concerned, etc.” [0061] “the context information may include other information available to generate a modified image in addition to the real-time information. The context information may include information related to the user of the electronic device 2000 and information obtained according to various situations and environments of the electronic device 2000. … the electronic device 2000 may obtain a predefined prompt for image generation, based on the context information. The predefined prompt may be text containing a description of what task the generative model is to perform. For example, the predefined prompt may be text that means “generate/edit the image to match the ‘description’ and the generative model may generate/edit the image so that an output matches the description.” [0086-0088]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
9. With reference to claim 3, Sugden teaches that the image generation prompt comprises a nonce token corresponding to the first element. (“The prompt engine 208 may generate an image prompt to submit to a generative AI model 204. To generate the image prompt, the prompt engine 208 may include a descriptor combiner 256. The descriptor combiner 256 may combine, collate, sort, filter, or otherwise process the descriptors collected from the descriptor collector 210. The descriptor combiner 256 may generate a prompt input to send to the prompt generator.” [0052] “The randomization may take any form. For example, the randomization may be based on a random number generator. The random number generator may include a formula or other generator that may generate an output representative of a number. The randomization may occur at the generative AI model 404. In some embodiments, the randomization may occur at the prompt engine 408 and/or the descriptor collector 410. In some embodiments, each of the images 412 may be unique, with the random number deleted or not stored after generating the image 412.” [0068])
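The quoted passages do not define "nonce token"; in text-to-image practice the term commonly denotes a rare placeholder token bound to a specific subject (as in DreamBooth-style fine-tuning). The following minimal Python sketch illustrates that usage only; the token "<sks>" and the prompt are hypothetical.

# Illustration only: a hypothetical nonce token "<sks>" standing in for the
# first element (a specific subject), composed into an image generation prompt.
NONCE_TOKEN = "<sks>"
second_element = "surfing a wave"
image_generation_prompt = f"a photo of {NONCE_TOKEN} {second_element}"
print(image_generation_prompt)  # a photo of <sks> surfing a wave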
10. With reference to claim 4, Sugden teaches obtaining a source image depicting the first element; (“the prompt format may include context information usable by the generative AI model 104 to generate the images. For example, the prompt format may include context such as a particular training database, a particular domain, a particular sub-database, any other context information, and combinations thereof. In some embodiments, the context information may be generated based on the descriptors. For example, the context information may identify a sub-database or a filter on a database based on the descriptors, such as a filter applied to a color, an artistic style, an identified emotion, and so forth. In some embodiments, the context information may identify a domain, such as images produced by a particular source.” [0038])
Sugden does not explicitly teach generating an amplitude guidance based on the source image, wherein the synthetic image is generated based on the amplitude guidance. However, Kim teaches this limitation (“The additional information processing module 350 may generate data, such as text, icons, audio, etc. related to the context, based on the context information, and provide additional information on the customized screen. For example, the additional information processing module 350 may generate contextual text by summarizing or reconstructing the context information. Specifically, for example, when the context information specifies the current weather as “rainy”, contextual text “Don't forget to take an umbrella” may be generated.” [0139] “The training process of the generative model 700 may include, but is not limited to, a diffusion-reverse diffusion process which includes a diffusion process of adding noise to an original image step by step, and a reverse diffusion process of reconstructing the original image by removing noise from a noisy image (or by denoising the noisy image). Through the diffusion-reverse diffusion process, the generative model 700 may be trained to add noise to the image step by step, and predict the amount of noise and remove the noise. The image information generator 704 may include a noise predictor for processing the reverse diffusion process. The noise predictor may be implemented using a known neural network architecture, or through modifications of the known neural network architecture. … in order to have a newly generated image region naturally match with the surrounding image regions, an attention mechanism (e.g., self-attention) that assigns different weights to different locations within input data may be used.” [0185] “In an inference operation of the generative model 700 trained by applying the CFG technique, a guidance scale s for CFG may be used as an input parameter. The higher the value of a guidance scale that is input, the greater the degree and likelihood that the generated model 700 reflects the predefined prompt 720, but the quality of an image may deteriorate.” [0188] “The electronic device 2000 may generate a plurality of synthetic images by using the generative model and calculate evaluation scores for the generated synthetic images.” [0191]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
11. With reference to claim 5, Sugden does not explicitly teach obtaining a reference image depicting the second element; and generating a phase guidance based on the reference image, wherein the intermediate image is generated based on the phase guidance. However, Kim teaches these limitations (“the electronic device 2000 may obtain a predefined prompt for image generation, based on the context information. The predefined prompt may be text containing a description of what task the generative model is to perform. For example, the predefined prompt may be text that means “generate/edit the image to match the ‘description’ and the generative model may generate/edit the image so that an output matches the description.” [0087-0088] “The generative model may generate various image variations by editing the image. For example, the generative model may generate synthetic images that change the time, weather, season, color, contrast, saturation, mood, style, etc. of the image. In one or more embodiments of the disclosure, a modified image is a concept included in the synthetic images. Among the synthetic image(s) generated by the generative model, a synthetic image modified to correspond to the context may be referred to as a modified image. For example, an image that the electronic device 2000 finally provides to the user as a result of modifying the original image may be referred to as a modified image.” [0105-0106] “The additional information processing module 350 may generate data, such as text, icons, audio, etc. related to the context, based on the context information, and provide additional information on the customized screen. For example, the additional information processing module 350 may generate contextual text by summarizing or reconstructing the context information. Specifically, for example, when the context information specifies the current weather as “rainy”, contextual text “Don't forget to take an umbrella” may be generated.” [0139] “The training process of the generative model 700 may include, but is not limited to, a diffusion-reverse diffusion process which includes a diffusion process of adding noise to an original image step by step, and a reverse diffusion process of reconstructing the original image by removing noise from a noisy image (or by denoising the noisy image). Through the diffusion-reverse diffusion process, the generative model 700 may be trained to add noise to the image step by step, and predict the amount of noise and remove the noise. The image information generator 704 may include a noise predictor for processing the reverse diffusion process. The noise predictor may be implemented using a known neural network architecture, or through modifications of the known neural network architecture. … in order to have a newly generated image region naturally match with the surrounding image regions, an attention mechanism (e.g., self-attention) that assigns different weights to different locations within input data may be used.” [0185] “In an inference operation of the generative model 700 trained by applying the CFG technique, a guidance scale s for CFG may be used as an input parameter. The higher the value of a guidance scale that is input, the greater the degree and likelihood that the generated model 700 reflects the predefined prompt 720, but the quality of an image may deteriorate.” [0188] “The electronic device 2000 may generate a plurality of synthetic images by using the generative model and calculate evaluation scores for the generated synthetic images.” [0191] “the electronic device 2000 may generate the intermediate images 1625 constituting a dynamic screen by using a video generative model. The video generative model may comprise a diffusion model including a “diffusion-reverse diffusion” process which is adding and removing noise to image.” [0287]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
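Neither quoted passage defines "amplitude guidance" or "phase guidance." One plausible reading, offered for illustration only and not as a teaching of either reference, is the magnitude and phase of a 2-D Fourier transform, with the source image supplying amplitude and the reference image supplying phase:

import numpy as np

def amplitude_guidance(source_img: np.ndarray) -> np.ndarray:
    # Amplitude (magnitude) spectrum of the source image.
    return np.abs(np.fft.fft2(source_img, axes=(0, 1)))

def phase_guidance(reference_img: np.ndarray) -> np.ndarray:
    # Phase spectrum of the reference image.
    return np.angle(np.fft.fft2(reference_img, axes=(0, 1)))

def combine(amplitude: np.ndarray, phase: np.ndarray) -> np.ndarray:
    # Recombine the source amplitude with the reference phase and invert.
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase), axes=(0, 1)))

rng = np.random.default_rng(0)
source_img = rng.random((64, 64, 3))
reference_img = rng.random((64, 64, 3))
guided = combine(amplitude_guidance(source_img), phase_guidance(reference_img))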
12. With reference to claim 6, Sugden does not explicitly teach that generating the intermediate image comprises performing a diffusion process up to an intermediate timestep. However, Kim teaches this limitation (“the electronic device 2000 may generate the intermediate images 1625 constituting a dynamic screen by using a video generative model. The video generative model may comprise a diffusion model including a “diffusion-reverse diffusion” process which is adding and removing noise to image. The video generative model may include a noise predictor for processing the diffusion-reverse diffusion process. For example, the noise predictor may be implemented based on a video U-Net architecture, but is not limited thereto. In addition, when training a video generative model to generate image frames constituting a video, a temporal attention mechanism for merging information according to time and a spatial attention mechanism for assigning weights to important parts of an image to improve image resolution may be used. Because the video generative model for generating the intermediate images 1625 may use various known architectures and/or algorithms, description of specific technical methods is omitted herein.” [0287]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
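The recited "diffusion process up to an intermediate timestep" is consistent with the standard DDPM forward-noising closed form, sketched below for illustration; the noise schedule and timestep values are assumptions of the sketch, not disclosures of either reference.

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)    # cumulative signal retention

def diffuse_to(x0: np.ndarray, t: int, rng) -> np.ndarray:
    # Closed-form forward process: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.random((64, 64, 3))
x_t = diffuse_to(x0, t=600, rng=rng)    # stop at an intermediate timestep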
13. With reference to claim 7, Sugden does not explicitly teach that generating the synthetic image comprises performing a diffusion process starting from an intermediate timestep. However, Kim teaches this limitation (“The generative model may generate various image variations by editing the image. For example, the generative model may generate synthetic images that change the time, weather, season, color, contrast, saturation, mood, style, etc. of the image. In one or more embodiments of the disclosure, a modified image is a concept included in the synthetic images. Among the synthetic image(s) generated by the generative model, a synthetic image modified to correspond to the context may be referred to as a modified image. For example, an image that the electronic device 2000 finally provides to the user as a result of modifying the original image may be referred to as a modified image.” [0105-0106] “the electronic device 2000 may generate the intermediate images 1625 constituting a dynamic screen by using a video generative model. The video generative model may comprise a diffusion model including a “diffusion-reverse diffusion” process which is adding and removing noise to image. The video generative model may include a noise predictor for processing the diffusion-reverse diffusion process. For example, the noise predictor may be implemented based on a video U-Net architecture, but is not limited thereto. In addition, when training a video generative model to generate image frames constituting a video, a temporal attention mechanism for merging information according to time and a spatial attention mechanism for assigning weights to important parts of an image to improve image resolution may be used. Because the video generative model for generating the intermediate images 1625 may use various known architectures and/or algorithms, description of specific technical methods is omitted herein.” [0287]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
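Correspondingly, a "diffusion process starting from an intermediate timestep" matches the common practice of beginning the reverse (denoising) loop partway through the schedule, as in SDEdit-style editing. A hedged sketch follows, with a stub noise predictor standing in for a trained model; it is an assumption about the claimed step, not either reference's method.

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = np.cumprod(alphas)

def predict_noise(x_t: np.ndarray, t: int) -> np.ndarray:
    return np.zeros_like(x_t)   # stub; a trained model returns eps_theta(x_t, t)

def denoise_from(x_t: np.ndarray, t_start: int, rng) -> np.ndarray:
    # Run the reverse (denoising) loop starting from an intermediate timestep.
    x = x_t
    for t in range(t_start, 0, -1):
        eps = predict_noise(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 1 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

rng = np.random.default_rng(0)
x_out = denoise_from(rng.standard_normal((64, 64, 3)), t_start=600, rng=rng)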
14. With reference to claim 8, Sugden teaches that the image generation model is trained using a training set including a source image depicting the first element (“the prompt format may include context information usable by the generative AI model 104 to generate the images. For example, the prompt format may include context such as a particular training database, a particular domain, a particular sub-database, any other context information, and combinations thereof. In some embodiments, the context information may be generated based on the descriptors. For example, the context information may identify a sub-database or a filter on a database based on the descriptors, such as a filter applied to a color, an artistic style, an identified emotion, and so forth. In some embodiments, the context information may identify a domain, such as images produced by a particular source.” [0038] “the generative AI model 404 may generate a single image 412 for a single query and/or request. For example, the prompt engine 408 may receive the descriptors from the descriptor collector 410 and generate the image query and submit the image query to the generative AI model 404 one time. The generative AI model 404 may generate one image based on the single image query. … the prompt engine 408 may generate the image prompt based on the descriptors and the prompt engine 408 may submit the prompt to the generative AI model 404 multiple times to generate multiple images 412.” [0066-0067])
Sugden does not explicitly teach a reference image depicting the second element. However, Kim teaches this limitation (“the electronic device 2000 may obtain a predefined prompt for image generation, based on the context information. The predefined prompt may be text containing a description of what task the generative model is to perform. For example, the predefined prompt may be text that means “generate/edit the image to match the ‘description’ and the generative model may generate/edit the image so that an output matches the description.” [0087-0088] “The generative model may generate various image variations by editing the image. For example, the generative model may generate synthetic images that change the time, weather, season, color, contrast, saturation, mood, style, etc. of the image. In one or more embodiments of the disclosure, a modified image is a concept included in the synthetic images. Among the synthetic image(s) generated by the generative model, a synthetic image modified to correspond to the context may be referred to as a modified image. For example, an image that the electronic device 2000 finally provides to the user as a result of modifying the original image may be referred to as a modified image.” [0105-0106]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
15. Claim 9 is similar in scope to the combination of claims 1 and 8, and thus is rejected under a similar rationale.
16. With reference to claim 10, Sugden teaches that the training set includes a source prompt describing the source image (“the prompt format may include context information usable by the generative AI model 104 to generate the images. For example, the prompt format may include context such as a particular training database, a particular domain, a particular sub-database, any other context information, and combinations thereof. In some embodiments, the context information may be generated based on the descriptors. For example, the context information may identify a sub-database or a filter on a database based on the descriptors, such as a filter applied to a color, an artistic style, an identified emotion, and so forth. In some embodiments, the context information may identify a domain, such as images produced by a particular source.” [0038] “the generative AI model 404 may generate a single image 412 for a single query and/or request. For example, the prompt engine 408 may receive the descriptors from the descriptor collector 410 and generate the image query and submit the image query to the generative AI model 404 one time. The generative AI model 404 may generate one image based on the single image query. … the prompt engine 408 may generate the image prompt based on the descriptors and the prompt engine 408 may submit the prompt to the generative AI model 404 multiple times to generate multiple images 412.” [0066-0067])
Sugden does not explicitly teach a reference prompt describing the reference image. However, Kim teaches this limitation (“the electronic device 2000 may obtain a predefined prompt for image generation, based on the context information. The predefined prompt may be text containing a description of what task the generative model is to perform. For example, the predefined prompt may be text that means “generate/edit the image to match the ‘description’ and the generative model may generate/edit the image so that an output matches the description.” [0087-0088] “The generative model may generate various image variations by editing the image. For example, the generative model may generate synthetic images that change the time, weather, season, color, contrast, saturation, mood, style, etc. of the image. In one or more embodiments of the disclosure, a modified image is a concept included in the synthetic images. Among the synthetic image(s) generated by the generative model, a synthetic image modified to correspond to the context may be referred to as a modified image. For example, an image that the electronic device 2000 finally provides to the user as a result of modifying the original image may be referred to as a modified image.” [0105-0106]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
17. Claim 11 is similar in scope to claim 3, and thus is rejected under a similar rationale.
18. With reference to claim 12, Sugden teaches obtaining a pre-trained image generation model, wherein training the image generation model comprises fine-tuning the pre-trained image generation model based on the source image (“the prompt engine 108 may generate an image prompt to input into the generative AI model 104 to improve the image generation of the generative AI model 104. For example, the prompt engine 108 may generate an image prompt that pre-processes the descriptors. In some examples, the prompt engine 108 may generate an image prompt that processes the natural language input from the client device 102 and provides the generative AI model 104 with an image prompt based on the natural language input. In some examples, the prompt engine 108 may collect and organize the descriptors that are selected from the group of pre-determined descriptors” [0036] “the prompt format may include context information usable by the generative AI model 104 to generate the images. For example, the prompt format may include context such as a particular training database, a particular domain, a particular sub-database, any other context information, and combinations thereof. In some embodiments, the context information may be generated based on the descriptors. For example, the context information may identify a sub-database or a filter on a database based on the descriptors, such as a filter applied to a color, an artistic style, an identified emotion, and so forth. In some embodiments, the context information may identify a domain, such as images produced by a particular source.” [0038] “the prompt engine 208 may include an image prompt reviewer 218. The image prompt reviewer 218 may present the image prompt to the user. The user may review the image prompt and/or revise the image prompt. Such a dynamic image prompt may help the user to tailor the image to his or her desires.” [0055])
Sugden does not explicitly teach the reference image. However, Kim teaches this limitation (“the electronic device 2000 may obtain a predefined prompt for image generation, based on the context information. The predefined prompt may be text containing a description of what task the generative model is to perform. For example, the predefined prompt may be text that means “generate/edit the image to match the ‘description’ and the generative model may generate/edit the image so that an output matches the description.” [0087-0088] “The generative model may generate various image variations by editing the image. For example, the generative model may generate synthetic images that change the time, weather, season, color, contrast, saturation, mood, style, etc. of the image. In one or more embodiments of the disclosure, a modified image is a concept included in the synthetic images. Among the synthetic image(s) generated by the generative model, a synthetic image modified to correspond to the context may be referred to as a modified image. For example, an image that the electronic device 2000 finally provides to the user as a result of modifying the original image may be referred to as a modified image.” [0105-0106]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
19. With reference to claim 13, Sugden does not explicitly teach that training the image generation model comprises: computing a first diffusion loss term based on the source image; computing a second diffusion loss term based on the reference image; and updating parameters of the image generation model based on the first diffusion loss term and the second diffusion loss term. However, Kim teaches these limitations (“during a training process of the generative model 700, the image information generator 704 may process a diffusion-reverse diffusion process. The training process of the generative model 700 may include, but is not limited to, a diffusion-reverse diffusion process which includes a diffusion process of adding noise to an original image step by step, and a reverse diffusion process of reconstructing the original image by removing noise from a noisy image (or by denoising the noisy image). Through the diffusion-reverse diffusion process, the generative model 700 may be trained to add noise to the image step by step, and predict the amount of noise and remove the noise. The image information generator 704 may include a noise predictor for processing the reverse diffusion process. The noise predictor may be implemented using a known neural network architecture, or through modifications of the known neural network architecture.” [0185] “When generating an image by using the generative model, the electronic device 2000 may use a guidance scale for CFG as an input parameter. The higher the value of the guidance scale that is input, the greater the degree to which the generative model reflects a predefined prompt, and the lower the value of the guidance scale that is input, the degree to which the generative model reflects the predefined prompt is reduced.” [0208] “The video generative model may comprise a diffusion model including a “diffusion-reverse diffusion” process which is adding and removing noise to image. The video generative model may include a noise predictor for processing the diffusion-reverse diffusion process. For example, the noise predictor may be implemented based on a video U-Net architecture, but is not limited thereto. In addition, when training a video generative model to generate image frames constituting a video, a temporal attention mechanism for merging information according to time and a spatial attention mechanism for assigning weights to important parts of an image to improve image resolution may be used.” [0287]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
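For illustration of the claimed two-term objective only (the quoted passages describe a single diffusion-reverse diffusion training process, not this exact loss), a toy PyTorch sketch with a stand-in noise predictor and a fixed timestep:

import torch

eps_theta = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in noise predictor
optimizer = torch.optim.Adam(eps_theta.parameters(), lr=1e-4)

def diffusion_loss(x0: torch.Tensor, abar_t: float) -> torch.Tensor:
    # Standard noise-prediction loss at one (fixed, for brevity) timestep.
    eps = torch.randn_like(x0)
    x_t = (abar_t ** 0.5) * x0 + ((1.0 - abar_t) ** 0.5) * eps
    return torch.nn.functional.mse_loss(eps_theta(x_t), eps)

source_image = torch.rand(1, 3, 64, 64)
reference_image = torch.rand(1, 3, 64, 64)

loss = diffusion_loss(source_image, 0.5) + diffusion_loss(reference_image, 0.5)
optimizer.zero_grad()
loss.backward()
optimizer.step()   # one parameter update based on both loss terms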
20. With reference to claim 14, Sugden does not explicitly teach generating an amplitude guidance based on the source image; and computing a phase guidance based on the reference image, wherein the generation of the synthetic image is based on the phase guidance and the amplitude guidance. However, Kim teaches these limitations (“the electronic device 2000 may obtain a predefined prompt for image generation, based on the context information. The predefined prompt may be text containing a description of what task the generative model is to perform. For example, the predefined prompt may be text that means “generate/edit the image to match the ‘description’ and the generative model may generate/edit the image so that an output matches the description.” [0087-0088] “The generative model may generate various image variations by editing the image. For example, the generative model may generate synthetic images that change the time, weather, season, color, contrast, saturation, mood, style, etc. of the image. In one or more embodiments of the disclosure, a modified image is a concept included in the synthetic images. Among the synthetic image(s) generated by the generative model, a synthetic image modified to correspond to the context may be referred to as a modified image. For example, an image that the electronic device 2000 finally provides to the user as a result of modifying the original image may be referred to as a modified image.” [0105-0106] “The additional information processing module 350 may generate data, such as text, icons, audio, etc. related to the context, based on the context information, and provide additional information on the customized screen. For example, the additional information processing module 350 may generate contextual text by summarizing or reconstructing the context information. Specifically, for example, when the context information specifies the current weather as “rainy”, contextual text “Don't forget to take an umbrella” may be generated.” [0139] “The training process of the generative model 700 may include, but is not limited to, a diffusion-reverse diffusion process which includes a diffusion process of adding noise to an original image step by step, and a reverse diffusion process of reconstructing the original image by removing noise from a noisy image (or by denoising the noisy image). Through the diffusion-reverse diffusion process, the generative model 700 may be trained to add noise to the image step by step, and predict the amount of noise and remove the noise. The image information generator 704 may include a noise predictor for processing the reverse diffusion process. The noise predictor may be implemented using a known neural network architecture, or through modifications of the known neural network architecture. … in order to have a newly generated image region naturally match with the surrounding image regions, an attention mechanism (e.g., self-attention) that assigns different weights to different locations within input data may be used.” [0185] “In an inference operation of the generative model 700 trained by applying the CFG technique, a guidance scale s for CFG may be used as an input parameter. The higher the value of a guidance scale that is input, the greater the degree and likelihood that the generated model 700 reflects the predefined prompt 720, but the quality of an image may deteriorate.” [0188] “The electronic device 2000 may generate a plurality of synthetic images by using the generative model and calculate evaluation scores for the generated synthetic images.” [0191]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
21. Claim 15 is similar in scope to claim 1, and thus is rejected under a similar rationale. Sugden additionally teaches An apparatus comprising: at least one processor; at least one memory including instructions executable by the at least one processor; (“The computer system 900 includes a processor 901. … The computer system 900 also includes memory 903 in electronic communication with the processor 901. … Instructions 905 and data 907 may be stored in the memory 903. The instructions 905 may be executable by the processor 901 to implement some or all of the functionality disclosed herein. Executing the instructions 905 may involve the use of the data 907 that is stored in the memory 903.” [0094-0096])
22. With reference to claim 16, Sugden does not explicitly teach that the image generation model comprises a diffusion model. However, Kim teaches this limitation (“during a training process of the generative model 700, the image information generator 704 may process a diffusion-reverse diffusion process. The training process of the generative model 700 may include, but is not limited to, a diffusion-reverse diffusion process which includes a diffusion process of adding noise to an original image step by step, and a reverse diffusion process of reconstructing the original image by removing noise from a noisy image (or by denoising the noisy image). Through the diffusion-reverse diffusion process, the generative model 700 may be trained to add noise to the image step by step, and predict the amount of noise and remove the noise. The image information generator 704 may include a noise predictor for processing the reverse diffusion process. The noise predictor may be implemented using a known neural network architecture, or through modifications of the known neural network architecture.” [0185] “The video generative model may comprise a diffusion model including a “diffusion-reverse diffusion” process which is adding and removing noise to image. The video generative model may include a noise predictor for processing the diffusion-reverse diffusion process. For example, the noise predictor may be implemented based on a video U-Net architecture, but is not limited thereto. In addition, when training a video generative model to generate image frames constituting a video, a temporal attention mechanism for merging information according to time and a spatial attention mechanism for assigning weights to important parts of an image to improve image resolution may be used.” [0287]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
23. Claim 17 is similar in scope to the combination of claims 6 and 7, and thus is rejected under a similar rationale.
24. With reference to claim 18, Sugden does not explicitly teach a text encoder configured to generate a text embedding of the image generation prompt and the reference prompt. However, Kim teaches this limitation (“the encoder 702 may include an image encoder and a text encoder. The image encoder may generate image input data by converting an image into a vector representation, and a text encoder may generate text input data by converting text into a vector representation. The encoder 702 may be trained to find relationships between text and images and generate vector representations representing features of the images and text. … when the original image 710 and the predefined prompt 720 are input to the generative model 700, the generative model 700 may generate an image input data by converting the original image 710 into a vector representation via the image encoder, and generate a text embedding by converting the predefined prompt 720 into a vector representation via the text encoder. The image input data and the text input data may be transmitted to the image information generator 704.” [0182-0183]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
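As one concrete instance of a text encoder producing a text embedding in the manner Kim describes at [0182-0183], the following Python sketch uses a CLIP text encoder from the Hugging Face transformers library; the specific checkpoint and prompts are illustrative assumptions, not ones named by either reference.

from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a corgi surfing a wave",   # image generation prompt (illustrative)
    "surfing a wave",           # reference prompt (illustrative)
]
inputs = tokenizer(prompts, padding=True, return_tensors="pt")
text_embeddings = text_encoder(**inputs).last_hidden_state  # per-token embeddings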
25. With reference to claim 19, Sugden does not explicitly teach a guidance component configured to generate a phase guidance and an amplitude guidance. However, Kim teaches this limitation (“The additional information processing module 350 may generate data, such as text, icons, audio, etc. related to the context, based on the context information, and provide additional information on the customized screen. For example, the additional information processing module 350 may generate contextual text by summarizing or reconstructing the context information. Specifically, for example, when the context information specifies the current weather as “rainy”, contextual text “Don't forget to take an umbrella” may be generated.” [0139] “The training process of the generative model 700 may include, but is not limited to, a diffusion-reverse diffusion process which includes a diffusion process of adding noise to an original image step by step, and a reverse diffusion process of reconstructing the original image by removing noise from a noisy image (or by denoising the noisy image). Through the diffusion-reverse diffusion process, the generative model 700 may be trained to add noise to the image step by step, and predict the amount of noise and remove the noise. The image information generator 704 may include a noise predictor for processing the reverse diffusion process. The noise predictor may be implemented using a known neural network architecture, or through modifications of the known neural network architecture. … in order to have a newly generated image region naturally match with the surrounding image regions, an attention mechanism (e.g., self-attention) that assigns different weights to different locations within input data may be used.” [0185] “In an inference operation of the generative model 700 trained by applying the CFG technique, a guidance scale s for CFG may be used as an input parameter. The higher the value of a guidance scale that is input, the greater the degree and likelihood that the generated model 700 reflects the predefined prompt 720, but the quality of an image may deteriorate.” [0188]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kim into Sugden, in order to generate an image that accurately reflects a user's intended inquiry or desire.
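The guidance scale s quoted from Kim at [0188] corresponds to the standard classifier-free guidance combination, sketched below; the exact functional form used by Kim is not reproduced in the quotation, so this is offered for illustration only.

import numpy as np

def cfg(eps_uncond: np.ndarray, eps_cond: np.ndarray, s: float) -> np.ndarray:
    # s = 1 reproduces the conditional prediction; larger s pushes the sample
    # harder toward the prompt, at a possible cost to image quality (cf. Kim
    # [0188]).
    return eps_uncond + s * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
eps_u = rng.standard_normal((64, 64, 3))
eps_c = rng.standard_normal((64, 64, 3))
guided_eps = cfg(eps_u, eps_c, s=7.5)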
26. Claim 20 is similar in scope to claim 3, and thus is rejected under a similar rationale.
Conclusion
27. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michelle Chin whose telephone number is (571)270-3697. The examiner can normally be reached on Monday-Friday 8:00 AM-4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kent Chang can be reached on (571)272-7667. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHELLE CHIN/
Primary Examiner, Art Unit 2614