Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Allowable Subject Matter
Claims 4-5, 24-25, and 30 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The claims recite “determining an intermediate diffusion step based on the similarity, wherein the intermediate noise state is selected based on the intermediate diffusion step”, which determines the step based on the similarity. The prior art does not teach these limitations in combination with the other limitations.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 6-8, 21-23, 26-29, and 31-32 are rejected under 35 U.S.C. 103 as being unpatentable over GPTCache (https://gptcache.readthedocs.io/en/latest/bootcamp/openai/image_generation.html) in view of HuggingFace (https://huggingface.co/compvis/stable-diffusion-v1-4/discussions/24).
Regarding claim 1:
A method comprising:
obtaining an input prompt (GPTCache “generate relevant images based on text descriptions”) (GPTCache get_prompt).
retrieving an (GPTCache “GPTCache will cache the generated images so that the next time the same or similar text description is requested, it can be returned directly from the cache, which can improve efficiency and reduce costs.”) (GPTCache “There are two ways to initialize the cache, the first is to use the map cache (exact match cache) and the second is to use the database cache (similar search cache)”).
and generating, using an image generation model, a synthetic image based on the input prompt and (GPTCache “Then run openai.Image.create to generate the image.”) (GPTCache “First define the image_generation method, which is used to generate an image based on the input text and also return whether the cache hit or not. Then start the service with gradio, as shown below:”).
GPTCache does not teach an intermediate noise state. In a related field of endeavor, HuggingFace teaches:
retrieving an intermediate noise state (HuggingFace “re-seed with the old latent space.”) (HuggingFace “1. Your noise at each interpolation step will be the output of slerp. 2. That noise is added at timestep 0 to the encoded "root" image (using scheduler add_noise function)”) (HuggingFace “1. encode your input image (could be the output of previous interpolation step)”)
and generating, using an image generation model, a synthetic image based on the input prompt and the intermediate noise state (HuggingFace “The image-to-image script img2img.py essentially does what you're describing, you could adapt it to work with the interpolation script. It takes an image, corrupts it with a lot-or-a-little amount of noise, depending on the strength parameter, and then "resumes" the diffusion process for the corresponding number of steps.”).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to reuse intermediate noise as taught by HuggingFace. The motivation for doing so would have been to reduce computation by reusing data. Accordingly, it would have been obvious to combine HuggingFace with GPTCache to obtain the claimed invention.
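For illustration only, the stated motivation (reducing computation by reusing data) can be sketched with a toy sampler. The sketch below is not the code of either reference and is not the claimed method: the functions denoise_step, run_steps and generate, the constants TOTAL_STEPS and SAVE_STEP, and the exact-match dictionary cache are all hypothetical and chosen only to show that resuming from a stored intermediate state executes fewer steps than starting from noise.

```python
def denoise_step(state, target):
    """Toy stand-in for one denoising step: move halfway toward the target."""
    return [s + 0.5 * (t - s) for s, t in zip(state, target)]


def run_steps(state, target, first_step, last_step):
    """Apply denoise_step for steps first_step..last_step-1; return (state, count)."""
    count = 0
    for _ in range(first_step, last_step):
        state = denoise_step(state, target)
        count += 1
    return state, count


TOTAL_STEPS = 10   # assumed length of the toy schedule
SAVE_STEP = 5      # assumed step at which the intermediate state is cached
cache = {}         # hypothetical cache: prompt -> state saved at SAVE_STEP


def generate(prompt, target, initial_noise):
    """Return (image, steps_executed), resuming from the cache on a hit."""
    if prompt in cache:
        # Cache hit: skip the first SAVE_STEP steps and resume from the stored state.
        return run_steps(cache[prompt], target, SAVE_STEP, TOTAL_STEPS)
    # Cache miss: run the first half, store the intermediate state, then finish.
    mid_state, n1 = run_steps(list(initial_noise), target, 0, SAVE_STEP)
    cache[prompt] = list(mid_state)
    final_state, n2 = run_steps(mid_state, target, SAVE_STEP, TOTAL_STEPS)
    return final_state, n1 + n2


noise = [0.3, -1.2, 0.8]      # made-up starting values
target = [1.0, 1.0, 1.0]      # made-up "clean" result
image_a, steps_a = generate("a red car", target, noise)   # miss: runs every step
image_b, steps_b = generate("a red car", target, noise)   # hit: resumes midway
print(steps_a, steps_b)
```

The lookup here is an exact string match for brevity; a similarity search over prompt embeddings would take its place in a similar-search cache.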
Regarding claim 2:
The method of claim 1 has all of its limitations taught by GPTCache in view of HuggingFace. GPTCache further teaches:
and comparing the text embedding with a candidate embedding (GPTCache “pre_embedding_func: pre-processing before extracting feature vectors embedding_func: the method to extract the text feature vector” “cache.init(pre_embedding_func=get_prompt, embedding_func=onnx.to_embeddings, data_manager=data_manager, similarity_evaluation=SearchDistanceEvaluation(),”) of the candidate prompt, wherein the similarity is determined based on the comparison (GPTCache “GPTCache will cache the generated images so that the next time the same or similar text description is requested, it can be returned directly from the cache, which can improve efficiency and reduce costs.”) (GPTCache “There are two ways to initialize the cache, the first is to use the map cache (exact match cache) and the second is to use the database cache (similar search cache)”).
HuggingFace teaches:
wherein retrieving the intermediate noise state comprises: (HuggingFace “re-seed with the old latent space.”) (HuggingFace “1. Your noise at each interpolation step will be the output of slerp. 2. That noise is added at timestep 0 to the encoded "root" image (using scheduler add_noise function)”) (HuggingFace “1. encode your input image (could be the output of previous interpolation step)”)
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to reuse intermediate noise as taught by HuggingFace. The motivation for doing so would have been to reduce computation by reusing data. Accordingly, it would have been obvious to combine HuggingFace with GPTCache to obtain the claimed invention.
Regarding claim 3:
The method of claim 1 has all of its limitations taught by GPTCache in view of HuggingFace. GPTCache further teaches wherein retrieving the intermediate noise state comprises:
generating a similarity score for each of a plurality of candidate prompts and selecting the candidate prompt having a highest similarity score among the plurality of candidate prompts (GPTCache “The data_manager is used to store text, feature vector, and image object data, in the example, it takes Milvus (please make sure it is started), you can also configure other vector storage, refer to VectorBase API. Also you can set ObjectBase to configure which method to use to save the generated image, this example will be stored locally, you can also set it to S3 storage.”) (GPTCache “from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation”).
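As an illustration only of the limitation addressed for claim 3, the following minimal sketch scores each of several candidate prompts against an input prompt and selects the candidate with the highest score. It does not call GPTCache or Milvus. The three-element embeddings are made-up values, cosine similarity is an assumed scoring measure, and the names cosine_similarity and select_best_candidate are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_best_candidate(query_embedding, candidates):
    """Score every candidate prompt and return (best_prompt, all_scores)."""
    scores = {
        prompt: cosine_similarity(query_embedding, embedding)
        for prompt, embedding in candidates.items()
    }
    best_prompt = max(scores, key=scores.get)
    return best_prompt, scores


# Made-up embeddings standing in for the output of a text embedding function.
candidates = {
    "a red car": [1.0, 0.0, 0.0],
    "a blue boat": [0.0, 1.0, 0.0],
    "a crimson automobile": [0.6, 0.4, 0.0],
}
query = [1.0, 0.05, 0.0]

best_prompt, scores = select_best_candidate(query, candidates)
print(best_prompt)
```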
Regarding claim 6:
The method of claim 1 has all of its limitations taught by GPTCache in view of HuggingFace. HuggingFace further teaches wherein:
the intermediate noise state comprises an intermediate output of the image generation model (HuggingFace “1. encode your input image (could be the output of previous interpolation step) 2. corrupt it with the corresponding amount of noise you'd expect from step strength*num_inference_steps=25 3. resume diffusion for the remaining 25 steps. strength=0 means that you're starting from pure noise as usual, strength will give you back the input image.”).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to reuse intermediate noise as taught by HuggingFace. The motivation for doing so would have been to reduce computation by reusing data. Accordingly, it would have been obvious to combine HuggingFace with GPTCache to obtain the claimed invention.
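The step arithmetic in the HuggingFace passage quoted for claim 6 can be worked through with a short sketch. It follows the convention stated in that passage, in which strength*num_inference_steps is the step from which denoising resumes and strength=0 starts from pure noise; other implementations may define strength in the opposite direction. The function name split_steps and the values strength=0.5 and num_inference_steps=50 are assumptions chosen so that the product equals the 25 given in the quotation.

```python
def split_steps(strength, num_inference_steps):
    """Return (start_step, remaining_steps) under the quoted convention.

    start_step is the step whose noise level is applied to the encoded image;
    remaining_steps is how many denoising steps are then executed.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    start_step = int(round(strength * num_inference_steps))
    remaining_steps = num_inference_steps - start_step
    return start_step, remaining_steps


# Assumed values: 0.5 * 50 = 25, leaving 25 steps to run.
print(split_steps(0.5, 50))
# strength = 0 resumes from step 0, so every step is executed.
print(split_steps(0.0, 50))
```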
Regarding claim 7:
The method of claim 1 has all of its limitations taught by GPTCache in view of HuggingFace. HuggingFace further teaches wherein:
the intermediate noise state comprises a partially denoised image (HuggingFace “2. corrupt it with the corresponding amount of noise you'd expect from step strength*num_inference_steps=25”).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to reuse intermediate noise as taught by HuggingFace. The motivation for doing so would have been to reduce computation by reusing data. Accordingly, it would have been obvious to combine HuggingFace with GPTCache to obtain the claimed invention.
Regarding claim 8:
The method of claim 1 has all of its limitations taught by GPTCache in view of HuggingFace. HuggingFace further teaches wherein:
the intermediate noise state comprises a partially denoised latent representation (HuggingFace “1. encode your input image (could be the output of previous interpolation step) 2. corrupt it with the corresponding amount of noise you'd expect from step strength*num_inference_steps=25 3. resume diffusion for the remaining 25 steps. strength=0 means that you're starting from pure noise as usual, strength will give you back the input image.”).
Therefore, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to reuse intermediate noise as taught by HuggingFace. The motivation for doing so would have been to reduce computation by reusing data. Accordingly, it would have been obvious to combine HuggingFace with GPTCache to obtain the claimed invention.
Regarding claim 21:
The claim is a parallel version of claim 1. As such it is rejected under the same teachings.
Regarding claim 22:
The claim is a parallel version of claim 2. As such it is rejected under the same teachings.
Regarding claim 23:
The claim is a parallel version of claim 3. As such it is rejected under the same teachings.
Regarding claim 26:
The claim is a parallel version of claim 6. As such it is rejected under the same teachings.
Regarding claim 27:
The claim is a parallel version of claim 1. As such it is rejected under the same teachings.
Regarding claim 28:
The claim is a parallel version of claim 2. As such it is rejected under the same teachings.
Regarding claim 29:
The claim is a parallel version of claim 3. As such it is rejected under the same teachings.
Regarding claim 31:
The claim is a parallel version of claim 7. As such it is rejected under the same teachings.
Regarding claim 32:
The claim is a parallel version of claim 8. As such it is rejected under the same teachings.
Conclusion
For the prior art referenced and the prior art considered pertinent to Applicant’s disclosure but not relied upon, see PTO-892 “Notice of References Cited”.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON PRINGLE-PARKER, whose telephone number is (571) 272-5690 and whose e-mail is jason.pringle-parker@uspto.gov. The examiner can normally be reached 8:30am-5:00pm EST, Monday-Friday. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, King Poon, can be reached at (571) 270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JASON A PRINGLE-PARKER/
Primary Examiner, Art Unit 2617