Prosecution Insights
Last updated: April 19, 2026
Application No. 18/817,692

MODALITY SPECIFIC LEARNABLE ATTENTION FOR MULTI-CONDITIONED DIFFUSION MODELS

Status: Non-Final Office Action (§103)
Filed: Aug 28, 2024
Examiner: BROWN, SHEREE N
Art Unit: 2612
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 1 (Non-Final)

Grant Probability: 65% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 7m
Grant Probability With Interview: 92%

Examiner Intelligence

Career Allow Rate: 65% (481 granted / 738 resolved; +3.2% vs Tech Center average, above average)
Interview Lift: +27.0% (strong; allowance rate with vs. without an interview, among resolved cases)
Average Prosecution: 3y 7m (typical timeline; 34 applications currently pending)
Total Applications: 772 across all art units (career history)

Statute-Specific Performance

Statute    Rate     vs TC Avg
§101       14.3%    -25.7%
§103       25.0%    -15.0%
§102       32.7%    -7.3%
§112       22.0%    -18.0%

Deltas are relative to the Tech Center average estimate. Based on career data from 738 resolved cases.
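Reading each delta as the examiner's rate minus the Tech Center average (an assumption about the dashboard's convention), every statute implies the same 40% baseline, consistent with a single Tech Center average estimate:

$$14.3\% + 25.7\% = 25.0\% + 15.0\% = 32.7\% + 7.3\% = 22.0\% + 18.0\% = 40.0\%$$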

Office Action

Rejection basis: §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Application Status

This office action is responsive to Application No. 18/817,692, filed on 08/28/2024 (Priority Date: 10/06/2023). Claims 1-20 are pending and presented for examination. This action has been made NON-FINAL.

Examiner Remarks

In the spirit of compact prosecution, Applicant is requested to contact the Examiner for an interview to discuss the inventive concepts of the instant application. Applicant may optionally amend the claims to further direct the claims toward a particular inventive concept described in the specification without an interview. Additionally, the prior art rejection (if applicable) cites particular paragraphs, columns, and/or line numbers in the references for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 11/14/2024 and 08/28/2024 are being considered by the examiner. A signed IDS is hereby attached.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shi (US 20250078392) in view of Liew (US 20240144544).

Claim 1: Shi discloses a method (See Shi Abstract; Summary of Invention; Paragraphs 0004-0025) but fails to disclose a synthesized image based on the text attention output and the image attention output. Liew discloses this feature in Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Shi by the teachings of Liew to enable improved generation of an object from text and/or image input using a pre-trained text-to-image diffusion-based generative model (See Liew Field of Invention, Paragraph 0001). In addition, both references are analogous art directed to the same field of endeavor, namely generating images. This close relation between the references strongly suggests a reasonable expectation of success.
As modified, the combination of Shi and Liew discloses the following: encoding a text prompt to obtain a text embedding (See Shi Paragraphs 0045; 0058); encoding an image prompt to obtain an image embedding (See Shi Paragraph 0066); performing, using a text attention layer of an image generation model, cross-attention on the text embedding to obtain a text attention output (See Shi Paragraphs 0045; 0058); performing, using an image attention layer of the image generation model, cross-attention on the image embedding to obtain an image attention output (See Shi Paragraph 0066); and generating, using a generator network of the image generation model, a synthesized image based on the text attention output and the image attention output (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 2: The combination of Shi and Liew discloses wherein encoding the image prompt (See Shi Paragraph 0066) comprises: encoding, using an image encoder, the image prompt to obtain a preliminary image encoding (See Shi Paragraph 0066); and projecting, using an image projector, the preliminary image encoding to obtain the image embedding (See Shi Paragraph 0066).

Claim 3: The combination of Shi and Liew discloses wherein the text embedding (See Shi Paragraphs 0045; 0058) comprises a first plurality of tokens (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086) in a text embedding space and the image embedding comprises a second plurality of tokens (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086) in the text embedding space (See Shi Paragraphs 0045; 0058).

Claim 4: The combination of Shi and Liew discloses wherein the text embedding (See Shi Paragraphs 0045; 0058) comprises a same number of tokens (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086) as the image embedding (See Shi Paragraph 0066).

Claim 5: The combination of Shi and Liew discloses combining the text attention output and the image attention output to obtain a combined attention output, wherein the synthesized image is generated based on the combined attention output (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 6: The combination of Shi and Liew discloses wherein generating the synthesized image comprises: performing a diffusion process on a noise input (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 7: The combination of Shi and Liew discloses encoding the noise input to obtain an intermediate feature map (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086), wherein the text attention output and the image attention output are based on the intermediate feature map (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 8: The combination of Shi and Liew discloses wherein the text attention output and the image attention output are located in a common embedding space (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).
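For orientation, claims 1-8 recite a dual cross-attention arrangement: separate text and image attention layers whose outputs are combined to condition generation. Below is a minimal PyTorch sketch of that arrangement; all module names, dimensions, and the additive combination are illustrative assumptions, not drawn from Shi or Liew.

```python
import torch
import torch.nn as nn

class ModalityCrossAttention(nn.Module):
    """Cross-attention: queries from the intermediate feature map,
    keys/values from one conditioning modality (text or image)."""
    def __init__(self, dim: int, cond_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=dim, num_heads=num_heads,
            kdim=cond_dim, vdim=cond_dim, batch_first=True)

    def forward(self, features: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(query=features, key=cond, value=cond)
        return out

class DualConditionedBlock(nn.Module):
    """One denoiser block with a text attention layer and an image
    attention layer (claim 1); their outputs are summed into a combined
    attention output (claim 5) that conditions the feature stream."""
    def __init__(self, dim: int = 320, cond_dim: int = 768):
        super().__init__()
        self.text_attn = ModalityCrossAttention(dim, cond_dim)   # text attention layer
        self.image_attn = ModalityCrossAttention(dim, cond_dim)  # image attention layer

    def forward(self, features, text_emb, image_emb):
        text_out = self.text_attn(features, text_emb)     # text attention output
        image_out = self.image_attn(features, image_emb)  # image attention output
        return features + text_out + image_out            # combined attention output

# Toy shapes: the two embeddings share a token count and embedding space
# (claims 3-4, 8); `features` stands in for the intermediate feature map
# obtained by encoding the noise input (claim 7).
features = torch.randn(2, 64, 320)
text_emb = torch.randn(2, 77, 768)
image_emb = torch.randn(2, 77, 768)
print(DualConditionedBlock()(features, text_emb, image_emb).shape)  # torch.Size([2, 64, 320])
```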
Claim 9: Shi discloses a method of training a machine learning model (See Shi Abstract; Summary of Invention; Paragraphs 0004-0025) but fails to disclose a synthesized image. Liew discloses this feature in Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Shi by the teachings of Liew to enable improved generation of an object from text and/or image input using a pre-trained text-to-image diffusion-based generative model (See Liew Field of Invention, Paragraph 0001). In addition, both references are analogous art directed to the same field of endeavor, namely generating images. This close relation between the references strongly suggests a reasonable expectation of success.

As modified, the combination of Shi and Liew discloses the following: obtaining a training set including a training text prompt and a training image prompt (See Shi Paragraphs 0045; 0058; 0066; 0073); and training (See Shi Paragraphs 0045; 0058; 0066; 0073), using the training set, an image generation model to generate a synthesized image (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086), the training comprising: training a text attention layer of the image generation model to perform cross-attention based on the training text prompt (See Shi Paragraphs 0045; 0058); and training an image attention layer of the image generation model to perform cross-attention based on the training image prompt (See Shi Paragraph 0066).

Claim 10: The combination of Shi and Liew discloses wherein training the image generation model comprises: computing a diffusion loss (See Shi Paragraphs 0074; 0095-0097); and updating parameters of the image generation model based on the diffusion loss (See Shi Paragraphs 0074; 0095-0097).

Claim 11: The combination of Shi and Liew discloses wherein obtaining the training set comprises: generating the training text prompt based on the training image prompt (See Shi Paragraphs 0045; 0058; 0066).

Claim 12: The combination of Shi and Liew discloses encoding the training text prompt to obtain a text embedding; and encoding the training image prompt to obtain an image embedding, wherein the image generation model is trained to generate the synthesized image based on the text embedding and the image embedding (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 13: The combination of Shi and Liew discloses projecting, using an image projector, a preliminary image encoding to obtain the image embedding (See Liew Paragraphs 0003-0004; 0014; 0026; 0038; 0061-0062; 0076; 0086), wherein the image generation model is trained to generate the synthesized image based on the image embedding (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 14: The combination of Shi and Liew discloses wherein the image projector is jointly trained with the image generation model (See Liew Paragraphs 0003-0004; 0014; 0026; 0038; 0061-0062; 0076; 0086).
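Claims 9-14 recite the training side: a diffusion loss, parameter updates, and an image projector trained jointly with the generation model. A hedged sketch follows; the epsilon-prediction loss, the noise-schedule handling, and the `denoiser`/`image_projector` interfaces are assumptions chosen to match standard latent-diffusion training, not citations to Shi or Liew.

```python
import torch
import torch.nn.functional as F

def training_step(denoiser, image_projector, optimizer,
                  latents, text_emb, preliminary_image_enc, alphas_cumprod):
    # Project the preliminary image encoding into the shared token space
    # (claim 13); because the projector output feeds the loss below, one
    # optimizer over both modules trains them jointly (claim 14).
    image_emb = image_projector(preliminary_image_enc)

    # Forward diffusion: noise the clean latents at a random timestep.
    t = torch.randint(0, alphas_cumprod.numel(), (latents.shape[0],))
    a = alphas_cumprod[t].view(-1, *([1] * (latents.dim() - 1)))
    noise = torch.randn_like(latents)
    noisy_latents = a.sqrt() * latents + (1.0 - a).sqrt() * noise

    # Diffusion loss (claim 10): MSE between true and predicted noise,
    # with the denoiser conditioned through both attention layers.
    pred_noise = denoiser(noisy_latents, t, text_emb, image_emb)
    loss = F.mse_loss(pred_noise, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # updates denoiser and projector parameters (claim 10)
    return loss.item()
```

The optimizer would be built over both modules, e.g. `torch.optim.AdamW(list(denoiser.parameters()) + list(image_projector.parameters()))`, which is one way to realize the joint training recited in claim 14.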
Claim 15: Shi discloses an apparatus (See Shi Abstract; Summary of Invention; Paragraphs 0004-0025) but fails to disclose a synthesized image. Liew discloses this feature in Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Shi by the teachings of Liew to enable improved generation of an object from text and/or image input using a pre-trained text-to-image diffusion-based generative model (See Liew Field of Invention, Paragraph 0001). In addition, both references are analogous art directed to the same field of endeavor, namely generating images. This close relation between the references strongly suggests a reasonable expectation of success.

As modified, the combination of Shi and Liew discloses the following: at least one processor (See Shi Figure 1; Paragraphs 0039; 0043; 0048); at least one memory including instructions executable by the at least one processor (See Shi Figure 1; Paragraphs 0039; 0043; 0048); and an image generation model comprising parameters in the at least one memory (See Shi Figure 1; Paragraphs 0039; 0043; 0048), wherein the image generation model includes a text attention layer that performs cross-attention based on a text prompt (See Shi Paragraphs 0045; 0058; 0066) and an image attention layer that performs cross-attention based on an image prompt (See Shi Paragraphs 0045; 0058; 0066), and wherein the image generation model is trained to generate a synthesized image (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 16: The combination of Shi and Liew discloses a text encoder configured to encode the text prompt to obtain a text embedding (See Shi Paragraphs 0045; 0058; 0066).

Claim 17: The combination of Shi and Liew discloses wherein the text encoder includes a transformer architecture (See Shi Paragraph 0059).

Claim 18: The combination of Shi and Liew discloses an image encoder configured to encode the image prompt to obtain an image embedding (See Shi Paragraphs 0045; 0058; 0066).

Claim 19: The combination of Shi and Liew discloses an image projector configured to project a preliminary image encoding to obtain the image embedding (See Shi Paragraphs 0045; 0058; 0066).

Claim 20: The combination of Shi and Liew discloses wherein the image generation model comprises a diffusion model (See Shi Paragraphs 0037-0039).

Pertinent Art

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zheng (US 20250094484) discloses a transformer architecture in which the cross-attention module 208 may combine the image and text embeddings asymmetrically.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHEREE N BROWN, whose telephone number is (571) 272-4229. The examiner can normally be reached M-F 5:30-2:00 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, SAID BROOME, can be reached at (571) 272-2931. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHEREE N BROWN/
Primary Examiner, Art Unit 2612
February 5, 2026

Prosecution Timeline

Aug 28, 2024: Application Filed
Feb 05, 2026: Non-Final Rejection (§103)
Apr 08, 2026: Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12593956: METHOD FOR BUILDING IMAGE READING MODEL BASED ON CAPSULE ENDOSCOPE, DEVICE, AND MEDIUM (granted Apr 07, 2026; 2y 5m to grant)
Patent 12573130: METHOD AND SYSTEM PROVIDING TEMPORARY TEXTURE APPLICATION TO ENHANCE 3D MODELING (granted Mar 10, 2026; 2y 5m to grant)
Patent 12548204: NEURAL FRAME EXTRAPOLATION RENDERING MECHANISM (granted Feb 10, 2026; 2y 5m to grant)
Patent 12541487: Method for Constructing Database, Method for Retrieving Document and Computer Device (granted Feb 03, 2026; 2y 5m to grant)
Patent 12541539: METHODS AND SYSTEMS FOR A COMPLIANCE FRAMEWORK DATABASE SCHEMA (granted Feb 03, 2026; 2y 5m to grant)

Study what changed to get past this examiner. Based on the examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 65%
With Interview: 92% (+27.0%)
Median Time to Grant: 3y 7m
PTA Risk: Low

Based on 738 resolved cases by this examiner. Grant probability is derived from the career allow rate.
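On the assumption that the dashboard composes these figures additively, the with-interview probability is simply the career allow rate plus the interview lift:

$$65\% + 27\% = 92\%$$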
