Prosecution Insights
Last updated: April 19, 2026
Application No. 18/817,692

MODALITY SPECIFIC LEARNABLE ATTENTION FOR MULTI-CONDITIONED DIFFUSION MODELS

Status: Non-Final Office Action (§103)
Filed: Aug 28, 2024
Examiner: BROWN, SHEREE N
Art Unit: 2612
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 1 (Non-Final)

Grant Probability: 65% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 7m
Grant Probability With Interview: 92%

Examiner Intelligence

Career Allow Rate: 65% (481 granted / 738 resolved; +3.2% vs Tech Center average, above average)
Interview Lift: +27.0% (strong; allowance rate with vs. without an interview, among resolved cases)
Average Prosecution: 3y 7m (typical timeline; 34 applications currently pending)
Total Applications: 772 across all art units (career history)

Statute-Specific Performance

Statute    Rate     vs TC Avg
§101       14.3%    -25.7%
§103       25.0%    -15.0%
§102       32.7%    -7.3%
§112       22.0%    -18.0%

Deltas are relative to the Tech Center average estimate. Based on career data from 738 resolved cases.
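Reading each delta as the examiner's rate minus the Tech Center average (an assumption about the dashboard's convention), every statute implies the same 40% baseline, consistent with a single Tech Center average estimate:

$$14.3\% + 25.7\% = 25.0\% + 15.0\% = 32.7\% + 7.3\% = 22.0\% + 18.0\% = 40.0\%$$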

Office Action

Rejection basis: §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Application Status

This office action is responsive to Application No. 18/817,692, filed on 08/28/2024 (Priority Date: 10/06/2023). Claims 1-20 are pending and presented for examination. This action has been made NON-FINAL.

Examiner Remarks

In the spirit of compact prosecution, Applicant is requested to contact the Examiner for an interview to discuss the inventive concepts of the instant application. Applicant may optionally amend the claims to further direct the claims toward a particular inventive concept described in the specification without an interview. Additionally, the prior art rejection (if applicable) cites particular paragraphs, columns, and/or line numbers in the references for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 11/14/2024 and 08/28/2024 are being considered by the examiner. A signed IDS is hereby attached.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shi (US 20250078392) in view of Liew (US 20240144544).

Claim 1: Shi discloses a method (See Shi Abstract; Summary of Invention; Paragraphs 0004-0025) but fails to disclose a synthesized image based on the text attention output and the image attention output. Liew discloses this feature in Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Shi by the teachings of Liew to enable improved generation of an object from text and/or image input using a pre-trained text-to-image diffusion-based generative model (See Liew Field of Invention, Paragraph 0001). In addition, both references are analogous art directed to the same field of endeavor, namely generating images. This close relation between the references strongly suggests a reasonable expectation of success.
As modified, the combination of Shi and Liew discloses the following: encoding a text prompt to obtain a text embedding (See Shi Paragraphs 0045; 0058); encoding an image prompt to obtain an image embedding (See Shi Paragraph 0066); performing, using a text attention layer of an image generation model, cross-attention on the text embedding to obtain a text attention output (See Shi Paragraphs 0045; 0058); performing, using an image attention layer of the image generation model, cross-attention on the image embedding to obtain an image attention output (See Shi Paragraph 0066); and generating, using a generator network of the image generation model, a synthesized image based on the text attention output and the image attention output (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 2: The combination of Shi and Liew discloses wherein encoding the image prompt (See Shi Paragraph 0066) comprises: encoding, using an image encoder, the image prompt to obtain a preliminary image encoding (See Shi Paragraph 0066); and projecting, using an image projector, the preliminary image encoding to obtain the image embedding (See Shi Paragraph 0066).

Claim 3: The combination of Shi and Liew discloses wherein the text embedding (See Shi Paragraphs 0045; 0058) comprises a first plurality of tokens (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086) in a text embedding space and the image embedding comprises a second plurality of tokens (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086) in the text embedding space (See Shi Paragraphs 0045; 0058).

Claim 4: The combination of Shi and Liew discloses wherein the text embedding (See Shi Paragraphs 0045; 0058) comprises a same number of tokens (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086) as the image embedding (See Shi Paragraph 0066).

Claim 5: The combination of Shi and Liew discloses combining the text attention output and the image attention output to obtain a combined attention output, wherein the synthesized image is generated based on the combined attention output (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 6: The combination of Shi and Liew discloses wherein generating the synthesized image comprises: performing a diffusion process on a noise input (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 7: The combination of Shi and Liew discloses encoding the noise input to obtain an intermediate feature map (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086), wherein the text attention output and the image attention output are based on the intermediate feature map (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 8: The combination of Shi and Liew discloses wherein the text attention output and the image attention output are located in a common embedding space (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).
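For orientation, claims 1-8 recite a dual cross-attention arrangement: separate text and image attention layers whose outputs are combined to condition generation. Below is a minimal PyTorch sketch of that arrangement; all module names, dimensions, and the additive combination are illustrative assumptions, not drawn from Shi or Liew.

```python
import torch
import torch.nn as nn

class ModalityCrossAttention(nn.Module):
    """Cross-attention: queries from the intermediate feature map,
    keys/values from one conditioning modality (text or image)."""
    def __init__(self, dim: int, cond_dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=dim, num_heads=num_heads,
            kdim=cond_dim, vdim=cond_dim, batch_first=True)

    def forward(self, features: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(query=features, key=cond, value=cond)
        return out

class DualConditionedBlock(nn.Module):
    """One denoiser block with a text attention layer and an image
    attention layer (claim 1); their outputs are summed into a combined
    attention output (claim 5) that conditions the feature stream."""
    def __init__(self, dim: int = 320, cond_dim: int = 768):
        super().__init__()
        self.text_attn = ModalityCrossAttention(dim, cond_dim)   # text attention layer
        self.image_attn = ModalityCrossAttention(dim, cond_dim)  # image attention layer

    def forward(self, features, text_emb, image_emb):
        text_out = self.text_attn(features, text_emb)     # text attention output
        image_out = self.image_attn(features, image_emb)  # image attention output
        return features + text_out + image_out            # combined attention output

# Toy shapes: the two embeddings share a token count and embedding space
# (claims 3-4, 8); `features` stands in for the intermediate feature map
# obtained by encoding the noise input (claim 7).
features = torch.randn(2, 64, 320)
text_emb = torch.randn(2, 77, 768)
image_emb = torch.randn(2, 77, 768)
print(DualConditionedBlock()(features, text_emb, image_emb).shape)  # torch.Size([2, 64, 320])
```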
Claim 9: Shi discloses a method of training a machine learning model (See Shi Abstract; Summary of Invention; Paragraphs 0004-0025) but fails to disclose a synthesized image. Liew discloses this feature in Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Shi by the teachings of Liew to enable improved generation of an object from text and/or image input using a pre-trained text-to-image diffusion-based generative model (See Liew Field of Invention, Paragraph 0001). In addition, both references are analogous art directed to the same field of endeavor, namely generating images. This close relation between the references strongly suggests a reasonable expectation of success.

As modified, the combination of Shi and Liew discloses the following: obtaining a training set including a training text prompt and a training image prompt (See Shi Paragraphs 0045; 0058; 0066; 0073); and training (See Shi Paragraphs 0045; 0058; 0066; 0073), using the training set, an image generation model to generate a synthesized image (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086), the training comprising: training a text attention layer of the image generation model to perform cross-attention based on the training text prompt (See Shi Paragraphs 0045; 0058); and training an image attention layer of the image generation model to perform cross-attention based on the training image prompt (See Shi Paragraph 0066).

Claim 10: The combination of Shi and Liew discloses wherein training the image generation model comprises: computing a diffusion loss (See Shi Paragraphs 0074; 0095-0097); and updating parameters of the image generation model based on the diffusion loss (See Shi Paragraphs 0074; 0095-0097).

Claim 11: The combination of Shi and Liew discloses wherein obtaining the training set comprises: generating the training text prompt based on the training image prompt (See Shi Paragraphs 0045; 0058; 0066).

Claim 12: The combination of Shi and Liew discloses encoding the training text prompt to obtain a text embedding; and encoding the training image prompt to obtain an image embedding, wherein the image generation model is trained to generate the synthesized image based on the text embedding and the image embedding (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 13: The combination of Shi and Liew discloses projecting, using an image projector, a preliminary image encoding to obtain the image embedding (See Liew Paragraphs 0003-0004; 0014; 0026; 0038; 0061-0062; 0076; 0086), wherein the image generation model is trained to generate the synthesized image based on the image embedding (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 14: The combination of Shi and Liew discloses wherein the image projector is jointly trained with the image generation model (See Liew Paragraphs 0003-0004; 0014; 0026; 0038; 0061-0062; 0076; 0086).
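Claims 9-14 recite the training side: a diffusion loss, parameter updates, and an image projector trained jointly with the generation model. A hedged sketch follows; the epsilon-prediction loss, the noise-schedule handling, and the `denoiser`/`image_projector` interfaces are assumptions chosen to match standard latent-diffusion training, not citations to Shi or Liew.

```python
import torch
import torch.nn.functional as F

def training_step(denoiser, image_projector, optimizer,
                  latents, text_emb, preliminary_image_enc, alphas_cumprod):
    # Project the preliminary image encoding into the shared token space
    # (claim 13); because the projector output feeds the loss below, one
    # optimizer over both modules trains them jointly (claim 14).
    image_emb = image_projector(preliminary_image_enc)

    # Forward diffusion: noise the clean latents at a random timestep.
    t = torch.randint(0, alphas_cumprod.numel(), (latents.shape[0],))
    a = alphas_cumprod[t].view(-1, *([1] * (latents.dim() - 1)))
    noise = torch.randn_like(latents)
    noisy_latents = a.sqrt() * latents + (1.0 - a).sqrt() * noise

    # Diffusion loss (claim 10): MSE between true and predicted noise,
    # with the denoiser conditioned through both attention layers.
    pred_noise = denoiser(noisy_latents, t, text_emb, image_emb)
    loss = F.mse_loss(pred_noise, noise)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # updates denoiser and projector parameters (claim 10)
    return loss.item()
```

The optimizer would be built over both modules, e.g. `torch.optim.AdamW(list(denoiser.parameters()) + list(image_projector.parameters()))`, which is one way to realize the joint training recited in claim 14.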
Claim 15: Shi discloses an apparatus (See Shi Abstract; Summary of Invention; Paragraphs 0004-0025) but fails to disclose a synthesized image. Liew discloses this feature in Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Shi by the teachings of Liew to enable improved generation of an object from text and/or image input using a pre-trained text-to-image diffusion-based generative model (See Liew Field of Invention, Paragraph 0001). In addition, both references are analogous art directed to the same field of endeavor, namely generating images. This close relation between the references strongly suggests a reasonable expectation of success.

As modified, the combination of Shi and Liew discloses the following: at least one processor (See Shi Figure 1; Paragraphs 0039; 0043; 0048); at least one memory including instructions executable by the at least one processor (See Shi Figure 1; Paragraphs 0039; 0043; 0048); and an image generation model comprising parameters in the at least one memory (See Shi Figure 1; Paragraphs 0039; 0043; 0048), wherein the image generation model includes a text attention layer that performs cross-attention based on a text prompt (See Shi Paragraphs 0045; 0058; 0066) and an image attention layer that performs cross-attention based on an image prompt (See Shi Paragraphs 0045; 0058; 0066), and wherein the image generation model is trained to generate a synthesized image (See Liew Paragraphs 0014; 0026; 0038; 0061-0062; 0076; 0086).

Claim 16: The combination of Shi and Liew discloses a text encoder configured to encode the text prompt to obtain a text embedding (See Shi Paragraphs 0045; 0058; 0066).

Claim 17: The combination of Shi and Liew discloses wherein the text encoder includes a transformer architecture (See Shi Paragraph 0059).

Claim 18: The combination of Shi and Liew discloses an image encoder configured to encode the image prompt to obtain an image embedding (See Shi Paragraphs 0045; 0058; 0066).

Claim 19: The combination of Shi and Liew discloses an image projector configured to project a preliminary image encoding to obtain the image embedding (See Shi Paragraphs 0045; 0058; 0066).

Claim 20: The combination of Shi and Liew discloses wherein the image generation model comprises a diffusion model (See Shi Paragraphs 0037-0039).

Pertinent Art

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zheng (US 20250094484) discloses a transformer architecture in which the cross-attention module 208 may combine the image and text embeddings asymmetrically.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHEREE N BROWN, whose telephone number is (571) 272-4229. The examiner can normally be reached M-F 5:30-2:00 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, SAID BROOME, can be reached at (571) 272-2931. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHEREE N BROWN/
Primary Examiner, Art Unit 2612
February 5, 2026

Prosecution Timeline

Aug 28, 2024: Application Filed
Feb 05, 2026: Non-Final Rejection (§103)
Apr 08, 2026: Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12593956: METHOD FOR BUILDING IMAGE READING MODEL BASED ON CAPSULE ENDOSCOPE, DEVICE, AND MEDIUM (granted Apr 07, 2026; 2y 5m to grant)
Patent 12573130: METHOD AND SYSTEM PROVIDING TEMPORARY TEXTURE APPLICATION TO ENHANCE 3D MODELING (granted Mar 10, 2026; 2y 5m to grant)
Patent 12548204: NEURAL FRAME EXTRAPOLATION RENDERING MECHANISM (granted Feb 10, 2026; 2y 5m to grant)
Patent 12541487: Method for Constructing Database, Method for Retrieving Document and Computer Device (granted Feb 03, 2026; 2y 5m to grant)
Patent 12541539: METHODS AND SYSTEMS FOR A COMPLIANCE FRAMEWORK DATABASE SCHEMA (granted Feb 03, 2026; 2y 5m to grant)

Study what changed to get past this examiner. Based on the examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 65%
With Interview: 92% (+27.0%)
Median Time to Grant: 3y 7m
PTA Risk: Low

Based on 738 resolved cases by this examiner. Grant probability is derived from the career allow rate.
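On the assumption that the dashboard composes these figures additively, the with-interview probability is simply the career allow rate plus the interview lift:

$$65\% + 27\% = 92\%$$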
