Last updated: July 17, 2026

Application No. 18/459,186

SELECTIVELY CONDITIONING LAYERS OF A NEURAL NETWORK WITH STYLIZATION PROMPTS FOR DIGITAL IMAGE GENERATION

Final Rejection §102

Filed

Aug 31, 2023

Examiner

SANKS, SCHYLER S

Art Unit

2129

Tech Center

2100 — Computer Architecture & Software

Assignee

Adobe Inc.

OA Round

2 (Final)

Interview Optional

Interview lift: +15.9%. An interview has already been conducted in this application's prosecution history. This examiner has a 73% grant rate. Because an interview has already been tried, a written response with claims narrowed along the lines of precedent claim-evolution patterns is recommended.

Based on 515 resolved cases, 2023–2026

Examiner Intelligence

SANKS, SCHYLER S

Grants 73% (above average)

Career Allowance Rate: 374 granted / 515 resolved (+17.6% vs TC avg)

Strong +16% interview lift

Interview Lift: +15.9% (resolved cases with an interview vs. without)

Typical timeline

Avg Prosecution: 2y 10m

24 currently pending

Career history

Total Applications: 546 (across all art units)

Statute-Specific Performance

§101: 1.5% (-38.5% vs TC avg)

§103: 74.1% (+34.1% vs TC avg)

§102: 6.6% (-33.4% vs TC avg)

§112: 17.3% (-22.7% vs TC avg)

TC avg = Tech Center average estimate. Based on career data from 515 resolved cases.

Office Action

§102

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 8-14, 16-17, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhang (Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. "Adding Conditional Control to Text-to-Image Diffusion Models." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.)
Regarding claim 1, Zhang teaches a method comprising: 
receiving a text prompt and an image prompt for generating a digital image (Figure 3: Prompt → Text Encoder → each layer of Stable Diffusion, i.e., a text prompt; Condition → … → zero convolution → a layer of the SD Decoder, i.e., an image prompt; and Output, i.e., a digital image; see Figure 4); 
conditioning multiple upsampling layers of a neural network with an image vector representation of the image prompt (Figure 3: any zero convolution → SD Decoder block connection, where each block contains three layers; §3.1, “Herein, we use the term network block to refer to a set of neural layers that are commonly put together to form a single unit of a neural network”);
conditioning an additional upsampling layer of the neural network with a text vector representation of the text prompt without the image vector representation of the image prompt (Figure 3: any of the other SD Decoder blocks with a Text Encoder input; the Text Encoder input does not include the image vector representation); and 
generating, utilizing the neural network, the digital image from the image vector representation and the text vector representation (Figure 4).
Regarding claim 2, Zhang teaches all of the limitations of claim 1, wherein 
conditioning the multiple upsampling layers of the neural network comprises conditioning a high-resolution upsampling layer of the neural network with the image vector representation of the image prompt, wherein the high-resolution upsampling layer has a higher resolution than a low-resolution upsampling layer of the neural network (Figure 3(a), any of the 16x16, 32x32, or 64x64 Decoder blocks).
Regarding claim 3, Zhang teaches all of the limitations of claim 2, wherein 
conditioning the additional upsampling layer of the neural network comprises conditioning the low-resolution upsampling layer with the text vector representation of the text prompt without the image vector representation of the image prompt (Figure 3(a), the 8x8 Decoder block, or any of the 16x16 or 32x32 blocks if lower in resolution than the block chosen for claim 2).
Regarding claim 4, Zhang teaches all of the limitations of claim 2, further comprising 
conditioning the high-resolution upsampling layer of the neural network with the text vector representation of the text prompt (Figure 3(a)).
Regarding claim 5, Zhang teaches all of the limitations of claim 1, wherein generating, utilizing the neural network, the digital image from the image vector representation and the text vector representation comprises 
utilizing the neural network in at least one denoising iteration of a diffusion neural network to generate the digital image (Figure 3(a), Decoder blocks are denoising iterations).
Regarding claim 8, Zhang teaches all of the limitations of claim 1, further comprising
conditioning a plurality of downsampling layers of the neural network with the text vector representation of the text prompt without the image vector representation of the image prompt (Figure 3(a), Encoder layers).
Regarding claim 9, Zhang teaches a system comprising: 
a memory component (§1: a computer with a processor and memory is used to run the disclosed models/algorithms, “We also show that in some tasks like depth-to-image, training ControlNets on a personal computer (one Nvidia RTX 3090TI) can achieve competitive results to commercial models trained on large computation clusters with terabytes of GPU memory and thousands of GPU hours.”); and 
one or more processing devices coupled to the memory component (§1, as cited above for the memory component), the one or more processing devices to perform operations comprising: 
receiving a first prompt and a second prompt for generating a digital image, the first prompt comprising an image prompt (Figure 3, “Condition” and “Prompt”; see also Figure 6: both prompts are used to generate an image and are therefore image prompts); 
generating, from a noise representation utilizing a denoising iteration of a diffusion neural network, an additional noise representation (Figure 3: Input → Encoder/Middle/Decoder) by: 
conditioning multiple layers of a neural network of the denoising iteration with a first vector representation of the first prompt (Figure 3, Condition → zero convolution → second decoder block, which comprises three layers); and 
conditioning a second layer of the neural network of the denoising iteration with a second vector representation of the second prompt (Figure 3, Text encoder → first decoder block or any of the first encoder blocks); and 
generating, utilizing additional denoising iterations of the diffusion neural network, the digital image from the additional noise representation, the first vector representation, and the second vector representation (Figure 6).
Regarding claim 10, Zhang teaches all of the limitations of claim 9, wherein conditioning the first layer of the neural network of the denoising iteration with the first vector representation comprises 
conditioning a high-resolution upsampling layer of the neural network with an image vector representation of the image prompt, wherein the high-resolution upsampling layer has a higher resolution than a low-resolution upsampling layer of the neural network (Figure 3, second decoder block).
Regarding claim 11, Zhang teaches all of the limitations of claim 10, wherein conditioning the second layer of the neural network of the denoising iteration with the second vector representation comprises conditioning the low-resolution upsampling layer of the neural network with a text vector representation of a text prompt without the image vector representation of the image prompt (Figure 3: the first SD Decoder block with a Text Encoder input; the Text Encoder input does not include the image vector representation).
Regarding claim 12, Zhang teaches all of the limitations of claim 9, wherein conditioning the first layer of the neural network of the denoising iteration with the first vector representation comprises 
conditioning a low-resolution upsampling layer of the neural network with an image vector representation of the image prompt, wherein the low-resolution upsampling layer has a lower resolution than a high-resolution upsampling layer of the neural network (Figure 3, the second decoder block can be considered a low-resolution upsampling layer).
Regarding claim 13, Zhang teaches all of the limitations of claim 12, wherein
conditioning the second layer of the neural network of the denoising iteration with the second vector representation comprises conditioning the high-resolution upsampling layer of the neural network with a text vector representation of a text prompt without the image vector representation (Figure 3, the third decoder block takes in a text vector representation of a text prompt that does not include the image vector representation).
Regarding claim 14, Zhang teaches all of the limitations of claim 9, wherein: conditioning the multiple layers of the neural network comprises 
conditioning a downsampling layer of the neural network with a text vector representation of a text prompt (Figure 3, any of the encoder blocks); and 
conditioning the second layer of the neural network comprises conditioning an upsampling layer of the neural network with an image vector representation of an image prompt (Figure 3, any of the decoder blocks, each of which has three layers).
Regarding claim 16, Zhang teaches all of the limitations of claim 16 for the reasons given for claim 1; see also §1, “We also show that in some tasks like depth-to-image, training ControlNets on a personal computer (one Nvidia RTX 3090TI) can achieve competitive results to commercial models trained on large computation clusters with terabytes of GPU memory and thousands of GPU hours.”
Regarding claim 17, Zhang teaches all of the limitations of claim 16, wherein: 
conditioning the multiple upsampling layers of the neural network comprises conditioning a high-resolution upsampling layer of the neural network with the image vector representation of the image prompt (Figure 3, any of the 16x16, 32x32, or 64x64 decoder blocks, each of which has three layers); and 
conditioning the additional upsampling layer of the neural network comprises conditioning a low-resolution upsampling layer of the neural network with the text vector representation of the text prompt, wherein the high-resolution upsampling layer has a higher resolution than the low-resolution upsampling layer (Figure 3, Decoder block 8x8).
Regarding claim 19, Zhang teaches all of the limitations of claim 16, wherein generating, utilizing the neural network, the digital image from the image vector representation and the text vector representation comprises: 
generating a first noise representation utilizing a first neural network of a first denoising iteration of a diffusion neural network; and generating a second noise representation utilizing a second neural network of a second denoising iteration of the diffusion neural network (Figure 3, Encoder and Decoder).
Regarding claim 20, Zhang teaches all of the limitations of claim 16, wherein the operations further comprise 
conditioning a plurality of downsampling layers of the neural network with the text vector representation of the text prompt (Figure 3: Encoder layers).
Allowable Subject Matter
Claims 6-7, 15, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant’s remarks filed 05/04/2026 have been fully considered.
In the response filed 05/04/2026, Applicant argues that Zhang fails to disclose the amended limitations. However, as shown above, each block of Zhang contains three layers; therefore, Zhang discloses conditioning multiple layers with the image vector representation and, mutatis mutandis, the corresponding limitations of claim 9.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCHYLER S SANKS whose telephone number is (571)272-6125. The examiner can normally be reached 06:30 - 15:30 Central Time, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SCHYLER S SANKS/Primary Examiner, Art Unit 2129


Prosecution Timeline

Aug 31, 2023

Application Filed

Mar 24, 2026

Non-Final Rejection mailed — §102

Apr 22, 2026

Interview Requested

Apr 28, 2026

Applicant Interview (Telephonic)

Apr 28, 2026

Examiner Interview Summary

May 04, 2026

Response Filed

Jun 23, 2026

Final Rejection mailed — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/415,742

Patent 12682275

LEARNING MODEL APPLYING SYSTEM, A LEARNING MODEL APPLYING METHOD, AND A PROGRAM

5y 0m to grant; granted Jul 14, 2026

17/876,931

Patent 12681743

Virtual Machine Managing System Using Snapshot

3y 11m to grant; granted Jul 14, 2026

18/044,852

Patent 12675670

Offline Primitive Discovery For Accelerating Data-Driven Reinforcement Learning

3y 4m to grant; granted Jul 07, 2026

17/469,573

Patent 12670404

METHOD AND SYSTEM FOR TRAINING A NEURAL NETWORK MODEL USING KNOWLEDGE DISTILLATION

4y 9m to grant; granted Jun 30, 2026

18/168,740

Patent 12670424

STORAGE MEDIUM, SOLUTION SEARCH METHOD, AND INFORMATION PROCESSING DEVICE

3y 4m to grant; granted Jun 30, 2026

Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4

Grant Probability: 73%

With Interview (+15.9%): 88%

Median Time to Grant: 2y 10m (~0m remaining)

PTA Risk: Moderate

Based on 515 resolved cases by this examiner. Grant probability is derived from the career allowance rate.