Prosecution Insights
Last updated: April 18, 2026
Application No. 18/664,600

LOCALIZED ATTENTION-GUIDED SAMPLING FOR IMAGE GENERATION

Status: Non-Final OA (§103)
Filed: May 15, 2024
Examiner: NGUYEN, HAU H
Art Unit: 2611
Tech Center: 2600 (Communications)
Assignee: Adobe Inc.
OA Round: 1 (Non-Final)

Grant Probability: 90% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 9m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 90% (807 granted / 892 resolved; +28.5% vs TC avg, above average)
Interview Lift: +8.9% (moderate; based on resolved cases with interview)
Typical Timeline: 2y 9m avg prosecution; 22 applications currently pending
Career History: 914 total applications across all art units

Statute-Specific Performance

§101: 5.5% (-34.5% vs TC avg)
§103: 58.0% (+18.0% vs TC avg)
§102: 19.2% (-20.8% vs TC avg)
§112: 3.8% (-36.2% vs TC avg)
Deltas are relative to the Tech Center average estimate. Based on career data from 892 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 05/15/2024 was filed after the mailing date of the application. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 8, 10-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Ruiz et al. (“HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models”, arXiv:2307.06949v1 [cs.CV] 13 Jul 2023, hereinafter “Ruiz”) in view of Alaluf et al. (“ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement”, IEEE/CVF International Conference on Computer Vision (ICCV), 2021, hereinafter “Alaluf”).

As per claim 1, Ruiz teaches a method comprising: obtaining an input prompt (page 4, section 3 Preliminaries, subsection Latent Diffusion Models (LDM), obtaining a text prompt T); adding a customized residual to a base parameter of an image generation model based on an element of the input prompt to obtain an updated parameter, wherein the customized residual is determined based on the element of the input prompt (section 4.1 Lightweight DreamBooth (LiDB), pages 5-6, i.e., adding LoRA (low-rank adaptation) residuals, customized for generating the personalized subset of weights, to the diffusion network shown in Fig. 4; the updated parameter is described in section 4.2 HyperNetwork for Fast Personalization of Text-to-Image Models according to the supervisory text prompt; see further Alaluf addressed below); and generating, using the image generation model with the updated parameter, a synthesized image depicting the element based on the input prompt (Fig. 5, page 7, and Fig. 6, page 8).

Ruiz does not explicitly teach adding a customized residual to a base parameter of an image generation model, even though this teaching is implicitly included therein as addressed above. However, Alaluf is cited for further clarification. Alaluf, in a very similar method of generating a synthesized image from an input prompt (see Fig. 2, page 3), further teaches: “The predicted residual is then added to the previous latent code wt to obtain the updated latent code prediction wt+1 (shown in green). Finally, passing the newly computed latent code to the generator G results in an updated reconstruction ˆyt+1, which is then passed as input in the following step” (Fig. 2, page 3). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method taught by Alaluf into the method taught by Ruiz, the advantage of which is learning to perform a small number of steps in a residual-based manner (page 2, left column).

As per claim 2, the combined teachings of Ruiz and Alaluf also include the element of the input prompt indicating an object depicted in a reference image used to learn the customized residual (see Ruiz, section 4.2, subsection Supervisory Text Prompt, page 6; the object is a face).

As per claim 3, the combined Ruiz-Alaluf also teaches encoding the input prompt to obtain a text embedding, wherein the synthesized image is generated based on the text embedding (Ruiz, page 4, subsection Fast T2I Personalization, “learn encoders that predicts initial text embeddings”; see also subsection Latent Diffusion Models (LDM), page 4).

As per claim 4, the combined Ruiz-Alaluf also teaches the base parameter being in a transformer layer of the image generation model (Ruiz, Fig. 4, page 6, Visual Transformer).

As per claim 8, as addressed in claim 1, the combined Ruiz-Alaluf teaches the customized residual comprising a low-rank adaptation of the base parameter.

As per claim 10, the combined Ruiz-Alaluf also teaches wherein generating the synthesized image comprises performing a diffusion process on a noise input (Ruiz, page 4, subsection Latent Diffusion Models (LDM), “Text-to-Image (T2I) diffusion models Dθ(ϵ, c) iteratively denoises a given noise map ϵ ∈ Rh×w into an image I following the description of a text prompt T…”).
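The claim 1 and claim 8 mappings above turn on a customized low-rank residual being added to a frozen base weight. As a rough illustration of that mechanic only (a generic LoRA-style sketch, not code from Ruiz, Alaluf, or the application; all names and dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 16, 16, 2        # rank << min(d_out, d_in) makes the residual "low rank"

W0 = rng.normal(size=(d_out, d_in))  # base parameter: left fixed during personalization

# Customized residual, factored as B @ A, so only (d_out + d_in) * rank
# values need to be learned instead of d_out * d_in.
B = rng.normal(size=(d_out, rank))
A = rng.normal(size=(rank, d_in))
residual = B @ A

W = W0 + residual                    # updated parameter used when generating the image
print(np.linalg.matrix_rank(residual))  # at most `rank`
```

The point of the factorization is the parameter count: the residual touches every entry of the base weight, but only the small factors B and A would be learned.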
As per claim 11, the combined Ruiz-Alaluf further teaches the input prompt including a nonce token representing the element and an additional token representing a target action of the element, with the synthesized image depicting the element performing the target action (as best understood by the examiner, as the referent of the claimed element is not clear; see Ruiz, page 6, subsection Supervisory Text Prompt, i.e., learned tokens embedded for the task of generating the synthesized image addressed above).

As per claim 12, as addressed in claim 1, the combined Ruiz-Alaluf teaches a method comprising: obtaining a training set including a reference image depicting an element (Ruiz, Fig. 2, page 3, hypernetwork training; it is not clear what is meant by an element, which is thus interpreted as the human face shown in the figure); and training, using the training set, an image generation model to generate images depicting the element of the reference image by determining a customized residual to be added to a base parameter of the image generation model (as addressed in claim 1).

Claim 13, which is similar in scope to claim 8 as addressed above, is rejected under the same rationale.

As per claim 14, the combined Ruiz-Alaluf also teaches the image generation model comprising a pre-trained model (Ruiz, section 4.2 HyperNetwork for Fast Personalization of Text-to-Image Models, page 6, pre-trained T2I model) and the base parameter being fixed while learning the customized residual (Alaluf, page 3, section 3.1 Encoder-Based Inversion Methods, “…during training, the pre-trained generator network G typically remains fixed”). Thus, claim 14 would have been obvious over the combined references for the reason above.

As per claim 15, the combined Ruiz-Alaluf also teaches computing a diffusion loss based on the reference image and updating the customized residual based on the diffusion loss (Ruiz, page 6, section 4.2 HyperNetwork for Fast Personalization of Text-to-Image Models, vanilla diffusion denoising loss, and page 7, section 4.3 Rank-Relaxed Fast Finetuning).

Claim 17, which is similar in scope to claim 1 as addressed above, is rejected under the same rationale. Claim 18, which is similar in scope to claim 3 as addressed above, is rejected under the same rationale.

As per claim 19, as addressed above in claim 1, the combined Ruiz-Alaluf also teaches the image generation model comprising a diffusion model.

As per claim 20, the combined Ruiz-Alaluf teaches the base parameter being located within a projection block of a transformer layer (Ruiz, Fig. 4, page 6, in the Visual Transformer Encoder).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Ruiz in view of Alaluf (both as cited above), further in view of Hu et al. (U.S. Patent App. Pub. No. 2023/0080693, hereinafter “Hu”).

As per claim 7, the combined Ruiz-Alaluf fails to explicitly teach the base parameter comprising a parameter of a one-by-one convolutional block. However, Hu teaches a method of denoising an image for better image quality (see Abstract, Fig. 1, ¶ [61]), in which the method further includes a 1×1 convolutional layer (see Fig. 4, ¶ [212]). Since the combined Ruiz-Alaluf also teaches denoising an input image (addressed in claim 10 above), it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the one-by-one convolutional block taught by Hu to the combined Ruiz-Alaluf, the advantage of which is that the noise reduction is accurate and the denoising performance is improved (¶ [88]).

Claims 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ruiz in view of Alaluf (both as cited above), further in view of Karpman et al. (U.S. Patent No. 11,995,803, hereinafter “Karpman”).

As per claim 9, the combined Ruiz-Alaluf does not expressly teach adding customized residuals to a plurality of different layers of the image generation model at a plurality of different resolutions, respectively. However, in a similar method of a text-to-image diffusion model (as shown in Fig. 1, col. 2, line 51 to col. 3, line 36), Karpman teaches this feature (see col. 5, lines 36-62, “In some implementations, each high-resolution diffusion model 116 in the set of high-resolution diffusion models 116 defines a deep learning network configured to receive a low-resolution base image (e.g., 64 pixels by 64 pixels, 256 pixels by 256 pixels) and generate a higher-resolution version (e.g., copy) of the base image (e.g., 256 pixels by 256 pixels, 1024 pixels by 1024 pixels)”). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the method taught by Karpman to the combined Ruiz-Alaluf method as addressed above, the advantage of which is to upscale the original image to a high-resolution image (col. 18, lines 31-42).
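Claim 9's feature, customized residuals added to different layers at different resolutions, can likewise be pictured with a toy sketch (again generic and hypothetical, not drawn from Karpman or the application; resolutions, widths, and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
rank = 4
# Hypothetical feature-map resolutions and the width of the layer at each one.
layer_width = {64: 32, 32: 64, 16: 128}

updated = {}
for res, dim in layer_width.items():
    W0 = rng.normal(size=(dim, dim))   # frozen base weight at this resolution
    B = rng.normal(size=(dim, rank))
    A = rng.normal(size=(rank, dim))
    updated[res] = W0 + B @ A          # a separate customized residual per layer

print(sorted(updated))                 # [16, 32, 64]
```

Each resolution gets its own residual factors, so personalization can act differently on coarse and fine layers of the generator.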
Claim 16, which is similar in scope to claim 9 as addressed above, is rejected under the same rationale.

Allowable Subject Matter

Claims 5-6 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter: the prior art, taken singly or in combination, does not teach or suggest a method, among other things, comprising: …wherein generating the synthesized image comprises: generating a foreground map and a background map using an attention layer of the image generation model; generating a first preliminary output using the base parameter and the background map; generating a second preliminary output using the updated parameter and the foreground map; and combining the first preliminary output and the second preliminary output to obtain an intermediate output.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hau H. Nguyen, whose telephone number is 571-272-7787. The examiner can normally be reached MON-FRI from 8:30-5:30. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy Goddard, can be reached at (571) 272-7773. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/HAU H NGUYEN/
Primary Examiner, Art Unit 2611

Prosecution Timeline

May 15, 2024: Application Filed
Dec 19, 2025: Non-Final Rejection (§103)
Mar 08, 2026: Interview Requested
Mar 18, 2026: Applicant Interview (Telephonic)
Mar 21, 2026: Examiner Interview Summary
Apr 06, 2026: Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597194: METHOD FOR OBTAINING IMAGE RELATED TO VIRTUAL REALITY CONTENT AND ELECTRONIC DEVICE SUPPORTING THE SAME (granted Apr 07, 2026; 2y 5m to grant)
Patent 12591435: DEVICE LINK MANAGEMENT (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586288: DEVICE AND METHOD FOR GENERATING DYNAMIC TEXTURE MAP FOR 3 DIMENSIONAL DIGITAL HUMAN (granted Mar 24, 2026; 2y 5m to grant)
Patent 12573135: GENERATION OF A DENSE POINT CLOUD OF A PHYSICAL OBJECT (granted Mar 10, 2026; 2y 5m to grant)
Patent 12573141: METHOD AND DEVICE FOR LEARNING 3D MODEL RECONSTRUCTION (granted Mar 10, 2026; 2y 5m to grant)
Based on the examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 90%
With Interview (+8.9%): 99%
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 892 resolved cases by this examiner. Grant probability derived from career allow rate.
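The headline figures are consistent with simple arithmetic on the examiner's career numbers (assuming the displayed percentages are plain roundings; this is a sanity check, not the tool's actual model):

```python
granted, resolved = 807, 892               # from the Examiner Intelligence card
allow_rate = 100 * granted / resolved      # career allow rate, in percent
print(round(allow_rate))                   # 90, matching the stated grant probability

interview_lift = 8.9                       # percentage points, per the interview stats
print(round(allow_rate + interview_lift))  # 99, matching "With Interview"
```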
