Prosecution Insights
Last updated: April 19, 2026
Application No. 18/457,895

UTILIZING INDIVIDUAL-CONCEPT TEXT-IMAGE ALIGNMENT TO ENHANCE COMPOSITIONAL CAPACITY OF TEXT-TO-IMAGE MODELS

Current Status: Non-Final OA (§103)
Filed: Aug 29, 2023
Examiner: HE, WEIMING
Art Unit: 2611
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 3 (Non-Final)
Grant Probability: 46% (Moderate); 60% with interview
Expected OA Rounds: 3-4
Time to Grant: 3y 4m

Examiner Intelligence

Career Allow Rate: 46% (grants 46% of resolved cases; 190 granted / 410 resolved; -15.7% vs TC avg)
Interview Lift: +13.8% across resolved cases with interview
Typical Timeline: 3y 4m avg prosecution; 40 applications currently pending
Career History: 450 total applications across all art units

Statute-Specific Performance

§101: 7.4% (-32.6% vs TC avg)
§103: 59.2% (+19.2% vs TC avg)
§102: 12.4% (-27.6% vs TC avg)
§112: 15.0% (-25.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 410 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/17/2025 has been entered.

Examiner's Note

(1) If the claimed invention is amended, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation, and also to verify and ascertain the metes and bounds of the claimed invention. This will assist in expediting compact prosecution. MPEP 714.02 recites: "Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714." Amendments not pointing to specific support in the disclosure may be deemed not to comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) and therefore held not fully responsive. Generic statements such as "Applicants believe no new matter has been introduced" may be deemed insufficient.

(2) The Examiner has cited particular columns/paragraphs and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claims, other passages and figures may apply as well.
In preparing responses, the applicant is respectfully requested to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passages as taught by the prior art or disclosed by the Examiner.

Response to Amendment

The amendment filed on 12/17/2025 has been entered and made of record. Claims 1-2, 4-5, 9-11, 13-14, 16 and 19-20 are amended. Claims 1-20 are pending.

Response to Arguments

Applicant's arguments with respect to the rejections of independent claims 1, 9 and 16 have been fully considered, but they are moot because the arguments do not apply to the references being used in the current rejection.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xiao et al. (FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention) in view of Avrahami (WO 2024/243527 A1).
As to Claim 1, Xiao teaches A method comprising: generating, utilizing a first denoising step of a diffusion neural network, a first noise representation (Xiao teaches a training stage in Fig. 3 [figure reproduced in the original Office action]); generating, utilizing a second denoising step of the diffusion neural network, a prompt noise representation from the first noise representation by conditioning the second denoising step with text tokens of a first text concept and a second text concept of a text prompt; generating, utilizing the second denoising step of the diffusion neural network, a first concept noise representation for the second denoising step from the first noise representation (Xiao discloses an inference stage as a second diffusion neural network with delayed subject conditioning in Fig. 3); generating, utilizing the second denoising step of the diffusion neural network, a second concept noise representation for the second denoising step from the first noise representation by conditioning the second denoising step with an additional subset of the text tokens corresponding to the second text concept (Xiao discloses delayed subject conditioning on the input text prompt in Fig. 3); combining the first concept noise representation and the second concept noise representation to generate a combined concept noise representation for the second denoising step (Xiao discloses a generated image at the inference stage in Fig. 3; see also section Text-Conditioning via Cross-Attention Mechanism at p. 4 and section 4.3 Delayed Subject Conditioning in Iterative Denoising at p. 6). Xiao teaches a loss function without a detailed description.
Avrahami further teaches the following limitations: comparing the combined concept noise representation, generated from the first concept noise representation and the second concept noise representation for the second denoising step, with the prompt noise representation, generated from the text prompt also for the second denoising step, to determine a concept-prompt noise representation measure of loss; and modifying parameters of the second denoising step of the diffusion neural network according to the concept-prompt noise representation measure of loss (Xiao discloses "denoising loss (Figure 3)" at p. 5 and cross-attention localization loss under section 5.4 Ablation Study; "At inference time, a random noise zT is sampled from N(0, 1) and iteratively denoised by the U-Net to the initial latent representation z0" at p. 4. Avrahami further discloses "in step 405, the system evaluates a loss function. The function includes a reconstruction loss term that generates a loss value based on a comparison of the input image and the synthetic image. As an example, in cases where a latent diffusion model is used, the reconstruction loss can be a latent diffusion loss that measures the difference between a predicted set of noise linked to the synthetic image and a set of noise intentionally added to the input image during the generation of the noised latent image" in [0053]; "For example, a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function)… Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations" in [0075]; see also the additional loss term in [0026], masked diffusion loss in [0040-0041], cross-attention loss in [0043] and Fig. 2. Here, the loss functions can be used to calculate the difference between an input image and an output image during neural network processing.)
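The training procedure recited in claim 1 — denoising once per concept, combining the per-concept noise predictions, and penalizing their divergence from the full-prompt prediction — can be sketched as follows. This is an illustrative reconstruction for orientation only, not the applicant's or either reference's actual implementation: `denoise_step` is a hypothetical stand-in for one text-conditioned U-Net denoising step, averaging is only one possible way to combine concept predictions, and all names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(noise_rep, token_embedding):
    # Hypothetical stand-in for one conditioned U-Net denoising step:
    # predicts a noise residual from the latent and the text condition.
    return noise_rep - 0.1 * token_embedding

def concept_prompt_loss(z, prompt_tokens, concept_token_sets):
    """Compare the combined per-concept noise predictions against the
    full-prompt noise prediction (an MSE, as is typical for diffusion losses)."""
    eps_prompt = denoise_step(z, prompt_tokens)                 # conditioned on full prompt
    eps_concepts = [denoise_step(z, c) for c in concept_token_sets]
    eps_combined = np.mean(eps_concepts, axis=0)                # one way to combine (assumption)
    return float(np.mean((eps_combined - eps_prompt) ** 2))

z = rng.standard_normal(16)        # "first noise representation" of claim 1
prompt = rng.standard_normal(16)   # token embedding of the full text prompt
concepts = [rng.standard_normal(16), rng.standard_normal(16)]  # per-concept token subsets
loss = concept_prompt_loss(z, prompt, concepts)
```

In a real system this scalar would be backpropagated to update only the selected denoising step's parameters, per the "modifying parameters … according to the concept-prompt noise representation measure of loss" limitation.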
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the invention of Xiao with the invention of Avrahami so as to calculate a reconstruction loss or any other loss function based on a comparison of the input image and the synthetic image.

As to Claim 2, Xiao in view of Avrahami teaches The method of claim 1, wherein generating the prompt noise representation further comprises selecting the second denoising step of the diffusion neural network from a plurality of denoising steps to generate the prompt noise representation from the text prompt (Xiao, section Stable Diffusion at p. 3. Avrahami discloses "For instance, the image generation model could be… or a more complex latent diffusion model. The latter uses a series of noise-adding and denoising steps to generate the synthetic image. Thus, in some implementations, the image generation model can be a latent diffusion model that creates the synthetic image from a noised latent image" in [0052]; see also [0040, 0045].)

As to Claim 3, Xiao in view of Avrahami teaches The method of claim 1, further comprises: generating a third concept noise representation from a third text concept included within the text prompt; and combining the first concept noise representation, the second concept noise representation, and the third concept noise representation to generate the combined concept noise representation (Avrahami discloses "A text prompt can then be constructed that includes the selected concepts. One example text prompt is 'a photo of [vi1] and ... [vik]'". Here, multiple concepts within the text prompt can be extracted for individual noise representation. Xiao, Fig. 3, 5, 7.)
As to Claim 4, Xiao in view of Avrahami teaches The method of claim 1, wherein generating the prompt noise representation comprises conditioning the second denoising step of the diffusion neural network with the text tokens of the first text concept and the second text concept of the text prompt by providing the text tokens to attention mechanisms of the second denoising step, wherein the attention mechanisms focus on removing noise from portions of the first noise representation indicated by the text tokens (Xiao discloses “We use a vision encoder to derive this identity embedding from a referenced image, and then augment the generic text tokens with features from this identity embedding. This enables image generation based on subject-augmented conditioning… To tackle the multi-subject identity blending issue, we identify unregulated cross-attention as the primary reason (Figure 4). When the text includes two "person" tokens, each token’s attention map attends to both person in the image rather than linking each token to a distinct person in the image” at p. 2; “Figure 4: In the absence of cross-attention regularization (top), the diffusion model attends to multiple subjects’ input tokens and merge their identity. By applying cross-attention regularization (bottom), the diffusion model learns to focus on only one reference token while generating a subject. This ensures that the features of multiple subjects in the generated image are more separated” in Fig 4. Avrahami also discloses “In addition, (2) in order to avoid overfitting, the illustrated example uses a two-phase training regime, which starts by optimizing only the newly-added tokens…” in [0036].) 
As to Claim 5, Xiao in view of Avrahami teaches The method of claim 1, wherein generating the first concept noise representation and the second concept noise representation comprises: conditioning the second denoising step by guiding a removal of noise from the first noise representation according to the subset of the text tokens corresponding to the first text concept; and conditioning the second denoising step by guiding an additional removal of noise from the first noise representation according to the additional subset of the text tokens corresponding to the second text concept (Xiao discloses "Figure 4: In the absence of cross-attention regularization (top), the diffusion model attends to multiple subjects’ input tokens and merge their identity. By applying cross-attention regularization (bottom), the diffusion model learns to focus on only one reference token while generating a subject. This ensures that the features of multiple subjects in the generated image are more separated"; see also section 4.2 Localizing cross-attention maps with subject segmentation masks.)

As to Claim 6, Xiao in view of Avrahami teaches The method of claim 1, further comprises: selecting an additional denoising step of the diffusion neural network from a plurality of denoising steps; and generating, utilizing the additional denoising step of the diffusion neural network, an additional prompt noise representation from an additional text prompt comprising a third text concept and a fourth text concept (Xiao teaches delayed subject conditioning in Fig. 3. Avrahami discloses "A text prompt can then be constructed that includes the selected concepts. One example text prompt is 'a photo of [vi1] and ... [vik]'" in [0039]. Here, multiple concepts within the text prompt can be extracted for individual noise representation.)
As to Claim 7, Xiao in view of Avrahami teaches The method of claim 6, further comprises: generating, utilizing the additional denoising step of the diffusion neural network, a third concept noise representation and a fourth concept noise representation; generating an additional combined concept noise representation by combining the third concept noise representation and the fourth concept noise representation; and modifying parameters of the diffusion neural network by comparing the additional combined concept noise representation and the additional prompt noise representation (Avrahami discloses "A text prompt can then be constructed that includes the selected concepts. One example text prompt is 'a photo of [vi1] and ... [vik]'" in [0039]; "In some implementations, the proposed approach can be performed in two phases. In the first phase, a computing system can designate a set of dedicated text tokens (or handles), freeze the model weights, and optimize the handles to reconstruct the input image. In the second phase, the computing system can switch to fine-tuning the model weights, while continuing to optimize the handles" in [0023]; "in step 405, the system evaluates a loss function. The function includes a reconstruction loss term that generates a loss value based on a comparison of the input image and the synthetic image. As an example, in cases where a latent diffusion model is used, the reconstruction loss can be a latent diffusion loss that measures the difference between a predicted set of noise linked to the synthetic image and a set of noise intentionally added to the input image during the generation of the noised latent image" in [0053]; "For example, a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function)… Gradient descent techniques can be used to iteratively update the parameters over a number of training iterations" in [0075]; fine-tuning in [0018, 0028, 0037]. Xiao, Fig. 2-3.)

As to Claim 8, Xiao in view of Avrahami teaches The method of claim 7, further comprises: identifying a text prompt comprising multiple text concepts from a client device; and generating, utilizing the diffusion neural network with the parameters modified, a digital image comprising the multiple text concepts (Avrahami discloses "The method includes initializing, by the computing system, a plurality of embeddings respectively for the plurality of visual concepts. The method includes, for each of one or more learning iterations: generating, by the computing system, a text prompt comprising one or more of the plurality of embeddings; processing, by the computing system, the text prompt with an image generation model to generate a synthetic image that depicts the visual concepts associated with the one or more embeddings included in the text prompt" in [0006]; "the goal is to extract a dedicated text token for each concept. This enables generation of novel images from textual prompts, featuring individual concepts or combinations of multiple concepts, as demonstrated in Figure 5" in [0019]. Xiao, Fig. 3.)

Claim 9 recites similar limitations as claim 1 in a system form, and further recites a static and a training diffusion neural network (Xiao teaches stable diffusion at p. 3).
Therefore, the same rationale used for claim 1 is applied.

Claim 10 is rejected based upon similar rationale as Claim 2. Claim 11 is rejected based upon similar rationale as Claim 2. Claim 12 is rejected based upon similar rationale as Claim 7.

As to Claim 13, Xiao in view of Avrahami teaches The system of claim 9, wherein comparing the prompt noise representation and a combined concept noise representation from the first concept noise representation and the second concept noise representation comprises utilizing a loss function to determine a concept-prompt noise representation measure of loss to backpropagate through one or more denoising steps of the training diffusion neural network (Xiao discloses "denoising loss (Figure 3)" at p. 5 and cross-attention localization loss under section 5.4 Ablation Study. Avrahami discloses "The training computing system 150 can include a model trainer 160 that trains the machine-learned models 120 and/or 140 stored at the user computing device 102 and/or the server computing system 130 using various training or learning techniques, such as, for example, backwards propagation of errors. For example, a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function)" in [0075]; see also [0076].)
As to Claim 14, Xiao in view of Avrahami teaches The system of claim 9, wherein training the training diffusion neural network further comprises: applying a stop gradient operation to the additional denoising step of the static diffusion neural network utilized to generate the first concept noise representation and the second concept noise representation, wherein the stop gradient operation controls a gradient flow by stopping a concept-prompt noise representation measure of loss from being backpropagated to more than the additional denoising step selected from a plurality of denoising steps (Avrahami discloses "The update could be performed using gradient-based optimization algorithms such as stochastic gradient descent (SGD), RMSprop, or Adam" in [0056]; "For example, a loss function can be backpropagated through the model(s) to update one or more parameters of the model(s) (e.g., based on a gradient of the loss function)" in [0075].)

Claim 15 is rejected based upon similar rationale as Claims 1 & 8. Claim 16 recites similar limitations as claim 1 but in a computer-readable-medium form; therefore, the same rationale used for claim 1 is applied. Claim 17 is rejected based upon similar rationale as Claim 3. Claim 18 is rejected based upon similar rationale as Claim 3. Claim 19 is rejected based upon similar rationale as Claims 4 & 5. Claim 20 is rejected based upon similar rationale as Claim 15.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIMING HE, whose telephone number is (571) 270-1221. The examiner can normally be reached Monday-Friday, 8:30am-5:00pm.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tammy Goddard, can be reached at 571-272-7773. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WEIMING HE/
Primary Examiner, Art Unit 2611
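The stop-gradient operation recited in claim 14 — the loss updates only the selected denoising step while the static network's earlier step is shielded from backpropagation — can be illustrated with a minimal hand-computed example. This is a generic sketch of the stop-gradient technique, not the applicant's implementation; in an autograd framework the same effect comes from calls like `torch.Tensor.detach` or `jax.lax.stop_gradient`, and the weights and values below are arbitrary.

```python
def stop_gradient(x):
    # Forward pass: identity. Under the stop-gradient convention, the
    # backward pass through this node contributes zero gradient upstream.
    return x

# Two chained "denoising steps", each reduced to one scalar weight.
w_static, w_trained = 2.0, 3.0
z0 = 1.5
z1 = w_static * z0            # earlier step of the static diffusion network
z1_sg = stop_gradient(z1)     # gradient flow is cut here
z2 = w_trained * z1_sg        # the selected (trained) denoising step
target = 10.0
loss = (z2 - target) ** 2     # stand-in for the concept-prompt loss

# Hand-applied chain rule under the stop-gradient convention:
dloss_dz2 = 2.0 * (z2 - target)
grad_w_trained = dloss_dz2 * z1_sg   # nonzero: this step's parameters update
grad_w_static = 0.0                  # zero: blocked by stop_gradient
```

The point of the design is that only the selected denoising step receives a parameter update; the static network that produced the concept noise representations stays frozen.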

Prosecution Timeline

Aug 29, 2023: Application Filed
Aug 01, 2025: Non-Final Rejection (§103)
Sep 02, 2025: Interview Requested
Sep 09, 2025: Examiner Interview Summary
Sep 09, 2025: Applicant Interview (Telephonic)
Sep 19, 2025: Response Filed
Oct 14, 2025: Final Rejection (§103)
Dec 17, 2025: Request for Continued Examination
Jan 15, 2026: Response after Non-Final Action
Feb 11, 2026: Non-Final Rejection (§103)
Mar 20, 2026: Interview Requested
Apr 01, 2026: Applicant Interview (Telephonic)
Apr 05, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12567135: MULTIMEDIA PLAYBACK MONITORING SYSTEM AND METHOD, AND ELECTRONIC APPARATUS (granted Mar 03, 2026; 2y 5m to grant)
Patent 12561876: System and method for an audio-visual avatar creation (granted Feb 24, 2026; 2y 5m to grant)
Patent 12514672: System, Method And Software Program For Aiding In Positioning Of Objects In A Surgical Environment (granted Jan 06, 2026; 2y 5m to grant)
Patent 12494003: AUTOMATIC LAYER FLATTENING WITH REAL-TIME VISUAL DEPICTION (granted Dec 09, 2025; 2y 5m to grant)
Patent 12468949: SYSTEMS AND METHODS FOR FEW-SHOT TRANSFER LEARNING (granted Nov 11, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 46% (60% with interview, a +13.8% lift)
Median Time to Grant: 3y 4m
PTA Risk: High

Based on 410 resolved cases by this examiner. Grant probability is derived from the career allow rate.
