Prosecution Insights
Last updated: April 19, 2026
Application No. 18/404,295

DISCOVERING AND MITIGATING BIASES IN LARGE PRE-TRAINED MULTIMODAL BASED IMAGE EDITING

Final Rejection (§103)

Filed: Jan 04, 2024
Examiner: CHEN, YU
Art Unit: 2613
Tech Center: 2600 (Communications)
Assignee: Adobe Inc.
OA Round: 2 (Final)

Grant Probability: 68% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 10m
Grant Probability With Interview: 98%

Examiner Intelligence

Career Allow Rate: 68% (711 granted / 1052 resolved; +5.6% vs Tech Center average)
Interview Lift: +29.9% across resolved cases with an interview
Typical Timeline: 2y 10m average prosecution; 110 applications currently pending
Career History: 1162 total applications across all art units

Statute-Specific Performance

§101: 2.2% (-37.8% vs TC avg)
§103: 43.9% (+3.9% vs TC avg)
§102: 27.0% (-13.0% vs TC avg)
§112: 20.7% (-19.3% vs TC avg)

Tech Center averages are estimates. Based on career data from 1052 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Response to Amendment

This is in response to applicant’s amendment/response filed on 12/04/2025, which has been entered and made of record. Claims 1, 9-10, 15 have been amended. No claim has been cancelled. No claim has been added. Claims 1-20 are pending in the application.

Response to Arguments

Applicant’s arguments on 12/04/2025 have been fully considered but are moot because the arguments do not apply to any of the references being used in the current rejection.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-5, 7-8, 10-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Pub 2024/0161369 A1) in view of Miller et al. (US Pub 2025/0139160 A1), further in view of Yin et al. (US Pub 2024/0290054 A1).

As to claim 1, Li discloses a method comprising: obtaining a text prompt and an input image depicting a person (Li, Abstract, “a system receives, via a data interface, an image containing a subject, a text description of the subject in the image, and a text prompt relating to a different rendition of the subject.” Fig. 7, ¶0089.); generating a latent code based on the text prompt and the input image (Li, ¶0020, “a multimodal encoder may be trained to generate a latent representation of an input image and associated text input.” ¶0076, “Encoder 604 may encode input 602 into a latent representation (e.g., a vector) which represents the image.” ¶0078, “Latent vector representation z0 606a represents the first encoded latent representation of input 602.” ¶0079, “the noisy latent representation may be repeatedly and progressively fed into denoising model 612 to gradually remove noise from the latent representation vector based on the conditioning input 610”); optimizing the latent code using an identity preserving loss (¶0039, “subject-driven image model 130 may be integrated with an image editing model which edits an original image with subject-specific visuals. To edit an image, a subject may be identified for replacement in the original image (e.g., “dog”). Next, cross-attention maps from the original generation are used while generating new attention maps for the inserted subject embeddings. Denoising latents are mixed at each step based on the extracted editing mask. Namely, latents of the unedited regions are from the original generation whereas latents of the edited regions are from the subject-driven generation. In this way, an edited image may be generated with subject-specific visuals while also preserving the unedited regions.” ¶0081, “the latent representation 606a of input 602 may be compared with the denoised latent representation 618a to compute a loss for training. In another embodiment, a loss objective may be computed comparing the noise actually added (e.g., by noise ε 608) with the noise predicted by denoising model εθ 612. Denoising model εθ 612 may be trained based on this loss objective (e.g., parameters of denoising model εθ 612 may be updated in order to minimize the loss by gradient descent using backpropagation).”); and generating, using an image generator of a machine learning model, a synthetic image based on the latent code, wherein the synthetic image includes an element of the text prompt and preserves an identity of the person in the input image (¶0085, “The resulting denoised latent image representation after T denoising model steps may be decoded by a decoder (e.g., decoder 614) to produce an output 616 of a denoised image. For example, conditioning input may include a description of an image, and the output 616 may be an image which is aligned with that description.” Fig. 7, ¶0092-0094.).

Li does not explicitly disclose a person. Miller teaches identity of a person (Miller, ¶0019, “a deepfake of a real person” ¶0082, “an image of a real person”. ¶0115, “image of a real person may be animated” ¶0138, “generate an avatar or an image of a real person”). Li and Miller are considered to be analogous art because all pertain to image generation. It would have been obvious before the effective filing date of the claimed invention to have modified Li with the features of “identity of a person” as taught by Miller. The suggestion/motivation would have been in order to generate an image or video of a real person (Miller, ¶0019).

Li does not explicitly disclose “by iteratively computing an identity preserving loss and applying the identity preserving loss to the latent code”. Yin teaches iteratively computing an identity preserving loss and applying the identity preserving loss to the latent code (Yin, Fig. 3, ¶0023, “The environment may stylize both the geometry and texture of the initial image input according to text provided by the user while maintaining the identity of the original shape.” ¶0024, “The object generation environment 102 is used to constrain a CLIP-driven stylization to a valid domain using the ccGAN 104 to represent the identity of a 3D object in an intermediate latent space (w).” ¶0032, “FIG. 3 illustrates an environment 300 representing the 3D stylization module 106 along with a generative neural network (e.g., ccGAN 104) to generate one or more losses that may be used to generate one or more images based on the image input 110 and the text input 112.” “incorporate the stylization generator in order to generate one or more outputs for evaluation by the discriminator in order to determine a loss or cost.” ¶0038, “various embodiments generate a loss or cost associated with the output 322. In this example, losses may be generated with respect to content, CLIP, and style.” ¶0048, “Using the stylized GAN, a set of target multi-view renderings may be generated 610 along with a set of generated multi-view renderings from the object generation system 612. These sets of renderings may then be compared and evaluated for differences or losses, which may be used to develop a loss for the object generation system 614. This loss, or losses, may then be used to refine the object generation system 616.”) Li, Miller and Yin are considered to be analogous art because all pertain to image generation. It would have been obvious before the effective filing date of the claimed invention to have modified Li with the features of “iteratively computing an identity preserving loss and applying the identity preserving loss to the latent code” as taught by Yin. The suggestion/motivation would have been in order to identify features of the input object and tune them in accordance with the textual input to generate a modified 3D object that includes a new texture along with one or more geometric adjustments (Yin, abstract).

As to claim 2, claim 1 is incorporated and the combination of Li, Miller and Yin discloses generating the latent code comprises: generating, using an image encoder of the machine learning model, a preliminary latent code based on the input image and the text prompt (Li, Fig. 2-3, ¶0079, “Input to denoising model εθ 612 may include a noisy latent representation (e.g., noised latent representation zT 606t), and conditioning input 610 such as a text prompt describing desired content of an output image, e.g., “a hand holding a globe.” As shown, the noisy latent representation may be repeatedly and progressively fed into denoising model 612 to gradually remove noise from the latent representation vector based on the conditioning input 610”); generating, using the image generator of the machine learning model, a preliminary image based on the preliminary latent code (Li, ¶0080, “Ideally, the progressive outputs of repeated denoising models εθ 612 z′T 618t to z′0 618a may be an incrementally denoised version of the input latent representation z′T 618t, as conditioned by a conditioning input 610. The latent image representation produced using denoising model εθ 612 may be decoded using decoder 614 to provide an output 616 which is the denoised image.”); and computing the identity preserving loss based on the input image and the preliminary image (Li, ¶0081, “the output image 616 is then compared with the input training image 602 to compute a loss for updating the denoising model 612 via back propagation. In another embodiment, the latent representation 606a of input 602 may be compared with the denoised latent representation 618a to compute a loss for training. In another embodiment, a loss objective may be computed comparing the noise actually added (e.g., by noise ε 608) with the noise predicted by denoising model εθ 612. Denoising model εθ 612 may be trained based on this loss objective (e.g., parameters of denoising model εθ 612 may be updated in order to minimize the loss by gradient descent using backpropagation). Note that this means during the training process of denoising model εθ 612, an actual denoised image does not necessarily need to be produced (e.g., output 616 of decoder 614), as the loss is based on each intermediate noise estimation, not necessarily the final image.”).

As to claim 3, claim 2 is incorporated and the combination of Li, Miller and Yin discloses iteratively generating preliminary images and computing the identity preserving loss to optimize the latent code (Li, Fig. 2, ¶0056, “compute a loss function that measures the discrepancy between the predicted output and the expected output. For example, the loss function may be a cross entropy loss. Given the loss, the negative gradient of the loss function is computed with respect to each weight of each layer individually.” ¶0057, “Parameters of the neural network are updated backwardly from the last layer to the input layer (backpropagating) based on the computed negative gradient using an optimization algorithm to minimize the loss. The backpropagation from the last layer 443 to the input layer 441 may be conducted for a number of training samples in a number of iterative training epochs. In this way, parameters of the neural network may be gradually updated in a direction to result in a lesser or minimized loss, indicating the neural network has been trained to generate a predicted output value closer to the target output value with improved prediction accuracy.”).

As to claim 4, claim 2 is incorporated and the combination of Li, Miller and Yin discloses computing a multi-modal loss based on the preliminary image and the text prompt, wherein the latent code is optimized based on the multi-modal loss (Li, ¶0034, “The subject-driven image generation model may be provided an input subject image 102 including a subject, ad subject text 112, and a text prompt 118 to generate an output image 124. The output image may be compared to the ground-truth subject image (e.g., modified image 204) by loss computation 206. The loss computed by loss computation 206 may be used to update parameters of subject-driven image model 130 via backpropagation 208. In some embodiments, backpropagation 208 may update parameters of multimodal encoder 108, queries 110, text encoder 120, and/or image model 122. Loss computation 206 may include, for example, a cross entropy loss function. The subject representation learning stage is not specific to a certain subject, and is performed using images of a variety of subjects.” Miller, ¶0028, “Various loss functions are used, including pixel-wise differences, perceptual losses, and adversarial losses.” ¶0043).

As to claim 5, claim 1 is incorporated and the combination of Li, Miller and Yin discloses computing a perceptual loss based on the preliminary image and the input image, wherein the latent code is optimized based on the perceptual loss (Miller, ¶0028, “Various loss functions are used, including pixel-wise differences, perceptual losses, and adversarial losses.”).

As to claim 7, claim 1 is incorporated and the combination of Li, Miller and Yin discloses generating the latent code comprises: encoding, using a text encoder, the text prompt to obtain a text encoding, wherein the latent code is generated based on the text encoding (Li, ¶0027, “Input subject image 102 may be encoded by an image encoder 104 into an image feature vector. Image encoder 104 may be a pretrained image encoder which extracts generic image features. Subject text 112 may be encoded by text encoder 106 into a text feature vector. The image feature vector and text feature vector may be input to multimodal encoder 108.”).

As to claim 8, claim 1 is incorporated and the combination of Li, Miller and Yin discloses the identity preserving loss is based on the input image and a preliminary image generated by the machine learning model (Yin, abstract, “An input including a 3D mesh and texture may be provided to a trained system along with a textual input that includes parameters for object generation. Features of the input object may be identified and then tuned in accordance with the textual input to generate a modified 3D object that includes a new texture along with one or more geometric adjustments.” ¶0015, “a pretrained language-vision model, such as contrastive language-image pre-training (CLIP), produces a joint embedding of image and text. Using CLIP, systems and methods may render result into images, obtain embeddings of the rendered images, and try to match the embedding of the input text. The system may be improved by evaluating different costs or losses, where the costs or losses may be tuned to provide more preference to the input 3D shape or to the text input. Costs or losses may include style costs or losses, content costs or losses, or CLIP costs or losses, as examples.” ¶0024, “The object generation environment 102 is used to constrain a CLIP-driven stylization to a valid domain using the ccGAN 104 to represent the identity of a 3D object in an intermediate latent space (w).” ¶0032, “incorporate the stylization generator in order to generate one or more outputs for evaluation by the discriminator in order to determine a loss or cost.” ¶0038, “losses may be generated with respect to content, CLIP, and style.” ¶0048, “These sets of renderings may then be compared and evaluated for differences or losses, which may be used to develop a loss for the object generation system 614. This loss, or losses, may then be used to refine the object generation system 616.”).

As to claim 10, the combination of Li and Miller discloses an apparatus comprising: at least one processor; at least one memory storing instructions executable by the at least one processor; and a machine learning model comprising parameters stored in the at least one memory and trained to generate a synthetic image, wherein the machine learning model comprises an image encoder configured to generate a latent code based on a text prompt and an input image depicting a person, an optimization component configured to optimize the latent code by iteratively computing an identity preserving loss and applying the identity preserving loss to the latent code, and an image generator configured to generate the synthetic image including an element of the text prompt and preserving an identity of the person in the input image (See claim 1 for detailed analysis.).

As to claim 11, claim 10 is incorporated and the combination of Li and Miller discloses the machine learning model comprises a text encoder configured to encode the text prompt (Li, ¶0027, “Input subject image 102 may be encoded by an image encoder 104 into an image feature vector. Image encoder 104 may be a pretrained image encoder which extracts generic image features. Subject text 112 may be encoded by text encoder 106 into a text feature vector. The image feature vector and text feature vector may be input to multimodal encoder 108.”).

As to claim 12, claim 10 is incorporated and the combination of Li and Miller discloses the machine learning model comprises a generative adversarial network (GAN) (Miller, ¶0024, ¶0189, “a Latent Diffusion Model; a Conditional Variational Autoencoder; a Generative Adversarial Network”).

As to claim 13, claim 10 is incorporated and the combination of Li and Miller discloses the machine learning model comprises a variational autoencoder (VAE) (Miller, ¶0024, ¶0189, “a Latent Diffusion Model; a Conditional Variational Autoencoder; a Generative Adversarial Network”).
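To make the disputed mapping concrete: the independent claims recite an optimization loop in which a preliminary image is generated from a latent code, an identity preserving loss is computed against the input image, and the loss is applied back to the latent code iteratively (claims 4-5 add multi-modal and perceptual losses). The PyTorch sketch below is an editor's illustration of that general technique only; it is not taken from the application or the cited references, and the generator, identity encoder, and CLIP-style encoder are hypothetical stand-in modules.

```python
import torch
import torch.nn.functional as F

def optimize_latent(latent, input_image, text_embedding,
                    generator, id_encoder, clip_image_encoder,
                    steps=100, lr=0.01, w_id=1.0, w_clip=1.0, w_percep=0.5):
    """Refine `latent` so generator(latent) keeps the person's identity
    while incorporating the text prompt (hypothetical modules throughout)."""
    latent = latent.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    with torch.no_grad():
        id_target = id_encoder(input_image)   # identity features of the real person
    for _ in range(steps):
        opt.zero_grad()
        preliminary = generator(latent)       # preliminary image from the current latent
        # identity preserving loss: keep recognition features close to the input
        id_loss = 1 - F.cosine_similarity(id_encoder(preliminary), id_target, dim=-1).mean()
        # multi-modal (CLIP-style) loss: pull the image toward the text prompt
        clip_loss = 1 - F.cosine_similarity(clip_image_encoder(preliminary), text_embedding, dim=-1).mean()
        # stand-in for a perceptual loss (a real system would compare deep features)
        percep_loss = F.l1_loss(preliminary, input_image)
        loss = w_id * id_loss + w_clip * clip_loss + w_percep * percep_loss
        loss.backward()                       # gradients flow to the latent code, not model weights
        opt.step()
    return latent.detach()
```

What the sketch makes visible is the distinction the rejection turns on: Li's cited passages describe losses used to update the denoising model's weights, whereas the claims apply the loss to the latent code itself, the gap the examiner fills with Yin.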
As to claim 14, claim 10 is incorporated and the combination of Li and Miller discloses the machine learning model comprises a diffusion model (Li, ¶0012, “framework for a denoising diffusion model for generating or editing an image given a conditioning input such as a text prompt according to some embodiments” ¶0019, “The subject-driven image generation model may be built on a generic base image generation model, such as a denoising diffusion model, which generates an image based on an input prompt.”).

As to claim 15, the combination of Li and Miller discloses a non-transitory computer readable medium storing code for image processing, the code comprising instructions executable by at least one processor to: generate, using an image generator of a machine learning model, a preliminary image based on a preliminary latent code, wherein the preliminary latent code is based on an input image; optimize the preliminary latent code by iteratively computing an identity preserving loss and applying the identity preserving loss to obtain an optimized latent code, wherein the optimized latent code preserves an identity of a person in the input image; and generate, using the image generator of the machine learning model, a synthetic image based on the optimized latent code (See claim 1 for detailed analysis.).

As to claim 16, claim 15 is incorporated and the combination of Li and Miller discloses the code further comprising instructions executable by the at least one processor to: iteratively generate preliminary images and compute the identity preserving loss to update the optimized latent code (See claim 3 for detailed analysis.).

As to claim 17, claim 15 is incorporated and the combination of Li and Miller discloses the code further comprising instructions executable by the at least one processor to: compute a multi-modal loss based on the preliminary image and the text prompt, wherein the optimized latent code is optimized based on the multi-modal loss (See claim 4 for detailed analysis.).

As to claim 18, claim 15 is incorporated and the combination of Li and Miller discloses the code further comprising instructions executable by the at least one processor to: compute a perceptual loss based on the preliminary image and the input image, wherein the optimized latent code is optimized based on the perceptual loss (See claim 5 for detailed analysis.).

As to claim 19, claim 15 is incorporated and the combination of Li and Miller discloses the code further comprising instructions executable by the at least one processor to: encode, using a text encoder, the text prompt to obtain a text encoding, wherein the optimized latent code is generated based on the text encoding (See claim 7 for detailed analysis.).

As to claim 20, claim 15 is incorporated and the combination of Li and Miller discloses the preliminary latent code is generated by an image encoder based on a text prompt and the input image (Li, Fig. 2-Fig. 3, ¶0027, “Input subject image 102 may be encoded by an image encoder 104 into an image feature vector. Image encoder 104 may be a pretrained image encoder which extracts generic image features. Subject text 112 may be encoded by text encoder 106 into a text feature vector. The image feature vector and text feature vector may be input to multimodal encoder 108.”).

Claims 6 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Pub 2024/0161369 A1) in view of Miller et al. (US Pub 2025/0139160 A1), Yin et al. (US Pub 2024/0290054 A1) and Balakrishnan, Guha, et al. (“Towards causal benchmarking of bias in face analysis algorithms.” Deep Learning-Based Face Analytics. Cham: Springer International Publishing, 2021. 327-359.)

As to claim 6, claim 2 is incorporated and the combination of Li and Miller does not disclose identifying a biased depiction of the person by calculating an attribute prediction score based on the input image and the preliminary image. Balakrishnan teaches identifying a biased depiction of the person by calculating an attribute prediction score based on the input image and the preliminary image (Balakrishnan, abstract, “measuring algorithmic bias of face analysis algorithms, which directly manipulates the attributes of interest, e.g., gender and skin tone, in order to reveal causal links between attribute variation and performance change.” Page 328, “Measuring biases, i.e., performance differences, across protected attributes such as age, sex, gender, and ethnicity, is particularly important for decisions that may affect peoples’ lives.” Page 328, “Algorithmic bias is measured for two reasons”. Fig. 15.2, “Attribute-specific bias measurements are obtained by comparing the algorithm’s predictions with human annotations as the attributes are varied.” Page 347, 15.5.4 Analysis of Bias. Pages 353-354.). Li, Miller and Balakrishnan are considered to be analogous art because all pertain to image generation. It would have been obvious before the effective filing date of the claimed invention to have modified Li with the features of “identifying a biased depiction of the person by calculating an attribute prediction score based on the input image and the preliminary image” as taught by Balakrishnan. The suggestion/motivation would have been because measuring biases, i.e., performance differences, across protected attributes such as age, sex, gender, and ethnicity, is particularly important for decisions that may affect peoples’ lives (Balakrishnan, Page 328.).

As to claim 9, claim 1 is incorporated and the combination of Li and Miller discloses the synthetic image comprises an unbiased depiction of the person (See claim 1 for detailed analysis.). The combination of Li and Miller does not disclose unbiased. Balakrishnan teaches unbiased (Balakrishnan, Page 331, “The machine learning community is active in analyzing biases of learning models and how one may train models where bias is mitigated [32, 35–42], usually by ensuring that performance is equal across certain subgroups of a dataset.” Page 354, “The main reason for measuring algorithmic bias is to get rid of it. Error and bias measurements guide scientists and engineers towards effective corrective measures for improving the performance of their algorithms. It is instructive to view the different predictions of the two methods through this lens. The correlational study based on PPB (Fig. 15.1) may suggest that, in order to reduce biases in our classifiers, more images of dark-skinned women should be added to their training sets. The experimental method leads engineers in a different direction. First, more training images of long-haired men and short-haired women of all races are needed. Second, correcting age bias requires more training images in the child-teen and, possibly, senior age groups.”). Li, Miller and Balakrishnan are considered to be analogous art because all pertain to image generation. It would have been obvious before the effective filing date of the claimed invention to have modified Li with the features of “unbiased” as taught by Balakrishnan. The suggestion/motivation would have been that it is clearly better to start from unbiased synthesis methods (Balakrishnan, Page 355.).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YU CHEN whose telephone number is (571) 270-7951. The examiner can normally be reached M-F 8-5 PST, mid-day flex. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Xiao Wu, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YU CHEN/
Primary Examiner, Art Unit 2613
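Claims 6 and 9 turn on detecting a biased depiction via an “attribute prediction score” computed from the input image and the preliminary image. The sketch below shows one plausible reading of that limitation, assuming a hypothetical pretrained attribute classifier; the application's actual scoring method is not reproduced in this office action.

```python
import torch

def attribute_prediction_score(input_image, preliminary_image, attribute_classifier):
    """Per-attribute drift between the input and the generated depiction
    (attribute_classifier is a hypothetical pretrained module)."""
    with torch.no_grad():
        p_in = torch.softmax(attribute_classifier(input_image), dim=-1)
        p_out = torch.softmax(attribute_classifier(preliminary_image), dim=-1)
    # Total-variation distance between predicted attribute distributions:
    # near 0 means the edit preserved the attribute; large values flag a
    # biased shift (e.g., perceived skin tone or gender presentation).
    return 0.5 * (p_in - p_out).abs().sum(dim=-1)

# Example gate (threshold is illustrative): resample or correct edits
# whose protected-attribute drift is too large.
# if attribute_prediction_score(x, x_edit, clf).max() > 0.3:
#     resample_or_correct()
```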

Prosecution Timeline

Jan 04, 2024
Application Filed
Sep 02, 2025
Non-Final Rejection — §103
Dec 01, 2025
Examiner Interview Summary
Dec 01, 2025
Applicant Interview (Telephonic)
Dec 04, 2025
Response Filed
Dec 29, 2025
Final Rejection — §103
Mar 04, 2026
Examiner Interview Summary
Mar 04, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12604497: THIN FILM TRANSISTOR AND ARRAY SUBSTRATE (granted Apr 14, 2026; 2y 5m to grant)
Patent 12597176: IMAGE GENERATOR AND METHOD OF IMAGE GENERATION (granted Apr 07, 2026; 2y 5m to grant)
Patent 12589481: TOOL ATTRIBUTE MANAGEMENT IN AUTOMATED TOOL CONTROL SYSTEMS (granted Mar 31, 2026; 2y 5m to grant)
Patent 12588347: DISPLAY DEVICE (granted Mar 24, 2026; 2y 5m to grant)
Patent 12586265: LINE DRAWING METHOD, LINE DRAWING APPARATUS, ELECTRONIC DEVICE, AND COMPUTER READABLE STORAGE MEDIUM (granted Mar 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 68%
With Interview: 98% (+29.9%)
Median Time to Grant: 2y 10m
PTA Risk: Moderate

Based on 1052 resolved cases by this examiner. Grant probability derived from career allow rate.
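The headline figures reconcile arithmetically if one assumes the dashboard reports the raw career ratio and treats the interview lift as additive; the vendor's exact model is not shown, so this is a sketch, not a specification.

```python
granted, resolved = 711, 1052
allow_rate = granted / resolved          # 0.676 -> displayed as 68%
interview_lift = 0.299                   # +29.9 percentage points
print(f"career allow rate: {allow_rate:.1%}")                    # 67.6%
print(f"with interview:    {allow_rate + interview_lift:.1%}")   # 97.5%, displayed as ~98%
```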

Free tier: 3 strategy analyses per month