Prosecution Insights
Last updated: April 19, 2026
Application No. 18/541,377

TEXT EDITING OF DIGITAL IMAGES

Non-Final OA: §101, §103
Filed: Dec 15, 2023
Examiner: KY, KEVIN
Art Unit: 2671
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 1 (Non-Final)
Grant Probability: 76% (Favorable)
OA Rounds: 1-2
Time to Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 76% (420 granted / 549 resolved); +14.5% vs TC avg (above average)
Interview Lift: +25.3% allow-rate gain among resolved cases with an interview
Typical Timeline: 2y 6m average prosecution; 33 applications currently pending
Career History: 582 total applications across all art units

Statute-Specific Performance

§101: 17.6% (-22.4% vs TC avg)
§103: 46.5% (+6.5% vs TC avg)
§102: 20.8% (-19.2% vs TC avg)
§112: 9.9% (-30.1% vs TC avg)
Tech Center average is an estimate • Based on career data from 549 resolved cases
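The examiner metrics above (the 76% career allow rate from 420 of 549 resolved cases, the +25.3-point interview lift, and the per-statute rates, whose displayed deltas are all consistent with a single Tech Center baseline of roughly 40%) can be reproduced from per-application disposition records. A minimal sketch, assuming a hypothetical record layout rather than the tool's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Disposition:
    """One resolved application before this examiner (hypothetical record layout)."""
    granted: bool
    had_interview: bool = False
    rejection_statutes: set[str] = field(default_factory=set)  # e.g. {"101", "103"}

def allow_rate(cases: list[Disposition]) -> float:
    """Share of resolved cases that were granted."""
    return sum(c.granted for c in cases) / len(cases) if cases else 0.0

def interview_lift(cases: list[Disposition]) -> float:
    """Allow-rate difference between resolved cases with and without an interview."""
    with_iv = [c for c in cases if c.had_interview]
    without_iv = [c for c in cases if not c.had_interview]
    return allow_rate(with_iv) - allow_rate(without_iv)

def statute_rates(cases: list[Disposition],
                  tc_baseline: float = 0.40) -> dict[str, tuple[float, float]]:
    """Allow rate among cases rejected under each statute, plus the delta vs. the
    assumed Tech Center baseline (0.40 is inferred from the displayed deltas)."""
    out: dict[str, tuple[float, float]] = {}
    for statute in ("101", "102", "103", "112"):
        subset = [c for c in cases if statute in c.rejection_statutes]
        if subset:
            rate = allow_rate(subset)
            out[statute] = (rate, rate - tc_baseline)
    return out
```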

Office Action

§101, §103
DETAILED ACTION Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claim(s) 1-16 and 18-21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Bahng et al (NPL: Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation) in view of Li et al (NPL: ManiGAN: Text-Guided Image Manipulation). Regarding claim 1, Bahng discloses a method for digital image text editing (abstract) comprising: generating, by a processing device, a color profile by a model based on a text user input, the model including a generator trained using training digital images and training text as inputs (abstract: This paper proposes a novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette. In contrast to existing approaches, our model can understand rich text, whether it is a single word, a phrase, or a sentence, and generate multiple possible palettes from it; Our proposed model called Text2Colors consists of two conditional generative adversarial networks: the text-to-palette generation networks and the palette-based colorization networks; hardware (e.g. processor or CPU) is inherently needed to implement Text2Colors model; see further Fig. 4 for Text2Colors architecture); generating, by the processing device, a color gradient based on a feature representation that includes a color gradient based on the color profile (abstract: The former captures the semantics of the text input and produce relevant color palettes; pg. 2 Fig. 1 Colorization results of Text2Colors given text inputs. The text input is shown above the input grayscale image, and the generated palettes are on the right of the grayscale image); Bahng fails to teach where Li teaches segmenting, by the processing device, a digital object from a digital image, the digital object identified in the text user input (abstract: The ACM selects image regions relevant to the given text and then correlates the regions with corresponding semantic words for effective manipulation; pg. 3 3.1. Architecture: The proposed ACM further combines h with the original image features v in order to effectively select image regions corresponding to the given text, and then correlate those regions with text information for accurate manipulation); editing, by the processing device, the segmented digital object in the digital image based on the color gradient (pg. 1 1. 
Introduction: To achieve effective image manipulation guided by text descriptions, the key is to exploit both text and image crossmodality information, generating new attributes matching the given text and also preserving text-irrelevant contents of the original image; we propose a novel generative adversarial network for text-guided image manipulation (ManiGAN), which can generate high-quality new attributes matching the given text, and at the same time effectively reconstruct text-irrelevant contents of the original image; see Fig. 1 e.g. Given an original image that needs to be edited and a text provided by a user describing desired attributes, the goal is to edit parts of the image according to the given text while preserving text-irrelevant contents; for example A bird with black eye rings and a black bill, with a red crown and a red belly.); and presenting, by the processing device, the digital image including the edited segmented digital object (Fig. 1 showing the result of the edited image; e.g. A bird with black eye rings and a black bill, with a red crown and a red belly). Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of segmenting, by the processing device, a digital object from a digital image, the digital object identified in the text user input, editing, by the processing device, the segmented digital object in the digital image based on the color gradient, and presenting, by the processing device, the digital image including the edited segmented digital object from Li into the method as disclosed by Bahng. The motivation for doing this is to improve methods to semantically edit parts of an image to match a given text that describes desired attributes. Regarding claim 2, the combination of Bahng and Li discloses the method as described in claim 1, further comprising collecting the training digital images based on an image search performed using the training text (Bahng pg. 3 Introduction: We introduce our manually curated dataset called Palette-and-Text (PAT), which includes 10,183 pairs of a multi-word text and a multi-color palette; pg. 4 3 Palette-and-Text (PAT) Dataset: This section introduces our manually curated dataset named Palette-and-Text (PAT). PAT contains 10,183 text and five-color palette pairs, where the set of five colors in a palette is associated with its corresponding text description as shown in Figs. 3(b)-(d). Words vary with respect to their relationships with colors; some words are direct color words (e.g., pink, blue, etc.) while others evoke a particular set of colors (e.g., autumn or vibrant). To the best of our knowledge, there has been no dataset that matches a multi-word text and its corresponding 5-color palette. This dataset allows us to train our models for predicting semantically consistent color palettes with textual inputs.). Regarding claim 3, the combination of Bahng and Li discloses the method as described in claim 1, wherein the editing the segmented digital object includes applying a texture and one or more colors based on the color gradient to the segmented digital object by a texture machine learning model (Li abstract: The goal of our paper is to semantically edit parts of an image to match a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text; Fig. 1 e.g. 
A bird with black eye rings and a black bill, with a red crown and a red belly edits both a color and texture). The motivation to combine the references is discussed above in the rejection for claim 1. Regarding claim 4, the combination of Bahng and Li discloses the method as described in claim 3, wherein the texture machine learning model is a texture generative adversarial network trained in a first stage to perform digital image editing (Li pg. 1 Introduction: besides modifying the required attributes, both models [7, 24] also change the texture of the bird (first row) and the structure of the scene (second row); pg. 5 3.4 Training: To train the network, we follow [18] and adopt adversarial training, where our network and the discriminators (D1, D2, D3, DDCM) are alternatively optimised.) and a second stage to fine tune the texture machine learning model to reproduce and propagate textures (Li pg. 4 3.3. Detail Correction Module: To further enhance the details and complete missing contents in the synthetic image, we propose a detail correction module (DCM), exploiting word-level text information and fine-grained image features). The motivation to combine the references is discussed above in the rejection for claim 1. Regarding claim 5, the combination of Bahng and Li discloses the method as described in claim 1, wherein the text user input describes a visual object and a visual attribute, the visual object specifying a visual context of the visual attribute (Li Fig. 1 e.g. user given text is “A bird with black eye rings and a black bill, with a red crown and a red belly”). The motivation to combine the references is discussed above in the rejection for claim 1. Regarding claim 6, the combination of Bahng and Li discloses the method as described in claim 5, wherein the visual object references a physical object and the visual attribute describes an appearance of the physical object (Li Fig. 1 e.g. user given text is “A bird with black eye rings and a black bill, with a red crown and a red belly”). The motivation to combine the references is discussed above in the rejection for claim 1. Regarding claim 7, the combination of Bahng and Li discloses the method as described in claim 1, wherein: the generator is a generator of a generative adversarial network (Bahng abstract: Our proposed model called Text2Colors consists of two conditional generative adversarial networks: the text-topalette generation networks and the palette-based colorization networks) that receives as an input the training text as part of training (Bahng pg. 6 4 Text2Colors: Text-Driven Colorization: Text2Colors consists of two networks: Text-to-Palette Generation Networks (TPN) and Palette-based Colorization Networks (PCN). We train the first networks to generate color palettes given a multi-word text); and the generator is trained using a discriminator as part of the generative adversarial network, the discriminator is configured to receive as an input the training text, image features extracted from the training digital images using machine learning, and a candidate color profile generated by the generator as part of the training of the generator (Bahng pg. 8 Discriminator. For the discriminator D0, the conditioning variable ¯c and the color palette are concatenated and fed into a series of fully-connected layers. By jointly learning features across the encoded text and palette, the discriminator classifies whether the palettes are real or fake). 
Regarding claim 8, the combination of Bahng and Li discloses the method as described in claim 1, wherein the color profile includes a color histogram representation (Bahng Fig. 3 Palette-and-Text (PAT) dataset). Regarding claim 9, the combination of Bahng and Li discloses the method as described in claim 1, wherein the feature representation further includes one or more of a texture, contrast, lighting, or luminance based on the text user input (Li Fig. 1 e.g. user given text is “A bird with black eye rings and a black bill, with a red crown and a red belly”). The motivation to combine the references is discussed above in the rejection for claim 1. Regarding claim 10, Bahng discloses a system for digital image text editing (abstract) comprising: collecting training digital images based on an image search performed using training text (pg. 3 Introduction: We introduce our manually curated dataset called Palette-and-Text (PAT), which includes 10,183 pairs of a multi-word text and a multi-color palette; pg. 4 3 Palette-and-Text (PAT) Dataset: This section introduces our manually curated dataset named Palette-and-Text (PAT). PAT contains 10,183 text and five-color palette pairs, where the set of five colors in a palette is associated with its corresponding text description as shown in Figs. 3(b)-(d). Words vary with respect to their relationships with colors; some words are direct color words (e.g., pink, blue, etc.) while others evoke a particular set of colors (e.g., autumn or vibrant). To the best of our knowledge, there has been no dataset that matches a multi-word text and its corresponding 5-color palette. This dataset allows us to train our models for predicting semantically consistent color palettes with textual inputs.); generating a feature representation including a color profile by a model based on a text user input, the model including a generator trained using the training text and training feature representations generated from the training digital images (abstract: This paper proposes a novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette. In zontrast to existing approaches, our model can understand rich text, whether it is a single word, a phrase, or a sentence, and generate multiple possible palettes from it; Our proposed model called Text2Colors consists of two conditional generative adversarial networks: the text-to-palette generation networks and the palette-based colorization networks; see further Fig. 4 for Text2Colors architecture); Bahng fails to teach where Li teaches a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising (pg. 1 Introduction: e.g. applications in video games, image editing, and computer-aided design; e.g. computers inherently have memory and processing components): editing a digital object in a digital image to have colors based on the feature representation (pg. 1 1. 
Introduction: To achieve effective image manipulation guided by text descriptions, the key is to exploit both text and image crossmodality information, generating new attributes matching the given text and also preserving text-irrelevant contents of the original image; we propose a novel generative adversarial network for text-guided image manipulation (ManiGAN), which can generate high-quality new attributes matching the given text, and at the same time effectively reconstruct text-irrelevant contents of the original image; see Fig. 1 e.g. Given an original image that needs to be edited and a text provided by a user describing desired attributes, the goal is to edit parts of the image according to the given text while preserving text-irrelevant contents; for example A bird with black eye rings and a black bill, with a red crown and a red belly.); and presenting the digital image with the edited digital object by the processing device (Fig. 1 showing the result of the edited image; e.g. A bird with black eye rings and a black bill, with a red crown and a red belly). Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of editing a digital object in a digital image to have colors based on the feature representation and presenting the digital image with the edited digital object by the processing device from Li into the system as disclosed by Bahng. The motivation for doing this is to improve systems to semantically edit parts of an image to match a given text that describes desired attributes. Regarding claim 11, the combination of Bahng and Li discloses the system as described in claim 10, wherein the text user input specifies a visual object that references a physical object and a visual attribute that describes an appearance of the physical object (Li Fig. 1 e.g. user given text is “A bird with black eye rings and a black bill, with a red crown and a red belly”). The motivation to combine the references is discussed above in the rejection for claim 10. Regarding claim 12, the combination of Bahng and Li discloses the system as described in claim 10, wherein collecting the training digital images includes identifying salient portions of the training digital images that are salient to the training text and the training feature representations are generated based on the salient portions of the training digital images (Li abstract: The ACM selects image regions relevant to the given text and then correlates the regions with corresponding semantic words for effective manipulation; pg. 5 Training: Moreover, to prevent the model from learning an identity mapping and to promote the model to learn a good (S → I) mapping in the regions relevant to the given text, we propose the following training schemes. Firstly, we introduce a regularisation term Lreg as Eq. (2) in the generator objective to produce a penalty if the generated image becomes the same as the input image. Secondly, we choose to early stop the training when the model achieves the best trade-off between the generation of new visual attributes aligned with the given text descriptions and the reconstruction of text irrelevant contents existing in the original images). The motivation to combine the references is discussed above in the rejection for claim 10. 
Regarding claim 13, the combination of Bahng and Li discloses the system as described in claim 12, wherein a convolutional neural network based classification model is used to identify the salient portions of the training digital images using visual attention to focus on parts of the training digital images (Li pg. 3 3.1. Architecture: For each stage, the text features are refined with several convolutional layers to produce hidden features h. The proposed ACM further combines h with the original image features v in order to effectively select image regions corresponding to the given text, and then correlate those regions with text information for accurate manipulation). The motivation to combine the references is discussed above in the rejection for claim 10. Regarding claim 14, the combination of Bahng and Li discloses the system as described in claim 10, wherein editing the digital object includes using an additional generator trained as part of a texture generative adversarial network to apply a texture and colors from the feature representation within an outline of the digital object as segmented within the digital image (Li pg. 5 Implementation: There are three stages in the main module, and each stage contains a generator and a discriminator; pg. 5 3.4 Training: To train the network, we follow [18] and adopt adversarial training, where our network and the discriminators (D1, D2, D3, DDCM) are alternatively optimised). The motivation to combine the references is discussed above in the rejection for claim 10. Regarding claim 15, the combination of Bahng and Li discloses the system as described in claim 10, wherein: the generator is a generator of a generative adversarial network (Bahng abstract: Our proposed model called Text2Colors consists of two conditional generative adversarial networks: the text-topalette generation networks and the palette-based colorization networks) that receives as an input the training text as part of training (Bahng pg. 6 4 Text2Colors: Text-Driven Colorization: Text2Colors consists of two networks: Text-to-Palette Generation Networks (TPN) and Palette-based Colorization Networks (PCN). We train the first networks to generate color palettes given a multi-word text); and the generator is trained using a discriminator as part of the generative adversarial network, the discriminator is configured to receive as an input the training text, image features extracted from the training digital images using machine learning, and a candidate feature representation generated by the generator as part of the training of the generator (Bahng pg. 8 Discriminator. For the discriminator D0, the conditioning variable ¯c and the color palette are concatenated and fed into a series of fully-connected layers. By jointly learning features across the encoded text and palette, the discriminator classifies whether the palettes are real or fake). Regarding claim(s) 16 and 18-21 (drawn to a CRM): The rejection/proposed combination of Bahng and Li, explained in the rejection of method claim(s) 1-3, 8 and 7, respectively, anticipates/renders obvious the steps of the computer readable medium of claim(s) 16 and 18-21 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments similar to that presented above for claim(s) ) 1-3, 8 and 7 is/are equally applicable to claim(s) 16 and 18-21. Response to Amendment The applicant’s supplemental amendment filed 06/04/2025 has been considered and entered. 
Bahng discloses a method for digital image text editing comprising: generating, by a processing device, a color profile by a model based on a text user input, the model including a generator trained using training digital images and training text as inputs in the abstract, where “this paper proposes a novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette. In contrast to existing approaches, our model can understand rich text, whether it is a single word, a phrase, or a sentence, and generate multiple possible palettes from it; Our proposed model called Text2Colors consists of two conditional generative adversarial networks: the text-to-palette generation networks and the palette-based colorization networks.” See further Fig. 4 for the Text2Colors architecture, wherein “During training, generator G0 learns to produce a color palette ŷ given a set of conditional variables ĉ processed from input text x = {x1, ..., xT}. Generator G1 learns to predict a colorized output of a grayscale image L given a palette p extracted from the ground truth image. At test time, the trained generators G0 and G1 are used to produce a color palette from given text and then colorize a grayscale image reflecting the generated palette.” In view of the supplemental amendments, no rejection under 35 U.S.C. 101 is presented because, when considered as a whole, the claims integrate any alleged abstract idea into a practical application by reciting a specific technological process for modifying digital images and presenting them in a manner that cannot be performed by a human and improves computer-based image processing.

During the interview on 06/03/2025, the examiner proposed amendments that would place the case in condition for allowance. The proposed amendments include "generating, by a processing device, a color profile by a model based on a text user input, the model including a generator trained using adversarial training against a discriminator training digital images and training text as inputs" and a presenting step in all of the independent claims. However, at this time, the applicant did not want to incorporate the limitation of a generator trained using adversarial training against a discriminator. The previously proposed amendments, which were not accepted by the applicant, no longer place the case in condition for allowance. Bahng et al teaches “a generator trained using adversarial training against a discriminator” in the abstract: “This paper proposes a novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette. In contrast to existing approaches, our model can understand rich text, whether it is a single word, a phrase, or a sentence, and generate multiple possible palettes from it; Our proposed model called Text2Colors consists of two conditional generative adversarial networks: the text-to-palette generation networks and the palette-based colorization networks”, and further on pg. 9 4.2 Palette-based Colorization Networks (PCN) “the generator learns to be close to the ground truth image with plausible colorizations, while incorporating palette colors to the output image to fool the discriminator”.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571)272-7648.
The examiner can normally be reached Monday-Friday 9-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /KEVIN KY/Primary Examiner, Art Unit 2671
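Technical context: the §103 rejection turns on Bahng's Text2Colors, which the examiner cites as teaching "a generator trained using adversarial training against a discriminator", the very limitation the applicant declined to incorporate after the interview. A minimal PyTorch-style sketch of that kind of conditional text-to-palette setup (generator maps an encoded text condition to a palette; discriminator scores condition-palette pairs as real or fake) is below. The layer sizes, dimensions, and training step are illustrative assumptions, not Bahng's actual implementation or the claimed method.

```python
import torch
import torch.nn as nn

PALETTE_DIM = 15   # assumption: 5 colors x 3 channels (Bahng's palettes have 5 colors)
TEXT_DIM = 128     # assumption: size of the encoded text condition

class PaletteGenerator(nn.Module):
    """G0-style generator: encoded text condition -> candidate color palette."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM, 256), nn.ReLU(),
            nn.Linear(256, PALETTE_DIM), nn.Sigmoid(),  # color values in [0, 1]
        )
    def forward(self, text_cond):
        return self.net(text_cond)

class PaletteDiscriminator(nn.Module):
    """D0-style discriminator: (text condition, palette) -> real/fake logit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM + PALETTE_DIM, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )
    def forward(self, text_cond, palette):
        return self.net(torch.cat([text_cond, palette], dim=-1))

# One adversarial training step (text encoder, data loading, and loss weighting omitted).
G, D = PaletteGenerator(), PaletteDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

text_cond = torch.randn(8, TEXT_DIM)        # stand-in for encoded training text
real_palette = torch.rand(8, PALETTE_DIM)   # stand-in for ground-truth palettes

# Discriminator step: real (text, palette) pairs -> 1, generated pairs -> 0.
d_loss = (bce(D(text_cond, real_palette), torch.ones(8, 1))
          + bce(D(text_cond, G(text_cond).detach()), torch.zeros(8, 1)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: try to make the discriminator label generated palettes as real.
g_loss = bce(D(text_cond, G(text_cond)), torch.ones(8, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```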

Prosecution Timeline

Dec 15, 2023
Application Filed
Jun 03, 2025
Examiner Interview (Telephonic)
Jun 08, 2025
Examiner Interview Summary
Jan 05, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597158
POSE ESTIMATION
2y 5m to grant • Granted Apr 07, 2026
Patent 12597291
IMAGE ANALYSIS FOR PERSONAL INTERACTION
2y 5m to grant • Granted Apr 07, 2026
Patent 12586393
KNOWLEDGE-DRIVEN SCENE PRIORS FOR SEMANTIC AUDIO-VISUAL EMBODIED NAVIGATION
2y 5m to grant • Granted Mar 24, 2026
Patent 12586559
METHOD AND APPARATUS FOR GENERATING SPEECH OUTPUTS IN A VEHICLE
2y 5m to grant • Granted Mar 24, 2026
Patent 12579382
NATURAL LANGUAGE GENERATION USING KNOWLEDGE GRAPH INCORPORATING TEXTUAL SUMMARIES
2y 5m to grant • Granted Mar 17, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 76%
With Interview: 99% (+25.3%)
Median Time to Grant: 2y 6m
PTA Risk: Low
Based on 549 resolved cases by this examiner. Grant probability derived from career allow rate.
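The projection arithmetic appears simple: the 76% grant probability is the career allow rate, and 76% plus the +25.3-point interview lift overshoots 100%, which is consistent with the 99% with-interview figure being capped. A minimal sketch under those assumptions (the additive lift and the 99% cap are inferences from the displayed numbers, not documented behavior):

```python
def project_grant_probability(allow_rate: float,
                              interview_lift: float,
                              cap: float = 0.99) -> tuple[float, float]:
    """Baseline grant probability and the with-interview projection.

    Assumes the interview lift is additive in percentage points and the result
    is capped: 0.76 + 0.253 = 1.013, capped to 0.99, matching the page.
    """
    base = allow_rate
    with_interview = min(base + interview_lift, cap)
    return base, with_interview

print(project_grant_probability(0.76, 0.253))  # (0.76, 0.99)
```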
