Prosecution Insights
Last updated: April 19, 2026
Application No. 18/171,046

SUPER-RESOLUTION ON TEXT-TO-IMAGE SYNTHESIS WITH GANS

Non-Final OA §103
Filed: Feb 17, 2023
Examiner: OMETZ, RACHEL ANNE
Art Unit: 2668
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 3 (Non-Final)
Grant Probability: 69% (Favorable)
Expected OA Rounds: 3-4
Median Time to Grant: 2y 11m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 69% (18 granted / 26 resolved), +7.2% vs TC avg (above average)
Interview Lift: +30.1% higher allow rate among resolved cases with an interview (a strong lift)
Typical Timeline: 2y 11m average prosecution; 24 applications currently pending
Career History: 50 total applications across all art units

Statute-Specific Performance

§101: 3.1% (-36.9% vs TC avg)
§102: 18.8% (-21.2% vs TC avg)
§103: 62.1% (+22.1% vs TC avg)
§112: 14.7% (-25.3% vs TC avg)

Based on career data from 26 resolved cases • comparisons are against Tech Center average estimates

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 1, 2025 has been entered.

Claim Status

Claims 1-13, 15, and 21-26 were pending for examination per the amendments filed for Application No. 18/171,046 on August 20, 2025. In the remarks and amendments received on December 1, 2025, claims 1, 7, 9, and 21 are amended, claims 6 and 25 are cancelled, and no claims are added. Accordingly, claims 1-5, 7-13, 15, 21-24, and 26 are pending for examination in the application.

Response to Arguments

Applicant's arguments filed December 1, 2025 have been fully considered but they are not persuasive.

The examiner respectfully disagrees with applicant's assertion that Zhang NPL in view of Lee does not teach “generating an adaptive convolution filter based on the style vector, wherein the adaptive convolution filter comprises a convolution matrix with parameters determined based on the style vector”. The applicant asserts (pgs. 9-11 of Applicant's Remarks) that Lee does not teach this limitation because of applicant's specific techniques for using the adaptive convolution filter, and that Lee's filter is not adaptive in the same way as applicant's, or is not adaptive at all. The examiner asserts that Lee does teach the claimed limitation, in Paragraph 0007: “a method for generating an adaptive convolution filter for text classification performed by a server according to the first aspect of the present disclosure includes the steps of: generating text data by performing a preprocessing process on raw data collected through a network; generating a context vector through a filter generation model using the text data as input; and generating a convolution filter through the filter generation model based on the context vector”. In other words, Lee's “context vector” supplies the parameters from which a “convolution filter” is generated, so the filter is based on the “context vector”. A unique context vector therefore yields an equally unique convolution filter with parameters determined by that context vector, making the filter adaptive to the context vector. Therefore, Zhang NPL in view of Lee teaches the limitation.
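Note: the dynamic-filtering pattern the examiner describes above (a conditioning vector determines the parameters of a convolution filter) can be sketched as follows. This is an illustrative sketch only; the module, names, and shapes are assumptions for exposition, not Lee's or applicant's actual implementation.

```python
# Sketch of an "adaptive convolution": the filter's weights are not fixed
# parameters but are predicted from a conditioning (style/context) vector,
# so each distinct vector yields a distinct convolution matrix.
# All names and shapes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveConv2d(nn.Module):
    def __init__(self, style_dim, in_ch, out_ch, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # Linear map from the style vector to a flattened weight tensor.
        self.weight_gen = nn.Linear(style_dim, out_ch * in_ch * k * k)

    def forward(self, feature_map, style_vec):
        w = self.weight_gen(style_vec).view(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(feature_map, w, padding=self.k // 2)

conv = AdaptiveConv2d(style_dim=64, in_ch=16, out_ch=16)
fmap = torch.randn(1, 16, 32, 32)   # image feature map
style = torch.randn(64)             # style/context vector
out = conv(fmap, style)             # (1, 16, 32, 32)
```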
Applicant's remarks on pgs. 12-14 of Applicant's Remarks have been fully considered but are moot because the arguments do not apply to the new combination of references, that is, Zhang NPL in view of Lee and newly cited reference Suresha. Suresha teaches performing convolutions on image feature maps generated from low-resolution images in order to create high-resolution images; see the citations below for more context.

The examiner respectfully disagrees with applicant's assertions (pgs. 14-16 of Applicant's Remarks) that 1) Zhang NPL in view of Lee does not teach “generating a style vector representing the target style based on the text description”, and 2) that the examiner points to Zhang NPL's “embedding φt” to teach “a text embedding”. The examiner asserts that this interpretation is incorrect and that applicant's “a text embedding” corresponds to Zhang NPL's “Text description t” instead. Further, the examiner interprets “a target style” to mean, roughly, a computer-readable version of the (human-readable) text description. A style vector contains the context needed (a “target style”) for a computer to generate an adaptive convolution filter based on the style vector. Because Zhang NPL's “embedding φt” is based on Zhang NPL's “Text description t”, Zhang NPL teaches the claimed limitation.

The examiner respectfully disagrees with applicant's assertions (pgs. 16-18) that there is no motivation to combine Lee and Zhang NPL to perform at least the last two generating steps recited in claim 1. Although Lee is indeed in the field of natural language processing (NLP), Lee is still relevant to the present application because the present application claims a computer interpreting language (“text”), that is, “generating a style vector representing the target style based on the text description”. Lee's adaptive convolution filter admittedly does not produce an image as in the present application, but the general structure of Lee's creation of the filter is analogous to applicant's creation of an adaptive convolution filter, and, in view of Zhang NPL's high-resolution image creation mechanism, Lee is appropriate to apply to the present application in view of Zhang NPL.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 5, 8-11, 15, 21, 24, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks," arXiv.org, 2017 (hereinafter “Zhang NPL”), in view of Suresha et al. (US-20230082567-A1) and Lee et al. (KR-20210121537-A).

Regarding claim 1, Zhang NPL teaches a method comprising: obtaining a low-resolution image (Zhang NPL, Fig. 2, “Stage-I results”) and a text description (Zhang NPL, Fig. 2, “Text description t” that is embedded into the conditioning augmentation immediately before “Stage-II Generator G for refinement”), wherein the text description indicates a target style (Zhang NPL, the text descriptions in Fig. 8 dictate the style of the output image); generating a style vector representing the target style based on the text description (Zhang NPL, Fig. 2, “Embedding φt”); and generating a high-resolution image (Zhang NPL, Fig. 2, “256 x 256 results”) corresponding to the low-resolution image (Zhang NPL, Fig. 2, “Stage-I results”) by performing a convolution process (Zhang NPL, Section 3.5, “up-sampling blocks consist of… a 3×3 stride 1 convolution” and “The down-sampling blocks consist of 4×4 stride 2 convolutions”).

Zhang NPL fails to teach the following limitations as further claimed. Suresha, however, further teaches: generating an image feature map (Suresha, “CNN… includes two convolution layers 504 for feature maps extraction and a sub-pixel convolution layer 506 that aggregates the feature maps from the low resolution space,” Para [0080]) based on the low-resolution image (Suresha, “input a low resolution seismic image,” Para [0080]); and generating a high-resolution image (Zhang NPL, Fig. 2, “256 x 256 results”) corresponding to the low-resolution image (Zhang NPL, Fig. 2, “Stage-I results”) by performing a convolution process (Zhang NPL, Section 3.5, “up-sampling blocks consist of… a 3×3 stride 1 convolution” and “The down-sampling blocks consist of 4×4 stride 2 convolutions”) on the image feature map (Suresha, “aggregates the feature maps from the low resolution space,” Para [0080]).

And Lee further teaches: generating an adaptive convolution filter (Lee, “generating an adaptive convolution filter for text classification,” Para [0007]) based on the style vector (Lee, “context vector”; “generating a convolution filter through the filter generation model based on the context vector,” Para [0007], where the “filter generation model generates an adaptive convolution filter,” Para [0007]), wherein the adaptive convolution filter comprises a convolution matrix with parameters (inherent to any convolution filter and to the process of convolution in machine learning) determined based on the style vector (Lee, “context vector”); and generating a high-resolution image (Zhang NPL, Fig. 2, “256 x 256 results”) corresponding to the low-resolution image (Zhang NPL, Fig. 2, “Stage-I results”) by performing a convolution process (Zhang NPL, Section 3.5, “up-sampling blocks consist of… a 3×3 stride 1 convolution” and “The down-sampling blocks consist of 4×4 stride 2 convolutions”) on the image feature map (Suresha, “aggregates the feature maps from the low resolution space,” Para [0080]) using the adaptive convolution filter based on the style vector (Lee, “context vector”; “generating a convolution filter through the filter generation model based on the context vector,” Para [0007]).

Lee is considered to be analogous to the claimed invention because both are in the same field of generating adaptive convolution filters from text inputs in computer vision applications. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Lee into Zhang NPL for the benefit of generating more accurate high-resolution images compared to the initial low-resolution images. Suresha is considered to be analogous to the claimed invention because both are in the same field of creating a high-resolution image from a low-resolution image. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Suresha into Zhang NPL and Lee for the benefit of spatially-aware feature extraction.
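Note: the Suresha citation above follows a standard super-resolution pattern: convolution layers extract feature maps in low-resolution space, and a sub-pixel convolution rearranges channel information into spatial detail. A minimal sketch under assumed layer sizes, not Suresha's actual architecture:

```python
# Sketch of a sub-pixel-convolution super-resolution head: feature maps are
# extracted at low resolution, then PixelShuffle trades channels for spatial
# resolution. Layer sizes are assumptions for exposition.
import torch
import torch.nn as nn

scale = 4
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),   # feature map extraction
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3 * scale**2, 3, padding=1),   # sub-pixels packed in channels
    nn.PixelShuffle(scale),                      # aggregate into HR space
)
lr = torch.randn(1, 3, 64, 64)   # low-resolution image
hr = model(lr)                   # (1, 3, 256, 256) high-resolution output
```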
Regarding claim 5, the rejection of claim 1 is incorporated herein. Zhang NPL, Suresha, and Lee teach the method of claim 1, further comprising: encoding the low-resolution image to obtain an image embedding, wherein the style vector (based on the text description) is generated based on the image embedding (Zhang NPL, “StackGAN does not achieve good results by simply memorizing training samples but by capturing the complex underlying language-image relations. We extract visual features from our generated images and all training images by the Stage-II discriminator D of our StackGAN,” Section 4.2, Quantitative and qualitative results).

Regarding claim 8, the rejection of claim 1 is incorporated herein. Zhang NPL, Suresha, and Lee teach the method of claim 1, further comprising: identifying a plurality of predetermined convolution filters (Lee, “output value of the hashing function is used as an index to extract candidate vectors from the shared matrix,” Para [0045]); and combining the plurality of predetermined convolution filters based on the style vector (Lee, “the method of combining candidate filters of the shared matrix is dynamically determined based on the input,” Para [0050]) to obtain the adaptive convolution filter (Lee, “extracted candidate vectors are linearly combined to form a single vector, which is used as a convolution filter,” Para [0045]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Lee into Zhang NPL and Suresha for the benefit of accurate and high-resolution image generation.

Claims 9-10 are system claims that correspond to method claims 1-2. Implementation of the system claims would necessitate the method claims. Therefore, the rejections of claims 1-2 apply fully to claims 9-10.

Regarding claim 11, the rejection of claim 9 is incorporated herein. Zhang NPL, Suresha, and Lee teach the system of claim 9, wherein: the image generation network comprises a generative adversarial network (GAN) (Zhang NPL, Fig. 2, “StackGAN”).

Regarding claim 15, the rejection of claim 9 is incorporated herein. Zhang NPL in view of Suresha and Lee teaches the system of claim 9, further comprising: a discriminator network (Zhang NPL, Fig. 2, “Stage-II Discriminator D”) configured to generate an image embedding (Zhang NPL, Fig. 2, the pink 3D rectangle in “Stage-II Discriminator D”) and a conditioning embedding (Zhang NPL, Fig. 2, the green 3D rectangle in “Stage-II Discriminator D”), wherein the discriminator network is trained together with the image generation network (Zhang NPL, Fig. 2, “Stage-II Generator G for refinement”) using an adversarial training loss (Zhang NPL, Fig. 2, “{0,1}” in “Stage-II Discriminator D”) based on the image embedding and the conditioning embedding.

Claims 21, 24, and 26 are non-transitory computer readable medium claims that correspond to method claims 1, 5, and 8. Implementation of the non-transitory computer readable medium claims would necessitate the method claims. Therefore, the rejections of claims 1, 5, and 8 apply to claims 21, 24, and 26.
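Note: the combination mechanism cited for claim 8 above (candidate filters extracted from a shared bank and linearly combined into a single convolution filter, with the combination determined by the input) can be sketched as follows. The softmax mixer here is a generic stand-in, an assumption rather than Lee's hashing-based selection.

```python
# Sketch of combining predetermined convolution filters: a bank of candidate
# filters is linearly mixed with input-dependent weights to form one adaptive
# filter. Shapes and the softmax mixer are assumptions for exposition.
import torch
import torch.nn.functional as F

num_candidates, in_ch, out_ch, k = 8, 16, 16, 3
bank = torch.randn(num_candidates, out_ch, in_ch, k, k)  # predetermined filters
mixer = torch.nn.Linear(64, num_candidates)              # style -> mix weights

style = torch.randn(64)
weights = torch.softmax(mixer(style), dim=0)             # input-dependent mix
adaptive_filter = (weights.view(-1, 1, 1, 1, 1) * bank).sum(dim=0)

fmap = torch.randn(1, in_ch, 32, 32)
out = F.conv2d(fmap, adaptive_filter, padding=k // 2)    # (1, 16, 32, 32)
```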
Claim(s) 2 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks," arXiv.org, 2017 (“Zhang NPL”) in view of Suresha et al. (US-20230082567-A1) and Lee et al. (KR-20210121537-A) as applied to claim 1 above, and further in view of Zhang et al. (US-20230081171-A1) (hereinafter “Zhang ‘171”).

Regarding claim 2, the rejection of claim 1 is incorporated herein. Zhang NPL, Suresha, and Lee teach the method of claim 1, but fail to teach the following limitations as further claimed. Zhang ‘171 teaches further comprising: encoding the text description of the low-resolution image to obtain a text embedding (Zhang NPL, Fig. 2, “Text description t” has an arrow into “Embedding φt”); and transforming the text embedding (Zhang ‘171, Fig. 3, 324 “word embeddings”) to obtain a global vector (Zhang ‘171, Fig. 3, 326 “global sentence embeddings”) corresponding to the text description as a whole and a plurality of local vectors (Zhang ‘171, “a local feature embedding for a portion of the particular textual description may be obtained,” Para [0091]) corresponding to individual tokens of the text description (Zhang ‘171, “a portion of the particular textual description” (Para [0091]), such as an individual word in a sentence), wherein the style vector (Zhang NPL, Fig. 2, “Embedding φt”) is generated based on the global vector (Zhang ‘171, Fig. 3, 326 “global sentence embeddings”) and the high-resolution image (Zhang NPL, Fig. 2, “256 x 256 results”) is generated based on the plurality of local vectors (Zhang ‘171, “a local feature embedding for a portion of the particular textual description may be obtained,” Para [0091]).

Zhang ‘171 is considered to be analogous to the claimed invention because both are in the same field of generating images from a textual input. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Zhang ‘171 into Zhang NPL, Suresha, and Lee for the benefit of a more accurate high-resolution image result.

Claim 22 is a non-transitory computer readable medium claim that corresponds to method claim 2. Implementation of the non-transitory computer readable medium claim would necessitate the method claim. Therefore, the rejection of claim 2 applies to claim 22.
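Note: the global/local decomposition cited for claim 2 (one vector for the description as a whole, plus one vector per token) is commonly obtained by pooling per-token embeddings. A minimal sketch; the random embedding table and token ids are placeholders, not Zhang ‘171's trained encoder:

```python
# Sketch of global vs. local text vectors: local vectors are per-token
# embeddings; the global vector pools them to represent the whole sentence.
import torch

embed = torch.nn.Embedding(10_000, 128)       # placeholder, untrained

token_ids = torch.tensor([14, 92, 7, 405])    # hypothetical token ids
local_vectors = embed(token_ids)              # (4, 128): one per token
global_vector = local_vectors.mean(dim=0)     # (128,): description as a whole
```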
Claim(s) 3-4, 7, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks," arXiv.org, 2017 (“Zhang NPL”) in view of Suresha et al. (US-20230082567-A1) and Lee et al. (KR-20210121537-A) as applied to claim 1 above, and further in view of Zhang et al. (US-20230081171-A1) (hereinafter “Zhang ‘171”) and Aberman et al. (US-20240037822-A1).

Regarding claim 3, the rejection of claim 2 is incorporated herein. Zhang NPL, Suresha, Lee, and Zhang ‘171 teach the method of claim 2, but fail to teach the following limitations as further claimed. Aberman teaches further comprising: performing a cross-attention process (Aberman, “the embeddings of the visual and textual features are fused using cross-attention layers that produce spatial attention maps for each textual token,” Para [0022]) based on the plurality of local vectors (Aberman, “each textual token”), wherein the high-resolution image (Zhang NPL, Fig. 2, “256 x 256 results”) is generated based on the cross-attention process (Aberman, “cross-attention layers that produce spatial attention maps”).

Aberman is considered to be analogous to the claimed invention because both are in the same field of generating images from a textual input. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Aberman into Zhang NPL, Suresha, Lee, and Zhang ‘171 for the benefit of providing high correlation between the inputted text and the outputted image.

Regarding claim 4, the rejection of claim 2 is incorporated herein. Zhang NPL, Suresha, Lee, and Zhang ‘171 teach the method of claim 2, but fail to teach the following limitations as further claimed. Aberman teaches further comprising: obtaining a noise vector (Aberman, “noise vector”), wherein the style vector (Aberman, “initial prompt”) is based on the noise vector (Aberman, “the source image is generated by generating a noise vector for the real image (e.g., using an inversion process) and processing, using an LLI model and the noise vector, the initial prompt to generate the source image that approximates the real image,” Para [0010]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Aberman into Zhang NPL, Suresha, Lee, and Zhang ‘171 for the benefit of avoiding overfitting in generating high-resolution images.

Regarding claim 7, the rejection of claim 1 is incorporated herein. Zhang NPL, Suresha, Lee, and Zhang ‘171 teach the method of claim 1, but fail to teach the following limitations as further claimed. Aberman teaches further comprising: performing a self-attention process based on the image feature map (Aberman, “cross-attention output MV is a weighted average of the values V where the weights are the attention maps,” Para [0024], where “Both of them (cross-attention layers and hybrids of cross-attention and self-attention layers) can be referred to as cross-attention since various implementations can intervene only in the cross-attention part of the hybrid attention,” Para [0025]), wherein the high-resolution image (Zhang NPL, Fig. 2, “256 x 256 results”) is generated based on the self-attention process (provided by “IMAGEN and/or other LLI models,” Para [0025]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Aberman into Zhang NPL, Suresha, Lee, and Zhang ‘171 for the benefit of more accurate high-resolution image outputs.

Claim 23 is a non-transitory computer readable medium claim that corresponds to method claim 3. Implementation of the non-transitory computer readable medium claim would necessitate the method claim. Therefore, the rejection of claim 3 applies to claim 23.
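Note: the cross-attention operation cited for claims 3 and 7 (image features attend over per-token text vectors, yielding a spatial attention map for each token) follows the standard scaled dot-product form, softmax(QKᵀ/√d)V. A minimal sketch with assumed dimensions, not Aberman's implementation:

```python
# Sketch of cross-attention between image features (queries) and text tokens
# (keys/values). Each text token ends up with a spatial attention map over
# image positions. Dimensions are assumptions for exposition.
import torch

d = 64
img_feats = torch.randn(32 * 32, d)   # flattened spatial positions (queries)
txt_feats = torch.randn(4, d)         # one vector per text token (keys/values)

attn = torch.softmax(img_feats @ txt_feats.T / d**0.5, dim=-1)  # (1024, 4)
fused = attn @ txt_feats                    # (1024, 64) text-conditioned features
spatial_maps = attn.T.reshape(4, 32, 32)    # one attention map per token
```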
Claim(s) 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al., "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks," arXiv.org, 2017 (“Zhang NPL”) in view of Suresha et al. (US-20230082567-A1) and Lee et al. (KR-20210121537-A) as applied to claim 1 above, and further in view of Aberman et al. (US-20240037822-A1).

Regarding claim 12, the rejection of claim 9 is incorporated herein. Zhang NPL, Suresha, and Lee teach the system of claim 9, but fail to teach the following limitations as further claimed. However, Aberman teaches wherein: the image generation network includes a convolution layer (Lee, “applying the generated convolution filter to a convolutional neural network,” Para [0005]), a self-attention layer, and a cross-attention layer (Aberman, “two types of attention layers: i) cross-attention layers and ii) hybrid attention that acts both as self-attention and cross-attention,” Para [0025]). Aberman is considered to be analogous to the claimed invention because both are in the same field of generating images from a textual input. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Aberman into Zhang NPL, Suresha, and Lee for the benefit of better-quality image output due to variable machine learning weights.

Regarding claim 13, the rejection of claim 9 is incorporated herein. Zhang NPL, Suresha, and Lee teach the system of claim 9, but fail to teach the following limitations as further claimed. However, Aberman teaches wherein: the image generation network includes a U-Net architecture (Aberman, “the 64×64 model starts from a random noise seed, and uses the U-Net,” Para [0021]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Aberman into Zhang NPL, Suresha, and Lee for the benefit of faster analysis or processing of the images.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Fu et al. (US-20190095730-A1) teaches a method for license plate recognition using super-resolution. Gong et al. (US-20240037732-A1) teaches a method for enhancing CT image quality using super-resolution.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RACHEL A OMETZ, whose telephone number is (571) 272-2535. The examiner can normally be reached 6:45am-4:00pm ET Monday-Thursday and 6:45am-1:00pm ET every other Friday. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Vu Le, can be reached at 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Rachel Anne Ometz/
Examiner, Art Unit 2668
1/15/26

/VU LE/
Supervisory Patent Examiner, Art Unit 2668

Prosecution Timeline

Feb 17, 2023: Application Filed
May 06, 2025: Examiner Interview (Telephonic)
May 15, 2025: Non-Final Rejection (§103)
Aug 01, 2025: Interview Requested
Aug 14, 2025: Applicant Interview (Telephonic)
Aug 14, 2025: Examiner Interview Summary
Aug 20, 2025: Response Filed
Sep 23, 2025: Final Rejection (§103)
Nov 07, 2025: Interview Requested
Dec 01, 2025: Request for Continued Examination
Dec 15, 2025: Response after Non-Final Action
Jan 15, 2026: Non-Final Rejection (§103)
Apr 01, 2026: Interview Requested
Apr 08, 2026: Applicant Interview (Telephonic)
Apr 08, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602925
HYPERSPECTRAL IMAGE ANALYSIS USING MACHINE LEARNING
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12555255
ABSOLUTE DEPTH ESTIMATION FROM A SINGLE IMAGE USING ONLINE DEPTH SCALE TRANSFER
Granted Feb 17, 2026 (2y 5m to grant)
Patent 12548354
METHOD FOR PROCESSING CELL IMAGE, ELECTRONIC DEVICE, AND STORAGE MEDIUM
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12541970
SYSTEM AND METHOD FOR ESTIMATING THE POSE OF A LOCALIZING APPARATUS USING REFLECTIVE LANDMARKS AND OTHER FEATURES
Granted Feb 03, 2026 (2y 5m to grant)
Patent 12530735
IMAGE PROCESSING APPARATUS THAT IMPROVES COMPRESSION EFFICIENCY OF IMAGE DATA, METHOD OF CONTROLLING SAME, AND STORAGE MEDIUM
Granted Jan 20, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 69%
With Interview: 99% (+30.1%)
Median Time to Grant: 2y 11m
PTA Risk: High
Based on 26 resolved cases by this examiner. Grant probability derived from career allow rate.
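Note: a minimal sketch of how these headline figures appear to be derived from the examiner's career numbers above, assuming the interview lift is applied additively and capped at 100% (an assumption about the tool's methodology, not a documented formula):

```python
# Hypothetical reconstruction of the dashboard's projection figures.
# Assumptions: grant probability = career allow rate; interview lift is
# additive and capped at 100%. Not a documented formula.
granted, resolved = 18, 26
allow_rate = granted / resolved                          # 0.692 -> "69%"
interview_lift = 0.301                                   # "+30.1%"
with_interview = min(allow_rate + interview_lift, 1.0)   # 0.993 -> "99%"
print(f"{allow_rate:.0%} base, {with_interview:.0%} with interview")
```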
