Prosecution Insights
Last updated: April 19, 2026
Application No. 18/339,883

IDENTIFYING VISUAL TEXT USING VISION-LANGUAGE MODELS

Non-Final OA §101
Filed: Jun 22, 2023
Examiner: DUGDA, MULUGETA TUJI
Art Unit: 2653
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 3 (Non-Final)
Grant Probability: 82% (Favorable)
OA Rounds: 3-4
Time to Grant: 3y 1m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82% (40 granted / 49 resolved), +19.6% vs TC avg (above average)
Interview Lift: +18.8% among resolved cases with interview (strong)
Typical Timeline: 3y 1m avg prosecution; 19 applications currently pending
Career History: 68 total applications across all art units

Statute-Specific Performance

§101: 18.0% (-22.0% vs TC avg)
§103: 57.6% (+17.6% vs TC avg)
§102: 19.4% (-20.6% vs TC avg)
§112: 5.0% (-35.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 49 resolved cases.

Office Action

§101
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/26/2026 has been entered. Claims 1-20 are pending; claims 1, 10, and 17 are independent.

Response to Arguments

Applicant's arguments with respect to the 35 USC § 103 rejections of claims 1-20 (see Arguments, pages 8-10, filed 02/26/2026) have been fully considered and are persuasive. The 35 USC § 103 rejections of claims 1-20 have been withdrawn.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 10-15 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claim 10 recites "obtaining …; and training … to classify …," which, as drafted, covers an abstract idea of mental steps and data analysis/retrieval/collection for model training, where the training uses contrastive learning applied generically to a machine learning model.
More specifically, the limitation "obtaining training data including an image, a sentence corresponding to the image, a null image, and a sentence corresponding to the null image; and training a machine learning model using contrastive learning and the training data to classify whether a text is visual text or non-visual text" requires only data analysis/retrieval and a mental process. A human could use pen and paper to gather sentences evoking images, paired with the corresponding images as positive examples, along with sentences having the opposite effect, evoking no images (null images), as negative examples. The contrastive training steps for classifying this text as visual or non-visual could be performed mentally or, at most, implemented using a conventional, general-purpose computer (Spec., paras 0093-0096 and 0098). The specification defines "machine learning model" generically, as any available model (Spec., para 0030), underscoring the breadth of the term. The claimed invention is therefore directed to an abstract idea and a mental process without significantly more, and claim 10 is rejected under 35 U.S.C. 101.

Dependent claims 11-15 recite similar claim language. Claim 11 recites "determining whether each page in a plurality of pages includes the image using an object detection algorithm, wherein each page includes a plurality of sentences," which likewise requires only a mental step: a human could determine by simple observation whether each page in a plurality of pages includes the image. No additional elements are present. Thus, claim 11 is directed to an abstract idea.
Claim 12 recites "determining a positive pair of the training data by: selecting the image, and selecting a sentence of the plurality of sentences that satisfies a similarity threshold using an embedding of the sentence and an embedding of the image, the selected sentence corresponding to the image," which requires only a mental step or the application of a simple mathematical comparison against a threshold. A human could mentally determine a positive pair of the training data by selecting the image and a sentence that satisfies a similarity threshold using an embedding of the sentence and an embedding of the image. No additional elements are present. Thus, claim 12 is directed to an abstract idea.

Claim 13 recites "determining a negative pair of the training data by: selecting a sentence of the plurality of sentences that satisfies a negative similarity threshold to the image, the selected sentence corresponding to the null image, and obtaining the null image," which likewise requires only a mental step or a simple mathematical comparison against a threshold. A human could mentally determine a negative pair of the training data by selecting a sentence that satisfies a negative similarity threshold to the image, the selected sentence corresponding to the null image, thereby obtaining the null (no) image. No additional elements are present. Thus, claim 13 is directed to an abstract idea.

Claim 14 recites "generating a randomly generated null image by randomly selecting a value of a pixel in the null image," which requires only a mental step. For instance, a human could randomly generate a null image by randomly selecting pixel values with pencil and paper. No additional elements are present. Thus, claim 14 is directed to an abstract idea.
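To make the examiner's characterization concrete: the steps recited in claims 12-14 reduce to threshold comparisons over embeddings plus random pixel selection. A minimal sketch, where the function names, toy embeddings, and threshold values are hypothetical illustrations and not taken from the application:

```python
import random

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def select_pairs(image_emb, sentence_embs, pos_thresh=0.7, neg_thresh=0.2):
    """Claims 12-13 as threshold tests: sentences similar to the image
    become positive pairs; dissimilar sentences become negative pairs
    (to be matched with a null image)."""
    positives = [s for s in sentence_embs if cosine(image_emb, s) >= pos_thresh]
    negatives = [s for s in sentence_embs if cosine(image_emb, s) <= neg_thresh]
    return positives, negatives

def random_null_image(width, height, seed=None):
    """Claim 14: a null image whose every pixel value is randomly selected."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(width)] for _ in range(height)]
```

Whether such steps are "practically performable in the human mind" is of course the contested legal question; the sketch only shows the data flow the claims describe.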
Claim 15 recites "the obtained null image is a common null image," which requires only a mental step in which the null image obtained by a human is simply a common null image. No additional elements are present. Thus, claim 15 is directed to an abstract idea.

Claims 10-15 as drafted therefore cover a mental process and an abstract idea of data gathering/retrieval and analysis/processing steps, and they are likewise directed to an abstract idea of implementing mathematical formulae for data processing and analysis on a conventional, general-purpose computer. No additional elements are present. Thus, all of the claims are directed to an abstract idea.

This judicial exception is not integrated into a practical application. In particular, the independent and dependent claims 10-14 recite the additional element "machine learning model." This additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. Taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when the elements are considered individually. There is no indication that the combination of elements improves the functioning of a computer or any other technology; their collective functions merely provide a conventional, general-purpose computer implementation. Claims 10-15 are therefore not drawn to patent-eligible subject matter, as they are directed to an abstract idea without significantly more, and are rejected under 35 U.S.C. 101.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration into a practical application, the additional element of using a computer is noted in the specification as a general-purpose computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (Spec., paras 0093-0096 and 0098). Further, the additional limitations noted above are directed to insignificant extra-solution activity. The claims are not patent eligible. Dependent claims 11-15 are likewise directed to an abstract idea and do not include additional elements sufficient to amount to significantly more, because the additional elements, considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea. Therefore, claims 10-15 do not contain patent-eligible subject matter as identified by the courts.

Allowable Subject Matter

Claims 1-9 and 17-20 are allowed. The reason for allowance is that the prior art of record does not specifically teach the limitations recited in those claims. After a thorough examination of the present application, and in light of the prior art of record and the applicant's amendments, claims 1-9 and 17-20 are found to be in condition for allowance. The combination of Park et al., Pat App No. US 20240282025 A1 (Park) (EFD: 2023-02-17), in view of Zheng et al., Pat App No. US 20180137551 A1 (Zheng), teaches the previously presented, unamended claims 1 and 17.
For instance, the combination of the two references discloses "receiving a text to be used for generating an image" (Park, para 0059: in response to receiving a user input to submit button 310, the image generation apparatus displays the text prompt in confirmation box 315 and generates an image based on the text prompt); "responsive to determining that the text is a visual text, generating the image using a second machine learning model based on the text" (Park, para 0002: a machine learning model of an image generation system generates style information based on input text, generates a style-adaptive convolution filter based on the style information, and generates an image; Park, paras 0040-0044: in the example of FIG. 1, user 105 provides a text prompt (e.g., "a cute hamburger monster character with eyes") to image generation apparatus 115 via user device 110; image generation apparatus 115 determines a style vector based on the text input, in some cases based on a latent code, generates the image based on the style vector using the adaptive convolution filter, and provides the image to user 105 via user device 110, with examples of generated images described with reference to FIGS. 3 and 4; Park, paras 0121-0126: image generation network 725 generates a text-conditioned image by modulating convolutions of feature map 770 using style vector 765, where the content described by the text prompt is passed to image generation network 725 via a combination of style vector 765 and local vectors 755, with a visual alignment between text prompt 745); and "displaying the image and the text" (Park, para 0059: the image generation apparatus displays the text prompt in confirmation box 315 and displays the image to the user via user interface 300, e.g., in first image display 320 and second image display 325; Park, paras 0044-0045: in some cases, image generation apparatus 115 generates the image based on the style vector using the adaptive convolution filter and provides the image to user 105 via user device 110; user device 110 may be a personal computer, laptop computer, mainframe computer, palmtop computer, personal assistant, mobile device, or any other suitable processing apparatus, and in some examples includes software that displays a user interface, e.g., a graphical user interface, provided by image generation apparatus 115).

The combination also discloses, in part, "generating a visual score for the text using a machine learning model, wherein the visual score is a similarity of the text to a null image, wherein each pixel in the null image is a randomly selected value" (Zheng, paras 0093-0095: an aspect prediction block 810 may help determine to which descriptive aspects in an electronic marketplace a given image is related (e.g., "color", "brand", "sleeve style") based on the visual text content provided, aspects being common across a number of categories, though not always; the aspect prediction block 810 may operate on categories from the leaf category prediction block 806 as well as the localized and isolated visual text content provided by the deep neural network 804, and the predicted descriptive aspects may be passed on for further consideration and use in a product search in an electronic marketplace; a visual search block 812 may calculate a visual similarity measure between input images, such as an image of a candidate product and the input query image, and more precisely, in one embodiment, between the localized and isolated visual text content portions of those images; the visual similarity measure may be based on the image signature or hash value that semantically represents a localized and isolated visual text portion; the similarity between two images or image portions may be estimated by calculating a distance value between two image signatures produced, for example, by the image signature generation block 808; the distance may comprise a Hamming distance, by way of example but not limitation; a Hamming distance generally describes the number of bits that are different in two binary vectors, so similar images or image portions have a smaller Hamming distance between them and thus a higher visual similarity measure, making the visual similarity measure useful as a search result score; Zheng, para 0087: visual text content may comprise those pixels of an image that represent text in at least one human language).

Similarly, newly searched and discovered prior art, including Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or, "Null-text Inversion for Editing Real Images using Guided Diffusion Models" (arXiv:2211.09794v1 [cs.CV], 17 Nov 2022), Smetanin et al., "Automatic Image Generation in an Interaction System" (Pat App No. US 20240296606 A1), and Zhang et al., "Structured Document Generation from Text Prompts" (Pat App No. US 20240346234 A1), teaches the unamended and previously presented claims 1 and 17. The prior art of record did not, however, specifically teach "generating a visual score for the text of the input document using a machine learning model, wherein the visual score is a similarity of the text of the input document to a null image, wherein each pixel in the null image is a randomly selected value."

The following is an examiner's statement of reasons for allowance: the newly amended claims, filed on 02/26/2026, are allowable over the prior art of record because the cited references, taken individually or in combination, fail to particularly disclose, inter alia, "receiving an input document comprising [[a]] text to be used for generating an image; generating a visual score for the text of the input document using a machine learning model, wherein the visual score is a similarity of the text of the input document to a null image, wherein each pixel in the null image is a randomly selected value; responsive to determining that the text of the input document is a visual text based on the visual score, generating the image using a second machine learning model based on the text of the input document; and displaying the input document comprising the image and the text."

Claims 10-15 stand rejected under the 35 USC § 101 abstract-idea rejections, but would be allowable if they overcome those rejections. Claim 16 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if it likewise overcomes the § 101 rejections.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MULUGETA T. DUGDA, whose telephone number is (703) 756-1106.
The examiner can normally be reached Mon - Fri, 4:30am - 7:00pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Paras D. Shah, can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MULUGETA TUJI DUGDA/
Examiner, Art Unit 2653

/Paras D Shah/
Supervisory Patent Examiner, Art Unit 2653

03/07/2026
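The pipeline recited in the allowed claims — scoring text by its similarity to a random-pixel null image and generating an image only for "visual" text — can be sketched in a few lines. The encoders below are deliberately abstract stand-ins for the application's machine learning models; every function name and the 0.5 threshold are hypothetical illustrations, not the patented implementation:

```python
def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: sum(a * a for a in x) ** 0.5
    return dot / (norm(u) * norm(v))

def visual_score(text_emb, null_img_emb):
    """Per the allowed limitation: the visual score is the similarity of
    the text to a null image whose every pixel is a randomly selected value."""
    return cosine(text_emb, null_img_emb)

def route(text, text_encoder, image_encoder, generate, null_image, thresh=0.5):
    """Claim 1 pipeline sketch: score the text against the null image and,
    only if the text is classified as visual, generate an image for it."""
    score = visual_score(text_encoder(text), image_encoder(null_image))
    if score < thresh:         # low similarity to noise -> visual text
        return generate(text)  # second model generates the image
    return None                # non-visual text: display without an image
```

The design intuition, as the record suggests, is that text resembling pure noise in the shared embedding space evokes no image, so high similarity to the null image marks the text as non-visual.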

Prosecution Timeline

Jun 22, 2023
Application Filed
Jun 28, 2025
Non-Final Rejection — §101
Sep 22, 2025
Interview Requested
Oct 02, 2025
Applicant Interview (Telephonic)
Oct 02, 2025
Examiner Interview Summary
Oct 02, 2025
Response Filed
Dec 19, 2025
Final Rejection — §101
Feb 26, 2026
Request for Continued Examination
Feb 27, 2026
Response after Non-Final Action
Mar 07, 2026
Non-Final Rejection — §101 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597424: METHOD AND APPARATUS FOR DETERMINING SKILL FIELD OF DIALOGUE TEXT (granted Apr 07, 2026; 2y 5m to grant)
Patent 12592244: REDUCED-BANDWIDTH SPEECH ENHANCEMENT WITH BANDWIDTH EXTENSION (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579366: DEVELOPMENT PLATFORM FOR FACILITATING THE OPTIMIZATION OF NATURAL-LANGUAGE-UNDERSTANDING SYSTEMS (granted Mar 17, 2026; 2y 5m to grant)
Patent 12573417: A COMPUTER-IMPLEMENTED METHOD OF PROVIDING DATA FOR AN AUTOMATED BABY CRY ASSESSMENT (granted Mar 10, 2026; 2y 5m to grant)
Patent 12567419: VOICEPRINT DRIFT DETECTION AND UPDATE (granted Mar 03, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 82%
With Interview: 99% (+18.8%)
Median Time to Grant: 3y 1m
PTA Risk: High
Based on 49 resolved cases by this examiner. Grant probability derived from career allow rate.
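The headline figures follow directly from the examiner's career counts above. A quick arithmetic check — note the capping of the with-interview figure at 99% is an assumption about this page's presentation, not a stated rule:

```python
granted, resolved = 40, 49

# Career allow rate: 40 granted out of 49 resolved cases.
allow_rate_pct = 100 * granted / resolved
assert round(allow_rate_pct) == 82  # matches the 82% grant probability

# With-interview figure: adding the +18.8-point lift would exceed 100%,
# so the displayed 99% presumably reflects a cap (assumed here).
with_interview = min(allow_rate_pct + 18.8, 99.0)
assert round(with_interview) == 99
```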
