Last updated: May 29, 2026

Application No. 18/180,566

METHODS AND SYSTEMS FOR GENERATING TEXT WITH TONE OR DICTION CORRESPONDING TO STYLISTIC ATTRIBUTES OF IMAGES

Non-Final OA §103

Filed

Mar 08, 2023

Priority

Jan 31, 2023 — provisional 63/482,496 +1 more

Examiner

MILIA, MARK R

Art Unit

2681

Tech Center

2600 — Communications

Assignee

Shopify Inc.

OA Round

3 (Non-Final)

This examiner grants 58% of resolved cases, with a +22.8% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 586 resolved cases, 2023–2026

Examiner Intelligence

MILIA, MARK R

Grants 58% of resolved cases

Career Allowance Rate

342 granted / 586 resolved

-3.6% vs TC avg

Strong +23% interview lift

Interview Lift: +22.8% (resolved cases with interview vs. without)

Typical timeline

3y 4m

Avg Prosecution

21 currently pending

Career history

611

Total Applications

across all art units

Statute-Specific Performance

§101: 0.2% (-39.8% vs TC avg)
§103: 87.6% (+47.6% vs TC avg)
§102: 11.8% (-28.2% vs TC avg)
§112: 0.3% (-39.7% vs TC avg)

Based on career data from 586 resolved cases

Office Action

§103

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2/26/26 has been entered. Currently, claims 1-25 are pending.

Response to Arguments

Applicant’s arguments with respect to claim(s) 1, 13, and 25 have been considered but are moot in view of a new ground(s) of rejection.

Claim Rejections - 35 USC § 103

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-25 are rejected under 35 U.S.C. 103(a) as being unpatentable over Hamedi (US 2020/0210764) in view of Xie et al. (US 2023/0394855).
Regarding claims 1, 13, and 25, Hamedi discloses a non-transitory computer-readable medium storing instructions, a computer-implemented method, and a system comprising: 
a processor configured to execute a plurality of instructions to cause the system to: 
extract, from an image, one or more stylistic visual attributes of the image using a first trained machine learning model (see paras 49-50, 76-78, and 92, high-level stylistic features of an image are extracted, such as type of object shown, dominant color scheme, brightness or contrast of the image, etc.); and
map the one or more stylistic visual attributes to one or more emotion attributes using a second trained machine learning model (see paras 76-78 and 93, stylistic features can propagate through a plurality of layers of a trained machine learning model to generate emotion attributes).
Hamedi does not expressly disclose the limitations to generate a prompt to a large language model (LLM), the prompt being based on the one or more emotion attributes; provide the generated prompt to the LLM; and obtain, from the LLM, a generated description of the image.
Xie discloses generate a prompt to a large language model (LLM), the prompt being based on the one or more emotion attributes (see Fig. 2 and paras 15 and 18, a prompt based on visual cues from an image is generated); 
provide the generated prompt to the LLM (see Fig. 2 and paras 15 and 18, a prompt based on visual cues from an image is generated and provided to a large language model); and 
obtain, from the LLM, a generated description of the image (see paras 19-21 and 27, the large language model generates a description of the image).
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to combine the large language model generating a description of an image, as described by Xie, with the system of Hamedi.
The suggestion/motivation for doing so would have been to eliminate the need for manual captioning thereby saving time and increasing system efficiency.
Therefore, it would have been obvious to combine Xie with Hamedi to obtain the invention as specified in claims 1, 13, and 25.
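For orientation, the four steps recited in claims 1, 13, and 25 can be sketched as a small pipeline. This is an illustrative sketch only: it is not the applicant's, Hamedi's, or Xie's implementation. The names `StyleAttributes`, `extract_style`, `map_to_emotions`, `build_prompt`, and `describe_image` are hypothetical, simple pixel statistics and hand-written rules stand in for the two trained machine learning models, and the LLM is passed in as a plain callable rather than a real API client.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


@dataclass
class StyleAttributes:
    dominant_color: str
    brightness: float  # 0.0 (dark) to 1.0 (bright)
    contrast: float    # 0.0 (flat) to 1.0 (high contrast)


def extract_style(pixels: List[Pixel]) -> StyleAttributes:
    """Stand-in for the first trained model: plain pixel statistics."""
    n = len(pixels)
    avg = [sum(p[c] for p in pixels) / n for c in range(3)]
    luminance = [sum(p) / 3 for p in pixels]
    return StyleAttributes(
        dominant_color=("red", "green", "blue")[avg.index(max(avg))],
        brightness=sum(avg) / (3 * 255),
        contrast=(max(luminance) - min(luminance)) / 255,
    )


def map_to_emotions(style: StyleAttributes) -> List[str]:
    """Stand-in for the second trained model: hand-written rules."""
    return [
        "cheerful" if style.brightness >= 0.5 else "moody",
        "bold" if style.contrast >= 0.5 else "calm",
    ]


def build_prompt(emotions: List[str], object_name: str) -> str:
    """Generate an LLM prompt based on the emotion attributes."""
    tone = " and ".join(emotions)
    return f"Write a description of a {object_name} in a tone that is {tone}."


def describe_image(pixels: List[Pixel], object_name: str,
                   llm: Callable[[str], str]) -> str:
    """Extract style, map to emotions, prompt the LLM, return its text."""
    style = extract_style(pixels)
    emotions = map_to_emotions(style)
    return llm(build_prompt(emotions, object_name))


# Usage with an echo function in place of a real LLM (assumption).
sample_pixels = [(255, 255, 255), (255, 200, 200)]
result = describe_image(sample_pixels, "mug", llm=lambda prompt: prompt)
```

In the rejection as framed, the first two functions correspond to what the examiner attributes to Hamedi, and the prompt-building and LLM call correspond to what is attributed to Xie.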

Regarding claims 2 and 14, Hamedi further discloses wherein the first trained machine learning model is a trained deep neural network (see Fig.4 and paras 80 and 93, a multi-layered machine learning model is used to extract visual attributes from images).  
Regarding claims 3 and 15, Hamedi further discloses wherein the second trained machine learning model is a trained neural network (see Fig.4 and paras 80 and 93, one or more multi-layered machine learning models are used to extract visual attributes from images).  
Regarding claims 4 and 16, Hamedi further discloses wherein the prompt includes at least one of the one or more emotion attributes (see para 78, stylistic features are fed to a multi-layered machine learning model to generate an emotional attribute).  
Regarding claims 5 and 17, Xie further discloses wherein the processor is further configured to execute the instructions to cause the system to incorporate a generic description of the image into the prompt (see Fig. 2 and paras 19-21, generic descriptions can be utilized).  
Regarding claims 6 and 18, Xie further discloses wherein the processor is further configured to execute the instructions to cause the system to retrieve the generic description of an object from a description database (see Fig. 2 and paras 19-21, generic descriptions can be utilized).  
Regarding claims 7 and 19, Xie further discloses wherein the processor is further configured to execute the instructions to cause the system to provide the image to a descriptor text generator to obtain the generic description for incorporation into the prompt (see paras 18-21, the large language model generates a description of the image based on image tags and object attributes).  
Regarding claims 8 and 20, Hamedi further discloses wherein the image comprises an object (see paras 50-51, 76, and 78, the image can contain objects, high-level stylistic features of an image are extracted, such as type of object shown).  
Regarding claim 9, Xie further discloses wherein the generated prompt further comprises a name of the object in the image (see Fig. 2 and para 19, a name, such as “man”, is an object name).  
Regarding claims 10 and 21, Hamedi further discloses wherein the processor is further configured to execute the instructions to cause the system to incorporate physical attributes of the object into the prompt (see para 78, physical attributes, such as facial attributes are utilized).  
Regarding claims 11 and 23, Hamedi further discloses wherein the visual attributes are extracted from a plurality of multiple images of the object (see paras 50 and 74, a plurality of images can be used, the visual attributes are then extracted from each one of the multiple images).  
Regarding claims 12 and 24, Hamedi further discloses wherein the visual attributes are common visual attributes to each of the multiple images (see paras 50 and 78, high-level stylistic features of the images are extracted, such as type of object shown, dominant color scheme, brightness or contrast of the image, etc.).
Regarding claim 22, Hamedi further discloses extracting the physical attributes of the object from an object attribute database (see paras 76-78, object attributes are selected from a predetermined list).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. To further show the state of the art please refer to the attached Notice of References Cited.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK R MILIA whose telephone number is (571) 272-7408. The examiner can normally be reached Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Akwasi Sarpong, can be reached at 571-270-3438. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARK R MILIA/             Primary Examiner, Art Unit 2681

Prosecution Timeline

Mar 08, 2023

Application Filed

Jul 15, 2025

Non-Final Rejection mailed — §103

Sep 16, 2025

Response Filed

Dec 18, 2025

Final Rejection mailed — §103

Jan 27, 2026

Response after Non-Final Action

Feb 26, 2026

Request for Continued Examination

Feb 27, 2026

Response after Non-Final Action

Mar 05, 2026

Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/951,083

Patent 12639016

IMAGE FORMING SYSTEM AND NON-TRANSITORY COMPUTER READABLE MEDIUM STORING PROGRAM

3y 8m to grant Granted May 26, 2026

17/702,662

Patent 12620209

METHOD AND SYSTEM FOR GENERATING IMAGE ADVERSARIAL EXAMPLES BASED ON AN ACOUSTIC WAVE

4y 1m to grant Granted May 05, 2026

18/210,467

Patent 12614248

COORDINATED SUPER-RESOLUTION PROCESSING BY NON-NATIVE HARDWARE PROCESSING SYSTEMS

2y 10m to grant Granted Apr 28, 2026

18/365,486

Patent 12615867

DETECTION DEVICE

2y 8m to grant Granted Apr 28, 2026

18/590,995

Patent 12602843

METHOD FOR CONVERTING ENDOSCOPE IMAGES TO NARROW BAND IMAGES

2y 1m to grant Granted Apr 14, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

58%

Grant Probability

81%

With Interview (+22.8%)

3y 4m (~2m remaining)

Median Time to Grant

High

PTA Risk

Based on 586 resolved cases by this examiner. Grant probability derived from career allowance rate.
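
The projection figures can be reproduced from the counts shown on this page. The sketch below assumes the interview lift is applied as additive percentage points on top of the career allowance rate; the page does not state how it combines the two, so treat that as an assumption. Variable names are illustrative.

```python
# Counts reported on this page for the examiner's resolved cases.
granted = 342
resolved = 586

# Career allowance rate as a fraction of resolved cases.
base_rate = granted / resolved

# Assumption: the +22.8% interview lift is additive percentage points.
interview_lift_points = 22.8
with_interview = base_rate * 100 + interview_lift_points

print(f"Grant probability: {base_rate:.0%}")
print(f"With interview: {with_interview:.0f}%")
```

Under that assumption the rounded results match the 58% and 81% figures displayed above.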