Prosecution Insights
Last updated: April 19, 2026
Application No. 18/176,668

METHOD AND SYSTEM OF GENERATING CUSTOMIZED IMAGES

Status: Non-Final OA (§103)
Filed: Mar 01, 2023
Examiner: ZHAI, KYLE
Art Unit: 2611
Tech Center: 2600 — Communications
Assignee: Microsoft Technology Licensing, LLC
OA Round: 3 (Non-Final)

Grant Probability: 75% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 0m
Grant Probability with Interview: 93%

Examiner Intelligence

Career Allow Rate: 75%, above average (353 granted / 473 resolved; +12.6% vs TC avg)
Interview Lift: +18.6% for resolved cases with interview (strong, roughly +19%)
Avg Prosecution: 3y 0m typical timeline; 31 applications currently pending
Career History: 504 total applications across all art units
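The headline percentages follow directly from the raw counts above. As a quick sanity check, a minimal Python sketch reproduces them (the rounding conventions are assumptions; only the counts and the lift come from this page):

```python
# Reproduce the headline examiner statistics from the raw counts shown above.
granted, resolved = 353, 473
allow_rate = granted / resolved            # career allow rate
interview_lift = 0.186                     # +18.6 percentage points with interview

print(f"Career allow rate: {allow_rate:.1%}")                  # 74.6%, displayed as 75%
print(f"With interview:    {allow_rate + interview_lift:.1%}") # 93.2%, displayed as 93%
```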

Statute-Specific Performance

§101: 10.6% (-29.4% vs TC avg)
§103: 61.2% (+21.2% vs TC avg)
§102: 7.9% (-32.1% vs TC avg)
§112: 15.1% (-24.9% vs TC avg)
Deltas are measured against the Tech Center average estimate. Based on career data from 473 resolved cases.

Office Action (§103)
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/20/26 has been entered.

Response to Arguments

Applicant's arguments filed 1/20/26 have been fully considered but they are not persuasive. Applicant's arguments with respect to claims 7, 21 and 28 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 7, 10-13, 21 and 24-27 are rejected under 35 U.S.C. 103 as being unpatentable over Green (US 2024/0282130) in view of Endras et al. (US 2019/0294878) in view of Ding et al. (CogView: Mastering Text-to-Image Generation via Transformers, 35th Conference on Neural Information Processing Systems, 2021) in view of Ruiz et al. (DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, Computer Vision and Pattern Recognition, 2022) in view of Park et al. (US 2022/0374602).

Regarding claim 7, Green discloses a method of generating a customized image with an image-generating artificial intelligence engine (Green, [0006], “a method for image generation”. In addition, in paragraph [0022], “modification of an image using an artificial intelligence (AI) generation model”), the method comprising: receiving a command from a user for the customized image (Green, [0048], “input request 204 could include an image of the rocket”), the request including an image of a person or an object (Green, [0048], “A user interface tool 220 may be used to enable a user to provide an input request 204…input request 204 could include an image of the rocket”) and a textual description of the customized image (Green, [0048], “the text can say “make the rocket wider” or “add more flames” or “make it stronger””); the image-generating artificial intelligence engine (Green, [0022], “modification of an image using an artificial intelligence (AI) generation model”); submitting the image and text to the image-generating artificial intelligence engine to generate the customized image (Green, [0042], “the generation of an output image, graphics, and/or three-dimensional representation by an image generation AI (IGAI), can include one or more artificial intelligence processing engines and/or models”); and receiving from the image-generating artificial intelligence engine the customized image that includes a depiction of the person or the object (Green, [0031], “a display 131 configured for displaying an interface for purposes of facilitating interaction with a user, and the original image 101 and/or the modified image 102”. In addition, in paragraph [0043], “The IGAI is therefore a custom tool that is engineered to processing specific types of input and render specific types of outputs. When the IGAI is customized, the machine learning and deep learning algorithms are tuned to achieve specific custom outputs”).

Green does not expressly disclose “a natural language instruction that includes terms related to the person or the object”. Endras et al. (hereinafter Endras) discloses a natural language instruction that includes terms related to an object (Endras, [0161], “an NLP model that is trained to identify specific terms (e.g., to identify terms such as terms related to damage (e.g., “scratch”, “dent”, “chip”, “tear”, etc.) and terms related to location on the vehicle (e.g., “door”, “fender”, “window”, “seat”, “hood”, “engine”, etc.)”). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use Endras's NLP model to identify terms related to Green's user text input. The motivation for doing so would have been enabling fast identification of relevant information by analyzing the intent behind user queries.

Green as modified by Endras does not expressly disclose “a fine-tuning mechanism comprising a set of tokens defining an appearance of the person or the object”. Ding et al. (hereinafter Ding) discloses generating a fine-tuning mechanism comprising a set of tokens defining an appearance of an object (Ding, Figure 3: The framework of CogView illustrates a set of tokens defining an appearance of an object); generating the fine-tuning mechanism for the image-generating artificial intelligence engine (Ding, Fig. 3 illustrates text tokens and image tokens being sent to the Transformer (GPT)); and the object tokenized in the fine-tuning mechanism (Ding, Fig. 3 illustrates an image tokenizer). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to perform Green's image modification using Ding's image tokens defining an appearance of an object. The motivation for doing so would have been providing a simplified and compressed way to represent image content.

Green as modified by Endras and Ding does not expressly disclose “utilizes its underlying training set to determine how the customized image should appear based on the command and further utilizes the set of tokens and the NLP layer of the fine-tuning mechanism to generate the customized image”. Ruiz et al. (hereinafter Ruiz) discloses a plurality of customization images comprising depictions of a plurality of elements that a user wants to include in a customized image (Ruiz, Figs. 2-3, 5 and 11 illustrate a plurality of input images (customization images) comprising depictions of a plurality of elements that the user wants to include in an output (customized) image); terminology related to the plurality of elements (Ruiz, 1 Introduction, [0004], “We first fine-tune the low-resolution text-to-image model with the input images and text prompts containing a unique identifier followed by the class name of the subject”. The class name reads on terminology related to the plurality of elements); an image-generating artificial intelligence engine (Ruiz, 1 Introduction, [0003], “a new approach for “personalization” of text-to-image diffusion models”) that utilizes its underlying training set to determine how the customized image should appear based on a command (Ruiz, 4 Method, [0003], “we use the pre-trained Imagen model as the base model [56]”. Fig. 3) and further utilizes the set of tokens (Ruiz, 3 Preliminaries, [0004], “transform a text prompt P into a conditioning embedding c, the text is first tokenized using a tokenizer f using a learned vocabulary”) and the NLP layer of a fine-tuning mechanism (Ruiz, 3 Preliminaries, [0004], “Language models like T5-XXL generate embeddings of a tokenized text prompt, and vocabulary encoding is an important pre-processing step for prompt embedding…Finally, the text-to-image diffusion model is directly conditioned on c”. T5-XXL is considered a natural language processing layer for a fine-tuning mechanism. In addition, Fig. 3 (high-level method) shows the method returning a fine-tuned text-to-image model) to generate the customized image specific to the user (Ruiz, Fig. 5); submitting the fine-tuning mechanism (Ruiz, 4 Method, [0001], “implant the subject instance into the output domain of the model and to bind the subject with a unique identifier”. In addition, in section 4.1 Representing the subject with a rare-token identifier, [0001], “Our goal is to “implant” a new (key, value) pair into the diffusion model's “dictionary” such that, given the key for our subject, we are able to generate fully-novel images of this specific subject with meaningful semantic modifications guided by a text prompt”) and the command (Ruiz, Fig. 4 illustrates “Given ∼3-5 images of a subject”) to the image-generating artificial intelligence engine (Ruiz, 1 Introduction, [0003], “a new approach for “personalization” of text-to-image diffusion models”) to execute the command and generate the customized image (Ruiz, Figs. 3 and 4); the customized image that includes the depictions of the plurality of elements (Ruiz, Figs. 2-3, 5 and 11). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use Ruiz's fine-tuning of text-to-image diffusion models to generate Green's customized images. The motivation for doing so would have been enabling personalized text-to-image generation using diffusion models.
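For readers tracking the technical thread of the rejection, the tokenize-then-embed pattern the examiner attributes to Ding and Ruiz can be pictured with a minimal, self-contained sketch. This is an illustration only: every name below is hypothetical and nothing in it comes from the application, the Office action, or the cited references. A real system would use a learned subword vocabulary and a language model such as T5 to produce the conditioning embedding.

```python
# Minimal sketch of the tokenize-then-embed pattern cited from Ding and Ruiz.
# All names are hypothetical; a real system uses a learned subword vocabulary
# and a trained language model rather than a toy lookup table.
import numpy as np

VOCAB = {"<unk>": 0, "a": 1, "photo": 2, "of": 3, "sks": 4, "dog": 5}
EMBED_DIM = 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(VOCAB), EMBED_DIM))  # stand-in for learned embeddings

def tokenize(prompt: str) -> list[int]:
    """Map a text prompt P to token ids via a fixed vocabulary (tokenizer f)."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in prompt.lower().split()]

def conditioning_embedding(prompt: str) -> np.ndarray:
    """Look up one embedding per token; the stack plays the role of the
    conditioning embedding c on which a diffusion model would be conditioned."""
    return embedding_table[tokenize(prompt)]

# DreamBooth-style prompt: a rare token ("sks") acts as the unique identifier
# that fine-tuning binds to one specific subject.
c = conditioning_embedding("a photo of sks dog")
print(c.shape)  # (5, 8): one embedding per token
```

The rare token is the key idea the rejection leans on: an otherwise unused vocabulary entry that fine-tuning binds to one specific subject, so later prompts containing it reproduce that subject.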
In addition, Green as modified by Endras, Ding and Ruiz does not expressly disclose “a Natural Language Processing (NLP) layer that determines the terminology corresponding to each of the plurality of elements with the set of tokens”. Park discloses a natural language processing (NLP) layer that determines terminology corresponding to an element with a token (Park, [0051], “the NLP embedding engine 140 processes the tokens of the input 130 from the tokenizer 125 to generate word embeddings 122 and character embeddings 124”. Each generated embedding corresponds to a respective token, and each token corresponds to a respective element in the input. Therefore, generating word embeddings and character embeddings reads on determining terminology corresponding to the tokens). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the natural language processing layer of Park into the customized image generation system of Green as modified by Endras, Ding and Ruiz. The motivation for doing so would have been enabling the AI engine to understand which image elements to modify.

Regarding claim 10, Green discloses submitting to the image-generating artificial intelligence engine via an Application Programming Interface (API) of the image-generating artificial intelligence engine (Green, [0044], “the IGAI can be used online via one or more Application Programming Interface (API) calls”). Green as modified by Endras and Ding, with the same motivation as in claim 7, discloses the fine-tuning mechanism (Ding, Figure 3: The framework of CogView illustrates a set of tokens defining an appearance of an object).

Regarding claim 11, Green discloses a textual command entered by a user describing the customized image to be generated (Green, [0048], “the text can say “make the rocket wider” or “add more flames” or “make it stronger””). Green as modified by Endras, Ding, Ruiz and Park, with the same motivation as in claim 7, discloses generating the NLP layer based on the terminology used in a textual command (Park, [0088], “The natural language content is processed by a tokenizer to break down the natural language content into tokens (step 520). The tokens are provided to an NLP embedding engine to generate word embeddings and/or character embeddings (step 530)”).

Regarding claim 12, Green discloses operating a client application specific to the image-generating artificial intelligence engine (Green, [0031], “a user interface 130 is configured to facilitate implementing changes to an image 101 based on user feedback. In particular, the user interface may include a display 131 configured for displaying an interface for purposes of facilitating interaction with a user, and the original image 101 and/or the modified image 102”. Fig. 1A). Green as modified by Endras and Ding, with the same motivation as in claim 7, discloses generating the fine-tuning mechanism (Ding, Figure 3: The framework of CogView illustrates a set of tokens defining an appearance of an object).

Regarding claim 13, Green discloses submitting to the image-generating artificial intelligence engine with a textual command entered by a user describing the customized image to be generated (Green, [0042], “the generation of an output image, graphics, and/or three-dimensional representation by an image generation AI (IGAI), can include one or more artificial intelligence processing engines and/or models”. In addition, in paragraph [0048], “the text can say “make the rocket wider” or “add more flames” or “make it stronger””). Green as modified by Endras and Ding, with the same motivation as in claim 7, discloses submitting the fine-tuning mechanism to the image-generating artificial intelligence engine (Ding, Fig. 3 illustrates text tokens and image tokens being sent to the Transformer (GPT)).

Regarding claim 21, Green discloses a method of generating a customized image with an image-generating artificial intelligence engine (Green, [0006], “a method for image generation”. In addition, in paragraph [0022], “modification of an image using an artificial intelligence (AI) generation model”), the method comprising: the image-generating artificial intelligence engine retains its training set while adapting to the customized image following generation of the customized image (Green, [0046], “the input 206 is configured to convey the intent of the user that wishes to utilize the IGAI to generate some digital content…the data set used to train the IGAI and input 206 can be used to customized the way artificial intelligence, e.g., deep neural networks process the data to steer and tune the desired output image”. In addition, in paragraph [0051], “the original image may have been generated using an IGAI model implementing latent diffusion in response to user input (e.g., description of a desired image that is encoded into a latent space vector). For example, the image is generated by the IGAI processing model of FIGS. 1 and FIGS. 2A-2C. The extracted features are relevant to generating images using latent diffusion, and in one embodiment, an AI model is configured for extracting those features”. The AI model keeps the training data set and adjusts to create a custom output image based on user input. Moreover, the AI model is designed to extract features from the images generated by the IGAI and continues to adapt to modifications even after image generation is complete). The remaining limitations recited in claim 21 are similar in scope to the method recited in claim 7 and therefore are rejected under the same rationale.

Regarding claims 24-27, claims 24-27 recite method steps that are similar in scope to the method steps recited in claims 10-13 and therefore are rejected under the same rationale.

Claims 8 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Green (US 2024/0282130) in view of Endras et al. (US 2019/0294878) in view of Ding et al. in view of Ruiz et al. in view of Park et al. (US 2022/0374602), as applied to claims 7 and 21, in further view of Gaither et al. (US 2014/0047429).

Regarding claim 8, Green teaches the image-generating artificial intelligence engine, and Green as modified by Endras and Ding, with the same motivation as in claim 7, teaches the fine-tuning mechanism. Green as modified by Endras, Ding, Ruiz and Park does not expressly disclose “an add-in or plug-in”. Gaither et al. (hereinafter Gaither) discloses an add-in or plug-in for a program (Gaither, [0013], “Plug-ins are generally used for customizing the functionality of a software application. Applications generally support plug-ins for various reasons. For example, plug-ins can be used to enable third-party developers to, for example, create new features and/or functionality that extends an application”). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to perform the fine-tuning of Green as modified by Endras and Ding for the image-generating artificial intelligence engine using the concept of Gaither's plug-in. The motivation for doing so would have been allowing users to customize and expand existing software capabilities as needed.

Regarding claim 22, claim 22 recites a method that is similar in scope to the method recited in claim 8 and therefore is rejected under the same rationale.

Claims 28 and 31-34 are rejected under 35 U.S.C. 103 as being unpatentable over Green (US 2024/0282130) in view of More et al. (US 2019/0303403) in view of Saharia et al. (Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, 36th Conference on Neural Information Processing Systems, 2022) in view of Endras et al. (US 2019/0294878) in view of Ding et al. in view of Ruiz et al. in view of Park et al. (US 2022/0374602).

Regarding claim 28, Green discloses a method of generating a customized image with an image-generating artificial intelligence engine (Green, [0006], “a method for image generation”. In addition, in paragraph [0022], “modification of an image using an artificial intelligence (AI) generation model”), the method comprising: a processor (Green, [0008], “a processor”); a memory storing executable instructions that, when executed, cause the processor alone or in combination with other processors to perform operations (Green, [0008], “memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method”). Green does not expressly disclose “a background image”. More et al. (hereinafter More) discloses a background image (More, [0126], “a list of visual content elements (e.g., background elements such as landscapes, foreground elements such as people, animals, inanimate objects, etc.)”). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to perform Green's image modification using the concept of More's foreground and background elements. The motivation for doing so would have been providing distinct foreground and background layers in order to easily modify individual elements without affecting the entire image.

Green as modified by More does not expressly disclose “the customized image that includes a depiction of the product image and the background image”. Saharia et al. (hereinafter Saharia) discloses a customized image that includes a depiction of a product image and a background image (Saharia, Fig. 1 illustrates a high contrast portrait of a very happy fuzzy panda dressed as a chef in a high-end kitchen making dough, with a painting of flowers on the wall behind him). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the concept of Saharia's photorealistic text-to-image diffusion models, including foreground and background images, into the image generation system as taught by Green as modified by More. The motivation for doing so would have been allowing more realistic scene composition. The remaining limitations recited in claim 28 are similar in scope to the method recited in claim 7 and therefore are rejected under the same rationale.
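Claim 10's limitation, mapped above to Green's statement that “the IGAI can be used online via one or more Application Programming Interface (API) calls”, describes a conventional request/response pattern: the client sends the user's image and textual command, and the engine returns the customized image. A minimal sketch follows; the endpoint URL, field names, and response format are assumptions for illustration, not anything disclosed by Green or the application.

```python
# Minimal sketch of the claim 10 pattern: submitting an image and a textual
# command to an image-generating engine over an API. The endpoint, fields,
# and response shape are hypothetical, not taken from Green or the application.
import requests

API_URL = "https://igai.example.com/v1/edit"  # hypothetical endpoint

def request_customized_image(image_path: str, command: str) -> bytes:
    """POST the user's image and natural-language command; return the edited image."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            API_URL,
            files={"image": f},
            data={"prompt": command},  # e.g. "make the rocket wider"
            timeout=60,
        )
    resp.raise_for_status()
    return resp.content  # raw bytes of the customized image

if __name__ == "__main__":
    png = request_customized_image("rocket.png", "make the rocket wider")
    with open("rocket_edited.png", "wb") as out:
        out.write(png)
```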
Regarding claims 31-34, claims 31-34 recite method steps that are similar in scope to the method steps recited in claims 10-13 and therefore are rejected under the same rationale.

Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Green (US 2024/0282130) in view of More et al. (US 2019/0303403) in view of Saharia et al. in view of Endras et al. (US 2019/0294878) in view of Ding et al. in view of Ruiz et al. in view of Park et al. (US 2022/0374602), as applied to claim 28, in further view of Gaither et al. (US 2014/0047429). Regarding claim 29, claim 29 recites a method that is similar in scope to the method recited in claim 8 and therefore is rejected under the same rationale.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYLE ZHAI whose telephone number is (571) 270-3740. The examiner can normally be reached 9AM-5PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ke Xiao, can be reached at (571) 272-7776. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KYLE ZHAI/
Primary Examiner, Art Unit 2612

Prosecution Timeline

Mar 01, 2023: Application Filed
Jun 27, 2025: Non-Final Rejection — §103
Aug 01, 2025: Interview Requested
Aug 18, 2025: Examiner Interview Summary
Aug 18, 2025: Applicant Interview (Telephonic)
Sep 29, 2025: Response Filed
Oct 17, 2025: Final Rejection — §103
Nov 22, 2025: Interview Requested
Dec 02, 2025: Examiner Interview Summary
Dec 02, 2025: Applicant Interview (Telephonic)
Jan 20, 2026: Request for Continued Examination
Jan 27, 2026: Response after Non-Final Action
Feb 13, 2026: Non-Final Rejection — §103
Mar 12, 2026: Interview Requested
Mar 19, 2026: Applicant Interview (Telephonic)
Mar 19, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602879: METHOD AND DEVICE FOR PROVIDING SURGICAL GUIDE USING AUGMENTED REALITY (2y 5m to grant; granted Apr 14, 2026)
Patent 12594123: VIRTUAL REALITY SYSTEM WITH CUSTOMIZABLE OPERATION ROOM (2y 5m to grant; granted Apr 07, 2026)
Patent 12590811: METHOD, APPARATUS, AND PROGRAM FOR PROVIDING IMAGE-BASED DRIVING ASSISTANCE GUIDANCE IN WEARABLE HELMET (2y 5m to grant; granted Mar 31, 2026)
Patent 12573162: MODELLING METHOD FOR MAKING A VIRTUAL MODEL OF A USER'S HEAD (2y 5m to grant; granted Mar 10, 2026)
Patent 12566580: HOLOGRAPHIC PROJECTION SYSTEM, METHOD FOR PROCESSING HOLOGRAPHIC PROJECTION IMAGE, AND RELATED APPARATUS (2y 5m to grant; granted Mar 03, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 75%
With Interview: 93% (+18.6%)
Median Time to Grant: 3y 0m
PTA Risk: High
Based on 473 resolved cases by this examiner. Grant probability is derived from the career allow rate.

Free tier: 3 strategy analyses per month