Prosecution Insights
Last updated: April 19, 2026
Application No. 18/527,016

IMPLEMENTING DIALOG-BASED IMAGE EDITING

Final Rejection §103
Filed: Dec 01, 2023
Examiner: TSWEI, YU-JANG
Art Unit: 2614
Tech Center: 2600 — Communications
Assignee: Lemon Inc.
OA Round: 2 (Final)

Grant Probability: 84% (Favorable)
OA Rounds: 3-4
To Grant: 2y 5m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 84% (above average; 376 granted / 447 resolved; +22.1% vs TC avg)
Interview Lift: +17.0% (resolved cases with interview)
Avg Prosecution: 2y 5m (44 currently pending)
Total Applications: 491 (across all art units)

Statute-Specific Performance

§101: 5.5% (-34.5% vs TC avg)
§103: 66.4% (+26.4% vs TC avg)
§102: 5.6% (-34.4% vs TC avg)
§112: 7.1% (-32.9% vs TC avg)

Black line = Tech Center average estimate. Based on career data from 447 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is in response to the Amendment filed on 02/05/2026. Claims 1-20 are pending.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 4, 12, 14, 17, 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al. ("DialogPaint: A Dialog-based Image Editing Model", 2023-10-18, arXiv, hereinafter Wei) in view of Song et al. ("LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models", 2023, ICCV, hereinafter Song).
Regarding Claim 12, Wei teaches a system, comprising: at least one processor; and at least one memory comprising computer-readable instructions that upon execution by the at least one processor cause the system to perform operations comprising (Wei, Page 6, Section 4.1 Experimental Setup, "we initialized our model using the weights from Instruct-Pix2Pix … The model was adapted to our image editing dataset using 8 Nvidia Tesla A100 40G GPUs", indicating execution by processors; "we initialized our model using the weights from Instruct-Pix2Pix", implying stored model weights/instructions in memory that are loaded and executed; Page 1, Abstract, "We introduce … a novel framework that bridges conversational interactions with image editing, enabling users to modify images through natural dialogue"): receiving text indicating a task of editing an image (Wei, Page 1, Abstract, "enabling users to modify images through natural dialogue"; Page 5, Section 3.2.1, "Given a natural language prompt describing an image and an editing task"); generating a list of objects and attributes associated with each of the objects based on the text and the image, wherein the objects are comprised in the image (Wei, Page 4, Section 3.2.2 Image Editing Model, "The textual instruction T is encoded through CLIP ... Concurrently, the input image I is processed through an encoder ...", "... emphasizes context preservation and object-specific edits."; Page 5, Section 3.2.1, "The culmination of these interactions is a clear and precise instruction"; Page 6, Section 3.2.1, "The model's capability to generate such precise instructions is crucial, as it bridges the gap between user intent and the subsequent image editing operations"); determining operations to be performed on each of the objects (Wei, Abstract, "handling tasks such as object replacement, style transfer, and color modification"; Page 4, Section 3.2.1 Dialogue Model; Page 5, Section 3.2.2, "The model's capability to generate such precise instructions is crucial, as it bridges the gap between user intent and the subsequent image editing operations performed by the Image Editing Model"); [[determining an order of performing the operations on an object-by-object basis]]; generating a plan of implementing the task based on the text and the order of performing the operations, wherein the plan comprises information indicating a set of algorithm tools selected for the task (Wei, Page 4, Section 3.2, "Subsequently, the Image Editing Model is invoked to perform image editing based on the explicit textual instructions derived from the dialogue interactions."; Page 6, Section 4.1 Experimental Setup, "employs the Stable Diffusion architecture, leveraging the foundational principles from InstructPix2Pix, to execute image editing based on the explicit textual instructions" <read on algorithm tools>); and generating an edited image based at least in part on the plan (Wei, Page 1, Abstract, "employs these instructions, along with the input image, to produce the desired output" <read on edited image>; Page 6, Section 4.2 Results, "the model edits images according to dialog instructions."). But Wei does not explicitly disclose determining an order of performing the operations on an object-by-object basis.
However, Song teaches generating a list of objects (Song, Page 2999, "we add the list of objects perceived in the environment so far into the prompt"; Page 3002, "The prompt begins with an intuitive explanation of the task and the list of allowable high-level actions"; "further constrain the output space of the LLM to the allowed set of actions and objects."); determining operations to be performed on each of the objects (Song, Page 2999, Section 1, "We use LLMs to generate high-level plans (HLPs), i.e., a sequence of subgoals (e.g., [Navigation potato, Pickup potato …]"); determining an order of performing the operations on an object-by-object basis (Song, Page 2999, Section 1, "We use LLMs to generate high-level plans (HLPs), i.e., a sequence <read on order> of subgoals … that the agent needs to achieve, in the specified order"); and generating a plan of implementing the task based on the text and the order of performing the operations, wherein the plan comprises information indicating a set of algorithm tools selected for the task (Song, Page 3002, Section 4.2 Prompt Design, "The prompt begins with an intuitive explanation of the task and the list of allowable high-level actions <read on algorithm tools>"; Page 3002, Section 4.1, "We also use logit biases to further constrain the output space of the LLM to the allowed set of actions and objects"; Page 3002, Section 4.1, "generate high-level plans (HLPs) ... as an ordered list of actions/subgoals"); determining an order of performing the operations on an object-by-object basis (Song, Page 3002, Section 4.1, "a sequence <read on order> of subgoals ... in the specified order"; Page 2999, Section 1, "we add the list of objects perceived in the environment so far into the prompt ... A new continuation ... will be generated ... based on the observed objects").

Song and Wei are analogous since both deal with natural language driven task execution grounded in perceptual entities, leading to ordered actions that operate on visual inputs. Wei provided a way of dialog-based image editing that maps natural language to editing actions and produces edited images. Song provided a way of generating ordered, grounded plans that select actions from a set of tools/skills and execute them, using explicit object lists and ordered object-based plans for selectable actions. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the ordered plan generation and explicit tool-set selection taught by Song into the modified invention of Wei such that a system with at least one processor and memory forms an explicit plan that orders per-object operations and identifies which algorithm tools to use, then executes the plan to generate the edited image, improving multi-step task reliability and providing explicit ordered execution of object edits, which enhances the interpretability of editing operations. The motivation is to ensure reliable, multi-step task execution grounded to detected objects and available tools, improving interpretability and performance, as discussed by Song in Sections 1 and 3.

Regarding Claim 1, it recites limitations similar in scope to the limitations of Claim 12 but as a method, and the combination of Wei and Song teaches all the limitations of Claim 12. Therefore, it is rejected under the same rationale.

Regarding Claim 4, the combination of Wei and Song teaches the invention in Claim 1.
The combination further teaches determining a plurality of visual algorithms based on the image and the task; and generating the list of objects (Wei, Page 6, Section 3.2.2, "Our approach employs the Stable Diffusion architecture, leveraging the foundational principles from InstructPix2Pix, to execute image editing based on the explicit textual instructions"; Page 4, Section 3.2.2, "The textual instruction T is encoded through CLIP to obtain a latent vector representation CT. Concurrently, the input image I is processed through an encoder to derive its latent representation CI"). Wei does not explicitly disclose, but Song teaches, the attributes associated with each of the objects using the plurality of visual algorithms (Song, Page 3002, Section 4.2, "The prompt begins with an intuitive explanation of the task and the list of allowable high-level actions <read on plurality of ... tools/algorithms>"; Page 2999, Section 1, "We add the list of objects perceived in the environment so far into the prompt"). Song and Wei are analogous since both deal with natural language driven task execution grounded in perceptual entities, leading to ordered actions that operate on visual inputs. Wei provided a way of dialog-based image editing that maps natural language to editing actions and produces edited images. Song provided a way of adding different actions and tasks to the prompt during the process. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the actions and tasks taught by Song into the modified invention of Wei such that selection among multiple algorithms (Stable Diffusion, InstructPix2Pix) is based on the image and task and operations are organized per object, a predictable integration within the system architecture that improves selection.
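For orientation, the per-object, ordered planning flow that Claims 1, 4, and 12 recite (receive text, list objects, assign operations, order them object by object, record the selected tools) can be sketched in code. This is an illustrative sketch only, not the applicant's or any cited reference's implementation; every name here (EditOperation, build_plan, the tool strings) is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class EditOperation:
    """One operation applied to one detected object (e.g. recolor, replace)."""
    target: str   # object in the image the operation acts on
    action: str   # editing action named in the instruction
    tool: str     # algorithm tool selected for this action

@dataclass
class EditPlan:
    """Ordered, object-by-object plan plus the tool set it relies on."""
    operations: list = field(default_factory=list)

    @property
    def tools(self) -> set:
        return {op.tool for op in self.operations}

def build_plan(parsed) -> EditPlan:
    """Group operations object-by-object, keeping the order in which
    objects were first mentioned in the dialogue (hypothetical parse:
    objects would come from vision models, tools from an LLM controller)."""
    order, by_object = [], {}
    for target, action, tool in parsed:
        if target not in by_object:
            by_object[target] = []
            order.append(target)
        by_object[target].append(EditOperation(target, action, tool))
    plan = EditPlan()
    for target in order:
        plan.operations.extend(by_object[target])
    return plan

plan = build_plan([
    ("sky", "recolor", "stable-diffusion-inpaint"),
    ("dog", "replace", "instruct-pix2pix"),
    ("sky", "style-transfer", "instruct-pix2pix"),
])
# Both "sky" edits are grouped before the "dog" edit even though the
# instructions interleaved them.
print([op.target for op in plan.operations])  # prints ['sky', 'sky', 'dog']
```

The object-by-object grouping is the limitation the examiner concedes Wei lacks and maps to Song's ordered subgoal sequences.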
Regarding Claim 14, it recites limitations similar in scope to the limitations of Claim 4 and therefore is rejected under the same rationale.

Regarding Claim 17, it recites limitations similar in scope to the limitations of Claim 12, and the combination of Wei and Song teaches all the limitations of Claim 12. Wei further discloses that these features can be implemented on a computer-readable storage medium (Wei, Page 7, "we initialized our model using the weights from Instruct-Pix2Pix"; Page 6, "the model was trained on 8 Nvidia Tesla A100 40G GPUs"; Page 9, "We constructed a unique dataset containing both dialogue and image editing samples, which played a pivotal role in training our model to understand and execute user instructions effectively"; it is noted that the 8 Nvidia Tesla A100 40G GPUs have massive combined memory which can store the instructions for execution).

Regarding Claim 19, it recites limitations similar in scope to the limitations of Claim 4 and therefore is rejected under the same rationale.

Claim(s) 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al. ("DialogPaint: A Dialog-based Image Editing Model", 2023-10-18, arXiv, hereinafter Wei) in view of Song et al. ("LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models", 2023, ICCV, hereinafter Song) as applied to Claim 1 above, and further in view of Aziz et al. (US 20240045780 A1, hereinafter Aziz).

Regarding Claim 2, the combination of Wei and Song teaches the invention in Claim 1.
The combination further teaches causing to display at least one sentence in natural language in response to receiving the text (Wei, Page 6, Section 3.2.1, "Given a natural language prompt describing an image and an editing task, the model is trained to generate a series of dialogue responses"; Page 1, Abstract, "modify images through natural dialogue" <read on sentence in natural language>; Page 3, Section 3.1 Framework Overview, references to "dialogue interactions" where the system engages users via text messages during the editing workflow <read on causing to display at least one sentence in natural language in response to receiving the text>). The combination does not explicitly disclose, but Aziz teaches, the at least one sentence configured to guide a user to upload the image (Aziz, Paragraph [0109], "Popup window 1000 may include additional features such as a button to silence notifications from the robot, a button for the user to select only those notifications that the user deems important, or an upload button that allows the user to upload a media file instructing"). Aziz and Wei are analogous since both deal with large language model-based processing. Wei provided a way of dialog-based image editing that maps natural language to editing actions and produces edited images. Aziz provided a way of allowing a user to upload media during the LLM process. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the user-guided media upload taught by Aziz into the modified invention of Wei such that, during image editing, the system is able to guide the user to upload media for further processing, which provides a more intuitive and user-friendly editing environment.

Claim(s) 3, 13, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al.
("DialogPaint: A Dialog-based Image Editing Model", 2023-10-18, arXiv, hereinafter Wei) in view of Song et al. ("LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models", 2023, ICCV, hereinafter Song) as applied to Claims 1 and 12 above, and further in view of Hsu et al. (US 7177798 B2, hereinafter Hsu).

Regarding Claim 3, the combination of Wei and Song teaches the invention in Claim 1. The combination further teaches causing to display at least one sentence in natural language based on determining additional information is needed to complete the task (Wei, Page 3, Section 3.1 Framework Overview, references to "dialogue interactions" where the system issues messages during the interaction when more detail is needed to proceed <read on determining additional information is needed>, and those messages are in text form <read on causing to display at least one sentence in natural language> that function to "request" specifics <read on request a user to input the additional information>; Page 1, Abstract, "dialog" <read on sentence in natural language>). The combination does not explicitly disclose, but Hsu teaches, the at least one sentence configured to request a user to input the additional information (Hsu, Column 10, Lines 10-11, "system 101 may prompt the user to 'Please enter a search (natural language or keyword)'"). Hsu and Wei are analogous since both deal with natural language driven task execution grounded in perceptual entities, leading to ordered actions that operate on visual inputs. Wei provided a way of dialog-based image editing that maps natural language to editing actions and produces edited images. Hsu provided a way of generating a next action based on additional information the user inputs in natural language. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the user input of additional information taught by Hsu into the modified invention of Wei such that, when handling a natural language driven task, the system gives the user a chance to input a request through an input prompt, which adds an interactive function and increases the efficiency of generating the edited image.

Regarding Claim 13, it recites limitations similar in scope to the limitations of Claim 3 and therefore is rejected under the same rationale.

Regarding Claim 18, it recites limitations similar in scope to the limitations of Claim 3 and therefore is rejected under the same rationale.

Claim(s) 5, 6, 9, 10, 15, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al. ("DialogPaint: A Dialog-based Image Editing Model", 2023-10-18, arXiv, hereinafter Wei) in view of Song et al. ("LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models", 2023, ICCV, hereinafter Song) as applied to Claim 1, and further in view of Shen et al. ("HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face", 2023, hereinafter Shen).

Regarding Claim 5, the combination of Wei and Song teaches the invention in Claim 1. The combination does not explicitly disclose, but Shen teaches, generating textual descriptions corresponding to algorithm tools based on specifications of the algorithm tools (Shen, §3.2 Model Selection, "we use model descriptions as the language interface to connect each model ... we first gather the descriptions of expert models from the ML community (e.g., Hugging Face)"), wherein a description corresponding to each algorithm tool comprises partial information about each algorithm tool (Shen, §5 Limitations, "maximum token length is always limited ... how to briefly and effectively summarize model descriptions is also worthy of exploration"), and a specification comprises complete information about each algorithm tool (Shen, Appx. A.1.2 Model Descriptions, "These descriptions encompass various aspects of the model, such as its function, architecture, supported languages and domains, licensing, and other relevant details. These comprehensive model descriptions play a crucial role"). Shen and Wei are analogous since both deal with LLM-mediated orchestration of external models via a language interface. Wei provided dialog-based editing and execution of image models; Shen provided description-from-specification and summarization for token-limited LLM inputs. Therefore, it would have been obvious to one of ordinary skill to incorporate Shen's description-generation scheme into Wei so that each tool has a complete specification and a short text description. The motivation is to enable an LLM controller to reason over tool capabilities while respecting context limits (HuggingGPT §3.2 and §5).

Regarding Claim 6, the combination of Wei, Song, and Shen teaches the invention in Claim 5. The combination further teaches inputting the descriptions into a large language model (Shen, §3.2 Model Selection, "available models are presented as options within a given context ...
HuggingGPT is able to select the most appropriate model") in response to determining that a total size of the descriptions is less than or equal to an input limit of the large language model (Shen, Page 6, "Due to the limits of maximum context length, it is not feasible to encompass the information of all relevant models within one prompt"); and selecting one or more algorithm tools related to the task based on the descriptions (Shen, Introduction, "the LLM acts as a controller to manage AI models ... select models according to their function descriptions and execute each subtask"). Shen and Wei are analogous since both involve LLM-driven model control for image editing. Wei provided dialogue input and model execution; Shen provided feeding model descriptions into an LLM and selection from them. Therefore, it would have been obvious to incorporate Shen's description-based tool selection into Wei so that the LLM receives tool descriptions and selects the appropriate editing algorithm(s). The motivation is to improve automatic matching between user instruction and tool choice (HuggingGPT §3.2, Intro).

Regarding Claim 9, the combination of Wei and Song teaches the invention in Claim 1. The combination does not explicitly disclose, but Shen teaches, generating information indicating a particular object in the list to which each of the set of algorithm tools is applied (Shen's task-parsing/specification and resource-dependency records constitute the claimed "information": "a standardized template for tasks ... 'task', 'id', 'dep', and 'args'" (template; <read on information>) and "resource dependency ... sets this symbol (i.e., <resource>-task_id) to the corresponding resource subfield in the arguments ... dynamically replaces this symbol with the resource generated by the prerequisite task" (<read on indicating a particular object in the list ... to which each tool is applied>); the worked example shows object detection with "bounding box" and a chain of selected models (<read on set of algorithm tools applied to objects>)). Shen and Wei are analogous: both orchestrate LLM-driven pipelines that select and apply tools/models to visual tasks. Wei provides dialogue-driven image editing with object-specific edits; Shen provides an explicit task/argument/resource record that captures which tool acts on which detected object (e.g., bounding boxes) in a plan. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Shen's task-specification and resource-dependency recording into Wei so that DialogPaint generates information indicating the particular object to which each tool is applied, improving the traceability and determinism of multi-tool edits.

Regarding Claim 10, the combination of Wei and Song teaches the invention in Claim 1. The combination further teaches generating executable code based at least in part on the plan (Wei, Page 1, Abstract, "employs these instructions, along with the input image, to produce the desired output" <read on edited image>; Page 6, Section 4.2 Results, "the model edits images according to dialog instructions."). The combination does not explicitly disclose, but Shen teaches, generating the executable code based on a complete specification corresponding to each of the set of algorithm tools (Shen, §5 Limitations, "maximum token length is always limited ... how to briefly and effectively summarize model descriptions is also worthy of exploration"; Appx. A.1.2 Model Descriptions, "These descriptions encompass various aspects of the model, such as its function, architecture, supported languages and domains, licensing, and other relevant details.
These comprehensive model descriptions play a crucial role"; Song, Page 3000, "we show that LLM-Planner can generate complete and high-quality high-level plans that are grounded in the current environment with a fraction of labeled data."). Shen and Wei are analogous since both deal with LLM-mediated orchestration of external models via a language interface. Wei provided dialog-based editing and execution of image models; Shen provided complete model specifications alongside short descriptions for a set of tools. Therefore, it would have been obvious to one of ordinary skill to incorporate Shen's description-generation scheme into Wei so that each tool has a complete specification and a short text description, which enables an LLM controller to reason over tool capabilities.

Regarding Claim 15, the combination of Wei, Song, and Shen teaches the invention in Claim 12. The combination further teaches generating descriptions corresponding to algorithm tools based on specifications of the algorithm tools (Shen, §3.2 Model Selection, "we use model descriptions as the language interface to connect each model ... we first gather the descriptions of expert models from the ML community (e.g., Hugging Face)"), wherein a description corresponding to each algorithm tool comprises partial information about each algorithm tool (Shen, §5 Limitations, "maximum token length is always limited ... how to briefly and effectively summarize model descriptions is also worthy of exploration"), and a specification comprises complete information about each algorithm tool (Shen, Appx. A.1.2 Model Descriptions, "These descriptions encompass various aspects of the model, such as its function, architecture, supported languages and domains, licensing, and other relevant details. These comprehensive model descriptions play a crucial role"); inputting the descriptions into a large language model (Shen, §3.2 Model Selection, "available models are presented as options within a given context ... HuggingGPT is able to select the most appropriate model") in response to determining that a total size of the descriptions is less than or equal to an input limit of the large language model (Shen, Page 6, "Due to the limits of maximum context length, it is not feasible to encompass the information of all relevant models within one prompt"); and selecting one or more algorithm tools related to the task of editing the image based on the descriptions (Shen, Introduction, "the LLM acts as a controller to manage AI models ... select models according to their function descriptions and execute each subtask"). Shen and Wei are analogous since both involve LLM-driven model control for image editing. Wei provided dialogue input and model execution; Shen provided feeding model descriptions into an LLM and selection from them. Therefore, it would have been obvious to incorporate Shen's description-based tool selection into Wei so that the LLM receives tool descriptions and selects the appropriate editing algorithm(s). The motivation is to improve automatic matching between user instruction and tool choice (HuggingGPT §3.2, Intro).

Regarding Claim 20, it recites limitations similar in scope to the limitations of Claim 15 and therefore is rejected under the same rationale.

Claim(s) 7, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al.
("DialogPaint: A Dialog-based Image Editing Model", 2023-10-18, arXiv, hereinafter Wei) in view of Song et al. ("LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models", 2023, ICCV, hereinafter Song), further in view of Shen et al. ("HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face", 2023, hereinafter Shen) as applied to Claim 5 above, and further in view of Wu et al. ("Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models", hereinafter Wu).

Regarding Claim 7, the combination of Wei, Song, and Shen teaches the invention in Claim 5. The combination further teaches dividing the algorithm tools into a plurality of batches ... input limit ... (Shen, §3.2 In-context Task-model Assignment, "due to the limits of maximum context length ... not feasible to encompass ... within one prompt. To mitigate this issue, we first filter ... then select the top-K models as the candidates ... substantially reduce the token usage in the prompt"); and selecting one or more algorithm tools in each ... based on the descriptions (Shen, §3.2, "select the most appropriate model for each task ... use model descriptions as the language interface"). Shen and Wei are analogous since both concern LLM-driven model orchestration under context length limits. Wei provided dialog editing without input-limit handling; Shen provided subset (top-K) prompt filtering for token limits. Therefore, it would have been obvious to one of ordinary skill to incorporate Shen's techniques into Wei so that the system divides tool descriptions into manageable batches, inputs them sequentially, and selects tools per batch. The motivation is to ensure LLM processing within token limits while preserving accurate tool selection. The combination does not explicitly disclose, but Wu teaches, sequentially inputting descriptions ... input limit of a large language model (Wu, §3, "we truncate the dialogue history with a maximum length threshold to meet the input length of ChatGPT model"), and describes a Prompt Manager that "records the interaction history and controls the inputs/outputs of various visual foundation models" (§3 Prompt Manager). Wu and Wei are analogous since both deal with dialogue-based multi-model image editing and explicit input-length management. Wei provided dialog editing without input-limit handling; Wu provided history truncation and prompt manager sequencing. Therefore, it would have been obvious to one of ordinary skill to incorporate Wu's techniques into Wei so that the system can adjust the data to the input limit when it selects tools per batch, which maintains the suitability of the system.

Regarding Claim 16, the combination of Wei, Song, and Shen teaches the invention in Claim 12. The combination does not explicitly disclose, but Shen teaches, generating descriptions corresponding to algorithm tools based on specifications of the algorithm tools (Shen, §3.2 Model Selection, "we use model descriptions as the language interface to connect each model ... we first gather the descriptions of expert models from the ML community (e.g., Hugging Face)"), wherein a description corresponding to each algorithm tool comprises partial information about each algorithm tool (Shen, §5 Limitations, "maximum token length is always limited ... how to briefly and effectively summarize model descriptions is also worthy of exploration"), and a specification comprises complete information about each algorithm tool (Shen, Appx. A.1.2 Model Descriptions, "These descriptions encompass various aspects of the model, such as its function, architecture, supported languages and domains, licensing, and other relevant details. These comprehensive model descriptions play a crucial role"); dividing the algorithm tools into a plurality of batches ... input limit ... (Shen, §3.2 In-context Task-model Assignment, "due to the limits of maximum context length ... not feasible to encompass ... within one prompt. To mitigate this issue, we first filter ... then select the top-K models as the candidates ... substantially reduce the token usage in the prompt"); and selecting one or more algorithm tools in each ... based on the descriptions (Shen, §3.2, "select the most appropriate model for each task ... use model descriptions as the language interface"). Shen and Wei are analogous since both concern LLM-driven model orchestration under context length limits. Wei provided dialog editing without input-limit handling; Shen provided subset (top-K) prompt filtering for token limits. Therefore, it would have been obvious to one of ordinary skill to incorporate Shen's techniques into Wei so that the system divides tool descriptions into manageable batches, inputs them sequentially, and selects tools per batch. The motivation is to ensure LLM processing within token limits while preserving accurate tool selection. The combination does not explicitly disclose, but Wu teaches, sequentially inputting descriptions ... input limit of a large language model (Wu, §3, "we truncate the dialogue history with a maximum length threshold to meet the input length of ChatGPT model"), and describes a Prompt Manager that "records the interaction history and controls the inputs/outputs of various visual foundation models" (§3 Prompt Manager). Wu and Wei are analogous since both deal with dialogue-based multi-model image editing and explicit input-length management. Wei provided dialog editing without input-limit handling; Wu provided history truncation and prompt manager sequencing.
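The single-prompt versus batched handling of tool descriptions mapped to Claims 6, 7, 15, and 16 can be sketched as follows. This is a minimal illustration, not any cited reference's implementation: the character-count limit stands in for real token counting, the keyword-overlap selector stands in for an actual LLM call, and all names are hypothetical.

```python
def select_tools(descriptions: dict, task_keywords: set, input_limit: int) -> list:
    """Feed tool descriptions to a (stubbed) LLM selector, batching only
    when the combined description size exceeds the model's input limit."""
    items = list(descriptions.items())
    total = sum(len(text) for _, text in items)

    if total <= input_limit:
        # Total size <= input limit: one prompt carries all descriptions.
        batches = [items]
    else:
        # Otherwise divide the descriptions into batches that each fit,
        # to be input sequentially.
        batches, current, size = [], [], 0
        for name, text in items:
            if current and size + len(text) > input_limit:
                batches.append(current)
                current, size = [], 0
            current.append((name, text))
            size += len(text)
        batches.append(current)

    # Stand-in for the LLM: keep tools whose description mentions the task.
    selected = []
    for batch in batches:
        selected += [name for name, text in batch
                     if task_keywords & set(text.lower().split())]
    return selected

descs = {"inpaint": "fills masked regions of an image",
         "colorize": "changes the color of objects in an image",
         "asr": "transcribes speech audio to text"}
print(select_tools(descs, {"color"}, 10_000))  # one batch     -> ['colorize']
print(select_tools(descs, {"color"}, 10))      # three batches -> ['colorize']
```

The same tool is selected either way; batching only changes how the descriptions reach the selector, which is the point of the claimed input-limit handling.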
Therefore, it would have been obvious to one of ordinary skill to incorporate Wu's techniques into Wei so that the system can adjust the input data based on the input limit when selecting tools per batch, thereby maintaining the suitability of the system.

Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al. (“DialogPaint A Dialog-based Image Editing Model”, 20231018, arXiv, hereinafter Wei) in view of Song et al. (“LLM-Planner Few-Shot Grounded Planning for Embodied Agents with Large Language”, 2023, ICCV, hereinafter Song) as applied to Claim 1, and further in view of Masuoka et al. (US 20040230636 A1, hereinafter Masuoka). Regarding Claim 8, the combination of Wei and Song teaches the invention in Claim 1. The combination does not explicitly disclose, but Shen teaches, establishing a mapping relationship between a plurality of editing tasks and a plurality of sets of algorithm tools selected for the plurality of editing tasks; and determining, based on the mapping relationship, a set of selected algorithm tools related to one of the plurality of editing tasks [[in response to receiving an editing task that is the same or similar to the one of the plurality of editing tasks]] (Shen, Page 4, "workflow includes four stages: task planning, model selection, task execution, and response generation" <read on mapping/selection>; Page 14, "To format the parsed task, we define the template [{"task": task, "id", task_id, "dep": dependency task_ids, "args":" <read on mapping relationship>; Page 6, "selecting the most appropriate model for each task in the parsed task list" <read on determining ... set of selected algorithm tools>). Shen and Wei are analogous since both deal with LLM-mediated orchestration of external models via a language interface. Wei provided dialog-based editing and execution of image models; Shen provided a way of adapting a mapping relationship to the task during LLM data processing.
Therefore, it would have been obvious to one of ordinary skill to incorporate Shen's mapping relationship into Wei so that the system will be able to create a concrete task-tool assignment. The combination does not explicitly disclose, but Masuoka teaches, in response to receiving an editing task that is the same or similar to the one of the plurality of editing tasks (Masuoka, Paragraph [0053], "if the user needs to perform the same or similar task in the future, she will have to do all of that again"). Masuoka and Wei are analogous since both deal with user-driven, multi-step task execution pipelines that interpret user intent and orchestrate a sequence of operations/tools to accomplish the task. Wei provided a way of performing dialogue-based image editing by interpreting user instructions and invoking selected editing operations (algorithm tools) within a plan. Masuoka provided a way of recognizing and handling recurring tasks by expressly addressing when a later task is the "same or similar" to a previous task and motivating reuse of prior task configurations/steps to avoid re-specification (e.g., "if the user needs to perform the same or similar task in the future, she will have to do all of that again"). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the reuse trigger taught by Masuoka into the modified invention of Wei such that, in response to receiving an editing task that is the same or similar to a previously handled editing task, the system consults the stored mapping relationship and determines the set of selected algorithm tools for that task.

Claim(s) 11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wei et al. (“DialogPaint A Dialog-based Image Editing Model”, 20231018, arXiv, hereinafter Wei) in view of Song et al.
(“LLM-Planner Few-Shot Grounded Planning for Embodied Agents with Large Language”, 2023, ICCV, hereinafter Song) as applied to Claim 1, and further in view of Chan et al. (US 20180081417 A1, hereinafter Chan). Regarding Claim 11, the combination of Wei and Song teaches the invention in Claim 1. The combination does not explicitly disclose, but Chan teaches, sharing or storing the plan; uploading the plan to a server computing system; or exporting the plan to another platform for creating an effect in the another platform (Chan, Paragraph [0065], "planning program 200 is offered to a user as-a-service that includes a sharing of plans of activities"; [0030], "UI 122 receives input in response to a user of device 120 utilizing natural language, such as written words or spoken words, that device 120 identifies as information and/or commands"; [0028], "device 120 may upload one or more plans of activities from user plans 126 to system 102 on a periodic basis and/or as dictated by a user of device 120"). Chan and Wei are analogous since both deal with natural language modeling processes. Wei provided a way of performing dialog-based image editing that maps natural language to editing actions and produces edited images. Chan provided a way of allowing a user to share and transfer data between different systems using natural language data. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the data sharing between systems taught by Chan into the modified invention of Wei such that, during image editing, the system will be able to accept the user's natural language input and share data between different systems and/or platforms, which increases the flexibility of the system.

Response to Arguments
Applicant's arguments with respect to Claims 1, 12, and 17, filed on 2/5/2026, regarding the rejection under 35 U.S.C. § 103 have been considered but are not persuasive.
Applicant asserts in Claim 1 that the prior art does not teach the limitation "generating a list of objects and attributes associated with each of the objects". In response to the argument, the rejection does not rely on Wei's latent representations alone as the claimed list. Rather, Wei teaches an image editing framework that necessarily performs object-specific interpretation of image content based on text instructions, which reasonably reads on generating object information used for editing operations. As a matter of fact, prior art Wei teaches in Section 4.2 that "DialogPaint ... emphasizes context preservation and object-specific edits". Wei further teaches in Sections 3.2.1-3.2.2 that the system interprets the textual instruction and image jointly via encoded representations and generates precise editing instructions, and in Section 3.2.1 that the Dialogue Model produces explicit instructions that bridge user intent and image editing operations. Since the claim language as currently filed does not require a particular data structure format (e.g., an explicit table or symbolic enumeration), a "list" broadly encompasses structured internal representations used by a system to identify objects and associated characteristics for downstream processing. Wei's disclosed mechanism necessarily identifies objects/regions and their attributes to perform the disclosed object-specific edits (e.g., color modification, object replacement). Prior art Song further teaches grounded planning involving objects in the environment, including, in Section 4.4, a "list of objects perceived in the environment" and, in Sections 4.1-4.2, plans generated based on allowed actions and objects. Thus, one of ordinary skill would have understood the combined system to generate object-level structured information from visual input for planning and execution. The claimed "list of objects and attributes" is therefore taught, or at least rendered obvious, by the combination.
Therefore, applicant's remark cannot be considered persuasive. Applicant further asserts that the prior art does not teach the limitation "determining operations to be performed on each of the objects". In response to the argument, prior art Wei explicitly teaches the generation of precise image editing instructions that guide subsequent editing operations: Wei teaches in Section 3.2.1 that the Dialogue Model generates a "clear and precise instruction ... for the Image Editing Model" and that these instructions bridge user intent and "subsequent image editing operations", and in the Abstract that DialogPaint supports operations such as object replacement, style transfer, and color modification. All of these disclosures show that the system determines editing operations directed toward specific image content (objects). Prior art Song teaches decomposition of tasks into ordered subgoals tied to objects: Song teaches in Section 4.1 "high-level plans ... as a sequence of subgoals", and the subgoals are object-centric actions (e.g., PickupObject potato, PutObject potato recyclebin). Hence, combining Song's explicit per-object action planning with Wei's object-specific editing framework would have resulted in determining operations on a per-object basis as claimed. Therefore, applicant's remark cannot be considered persuasive. Applicant further asserts that the prior art does not teach the limitation "determining an order of performing the operations on an object-by-object basis". In response to the argument, as stated in the rejection above, prior art Song teaches very clearly in Sections 4.1 and 4.4 the generation of high-level plans as an ordered list of actions/subgoals, execution "in the specified order", and re-planning based on observed objects, reinforcing object-oriented ordering. Accordingly, Song squarely teaches ordered per-object operations, and the combination fully teaches the limitation. Therefore, applicant's remark cannot be considered persuasive.
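As a purely illustrative aid (not part of the record), the ordered, object-centric plan structure discussed above, a list of objects with attributes mapped to per-object operations executed in a specified order, can be sketched in Python as follows. All action and object names are hypothetical and are not drawn from Wei or Song; the dispatch rule merely stands in for a planner's decision.

```python
from dataclasses import dataclass

@dataclass
class Subgoal:
    """One object-centric step, analogous in shape to an (action, object) subgoal."""
    action: str   # hypothetical operation name, e.g. "RecolorObject"
    obj: str      # the object the operation targets
    attrs: dict   # attributes associated with the object

def build_plan(objects):
    """Turn a list of (object, attributes) pairs into an ordered, per-object plan."""
    plan = []
    for obj, attrs in objects:
        # One operation per object here; a real planner could emit several.
        if "target_color" in attrs:
            plan.append(Subgoal("RecolorObject", obj, attrs))
        else:
            plan.append(Subgoal("ReplaceObject", obj, attrs))
    return plan

# The plan is then executed in the specified order, object by object.
plan = build_plan([("sky", {"target_color": "sunset orange"}),
                   ("car", {"replacement": "bicycle"})])
for step in plan:
    print(step.action, step.obj)
```

The point of the sketch is only that ordering falls out of the plan's list structure: each element binds one operation to one object, and execution walks the list front to back.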
Applicant further asserts that prior art Wei does not disclose generating a plan that includes algorithm tools selected for the task. In response to the argument, Wei teaches in Sections 3.2 and 3.2.2 an integrated framework combining a dialogue model and an image editing model, with execution of editing via a Stable Diffusion/InstructPix2Pix architecture based on explicit instructions. This clearly constitutes selection and invocation of algorithmic components to perform the task. Prior art Song further teaches in Sections 4.1 and 4.2 that prompts include "allowable high-level actions" and that plans are constructed from selectable actions/tools constrained to a set, with the LLM outputting ordered high-level plans formed from these available actions. Thus, Song provides explicit planning structures involving selectable action/tool sets, while Wei provides concrete image-editing algorithm modules. Hence the combination of the prior art fully teaches all the limitations. Therefore, applicant's remark cannot be considered persuasive. Applicant further asserts that because prior art Wei lacks ordered operations, Wei cannot generate a plan based on such an order. In response to the argument, prior art Wei provides dialogue-driven image editing execution, and prior art Song provides ordered high-level planning and action sequencing and the generation of plans grounded to objects and tools. Thus the combination of Wei and Song can fulfill the purpose, because the obviousness analysis evaluates the combination, not Wei in isolation. Therefore, the rejection of Claims 1, 12, and 17 under 35 U.S.C. §103 over Wei in view of Song is maintained. In regard to Claims 2-11, 13-16, and 18-20, they depend directly or indirectly on independent Claims 1, 12, and 17, respectively. Applicant does not argue these claims separately from independent Claims 1, 12, and 17. The limitations in those claims remain rejected in view of the combinations previously established, as explained above.
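For illustration only (again, not part of the record), the batch-wise tool selection under an input-length limit discussed for Claims 14 and 16 above can be sketched as follows. Every name is hypothetical, the whitespace token count is a deliberately naive stand-in for a real tokenizer, and the `is_suitable` predicate stands in for the LLM's per-batch judgment; Shen's top-K description filtering would replace this greedy packing in practice.

```python
def batch_descriptions(descriptions, token_limit, count_tokens=lambda s: len(s.split())):
    """Greedily pack tool descriptions into batches that each fit the input limit."""
    batches, current, used = [], [], 0
    for desc in descriptions:
        n = count_tokens(desc)
        if current and used + n > token_limit:
            batches.append(current)      # current batch is full; start a new one
            current, used = [], 0
        current.append(desc)
        used += n
    if current:
        batches.append(current)
    return batches

def select_tools(batches, is_suitable):
    """Feed each batch sequentially and accumulate the tools selected from it."""
    selected = []
    for batch in batches:
        # Stand-in for an LLM call that picks suitable tools from one batch.
        selected.extend(d for d in batch if is_suitable(d))
    return selected
```

The design point is that each LLM call sees only one batch, so no single prompt exceeds the model's input limit, while the union of per-batch selections still covers the full tool inventory.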
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20190340819 A1 MANAGED ACTIONS USING AUGMENTED REALITY
US 20180197103 A1 Techniques for automatically testing/learning the behavior of a system under test (SUT)
US 20220076455 A1 AUGMENTED REALITY HEAD MOUNTED DISPLAYS AND METHODS OF OPERATING THE SAME FOR INSTALLING AND TROUBLESHOOTING SYSTEMS
US 12216996 B2 Reasonable language model learning for text generation from a knowledge graph
US 20190392640 A1 PRESENTATION OF AUGMENTED REALITY IMAGES AT DISPLAY LOCATIONS THAT DO NOT OBSTRUCT USER'S VIEW
US 20230410441 A1 GENERATING USER INTERFACES DISPLAYING AUGMENTED REALITY GRAPHICS

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YUJANG TSWEI, whose telephone number is (571)272-6669. The examiner can normally be reached 8:30am-5:30pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kent Chang, can be reached at (571)272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /YuJang Tswei/Primary Examiner, Art Unit 2614

Prosecution Timeline

Dec 01, 2023
Application Filed
Nov 01, 2025
Non-Final Rejection — §103
Feb 05, 2026
Response Filed
Feb 19, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579805
AUGMENTED, VIRTUAL AND MIXED-REALITY CONTENT SELECTION & DISPLAY FOR TRAVEL
2y 5m to grant Granted Mar 17, 2026
Patent 12579838
Perspective Distortion Correction on Faces
2y 5m to grant Granted Mar 17, 2026
Patent 12567213
COMPUTER VISION AND ARTIFICIAL INTELLIGENCE METHOD TO OPTIMIZE OVERLAY PLACEMENT IN EXTENDED REALITY
2y 5m to grant Granted Mar 03, 2026
Patent 12567189
RELATIONAL LOSS FOR ENHANCING TEXT-BASED STYLE TRANSFER
2y 5m to grant Granted Mar 03, 2026
Patent 12561930
PARAMETRIC EYEBROW REPRESENTATION AND ENROLLMENT FROM IMAGE INPUT
2y 5m to grant Granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
84%
Grant Probability
99%
With Interview (+17.0%)
2y 5m
Median Time to Grant
Moderate
PTA Risk
Based on 447 resolved cases by this examiner. Grant probability derived from career allow rate.
