Prosecution Insights
Last updated: April 19, 2026
Application No. 18/595,096

CUSTOM IMAGE AND CONCEPT COMBINER USING DIFFUSION MODELS

Non-Final OA §103
Filed: Mar 04, 2024
Examiner: LHYMN, SARAH
Art Unit: 2613
Tech Center: 2600 — Communications
Assignee: Adobe Inc.
OA Round: 1 (Non-Final)

Grant Probability: 65% (Favorable)
Expected OA Rounds: 1-2
Median Time to Grant: 2y 4m
Grant Probability with Interview: 81%

Examiner Intelligence

Career Allow Rate: 65%, above average (357 granted / 546 resolved; +3.4% vs TC avg)
Interview Lift: strong, +15.2% for resolved cases with interview
Typical Timeline: 2y 4m average prosecution; 30 applications currently pending
Career History: 576 total applications across all art units

Statute-Specific Performance

§101: 5.4% (-34.6% vs TC avg)
§103: 63.2% (+23.2% vs TC avg)
§102: 5.9% (-34.1% vs TC avg)
§112: 15.3% (-24.7% vs TC avg)

Tech Center averages are estimates. Based on career data from 546 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions

Claims 10-16 are withdrawn (now cancelled) from further consideration pursuant to 37 CFR 1.142(b) as being drawn to a nonelected invention of Group II, there being no allowable generic or linking claim. Election was made without traverse in the reply filed on 13 February 2026.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3, 9, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lin (U.S. Patent App. Pub. No. 2022/0036127 A1) in view of Anandkumar (U.S. Patent App. Pub. No. 2024/0095534 A1).

Regarding claim 1: Lin teaches: a computer-implemented method (para. 5, methods executed by computers), comprising: receiving a plurality of input modalities comprising multiple images and a text input in a natural language (Fig. 2: receiving input image 110 and language-based editing instruction 115; these are two input modalities (image and language). As to "multiple images," Anandkumar teaches that it is known for neural network systems to receive multiple images as inputs (claim 2, para. 573), along with text inputs (see claim 3). Lin is also neural-network relevant (para. 33)); generating image embeddings for the multiple images (Lin, Fig. 3A: 330, embed image feature maps from the input image(s)) and a text embedding for the text input (Lin, Fig. 3A: 335, embedding the text input); and generating an output image based on the image embeddings and the text embedding by a machine learning model…. (Fig. 3A: 350, construct a new image including modified visual attributes of the input image).

Regarding the output image comprising portions of the multiple images, Anandkumar teaches that modifications or variations of one or more images based on text prompts are known (paras. 135, 202). Using portions of multiple images to generate an output image, based also on a text prompt, per both references, is an obvious modification taught by the prior art and within the purview of one of ordinary skill in the art. Modifying the applied references to include multiple input images, per Anandkumar, to generate the output image of Lin is taught and suggested by the prior art, and would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A).

The prior art included each element recited in claim 1, although not necessarily in a single embodiment, the only difference between the claimed invention and the prior art being the lack of actual combination of certain elements in a single prior art embodiment, as described above. One of ordinary skill in the art could have combined the elements as claimed by known methods, and in that combination each element merely performs the same function as it does separately. One of ordinary skill in the art would also have recognized that the results of the combination were predictable as of the effective filing date of the claimed invention.
Regarding claim 3: Lin and/or Anandkumar teach: the method of claim 1, wherein the plurality of input modalities includes at least one of: one or more images, one or more text inputs, and any combination thereof (Lin, Fig. 2: receiving input image 110 and language-based editing instruction 115, i.e., two input modalities (image and language)) (Anandkumar, para. 620, text and images as inputs). It would have been obvious for one of ordinary skill in the art, as of the effective filing date of Applicant's claims, to have further modified the applied reference(s) in view of the same to have obtained the above, motivated to make use of known machine learning to receive and/or modify inputs of varied style or type.

Regarding claim 9: Anandkumar teaches: the method of claim 1, wherein the machine learning model includes at least one of: a diffusion machine learning model, a generative machine learning model, and any combination thereof (para. 586, generative model). It would have been obvious for one of ordinary skill in the art, as of the effective filing date of Applicant's claims, to have further modified the applied reference(s) in view of the same to have obtained the above, motivated to make use of known machine learning to receive and/or modify inputs.

Regarding claim 17: see also claim 1. Lin teaches: a non-transitory computer-readable medium storing executable instructions which, when executed by a processing device, cause the processing device to perform operations (claim 1). The operations of claim 17 correspond to the method of claim 1; the same rationale for rejection applies.

Regarding claim 20: see claim 9. These claims are similar; the same rationale for rejection applies.

Claim(s) 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Anandkumar, and further in view of Kouris (U.S. Patent App. Pub. No. 2023/0128637).

Regarding claim 2: It would have been obvious for one of ordinary skill in the art to have combined and modified the applied reference(s), in view of the same, to have obtained: the method of claim 1, wherein the machine learning model has been trained using a reference image and a plurality of portions of the reference image by semantically arranging the plurality of portions of the reference image in accordance with a structure of the reference image; the results of the modification would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A).

Anandkumar teaches that it is known to compare training outputs to a set of expected or desired outputs, such as ground truth in supervised learning (paras. 140-42), and to use labeled data (paras. 142, 587). Ground truth data corresponds to a reference image. Regarding a reference image whose portions are semantically arranged in accordance with the reference image structure, Kouris teaches training a system for semantic image segmentation whereby image portions are arranged by image structure (e.g., Figs. 5A-6A). Modifying the applied references in view of Kouris, such that the reference image and training are done via semantically arranging image portions in accordance with the reference image, motivated to train a system for effective image segmentation, e.g., to better process input images (Lin, Fig. 6B, the cat or sink), and/or for image segmentation, classification, or object detection (Anandkumar, para. 145), is taught and suggested by the prior art, and would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A).
The prior art included each element recited in claim 2, although not necessarily in a single embodiment, the only difference between the claimed invention and the prior art being the lack of actual combination of certain elements in a single prior art embodiment, as described above. One of ordinary skill in the art could have combined the elements as claimed by known methods, and in that combination each element merely performs the same function as it does separately. One of ordinary skill in the art would also have recognized that the results of the combination were predictable as of the effective filing date of the claimed invention.

Claim(s) 4-7 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Anandkumar, and further in view of Qu (U.S. Patent App. Pub. No. 2025/0104309 A1).

Regarding claim 4: It would have been obvious for one of ordinary skill in the art to have combined and modified the applied reference(s), in view of the same, to have obtained: the method of claim 3, wherein the image embeddings are generated based on the one or more images and one or more image portions embeddings generated based on one or more portions of the one or more images; the results of the modification would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A). Qu teaches that it is known to generate image embeddings based on one or more images (e.g., Figs. 4, 5 and 7) and one or more image portions embeddings (Fig. 7, objects 1, 2…N from an input image, used to generate a feature map for said object; see also para. 16).

Modifying the applied references to include the above features of Qu in the systems of Lin and Anandkumar, so as to be able to receive and process inputs of various modalities (all three references teach this), is taught and suggested by the prior art, and would have been obvious and predictable to one of ordinary skill. The prior art included each element recited in claim 4, although not necessarily in a single embodiment, the only difference between the claimed invention and the prior art being the lack of actual combination of certain elements in a single prior art embodiment, as described above. One of ordinary skill in the art could have combined the elements as claimed by known methods, and in that combination each element merely performs the same function as it does separately. One of ordinary skill in the art would also have recognized that the results of the combination were predictable as of the effective filing date of the claimed invention.

Regarding claim 5: Anandkumar teaches: the method of claim 4, further comprising converting the image embeddings and the one or more image portions embeddings to a uniform dimension (paras. 74-75, OpenAI's CLIP model generates embeddings with the same number of dimensions). Modifying the applied references, in view of the same, to use CLIP for embeddings, known to perform said task, is taught and suggested by the prior art, and would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A). One of ordinary skill in the art could have combined the elements as claimed by known methods, and in that combination each element merely performs the same function as it does separately. One of ordinary skill in the art would also have recognized that the results of the combination were predictable as of the effective filing date of the claimed invention.
Regarding claim 6: Anandkumar teaches: the method of claim 4, wherein the one or more portions of the one or more images include one or more randomly generated portions of the one or more images (paras. 65-68, randomly generated images as a pre-processing step are known; applying this to the portions of images, per the mapping in claim 4, is taught and suggested by the prior art, and would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A)). One of ordinary skill in the art could have combined the elements as claimed by known methods, and in that combination each element merely performs the same function as it does separately. One of ordinary skill in the art would also have recognized that the results of the combination were predictable as of the effective filing date of the claimed invention.

Regarding claim 7: Anandkumar teaches: the method of claim 6, wherein a size of each of the one or more portions of the one or more images is randomly determined (paras. 65-68, random cropping, zooming, scaling, or otherwise adjusting as a pre-processing step for images is known; applying this to the portions of images, per the mapping in claim 4, is taught and suggested by the prior art, and would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A)). One of ordinary skill in the art could have combined the elements as claimed by known methods, and in that combination each element merely performs the same function as it does separately. One of ordinary skill in the art would also have recognized that the results of the combination were predictable as of the effective filing date of the claimed invention.

Regarding claim 18: see also claims 3 and 4. Claim 18 is a combination of claims 3 and 4, with an additional feature below, mapped to either Qu or Lin:

the non-transitory computer-readable medium of claim 17, wherein the plurality of input modalities include at least one of: one or more images, one or more text inputs, and any combination thereof (claim 3); wherein the image embeddings are generated based on the one or more images and one or more image portions embeddings generated based on one or more portions of the one or more images (claim 4); and one or more text embeddings generated based on the text input (Qu, Fig. 3, or Lin, Fig. 2).

It would have been obvious for one of ordinary skill in the art to have further modified the applied reference(s), in view of the same, to have obtained the above, and the results of the modification would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A). The prior art included each element recited in claim 18, although not necessarily in a single embodiment, the only difference between the claimed invention and the prior art being the lack of actual combination of certain elements in a single prior art embodiment, as described above. One of ordinary skill in the art could have combined the elements as claimed by known methods, and in that combination each element merely performs the same function as it does separately. One of ordinary skill in the art would also have recognized that the results of the combination were predictable as of the effective filing date of the claimed invention.

Claim(s) 8 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Anandkumar, and further in view of Huang (U.S. Patent No. 11,463,455 B1).
Regarding claim 8: It would have been obvious for one of ordinary skill in the art to have combined and modified the applied reference(s), in view of the same, to have obtained: the method of claim 1, further comprising assigning one or more weights to each of the image embeddings and the text embedding, each weight in the one or more weights being associated with a probability of not using at least one of the image embeddings and the text embedding; the results of the modification would have been obvious and predictable to one of ordinary skill in the art as of the effective filing date of the claimed invention. See MPEP §2143(A).

Huang teaches that it is known to assign weights to embeddings (e.g., a character embedding, per claim 3). Each weight is associated with a probability (example: claim 3, a probability of obfuscation). Huang also teaches using a Softmax function (C9, last partial paragraph, to C10), a mathematical function that converts a tuple of real numbers into a probability distribution over those numbers. See also Huang, C11, last partial paragraph, to C12. While the probability of interest in Huang relates to obfuscation, this is a non-limiting use of Softmax (which is not limited to any specific intended-use probability). Modifying the applied references to apply Huang's teaching of probabilities assigned to embeddings to the embeddings mapped in claim 1, including the probability of not using one of the embeddings, is taught and suggested by the prior art, and would have been obvious and predictable to one of ordinary skill. The choice of probability also would have been an obvious design choice for one of ordinary skill, based on the intended use or output design of the system and design preferences. Also note: for claim interpretation purposes, Applicant's specification describes "not using" probabilities in the context of training, not output image generation. See [0028] of the specification as filed. For weights assigned in training, see Anandkumar, paras. 126, 139.

The prior art included each element recited in claim 8, although not necessarily in a single embodiment, the only difference between the claimed invention and the prior art being the lack of actual combination of certain elements in a single prior art embodiment, as described above. One of ordinary skill in the art could have combined the elements as claimed by known methods, and in that combination each element merely performs the same function as it does separately. One of ordinary skill in the art would also have recognized that the results of the combination were predictable as of the effective filing date of the claimed invention.

Regarding claim 19: see claim 8. These claims are similar; the same rationale for rejection applies.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure, relevant to machine learning and image/text manipulations.

* * * * *

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sarah Lhymn, whose telephone number is (571) 270-0632. The examiner can normally be reached M-F, 9:00 AM to 6:00 PM EST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Xiao Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Sarah Lhymn/
Primary Examiner, Art Unit 2613
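The claim 8 rejection describes the Softmax function as converting a tuple of real numbers into a probability distribution over those numbers. A minimal editorial sketch of that behavior (not part of the office action, and not Huang's actual implementation), assuming a plain NumPy formulation:

```python
import numpy as np

def softmax(x):
    """Convert a sequence of real numbers into a probability distribution.

    Subtracting the max before exponentiating is a standard numerical-stability
    trick; it does not change the result, since softmax is shift-invariant.
    """
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

# Example: three arbitrary weights become non-negative values summing to 1,
# with the largest input receiving the largest probability.
probs = softmax((2.0, 1.0, 0.1))
```

This illustrates why Softmax output can be read as a probability over any set of scores, whatever the "intended use" of that probability is in a given reference.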

Prosecution Timeline

Mar 04, 2024 — Application Filed
Mar 18, 2026 — Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602882
AUGMENTED REALITY DISPLAY DEVICE AND AUGMENTED REALITY DISPLAY SYSTEM
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12602764
METHODS OF ARTIFICIAL INTELLIGENCE-ASSISTED INFRASTRUCTURE ASSESSMENT USING MIXED REALITY SYSTEMS
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12602746
SYSTEM AND METHOD FOR BACKGROUND MODELLING FOR A VIDEO STREAM
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12585888
AUTOMATICALLY GENERATING DESCRIPTIONS OF AUGMENTED REALITY EFFECTS
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12586163
INTERACTIVELY REFINING A DIGITAL IMAGE DEPTH MAP FOR NON DESTRUCTIVE SYNTHETIC LENS BLUR
Granted Mar 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 65%
With Interview: 81% (+15.2%)
Median Time to Grant: 2y 4m
PTA Risk: Low
Based on 546 resolved cases by this examiner. Grant probability derived from career allow rate.
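The headline projections can be reproduced from the raw counts reported above. A minimal sketch, assuming the grant probability is simply granted/resolved and the with-interview figure adds the +15.2% lift as percentage points (the page does not state its exact formula):

```python
granted, resolved = 357, 546      # examiner's career counts, from above
interview_lift = 15.2             # reported lift, in percentage points

allow_rate = granted / resolved * 100        # career allow rate, percent

print(round(allow_rate))                     # 65 -> "Grant Probability: 65%"
print(round(allow_rate + interview_lift))    # 81 -> "With Interview: 81%"
```

Both rounded values match the dashboard's displayed figures, which supports the additive reading of the "+15.2%" label.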
