DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: the text semantic obtaining module, the image editing operation module, and the image refining module in claim 9. The corresponding structure is considered to be a general-purpose processor in conjunction with a specific algorithm.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 101 – Abstract Idea
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 7, 9-10, and 16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1, it recites: A semantic-based image editing method, wherein the method comprises:
obtaining text on which image editing is based and parsing semantic information in the text, wherein the semantic information in the text is used to describe an image editing operation and image content corresponding to the operation;
performing the image editing operation based on the semantic information in the text, wherein the image editing operation comprises at least one of an operation of generating a new image, an operation of adding content to a to-be-edited image, an operation of modifying content of the to-be-edited image, and an operation of deleting content from the to-be-edited image; and
refining an edited image to obtain a refined image.
MPEP 2106, subsection III, provides a flowchart for the subject matter eligibility test for products and processes. The analysis following the flowchart is as follows:
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Yes. It recites a method, which is a process.
Step 2A, Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes. The claim recites an abstract idea.
Obtaining text and parsing semantic information
performing the image editing operation
refining an edited image
Obtaining text and parsing semantic information is merely a mental process of visualizing the text and analyzing the meaning of the text in the human mind, as the obtaining and parsing techniques are not described; the claim merely states that obtaining and parsing occur, without limiting how they are performed.
Performing the image editing operation can be considered a mental process of visualizing an image in the human mind and mentally changing portions as desired, as there is no description of how the specific image is edited; the claim merely recites that content is created, modified, added, or removed.
Refining an edited image can be considered a mental process of visualizing the edited image in the human mind and mentally changing portions as desired, as there is no description of how the edited image is refined.
Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. The claim does not recite any additional elements other than the limitations identified as an abstract idea (mental process) in Step 2A, Prong One.
Therefore, this judicial exception is not integrated into a practical application because there are no additional elements other than the abstract idea limitations.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. The claim does not recite any additional elements other than the limitations identified as an abstract idea (mental process) in Step 2A, Prong One.
Therefore, the claim does not include additional elements that, individually or in combination, amount to significantly more than the judicial exception.
Regarding claim 7, it recites: The method according to claim 1, wherein the obtaining text on which image editing is based and parsing semantic information in the text specifically comprises: obtaining the text on which image editing is based and parsing the text by using a natural language processing (NLP) technology, to obtain the semantic information in the text.
MPEP 2106, subsection III, provides a flowchart for the subject matter eligibility test for products and processes. The analysis following the flowchart is as follows:
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Yes. It recites a method, which is a process.
Step 2A, Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes. The claim recites an abstract idea.
Obtaining text and parsing semantic information
Obtaining text and parsing semantic information is merely a mental process of visualizing the text and analyzing the meaning of the text in the human mind, as the obtaining and parsing techniques are not described; the claim merely states that obtaining and parsing occur, without limiting how they are performed.
Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No.
This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional element of NLP technology, which merely parses text information. The NLP technology is recited at a high level of generality (i.e., general processing technology, a general-purpose computer) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No.
These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(f). MPEP 2106.05(f) provides the following considerations for determining whether a claim simply recites a judicial exception with the words “apply it” (or an equivalent), such as mere instructions to implement an abstract idea on a computer: (1) whether the claim recites only the idea of a solution or outcome, i.e., the claim fails to recite details of how a solution to a problem is accomplished; (2) whether the claim invokes computers or other machinery merely as a tool to perform an existing process; and (3) the particularity or generality of the application of the judicial exception. In the instant claim, the use of NLP technology only presents the idea of a solution while failing to describe how the NLP technology is used or structured to achieve the solution. Accordingly, this additional element does not amount to significantly more than the judicial exception because it does not impose any meaningful limits on practicing the abstract idea.
Regarding claim 9, it recites: A semantic-based image editing system, wherein the system comprises:
a text semantic obtaining module, configured to: obtain text on which image editing is based and parse semantic information in the text, wherein the semantic information in the text is used to describe an image editing operation and image content corresponding to the operation;
an image editing operation module, configured to perform the image editing operation based on the semantic information in the text, wherein the image editing operation comprises at least one of an operation of generating a new image, an operation of adding content to a to-be-edited image, an operation of modifying content of the to-be-edited image, and an operation of deleting content from the to-be-edited image; and
an image refining module, configured to refine an edited image to obtain a refined image.
MPEP 2106, subsection III, provides a flowchart for the subject matter eligibility test for products and processes. The analysis following the flowchart is as follows:
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Yes. It recites a system, which is a machine.
Step 2A, Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes. The claim recites an abstract idea.
Obtain text and parse semantic information
perform the image editing operation
refine an edited image
Obtaining text and parsing semantic information is merely a mental process of visualizing the text and analyzing the meaning of the text in the human mind, as the obtaining and parsing techniques are not described; the claim merely states that obtaining and parsing occur, without limiting how they are performed.
Performing the image editing operation can be considered a mental process of visualizing an image in the human mind and mentally changing portions as desired, as there is no description of how the specific image is edited; the claim merely recites that content is created, modified, added, or removed.
Refining an edited image can be considered a mental process of visualizing the edited image in the human mind and mentally changing portions as desired, as there is no description of how the edited image is refined.
Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No.
This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements of a text semantic obtaining module, an image editing operation module, and an image refining module, which merely obtain and parse text information, edit images, and refine images. The text semantic obtaining module, the image editing operation module, and the image refining module are recited at a high level of generality (i.e., a general-purpose processor in conjunction with software) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No.
These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(f). MPEP 2106.05(f) provides the following considerations for determining whether a claim simply recites a judicial exception with the words “apply it” (or an equivalent), such as mere instructions to implement an abstract idea on a computer: (1) whether the claim recites only the idea of a solution or outcome, i.e., the claim fails to recite details of how a solution to a problem is accomplished; (2) whether the claim invokes computers or other machinery merely as a tool to perform an existing process; and (3) the particularity or generality of the application of the judicial exception. In the instant claim, the text semantic obtaining module, the image editing operation module, and the image refining module only present the idea of a solution while failing to describe how these modules are used or structured to achieve the solution. Accordingly, these additional elements do not amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding claim 10, it recites: A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the semantic-based image editing method according to claim 1 is implemented.
MPEP 2106, subsection III, provides a flowchart for the subject matter eligibility test for products and processes. The analysis following the flowchart is as follows:
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Yes. It recites a computer readable medium, which is a manufacture.
Step 2A, Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes. The claim recites an abstract idea.
(See steps below included from claim 1)
Obtaining text and parsing semantic information
performing the image editing operation
refining an edited image
Obtaining text and parsing semantic information is merely a mental process of visualizing the text and analyzing the meaning of the text in the human mind, as the obtaining and parsing techniques are not described; the claim merely states that obtaining and parsing occur, without limiting how they are performed.
Performing the image editing operation can be considered a mental process of visualizing an image in the human mind and mentally changing portions as desired, as there is no description of how the specific image is edited; the claim merely recites that content is created, modified, added, or removed.
Refining an edited image can be considered a mental process of visualizing the edited image in the human mind and mentally changing portions as desired, as there is no description of how the edited image is refined.
Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No.
This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements of a computer-readable storage medium and a processor, which merely obtain and parse text information, edit images, and refine images. The computer-readable storage medium and the processor are recited at a high level of generality (i.e., storage with instructions and a processor) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No.
These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(f). MPEP 2106.05(f) provides the following considerations for determining whether a claim simply recites a judicial exception with the words “apply it” (or an equivalent), such as mere instructions to implement an abstract idea on a computer: (1) whether the claim recites only the idea of a solution or outcome, i.e., the claim fails to recite details of how a solution to a problem is accomplished; (2) whether the claim invokes computers or other machinery merely as a tool to perform an existing process; and (3) the particularity or generality of the application of the judicial exception. In the instant claim, the use of the computer-readable storage medium and the processor only presents the idea of a solution while failing to describe how the computer-readable storage medium and the processor are used or structured to achieve the solution. Accordingly, these additional elements do not amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Regarding claim 16, it recites: The computer-readable storage medium according to claim 10, wherein the obtaining text on which image editing is based and parsing semantic information in the text specifically comprises: obtaining the text on which image editing is based and parsing the text by using a natural language processing (NLP) technology, to obtain the semantic information in the text.
MPEP 2106, subsection III, provides a flowchart for the subject matter eligibility test for products and processes. The analysis following the flowchart is as follows:
Step 1: Is the claim to a process, machine, manufacture or composition of matter?
Yes. It recites a computer readable medium, which is a manufacture.
Step 2A, Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes. The claim recites an abstract idea.
Obtaining text and parsing semantic information
Obtaining text and parsing semantic information is merely a mental process of visualizing the text and analyzing the meaning of the text in the human mind, as the obtaining and parsing techniques are not described; the claim merely states that obtaining and parsing occur, without limiting how they are performed.
Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
No.
This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements of a computer-readable storage medium, a processor, and NLP technology, which merely obtain and parse text information. The computer-readable storage medium, the processor, and the NLP technology are recited at a high level of generality (i.e., storage with instructions and a processor) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
No.
These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. See MPEP 2106.05(f). MPEP 2106.05(f) provides the following considerations for determining whether a claim simply recites a judicial exception with the words “apply it” (or an equivalent), such as mere instructions to implement an abstract idea on a computer: (1) whether the claim recites only the idea of a solution or outcome, i.e., the claim fails to recite details of how a solution to a problem is accomplished; (2) whether the claim invokes computers or other machinery merely as a tool to perform an existing process; and (3) the particularity or generality of the application of the judicial exception. In the instant claim, the use of the computer-readable storage medium, the processor, and the NLP technology only presents the idea of a solution while failing to describe how these elements are used or structured to achieve the solution. Accordingly, these additional elements do not amount to significantly more than the judicial exception because they do not impose any meaningful limits on practicing the abstract idea.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 10-17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to nonstatutory subject matter. Claim 10 recites “a computer-readable storage medium.” A computer program may be statutory if it is claimed as a physical product, by reciting the program in conjunction with a “non-transitory computer-readable medium.” The specification (see paragraph [0134]) recites: “The embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may use a form of hardware only embodiments, software only embodiments, or embodiments with a combination of software and hardware. Moreover, the present disclosure may use a form of a computer program product that is implemented on one or more computer-usable storage media (including but not limited to a magnetic disk memory, a compact disc read-only memory (CD-ROM), an optical memory, and the like) that include computer-usable program code.” Because the specification does not positively restrict the medium to only statutory embodiments, under a broadest reasonable interpretation the medium may include signals (i.e., transitory propagating signals, carrier waves, etc.) and is thus directed to nonstatutory subject matter (see MPEP § 2106; In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007); and the Director’s Memo, Subject Matter Eligibility of Computer Readable Media, 1351 Off. Gaz. Pat. Office 212 (Feb. 23, 2010)). To overcome the rejection, Applicant should amend claims 10-17 such that the program is a physical product in conjunction with the medium and the medium is non-transitory in nature, e.g., a “non-transitory computer-readable storage medium.”
Claim Objections
Claims 2 and 11 are objected to because of the following informalities: the limitation “determ0ining” appears to be a typographical error of “determining”. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 7-10, 16, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Ravi et al. (US 2024/0362842) (hereinafter referred to as Ravi) in view of Ume et al. (US 2025/0029208) (hereinafter referred to as Ume).
Regarding claim 1, Ravi teaches a semantic-based image editing method (The present disclosure relates to systems, methods, and non-transitory computer readable media for utilizing a diffusion prior neural network for text guided digital image editing. See abstract), wherein the method comprises:
obtaining text on which image editing is based and parsing semantic information in the text, wherein the semantic information in the text is used to describe an image editing operation and image content corresponding to the operation (See figure 2, edit text 204)(As also illustrated in FIG. 2, the diffusion prior image editing system 102 also identifies the edit text 204. The edit text 204 includes a verbal description (e.g., of a characteristic, feature, or modification for a digital image). For example, the edit text 204 can include a textual description of a desired characteristic of the modified digital image 212. See paragraph [0038]);
performing the image editing operation based on the semantic information in the text, wherein the image editing operation comprises at least one of an operation of generating a new image, an operation of adding content to a to-be-edited image, an operation of modifying content of the to-be-edited image, and an operation of deleting content from the to-be-edited image (In particular, FIG. 2 illustrates the diffusion prior image editing system 102 generating a modified digital image 212 from a base digital image 202 and edit text 204 utilizing a diffusion prior neural network 206 and a diffusion neural network 210 in accordance with one or more embodiments. See paragraph [0036])(As shown in FIG. 2 the diffusion prior image editing system 102 utilizes a diffusion prior neural network 206 and the diffusion neural network 210 to convert the base digital image 202 and the edit text 204 to the modified digital image 212. In particular, the diffusion prior image editing system 102 utilizes the diffusion prior neural network 206 to perform conceptual editing 208. As used herein, the term neural network refers to a machine learning model of interconnected artificial neurons (e.g., organized in layers) that communicate and learn to approximate complex functions and generate outputs. In some cases, a neural network refers to an algorithm (or set of algorithms) that implements deep learning techniques to model high-level abstractions in data. For example, a neural network can include a convolutional neural network, a recurrent neural network (e.g., an LSTM), a graph neural network, or a generative adversarial neural network. See paragraph [0039])(The diffusion prior image editing system 102 can perform the structural editing 214 by dynamically selecting a structural transition step. In particular, the diffusion prior image editing system 102 can select a structural transition step of the diffusion neural network 210 that determines the number of noising steps and/or denoising steps in generating the modified digital image 212. The diffusion prior image editing system 102 can utilize denoising steps of the diffusion neural network 210 following the structural transition denoising step to process a representation of the base digital image 202. The diffusion prior image editing system 102 can intelligently select the structural transition denoising step to control the preservation of details from the base digital image 202 in generating the modified digital image 212. See paragraph [0044]), but is silent as to refining an edited image to obtain a refined image.
Ume teaches modifying source data to generate hyperreal synthetic content (FIG. 3 is a diagram illustrating an example technique for enhancing a source image 300 based on difference mapping data 302 between low-quality imagery 304 and high-quality imagery 306. As mentioned above, source data 100 and/or unmodified input data 114 can be modified (e.g., manually, semi-automatically, or automatically) to obtain modified images, such as images of an enhanced quality featuring a subject 308 (e.g., a person). In the example of FIG. 3, a frame 300 (e.g., Frame A) represents an image of a first quality (e.g., a first resolution, first sharpness, etc.) featuring a body part (e.g., a face) of a subject 308. In particular, the frame 300 is unmodified (e.g., the frame 300 may represent a frame from an old movie starring a famous actor (e.g., Charlie Chaplin in the example of FIG. 3) who is now deceased). Due to the equipment (e.g., cameras, lenses, film, etc.) used to record such footage, fine details, such as blemishes, pores, and other imperfections on the subject's 308 skin and/or individual strands of hair on the subject's 308 face, may not be noticeable in the unmodified frame 300. Meanwhile, frame 300' (e.g., Frame A') represents the image of the subject 308, but enhanced to a second quality (e.g., a second resolution, second sharpness, etc.) greater than the first quality. In general, difference mapping data 302 can be used to modify (e.g., enhance) frame 300 to obtain frame 300' representing the image of the second, greater quality. In some examples, the modification(s) that is/are performed to enhance the image quality may include upscaling the image(s), sharpening the image(s), performing color transformation of the pixel values of the image(s), linearizing a 2D color space, adding details (e.g., skin pores, blemishes, freckles, hair strands, etc.) to the face featured in the image(s), or the like. See paragraph [0035])(FIG. 6 is a flow diagram of an example process 600 for training multiple machine learning models to modify source data and to generate media content featuring a hyperreal synthetic body part. See paragraph [0062]).
Ravi and Ume both teach editing images using machine learning networks, and Ume teaches upscaling an image to enhance its quality. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the system of Ravi with the upscaling techniques of Ume such that the system could output an enhanced-quality image.
Regarding claim 7, Ravi in view of Ume teaches the method according to claim 1, wherein the obtaining text on which image editing is based and parsing semantic information in the text specifically comprises: obtaining the text on which image editing is based and parsing the text by using a natural language processing (NLP) technology, to obtain the semantic information in the text (Ravi; See figure 2, edit text 204) (Ravi; As also illustrated in FIG. 2, the diffusion prior image editing system 102 also identifies the edit text 204. The edit text 204 includes a verbal description (e.g., of a characteristic, feature, or modification for a digital image). For example, the edit text 204 can include a textual description of a desired characteristic of the modified digital image 212. See paragraph [0038]).
Regarding claim 8, Ravi in view of Ume teaches the method according to claim 1, wherein the refining an edited image to obtain a refined image specifically comprises: inputting the edited image into an image optimization model to obtain the refined image, wherein the image optimization model uses a diffusion model as a backbone network (Ume; FIG. 6 is a flow diagram of an example process 600 for training multiple machine learning models to modify source data and to generate media content featuring a hyperreal synthetic body part. See paragraph [0062]) (Ume; At 609, the processor(s) may analyze (e.g., scan) the source data 100, 400 (e.g., based at least in part on the target face 204) to identify missing data, and the processor(s) may use an AI model(s) (e.g., a tuned diffusion model(s)) to augment the training data 112 with AI-generated synthetic data 403 that corresponds to the missing data. See paragraph [0067]).
Regarding claim 9, Ravi teaches A semantic-based image editing (The present disclosure relates to systems, methods, and non-transitory computer readable media for utilizing a diffusion
prior neural network for text guided digital image editing. See abstract) system, wherein the system comprises:
a text semantic obtaining module (The components of the diffusion prior image editing
system 102, in one or more implementations, includes software, hardware, or both. For example, the components of the diffusion prior image editing system 102 include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices (e.g., the computing device 1100). When executed by the one or more processors, the computer executable instructions of the diffusion prior image editing system 102 cause the computing device 1100 to perform the methods described herein. See paragraph [0129]), configured to:
obtain text on which image editing is based and parse semantic information in the text, wherein the semantic information in the text is used to describe an image editing operation and image content corresponding to the operation (See figure 2, edit text 204) (As also illustrated in FIG. 2, the diffusion prior image editing system 102 also identifies the edit text 204. The edit text 204 includes a verbal description (e.g., of a characteristic, feature, or modification for a digital image). For example, the edit text 204 can include a textual description of a desired characteristic of the modified digital image 212. See paragraph [0038]);
an image editing operation module (The components of the diffusion prior image editing
system 102, in one or more implementations, includes software, hardware, or both. For example, the components of the diffusion prior image editing system 102 include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices (e.g., the computing device 1100). When executed by the one or more processors, the computer executable instructions of the diffusion prior image editing system 102 cause the computing device 1100 to perform the methods described herein. See paragraph [0129]), configured to perform the image editing operation based on the semantic information in the text, wherein the image editing operation comprises at least one of an operation of generating a new image, an operation of adding content to a to-be-edited image, an operation of modifying content of the to-be-edited image, and an operation of deleting content from the to-be-edited image (In particular, FIG. 2 illustrates the diffusion prior image editing system 102 generating a modified digital image 212 from a base digital image 202 and edit text 204 utilizing a diffusion prior neural network 206 and a diffusion neural network 210 in accordance with one or more embodiments. See paragraph [0036]) (As shown in FIG. 2 the diffusion prior image editing system 102 utilizes a diffusion prior neural network 206 and the diffusion neural network 210 to convert the base digital image 202 and the edit text 204 to the modified digital image 212. In particular, the diffusion prior image editing system 102 utilizes the diffusion prior neural network 206 to perform conceptual editing 208. As used herein, the term neural network refers to a machine learning model of interconnected artificial neurons (e.g., organized in layers) that communicate and learn to approximate complex functions and generate outputs.
In some cases, a neural network refers to an algorithm (or set of algorithms) that implements deep learning techniques to model high-level abstractions in data. For example, a neural network can include a convolutional neural network, a recurrent neural network (e.g., an LSTM), a graph neural network, or a generative adversarial neural network. See paragraph [0039]) (The diffusion prior image editing system 102 can perform the structural editing 214 by dynamically selecting a structural transition step. In particular, the diffusion prior image editing system 102 can select a structural transition step of the diffusion neural network 210 that determines the number of noising steps and/or denoising steps in generating the modified digital image 212. The diffusion prior image editing system 102 can utilize denoising steps of the diffusion neural network 210 following the structural transition denoising step to process a representation of the base digital image 202. The diffusion prior image editing system 102 can intelligently select the structural transition denoising step to control the preservation of details from the base digital image 202 in generating the modified digital image 212. See paragraph [0044]), but is silent to and an image refining module, configured to refine an edited image to obtain a refined image.
Ume teaches modifying source data to generate hyperreal synthetic content (FIG. 3 is a diagram illustrating an example technique for enhancing a source image 300 based on difference mapping data 302 between low-quality imagery 304 and high-quality imagery 306. As mentioned above, source data 100 and/or unmodified input data 114 can be modified (e.g., manually, semi-automatically, or automatically) to obtain modified images, such as images of an enhanced quality featuring a subject 308 (e.g., a person). In the example of FIG. 3, a frame 300 (e.g., Frame A) represents an image of a first quality (e.g., a first resolution, first sharpness, etc.) featuring a body part (e.g., a face) of a subject 308. In particular, the frame 300 is unmodified (e.g., the frame 300 may represent a frame from an old movie starring a famous actor (e.g., Charlie Chaplin in the example of FIG. 3) who is now deceased). Due to the equipment (e.g., cameras, lenses, film, etc.) used to record such footage, fine details, such as blemishes, pores, and other imperfections on the subject's 308 skin and/or individual strands of hair on the subject's 308 face, may not be noticeable in the unmodified frame 300. Meanwhile, frame 300' (e.g., Frame A') represents the image of the subject 308, but enhanced to a second quality (e.g., a second resolution, second sharpness, etc.) greater than the first quality. In general, difference mapping data 302 can be used to modify (e.g., enhance) frame 300 to obtain frame 300' representing the image of the second, greater quality. In some examples, the modification(s) that is/are performed to enhance the image quality may include upscaling the image(s), sharpening the image(s), performing color transformation of the pixel values of the image(s), linearizing a 2D color space, adding details (e.g., skin pores, blemishes, freckles, hair strands, etc.) to the face featured in the image(s), or the like. See paragraph [0035]) (FIG. 6 is a flow diagram of an example process 600 for training multiple machine learning models to modify source data and to generate media content featuring a hyperreal synthetic body part. See paragraph [0062])
Ravi and Ume both teach editing images using machine learning networks, and Ume teaches upscaling an image to enhance its quality. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the system of Ravi with the upscaling techniques of Ume such that the system could output an enhanced-quality image.
Regarding claim 10, Ravi in view of Ume teaches A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the semantic-based image editing method according to claim 1 is implemented (Ravi; The present disclosure relates to systems, methods, and non-transitory computer readable media for utilizing a diffusion
prior neural network for text guided digital image editing. See abstract) (Ravi; The components of the diffusion prior image editing system 102, in one or more implementations, includes software, hardware, or both. For example, the components of the diffusion prior image editing system 102 include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices (e.g., the computing device 1100). When executed by the one or more processors, the computer executable instructions of the diffusion prior image editing system 102 cause the computing device 1100 to perform the methods described herein. See paragraph [0129]) (See rejection of claim 1).
Regarding claim 16, Ravi in view of Ume teaches the computer-readable storage medium according to claim 10, wherein the obtaining text on which image editing is based and parsing semantic information in the text specifically comprises: obtaining the text on which image editing is based and parsing the text by using a natural language processing (NLP) technology, to obtain the semantic information in the text (Ravi; See figure 2, edit text 204) (Ravi; As also illustrated in FIG. 2, the diffusion prior image editing system 102 also identifies the edit text 204. The edit text 204 includes a verbal description (e.g., of a characteristic, feature, or modification for a digital image). For example, the edit text 204 can include a textual description of a desired characteristic of the modified digital image 212. See paragraph [0038]).
Regarding claim 17, Ravi in view of Ume teaches the computer-readable storage medium according to claim 10, wherein the refining an edited image to obtain a refined image specifically comprises: inputting the edited image into an image optimization model to obtain the refined image, wherein the image optimization model uses a diffusion model as a backbone network (Ume; FIG. 6 is a flow diagram of an example process 600 for training multiple machine learning models to modify source data and to generate media content featuring a hyperreal synthetic body part. See paragraph [0062]) (Ume; At 609, the processor(s) may analyze (e.g., scan) the source data 100, 400 (e.g., based at least in part on the target face 204) to identify missing data, and the processor(s) may use an AI model(s) (e.g., a tuned diffusion model(s)) to augment the training data 112 with AI-generated synthetic data 403 that corresponds to the missing data. See paragraph [0067]).
Allowable Subject Matter
Claims 2-6, and 11-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and overcoming the 101 rejections set forth in this action.
The following is a statement of reasons for the indication of allowable subject matter: The prior art of record alone or in combination is silent to the limitations “wherein when the image editing operation is the operation of generating a new image, the performing the image editing operation based on the semantic information in the text specifically comprises: determining entities and entity attributes in the text based on the semantic information in the text, and determining a relationship network between entities based on the entities and the entity attributes in the text, wherein the entities comprise target objects in an image and an image background; generating an entity mask map corresponding to each entity based on the relationship network between entities, wherein the entity mask map is used to define an image region in which the entity is placed; performing entity type embedding on the entity mask map for each entity to obtain each entity type embedding mask map; generating a text embedding map for each entity based on a textual description of an entity attribute in the text, and obtaining each entity embedding feature map based on each entity type embedding mask map and a corresponding text embedding map; and inputting each entity embedding feature map into an image generation model to obtain the edited image.” of claim 2 when read in light of the rest of the limitations in claim 2 and the claims to which claim 2 depends and thus claim 2 contains allowable subject matter.
Claim 3 contains allowable subject matter because it depends on a claim that contains allowable subject matter.
The prior art of record alone or in combination is silent to the limitations “wherein when the image editing operation is the operation of adding content to a to-be-edited image, the performing the image editing operation based on the semantic information in the text specifically comprises: identifying original entities and original entity attributes in the to-be-edited image, and generating original entity mask maps for all the original entities; selecting a reference entity mask map from the original entity mask maps based on the semantic information in the text, wherein the reference entity mask map is a mask map corresponding to an original entity mentioned in the semantic information in the text; determining a new entity mask map corresponding to a new entity based on the reference entity mask map; performing entity type embedding on the new entity mask map to obtain a new entity type embedding mask map; generating a text embedding map for the new entity based on a textual description of a new entity attribute in the text, and obtaining an embedding feature map for the new entity based on the new entity type embedding mask map and the corresponding text embedding map; inputting the to-be-edited image and the embedding feature map for the new entity into an image generation model to obtain a new entity image, wherein the new entity image incorporates image information of the to-be-edited image; and inputting the new entity image and the to-be-edited image into the image generation model to obtain the edited image, wherein the edited image is an image obtained after the new entity image is added to the to-be-edited image. ” of claim 4 when read in light of the rest of the limitations in claim 4 and the claims to which claim 4 depends and thus claim 4 contains allowable subject matter.
The prior art of record alone or in combination is silent to the limitations “wherein when the image editing operation is the operation of modifying content of the to-be-edited image, the performing the image editing operation based on the semantic information in the text specifically comprises: identifying original entities and original entity attributes in the to-be-edited image, and generating original entity mask maps for all the original entities; selecting a to-be-modified entity mask map from the original entity mask maps based on the semantic information in the text, and modifying the to-be-modified entity mask map to a modified entity mask map based on a textual description of a modified entity attribute in the text, wherein the semantic information in the text records an original entity that is to be modified and provides an information description of a modified entity; obtaining an embedding feature map for the modified entity based on the textual description of the modified entity attribute and the modified entity mask map; and inputting the embedding feature map for the modified entity and the to-be-edited image into an image generation model to obtain the edited image, wherein the edited image is an image generated after a modification operation is performed. ” of claim 5 when read in light of the rest of the limitations in claim 5 and the claims to which claim 5 depends and thus claim 5 contains allowable subject matter.
The prior art of record alone or in combination is silent to the limitations “wherein when the image editing operation is the operation of deleting content from the to-be-edited image, the performing the image editing operation based on the semantic information in the text specifically comprises: identifying original entities and original entity attributes in the to-be-edited image, and generating original entity mask maps for all the original entities; selecting a to-be-deleted entity mask map from the original entity mask maps based on the semantic information in the text, wherein the semantic information in the text records an original entity that is to be deleted; combining all the original entity mask maps except the to-be-deleted entity mask map, and generating an embedding feature map for a combined entity based on a combined mask map; and inputting the embedding feature map for the combined entity and the to-be-edited image into an image generation model to obtain the edited image, wherein the edited image is an image obtained after a deletion operation is performed.” of claim 6 when read in light of the rest of the limitations in claim 6 and the claims to which claim 6 depends and thus claim 6 contains allowable subject matter.
The prior art of record alone or in combination is silent to the limitations “wherein when the image editing operation is the operation of generating a new image, the performing the image editing operation based on the semantic information in the text specifically comprises: determining entities and entity attributes in the text based on the semantic information in the text, and determining a relationship network between entities based on the entities and the entity attributes in the text, wherein the entities comprise target objects in an image and an image background; generating an entity mask map corresponding to each entity based on the relationship network between entities, wherein the entity mask map is used to define an image region in which the entity is placed; performing entity type embedding on the entity mask map for each entity to obtain each entity type embedding mask map; generating a text embedding map for each entity based on a textual description of an entity attribute in the text, and obtaining each entity embedding feature map based on each entity type embedding mask map and a corresponding text embedding map; and inputting each entity embedding feature map into an image generation model to obtain the edited image.” of claim 11 when read in light of the rest of the limitations in claim 11 and the claims to which claim 11 depends and thus claim 11 contains allowable subject matter.
Claim 12 contains allowable subject matter because it depends on a claim that contains allowable subject matter.
The prior art of record alone or in combination is silent to the limitations “wherein when the image editing operation is the operation of adding content to a to-be-edited image, the performing the image editing operation based on the semantic information in the text specifically comprises: identifying original entities and original entity attributes in the to-be-edited image, and generating original entity mask maps for all the original entities; selecting a reference entity mask map from the original entity mask maps based on the semantic information in the text, wherein the reference entity mask map is a mask map corresponding to an original entity mentioned in the semantic information in the text; determining a new entity mask map corresponding to a new entity based on the reference entity mask map; performing entity type embedding on the new entity mask map to obtain a new entity type embedding mask map; generating a text embedding map for the new entity based on a textual description of a new entity attribute in the text, and obtaining an embedding feature map for the new entity based on the new entity type embedding mask map and the corresponding text embedding map; inputting the to-be-edited image and the embedding feature map for the new entity into an image generation model to obtain a new entity image, wherein the new entity image incorporates image information of the to-be-edited image; and inputting the new entity image and the to-be-edited image into the image generation model to obtain the edited image, wherein the edited image is an image obtained after the new entity image is added to the to-be-edited image. ” of claim 13 when read in light of the rest of the limitations in claim 13 and the claims to which claim 13 depends and thus claim 13 contains allowable subject matter.
The prior art of record alone or in combination is silent to the limitations “wherein when the image editing operation is the operation of modifying content of the to-be-edited image, the performing the image editing operation based on the semantic information in the text specifically comprises: identifying original entities and original entity attributes in the to-be-edited image, and generating original entity mask maps for all the original entities; selecting a to-be-modified entity mask map from the original entity mask maps based on the semantic information in the text, and modifying the to-be-modified entity mask map to a modified entity mask map based on a textual description of a modified entity attribute in the text, wherein the semantic information in the text records an original entity that is to be modified and provides an information description of a modified entity; obtaining an embedding feature map for the modified entity based on the textual description of the modified entity attribute and the modified entity mask map; and inputting the embedding feature map for the modified entity and the to-be-edited image into an image generation model to obtain the edited image, wherein the edited image is an image generated after a modification operation is performed.” of claim 14 when read in light of the rest of the limitations in claim 14 and the claims to which claim 14 depends and thus claim 14 contains allowable subject matter.
The prior art of record alone or in combination is silent to the limitations “wherein when the image editing operation is the operation of deleting content from the to-be-edited image, the performing the image editing operation based on the semantic information in the text specifically comprises: identifying original entities and original entity attributes in the to-be-edited image, and generating original entity mask maps for all the original entities; selecting a to-be-deleted entity mask map from the original entity mask maps based on the semantic information in the text, wherein the semantic information in the text records an original entity that is to be deleted; combining all the original entity mask maps except the to-be-deleted entity mask map, and generating an embedding feature map for a combined entity based on a combined mask map; and inputting the embedding feature map for the combined entity and the to-be-edited image into an image generation model to obtain the edited image, wherein the edited image is an image obtained after a deletion operation is performed.” of claim 15 when read in light of the rest of the limitations in claim 15 and the claims to which claim 15 depends and thus claim 15 contains allowable subject matter.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS R WILSON whose telephone number is (571)272-0936. The examiner can normally be reached M-F 7:30-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICHOLAS R WILSON/Primary Examiner, Art Unit 2611