Last updated: May 29, 2026
Application No. 18/503,741
DEVICE, METHOD, AND PROGRAM FOR ENHANCING OUTPUT CONTENT THROUGH ITERATIVE GENERATION

Non-Final OA §103
Filed: Nov 07, 2023
Priority: Dec 04, 2019 (RE 10-2019-0160008 +2 more)
Examiner: AGAHI, DARIOUSH
Art Unit: 2656
Tech Center: 2600 (Communications)
Assignee: Samsung Electronics Co., Ltd.
OA Round: 3 (Non-Final)
Interview Optional

Interview already conducted in this application's prosecution history. This examiner has an 85% grant rate with a +30.2% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 174 resolved cases, 2023–2026
Examiner Intelligence

AGAHI, DARIOUSH
Career Allowance Rate: 85% (above average), 148 granted / 174 resolved, +23.1% vs TC avg
Interview Lift: +30.2% (strong), based on resolved cases with interview
Avg Prosecution: 2y 7m (typical timeline), 16 currently pending
Total Applications: 195 across all art units (career history)
Statute-Specific Performance

§101: 7.5% (-32.5% vs TC avg)
§103: 89.9% (+49.9% vs TC avg)
§102: 1.1% (-38.9% vs TC avg)
§112: 0.8% (-39.2% vs TC avg)
Based on career data from 174 resolved cases
Office Action

§103
DETAILED ACTION
This office action is in response to Applicant’s submission filed on 2/26/2026. Claims 37-75 are pending in the application of which Claims 37, 49, and 61 are independent and have been examined.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.



Information Disclosure Statement
The information disclosure statement(s)(IDS) submitted on 4/6/2026 has been considered by the examiner.


Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant’s submission filed on 2/26/2026 has been entered.

Response to Arguments
Applicant’s arguments filed in the Amendment filed 2/26/2026 (herein “Amendment”) with respect to 35 U.S.C. 112(a) have been fully considered and are persuasive. Consequently, the 35 U.S.C. 112(a) claim rejection is withdrawn.
Examiner concurs with Applicant’s request that the double patenting rejection be held in abeyance until all substantive issues in the application are addressed.
Applicant’s amendments with respect to the 35 USC §103 rejection raised in the previous office action have been fully considered but are not persuasive.
Applicant, on page 15, sets forth: "Knowledge of Applicant's disclosure must be put aside in reaching this determination, yet kept in mind in order to determine the differences, conduct the search and evaluate the subject matter as a whole of the invention. The tendency to resort to "hindsight" based upon Applicant's disclosure is often difficult to avoid due to the very nature of the examination process. However, impermissible hindsight must be avoided and the legal conclusion must be reached on the basis of the facts gleaned from the prior art."
MPEP 2145: Applicants may argue that the examiner's conclusion of obviousness is based on improper hindsight reasoning. However, "[a]ny judgment on obviousness is in a sense necessarily a reconstruction based on hindsight reasoning, but so long as it takes into account only knowledge which was within the level of ordinary skill in the art at the time the claimed invention was made and does not include knowledge gleaned only from applicant's disclosure, such a reconstruction is proper." In re McLaughlin, 443 F.2d 1392, 1395, 170 USPQ 209, 212 (CCPA 1971).
Therefore, Examiner fails to see which part of Applicant's argument could justify the allegation of impermissible hindsight reconstruction. As such, the impermissible hindsight argument is not persuasive.
The remaining arguments are moot in view of the new grounds of rejection, which were necessitated by Applicant’s amendment. Therefore, the previous rejection has been withdrawn. However, upon further consideration, a new ground of rejection is introduced for the independent claims, adding Ohara et al. (US 6529618 B1) to the combination of Cohen and Gupta.
Please see prior art section below for more detail including updated citations and obviousness rationale.






Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 37-72 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen (US20190196698A1) in view of Gupta et al. (US20200250453A1) (herein "Gupta"), and in further view of Ohara et al. (US 6529618 B1) (herein "Ohara").

Cohen and Gupta were applied in the previous Office Action.
Regarding claims 37, 49, and 61, Cohen teaches [One or more non-transitory computer-readable storage media storing one or more computer programs including computer-executable instructions that, when executed by one or more processors of an electronic device individually or collectively, cause the electronic device to perform operations, the operations comprising: - claim 37], [A method performed by an electronic device for modifying content, the method comprising: - claim 49], and [An electronic device for modifying content, the electronic device comprising: memory, comprising one or more storage media, storing instructions; and one or more processors communicatively coupled to the memory, wherein the instructions, when executed by the one or more processors individually or collectively, cause the electronic device to: - claim 61] (Cohen, Par. 0042:” Image enhancement system 110 also includes processors 124. Hence, image enhancement system 110 may be implemented at least partially by executing instructions stored on storage 126 on processors 124. For instance, processors 124 may execute portions of image enhancement application 120.”, and Par. 0184:” Computer-readable storage media 806 is illustrated as including memory/storage 812. Storage 126 in FIG. 1 is an example of memory/storage included in memory/storage 812.”, and Par. 0189:” “Computer-readable storage media” refers to media, devices, or combinations thereof that enable persistent or non-transitory storage of information in contrast to mere signal transmission, ...”)
present a base content; (Cohen, Figure 2, image 202; Fig. 5, image 510; Par. 0094:” … In one example, directed user conversation 204 is initiated in response to the image to be edited 202 being flagged for editing, such as by a user selecting the image to be edited 202 for editing, the image to be edited 202 being loaded into an image editing application (e.g., image enhancement application 120), and the like.”)
presenting an indication, on the base content being presented, corresponding to the target area; (Cohen, Fig. 5, Par. 0142:” Indicator 512 can be any suitable indicator [indication], such as a lasso, circle, shading, pattern, mask, overlay, arrow, proximate text, and the like. In the example in FIG. 5, indicator 512 includes a dashed outline [indication] encompassing the fire hydrant [target area] together with the dog's head.”, and Par. 0143:” … In the example in FIG. 5, the user selects indicator 512, (e.g., by pointing with a mouse and clicking a mouse button), which is denoted by a hand representation 514. The user also moves the indicator 512 to adjust content that it indicates (e.g., by holding a mouse button down and moving or adjusting indicator 512).”)
while the base content is being presented, receiving a natural language input, including at least one attribute information, for generating output content; and (Cohen, Par. 0143:” … Any suitable selection with a tool in user interface 500 can be used to provide multi-modal user input, such as user clicking on a center of an object (e.g., clicking on the center of the fire hydrant in intermediate image 510) while speaking (e.g., “No, you've selected the dog's head, too. This is the fire hydrant” in the directed user conversation of representation 504) represents a multi-modal user input. “, and Par. 0170:” … Confirming that the candidate object image matches the object can include receiving a multi-modal user input to correct the candidate object. One of the modes can be speech, and another mode can be input from a keyboard, mouse, stylus, gesture, or touchscreen, and the like.”, and Par. 0095:” …The device directs the conversation by asking “What would you like to replace?”, to which the user answers “The boring sky”. The device again directs the conversation by narrowing the parameters of the replacement task, and asks “What would you like to replace the boring sky with?” “, and Par. 0096:” Continuing with the example directed user conversation 204 in FIG. 2, the user responds “A cloudy sky”, indicating to the device that the boring sky should be replaced by a cloudy sky. In response, the device generates harmonized image 206, which includes a cloudy sky.”) Note: boring and cloudy are attributes of the sky.
switching, from presenting the base content with the indication corresponding to the target area being presented thereon, to [[presenting, instead of the base content with the indication corresponding to the target area being presented thereon and in a same display area in which the base content with the indication corresponding to the target area being presented thereon was presented,]] modified base content in which the base content is modified to include the output content, in the target area, that is generated based on a detected object in the target area and as having at least one same attribute as the at least one attribute information included in the natural language input, (Cohen, Par. 0143:” … Any suitable selection with a tool in user interface 500 can be used to provide multi-modal user input, such as user clicking on a center of an object (e.g., clicking on the center of the fire hydrant in intermediate image 510) while speaking (e.g., “No, you've selected the dog's head, too. This is the fire hydrant” in the directed user conversation of representation 504) represents a multi-modal user input. “, and Par. 0170:” … Confirming that the candidate object image matches the object can include receiving a multi-modal user input to correct the candidate object. One of the modes can be speech, and another mode can be input from a keyboard, mouse, stylus, gesture, or touchscreen, and the like.”, and Par. 0095:” …The device directs the conversation by asking “What would you like to replace?”, to which the user answers “The boring sky”. The device again directs the conversation by narrowing the parameters of the replacement task, and asks “What would you like to replace the boring sky with?” “, and Par. 0096:” Continuing with the example directed user conversation 204 in FIG. 2, the user responds “A cloudy sky”, indicating to the device that the boring sky should be replaced by a cloudy sky. In response, the device generates harmonized image 206, which includes a cloudy sky.”, and Par. 0147:” … harmonized image 528 can be generated and displayed on a user interface of one of computing devices …”) Note: boring and cloudy are attributes of the sky. Furthermore, once the harmonized image is displayed, it reads on switching from base content to the target area where the enhancement/modification is conducted. Also, Cohen Fig. 5 depicts the fire hydrant with the arrow (indication) and subsequently switches to display panel 526, which shows the modified image.
wherein the output content is generated by using at least one artificial intelligence (AI) model, and (Cohen, Par. 0131:” … Additionally or alternatively, harmonizing module 154 may harmonize a composite image with a neural network trained specifically for the type of object removed from or replaced in an image to be edited used to form the composite image, a background of the image to be edited, or combinations thereof. For instance, harmonizing module 154 can use a neural network trained to harmonize persons in a beach scene when removing or replacing a person in an image with a beach scene. “, and Par. 0005:” … In one example, a vision module specific to the object is used, such as using a sky vision module including a neural network trained to identify skies when satisfying the replace request “Replace the boring sky with a cloudy sky”. … In another example, intermediate results are exposed to the user, and multi-modal input is received during a directed user conversation.”)
Cohen does not teach, but Gupta teaches, while the base content is being presented without a target area of the base content having been selected, receiving a user input, related to one or more coordinates, to the base content for selecting the target area of the base content based on the user input related to the one or more coordinates; (Gupta, Par. 0045:” … an image editing program can include a content-aware selection system. The content-aware selection system can enable a user to select an area of an image using a label or a tag that identifies object in the image, ... For example, for an image that includes a dog and a cat, the content-aware selection system can enable a user to input the label “dog,” upon which the content-aware selection system will generate a selection area around the pixels that represent the dog. As a further example, the system can enable the user to input the label “animals,” which will generate a selection area including the pixels for both the dog and the cat.”, and Par. 0046:” … Instead of having to draw a selection boundary around an object, or painting over the area that contains the object, users can click or tap on the object, and the content-aware selection system can automatically draw a selection area around the object. The content-aware selection system may be particularly useful when an image editing program supports voice input. With voice input, the user can speak a phrase such as “select the dog,” and the content-aware selection system will generate a selection area around the dog, without the user needing to provide any physical input. “) Note: per as-filed Spec. Par. 0060:” … The user input may be related to one or more coordinates, but is not limited thereto. For example, the user input may be an audio input, voice input, text input, or a combination thereof. An input related to a coordinate may be a touch input, click input, gesture input, etc.”
wherein the user input related to the one or more coordinates is different than the natural language input. (Gupta, Par. 0046:” … Instead of having to draw a selection boundary around an object, or painting over the area that contains the object, users can click or tap on the object, and the content-aware selection system can automatically draw a selection area around the object. “) Note: per as-filed Spec. Par. 0060:” … The user input may be related to one or more coordinates, but is not limited thereto. For example, the user input may be an audio input, voice input, text input, or a combination thereof. An input related to a coordinate may be a touch input, click input, gesture input, etc.”
Gupta is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cohen in view of Gupta to receive, while the base content is being presented without a target area of the base content having been selected, a user input, related to one or more coordinates, to the base content for selecting the target area of the base content based on the user input related to the one or more coordinates, wherein the user input related to the one or more coordinates is different than the natural language input. The motivation to do so would be to improve the image editing process in terms of speed and accuracy (Gupta, Par. 0046).
Cohen, as modified above, does not teach, however, Ohara teaches [[switching, from presenting the base content with the indication corresponding to the target area being presented thereon, to]] presenting, instead of the base content with the indication corresponding to the target area being presented thereon and in a same display area in which the base content with the indication corresponding to the target area being presented thereon was presented, [[modified base content in which the base content is modified to include the output content, in the target area, that is generated based on a detected object in the target area and as having at least one same attribute as the at least one attribute information included in the natural language input,]] (Ohara, Col. 20, ll. 58-65:” In the controller 40, by supplying the image data before correction and the image data after correction, each for one field, being made as a pair, to the display control section 55, through displaying both the radiation image based on the image data before correction and the radiation image based on the image data after correction simultaneously on the image surface of the image display device 56 as shown in FIG. 14A, …”).

[Image: media_image1.png]

Ohara is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cohen, as modified above, further in view of Ohara to present, instead of the base content with the indication corresponding to the target area being presented thereon and in a same display area in which the base content with the indication corresponding to the target area being presented thereon was presented, the modified base content. The motivation to do so would be to provide instant, accurate, and detailed visual comparison of the changes and verification of the desired changes.

Regarding claims 38, 50, and 62, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above, further teaches detecting the object, wherein the object is detected by using the at least one AI model. (Cohen, Par. 0021:” … For instance, a sky vision module including a neural network trained to identify [detect] skies [object] is used to ascertain pixels of a sky in an image when an object to be replaced in the image is identified as a sky, such as for the replace request “Replace the boring sky with a cloudy sky”.)

Regarding claims 39, 51, and 63, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above,  further teaches wherein the base content and the modified base content are images. (Cohen, Fig. 2, Par. 0021:” … For instance, a sky vision module including a neural network trained to identify skies [images] is used to ascertain pixels of a sky in an image when an object to be replaced in the image is identified as a sky, such as for the replace request “Replace the boring sky with a cloudy sky”.) Note: both boring sky and cloudy sky are images.

Regarding claims 40, 52, and 64, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above,  further teaches wherein the target area is less than an entire area of the base content. (Cohen, Fig. 2, Par. 0021:” … a sky in an image when an object to be replaced in the image is identified as a sky, such as for the replace request “Replace the boring sky with a cloudy sky”.) Note: Sky is less than the entire area of the base content.

Regarding claims 41, 53, and 65, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above,  further teaches wherein at least one of a size or a shape of the target area is user adjustable. (Cohen, Par. 0129:” … For instance, compositing module 152 can extract fill material or replacement material from an image obtained by image search module 150, filter the material (e.g., adjust color, brightness, contrast, apply a filter, and the like), re-size the material (e.g., interpolate between pixels of the material, decimate pixels of the material, or both, to stretch or squash the material), rotate the material, crop the material, composite the material with itself or other fill or replacement material, and the like.”, and Par. 0133:” … In one example, a user may adjust a border of a background segmentation generated by vision module 146, such as by moving a water line separating a beach and ocean that defines a background scene of an image …”).

Regarding claims 42, 54, and 66, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above, further teaches wherein the natural language input includes at least one of a voice input or a text input, and wherein in case the natural language input includes the voice input, the voice input is converted into text by using an automatic speech recognition (ASR) model. (Cohen, Par. 0027:" … In one example, computing devices 104 include speech recognition, identification, and synthesis functionalities, microphones, and speakers that allow computing devices 104 to communicate with user 102 in a conversation, e.g., a directed user conversation.", and Par. 0048:” … A user conversation can include any suitable type of communication, such as verbal communication (e.g., with microphones and speakers of conversation module 144), written communication (e.g., a user may type into a keyboard or provide a document to conversation module 144), or combinations of verbal communication and written communication.”, and Par. 0056:” … In one example, an editing query includes a transcript of a directed user conversation (e.g., text in ASCII format).”, and Par. 0143:” … Any suitable selection with a tool in user interface 500 can be used to provide multi-modal user input, such as user clicking on a center of an object (e.g., clicking on the center of the fire hydrant in intermediate image 510) while speaking (e.g., “No, you've selected the dog's head, too.”)

Regarding claims 43, 55, and 67, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above, further teaches at least one of: wherein the output content is generated based on the base content, or wherein the output content is generated to match the base content. (Cohen, Par. 0005:” … Based on the directed user conversation indicating a remove request or replace request, an object is removed and fill material is added [output content] in its place, or an object is replaced with replacement material, to produce a plurality of composite images that are harmonized to make the editing appear natural.”) Note: the harmonized output maps to the output content being generated based on the base content, or being generated to match the base content.

Regarding claims 44, 56, and 68, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above, further teaches at least one of: wherein the output content is generated based on compositing content into the target area of the base content, or wherein the base content is modified based on compositing the output content into the base content. (Cohen, Par. 0022:” … The composite images are harmonized to make them look natural (e.g., so that the editing is not easily detected). In one example, harmonizing includes adjusting lighting of a composite image to match times of day between image materials.”) Note: adjusting the lighting of a composite image implies that the base content is also modified.

Regarding claims 45, 57, and 69, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above,  further teaches wherein the output content is generated based on an object corresponding to content information included in the natural language input, or wherein the output content is generated to include the detected object in the target area with at least one attribute thereof changed so as to have the at least one same attribute as the at least one attribute information included in the natural language input. (Cohen, Fig. 2, Par. 0021:” … For instance, a sky vision module including a neural network trained to identify skies is used to ascertain pixels of a sky in an image when an object to be replaced in the image is identified as a sky, such as for the replace request “Replace the boring sky with a cloudy sky”.) Note: content information for the output is the “cloudy sky” and the output has cloudy sky.

Regarding claims 46, 58, and 70, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above, further teaches wherein the output content is generated as having at least one same attribute as the detected object in the target area. (Cohen, Par. 0146:” As a result of a user selecting an image in images panel 518, an image in display panel 526 is exposed. … In one example, display panel 526 is displayed in user interface 500 responsive to a user selection of one of the harmonized images 520 displayed in images panel 518.”) Note: As depicted in Fig. 5, all of the output content has the same dog attribute as the dog in the detected object in the target area of 506. As noted, only the background is being replaced, but the dog stays the same between the target area and the output.

Regarding claims 47, 59, and 71, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above, further teaches wherein the at least one attribute information included in the natural language input is obtained from the natural language input by using the at least one AI model. (Cohen, Par. 0027:" … In one example, computing devices 104 include speech recognition, identification, and synthesis functionalities, microphones, and speakers that allow computing devices 104 to communicate with user 102 in a conversation, e.g., a directed user conversation.", and Par. 0095:” …The device directs the conversation by asking “What would you like to replace?”, to which the user answers “The boring sky”. The device again directs the conversation by narrowing the parameters of the replacement task, and asks “What would you like to replace the boring sky with?” “, and Par. 0096:” Continuing with the example directed user conversation 204 in FIG. 2, the user responds “A cloudy sky”, indicating to the device that the boring sky should be replaced by a cloudy sky. In response, the device generates harmonized image 206, which includes a cloudy sky.”) Note: boring and cloudy are attributes of the sky. Also, the user is conversing through a speech recognition module, and a speech recognizer is considered an AI model.

Regarding claims 48, 60, and 72, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above,  further teaches presenting a user interface for selecting among a plurality of contents that each correspond to a different version of the output content.  (Cohen, Par. 0005:” … Based on the directed user conversation indicating a remove request or replace request, an object is removed and fill material is added in its place, or an object is replaced with replacement material, to produce a plurality of composite images that are harmonized to make the editing appear natural. Multiple harmonized images are exposed in a user interface. Thus, a user is presented a plurality of options (e.g., harmonized images with different versions of fill material or replacement material) that satisfy the editing query based on a directed user conversation. In one example, the plurality of images are presented to the user automatically and without user intervention once a directed user conversation is completed and an image to be edited is obtained. In another example, intermediate results are exposed to the user, and multi-modal input is received during a directed user conversation.”)


Claims 73-75 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen, Gupta, and Ohara, and in further view of Gaash et al. (US 20210334612 A1) (herein "Gaash").

Gaash was applied in the previous Office Action.
Regarding claims 73-75, Cohen, as modified above,  teaches the media, the method, and the electronic device of claims 37, 49, and 61 respectively.
Cohen, as modified above,  further teaches included in the natural language input (Cohen, Par. 0017:” … user input during a directed user conversation in addition to speech input during the directed user conversation.”, and Par. 0018:” … directing a user conversation to obtain an editing query, and providing a plurality of images that have been enhanced by fulfilling a remove request or a replace request with different content, … based on the editing query. . Received user responses are processed to determine parameters of an editing query, such as whether the user conversation indicates a remove request or replace request, objects to be removed, objects to be replaced, objects to replace objects, modifiers of objects, …”, and Par. 0030:” … user conversation 108 includes an editing query for the image to be edited 106, such as “Replace the rainy background with a sunny day” …”)
Cohen, as modified above,  does not teach, however, Gaash teaches wherein the at least one attribute information [[included in the natural language input]] is a plurality of attribute information included in the natural language input, and (Gaash, Par. 0016:” … how to apply a particular image attribute modification to a seed image, thereby generating a modified image, and a rule relating to the placement of the modified image in a collage (or print) area.”, and Par. 0034:” … determining a set of image attribute modifications. … The set of image attribute modifications may therefore be the same for each individual seed image, ... Accordingly, any single seed may have the same set of image attribute modifications applied to it differently, the application of which is determined by the unique identifier selected in block 204. The set of images attribute modifications may comprise a single image attribute modification, or a plurality of image attribute modifications. … an output may be used to indicate a set of image attribute modifications to be applied.”)
wherein the output content is generated based on the detected object in the target area and as having at least one same attribute as only less than all of the plurality of attribute information [[included in the natural language input]]. (Gaash, Par. 0036: “Block 214 comprises selecting an image attribute modification from the set of image attribute modifications determined in block 210. Block 216 comprises applying the selected image attribute modification to the jth copy of the selected seed image xi. As indicated by the looping arrow, blocks 214 and 216 may be carried out for each image attribute modification. Once all image attribute modifications in the set have been applied to the copy of the appropriate seed image, the modified image is defined. Generating the modified image therefore comprises, at block 218, applying each image attribute modification in the set to the copy of the appropriate seed image.”) Note: Each time an image attribute is applied to the image, one fewer attribute remains from the plurality of attributes.
Gaash is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cohen, as modified above, further in view of Gaash, such that the at least one attribute information is a plurality of attribute information included in the natural language input, and such that the output content is generated based on the detected object in the target area and as having at least one same attribute as only less than all of the plurality of attribute information. The motivation to do so would be to provide multiple images that have a degree of consistency or commonality (Gaash, Par. 0008).





Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Zhang et al. (“Text-to-Image Synthesis via Visual-Memory Creative Adversarial Network”, Advances in Multimedia Information Processing–PCM 2018: 19th Pacific-Rim …, 2018) teaches in ABS: “ … propose a method named visual-memory Creative Adversarial Network (vmCAN) to generate images depending on their corresponding narrative sentences.”, and Section 3.1: “Region Proposal Network ranks and refines region boxes called anchors to generate high-quality region proposals which most likely contain an object.”, and Section 2.2: “ … method to edit a given image with specific textual description. … proposed vmCAN attempts to synthesize images conditioned on the textual description and multiple relevant sub-images, …”
Examiner's Note: Examiner has cited particular columns and line numbers and/or paragraph numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested from the applicant in preparing responses, to fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
In the case of amending the Claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI whose telephone number is (408)918-7689. The examiner can normally be reached Monday - Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






DARIOUSH AGAHI, P.E.
Primary Examiner

/DARIOUSH AGAHI/ Primary Examiner, Art Unit 2656
Prosecution Timeline

Sep 11, 2025: Non-Final Rejection mailed (§103)
Nov 10, 2025: Applicant Interview (Telephonic)
Nov 10, 2025: Examiner Interview Summary
Dec 08, 2025: Response Filed
Jan 15, 2026: Final Rejection mailed (§103)
Feb 26, 2026: Request for Continued Examination
Feb 27, 2026: Response after Non-Final Action
May 07, 2026: Non-Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/102,483, Patent 12639515: ELECTRONIC APPARATUS RECOMMENDING CONTENT-BASED SEARCH TERMS AND CONTROL METHOD THEREOF (3y 4m to grant; granted May 26, 2026)
18/395,319, Patent 12639526: METHODS AND APPARATUS TO SELF-GUARDRAIL LARGE LANGUAGE MODEL RESPONSES (2y 5m to grant; granted May 26, 2026)
18/442,982, Patent 12639512: SYSTEMS AND METHODS FOR SEEDED NEURAL TOPIC MODELING (2y 3m to grant; granted May 26, 2026)
18/497,721, Patent 12639361: ISSUE HANDLING USING UNSUPERVISED MACHINE LEARNING (2y 6m to grant; granted May 26, 2026)
18/628,373, Patent 12609134: ONSET ZONE DETECTION USING COHERENT FOCUSING SUMMATION OVER MULTIPLE GEOMETRIC POSITIONS (2y 0m to grant; granted Apr 21, 2026)
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 85%
With Interview (+30.2%): 99%
Median Time to Grant: 2y 7m (~0m remaining)
PTA Risk: High
Based on 174 resolved cases by this examiner. Grant probability derived from career allowance rate.