Prosecution Insights
Last updated: April 19, 2026
Application No. 18/240,837

SYSTEMS AND METHODS FOR LAYERED IMAGE GENERATION

Final Rejection — §103

Filed: Aug 31, 2023
Examiner: PRINGLE-PARKER, JASON A
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: Adeia Imaging LLC
OA Round: 2 (Final)

Grant Probability: 84% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 5m
Grant Probability With Interview: 96%

Examiner Intelligence

Career Allow Rate: 84% — above average (456 granted / 546 resolved; +21.5% vs TC avg)
Interview Lift: +12.7% (moderate), based on resolved cases with interview
Avg Prosecution: 2y 5m (typical timeline)
Career History: 571 total applications across all art units; 25 currently pending

Statute-Specific Performance

§101: 9.5% (-30.5% vs TC avg)
§103: 44.3% (+4.3% vs TC avg)
§102: 24.5% (-15.5% vs TC avg)
§112: 12.0% (-28.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 546 resolved cases

Office Action

Final Rejection — §103 (Jan 23, 2026)
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Response to Arguments

Regarding 35 USC § 102/103, Applicant argues:

Applicant submits that the cited art, alone or in combination, fails to teach or suggest at least the following features: "generating, using a trained machine learning model and based on the text input, a single-layer image comprising a plurality of objects; ... extracting, from the text input, a portion of the text input describing a background portion of the single-layer image; generating, using the trained machine learning model and based on the extracted portion of the text input, a background image," as recited in Applicant's claim 1 (and as similarly recited in independent claim 12).

Pages 2-3 of the Office Action, in rejecting claim 1, cite paragraphs [0029], [0056], and [0147] of Cohen as allegedly teaching "generating, using a trained machine learning model and based on the text input, a single-layer image comprising a plurality of objects", as recited in Applicant's claim 1. Such sections of Cohen describe "image to be edited 106 can be obtained ... from another computing device, from file storage on computing device 104-1 ..., by taking a picture with a camera on computing device 104-1." Cohen is silent on any mention of a machine learning model, let alone a machine learning model that generates an initial single-layer image comprising a plurality of objects based on text input. At best, Cohen describes a neural network to identify objects in the (already obtained) image to be edited, in order to remove an identified object from the image, rather than for generating an initial image. It follows that Cohen does not disclose extracting, from the same text input used to generate the initial image, a text portion describing a background portion of the image, and inputting, to the same machine learning model used to generate the initial image, such background portion, to obtain a background image, for use in generating a multi-layer image.

Examiner replies:

Applicant's arguments are not found persuasive. Examiner agrees that the text input is not a single prompt such as in Applicant's Fig. 2, element 202; however, Cohen has a directed user conversation 204, as shown in Cohen Fig. 2, that is a text input, where the text input is numerous individual inputs from the user. Therefore, the entirety of the text conversation is considered a "text input". As indicated in the Non-Final Rejection, a user can generate an image and save it, and input it back into the system for editing, which makes it a machine learning model that generates an initial single-layer image comprising a plurality of objects based on text input. The text conversation is then used for further processing. Further clarification of the text input in the claim language would likely overcome the prior art.

Allowable Subject Matter

Claim 3 is allowed.
The claim recites "determining that, as a result of the segmenting, each respective image of the plurality of images comprises one or more empty regions at a portion of the respective image at which one of more objects of the plurality of objects is depicted in the single-layer image; and modifying at least one empty region of the one or more empty regions by causing the at least one empty region to be filled in; wherein generating the multi-layer image is based on the plurality of images, having the at least one modified empty region, and the background image." and "determining that a size of the at least one empty region does not exceed a threshold; and in response to determining that the size of the at least one empty region does not exceed the threshold, performing the modifying of the at least one empty region." The prior art does not teach these limitations in combination with the other limitations.

Claims 5, 6, 11, 14, and 16-17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4, 8-10, 12-13, 15, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen (U.S. PG Publication 2019/0196698) in view of Kong (U.S. PG Publication 2020/0175975).

Regarding claim 1: A computer-implemented method for generating a multi-layer image based on text input, the method comprising:

generating, using a trained machine learning model and based on the text input (Cohen [0029]: In the example illustrated in FIG. 1, computing device 104-1 obtains an image to be edited 106. An image to be edited 106 can be obtained in any suitable way, such as from another computing device, from file storage on computing device 104-1 (discussed in more detail below), by taking a picture with a camera on computing device 104-1, and the like.) (Cohen [0147]: Furthermore, display panel 526 includes controls 530, suitable to control an image displayed in display panel 526. Controls 530 can include any suitable control, such as adjusters for brightness, contrast, color, selection of filters, shading, crop, overlay, saving an image, a number of pixels in an image, aspect ratio, and the like. In one example, controls 530 includes a selection to export a harmonized image, e.g., harmonized image 528, so that it is sent to a computing device, such as one of computing devices 104 in FIG. 1. For instance, harmonized image 528 can be generated and displayed on a user interface of one of computing devices 104 and sent to another of computing devices 104 by a selection in controls 530. In one example, the selection requires a single user action, such as enabling a “send to friends” button that causes harmonized image 528 to be sent to a predetermined list of computing devices, persons, or combinations thereof.) (Cohen [0056]: Conversation module 144 provides an editing query included in (or indicated by) a directed user conversation to modules of image enhancement system 110. An editing query provided by conversation module 144 can be any suitable type of editing query. In one example, an editing query includes a transcript of a directed user conversation (e.g., text in ASCII format).), since an image can be generated, saved, and input back into the system for editing;

a single-layer image comprising a plurality of objects (Cohen [0045]: Image enhancement system 110 also includes image gallery module 142. Image gallery module 142 is representative of functionality configured to maintain images associated with a user, such as user 102. For instance, image gallery module 142 can manage image libraries of a user, including images stored in a user's image editing application, such as Photoshop®. Furthermore, image gallery module 142 integrates images a user manipulates on or from one of computing devices 104 into a gallery of images stored on storage 126, such as images a user posts in a social media post or blog from one of computing devices 104, images a user has attached to an email, text, or other communication sent from or received by computing devices 104, and the like. Image gallery module 142 makes image from galleries maintained by image gallery module 142 available to image enhancement application 120, e.g., to be used for fill material or replacement material. Images maintained by image gallery module 142 can be stored in image data 138.);

segmenting the single-layer image to generate a plurality of images, each image of the plurality of images comprising a depiction of a respective object of the plurality of objects of the single-layer image (Cohen [0058]-[0060]: Vision module 146 is representative of functionality configured to ascertain pixels of an image to be edited corresponding to an object to be removed or replaced indicated in a directed user conversation. Vision module 146 performs a segmentation of pixels in an image to be edited to determine pixels corresponding to an object in any suitable way. In one example, vision module 146 includes one or more neural networks that have been trained to identify a specific object, such as a sky, fire hydrant, background, car, person, face, and the like. Hence, vision module 146 can use a neural network trained to identify a specific object indicated by an editing query when ascertaining pixels of the object in the image to be edited. For instance, if an editing query includes a remove request “Remove the fire hydrant”, vision module 146 ascertains pixels in the image that correspond to a fire hydrant using a neural network trained to identify fire hydrants with training images including variations of fire hydrants. Additionally or alternatively, vision module 146 ascertains pixels in an image to be edited that correspond to an object in an image using a neural network that is not trained to identify the specific object, such as a neural network trained to identify insects, birds, or bats when the object to identify is a butterfly.) (Cohen [0080]: Consequently, compositing module 152 can produce a plurality of composite images, each of which include fill material from a different image, a same image, or combinations thereof.);

extracting, from the text input, a portion of the text input describing a background portion of the single-layer image; generating, using the trained machine learning model and based on the extracted portion of the text input, a background image (Cohen [0021]: An image to be edited is obtained (e.g., a user loads the image into an image editing application). The image to be edited is processed by a vision module that can be specific to an object, such as an object to be replaced. For instance, a sky vision module including a neural network trained to identify skies is used to ascertain pixels of a sky in an image when an object to be replaced in the image is identified as a sky, such as for the replace request “Replace the boring sky with a cloudy sky”. Moreover, ascertaining the pixels corresponding to an object in an image can include generating an object mask for the object, dilating the object mask to create a region bounded by a boundary of the object mask, and generating a refined mask representing the pixels corresponding to the object by separating a background from a foreground in the region. Furthermore, contributions to pixels in the region from the background of the image can be removed in a background decontamination process.);

and generating the multi-layer image based on the plurality of images and the background image (Cohen [0086]: Harmonizing module 154 is representative of functionality configured to harmonize a composite image to form a harmonized image that looks natural and removes artifacts of image editing, including removing an object and replacing an object in an image. Harmonizing module 154 can harmonize a composite image in any suitable way to make it look natural and unedited. For instance, harmonizing module 154 can adjust the lighting locally or globally in an image. In one example, lighting is adjusted in one portion of a composite image (e.g., in replacement material) to make it match the lighting in another portion of the composite image. The lighting may be adjusted to account for different times of day between the replacement material and an image to be edited, and thus may adjust shadows and highlights in a harmonized image to match times of day.).

Cohen discloses the above elements in several embodiments. With the embodiments being disclosed in a single reference, one of ordinary skill in the art before the effective filing date of the invention, being aware of one embodiment, would also have been aware of the others, and it would have been obvious to combine these elements from two or more embodiments into a single arrangement for the benefit of enjoying the advantages of all the embodiments combined. Cohen discloses a multi-layer image as described above.
However, for the purposes of compact prosecution and for further clarity, in a related field of endeavor, Kong teaches: and generating the multi-layer image based on the plurality of images and the background image (Kong Fig. 6A-6I: Photoshop with Layers) (Kong [0078]: For example, a visual instruction may be the text “Click Background in the Layers panel below to edit it.”).

Therefore, it would have been obvious before the effective filing date of the claimed invention to use multiple layers as taught by Kong. The rationale for doing so is that it combines prior art elements according to known methods to yield predictable results: Cohen uses voice commands to edit images and separates objects from images, and Kong uses voice commands to edit images, separates objects from images, and explicitly uses layers in a UI, so the results would be predictable since the functionality of editing images is not changed. Therefore it would have been obvious to combine Kong with Cohen to obtain the invention.

Regarding claim 2: The method of claim 1 has all of its limitations taught by Cohen in view of Kong. Cohen further teaches further comprising: determining that, as a result of the segmenting, each respective image of the plurality of images comprises one or more empty regions at a portion of the respective image at which one of more objects of the plurality of objects is depicted in the single-layer image; and modifying at least one empty region of the one or more empty regions by causing the at least one empty region to be filled in; wherein generating the multi-layer image is based on the plurality of images, having the at least one modified empty region, (Cohen [0005], [0019]: Based on parameters of the directed user conversation, a plurality of images is obtained that include fill material to fill a hole created when an object is removed according to a remove request, or replacement material to replace an object according to a replace request. In one example, fill material is recognized as similar to different pixels of the image than the pixels of the image corresponding to an object to be removed. For instance, when removing a fire hydrant from a lawn, the fill material may be similar to pixels of the lawn. In one example, replacement material is recognized as similar to pixels of the image corresponding to an object to be removed (e.g., when replacing a boring sky with a cloudy sky, the pixels are similar because they represent skies). Furthermore, the replacement material corresponds to the replace request (e.g., when a replace request indicates replacing a boring sky with a cloudy sky, replacement material represents a cloudy sky indicated by the replace request).) and the background image (Cohen [0021]: An image to be edited is obtained (e.g., a user loads the image into an image editing application). The image to be edited is processed by a vision module that can be specific to an object, such as an object to be replaced. For instance, a sky vision module including a neural network trained to identify skies is used to ascertain pixels of a sky in an image when an object to be replaced in the image is identified as a sky, such as for the replace request “Replace the boring sky with a cloudy sky”. Moreover, ascertaining the pixels corresponding to an object in an image can include generating an object mask for the object, dilating the object mask to create a region bounded by a boundary of the object mask, and generating a refined mask representing the pixels corresponding to the object by separating a background from a foreground in the region. Furthermore, contributions to pixels in the region from the background of the image can be removed in a background decontamination process.).

Regarding claim 4: The method of claim 2 has all of its limitations taught by Cohen in view of Kong. Cohen further teaches wherein modifying the at least one empty region by causing the at least one empty region to be filled in comprises performing inpainting of the at least one empty region (Cohen [0005], [0019]: Based on parameters of the directed user conversation, a plurality of images is obtained that include fill material to fill a hole created when an object is removed according to a remove request, or replacement material to replace an object according to a replace request. In one example, fill material is recognized as similar to different pixels of the image than the pixels of the image corresponding to an object to be removed. For instance, when removing a fire hydrant from a lawn, the fill material may be similar to pixels of the lawn. In one example, replacement material is recognized as similar to pixels of the image corresponding to an object to be removed (e.g., when replacing a boring sky with a cloudy sky, the pixels are similar because they represent skies). Furthermore, the replacement material corresponds to the replace request (e.g., when a replace request indicates replacing a boring sky with a cloudy sky, replacement material represents a cloudy sky indicated by the replace request).).

Regarding claim 8: The method of claim 1 has all of its limitations taught by Cohen in view of Kong. Cohen further teaches further comprising: receiving input of a particular image, wherein the particular image is included as an object of the plurality of objects in the generated single-layer image based on the received input of the particular image; generating, for display at a graphical user interface, the multi-layer image, wherein the graphical user interface comprises one or more options to modify the multi-layer image; receiving selection of the one or more options; and modifying the multi-layer image based on the received selection (Cohen [0023]: In one example, intermediate results are exposed in a user interface. For instance, an image may be presented to a user with an indicator of a candidate object, such as a lasso surrounding an object requested to be replaced in an editing query. To confirm selection of the candidate object, multi-modal input is received. Multi-modal input includes multiple forms of input received during a directed user conversation to indicate a same action. For instance, spoken instructions (e.g., “Move the lasso towards the fire hydrant and away from the dog's head”) and a selection from a mouse (e.g., a mouse click, hold, and drag of a lasso) to confirm selection of an object. Consequently, images are efficiently presented to a user that satisfy an editing query, and at the same time instruct the user on the use of the editing application while operating on the user's actual data, rather than a tutorial with canned data. Hence, a user is able to efficiently communicate with a computing device (e.g., a personal assistant) implementing the techniques described herein, and does not have to rely on an additional party, like an on-line editing service, friend, co-worker, or acquaintance, to enhance an image by removing or replacing objects in the image. As a result, a user is able to automatically obtain multiple harmonized images without appreciable delay (e.g., seconds or minutes, rather than hours, days, or weeks) that each fulfill the user's spoken query by participating in a directed user conversation, providing multi-modal input, or combinations thereof.).

Regarding claim 9: The method of claim 1 has all of its limitations taught by Cohen in view of Kong. Cohen further teaches further comprising: generating a plurality of variations of the multi-layer image based on the plurality of images and the background image (Cohen [0018]: Accordingly, this disclosure describes systems and techniques for directing a user conversation to obtain an editing query, and providing a plurality of images that have been enhanced by fulfilling a remove request or a replace request with different content, such as content obtained from a database of stock images, based on the editing query. Multi-modal user input can be received during the directed user conversation, including a complementary user input to speech input (e.g., a mouse click, touch on a touchscreen, and the like) during the directed user conversation, to increase the reliability of communications between a user and a computing device. A user conversation can be directed by broadcasting a query to a user, receiving a user response, and responding to the user based on the user response. Received user responses are processed to determine parameters of an editing query, such as whether the user conversation indicates a remove request or replace request, objects to be removed, objects to be replaced, objects to replace objects, modifiers of objects, combinations thereof, and the like. The directed user conversation can include broadcasting a query, receiving a user response, and responding to a user response any suitable number of times, e.g., initiated by an image editing application on a computing device. The scope of questioning for each volley of questioning and response in the directed user conversation can be set in any suitable way, such as based on a previously received user response, whether an editing query indicates a remove request or replace request, availability of resources to fulfill an editing request, and the like.).

Regarding claim 10: The method of claim 9 has all of its limitations taught by Cohen in view of Kong.
Cohen further teaches wherein the plurality of variations comprise a first variation and a second variation, and one or more of a size, location, or appearance of a first object of the plurality of objects in the first variation is different from one or more of a size, location, or appearance of the first object in the second variation (Cohen [0055]: For instance, if during the course of a user conversation a user requests to “add an old pickup truck” to an image to be edited, conversation module 144 may initiate a search of a database of stock images to identify what types of old pickup trucks are included in images of the database, and provide an appropriate question in reply to the user, such as “Would you like a 1946 Chevrolet or a 1955 Ford pickup truck?” based on the database including images with a 1946 Chevrolet pickup truck and a 1955 Ford pickup truck.), since variations may be displayed, and both variations are not required to be displayed at the same time.

Regarding claim 12: The claim is a parallel version of claim 1. As such, it is rejected under the same teachings.

Regarding claim 13: The claim is a parallel version of claim 2. As such, it is rejected under the same teachings.

Regarding claim 15: The claim is a parallel version of claim 4. As such, it is rejected under the same teachings.

Regarding claim 19: The claim is a parallel version of claim 8. As such, it is rejected under the same teachings.

Regarding claim 20: The claim is a parallel version of claim 9. As such, it is rejected under the same teachings.

Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Cohen (U.S. PG Publication 2019/0196698) in view of Kong (U.S. PG Publication 2020/0175975) and Faaborg (U.S. PG Publication 2023/0026575).

Regarding claim 7: The method of claim 1 has all of its limitations taught by Cohen in view of Kong. Cohen further teaches further comprising: generating a depth (Cohen [0101]: Vision module 146 ascertains pixels of the image to be edited 302 corresponding to “the woman in the front of the image”. Because vision module 146 has semantic understanding of the image, it is able to accurately determine which woman in the image is the woman in the front of the image. Accordingly, vision module 146 generates object mask 306 for the object “the woman in the front of the image”. Object mask identifies a rough set of pixels corresponding to the object, without including content of the image. For instance, pixels of an object mask may be binary colors, indicating inclusion or exclusion in the object mask. In the example object mask 306, white pixels are included in the object mask and black pixels are excluded from the object mask.).

Cohen does not teach a depth map. In a related field of endeavor, Faaborg teaches: generating a depth map for the single-layer image, wherein generating the multi-layer image further comprises ordering the plurality of images, respectively corresponding to a plurality of layers of the multi-layer image, based on the depth map (Faaborg [0003]: In general aspect, a computer-implemented method includes receiving a two-dimensional (2-D) image of a scene captured by a camera, and recognizing one or more objects in the scene depicted in the 2-D image. The method also includes determining whether the one or more recognized objects have known real-world dimensions, and determining a depth from the camera of at least one recognized object having known real-world dimensions.) (Faaborg [0024]: 2-D image 100 may be segmented or divided into different segments or layers so that equally distant objects belong to one segment or layer and unequally distant objects belong to correspondingly different segments or layers of the image.).

Therefore, it would have been obvious before the effective filing date of the claimed invention to use a depth map as taught by Faaborg. The motivation for doing so would have been improved realism: Cohen adds and removes objects from a scene, and Faaborg uses a depth map for properly integrating (sizing and positioning) objects to be more realistic (Faaborg [0011]). Therefore it would have been obvious to combine Faaborg with Cohen to obtain the invention.

Regarding claim 18: The claim is a parallel version of claim 7. As such, it is rejected under the same teachings.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON PRINGLE-PARKER, whose telephone number is (571) 272-5690 and whose e-mail is jason.pringle-parker@uspto.gov. The examiner can normally be reached 8:30am-5:00pm EST, Monday-Friday. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, King Poon, can be reached at (571) 270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JASON A PRINGLE-PARKER/
Primary Examiner, Art Unit 2617
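Stripped of the legal framing, claim 1 recites a concrete pipeline: generate a single-layer image from the text input, segment it into per-object images, extract the background-describing portion of the same text, regenerate the background with the same model, and assemble the results as layers; allowed claim 3 adds a hole-size threshold that gates the fill step. The sketch below is a minimal, runnable illustration of that flow. Every helper (generate_image, segment_objects, extract_background_text) is a hypothetical toy stand-in, not code from the application or the cited references.

```python
"""Minimal sketch of the pipeline recited in claim 1, plus the claim 3
hole-size threshold. A real system would call a trained text-to-image
model and a segmentation network instead of these toy stubs."""
import zlib
import numpy as np

H, W = 64, 64

def generate_image(prompt: str) -> np.ndarray:
    # Stand-in for the trained machine learning model of claim 1
    # (e.g., a text-to-image diffusion model); returns an HxWx3 image.
    rng = np.random.default_rng(zlib.crc32(prompt.encode()))
    return rng.random((H, W, 3))

def segment_objects(img: np.ndarray) -> dict:
    # Stand-in segmenter: one boolean mask per detected object. The toy
    # masks deliberately overlap so a small occluded "empty region" exists.
    a = np.zeros((H, W), dtype=bool); a[16:48, 8:32] = True
    b = np.zeros((H, W), dtype=bool); b[20:52, 28:52] = True
    return {"object_a": a, "object_b": b}

def extract_background_text(text: str) -> str:
    # Stand-in for claim 1's extraction of the background-describing portion
    # of the text input (a real system might use an LLM or a parser).
    for sep in (" in front of ", " on "):
        if sep in text:
            return text.split(sep, 1)[1]
    return text

def build_layers(text: str, hole_threshold: int = 500) -> list:
    single = generate_image(text)       # claim 1: initial single-layer image
    masks = segment_objects(single)     # claim 1: per-object segmentation
    background = generate_image(extract_background_text(text))  # claim 1
    layers = [background]
    for name, mask in masks.items():
        cutout = np.where(mask[..., None], single, 0.0)
        # "Empty region": pixels of this object occluded by another object
        # in the single-layer image (the empty regions of claims 2-3).
        others = np.logical_or.reduce(
            [m for n, m in masks.items() if n != name])
        hole = mask & others
        # Claim 3: fill only if the region's size does not exceed a threshold.
        if 0 < hole.sum() <= hole_threshold:
            cutout[hole] = single[mask & ~others].mean(axis=0)  # toy "inpainting"
        layers.append(cutout)
    return layers  # background + one image per object = the multi-layer image

layers = build_layers("a dog and a ball on a grassy field")
print(len(layers), layers[0].shape)  # -> 3 (64, 64, 3)
```

Claim 7's depth-map limitation would add one further step: estimate a depth map for the single-layer image and sort the object layers back-to-front by mean depth before compositing.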

Prosecution Timeline

Aug 31, 2023: Application Filed
Aug 21, 2025: Non-Final Rejection — §103
Nov 19, 2025: Applicant Interview (Telephonic)
Nov 20, 2025: Examiner Interview Summary
Nov 25, 2025: Response Filed
Jan 23, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603978: SYSTEM AND METHOD FOR PARALLAX CORRECTION FOR VIDEO SEE-THROUGH AUGMENTED REALITY (2y 5m to grant; granted Apr 14, 2026)
Patent 12597181: HIGH DYNAMIC RANGE DIGITAL IMAGE EDITING VISUALIZATIONS (2y 5m to grant; granted Apr 07, 2026)
Patent 12597210: GENERATING POLYGON MESHES APPROXIMATING SURFACES WITH SUB-CELL FEATURES (2y 5m to grant; granted Apr 07, 2026)
Patent 12592008: INFORMATION ANALYSIS SYSTEM, INFORMATION ANALYSIS METHOD, AND NON-TRANSITORY RECORDING MEDIUM (2y 5m to grant; granted Mar 31, 2026)
Patent 12586205: SYSTEM AND METHOD FOR DETECTING A BOUNDARY IN IMAGES USING MACHINE LEARNING (2y 5m to grant; granted Mar 24, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 84%
With Interview: 96% (+12.7%)
Median Time to Grant: 2y 5m
PTA Risk: Moderate

Based on 546 resolved cases by this examiner. Grant probability derived from career allow rate.
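
The projection arithmetic roughly reproduces the examiner stats above, assuming the grant probability is simply the career allow rate and the interview lift is additive in percentage points. Both are assumptions about how the tool computes its figures, not documented formulas:

```python
# Sanity check of the dashboard's headline figures (assumed formulas).
granted, resolved = 456, 546
allow_rate = granted / resolved               # 0.8352 -> displayed as 84%
interview_lift = 0.127                        # the reported +12.7 points
with_interview = allow_rate + interview_lift  # 0.9622 -> displayed as 96%
print(f"allow rate {allow_rate:.1%}, with interview {with_interview:.1%}")
```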
