DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant claims the benefit of U.S. Provisional Application No. 63/468,132, filed May 22, 2023.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/22/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The drawings are objected to because method 1000 shown in FIG. 10 does not have descriptive labels in conformance with 37 CFR 1.84(n) and 1.84(o), or numbering that is further described in the specification. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Objections
Claim 20 is objected to because it depends from claim 17. It appears more appropriate for claim 20 to depend from claim 18. Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim 18 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Fang et al. (“A Comprehensive Pipeline for Complex Text-to-Image Synthesis”, JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, vol. 35, no. 3, 2020-05-01, pages 522-537).
Regarding claim 18, Fang et al. discloses a non-transitory computer-readable medium storing instructions executable by one or more processors to perform a method of generating imagery, the method comprising: receiving a captured image of an object and a captured background; (see Abstract, “we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet.” and Fig.1)
receiving text or verbal input specifying scenery to be generated for the object; (see Introduction, “we firstly apply natural language processing tools to parse the input text and extract the names of necessary foreground objects, foreground objects’ attributes, mutual positional relationships, and background scene information.” and Fig.1, “Input Text”)
separating a foreground of the captured image from the captured background, the foreground including the object; (see 4.1 Foreground Objects Retrieval, “We use these masks to separate target objects from other parts of source images to form the foreground object dataset.” and Fig.1, “Foreground Retrieval”)
generating, based on the object, imagery in response to the received input, the imagery depicting the object and the scenery corresponding to the received text or verbal input; and (see 5 Image Synthesis Using Constrained MCMC, “After selecting satisfactory foreground objects and background scene image, we put the foreground objects in the right place of the background image with a proper size to ensure that all the scene items comply with the constraints required in the input text.” and Fig.1, “Then we optimize the sizes and positions of the foreground objects on the background scene by the constrained MCMC method. Finally, we perform post-processing to obtain the blended final synthesis result.”)
applying one or more post-processing techniques to the generated scenery. (see Abstract “Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend foreground objects and background scene image in the post-processing step.”, and Fig.1, “Post-Processing”)
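By way of illustration only, the Poisson-based blending that Fang et al. describe in their post-processing step can be sketched with OpenCV's seamless cloning. This is a minimal, hypothetical example and not the pipeline of Fang et al. or the claimed method; the file names, mask, and placement point are assumptions.

```python
# Minimal illustrative sketch (not the Fang et al. implementation):
# composite a segmented foreground object onto a background scene using
# OpenCV's Poisson (seamless) cloning. File names are hypothetical.
import cv2

foreground = cv2.imread("foreground_object.png")                   # segmented object (hypothetical)
background = cv2.imread("background_scene.png")                    # retrieved scene (hypothetical)
mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE)         # binary mask, same size as foreground

# The foreground/mask must fit within the background around the chosen center.
center = (background.shape[1] // 2, background.shape[0] // 2)

# Poisson-based blending, analogous to the post-processing blending step
# mentioned in the Fang et al. abstract.
blended = cv2.seamlessClone(foreground, background, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("blended_result.png", blended)
```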
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9, 11-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fang et al. (“A Comprehensive Pipeline for Complex Text-to-Image Synthesis”, JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, vol. 35, no. 3, 2020-05-01, pages 522-537) in view of Zhang et al. (US 20240005574 A1).
Regarding claim 1, Fang et al. discloses a method of generating imagery, comprising: receiving, with one or more processors, a captured image of an object and a captured background; (see Abstract, “we retrieve the required foreground objects from the foreground object dataset segmented from Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet.” and Fig.1, “Finally, we perform post-processing to obtain the blended final synthesis result.”)
receiving text or verbal input specifying scenery to be generated for the object; (see Introduction, “we firstly apply natural language processing tools to parse the input text and extract the names of necessary foreground objects, foreground objects’ attributes, mutual positional relationships, and background scene information.” and Fig.1, “Input Text”)
separating, with the one or more processors, a foreground of the captured image from the captured background, the foreground including the object; (see 4.1 Foreground Objects Retrieval, “We use these masks to separate target objects from other parts of source images to form the foreground object dataset.” and Fig.1, “Foreground Retrieval”)
generating, with the one or more processors based on the object, imagery in response to the received input, the imagery depicting the object and the scenery corresponding to the received text or verbal input; and (see 5 Image Synthesis Using Constrained MCMC, “After selecting satisfactory foreground objects and background scene image, we put the foreground objects in the right place of the background image with a proper size to ensure that all the scene items comply with the constraints required in the input text.” and Fig.1, “Then we optimize the sizes and positions of the foreground objects on the background scene by the constrained MCMC method. Finally, we perform post-processing to obtain the blended final synthesis result.”)
However, Fang et al. fail to disclose applying one or more post-processing techniques to the generated scenery, including removing the depicted object from the generated imagery, retouching the scenery of the imagery in an area corresponding to the object, and re-inserting the object from the captured image into the imagery.
Zhang et al. teach, in the context of imagery generation, applying one or more post-processing techniques to the generated scenery, including removing the depicted object from the generated imagery, retouching the scenery of the imagery in an area corresponding to the object, and re-inserting the object from the captured image into the imagery. (see para. [0085], “As discussed above, in some embodiments, the object-aware texture transfer system 106 reinserts one or more extracted objects into a modified digital image after texture transference and harmonizes a background region of the modified digital image proximate to the reinserted objects. For example, FIG. 7 illustrates the object-aware texture transfer system 106 generating a harmonized digital image utilizing a harmonization neural network 706 having a dual-branched neural network architecture. Indeed, as shown in FIG. 7, the object-aware texture transfer system 106 provides a modified digital image 702 with a reinserted object (i.e., the person portrayed in the foreground) and a segmentation mask 704 (e.g., an object mask generated as described above in relation to FIGS. 4-5) corresponding to the reinserted object to a harmonization neural network 706.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Fang et al., according to the teaching of Zhang et al., to include applying one or more post-processing techniques to the generated scenery, including removing the depicted object from the generated imagery, retouching the scenery of the imagery in an area corresponding to the object, and re-inserting the object from the captured image into the imagery, in order to overcome shortcomings of generated digital imagery regarding accuracy, efficiency, and flexibility (see Background of Zhang et al.).
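For illustration only, the general remove / retouch / re-insert idea can be sketched with classical OpenCV inpainting rather than the harmonization neural network that Zhang et al. describe. All file names, the dilation radius, and the inpainting method are hypothetical assumptions.

```python
# Minimal illustrative sketch (not the Zhang et al. system): remove the depicted
# object, retouch the vacated area, and re-insert the captured object.
import cv2
import numpy as np

generated = cv2.imread("generated_imagery.png")                    # imagery with the depicted object
object_mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE)  # binary object mask
captured_object = cv2.imread("captured_object.png")                # object cut from the captured image
# Assumption: all three images share the same height and width.

# 1. Remove the depicted object: inpaint a slightly enlarged mask region.
repair_mask = cv2.dilate(object_mask, np.ones((7, 7), np.uint8))
retouched = cv2.inpaint(generated, repair_mask, 3, cv2.INPAINT_TELEA)

# 2. Re-insert the object from the captured image into the retouched imagery.
mask_3ch = cv2.merge([object_mask, object_mask, object_mask]) > 0
composited = np.where(mask_3ch, captured_object, retouched)
cv2.imwrite("post_processed.png", composited)
```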
Regarding claim 2, Fang et al. in view of Zhang et al. disclose all the limitations of claim 1, and Zhang et al. further disclose wherein generating the imagery comprises executing an artificial intelligence model (see para. [0019] of Zhang et al., “To further illustrate, in one or more embodiments, the object-aware texture transfer system identifies one or more object within a target digital image whose style or texture should be preserved after texture transfer from a source digital image. In one or more embodiments, the object-aware texture transfer system identifies the one or more objects utilizing an object detection model, such as a machine learning model or neural network, as described in further detail below.”).
Regarding claim 3, Fang et al. in view of Zhang et al. disclose all the limitations of claim 1, and Zhang et al. further disclose applying one or more pre-processing techniques to the object, including applying a border stroke to an outline of the object (see para. [0058] of Zhang et al., “As illustrated in FIG. 4, the object detection machine learning model 408 detects several objects for the digital image 416. In some instances, the detection-masking neural network 400 identifies all objects within the bounding boxes. For example, the bounding boxes comprise the approximate boundary area indicating the detected object. An approximate boundary refers to an indication of an area including an object that is larger and/or less accurate than an object mask. In one or more embodiments, an approximate boundary includes at least a portion of a detected object and portions of the digital image 416 not comprising the detected object. An approximate boundary includes any shape, such as a square, rectangle, circle, oval, or other outline surrounding an object. In one or more embodiments, an approximate boundary comprises a bounding box.”).
Regarding claim 4, Fang et al. in view of Zhang et al. disclose all the limitations of claim 3, and Zhang et al. further disclose wherein applying the border stroke comprises automatically detecting, with the one or more processors, an edge of the object and applying the border stroke to the detected edge in response (see para. [0059] of Zhang et al., “Upon detecting the objects in the digital image 416, the detection-masking neural network 400 generates object masks for the detected objects. Generally, instead of utilizing coarse bounding boxes during object localization, the detection-masking neural network 400 generates segmentations masks that better define the boundaries of the object. The following paragraphs provide additional detail with respect to generating object masks for detected objects in accordance with one or more embodiments. In particular, FIG. 4 illustrates the object-aware texture transfer system 106 utilizing the object segmentation machine learning model 410 to generate segmented objects in accordance with some embodiments.”).
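For illustration only, automatic edge detection followed by a border stroke can be sketched with OpenCV contour extraction on a binary object mask. This is a generic, hypothetical example, not the claimed pre-processing or the detection-masking network of Zhang et al.; the file names and stroke width are assumptions.

```python
# Minimal illustrative sketch: detect the object's outline from a mask and draw
# a border stroke along the detected edge. File names are hypothetical.
import cv2

image = cv2.imread("captured_image.png")
mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE)         # binary object mask

# Detect the object's outer edge.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Apply a 5-pixel white border stroke along the detected edge.
stroked = image.copy()
cv2.drawContours(stroked, contours, -1, (255, 255, 255), thickness=5)
cv2.imwrite("stroked_object.png", stroked)
```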
Regarding claim 5, Fang et al. in view of Zhang et al. disclose all the limitations of claim 1, and Zhang et al. further disclose wherein removing the object from the imagery comprises applying a repair mask to the generated imagery (see para. [0085] of Zhang et al., “Indeed, as shown in FIG. 7, the object-aware texture transfer system 106 provides a modified digital image 702 with a reinserted object (i.e., the person portrayed in the foreground) and a segmentation mask 704 (e.g., an object mask generated as described above in relation to FIGS. 4-5) corresponding to the reinserted object to a harmonization neural network 706.”).
Regarding claim 6, Fang et al. in view of Zhang et al. disclose all the limitations of claim 1, and Zhang et al. further disclose wherein retouching the scenery of the imagery comprises blurring or feathering the imagery (see para. [0044] of Zhang et al., “Further, in some embodiments, the object-aware texture transfer system 106 utilizes inpainting 314 to fill holes corresponding the objects extracted by segmentation 306. For instance, as shown in FIG. 3, the object-aware texture transfer system 106 utilizes inpainting 314 to fill a hole in the first intermediate target digital image 308 to generate a second intermediate target digital image 316” and para. [0071] of Zhang et al., “In one or more implementations, the object-aware texture transfer system 106 utilizes a content aware fill machine learning model 516 in the form of a deep inpainting model to generate the content (and optionally fill) the hole corresponding to the removed object. For example, the object-aware texture transfer system 106 utilizes a deep inpainting model trained to fill holes.”).
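For illustration only, feathering the boundary between a re-inserted object and the surrounding scenery can be sketched by blurring the object mask and alpha-blending. The file names and blur radius below are hypothetical assumptions and do not come from the references.

```python
# Minimal illustrative sketch: feather a hard object mask with a Gaussian blur
# and alpha-blend the object over the retouched scenery. Hypothetical file names.
import cv2
import numpy as np

scenery = cv2.imread("retouched_scenery.png").astype(np.float32)
obj = cv2.imread("captured_object.png").astype(np.float32)          # same size as scenery (assumed)
mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# Soften the hard mask edge; the feathered alpha falls off smoothly at the boundary.
alpha = cv2.GaussianBlur(mask, (21, 21), 0)[..., np.newaxis]

blended = alpha * obj + (1.0 - alpha) * scenery
cv2.imwrite("feathered.png", blended.astype(np.uint8))
```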
Regarding claims 7 and 8, Fang et al. in view of Zhang et al. disclose all the limitations of claim 1, and Fang et al. further disclose upscaling the object prior to generating the imagery (claim 7) and downsizing the object upon placement in the generated imagery (claim 8). (see 5.2 Scale Adjustment of Fang et al., “We set the sizes of foreground objects according to two factors, that is, intrinsic scaling factor and perspective factor. We first resize every foreground object’s bounding box, make them have the same height, and then adjust their scales according to the two factors.”, also see Fig. 7 of Fang et al., “Scale adjustment by (a) intrinsic scaling factor and (b) perspective factor.”)
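For illustration only, upscaling an object before synthesis and downsizing it again on placement can be sketched with simple resizing, loosely analogous to the scale adjustment described in Section 5.2 of Fang et al. The scale factors and interpolation choices below are hypothetical.

```python
# Minimal illustrative sketch: upscale the object before generation, then downsize
# it on placement (e.g., by a perspective factor). Factors are hypothetical.
import cv2

obj = cv2.imread("captured_object.png")

# Upscale the object prior to generating the imagery (e.g., 2x, bicubic).
upscaled = cv2.resize(obj, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)

# Downsize the object upon placement in the generated imagery.
perspective_factor = 0.6
placed = cv2.resize(upscaled, None, fx=perspective_factor, fy=perspective_factor,
                    interpolation=cv2.INTER_AREA)
```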
Regarding claim 9, Fang et al. in view of Zhang et al. disclose all the limitations of claim 1, and Zhang et al. further disclose applying a mask to the captured image, the mask defining a shape of the object, and augmenting the mask (see para. [0047] of Zhang et al., “Thus, the scene-based image editing system 106 dilates (e.g., expands) the object mask of an object to avoid associated artifacts when removing the object. Dilating objects masks, however, presents the risk of removing portions of other objects portrayed in the digital image. For instance, where a first object to be removed overlaps, touches, or is proximate to a second object, a dilated mask for the first object will often extend into the space occupied by the second object. Thus, when removing the first object using the dilated object mask, significant portions of the second object are often removed and the resulting hole is filled in (generally improperly), causing undesirable effects in the resulting image. Accordingly, the scene-based image editing system 106 utilizes smart dilation to avoid significantly extending the object mask of an object to be removed into areas of the digital image occupied by other objects.”).
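For illustration only, augmenting (dilating) an object mask while keeping it out of regions occupied by other objects can be sketched as follows, loosely following the "smart dilation" idea quoted from Zhang et al. The kernel size and file names are hypothetical, and this is not the Zhang et al. implementation.

```python
# Minimal illustrative sketch: dilate an object mask, then exclude pixels that
# belong to other detected objects. Hypothetical file names and kernel size.
import cv2
import numpy as np

object_mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE)
other_objects_mask = cv2.imread("other_objects_mask.png", cv2.IMREAD_GRAYSCALE)

# Expand the mask so that removal also covers shadow/edge artifacts around the object.
kernel = np.ones((15, 15), np.uint8)
dilated = cv2.dilate(object_mask, kernel)

# Keep the dilation out of pixels belonging to other detected objects.
augmented_mask = cv2.bitwise_and(dilated, cv2.bitwise_not(other_objects_mask))
```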
Regarding claim 11, system claim 11 is similar in scope to claim 1 and is rejected under the same rationale.
Regarding claim 12, system claim 12 is similar in scope to claim 2 and is rejected under the same rationale.
Regarding claim 13, system claim 13 is similar in scope to claim 3 and is rejected under the same rationale.
Regarding claim 14, system claim 14 is similar in scope to claim 4 and is rejected under the same rationale.
Regarding claim 15, system claim 15 is similar in scope to claim 5 and is rejected under the same rationale.
Regarding claim 16, system claim 16 is similar in scope to claim 7 and is rejected under the same rationale.
Regarding claim 17, system claim 17 is similar in scope to claim 8 and is rejected under the same rationale.
Regarding claim 19, Fang et al. disclose all the limitations of claim 18, but do not disclose wherein generating the imagery comprises executing an artificial intelligence model. However, Zhang et al. teach, in the context of imagery generation, wherein generating the imagery comprises executing an artificial intelligence model (see para. [0019] of Zhang et al., “To further illustrate, in one or more embodiments, the object-aware texture transfer system identifies one or more object within a target digital image whose style or texture should be preserved after texture transfer from a source digital image. In one or more embodiments, the object-aware texture transfer system identifies the one or more objects utilizing an object detection model, such as a machine learning model or neural network, as described in further detail below.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Fang et al., according to the teaching of Zhang et al., such that generating the imagery comprises executing an artificial intelligence model, in order to overcome shortcomings of generated digital imagery regarding accuracy, efficiency, and flexibility (see Background of Zhang et al.).
Regarding claim 20, Fang et al. disclose all the limitations of claim 18, but do not disclose wherein removing the object from the imagery comprises applying a repair mask to the generated imagery. However, Zhang et al. teach, in the context of imagery generation, wherein removing the object from the imagery comprises applying a repair mask to the generated imagery (see para. [0085] of Zhang et al., “Indeed, as shown in FIG. 7, the object-aware texture transfer system 106 provides a modified digital image 702 with a reinserted object (i.e., the person portrayed in the foreground) and a segmentation mask 704 (e.g., an object mask generated as described above in relation to FIGS. 4-5) corresponding to the reinserted object to a harmonization neural network 706.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method disclosed by Fang et al., according to the teaching of Zhang et al., such that removing the object from the imagery comprises applying a repair mask to the generated imagery, in order to overcome shortcomings of generated digital imagery regarding accuracy, efficiency, and flexibility (see Background of Zhang et al.).
Allowable Subject Matter
Claim 10 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is an examiner’s statement of reasons for allowance:
The prior art of record does not teach the method of claim 10, wherein the one or more post-processing techniques comprises detecting whether the object in the generated imagery appears to be floating, the detecting comprising: generating a depth map for the object from the generated imagery; generating an object mask from the depth map; generating a convex hull of the object mask; calculating an integral of the mask while vertically displacing the object mask downward; computing a surface region beneath the object by subtracting the integral from the convex hull mask; computing a depth for the object mask and a depth of the surface region; and computing depth displacement based on the depth for the object mask and the depth of the surface region; and determining whether a normalized value for the computed depth displacement falls within a predetermined range, in combination with the limitations recited in claim 1.
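For illustration only, the underlying idea of the claim 10 floating-object check (comparing the depth of the object with the depth of the surface immediately beneath it) can be sketched in a much-simplified form. The sketch below does not reproduce the claimed convex-hull and displaced-mask-integral computation; the thresholds, band size, and decision rule are all hypothetical assumptions.

```python
# Much-simplified, hypothetical stand-in for a floating-object check: compare the
# depth at the object's lower boundary with the depth of the pixels directly below.
import numpy as np

def appears_to_float(depth_map: np.ndarray, object_mask: np.ndarray,
                     band: int = 10, valid_range=(0.0, 0.05)) -> bool:
    """depth_map: HxW depths; object_mask: HxW boolean mask of the object."""
    ys, xs = np.nonzero(object_mask)
    columns = np.unique(xs)
    floating_votes = 0
    for x in columns:
        bottom = ys[xs == x].max()                        # lowest object pixel in this column
        below = depth_map[bottom + 1: bottom + 1 + band, x]
        if below.size == 0:
            continue
        object_depth = depth_map[bottom, x]
        surface_depth = below.mean()
        # Normalized depth displacement between the object and the surface beneath it.
        displacement = abs(object_depth - surface_depth) / max(abs(surface_depth), 1e-6)
        if not (valid_range[0] <= displacement <= valid_range[1]):
            floating_votes += 1
    # Flag the object as floating if most columns show a large depth gap to the surface.
    return floating_votes > 0.5 * len(columns)
```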
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hyorim Park whose telephone number is (571)272-3859. The examiner can normally be reached Monday - Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Chan can be reached at (571) 272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Hyorim Park/
Examiner, Art Unit 2619

/JASON CHAN/
Supervisory Patent Examiner, Art Unit 2619