DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 4, 8, 15, 21, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Suzuki (U.S. Publication 2020/0160575) in view of Achlioptas (U.S. Publication 2024/0112401).
As to claim 1, Suzuki discloses a method comprising:
obtaining user input that indicates a target color and a semantic label for a region of an image to be generated (p. 3, sections 0042-0048; a user applies a target color and a semantic label such as “ears” or “nose” to a region of an image to be generated in an editing area);
generating a noise map including noise biased towards the target color in the region indicated by the user input (p. 1, section 0016; p. 4, section 0051; an initial intermediate image, which begins as a random noise map, is changed so that it is more similar to/biased towards features in the edited image patch, which would include color);
and generating the image based on the noise map and the semantic label for the region, wherein the image includes an object in the region that is described by the semantic label (p. 2, section 0022; p. 3, sections 0044-0048; p. 4, section 0053; an image is generated based on the intermediate representation/noise map, including a generated version of the labeled area; for example, a dog’s ears can be changed to a lion’s ears).
Suzuki discloses a generative model (p. 1, section 0002), but does not expressly disclose that this is a diffusion model. Suzuki discloses that features, of which color would be one, are taken into account, as discussed above, but Suzuki does not expressly disclose that the image has the target color. Achlioptas, however, discloses a diffusion model where a user draws a color on a canvas and that color is used to generate an image with that target color (fig. 11; p. 2, sections 0026-0030; p. 3, sections 0036-0039; a user scribbles/draws on a figure image/canvas and colors from the scribble are used to generate an edited image with new garments that match the scribble color or colors). The motivation for this is to digitize garments with very simple user input using a method that demonstrates superior quality in image and point cloud generation (p. 3, section 0037; p. 7, section 0073). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify Suzuki to use a diffusion model where a user draws a color on a canvas and that color is used to generate an image with that target color in order to digitize garments with very simple user input using a method that demonstrates superior quality in image and point cloud generation as taught by Achlioptas.
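For illustration only, and not drawn from Suzuki or Achlioptas (the function and variable names below are hypothetical), a minimal sketch of generating a noise map whose values are biased towards a user-selected target color inside a user-indicated region might look like the following:

```python
import numpy as np

def biased_noise_map(height, width, region_mask, target_rgb, bias=0.5, seed=0):
    """Illustrative sketch: bias Gaussian noise toward a target color inside a region.

    region_mask: (H, W) boolean array marking the user-indicated region.
    target_rgb:  length-3 target color, scaled to roughly [-1, 1].
    bias:        0 = pure noise, 1 = pure target color inside the region.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((height, width, 3))          # unbiased Gaussian noise
    target = np.asarray(target_rgb, dtype=float).reshape(1, 1, 3)
    mask = region_mask[..., None].astype(float)              # broadcast the mask over channels
    # Inside the region, blend the noise toward the target color; outside, leave it untouched.
    return (1.0 - bias * mask) * noise + bias * mask * target

# Example: bias the upper-left quadrant of a 64x64 noise map toward red.
mask = np.zeros((64, 64), dtype=bool)
mask[:32, :32] = True
noise_map = biased_noise_map(64, 64, mask, target_rgb=[1.0, -1.0, -1.0])
```

Outside the masked region the noise is left unchanged, so only the user-indicated region is steered toward the target color.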
As to claim 4, Suzuki discloses wherein: the user input comprises a user drawing on an image canvas depicting the target color in the region (p. 3, sections 0042-0048; a target color is painted/drawn on the image by the user, which would make the image a painting/drawing “canvas”).
As to claim 8, Achlioptas discloses beginning a reverse diffusion process at an intermediate step of the diffusion model, wherein the image is based on an output of the reverse diffusion process (p. 3, section 0039-p. 4, section 0040; a backward/reverse diffusion process is started at an intermediate step, after the forward process, and the image is based on the gradual denoising performed as part of the backward/reverse process). Motivation for the combination of references is given in the rejection to claim 1.
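For illustration only, and not drawn from the cited references (hypothetical names; a toy denoiser stands in for a trained noise predictor), a minimal DDPM-style sketch of beginning the reverse (denoising) process at an intermediate timestep rather than at pure noise:

```python
import numpy as np

def toy_denoiser(x, t):
    """Placeholder for a learned noise predictor; returns a zero noise estimate."""
    return np.zeros_like(x)

def reverse_from_intermediate(x_init, start_step, betas, denoiser=toy_denoiser, seed=0):
    """Illustrative sketch: run the reverse (denoising) loop starting at an
    intermediate timestep instead of the final (pure-noise) timestep."""
    rng = np.random.default_rng(seed)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = x_init
    for t in range(start_step, 0, -1):
        eps = denoiser(x, t)                                  # predicted noise at step t
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])          # estimated posterior mean
        noise = rng.standard_normal(x.shape) if t > 1 else 0.0
        x = mean + np.sqrt(betas[t]) * noise                  # stochastic reverse step
    return x

# Example: begin halfway through a 1000-step schedule from a partially noised image.
betas = np.linspace(1e-4, 0.02, 1000)
x_intermediate = np.random.default_rng(1).standard_normal((64, 64, 3))
image = reverse_from_intermediate(x_intermediate, start_step=500, betas=betas)
```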
As to claim 15, see the rejection to claim 1. Further, Suzuki discloses an apparatus comprising one or more processors; and one or more memories including instructions executable by the one or more processors to perform the method (p. 5, section 0073-p. 6, section 0078).
As to claim 21, see the rejection to claims 1 and 15.
As to claim 26, see the rejection to claim 8.
Claims 2, 3, 5, 18, and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Suzuki and Achlioptas and further in view of Liu (U.S. Publication 2022/0114698).
As to claim 2, Suzuki does not disclose, but Liu discloses displaying a user interface to a user, wherein the user interface includes a label input field (p. 5, section 0065-p. 6, section 0067; a user can select or specify a label for a specific region; where a user specifies a label or selects a label in an interface would read on a label input field), a color input field (p. 6, section 0067; a user can input and select a color for a label; where a user selects this color would read on a color input field), and a selection tool for selecting a region of an image canvas (p. 5, section 0061; p. 6, section 0067; a user can select a region using various tools such as draw, paint, resize, etc.), and wherein the user input is received via the user interface (p. 6, section 0067). The motivation for this is to enable users to easily control style and content of synthesis results, as well as to create multi-modal images (p. 7, section 0070). It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify Suzuki and Achlioptas to display a user interface to a user, wherein the user interface includes a label input field, a color input field, and a selection tool for selecting a region of an image canvas, and wherein the user input is received via the user interface, in order to enable users to easily control style and content of synthesis results, as well as to create multi-modal images, as taught by Liu.
As to claim 3, Suzuki does not disclose, but Liu discloses wherein: the user input indicates an additional target color and an additional semantic label for an additional region of an image canvas, and wherein the image includes an additional object in the additional region that is described by the additional semantic label and that has the additional target color (p. 6, sections 0067-0068; similar to the first region, a plurality of regions can be defined with colors and labels; additional objects such as rocks and bodies of water can be included in the additional region). Motivation for the combination is given in the rejection to claim 2.
As to claim 5, Suzuki does not disclose, but Liu discloses wherein: the user input includes layout information indicating a plurality of regions of an image canvas (p. 6, section 0067; a user can adjust the layout of a region using tools such as draw, paint, resize, etc.), and wherein each of the plurality of regions is associated with a corresponding target color and a corresponding semantic label (p. 6, section 0067; each region has a selected label and associated color). Motivation for the combination is given in the rejection to claim 2.
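For illustration only, and not drawn from Liu (the class and field names are hypothetical), a minimal sketch of layout information in which each of a plurality of canvas regions is associated with a corresponding target color and semantic label:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RegionSpec:
    """One user-specified region of the image canvas (illustrative names only)."""
    bounds: Tuple[int, int, int, int]   # (top, left, height, width) on the canvas
    target_color: Tuple[int, int, int]  # RGB target color for the region
    semantic_label: str                 # e.g. "sky", "rock", "water"

# Example layout: two regions, each paired with a target color and a semantic label.
layout: List[RegionSpec] = [
    RegionSpec(bounds=(0, 0, 128, 256), target_color=(90, 160, 255), semantic_label="sky"),
    RegionSpec(bounds=(128, 0, 128, 256), target_color=(60, 120, 200), semantic_label="water"),
]
```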
As to claim 18, see the rejection to claim 5.
As to claim 22, see the rejection to claim 2.
As to claim 23, see the rejection to claim 3.
As to claim 24, see the rejection to claim 5.
Claims 6, 7, 16, 17, 19, 20, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Suzuki and Achlioptas and further in view of Min (U.S. Publication 2024/0087179).
As to claim 6, Suzuki does not disclose, but Min discloses, wherein: the diffusion model is trained by generating a predicted image based on layout information, computing a loss function based on the predicted image, and updating parameters of the diffusion model based on the loss function (p. 2, section 0027-p. 3, section 0034; a loss function based on a synthesized/predicted frame image is used to train the diffusion model, with parameters in the model updating each iteration). The motivation for this is to better denoise an image. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify Suzuki and Achlioptas to have the diffusion model trained by generating a predicted image based on layout information, computing a loss function based on the predicted image, and updating parameters of the diffusion model based on the loss function in order to better denoise an image as taught by Min.
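For illustration only, and not drawn from Min (the model, tensors, and shapes below are hypothetical stand-ins), a minimal sketch of one training iteration in which a predicted image is generated from layout conditioning, a loss is computed on the prediction, and the model parameters are updated from that loss; a perceptual loss, as in claim 7, would compare deep network features of the two images rather than raw pixels:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a diffusion model conditioned on layout information.
model = nn.Sequential(
    nn.Conv2d(6, 16, 3, padding=1),   # 3 noisy-image channels + 3 layout channels
    nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

noisy_image = torch.randn(1, 3, 64, 64)   # noised input at some timestep
layout_map = torch.randn(1, 3, 64, 64)    # layout conditioning (e.g. color/label map)
reference = torch.randn(1, 3, 64, 64)     # target image for the loss

predicted = model(torch.cat([noisy_image, layout_map], dim=1))  # predicted image
loss = nn.functional.mse_loss(predicted, reference)             # loss on the prediction
optimizer.zero_grad()
loss.backward()
optimizer.step()                                                # update model parameters
```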
As to claim 7, Min discloses wherein: the loss function comprises a perceptual loss (p. 2-3, section 0031). Motivation for the combination of references is given in the rejection to claim 6.
As to claim 16, Min discloses wherein the diffusion model comprises a U-Net architecture (p. 2, section 0025; p. 3, section 0034). Motivation for the combination of references is given in the rejection to claim 6.
As to claim 17, Min discloses wherein the diffusion model comprises a text-guided diffusion model (p. 1, section 0016-p. 2, section 0019). The motivation for this is to allow a user to only input text and possibly an image. It would have been obvious to one skilled in the art before the effective filing date of the claimed invention to modify Suzuki and Achlioptas to have the diffusion model comprise a text-guided diffusion model in order to allow a user to only input text and possibly an image, as taught by Min.
As to claim 19, Suzuki discloses limiting an input image to a particular region, as noted in the rejection to claim 1. Suzuki does not disclose, but Min discloses, wherein the instructions are further executable to: generate an object representation based on the semantic label and the input image using a perception model, wherein the image is generated based on an intermediate noise prediction from the diffusion model and the object representation (p. 1, section 0017-p. 2, section 0025; p. 2, section 0031-p. 3, section 0036; an input class/semantic label and an input image are combined to synthesize a representation of the output frame objects using a model that takes into account loss based on perception; the process goes through various iterations of intermediate noise predictions based on this and eventually produces output frames). Motivation for the combination of references is given in the rejection to claim 6.
As to claim 20, Min discloses wherein: the perception model comprises a multi-modal encoder (p. 1, section 0017; p. 2, section 0019; the model encodes feature embeddings of both input text and a subject image, making it multi-modal). Motivation for the combination of references is given in the rejection to claim 6.
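For illustration only, and not drawn from Min (the class and layer names are hypothetical), a minimal sketch of a perception model with a multi-modal encoder that fuses a semantic-label (text) embedding with an input-image embedding into a single object representation, which could then condition an intermediate noise prediction:

```python
import torch
import torch.nn as nn

class ToyMultiModalEncoder(nn.Module):
    """Illustrative perception model: encodes text tokens and an image, then fuses them."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)                              # label tokens -> features
        self.image_embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))  # image -> features
        self.fuse = nn.Linear(2 * dim, dim)                                          # joint object representation

    def forward(self, label_tokens, image):
        text_feat = self.text_embed(label_tokens).mean(dim=1)    # pool token embeddings
        image_feat = self.image_embed(image)
        return self.fuse(torch.cat([text_feat, image_feat], dim=-1))

# Example: fuse a 4-token semantic label with a 64x64 input image.
encoder = ToyMultiModalEncoder()
object_repr = encoder(torch.randint(0, 1000, (1, 4)), torch.randn(1, 3, 64, 64))
```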
As to claim 25, see the rejection to claim 6.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AARON M RICHER whose telephone number is (571)272-7790. The examiner can normally be reached 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon can be reached at (571)272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AARON M RICHER/Primary Examiner, Art Unit 2617