DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
2. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 03/23/2026 has been entered.
Claim Rejections - 35 USC § 103
3. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
4. Claims 1-3 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over “Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model” by Cheng et al., (“Cheng”) in view of Westcott et al., (“Westcott”) [US-2025/0209700-A1], which claims the benefit of Provisional Application No. 63/613,658, filed Dec. 21, 2023, further in view of “Sketch-Guided Text-to-Image Diffusion Models” by Voynov et al., (“Voynov”).
Regarding claim 1, Cheng discloses a method (Cheng- Figure 1 shows Three-dimension controls of image generation from stroke and sketch. (left) Our proposed model is able to provide three-dimension controls over image synthesis from stroke and sketch. Given sketch and stroke as input, we can control the scales of faithfulness for the synthesized output with respect to the sketch and stroke, as well as the degree of its realism; Abstract, at least discloses a unified framework supporting a three-dimensional control over the image synthesis from sketches and strokes based on diffusion models; page 4056, right column, section 3. Method, at least discloses aims to perform image generation conditioned on the input of stroke and sketches with three-dimensional control over the faithfulness to the conditions and the realism of the synthesized output) comprising:
obtaining a sketch input and a value of a fidelity parameter indicating a level of adherence to the sketch input (Cheng- Figure 1 shows Three-dimension controls of image generation from stroke and sketch. (left) Our proposed model is able to provide three-dimension controls over image synthesis from stroke and sketch. Given sketch and stroke as input, we can control the scales of faithfulness for the synthesized output with respect to the sketch and stroke, as well as the degree of its realism [value of a fidelity parameter]. (right) (a) Given sketch and strokes, we perform sketch/stroke-to-image translation. (b) We generate multimodal results with partial sketch/strokes as input; Figure 7 shows the realism scale is varied from low (0.0, right) to high (1.0, left); page 4055, section 1. Introduction, left column, last paragraph, at least discloses Users can decide to what extent the faithfulness should be to the input sketch [sketch input] and strokes, and to what degree the results are close to real images [value of a fidelity parameter indicating a level of adherence]; page 4056, right column, section 3. Method, at least discloses method, starting from the preliminaries for diffusion models (Section 3.1) and the modifications we make for realizing the conditional generation and the discussion for the sketch and stroke guidance […] and the control over realism);
encoding, using a control network of an image generation model, the sketch input and the value of the fidelity parameter to obtain sketch guidance information (Cheng- page 4055, right column, 2nd paragraph, at least discloses a unified framework of adaptively-realistic image generation from stroke and sketch that encodes the condition of the given stroke and sketch with the classifier-free guidance mechanism [sketch guidance information] and adjusts the degree of realism [value of the fidelity parameter] with a latent variable refinement technique), wherein the control network takes the fidelity parameter as input (Cheng- Figure 2 shows Conditional denoising process . At each time-step t, our proposed pipeline first performs classifier-free guidance with csketch and cstroke [input], which are extracted from a single input of colorful drawing ccomb, and then controls the fidelity/realism by refining xt−1 [the sketch input] with the input ccomb, in which such realism control is realized by iterative latent variable refinement; page 4055, right column, 2nd paragraph, at least discloses adjusts the degree of realism [the fidelity parameter] with a latent variable refinement technique); and
generating, using the image generation model, a synthesized image based on the sketch guidance information (Cheng- page 4055, right column, 2nd paragraph, at least discloses a unified framework of adaptively-realistic image generation from stroke and sketch that encodes the condition of the given stroke and sketch with the classifier-free guidance mechanism [sketch guidance information] and adjusts the degree of realism with a latent variable refinement technique. The proposed framework enables a three-dimensional control over image synthesis with flexibility and controllability over shape, color, and realism of the generation, given the input stroke and sketch. Moreover, our proposed work unleashes several interesting applications: multi-conditioned local editing, region-sensitive stroke-to-image, and multi-domain sketch-to-image; page 4056, right column, section 3. Method, at least discloses method, starting from the preliminaries for diffusion models (Section 3.1) and the modifications we make for realizing the conditional generation and the discussion for the sketch and stroke guidance [sketch guidance] (enabled by the technique of classifier-free diffusion guidance, Section 3.2) […] and the control over realism; page 4057, section 3.2. Sketch- and Stroke-Guided Diffusion Model, at least discloses To generate images based on the given sketches and strokes, our proposed method concatenates the sketch condition csketch and the stroke condition cstroke along with xt as input for the U-Net model […] To separately control the guidance level of the sketch and stroke conditions, we leverage classifier-free guidance [9] and modify it for two-dimensional guidance), wherein the synthesized image depicts an object from the sketch input based on the fidelity parameter (Cheng- Figure 2 shows Conditional denoising process. At each time-step t, our proposed pipeline first performs classifier-free guidance with csketch and cstroke [sketch input], which are extracted from a single input of colorful drawing ccomb, and then controls the fidelity/realism by refining xt−1 [object from the sketch input] with the input ccomb, in which such realism control is realized by iterative latent variable refinement; page 4057, right column, section 3.3. Realism Control, at least discloses it is essential to provide the control over how faithful the output should be to the inputs. In other words, how realistic the output should be. We then provide realism control in addition to the two-dimensional classifier-free guidance with sketch and stroke information […] The proposed realism control allows additional trade-off between consistency to the provided strokes/sketches and the distance to target data distribution (i.e. real images) […] Given a realism scale srealism ∼ [0, 1] as an indication of the transformed size N and a reference image combining sketch and stroke information ccomb of size m∗m, the realism adjustment during the conditioning generative process at timestep t can be expressed […] The overall three-dimensional control of our proposed framework is illustrated in Figure 2, in which it is realized via the combination of the sketch- and stroke-guidance with the realism control; page 4059, left column, section 4.1. Qualitative Evaluation- Adaptively-realistic image generation from sketch and stroke, at least discloses the qualitative comparisons between the proposed DiSS and other methods in Figure 3. Compared to the other frameworks, the proposed DiSS approach produces more realistic results on the object-level (cats and flowers) and scene-level (landscapes) datasets).
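Examiner’s note: for clarity of the record, the two-dimensional classifier-free guidance relied upon above (Cheng, Section 3.2) can be summarized in the following generic form; the algebra below is a standard two-condition extension of classifier-free guidance offered for illustration, not a verbatim reproduction of Cheng’s equation:

$\hat{\epsilon}_t = \epsilon_\theta(x_t, \varnothing, \varnothing) + s_{\mathrm{sketch}}\left[\epsilon_\theta(x_t, c_{\mathrm{sketch}}, \varnothing) - \epsilon_\theta(x_t, \varnothing, \varnothing)\right] + s_{\mathrm{stroke}}\left[\epsilon_\theta(x_t, c_{\mathrm{sketch}}, c_{\mathrm{stroke}}) - \epsilon_\theta(x_t, c_{\mathrm{sketch}}, \varnothing)\right]$

where $s_{\mathrm{sketch}}$ and $s_{\mathrm{stroke}}$ are independent guidance scales for the sketch and stroke conditions, and $\varnothing$ denotes the unconditional (gray-filled) input.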
Cheng does not explicitly disclose that the control network comprises a trainable copy of a layer of the image generation model, or that the control network takes the fidelity parameter as input in a form of a class label indicating the level of adherence of the sketch guidance information to the sketch input.
However, Westcott discloses
the control network comprises a trainable copy of a layer of the image generation model (Westcott- ¶0160, at least discloses A ControlNet is a neural network that allows for fine-tuning pre-trained diffusion models, such as the LoRA-tuned diffusion models 1127 a, 1127 b, to achieve more control over the image generation process; Fig. 12 and ¶0164, at least disclose The ControlNet 1128 a, 1128 b or other control neural network within each composite neural network 1124 a, 1124 b preferably includes a trainable copy of one or more layers of the artificial neural network implementing the LoRA-tuned diffusion model 1127 a, 1127 b within such composite neural network 1124 a, 1124 b).
Westcott further discloses
takes the fidelity parameter as input (Westcott- ¶0051, at least discloses One principal benefit of this approach is that it is learned how to invert a process (p(y|x)) but balance that progress with the prior (p(x)), which enables learning from experience and provides improved realism (or improved adherence to a desired style); ¶0147-0148, at least disclose The goal of that processing was higher fidelity and faster inference time. However, the warping of input imagery may also serve an additional purpose. This is particularly useful when an outer autoencoder is used (as is done with Stable Diffusion), as that can struggle to faithfully reproduce hands and faces when they do not occupy enough of the frame. Using a warping function, we may devote more pixels to important areas (e.g., hands and face) at the expense of less-important features […] While this may improve the fidelity of the parts the photographer/videographer “cares” about, this may also allow a smaller size image to be denoised with the same quality, thus improving storage size and training/inference time. It is likely that use of LoRA customization on a distorted frame (distorted prior to VAE encoder) will produce better results; ¶0214, at least discloses By dynamically determining a mask for the subject (e.g., vehicle as in FIG. 18A), feathering the edges of the transition region, and then combining with the per-pixel grayscale perceptual quality map (as in FIG. 22C), we may selectively improve the image while maintaining the fidelity of the original subject […] As mentioned, the subject itself may also be improved, but the intent of the graphic is to show that unlike many other diffusion methods (e.g., image-to-image), we maintain the fidelity of the specialized subject even without fine-tuning in the quality assurance pass);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng to incorporate the teachings of Westcott, and apply the ControlNet with a trainable copy of one or more layers of the artificial neural network into Cheng’s teachings, so that the control network comprises a trainable copy of a layer of the image generation model and takes the fidelity parameter as input.
Doing so would provide fine-grained image control modifications, quality assurance operations, and branding assurance operations to form a final personalized digital image advertisement.
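Examiner’s note: the “trainable copy” architecture cited at Westcott ¶0164 follows the general ControlNet pattern. The following minimal Python sketch illustrates that pattern only; the class and variable names are the examiner’s, not Westcott’s disclosed implementation:

import copy
import torch.nn as nn

class ControlledBlock(nn.Module):
    # A frozen pretrained block paired with a trainable copy of itself,
    # merged back through a zero-initialized 1x1 convolution so that
    # training starts from the unmodified base behavior.
    def __init__(self, base_block: nn.Module, channels: int):
        super().__init__()
        self.trainable_copy = copy.deepcopy(base_block)  # the "trainable copy"
        self.base_block = base_block
        for p in self.base_block.parameters():
            p.requires_grad_(False)  # the base generator layer stays fixed
        self.zero_proj = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, x, control):
        # control: conditioning features, e.g., an encoded sketch plus a
        # fidelity signal, added to the trainable copy's input stream
        return self.base_block(x) + self.zero_proj(self.trainable_copy(x + control))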
The prior art does not explicitly disclose, but Voynov discloses
input in a form of a class label indicating the level of adherence of the sketch guidance information to the sketch input (Voynov- Figure 4 shows Sketch-to-Image Translation. For low starting-𝑡 values, the system struggles to add colors and texture to the model, while for high starting-𝑡 values the fidelity to the input sketch significantly decreases. Text-prompt used to condition the model: “A photograph of a bike made of wood"; section 4.2 Comparisons, paragraph 2, at least discloses as can be seen in Figure 4, the model expects that the guiding image lays in the RGB domain, hence, resulting in unnatural, black and white images that follow the input sketch (text-prompt condition used: “A photograph of a bike made of wood"). For low values of 𝑡, the system struggles to add texture to the model, and when 𝑡 is increased, the fidelity to the input sketch significantly decreases → suggests parameters that control the fidelity to the input sketch; Figure 16 shows Spatial Label Map. (a) Images generated with the class guidance map equal to "day" and "night" with the prompt: "a photograph of an old city". (b) Image generation with a spatially varying soft label map. The right side of the image contains a bright sun, and there are stars and a black sky on the left side; section 5.2 Spatial Labels Guidance, at least discloses when the guidance is performed with a constant 1 or 0 spatial labeling map, the produced images are either attributed to day or night (Figure 16, left). When the labeling map is formed by the interpolation between labels probabilities, the generated image also interpolates the scene between night and day (Figure 16, right) → suggests images generated according to day and night with a prompt).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Westcott to incorporate the teachings of Voynov, and apply the parameters that control the fidelity and the labels into Cheng/Westcott’s teachings, so that the control network comprises a trainable copy of a layer of the image generation model and takes the fidelity parameter as input in a form of a class label indicating the level of adherence of the sketch guidance information to the sketch input.
Doing so would apply to a rich variety of sketch styles from diverse domains.
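Examiner’s note: a non-limiting sketch of how the combined teachings (Westcott’s fidelity input with Voynov’s label guidance) could realize the claimed class-label input. The binning and all names below are the examiner’s illustration, not a disclosure of any single reference:

import torch
import torch.nn as nn

NUM_FIDELITY_CLASSES = 4  # illustrative bins, e.g., loose / moderate / close / strict

class FidelityLabel(nn.Module):
    # Buckets a continuous fidelity value in [0, 1] into a discrete class
    # label and embeds it so a control network can condition on it.
    def __init__(self, embed_dim: int):
        super().__init__()
        self.table = nn.Embedding(NUM_FIDELITY_CLASSES, embed_dim)

    def forward(self, fidelity: torch.Tensor) -> torch.Tensor:
        label = (fidelity.clamp(0.0, 1.0) * (NUM_FIDELITY_CLASSES - 1)).round().long()
        return self.table(label)  # e.g., added to the timestep embedding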
Regarding claim 2, Cheng in view of Westcott and Voynov, discloses the method of claim 1, and further discloses wherein obtaining the sketch input (see Claim 1 rejection for detailed analysis) comprises:
providing a sketch element in a user interface (Cheng- Figure 5 shows (a) By drawing the new contour or color on an existing image, the proposed model enables the mask-free image editing; page 4059, left column, section Compared methods, at least discloses the black sketches and the colored stroke to form the drawing image, which is considered to belong to the source domain […] SSS2IS [14] is a self-supervised GAN-based scheme that takes as input a black sketch and a style image; Westcott- ¶0100, at least discloses the processor elements 820 are operatively coupled to a touch-sensitive 2D/volumetric display 804 configured to present a user interface 208); and
receiving the sketch input via the sketch element (Cheng- Figure 5 shows (a) By drawing the new contour or color on an existing image, the proposed model enables the mask-free image editing; page 4059, left column, section Compared methods, at least discloses the black sketches and the colored stroke to form the drawing image, which is considered to belong to the source domain […] SSS2IS [14] is a self-supervised GAN-based scheme that takes as input a black sketch and a style image).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng to incorporate the teachings of Westcott, and apply the user interface into Cheng’s teachings for providing a sketch element in a user interface.
The same motivation that was utilized in the rejection of claim 1 applies equally to this claim.
Regarding claim 3, Cheng in view of Westcott and Voynov, discloses the method of claim 1, and further discloses wherein obtaining the value of the fidelity parameter (see Claim 1 rejection for detailed analysis) comprises:
providing a fidelity parameter selection element in a user interface (Cheng- Figure 7 shows Trade-off between realism and consistency to image guidance. We demonstrate the trade-off between the image realism and the correspondence to the input guidance, where the realism scale is varied from low (0.0, right) to high (1.0, left). We also show the LPIPS scores between the generated image and the input guidance. Both the object-level (a cat drawing‡) and scene-level (a landscape painting§) input guidance images are used in this experiment; Westcott- ¶0100, at least discloses the processor elements 820 are operatively coupled to a touch-sensitive 2D/volumetric display 804 configured to present a user interface 208); and
receiving the value of the fidelity parameter via the fidelity parameter selection element (Cheng- Figure 7 shows Trade-off between realism and consistency to image guidance. We demonstrate the trade-off between the image realism and the correspondence to the input guidance, where the realism scale is varied from low (0.0, right) to high (1.0, left). We also show the LPIPS scores between the generated image and the input guidance. Both the object-level (a cat drawing‡) and scene-level (a landscape painting§) input guidance images are used in this experiment; page 4060, right column, section 4.2. Quantitative Evaluation - Image quality and correspondence to input sketch, at least discloses We use the Fréchet Inception Distance (FID) [7] to measure the realism of the generated images. To evaluate whether the synthesized images correspond to the input sketch, we compute the Learned Perceptual Image Patch Similarity (LPIPS) [24] score on the sketch level. Specifically, we calculate the similarity between the input sketch and the sketch inferred from the generated image (via Photosketching)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Voynov to incorporate the teachings of Westcott, and apply the user interface into Cheng/Voynov’s teachings for providing a fidelity parameter selection element in a user interface.
The same motivation that was utilized in the rejection of claim 1 applies equally to this claim.
Regarding claim 6, Cheng in view of Westcott and Voynov, discloses the method of claim 1, and discloses the method further comprising:
obtaining a text prompt, wherein the synthesized image is generated based on the text prompt (Voynov- Fig. 1 shows Given a sketch and a text-prompt, our method uses the sketch to guide a pretrained text-to-image diffusion model during inference time. The method allows producing diverse results that correspond to the text-prompt and follow the spatial layout of the sketch; Abstract, at least discloses Text-to-Image models have introduced a remarkable leap in the evolution of machine learning, demonstrating high-quality synthesis of images from a given text-prompt. However, these powerful pretrained models still lack control handles that can guide spatial properties of the synthesized images; section 1 INTRODUCTION, 4th paragraph, at least discloses method can accept free-hand sketches inputs, as in Figure 1, and generate diverse results that correspond to the text-prompt and follow the spatial layout of the sketch; Fig. 13 and section 4 EXPERIMENTS, at least disclose Figure 13 shows a gallery of results which demonstrate the ability of our framework to convert sketches to images with an input text-prompt; Fig. 4 shows Applying SDEdit [Meng et al. 2021] for Sketch-to-Image Translation. For low starting-𝑡 values, the system struggles to add colors and texture to the model, while for high starting-𝑡 values the fidelity to the input sketch significantly decreases. Text-prompt used to condition the model: “A photograph of a bike made of wood"; section 4.2 Comparisons, at least discloses the model expects that the guiding image lays in the RGB domain, hence, resulting in unnatural, black and white images that follow the input sketch (text-prompt condition used: “A photograph of a bike made of wood"); section 3 METHOD, 2nd paragraph, at least discloses The key idea of our method is to guide the inference process of a pretrained text-to-image diffusion model with an edge predictor that operates on the internal activations of the core network of the diffusion model, encouraging the edge of the synthesized image to follow a reference sketch; section 3.2 Sketch-Guided Text-to-Image Synthesis, right column, at least discloses Once being synthesized with the guidance from the objective L, the model produces a natural image aligned with the desired sketch).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Westcott to incorporate the teachings of Voynov, and apply the text-prompt into Cheng/Westcott’s teachings for obtaining a text prompt, wherein the synthesized image is generated based on the text prompt.
The same motivation that was utilized in the rejection of claim 1 applies equally to this claim.
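Examiner’s note: schematically, Voynov’s inference-time sketch guidance (Section 3) corrects each denoising step using the gradient of an edge-matching loss, with an edge predictor $E$ operating on internal U-Net activations $a_t$ and a target sketch $s$:

$x_t \leftarrow x_t - \alpha \, \nabla_{x_t} \left\lVert E(a_t) - s \right\rVert_2^2$

where $\alpha$ trades adherence to the text prompt against adherence to the sketch. This is a schematic restatement for the record, not a quotation of Voynov’s exact update rule.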
5. Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Westcott, further in view of Voynov, still further in view of Lu et al., (“Lu”) [US-2021/0158494-A1].
Regarding claim 4, Cheng in view of Westcott and Voynov, discloses the method of claim 1, and discloses the method further comprising:
receiving an edit to the sketch input (Cheng- Figure 5 shows (a) By drawing the new contour or color on an existing image, the proposed model enables the mask-free image editing. (b) With the partial colored stroke as the input, the proposed method synthesizes more diverse contents in the non-colored region. Here we use a cat contour† and hand-drawing flowers as examples; page 4060, left column, section Applications., at least discloses two interesting applications: multi-conditioned local editing and region-sensitive stroke-to-image generation […] We provide the details in the supplementary document. First, we present the visual editing results in Figure 5 (a). Our model enables flexible local manipulation on an existing image, which refers to both the hand-drawn contour and colored strokes. Secondly, we demonstrate the region-sensitive stroke-to-image generation results in Figure 5 (b). The proposed approach can take partial-sketch as input and produces results that 1) match the appearance in the region of partial-sketch and 2) exhibit multiple plausible contents in the non-colored region);
The prior art does not explicitly disclose, but Lu discloses
modifying the sketch input based on the edit to obtain a modified sketch input (Lu- ¶0015, at least discloses a user can provide a partial sketch or modify a sketch and, in real-time, preview generated electronic paintings and/or how modifications affect a generated painting; ¶0021, at least discloses such training allows for adding, filling in, and/or correcting any missing or inaccurate details in an input sketch. For example, such training allows the system to learn to add a missing eye or eyebrows and/or modify cartoonish or amateur sketches to correct proportions such as the size of eyes in relation to facial size; ¶0024-0026, at least disclose a digital tool include, but are not limited to, content creation tool, content editing tool […] any other tool that can be used for creating, editing, managing, generating, tracking, consuming or performing any other function or workflow related to content. A digital tool includes the creative apparatus 108 […] Examples of the digital experience include content creating, content editing […] any combination of these experiences, or any other workflow or function that can be performed related to content; ¶0042, at least discloses the user can create an image and can request for some expert opinion or expert editing. An expert user can then either edit the image as per the user liking or can provide expert opinion. The editing and providing of the expert opinion by the expert is enabled using the community engine 164 and the synchronization engine 132; ¶0055, at least discloses the image neural network can be modified or adjusted based on the comparison such that the quality of subsequently generated intermediate images increases. Such training helps to maintain features of an input sketch during the sketch to painting conversion; ¶0089, at least discloses the trained neural network system is capable of modifying input sketches to increase their realism. For instance, if features are not in proportion to each other in the input sketch, the neural network system can correct the features to reflect more realistic proportions. However, if a user does not like the generated outcome, the user can make modifications to the input sketch. Given that an input sketch of resolution 256×256 takes 20 ms for such a trained network to transform into an intermediate image, a user can make modifications to compensate for unexpected results from the system or make modifications so the output more closely reflects a desired result. Modifications can also include modifying guided colorization of an input sketch. To change and/or add suggested colors, a user can add/modify colored strokes, or scribbles, on regions of the sketch); and
generating, using the image generation model, a modified image based on the modified sketch input (Lu- ¶0015, at least discloses a user can provide a partial sketch or modify a sketch and, in real-time, preview generated electronic paintings and/or how modifications affect a generated painting; ¶0042, at least discloses the user can create an image and can request for some expert opinion or expert editing. An expert user can then either edit the image as per the user liking or can provide expert opinion. The editing and providing of the expert opinion by the expert is enabled using the community engine 164 and the synchronization engine 132; ¶0055, at least discloses the image neural network can be modified or adjusted based on the comparison such that the quality of subsequently generated intermediate images increases. Such training helps to maintain features of an input sketch during the sketch to painting conversion; ¶0089, at least discloses the trained neural network system is capable of modifying input sketches to increase their realism. For instance, if features are not in proportion to each other in the input sketch, the neural network system can correct the features to reflect more realistic proportions. However, if a user does not like the generated outcome, the user can make modifications to the input sketch. Given that an input sketch of resolution 256×256 takes 20 ms for such a trained network to transform into an intermediate image, a user can make modifications to compensate for unexpected results from the system or make modifications so the output more closely reflects a desired result. Modifications can also include modifying guided colorization of an input sketch. To change and/or add suggested colors, a user can add/modify colored strokes, or scribbles, on regions of the sketch).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Westcott/Voynov to incorporate the teachings of Lu, and apply modifying and editing a sketch into Cheng/Westcott/Voynov’s teachings for modifying the sketch input based on the edit to obtain a modified sketch input; and generating, using the image generation model, a modified image based on the modified sketch input.
Doing so would generate an electronic painting from a sketch, where the painting accurately reflects features of the sketch in a designated painting style.
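Examiner’s note: the interaction mapped for claims 4-5 amounts to a preview/edit/regenerate loop. The sketch below is the examiner’s illustration of that loop; "generate" stands for any sketch-conditioned image generation model and "apply_edit" is a hypothetical helper:

import numpy as np

def apply_edit(sketch: np.ndarray, edit: np.ndarray) -> np.ndarray:
    # Hypothetical helper: overlay nonzero edit strokes onto the sketch.
    return np.where(edit > 0, edit, sketch)

def edit_loop(generate, sketch: np.ndarray, fidelity: float, edits) -> np.ndarray:
    image = generate(sketch, fidelity)      # initial preview (claim 5)
    for edit in edits:                      # each edit arrives in response to a preview
        sketch = apply_edit(sketch, edit)   # modified sketch input (claim 4)
        image = generate(sketch, fidelity)  # modified image
    return image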
Regarding claim 5, Cheng in view of Westcott, Voynov and Lu, discloses the method of claim 4, and discloses the method further comprising:
displaying a preview of the synthesized image, wherein the edit is received in response to the preview (Lu- ¶0014, at least discloses a user can provide a partial sketch or modify a sketch and, in real-time, preview generated electronic paintings and/or how modifications affect a generated painting; ¶0052, at least discloses a user may provide or input a painting having a desired style. Based on the input sketch and the painting style preference, a painting can be generated and provided to the user via the user device 202 a. In this regard, the painting can be displayed via a display screen of the user device).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Westcott/Voynov to incorporate the teachings of Lu, and apply the previewing of generated electronic paintings displayed via a display screen of the user device into Cheng/Westcott/Voynov’s teachings for displaying a preview of the synthesized image, wherein the edit is received in response to the preview.
The same motivation that was utilized in the rejection of claim 4 applies equally to this claim.
6. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Westcott, further in view of Voynov, still further in view of Yokokawa et al., (“Yokokawa”) [US-2009/0214121-A1].
Regarding claim 7, Cheng in view of Westcott and Voynov, discloses the method of claim 1, and further discloses wherein:
the image generation model is trained using training data corresponding to the fidelity parameter (Cheng- Figure 2 shows Conditional denoising process. At each time-step t, our proposed pipeline first performs classifier-free guidance with csketch and cstroke, which are extracted from a single input of colorful drawing ccomb, and then controls the fidelity/realism [fidelity parameter] by refining xt−1 with the input ccomb, in which such realism control is realized by iterative latent variable refinement; page 4055, right column, 2nd paragraph, at least discloses a unified framework of adaptively-realistic image generation [image generation model] from stroke and sketch that encodes the condition of the given stroke and sketch with the classifier-free guidance mechanism and adjusts the degree of realism with a latent variable refinement technique; page 4056, right column, section 3. Method, at least discloses method, starting from the preliminaries for diffusion models (Section 3.1) […] and the control over realism; page 4057, right column, second paragraph, at least discloses a two-stage training strategy; page 4057, right column, section 3.3. Realism Control, at least discloses it is essential to provide the control over how faithful the output should be to the inputs. In other words, how realistic the output should be. We then provide realism control in addition to the two-dimensional classifier-free guidance with sketch and stroke information […] The proposed realism control allows additional trade-off between consistency to the provided strokes/sketches and the distance to target data distribution (i.e. real images) […] Given a realism scale srealism ∼ [0, 1] as an indication of the transformed size N and a reference image combining sketch and stroke information ccomb of size m∗m, the realism adjustment during the conditioning generative process at timestep t can be expressed […] The overall three-dimensional control of our proposed framework is illustrated in Figure 2, in which it is realized via the combination of the sketch- and stroke-guidance with the realism control; page 4059, left column, section 4.1. Qualitative Evaluation- Adaptively-realistic image generation from sketch and stroke, at least discloses the qualitative comparisons between the proposed DiSS and other methods in Figure 3. Compared to the other frameworks, the proposed DiSS approach produces more realistic results on the object-level (cats and flowers) and scene-level (landscapes) datasets).
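Examiner’s note: Cheng’s realism control builds on iterative latent variable refinement (ILVR). In its generic form, each sampled latent is corrected toward a noised reference $y_{t-1}$ through a low-pass filter $\phi_N$ (downsampling and upsampling by factor $N$):

$x_{t-1} \leftarrow \phi_N(y_{t-1}) + \left(x_{t-1} - \phi_N(x_{t-1})\right)$

with Cheng tying $N$ to the realism scale $s_{\mathrm{realism}} \in [0, 1]$, so that a stronger low-pass (larger $N$) loosens adherence to the reference $c_{\mathrm{comb}}$. This equation is the generic ILVR rule, reproduced for clarity of the record; Cheng’s exact expression is elided in the quotation above.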
The prior art does not explicitly disclose training data having a distortion level corresponding to the fidelity parameter.
However, Yokokawa discloses
a distortion level corresponding to the fidelity parameter (Yokokawa- ¶0007, at least discloses the blur degree is detected by analyzing the extracted edge point, depending on the edge amount included in the image, a fluctuation in a detection accuracy for the blur degree is generated. For example, regarding the image with the small edge amount which includes not much texture, it is difficult to extract the sufficient amount of the edge point, and as a result, there is a tendency that the detection accuracy for the blur degree is decreased; ¶0074, at least discloses in order to improve the detection accuracy for the blur degree, regarding the input image of the low dynamic range, the edge point is also extracted from the block where the edge intensity is weak so that the sufficient amounts of the edge points to set the detection accuracy for the blur degree of the input image equal to or higher than the certain level can be ensured, and regarding the input image of the high dynamic range, the edge point is extracted from the block where the edge intensity is strong as much as possible so that the edge point constituting the stronger edge can be extracted).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Westcott/Voynov to incorporate the teachings of Yokokawa, and apply the detection accuracy for the blur degree into Cheng/Westcott/Voynov’s teachings, so that the image generation model is trained using training data having a distortion level corresponding to the fidelity parameter.
Doing so would enable the detection for the blur state of the image at a higher accuracy.
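Examiner’s note: the following is an illustrative edge-analysis blur index in the spirit of Yokokawa ¶0007 and ¶0074, showing how a weak average gradient at extracted edge points indicates a blurrier image; it is the examiner’s sketch, not Yokokawa’s disclosed algorithm:

import cv2
import numpy as np

def blur_degree(gray: np.ndarray, edge_thresh: float = 50.0) -> float:
    # gray: 8-bit single-channel image
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.sqrt(gx * gx + gy * gy)       # gradient magnitude per pixel
    edge_points = mag[mag > edge_thresh]   # extracted edge points
    if edge_points.size == 0:
        return 1.0  # too few edge points to judge; treat as maximally blurred
    return float(1.0 / (1.0 + edge_points.mean() / 255.0))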
7. Claims 8-9, 11-12, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over “Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model” by Cheng et al., (“Cheng”) in view of Lu et al., (“Lu”) [US-2021/0158494-A1], further in view of “DeepPortraitDrawing: Generating human body images from freehand sketches” by Wu et al., (“Wu”), still further in view of Yokokawa et al., (“Yokokawa”) [US-2009/0214121-A1].
Regarding claim 8, Cheng discloses a method (Cheng- Figure 1 shows Three-dimension controls of image generation from stroke and sketch. (left) Our proposed model is able to provide three-dimension controls over image synthesis from stroke and sketch. Given sketch and stroke as input, we can control the scales of faithfulness for the synthesized output with respect to the sketch and stroke, as well as the degree of its realism; Abstract, at least discloses a unified framework supporting a three-dimensional control over the image synthesis from sketches and strokes based on diffusion models) comprising:
initializing an image generation model (Cheng- Figure 1 shows Three-dimension controls of image generation from stroke and sketch. (left) Our proposed model is able to provide three-dimension controls over image synthesis from stroke and sketch. Given sketch and stroke as input, we can control the scales of faithfulness for the synthesized output with respect to the sketch and stroke, as well as the degree of its realism; Abstract, at least discloses a unified framework supporting a three-dimensional control over the image synthesis from sketches and strokes based on diffusion models; page 4056, right column, section 3. Method, at least discloses aims to perform image generation conditioned on the input of stroke and sketches with three-dimensional control over the faithfulness to the conditions and the realism of the synthesized output. In the following we sequentially describe our proposed method, starting from the preliminaries for diffusion models (Section 3.1) and the modifications we make for realizing the conditional generation and the discussion for the sketch and stroke guidance (enabled by the technique of classifier-free diffusion guidance, Section 3.2), and the control over realism (achieved by the technique of iterative latent variable refinement, Section 3.3));
obtaining a training set including an image, a sketch input corresponding to the image (Cheng- Figure 1 shows Three-dimension controls of image generation from stroke and sketch. (left) Our proposed model is able to provide three-dimension controls over image synthesis from stroke and sketch. Given sketch and stroke as input, we can control the scales of faithfulness for the synthesized output with respect to the sketch and stroke, as well as the degree of its realism [value of a fidelity parameter]. (right) (a) Given sketch and strokes, we perform sketch/stroke-to-image translation. (b) We generate multimodal results with partial sketch/strokes as input; Figure 3 shows Qualitative comparisons. We present results from different approaches on the (top two rows) AFHQ, (middle two rows) Oxford Flower, and (bottom two rows) Landscapes datasets. U-GAT-IT [11], as an image-to-image translation method, takes as input the combination of sketches and strokes (third column). SDEdit [17], SSS2IS [14] and our model take the contour and color as separate inputs (the leftmost two columns); page 4055, section 1. Introduction, left column, last paragraph, at least discloses Users can decide to what extent the faithfulness should be to the input sketch [sketch input] and strokes, and to what degree the results are close to real images; page 4057, right column, second paragraph, at least discloses a two-stage training strategy; page 4058, section Datasets., right column, last paragraph, at least discloses We conduct experiments using the AFHQ [4], Landscapes [22] and Oxford Flower [19] datasets. We use Photo-sketching [13] to generate the black sketches, and the stylized neural painting [27] as well as the paint transformer [16] model to synthesize the colored strokes for all the datasets [training set]); and
training, using the training set, the image generation model to generate images based on the sketch input and a fidelity parameter (Cheng- Figure 2 shows Conditional denoising process. At each time-step t, our proposed pipeline first performs classifier-free guidance with csketch and cstroke [sketch input], which are extracted from a single input of colorful drawing ccomb, and then controls the fidelity/realism [fidelity parameter] by refining xt−1 with the input ccomb, in which such realism control is realized by iterative latent variable refinement [generate images based on the sketch input and a fidelity parameter]; page 4055, right column, 2nd paragraph, at least discloses a unified framework of adaptively-realistic image generation [image generation model] from stroke and sketch that encodes the condition of the given stroke and sketch with the classifier-free guidance mechanism and adjusts the degree of realism with a latent variable refinement technique. The proposed framework enables a three-dimensional control over image synthesis with flexibility and controllability over shape, color, and realism of the generation, given the input stroke and sketch; page 4056, right column, section 3. Method, at least discloses method, starting from the preliminaries for diffusion models (Section 3.1) […] and the control over realism; page 4057, right column, second paragraph, at least discloses a two-stage training strategy; page 4057, right column, section 3.3. Realism Control, at least discloses it is essential to provide the control over how faithful the output should be to the inputs. In other words, how realistic the output should be. We then provide realism control in addition to the two-dimensional classifier-free guidance with sketch and stroke information […] The proposed realism control allows additional trade-off between consistency to the provided strokes/sketches and the distance to target data distribution (i.e. real images) […] Given a realism scale srealism ∼ [0, 1] as an indication of the transformed size N and a reference image combining sketch and stroke information ccomb of size m∗m, the realism adjustment during the conditioning generative process at timestep t can be expressed […] The overall three-dimensional control of our proposed framework is illustrated in Figure 2, in which it is realized via the combination of the sketch- and stroke-guidance with the realism control; page 4059, left column, section 4.1. Qualitative Evaluation- Adaptively-realistic image generation from sketch and stroke, at least discloses the qualitative comparisons between the proposed DiSS and other methods in Figure 3. Compared to the other frameworks, the proposed DiSS approach produces more realistic results on the object-level (cats and flowers) and scene-level (landscapes) datasets).
Cheng does not explicitly disclose a distortion level of the sketch input; wherein the sketch input is generated by performing edge detection on the image to obtain an edge map and applying a random distortion to the edge map to obtain the sketch input, wherein the random distortion comprises a warping or an affine transformation, and wherein the distortion level indicates a level of the random distortion applied to the edge map; and generate images based on the sketch input and a fidelity parameter corresponding to the distortion level.
However, Lu discloses
a distortion level of the sketch input (Lu- ¶0072, at least discloses Adjusting the neural network to correct for errors is accomplished by changing at least one node parameter of such an image neural network. The image neural network can comprise a plurality of interconnected nodes with a parameter, or weight, associate with each node. While individual parameters do not have to be specified during training of a neural network, examples of such parameters can include edge detection, RGB color, textures of features, roughness and/or blur of a sketch. Each node receives inputs from multiple other nodes and can activate based on the combination of all these inputs, for example, when the sum of the input signals is above a threshold. The parameter can amplify or dampen the input signals. For example, a parameter could be a value between 0 and 1 → “parameters” that can include “blur of a sketch” would suggest a distortion level of the sketch input);
the sketch input corresponding to the distortion level (Lu- ¶0055, at least discloses an input sketch generally refers to a sketch provided to the neural network system, or portion thereof. Input sketches used to train the image neural network may be referred to herein as training sketches or training input sketches; ¶0071-0072, at least disclose The image neural network can then be trained by evaluating differences between the reference image used to create the training sketch [sketch input] and the training intermediate image to determine any errors or discrepancies therebetween […] Adjusting the neural network to correct for errors is accomplished by changing at least one node parameter of such an image neural network. The image neural network can comprise a plurality of interconnected nodes with a parameter, or weight, associate with each node. While individual parameters do not have to be specified during training of a neural network, examples of such parameters can include edge detection, RGB color, textures of features, roughness and/or blur of a sketch. Each node receives inputs from multiple other nodes and can activate based on the combination of all these inputs, for example, when the sum of the input signals is above a threshold. The parameter can amplify or dampen the input signals. For example, a parameter could be a value between 0 and 1 → “parameters” that can include “blur of a sketch” would suggest the sketch input corresponding to the distortion level).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng to incorporate the teachings of Lu, and apply the parameters that can include blur of a sketch into the image generation model, as taught by Cheng, for obtaining a training set including an image, a sketch input corresponding to the image, and a distortion level of the sketch input; and training, using the training set, the image generation model to generate images based on the sketch input and a fidelity parameter corresponding to the distortion level.
Doing so would generate an electronic painting from a sketch, where the painting accurately reflects features of the sketch in a designated painting style.
The prior art does not explicitly disclose, but Wu discloses
performing edge detection on the image to obtain an edge map (Wu- page 77, right column, section 4.1. Data preparation, at least discloses apply the edge detection method proposed by Im2Pencil to get an edge map for each human image) and applying a random distortion to the edge map to obtain the sketch input (Wu- page 76, right column, 4th paragraph, at least discloses apply random affine transformations to all part edge maps {Sc} and part heatmaps {Hc} in the training set, except for a selected reference part […] The pose network P needs to predict all part heatmaps {Ĥc} from each randomly transformed edge map Ŝ), wherein the random distortion comprises a warping or an affine transformation (Wu- page 76, right column, 4th paragraph, at least discloses apply random affine transformations to all part edge maps {Sc} and part heatmaps {Hc} in the training set, except for a selected reference part […] The pose network P needs to predict all part heatmaps {Ĥc} from each randomly transformed edge map Ŝ; page 77, left column, 3rd paragraph, at least discloses The spatial transformer network Tj+1 in the (j+1)-th step is fed with the transformed edge map Ŝj and the combined heatmaps Ĥj in the j-th step […] where F represents an affine transformation operation and I denotes the identity matrix; page 78, left column, section 4.2. Implementation details, at least discloses To train the structure refinement module, we preprocess the training set by applying random affine transformations, which are composed of translation, rotation, resizing, and shearing transformations […] the last two fully-connected layers to predict the affine transformation matrices for all body parts).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Lu to incorporate the teachings of Wu, and apply the affine transformations and the detected edge map into Cheng/Lu’s teachings, so that the sketch input is generated by performing edge detection on the image to obtain an edge map and applying a random distortion to the edge map to obtain the sketch input, wherein the random distortion comprises a warping or an affine transformation.
Doing so would provide a novel deep generative approach for generating realistic human images from coarse, rough freehand sketches.
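Examiner’s note: a minimal sketch of the training-data pipeline the combination suggests, in which an edge map is extracted and then randomly warped by an amount scaled with the distortion level. Canny stands in for the Photo-sketching/Im2Pencil extractors cited above, and the magnitude caps are illustrative choices, not values from any reference:

import cv2
import numpy as np

def distorted_sketch(gray: np.ndarray, distortion_level: float,
                     rng: np.random.Generator) -> np.ndarray:
    # gray: 8-bit single-channel training image; distortion_level in [0, 1]
    edges = cv2.Canny(gray, 100, 200)                        # edge map
    h, w = edges.shape
    angle = rng.uniform(-15.0, 15.0) * distortion_level      # rotation (degrees)
    scale = 1.0 + rng.uniform(-0.1, 0.1) * distortion_level  # resizing
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, scale)
    M[:, 2] += rng.uniform(-0.05, 0.05, size=2) * (w, h) * distortion_level  # translation
    return cv2.warpAffine(edges, M, (w, h), borderValue=0)   # random affine warp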
The prior art does not explicitly disclose, but Yokokawa discloses
the distortion level indicates a level of the distortion applied to the edge map (Yokokawa- ¶0007, at least discloses the blur degree is detected by analyzing the extracted edge point, depending on the edge amount included in the image, a fluctuation in a detection accuracy for the blur degree is generated […] regarding the image with the small edge amount which includes not much texture, it is difficult to extract the sufficient amount of the edge point, and as a result, there is a tendency that the detection accuracy for the blur degree is decreased; Fig. 2 and ¶0039, at least disclose In step S1, the edge map creation unit 11 creates the edge map; Fig. 2 and ¶0075, at least disclose In step S14, the edge analysis unit 17 performs an edge analysis. To be more specific, the extraction amount determination unit 16 supplies the edge reference value at the moment when it is determined that the edge point extraction amount is appropriate and the edge point tables 1 to 3 to the edge analysis unit 17; Fig. 2 and ¶0087-0090, at least disclose The edge analysis unit 17 supplies information representing the calculated Nsmal1blur and Nlarge1blur to the blur degree detection unit 18. In step S15, the blur degree detection unit 18 detects a blur degree BlurEstimation which is an index for the blur state of the input image […] The blur degree detection unit 18 outputs the detected blur degree BlurEstimation to the outside, and the blur degree detection processing is ended. For example, the external apparatus determines whether or not the input image is blurred by comparing the blur degree BlurEstimation with a predetermined threshold);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Lu/Wu to incorporate the teachings of Yokokawa, and apply the detection accuracy for the blur degree into Cheng/Lu/Wu’s teachings, so that the distortion level indicates a level of the random distortion applied to the edge map.
Doing so would enable the detection for the blur state of the image at a higher accuracy.
Regarding claim 9, Cheng in view of Lu, Wu and Yokokawa, discloses the method of claim 8, and further discloses wherein obtaining the training set (see Claim 8 rejection for detailed analysis) comprises:
generating a preliminary sketch input based on the image (Cheng- page 4056, right column, section 3. Method, at least discloses starting from the preliminaries for diffusion models (Section 3.1); page 4057, right column, 2nd paragraph, at least discloses In practice, we adopt a two-stage training strategy. First, we train the model with complete sketches and strokes as conditions. Then we fine-tune the model by randomly replacing 30% of each condition with an image filled with gray pixels, denoted as ∅, for unconditional representation); and
distorting the preliminary sketch input based on the distortion level to obtain the sketch input (Cheng- page 4057, right column, 2nd paragraph, at least discloses In practice, we adopt a two-stage training strategy. First, we train the model with complete sketches and strokes as conditions; Lu- ¶0055, at least discloses an input sketch generally refers to a sketch provided to the neural network system, or portion thereof. Input sketches used to train the image neural network may be referred to herein as training sketches or training input sketches; ¶0071-0072, at least disclose The image neural network can then be trained by evaluating differences between the reference image used to create the training sketch [sketch input] and the training intermediate image to determine any errors or discrepancies therebetween […] Adjusting the neural network to correct for errors is accomplished by changing at least one node parameter of such an image neural network. The image neural network can comprise a plurality of interconnected nodes with a parameter, or weight, associate with each node. While individual parameters do not have to be specified during training of a neural network, examples of such parameters can include edge detection, RGB color, textures of features, roughness and/or blur of a sketch. Each node receives inputs from multiple other nodes and can activate based on the combination of all these inputs, for example, when the sum of the input signals is above a threshold. The parameter can amplify or dampen the input signals. For example, a parameter could be a value between 0 and 1 → “parameters” that can include “blur of a sketch” would suggest the distortion level).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Wu/Yokokawa to incorporate the teachings of Lu, and apply the parameters that can include blur of a sketch into the image generation model, as taught by Cheng/Wu/Yokokawa, for obtaining a training set including an image, a sketch input corresponding to the image, and a distortion level of the sketch input; and training, using the training set, the image generation model to generate images based on the sketch input and a fidelity parameter corresponding to the distortion level.
The same motivation that was utilized in the rejection of claim 8 applies equally to this claim.
Regarding claim 11, Cheng in view of Lu, Wu and Yokokawa, discloses the method of claim 9, and further discloses wherein distorting the preliminary sketch input (see Claim 9 rejection for detailed analysis) comprises:
obtaining a plurality of transformation parameters based on the distortion level (Lu- ¶0014-0015, at least disclose The produced electronic painting, however, is oftentimes unsatisfactory to the user, as the resulting painting does not maintain features of the grayscale sketch (e.g., when the image is a portrait of a person, one or both eyes, the mouth, and/or the nose of the person can be distorted, obscured, and/or inaccurate; when the image is a landscape, one or more trees and/or a horizon line from the landscape can be distorted, obscured, and/or inaccurate; when the image is a bedroom scene, a bed, a lamp, and/or a corner of a room can be distorted, obscured, and/or inaccurate) and/or does not reflect a desired painting style […] facilitating efficient and effective sketch to painting transformations. In this regard, a sketch generated or provided by a user (e.g., a hand-drawn sketch) can be transformed or converted into an electronic painting in accordance with a desired painting style […] sketch to painting transformations can occur in real-time without a completed sketch being input for transformation. To this end, a user can provide a partial sketch or modify a sketch and, in real-time, preview generated electronic paintings and/or how modifications affect a generated painting); and
performing affine transformation on the preliminary sketch input based on the plurality of transformation parameters to obtain the sketch input (Wu- Figure 3 shows Illustration of the structure refinement module. The keypoints of individual body parts (e.g., the arms and shoulders) are better connected and their relative length is globally more consistent after this step; page 76, right column, 2nd paragraph, at least discloses As illustrated in Fig. 3, we first utilize a pose estimation network P to predict heatmaps Hc for the position of each keypoints from each refined part sketch map Ṡc. Note that we need to predict the same joint repeatedly for neighboring body parts. Then, we leverage all the part heatmaps {Hc} as guidance to recover the global structure of the sketched human body. The different body parts should preserve proper relative lengths, and connect with each other based on the inherent relationships among them. To achieve this, we apply affine transformations to the body parts predicted by a spatial transformer network [60] T, so that the part heatmaps {Hc} are transformed to reasonable locations {H̃c} learned from real human poses. We apply the same predicted affine transformations to the refined part sketch maps {Ṡc} and the part mask maps {Ṁc}, resulting in {S̃c} and {M̃c}, respectively).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Lu/Yokokawa to incorporate the teachings of Wu, and apply the affine transformations into Cheng/Lu/Yokokawa’s teachings for performing affine transformation on the preliminary sketch input based on the plurality of transformation parameters to obtain the sketch input.
The same motivation that was utilized in the rejection of claim 9 applies equally to this claim.
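For illustration only, an editor's minimal Python sketch of the two steps recited in claim 11 (deriving transformation parameters from a distortion level, then applying an affine transformation); this is not the spatial-transformer-based warp of Wu or the method of any other cited reference, and the parameter ranges are hypothetical:

    import cv2
    import numpy as np

    def sample_affine_params(distortion_level, rng):
        # Parameter magnitudes grow with the distortion level.
        angle = rng.uniform(-15.0, 15.0) * distortion_level           # degrees
        scale = 1.0 + rng.uniform(-0.2, 0.2) * distortion_level
        tx, ty = rng.uniform(-10.0, 10.0, size=2) * distortion_level  # pixels
        return angle, scale, tx, ty

    def apply_affine(sketch, angle, scale, tx, ty):
        h, w = sketch.shape[:2]
        # 2x3 affine matrix: rotation/scale about the center, then translation.
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, scale)
        M[0, 2] += tx
        M[1, 2] += ty
        return cv2.warpAffine(sketch, M, (w, h))

Usage would be, e.g., apply_affine(sketch, *sample_affine_params(0.5, np.random.default_rng(0))).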
Regarding claim 12, Cheng in view of Lu, Wu and Yokokawa, discloses the method of claim 8, and further discloses wherein obtaining the training set (see Claim 8 rejection for detailed analysis) comprises:
performing edge detection on the image to obtain the sketch input (Lu- ¶0071-0072, at least disclose The image neural network can then be trained by evaluating differences between the reference image used to create the training sketch and the training intermediate image to determine any errors or discrepancies therebetween […] While individual parameters do not have to be specified during training of a neural network, examples of such parameters can include edge detection, RGB color, textures of features, roughness and/or blur of a sketch; ¶0077, at least discloses Total variation loss compares the edges of the training intermediate image with the edges of the reference image. Correcting for total variation loss can improve contrast and sharpness in intermediate images).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Wu/Yokokawa to incorporate the teachings of Lu, and to apply Lu's edge detection on the image to Cheng/Wu/Yokokawa's teachings, for performing edge detection on the image to obtain the sketch input.
The same motivation that was utilized in the rejection of claim 8 applies equally to this claim.
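For illustration only, an editor's minimal Python sketch of edge detection used to derive a sketch input from an image; the Sobel operator and threshold below are assumptions (Canny is an equally common choice), not the specific processing of any cited reference:

    import cv2
    import numpy as np

    def edges_to_sketch(image, thresh=60.0):
        """Binary edge map of a BGR image, as a stand-in 'sketch input'."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
        mag = cv2.magnitude(gx, gy)                 # gradient magnitude
        return (mag > thresh).astype(np.uint8) * 255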
Regarding claim 14, Cheng in view of Lu, Wu and Yokokawa, discloses the method of claim 8, and further discloses wherein training the image generation model (see Claim 8 rejection for detailed analysis) comprises:
fixing parameters of an image generator of the image generation model (Lu- ¶0072, at least discloses Adjusting the neural network to correct for errors is accomplished by changing at least one node parameter of such an image neural network. The image neural network can comprise a plurality of interconnected nodes with a parameter, or weight, associate with each node. While individual parameters do not have to be specified during training of a neural network, examples of such parameters can include edge detection, RGB color, textures of features, roughness and/or blur of a sketch […] The parameter can amplify or dampen the input signals. For example, a parameter could be a value between 0 and 1.); and
iteratively updating parameters of a control network of the image generation model (Cheng- Figure 2 shows Conditional denoising process. At each time-step t, our proposed pipeline first performs classifier-free guidance with csketch and cstroke, which are extracted from a single input of colorful drawing ccomb, and then controls the fidelity/realism by refining xt−1 with the input ccomb, in which such realism control is realized by iterative latent variable refinement; page 4061, right column, section 5. Conclusion, at least discloses iterative latent variable refinement to offer the three-dimensional control (sketch, colored stroke, realism) over the image generation process; Lu- ¶0058, at least discloses a neural network system comprised of an image neural network and a painting neural network of sketch transformer 206 is iteratively trained using multiple training input sketches to generate training output paintings).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Wu/Yokokawa to incorporate the teachings of Lu, and to apply Lu's iterative updating of parameters to Cheng/Wu/Yokokawa's teachings, for iteratively updating parameters of a control network of the image generation model.
The same motivation that was utilized in the rejection of claim 8 applies equally to this claim.
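For illustration only, an editor's minimal PyTorch sketch of the training pattern recited in claim 14 (fixing the image generator's parameters while iteratively updating a control network); the model interfaces and the reconstruction loss are assumptions, not the training procedure of any cited reference:

    import torch
    import torch.nn.functional as F

    def train_control_network(generator, control_net, loader, steps, lr=1e-4):
        # Freeze the image generator; only the control network learns.
        for p in generator.parameters():
            p.requires_grad_(False)
        generator.eval()
        opt = torch.optim.AdamW(control_net.parameters(), lr=lr)
        for step, (sketch, fidelity, target) in zip(range(steps), loader):
            guidance = control_net(sketch, fidelity)  # sketch guidance signal
            pred = generator(target, guidance)        # frozen generator forward
            loss = F.mse_loss(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()

Gradients still propagate through the frozen generator to the control network's inputs, which is what allows only the control network's parameters to be updated.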
Regarding claim 15, Cheng in view of Lu, Wu and Yokokawa, discloses the method of claim 8, and further discloses wherein obtaining the training set (see Claim 8 rejection for detailed analysis) comprises:
generating a plurality of sketch inputs based on the image (Cheng- page 4056, right column, section 3. Method, at least discloses our proposed framework, named DiSS, aims to perform image generation conditioned on the input of stroke and sketches with three-dimensional control over the faithfulness to the conditions and the realism of the synthesized output; page 4057, left column, section 3.2. Sketch- and Stroke-Guided Diffusion Model, at least discloses To generate images based on the given sketches and strokes, our proposed method concatenates the sketch condition csketch and the stroke condition cstroke along with xt as input for the U-Net model […] First, we train the model with complete sketches and strokes as conditions […] During sampling, the ratio between the degree of faithfulness to strokes and sketches is controlled through the following linear combination with two guidance scales ssketch and sstroke; Lu- ¶0068, at least discloses training data can be generated by placing colored strokes or scribbles on top of a training sketch. Alternatively, training data can be generated by extracting a one-channel grayscale image from the training image and combining it with a three-channel image containing colored strokes or scribbles. When generating the colored strokes using the training image, to emulate arbitrary user behaviors, a number of colored strokes of random length and thickness can be sampled from random locations), wherein each of the plurality of sketch inputs is based on a set of stroke attributes corresponding to a different sketch style (Cheng- page 4059, left column, 3rd paragraph, at least discloses SSS2IS [14] is a self-supervised GAN-based scheme that takes as input a black sketch and a style image. We retrain the model by replacing the input style images with a colored stroke image, and computing the regression loss between the real image and the autoencoder output; Lu- ¶0017, at least discloses Sketches to train such a neural network system can be generated using reference images (e.g., where the sketches used to train the system are synthetically generated sketches). Various methods can be used to generate training sketches from such reference images, so that the training sketches reflect different sketch styles and techniques to ensure that the neural network system is capable of recognizing a wide variety of styles and techniques of input sketches upon completion of its training; ¶0066, at least discloses conversion component 306 can use various methods to generate training sketches that reflect different sketch styles and techniques to ensure that the neural network system is capable of recognizing a wide variety of styles and techniques of input sketches upon completion of its training; ¶0068, at least discloses Conversion component 306 can also be used to synthesize rough color to train the system to recognize preferred colors in regions. To accomplish this, training data can be generated by placing colored strokes or scribbles on top of a training sketch; ¶0098, at least discloses At block 602, a sketch, reference style, and a category can be input. For example, a user can input a sketch, reference style, and a category).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Wu/Yokokawa to incorporate the teachings of Lu, and to apply Lu's training sketches reflecting different sketch styles to Cheng/Wu/Yokokawa's teachings, for generating a plurality of sketch inputs based on the image, wherein each of the plurality of sketch inputs is based on a set of stroke attributes corresponding to a different sketch style.
The same motivation that was utilized in the rejection of claim 8 applies equally to this claim.
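For illustration only, an editor's minimal Python sketch of producing a plurality of sketch variants from one sketch, each under a different hypothetical set of stroke attributes (thickness and softness); this is not the style-variation scheme of any cited reference:

    import cv2
    import numpy as np

    def sketch_style_variants(sketch):
        """Return several renderings of one sketch with varied stroke attributes."""
        variants = []
        for thickness, blur in [(1, 1), (3, 1), (1, 5), (5, 3)]:
            kernel = np.ones((thickness, thickness), np.uint8)
            v = cv2.dilate(sketch, kernel)                 # thicker strokes
            if blur > 1:
                v = cv2.GaussianBlur(v, (blur, blur), 0)   # softer strokes
            variants.append(v)
        return variants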
8. Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Lu, further in view of Wu, still further in view of Yokokawa, still further in view of “WarpGAN: Automatic Caricature Generation” by Shi et al. (“Shi”).
Regarding claim 10, Cheng in view of Lu, Wu and Yokokawa, discloses the method of claim 9, and further discloses wherein distorting the preliminary sketch input (see Claim 9 rejection for detailed analysis) comprises:
the preliminary sketch input based on the distortion level (see Claim 9 rejection for detailed analysis).
The prior art does not explicitly disclose, but Shi discloses
warping the preliminary sketch input based on the distortion level (Shi- Figure 1 shows Example photos and caricatures of two subjects in our dataset. Column (a) shows each identity’s real face photo, while two generated caricatures of the same subjects by WarpGAN are shown in column (b) and (c). Caricatures drawn by artists are shown in the column (d) and (e); Figure 3 shows The generator module of WarpGAN. Given a face image, the generator outputs an image with a different texture style and a set of control points along with their displacements. A differentiable module takes the control points and warps the transferred image to generate a caricature; page 10765, left column, section 3.1. Generator, at least discloses The proposed deformable generator in WarpGAN is composed of three sub-networks: a content encoder Ec, a decoder R and a warp controller […] The warp controller estimates the control points and their displacements to warp the rendered images. An overview of the deformable generator is shown in Figure 3).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Lu/Wu/Yokokawa to incorporate the teachings of Shi, and to apply Shi's WarpGAN caricature generation to the Cheng/Lu/Wu/Yokokawa teachings, for warping the preliminary sketch input based on the distortion level.
Doing so would provide an automatic caricature generator where users can customize both the texture style and the exaggeration degree.
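For illustration only, an editor's minimal Python sketch of warping an image from control points and their displacements; this uses a plain Gaussian-weighted displacement field, not WarpGAN's differentiable spatial-transformer warp, and all names are hypothetical:

    import cv2
    import numpy as np

    def warp_with_control_points(sketch, points, displacements, sigma=30.0):
        """points: (K, 2) (x, y) locations; displacements: (K, 2) (dx, dy)."""
        h, w = sketch.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
        dx = np.zeros((h, w), np.float32)
        dy = np.zeros((h, w), np.float32)
        wsum = np.full((h, w), 1e-8, np.float32)
        for (px, py), (ddx, ddy) in zip(points, displacements):
            wgt = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
            dx += wgt * ddx
            dy += wgt * ddy
            wsum += wgt
        # Backward mapping: each output pixel samples a displaced source pixel.
        return cv2.remap(sketch, xs - dx / wsum, ys - dy / wsum, cv2.INTER_LINEAR)

A larger displacement magnitude at the control points would correspond to a higher distortion level.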
9. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Cheng in view of Lu, further in view of Wu, still further in view of Yokokawa, still further in view of “Open World Entity Segmentation” by Qi et al. (“Qi”).
Regarding claim 13, Cheng in view of Lu, Wu and Yokokawa, discloses the method of claim 8, including obtaining the training set (see Claim 8 rejection for detailed analysis). The prior art does not explicitly disclose, but Qi discloses, wherein obtaining the training set comprises:
performing entity segmentation on the image to obtain the sketch input (Qi- Figure 3 shows The entity segmentation framework; page 1, right column, 2nd paragraph, at least discloses a new image segmentation task named Entity Segmentation (ES) which aims to generate class-agnostic segmentation masks of an image; page 3, left column, 2nd paragraph, at least discloses a new perspective on image segmentation by introducing the entity segmentation task that handles dense image segmentation similarly as semantic/instance/panoptic segmentation, but without the classification aspect akin to salient object detection. This task focuses only on class-agnostic segmentation; page 3, right column, section 3 ENTITY SEGMENTATION, at least discloses The task of entity segmentation (ES) is defined as segmenting all visual entities within an image in a class-agnostic manner. Here, "entity" refers to either a thing (instance) mask or a stuff mask in the common context. This definition is related to the standard and well-accepted definition of "object" which is based on certain objectness properties introduced by the seminal work [1]: (a) a well-defined closed boundary in space; (b) a different appearance from their surroundings).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Lu/Wu/Yokokawa to incorporate the teachings of Qi, and to apply Qi's entity segmentation task to the Cheng/Lu/Wu/Yokokawa teachings, for performing entity segmentation on the image to obtain the sketch input.
Doing so would outperform popular class-specific panoptic segmentation models in terms of segmentation quality.
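For illustration only, an editor's minimal Python sketch of turning class-agnostic entity masks (such as those produced by an entity segmentation model like Qi's) into a boundary-style sketch input; the mask format and one-pixel contours are assumptions:

    import cv2
    import numpy as np

    def masks_to_sketch(masks):
        """masks: list of HxW uint8 binary masks, one per segmented entity."""
        h, w = masks[0].shape
        sketch = np.zeros((h, w), np.uint8)
        for m in masks:
            contours, _ = cv2.findContours(m, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            cv2.drawContours(sketch, contours, -1, 255, 1)  # 1 px boundaries
        return sketch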
10. Claims 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over “Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model” by Cheng et al. (“Cheng”) in view of Lu et al. (“Lu”) [US-2021/0158494-A1], further in view of Westcott et al. (“Westcott”) [US-2025/0209700-A1] with the Provisional application No. 63/613,658, filed on Dec. 21, 2023, still further in view of “Sketch-Guided Text-to-Image Diffusion Models” by Voynov et al. (“Voynov”).
Regarding claim 16, Cheng discloses an apparatus (Cheng- Abstract, at least discloses a unified framework supporting a three-dimensional control over the image synthesis from sketches and strokes based on diffusion models; page 4055, right column, 2nd paragraph, at least discloses the proposed framework enables a three-dimensional control over image synthesis with flexibility and controllability over shape, color, and realism of the generation, given the input stroke and sketch) comprising:
a machine learning model comprising parameters to obtain a sketch input and a value of a fidelity parameter indicating a level of adherence to the sketch input (Cheng- Figure 1 shows Three-dimension controls of image generation from stroke and sketch. (left) Our proposed model is able provide three-dimension controls over image synthesis from stroke and sketch. Given sketch and stroke as input, we can control the scales of faithfulness for the synthesized output with respect to the sketch and stroke, as well as the degree of its realism [value of a fidelity parameter]. (right) (a) Given sketch and strokes, we perform sketch/stroke-to-image translation. (b) We generate multimodal results with partial sketch/strokes as input; Figure 7 shows the realism scale is varied from low (0.0, right) to high (1.0, left); page 4055, section 1. Introduction, left column, last paragraph, at least discloses Users can decide to what extent the faithfulness should be to the input sketch [sketch input] and strokes, and to what degree the results are close to real images [value of a fidelity parameter indicating a level of adherence]; page 4056, right column, section 3. Method, at least discloses method, starting from the preliminaries for diffusion models (Section 3.1) and the modifications we make for realizing the conditional generation and the discussion for the sketch and stroke guidance (enabled by the technique of classifier-free diffusion guidance, Section 3.2), and the control over realism (achieved by the technique of iterative latent variable refinement, Section 3.3).), wherein the machine learning model comprises a control network trained to encode the sketch input and the value of the fidelity parameter to obtain sketch guidance information (Cheng- page 4055, right column, 2nd paragraph, at least discloses a unified framework of adaptively-realistic image generation from stroke and sketch that encodes the condition of the given stroke and sketch with the classifier-free guidance mechanism [sketch guidance information] and adjusts the degree of realism [value of the fidelity parameter] with a latent variable refinement technique), and wherein the machine learning model further comprises an image generator trained to generate a synthesized image based on the sketch guidance information using training data corresponding to the fidelity parameter (Cheng- page 4055, right column, 2nd paragraph, at least discloses a unified framework of adaptively-realistic image generation from stroke and sketch that encodes the condition of the given stroke and sketch with the classifier-free guidance mechanism [sketch guidance information] and adjusts the degree of realism with a latent variable refinement technique. The proposed framework enables a three-dimensional control over image synthesis with flexibility and controllability over shape, color, and realism of the generation, given the input stroke and sketch. Moreover, our proposed work unleashes several interesting applications: multi-conditioned local editing, region-sensitive stroke-to-image, and multi-domain sketch-to-image; page 4056, right column, section 3. Method, at least discloses method, starting from the preliminaries for diffusion models (Section 3.1) and the modifications we make for realizing the conditional generation and the discussion for the sketch and stroke guidance [sketch guidance] (enabled by the technique of classifier-free diffusion guidance, Section 3.2) […] and the control over realism; page 4057, section 3.2. Sketch- and Stroke-Guided Diffusion Model, at least discloses To generate images based on the given sketches and strokes, our proposed method concatenates the sketch condition csketch and the stroke condition cstroke along with xt as input for the U-Net model […] To separately control the guidance level of the sketch and stroke conditions, we leverage classifier-free guidance [9] and modify it for two-dimensional guidance; page 4059, left column, section 4.1. Qualitative Evaluation- Adaptively-realistic image generation from sketch and stroke, at least discloses the qualitative comparisons between the proposed DiSS and other methods in Figure 3. Compared to the other frameworks, the proposed DiSS approach produces more realistic results on the object-level (cats and flowers) and scene-level (landscapes) datasets)).
Cheng does not explicitly disclose an apparatus comprising: at least one processor; and
at least one memory including instructions executable by the at least one processor; the control network comprises a trainable copy of a layer of the machine learning model and takes the fidelity parameter as input in a form of a class label indicating the level of adherence of the sketch guidance information to the sketch input; and training data having a distortion level corresponding to the fidelity parameter.
However, Lu discloses
an apparatus (Lu- Fig. 1 and ¶0024, at least disclose A user of the user device can utilize various products, applications, or services supported by the creative apparatus 108 via the network 106. The user devices 102A-102N can be operated by various users) comprising:
at least one processor (Lu- Figs. 1 and 7 and ¶0103, at least disclose one or more processors 714); and at least one memory including instructions executable by the at least one processor (Lu- ¶0046, at least discloses some functions may be carried out by a processor executing instructions stored in memory as further described with reference to FIG. 7; Fig. 7 and ¶0103, at least disclose computing device 700 includes bus 710 that directly or indirectly couples the following devices: memory 712, one or more processors 714; ¶0105, at least discloses Memory 712 includes computer storage media in the form of volatile and/or nonvolatile memory. As depicted, memory 712 includes instructions 724. Instructions 724, when executed by processor(s) 714 are configured to cause the computing device to perform any of the operations).
training data having a distortion level (Lu- ¶0072, at least discloses Adjusting the neural network to correct for errors is accomplished by changing at least one node parameter of such an image neural network. The image neural network can comprise a plurality of interconnected nodes with a parameter, or weight, associate with each node. While individual parameters do not have to be specified during training of a neural network, examples of such parameters can include edge detection, RGB color, textures of features, roughness and/or blur of a sketch. Each node receives inputs from multiple other nodes and can activate based on the combination of all these inputs, for example, when the sum of the input signals is above a threshold. The parameter can amplify or dampen the input signals. For example, a parameter could be a value between 0 and 1 → the “parameters” can include “blur of a sketch”, which would suggest the distortion level).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng to incorporate the teachings of Lu, and to apply Lu's teaching that the parameters can include blur of a sketch to the image generation model of Cheng, so that the machine learning model further comprises an image generator trained to generate a synthesized image based on the sketch guidance information using training data having a distortion level corresponding to the fidelity parameter.
Doing so would generate an electronic painting from a sketch, where the painting accurately reflects features of the sketch in a designated painting style.
The prior art does not explicitly disclose that the control network comprises a trainable copy of a layer of the machine learning model and takes the fidelity parameter as input.
However, Westcott discloses
the control network comprises a trainable copy of a layer of the image generation model (Westcott- ¶0160, at least discloses A ControlNet is a neural network that allows for fine-tuning pre-trained diffusion models, such as the LoRA-tuned diffusion models 1127 a, 1127 b, to achieve more control over the image generation process; Fig. 12 and ¶0164, at least disclose The ControlNet 1128 a, 1128 b or other control neural network within each composite neural network 1124 a, 1124 b preferably includes a trainable copy of one or more layers of the artificial neural network implementing the LoRA-tuned diffusion model 1127 a, 1127 b within such composite neural network 1124 a, 1124 b) and takes the fidelity parameter as input (Westcott- ¶0051, at least discloses One principal benefit of this approach is that it is learned how to invert a process (p(y|x)) but balance that progress with the prior (p(x)), which enables learning from experience and provides improved realism (or improved adherence to a desired style); ¶0147-0148, at least disclose The goal of that processing was higher fidelity and faster inference time. However, the warping of input imagery may also serve an additional purpose. This is particularly useful when an outer autoencoder is used (as is done with Stable Diffusion), as that can struggle to faithfully reproduce hands and faces when they do not occupy enough of the frame. Using a warping function, we may devote more pixels to important areas (e.g., hands and face) at the expense of less-important features […] While this may improve the fidelity of the parts the photographer/videographer “cares” about, this may also allow a smaller size image to be denoised with the same quality, thus improving storage size and training/inference time. It is likely that use of LoRA customization on a distorted frame (distorted prior to VAE encoder) will produce better results; ¶0214, at least discloses By dynamically determining a mask for the subject (e.g., vehicle as in FIG. 18A), feathering the edges of the transition region, and then combining with the per-pixel grayscale perceptual quality map (as in FIG. 22C), we may selectively improve the image while maintaining the fidelity of the original subject […] As mentioned, the subject itself may also be improved, but the intent of the graphic is to show that unlike many other diffusion methods (e.g., image-to-image), we maintain the fidelity of the specialized subject even without fine-tuning in the quality assurance pass).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Lu to incorporate the teachings of Westcott, and to apply Westcott's ControlNet with a trainable copy of one or more layers of the artificial neural network to the Cheng/Lu teachings, so that the control network comprises a trainable copy of a layer of the image generation model and takes the fidelity parameter as input.
Doing so would provide fine-grained image control modifications, quality assurance operations, and branding assurance operations to form a final personalized digital image advertisement.
The prior art does not explicitly disclose, but Voynov discloses
input in a form of a class label indicating the level of adherence of the sketch guidance information to the sketch input (Voynov- Figure 4 shows Sketch-to-Image Translation. For low starting-𝑡 values, the system struggles to add colors and texture to the model, while for high starting-𝑡 values the fidelity to the input sketch significantly decreases. Text-prompt used to condition the model: “A photograph of a bike made of wood"; section 4.2 Comparisons, paragraph 2, at least discloses as can be seen in Figure 4, the model expects that the guiding image lays in the RGB domain, hence, resulting in unnatural, black and white images that follow the input sketch (text-prompt condition used: “A photograph of a bike made of wood"). For low values of 𝑡, the system struggles to add texture to the model, and when 𝑡 is increased, the fidelity to the input sketch significantly decreases → suggests parameters that control the fidelity to the input sketch; Figure 16 shows Spatial Label Map. (a) Images generated with the class guidance map equal to "day" and "night" with the prompt: "a photograph of an old city". (b) Image generation with a spatially varying soft label map. The right side of the image contains a bright sun, and there are stars and a black sky on the left side; section 5.2 Spatial Labels Guidance, at least discloses when the guidance is performed with a constant 1 or 0 spatial labeling map, the produced images are either attributed to day or night (Figure 16, left). When the labeling map is formed by the interpolation between labels probabilities, the generated image also interpolates the scene between night and day (Figure 16, right) → suggests images generated according to day and night with a prompt).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Lu/Westcott to incorporate the teachings of Voynov, and to apply Voynov's parameters that control the fidelity and the labels to the Cheng/Lu/Westcott teachings, so that the control network comprises a trainable copy of a layer of the image generation model and takes the fidelity parameter as input in a form of a class label indicating the level of adherence of the sketch guidance information to the sketch input.
Doing so would allow the method to apply to a rich variety of sketch styles from diverse domains.
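For illustration only, an editor's minimal PyTorch sketch of a control branch built as a trainable copy of generator layers and conditioned on a discrete fidelity class label; the module shapes, the zero-initialized projection, and all names are assumptions, not the architecture of Westcott's ControlNet or of any other cited reference:

    import copy
    import torch
    import torch.nn as nn

    class FidelityControlNet(nn.Module):
        def __init__(self, generator_layers, num_fidelity_levels=5, dim=256):
            super().__init__()
            # Trainable copy initialized from the generator's own layers.
            self.copied_layers = copy.deepcopy(generator_layers)
            # Discrete fidelity levels enter as an embedded class label.
            self.fidelity_embed = nn.Embedding(num_fidelity_levels, dim)
            # Zero-initialized projection so the branch starts as a no-op.
            self.out_proj = nn.Linear(dim, dim)
            nn.init.zeros_(self.out_proj.weight)
            nn.init.zeros_(self.out_proj.bias)

        def forward(self, sketch_features, fidelity_label):
            h = self.copied_layers(sketch_features)      # assumes (B, dim) output
            h = h + self.fidelity_embed(fidelity_label)  # inject the class label
            return self.out_proj(h)                      # sketch guidance signal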
Regarding claim 17, Cheng in view of Lu, Westcott and Voynov, discloses the apparatus of claim 16, and discloses the apparatus further comprising:
a user interface configured to receive the sketch input and the value of the fidelity parameter (Cheng- Figure 1 shows Three-dimension controls of image generation from stroke and sketch. (left) Our proposed model is able provide three-dimension controls over image synthesis from stroke and sketch. Given sketch and stroke as input, we can control the scales of faithfulness for the synthesized output with respect to the sketch and stroke, as well as the degree of its realism [value of a fidelity parameter]. (right) (a) Given sketch and strokes, we perform sketch/stroke-to-image translation. (b) We generate multimodal results with partial sketch/strokes as input; Figure 7 shows the realism scale is varied from low (0.0, right) to high (1.0, left); page 4055, section 1. Introduction, left column, last paragraph, at least discloses Users can decide to what extent the faithfulness should be to the input sketch [sketch input] and strokes, and to what degree the results are close to real images [value of a fidelity parameter indicating a level of adherence]; page 4056, right column, section 3. Method, at least discloses method, starting from the preliminaries for diffusion models (Section 3.1) and the modifications we make for realizing the conditional generation and the discussion for the sketch and stroke guidance […] and the control over realism); Lu- ¶0035, at least discloses The workspace, as described herein, includes setting of the application program, setting of tools or setting of user interface provided by the application program, and any other setting or properties specific to the application program; ¶0106, at least discloses a natural user interface (NUI) that processes air gestures, voice, or other physiological inputs generated by a user).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Westcott/Voynov to incorporate the teachings of Lu, and to apply Lu's user interface to the Cheng/Westcott/Voynov teachings, so that the user interface is configured to receive the sketch input and the value of the fidelity parameter.
The same motivation that was utilized in the rejection of claim 16 applies equally to this claim.
Regarding claim 18, Cheng in view of Lu, Westcott and Voynov, discloses the apparatus of claim 16, and further discloses wherein:
the image generator comprises a diffusion model (Cheng- page 4056, section 2.2. Diffusion Models, at least discloses both the techniques of classifier-free diffusion guidance and ILVR into our diffusion-based framework of image generation for fulfilling a three-dimensional control on the synthesized images in terms of their realism and the consistency with respect to the stroke and sketch conditions; page 4057, section 3.2. Sketch- and Stroke-Guided Diffusion Model, at least discloses To generate images based on the given sketches and strokes, our proposed method concatenates the sketch condition csketch and the stroke condition cstroke along with xt as input for the U-Net model (which is responsible for posterior prediction) […] To separately control the guidance level of the sketch and stroke conditions, we leverage classifier-free guidance [9] and modify it for two-dimensional guidance. In practice, we adopt a two-stage training strategy. First, we train the model with complete sketches and strokes as conditions. Then we fine-tune the model by randomly replacing 30% of each condition with an image filled with gray pixels, denoted as ∅, for unconditional representation. During sampling, the ratio between the degree of faithfulness to strokes and sketches is controlled through the following linear combination with two guidance scales ssketch and sstroke […] With this formulation, our model supports multi-guidance on a single diffusion model).
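For illustration only, the general form of classifier-free guidance with two guidance scales, as paraphrased from the quoted Cheng passage; this is an editor's sketch of the linear combination, and the exact formulation in Cheng may differ:

    def guided_noise_prediction(eps_uncond, eps_sketch, eps_stroke,
                                s_sketch, s_stroke):
        """Combine unconditional and conditional noise predictions, with
        s_sketch and s_stroke controlling faithfulness to each condition."""
        return (eps_uncond
                + s_sketch * (eps_sketch - eps_uncond)
                + s_stroke * (eps_stroke - eps_uncond))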
Regarding claim 19, Cheng in view of Lu, Westcott and Voynov, discloses the apparatus of claim 16, and further discloses wherein:
the control network is initialized using parameters from the image generator (Cheng- page 4055, right column, 2nd paragraph, at least discloses a unified framework of adaptively-realistic image generation from stroke and sketch that encodes the condition of the given stroke and sketch with the classifier-free guidance mechanism [sketch guidance information] and adjusts the degree of realism [value of the fidelity parameter] with a latent variable refinement technique).
Regarding claim 20, Cheng in view of Lu, Westcott and Voynov, discloses the apparatus of claim 16, and discloses the apparatus further comprising:
a data preparation component configured to distort the sketch input based on the distortion level (Lu- ¶0055, at least discloses an input sketch generally refers to a sketch provided to the neural network system, or portion thereof. Input sketches used to train the image neural network may be referred to herein as training sketches or training input sketches; ¶0071-0072, at least disclose The image neural network can then be trained by evaluating differences between the reference image used to create the training sketch [sketch input] and the training intermediate image to determine any errors or discrepancies therebetween […] Adjusting the neural network to correct for errors is accomplished by changing at least one node parameter of such an image neural network. The image neural network can comprise a plurality of interconnected nodes with a parameter, or weight, associate with each node. While individual parameters do not have to be specified during training of a neural network, examples of such parameters can include edge detection, RGB color, textures of features, roughness and/or blur of a sketch. Each node receives inputs from multiple other nodes and can activate based on the combination of all these inputs, for example, when the sum of the input signals is above a threshold. The parameter can amplify or dampen the input signals. For example, a parameter could be a value between 0 and 1 → the “parameters” can include “blur of a sketch”, which would suggest the sketch input corresponding to the distortion level).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Cheng/Westcott/Voynov to incorporate the teachings of Lu, and to apply Lu's parameters including blur of a sketch to the Cheng/Westcott/Voynov teachings, so that a data preparation component is configured to distort the sketch input based on the distortion level.
The same motivation that was utilized in the rejection of claim 16 applies equally to this claim.
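For illustration only, an editor's minimal Python sketch of a data preparation component that distorts a sketch in proportion to a distortion level, here by erasing random stroke patches (blur, warping, or affine jitter would be equally plausible distortions); all names and constants are hypothetical:

    import numpy as np

    class DataPreparation:
        def __init__(self, seed=0):
            self.rng = np.random.default_rng(seed)

        def distort(self, sketch, distortion_level, patch=16):
            """Erase more random patches of strokes at higher distortion levels."""
            out = sketch.copy()
            h, w = out.shape[:2]
            for _ in range(int(20 * distortion_level)):
                y = int(self.rng.integers(0, max(1, h - patch)))
                x = int(self.rng.integers(0, max(1, w - patch)))
                out[y:y + patch, x:x + patch] = 0
            return out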
Conclusion
11. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The references are listed on the attached PTO-892 form.
12. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL LE whose telephone number is (571)272-5330. The examiner can normally be reached 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL LE/Primary Examiner, Art Unit 2614