DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed 02/12/2026 has been entered. Claims 1-20 remain pending in the application.
Response to Arguments
Applicant’s arguments have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Objections
Claim 15 is objected to because of the following informalities: “to generate a blended noise input” should read “to generate the blended noise input”. Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 8-10, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Greenen et al. (U.S. Patent Application Publication No. 2024/0221242 A1), hereinafter Greenen, in view of Babanin et al. (U.S. Patent Application Publication No. 2025/0054210 A1), hereinafter Babanin, and further in view of Whitaker (Whitaker, Jonathan. Multi-Resolution Noise for Diffusion Model Training, 28 Feb 2023 [online], [retrieved on 2026-03-02]. Retrieved from the Internet <URL: https://wandb.ai/johnowhitaker/multires_noise/reports/Multi-Resolution-Noise-for-Diffusion-Model-Training--VmlldzozNjYyOTU2>).
Regarding claim 1, Greenen teaches a method (Greenen, para 33-34: “generate content tiles for purposes such as texture tiling, in accordance with at least one embodiment…a user may decide to use texture tiling for a region of the content to be generated, such as to fill one or more large areas or regions of the virtual environment”) comprising:
receiving a digital image portraying a scene to be replicated as a digital material (Greenen, sample image to be represented in content tiles, para 42: “input indicating a type of content to be represented in a set of content tiles is received 552. This input can be received in or as a number of different forms, such as text, speech, a sample image”; see also para 23 and FIG 1-2; para 155: “input identifying a type of content for a region of a scene to be rendered”); and
generating, using a controlled diffusion neural network (Greenen, para 22: “In at least one embodiment, a diffusion network can be updated (trained) to generate content tiles that satisfy a set of rules or boundary conditions”), a plurality of material maps corresponding to the scene portrayed by the digital image (Greenen, BRDF texture channels, para 35: “a stable diffusion algorithm can be provided or modified that can enable synthesis of texture sets that are compliant with any given tiling rule(s). Such an approach can be text-prompted in some embodiments, and can also be used to generate full physically-based, bidirectional reflectance distribution function (BRDF) texture channels (e.g., albedo, normals, roughness, and metal)”) based on the spatial condition (Greenen, boundary condition).
While Greenen teaches a boundary condition that is input to the diffusion network, Greenen fails to teach generating, using a conditioning neural network, a spatial condition from the digital image. Greenen also fails to teach determining a blended noise input for generating material maps from the digital image based on noise input corresponding to a plurality of scales and generating, from the blended noise input (emphasis added).
However, Babanin teaches generating, using a conditioning neural network, a spatial condition from a digital image (Babanin, ControlNet, para 23: “Stable diffusion models can be controlled by one or more deep learning algorithms (e.g., ControlNet) based on various conditions described herein. Specifically, a stable diffusion model can generate synthesized images (e.g., personalized narrative images) based on text prompts and one or more conditions (e.g., a sketch, a pose map, a depth map, a normal map, or a canny edge) …A condition, such as a sketch, can be a control image that defines an entity's general shape and position (e.g., a person or an animal) in the image”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have utilized a conditioning neural network to generate a spatial condition from the digital image, as taught by Babanin, with the method of Greenen in order to control the diffusion model output based on the input digital image, allowing the method to produce material maps that reflect the spatial features of the input image (Babanin, see last citation).
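For illustration only (not part of the record): the kind of spatial condition cited from Babanin can be sketched with a simple gradient-magnitude edge map standing in for a learned conditioning network such as ControlNet with a canny-edge condition. The function below is a hypothetical sketch, not from any cited reference:

```python
import numpy as np

def edge_condition(image):
    """Derive a simple spatial condition (gradient-magnitude edge map)
    from a digital image; a stand-in for the learned conditioning
    networks (e.g., ControlNet with canny-edge conditions) Babanin cites."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    # Normalize to [0, 1] so the condition is independent of image scale.
    return mag / mag.max() if mag.max() > 0 else mag

img = np.zeros((8, 8))
img[:, 4:] = 1.0  # a vertical edge at column 4
cond = edge_condition(img)
```

Such a condition map, supplied alongside the text prompt, is what lets the diffusion output follow the spatial layout of the input image.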
Additionally, Whitaker teaches determining a blended noise input for a diffusion model based on noise input corresponding to a plurality of scales (Whitaker, pg. 3, last para: “combining very high frequency noise with extremely low-frequency noise (the offset)”; see blended pyramid noise versus random Gaussian noise in the image on pg. 4, pg. 4 continues: “The idea is to create noise at different resolutions and stack them, optionally scaling down the lower-resolution noise according to some factor”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the blended noise input of Whitaker with the diffusion neural network of Greenen in view of Babanin in order to improve the results of the model, specifically for very light and dark generated images/materials (Whitaker, pg. 2: “degrade all aspects of the image signal using multi-resolution noise, resulting in a model that can generate more diverse images than regular stable diffusion, including extremely light or dark images which have historically been hard to achieve without resorting to using a large number of sampling steps”). Since Whitaker teaches the use of the blended noise input for diffusion models, the combination of Greenen in view of Babanin and Whitaker teaches generating, from the blended noise input and using a controlled diffusion neural network.
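For illustration only (not part of the record): the multi-resolution (pyramid) noise Whitaker describes can be sketched as below. The number of levels, the discount factor, and the nearest-neighbor upsampling are illustrative assumptions rather than details taken from the reference:

```python
import numpy as np

def pyramid_noise(height, width, levels=4, discount=0.8, rng=None):
    """Blend Gaussian noise generated at several resolutions, a sketch of
    the multi-resolution 'pyramid' noise Whitaker describes: create noise
    at different scales, stack it, and scale down the coarser levels."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((height, width))
    for level in range(1, levels):
        h, w = max(1, height >> level), max(1, width >> level)
        low = rng.standard_normal((h, w))
        # Upsample the low-resolution noise by nearest-neighbor repetition,
        # crop to the target size, and add it with a decaying weight.
        up = np.repeat(np.repeat(low, height // h + 1, axis=0),
                       width // w + 1, axis=1)[:height, :width]
        noise += (discount ** level) * up
    # Renormalize so the blended input keeps unit variance overall.
    return noise / noise.std()

blended = pyramid_noise(64, 64, rng=np.random.default_rng(0))
```

In practice a smoother interpolation would likely replace the nearest-neighbor upsampling; the structure (per-scale noise, decaying weights, final renormalization) is the point of the sketch.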
Regarding claim 8 (dependent on claim 1), Greenen in view of Babanin and Whitaker teaches wherein generating the plurality of material maps based on the spatial condition using the controlled diffusion neural network comprises generating the plurality of material maps based on the spatial condition using the controlled diffusion neural network over a plurality of denoising steps (Greenen, para 38: “In a diffusion-based approach where there is stable diffusion, hundreds of iterations or more can be performed to generate a single textured layout. This can start by passing in instances of noise, or noisy priors, which can be at least slightly denoised for each pass or iteration. After a number of iterations, a high-quality texture, or textured layout, with low noise can be obtained that satisfies all relevant boundary conditions.”).
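For illustration only (not part of the record): the iterative denoising Greenen describes in para 38 can be sketched as a loop in which each pass slightly denoises the input. The stand-in step function below merely attenuates the sample; a trained network would instead predict the noise to remove:

```python
import numpy as np

def denoise_iteratively(x_noisy, denoise_step, num_steps=50):
    """Run a denoising loop: each pass slightly denoises the input, as in
    the iterative diffusion sampling Greenen describes."""
    x = x_noisy
    for step in reversed(range(num_steps)):
        x = denoise_step(x, step)
    return x

# Illustrative stand-in for a trained network: pull the sample toward
# zero a little each step (a real model predicts the noise to remove).
def toy_step(x, step):
    return 0.9 * x

rng = np.random.default_rng(0)
start = rng.standard_normal((8, 8))
result = denoise_iteratively(start, toy_step, num_steps=50)
```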
Regarding claim 9 (dependent on claim 1), Greenen in view of Babanin and Whitaker teaches wherein generating the plurality of material maps corresponding to the scene portrayed by the digital image comprises generating a plurality of spatially varying bidirectional reflectance distribution function maps corresponding to the scene portrayed by the digital image (Greenen, para 35: “generate full physically-based, bidirectional reflectance distribution function (BRDF) texture channels (e.g., albedo, normals, roughness, and metal)”).
Regarding claim 10, Greenen teaches a non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations (Greenen, para 223: “code (e.g., executable code or source code) is stored on a set of one or more non-transitory computer-readable storage media having stored thereon executable instructions (or other memory to store executable instructions) that, when executed (i.e., as a result of being executed) by one or more processors of a computer system, cause computer system to perform operations described herein”). All further claim limitations are met and rendered obvious by Greenen in view of Babanin and Whitaker because the operations performed by the processing device in claim 10 are the same as the method steps in claim 1.
Regarding claim 13 (dependent on claim 10), Greenen in view of Babanin and Whitaker teaches wherein generating, from the blended noise input (Taught in combination with Whitaker, see details in claim 1 rejection) and using the controlled diffusion neural network, the plurality of material maps corresponding to the scene portrayed by the digital image based on the spatial condition comprises generating the plurality of material maps from the blended noise input and using the controlled diffusion neural network over a plurality of diffusion steps based on the spatial condition (Greenen, para 38: “In a diffusion-based approach where there is stable diffusion, hundreds of iterations or more can be performed to generate a single textured layout. This can start by passing in instances of noise, or noisy priors, which can be at least slightly denoised for each pass or iteration. After a number of iterations, a high-quality texture, or textured layout, with low noise can be obtained that satisfies all relevant boundary conditions.”).
Claims 2 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, and Rombach et al. (attached with IDS, Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10684-10695).), hereinafter Rombach.
Regarding claim 2 (dependent on claim 1), Greenen in view of Babanin and Whitaker teaches wherein generating, using the controlled diffusion neural network, the plurality of material maps based on the spatial condition comprises generating, using the controlled diffusion neural network, the plurality of material maps based on the spatial condition and a noised input (Greenen, para 22: "The diffusion network can receive a noisy image (e.g., a noisy prior image) as input"; 422 in FIG 4). However, Greenen fails to explicitly teach determining, from a noise distribution, a plurality of noised latent tensors corresponding to the plurality of scales (emphasis added).
Rombach teaches a latent diffusion model (Rombach, abstract) comprising determining, from a noise distribution, a plurality of noised latent tensors (Rombach, noised inputs that are iteratively denoised by the model, pg. 10686, section 3.1: “encoder E encodes x into a latent representation z”; zt in FIG 3 on pg. 10687). The diffusion network of Greenen takes latent code as input (Greenen, para 40), but fails to explicitly teach the noised latent tensor. Rombach teaches the known technique of utilizing a latent diffusion model and inputting a noised latent tensor to the diffusion neural network. A person having ordinary skill in the art, before the effective filing date of the claimed invention, could have applied the known technique, as taught by Rombach, in the same way to the method of Greenen in view of Babanin and Whitaker and achieved predictable results of a more computationally efficient model by utilizing low-dimensional latent space. Thus, the combination of Greenen in view of Babanin, Whitaker, and Rombach teaches a plurality of noised latent tensors corresponding to the plurality of scales and generating based on the plurality of noised latent tensors (Latent tensors corresponding to noised images at a plurality of scales; for example, for the plurality of noised images on pg. 2 of Whitaker).
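For illustration only (not part of the record): combining Rombach's noised latent z_t with a plurality of scales as in Whitaker can be sketched as below, using the standard forward-diffusion relation z_t = sqrt(alpha_bar)·z + sqrt(1 − alpha_bar)·eps. The block-average downsampling and the particular alpha_bar value are illustrative assumptions:

```python
import numpy as np

def noised_latents_at_scales(latent, alpha_bar, scales=(1, 2, 4), rng=None):
    """Produce noised latent tensors at several spatial scales:
    z_t = sqrt(alpha_bar) * z + sqrt(1 - alpha_bar) * eps, with the latent
    block-averaged down to each coarser scale before noising."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for s in scales:
        # Downsample the latent by block-averaging to the coarser scale.
        h, w = latent.shape[0] // s, latent.shape[1] // s
        z = latent[:h * s, :w * s].reshape(h, s, w, s).mean(axis=(1, 3))
        eps = rng.standard_normal(z.shape)
        out.append(np.sqrt(alpha_bar) * z + np.sqrt(1.0 - alpha_bar) * eps)
    return out

latents = noised_latents_at_scales(np.zeros((16, 16)), alpha_bar=0.5,
                                   rng=np.random.default_rng(0))
```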
Regarding claim 17, Greenen teaches a system comprising:
one or more memory components (Greenen, para 223: “non-transitory computer-readable storage media having stored thereon executable instructions (or other memory to store executable instructions)”); and
one or more processing devices coupled to the one or more memory components, the one or more processing devices to perform operations (Greenen, para 223: “stored thereon executable instructions (or other memory to store executable instructions) that, when executed (i.e., as a result of being executed) by one or more processors of a computer system, cause computer system to perform operations described herein”) comprising:
receiving a digital image portraying a scene to be replicated as a digital material (Greenen, sample image to be represented in content tiles, para 42: “input indicating a type of content to be represented in a set of content tiles is received 552. This input can be received in or as a number of different forms, such as text, speech, a sample image”; see also para 23 and FIG 1-2; para 155: “input identifying a type of content for a region of a scene to be rendered”);
providing the spatial condition to a controlled diffusion neural network (Greenen, boundary condition, para 22: “In at least one embodiment, a diffusion network can be updated (trained) to generate content tiles that satisfy a set of rules or boundary conditions”);
determining a noised input for the controlled diffusion neural network (Greenen, para 22: "The diffusion network can receive a noisy image (e.g., a noisy prior image) as input"; 422 in FIG 4);
using the controlled diffusion neural network over a plurality of diffusion steps (Greenen, para 38: “In a diffusion-based approach where there is stable diffusion, hundreds of iterations or more can be performed to generate a single textured layout. This can start by passing in instances of noise, or noisy priors, which can be at least slightly denoised for each pass or iteration. After a number of iterations, a high-quality texture, or textured layout, with low noise can be obtained that satisfies all relevant boundary conditions.”); and
generating a plurality of material maps corresponding to the scene portrayed by the digital image (Greenen, BRDF texture channels, para 35: “a stable diffusion algorithm can be provided or modified that can enable synthesis of texture sets that are compliant with any given tiling rule(s). Such an approach can be text-prompted in some embodiments, and can also be used to generate full physically-based, bidirectional reflectance distribution function (BRDF) texture channels (e.g., albedo, normals, roughness, and metal)”).
While Greenen teaches a boundary condition that is input to the diffusion network, Greenen fails to teach generating, using a conditioning neural network, a spatial condition from the digital image. Further, Greenen fails to explicitly teach determining a noised latent tensor at a first scale for the controlled diffusion neural network from a noise distribution; determining a blended noise input from the noised latent tensor at the first scale and an additional noised latent tensor at a second scale; generating, using the controlled diffusion neural network, a denoised latent tensor based on the spatial condition and the blended noise input; and generating, using a decoder and from the denoised latent tensor, a plurality of material maps corresponding to the scene portrayed by the digital image (emphasis added).
However, Babanin teaches generating, using a conditioning neural network, a spatial condition from a digital image (Babanin, ControlNet, para 23: “Stable diffusion models can be controlled by one or more deep learning algorithms (e.g., ControlNet) based on various conditions described herein. Specifically, a stable diffusion model can generate synthesized images (e.g., personalized narrative images) based on text prompts and one or more conditions (e.g., a sketch, a pose map, a depth map, a normal map, or a canny edge) …A condition, such as a sketch, can be a control image that defines an entity's general shape and position (e.g., a person or an animal) in the image”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have utilized a conditioning neural network to generate a spatial condition from the digital image, as taught by Babanin, with the system of Greenen in order to control the diffusion model output based on the input digital image, allowing the system to produce material maps that reflect the spatial features of the input image (Babanin, see last citation).
Additionally, Rombach teaches a latent diffusion model (Rombach, abstract) comprising determining a noised latent tensor for the diffusion neural network from a noise distribution (Rombach, pg. 10686, section 3.1: “encoder E encodes x into a latent representation z”; zt in FIG 3 on pg. 10687); generating, using the diffusion neural network, a denoised latent tensor based on the spatial condition and the noised input (Rombach, denoised latent tensor, see the denoising step of zt in FIG 3, pg. 10687, section 3.2: “predict a denoised variant of their input xt, where xt is a noisy version of the input x”; condition input, pg. 10687, section 3.3: “conditional denoising autoencoder …and paves the way to controlling the synthesis process through inputs y such as text [66], semantic maps [32, 59] or other image-to-image translation tasks”; Further, in combination with the teachings of Greenen in view of Babanin, the generated denoised latent tensor is based on the spatial condition due to the spatial condition’s influence on the performance of the controlled diffusion neural network.); and generating, using a decoder and from the denoised latent tensor, an output (Rombach, see output in the pixel space in FIG 3, pg. 10686, section 3.1: “decoder”). The diffusion network of Greenen takes latent code as input (Greenen, para 40), but fails to explicitly teach the noised and denoised latent tensors. Rombach teaches the known technique of utilizing a latent diffusion model and noised/denoised latent tensors before decoding an output in the pixel space. A person having ordinary skill in the art, before the effective filing date of the claimed invention, could have applied the known technique, as taught by Rombach, in the same way to the system of Greenen and achieved predictable results of a more computationally efficient model by utilizing low-dimensional latent space.
Lastly, Whitaker teaches determining a blended noise input for a diffusion model based on noise input corresponding to a plurality of scales (Whitaker, pg. 3, last para: “combining very high frequency noise with extremely low-frequency noise (the offset)”; see blended pyramid noise versus random Gaussian noise in the image on pg. 4, pg. 4 continues: “The idea is to create noise at different resolutions and stack them, optionally scaling down the lower-resolution noise according to some factor”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the blended noise input of Whitaker with the diffusion neural network of Greenen in view of Babanin and Rombach in order to improve the results of the model, specifically for very light and dark generated images/materials (Whitaker, pg. 2: “degrade all aspects of the image signal using multi-resolution noise, resulting in a model that can generate more diverse images than regular stable diffusion, including extremely light or dark images which have historically been hard to achieve without resorting to using a large number of sampling steps”). Since Whitaker teaches the use of the blended noise input for diffusion models, the combination of Greenen in view of Babanin, Whitaker, and Rombach teaches generating a denoised latent tensor based on the blended noise input and using a controlled diffusion neural network. Greenen in view of Babanin, Whitaker, and Rombach also teaches a noised latent tensor at a first scale (Latent tensor corresponding to noised images at one of a plurality of scales; for example, for the middle noised image on pg. 2 of Whitaker); an additional noised latent tensor at a second scale (Latent tensor corresponding to noised images at one of a plurality of scales; for example, for the rightmost noised image on pg. 2 of Whitaker); thus teaching determining a blended noise input from the noised latent tensor at a first scale and an additional noised latent tensor at a second scale (Pyramid noise on pg. 4 of Whitaker, in combination with the noised latent tensors taught by Rombach above).
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, Rombach, Ding et al. (Ding, Z., Zhang, M., Wu, J., & Tu, Z. (2023). Patched Denoising Diffusion Models For High-Resolution Image Synthesis. arXiv preprint arXiv:2308.01316.), hereinafter Ding, and Kathuria (Kathuria, Ayoosh. Data Augmentation for Bounding Boxes: Scaling and Translation. Paperspace, 2018 [online], [retrieved on 2025-11-10]. Retrieved from the Internet <URL: https://web.archive.org/web/20230713235510/https://blog.paperspace.com/data-augmentation-bounding-boxes-scaling-translation/>).
Regarding claim 3 (dependent on claim 2), Greenen in view of Babanin, Whitaker, and Rombach teaches generating the plurality of noised latent tensors and generating the plurality of material maps based on the spatial condition and the plurality of noised latent tensors (See claim 2 rejection), but fails to teach further comprising generating a rolled noised latent tensor by translating at least one noised latent tensor using a translation factor, wherein generating, using the controlled diffusion neural network, the plurality of material maps based on the spatial condition and the plurality of noised latent tensors comprises generating, using the controlled diffusion neural network, the plurality of material maps based on the spatial condition and the plurality of noised latent tensors including the rolled noised latent tensor.
However, Ding teaches a similar method (Ding, pg. 1, abstract: “denoising diffusion model”) further comprising generating a rolled noised latent tensor by translating at least one noised latent tensor (Ding, pg. 4, FIG 3 - see the shifted noised image patches in the feature space, leftmost patches outlined in yellow; the translated noised patches result in translated noise in the latent feature space; last paragraph on pg. 2: “partial features of neighboring patches are cropped”), wherein generating, using the diffusion neural network, an output based on the noised latent tensor comprises generating, using the diffusion neural network, an output based on the rolled noised latent tensor (Ding, pg. 4, FIG 3: see final decoded image at the far right). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the rolled noised latent tensor, taught by Ding, with the method of Greenen in view of Babanin, Whitaker, and Rombach in order to improve the combination of the tiles, based on the material maps, by reducing artifacts at the image border (Ding, pg. 1, abstract: “Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space”). Since Greenen in view of Babanin, Whitaker, and Rombach teach generating based on the plurality of noised latent tensors (See claim 2 rejection), Greenen in view of Babanin, Whitaker, Rombach, and Ding teach generating based on the plurality of noised latent tensors including the rolled noised latent tensor.
[Attached image: media_image1.png (greyscale, 447 × 396)]
Yet, Ding fails to teach generating a rolled noised latent tensor by translating the noised latent tensor using a translation factor (emphasis added). However, Kathuria teaches a method for translating image data using a translation factor (Kathuria, translate factor, Translate section on pg. 5: “In addition to making sure that the translate factor is not less than -1, you should also make sure it isn't greater than 1, or else you're just gonna get a black image since whole of the image will be shifted.”; see the attached photo below demonstrating the method).
[Attached image: media_image2.png (greyscale, 294 × 764)]
Ding and Kathuria each disclose a method for translating an image. A person of ordinary skill in the art, before the effective filing date of the claimed invention, would have recognized that the translate method taught by Kathuria could have been substituted for the method of translating the image data to obtain patches taught by Ding, because both serve the purpose of obtaining a shifted image displaying a portion of the original image. Furthermore, a person of ordinary skill in the art would have been able to carry out the substitution. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the translate method of Kathuria for the translate method of Ding, in the combination of Greenen in view of Babanin, Whitaker, Rombach, and Ding, according to known methods to yield the predictable result of obtaining image patches for use in a diffusion network, improving the performance of the model. According to both attached images, a person of ordinary skill in the art would recognize that the method of Kathuria would produce the image patches in the method of Ding.
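For illustration only (not part of the record): a rolled noised latent tensor obtained by translating with a fractional translation factor, in Kathuria's convention of a factor in (-1, 1), can be sketched as below. The circular wrap-around is an illustrative choice suited to tileable content, not a detail taken from either reference:

```python
import numpy as np

def roll_latent(latent, translate_factor):
    """Translate (roll) a noised latent tensor by a fractional translation
    factor in (-1, 1), following Kathuria's translate-factor convention;
    the circular shift preserves every latent value."""
    if not -1.0 < translate_factor < 1.0:
        raise ValueError("translate factor must lie in (-1, 1)")
    shift_rows = int(translate_factor * latent.shape[0])
    shift_cols = int(translate_factor * latent.shape[1])
    return np.roll(latent, shift=(shift_rows, shift_cols), axis=(0, 1))

z = np.arange(16.0).reshape(4, 4)
rolled = roll_latent(z, 0.5)  # shift by half the tensor extent
```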
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, Rombach, Ding, Kathuria, and Zhang et al. (attached with IDS, Zhang, L., Rao, A., & Agrawala, M. (2023). Adding Conditional Control to Text-to-Image Diffusion Models. arXiv preprint arXiv:2302.05543.), hereinafter Zhang.
Regarding claim 4 (dependent on claim 3), Greenen in view of Babanin, Whitaker, Rombach, Ding, and Kathuria teaches generating the rolled noised latent tensor and generating the plurality of material maps based on the spatial condition and the plurality of noised latent tensors including the rolled noised latent tensor (See claim 3 rejection), but fails to teach wherein: generating, using the conditioning neural network, the spatial condition from the digital image comprises generating, using the conditioning neural network, the spatial condition from the digital image and a binary mask that masks a portion of the scene portrayed by the digital image; and generating, using the controlled diffusion neural network, the plurality of material maps based on the spatial condition and the plurality of noised latent tensors including the rolled noised latent tensor comprises generating the plurality of material maps based on the spatial condition and the plurality of noised latent tensors including the rolled noised latent tensor by using the controlled diffusion neural network to generate content for the portion of the scene masked by the binary mask via inpainting (emphasis added).
However, Zhang teaches the generation of a spatial condition from a digital image and a binary mask that masks a portion of the scene portrayed by the digital image; and using the controlled diffusion neural network to generate content for the portion of the scene masked by the binary mask via inpainting (Zhang, pg. 2: “This paper presents ControlNet, an end-to-end neural network architecture that controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions… We train several ControlNets with various datasets of different conditions, e.g., Canny edges”; FIG 16 on pg. 21: “Masked Diffusion. By diffusing images in masked areas, the Canny-edge model can be used to support pen-based editing of image contents”).
The use of a binary mask and digital image with the conditioning neural network can be utilized to guide the inpainting of a specific region, similar to the generation of the border in the top left content tile in Figure 2 of Greenen (Greenen, para 28: “In this example, each color or shade of an edge of a content tile 200, or of a region proximate an edge of a content tile 220, can be associated with a given boundary condition of a set of boundary conditions”). Greenen teaches the use of a boundary condition and digital image with the conditioning neural network to guide the content tile generation (Greenen, para 28: “In this example, each color or shade of an edge of a content tile 200, or of a region proximate an edge of a content tile 220, can be associated with a given boundary condition of a set of boundary conditions”). Thus, Greenen and Zhang each disclose a method for controlling a diffusion network. A person of ordinary skill in the art, before the effective filing date of the claimed invention, would have recognized that the boundary condition of Greenen could have been substituted for the binary mask input to a conditioning neural network, taught by Zhang, because both serve the purpose of controlling the inpainting location by the diffusion network. Furthermore, a person of ordinary skill in the art would have been able to carry out the substitution. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the boundary condition input to the diffusion network, taught by Greenen in the combination of Greenen in view of Babanin, Whitaker, Rombach, Ding, and Kathuria, for the binary mask of Zhang according to known methods to yield the predictable result of an improved tileable digital material by masking the border of the input image.
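For illustration only (not part of the record): the masked-diffusion editing cited from Zhang can be sketched as regenerating only the masked region while preserving the rest, i.e., blending each diffusion output with the source image through the binary mask. The function and values below are a hypothetical sketch:

```python
import numpy as np

def masked_inpaint_step(generated, original, mask):
    """Blend a diffusion output with the source image using a binary mask:
    generated content is kept where mask == 1, the original elsewhere,
    a sketch of the masked-diffusion editing Zhang describes."""
    return mask * generated + (1 - mask) * original

original = np.ones((4, 4))
generated = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1  # mask a central 2x2 region to be inpainted
result = masked_inpaint_step(generated, original, mask)
```

Masking the border of the input image in this way confines the generated content to the border region, consistent with the tileability rationale above.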
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, and Sandow et al. (U.S. Patent Application Publication No. 2024/0420405 A1), hereinafter Sandow.
Regarding claim 5 (dependent on claim 1), Greenen in view of Babanin and Whitaker teaches generating the plurality of material maps (Greenen, para 35: “BRDF”) and generating a graphical element that includes the digital material by tiling the digital material across a graphical element (Greenen, para 29: “Once a set of content tiles 240 is obtained, that set can be used to fill a region of a virtual (graphical) environment”; para 33: “generate content tiles for purposes such as texture tiling…These assets can include characters, objects, models, animations, or other such aspects or features.”); however, Greenen fails to explicitly teach wherein the plurality of material maps are used to create the tile of the digital material.
However, Sandow teaches a similar method (Sandow, abstract) wherein the operations further comprise generating a graphical element that includes the digital material by using the plurality of material maps to create a tile of the digital material and tiling the digital material across the graphical element (Sandow, para 77: “FIG. 2B depicts an example of a texture stack 200B. As described above, texture stacks may be a series of texture maps that represent the respective characteristics of a given PBR material. These texture maps may be combined together in three-dimensional renderers with each texture map assigned to its relevant material shader to provide accurate data for each shader's physical interaction with lights within a given three-dimensional space”; para 53: “A texture map may be a two-dimensional image of a surface that may be used to cover three-dimensional objects”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the method steps taught by Sandow, wherein each texture map is applied to the graphical element, with the steps taught by Greenen, in Greenen in view of Babanin and Whitaker, in order to apply each type of texture appropriately in the rendering, ensuring that each type interacts properly with light (Sandow, see last citation).
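For illustration only (not part of the record): tiling a digital material built from its material maps across a graphical element can be sketched as tiling each texture map of the stack independently. The map names below are illustrative placeholders:

```python
import numpy as np

def tile_material(material_maps, reps):
    """Tile each material map (e.g., albedo, normals, roughness, metal)
    across a larger region, as in the texture tiling Greenen and the
    texture-stack application Sandow describe; map names are illustrative."""
    return {name: np.tile(tex, reps) for name, tex in material_maps.items()}

maps = {"albedo": np.ones((8, 8)), "roughness": np.zeros((8, 8))}
tiled = tile_material(maps, reps=(4, 4))
```

Keeping the maps separate through the tiling step is what allows a renderer to assign each map to its corresponding material shader, per the Sandow citation above.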
Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, Rombach, and Wang et al. (Wang, J., Yue, Z., Zhou, S., Chan, K. C., & Loy, C. C. (2023). Exploiting Diffusion Prior for Real-World Image Super-Resolution. arXiv preprint arXiv:2305.07015.), hereinafter Wang.
Regarding claim 7 (dependent on claim 1), Greenen in view of Babanin and Whitaker teaches generating, from the blended noise input and using the controlled diffusion neural network, an output based on the spatial condition (See claim 1 rejection), but fails to explicitly teach wherein generating, using the controlled diffusion neural network and based on the spatial condition, the plurality of material maps corresponding to the scene portrayed by the digital image comprises: generating a denoised latent tensor; and generating the plurality of material maps based on the denoised latent tensor by using a decoder to decode overlapping patches of the denoised latent tensor (emphasis added).
However, Rombach teaches a latent diffusion model (Rombach, abstract) comprising generating, using the controlled diffusion neural network, a denoised latent tensor (Rombach, denoised latent tensor, see the denoising step of zt in FIG 3, pg. 10687, section 3.2: “predict a denoised variant of their input xt, where xt is a noisy version of the input x”; condition input, pg. 10687, section 3.3: “conditional denoising autoencoder …and paves the way to controlling the synthesis process through inputs y such as text [66], semantic maps [32, 59] or other image-to-image translation tasks”). The diffusion network of Greenen takes latent code as input (Greenen, para 40), but fails to explicitly teach the noised and denoised latent tensors. Rombach teaches the known technique of utilizing a latent diffusion model and noised/denoised latent tensors before decoding an output in the pixel space. A person having ordinary skill in the art, before the effective filing date of the claimed invention, could have applied the known technique, as taught by Rombach, in the same way to the method of Greenen and achieved predictable results of a more computationally efficient model by utilizing low-dimensional latent space. In combination with the teachings of Greenen in view of Babanin and Whitaker, the generated denoised latent tensor is based on the spatial condition due to the spatial condition’s influence on the performance of the controlled diffusion neural network, and from the blended noise input taught by Whitaker.
Additionally, Wang teaches generating a diffusion model output based on a denoised latent tensor by using a decoder (Wang, pg. 5, 2nd paragraph in the left column: “Since Stable Diffusion is implemented in the latent space of an autoencoder, it is natural to leverage the encoder features of the autoencoder to modulate the corresponding decoder features”; see FIG 2 on pg. 3) to decode overlapping patches of the denoised latent tensor (Wang, pg. 2, 1st paragraph in the right column: “Our approach involves dividing the image into overlapping patches and fusing these patches using a Gaussian kernel at each diffusion iteration”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the above teachings of Wang with the method of Greenen in view of Babanin, Whitaker, and Rombach in order to result in a more coherent output (Wang, pg. 2, 1st paragraph in the right column: “This process smooths out the boundaries, resulting in a more coherent output”).
Regarding claim 16 (dependent on claim 10), Greenen in view of Babanin and Whitaker teaches generating, from the blended noise input and using the controlled diffusion neural network, the plurality of material maps based on the spatial condition, but fails to explicitly teach comprises: generating a denoised latent tensor from the blended noise input using the controlled diffusion neural network and based on the spatial condition; and generating the plurality of material maps from the denoised latent tensor by using a decoder to: decode overlapping patches of the denoised latent tensor; and blend the overlapping patches that have been decoded using truncated Gaussian weights (emphasis added).
However, Rombach teaches a latent diffusion model (Rombach, abstract) comprising generating, using the controlled diffusion neural network, a denoised latent tensor (Rombach, denoised latent tensor, see the denoising step of zt in FIG 3, pg. 10687, section 3.2: “predict a denoised variant of their input xt, where xt is a noisy version of the input x”; condition input, pg. 10687, section 3.3: “conditional denoising autoencoder …and paves the way to controlling the synthesis process through inputs y such as text [66], semantic maps [32, 59] or other image-to-image translation tasks”). The diffusion network of Greenen takes latent code as input (Greenen, para 40), but fails to explicitly teach the noised and denoised latent tensors. Rombach teaches the known technique of utilizing a latent diffusion model and noised/denoised latent tensors before decoding an output in the pixel space. A person having ordinary skill in the art, before the effective filing date of the claimed invention, could have applied the known technique, as taught by Rombach, in the same way to the operation steps of Greenen and achieved predictable results of a more computationally efficient model by utilizing low-dimensional latent space. In combination with the teachings of Greenen in view of Babanin and Whitaker, the generated denoised latent tensor is based on the spatial condition due to the spatial condition’s influence on the performance of the controlled diffusion neural network, and generated from the blended noise input taught by Whitaker.
Additionally, Wang teaches generating a diffusion model output from a denoised latent tensor by using a decoder (Wang, pg. 5, 2nd paragraph in the left column: “Since Stable Diffusion is implemented in the latent space of an autoencoder, it is natural to leverage the encoder features of the autoencoder to modulate the corresponding decoder features”; see FIG 2 on pg. 3) to: decode overlapping patches of the denoised latent tensor (Wang, pg. 2, 1st paragraph in the right column: “Our approach involves dividing the image into overlapping patches and fusing these patches using a Gaussian kernel at each diffusion iteration”); and blend the overlapping patches that have been decoded using truncated Gaussian weights (Wang, pg. 5, 1st paragraph in the right column: “To integrate overlapping patches, a weight map of size 64 × 64 is generated for each patch using a centered Gaussian kernel. Overlapping pixels are then weighted in accordance with their respective Gaussian weight maps”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the above teachings of Wang with the operation steps of Greenen in the combination of Greenen in view of Babanin, Whitaker, and Rombach in order to result in a more coherent output (Wang, pg. 2, 1st paragraph in the right column: “This process smooths out the boundaries, resulting in a more coherent output”).
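For illustration only, the patch-fusion technique cited from Wang above can be sketched as follows. This is a minimal sketch assuming numpy; the function names, patch size, and kernel width are illustrative assumptions by the examiner and are not drawn verbatim from Wang.

```python
import numpy as np

def gaussian_weight_map(size=64, sigma=16.0):
    # Centered 2D Gaussian kernel used as a per-patch weight map,
    # in the spirit of Wang's patch-fusion scheme (sigma is illustrative).
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    return np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))

def blend_overlapping_patches(patches, coords, out_shape, patch_size=64):
    # Accumulate Gaussian-weighted patches and normalize by the total
    # weight, so overlapping pixels are averaged according to their
    # respective weight maps, smoothing the patch boundaries.
    out = np.zeros(out_shape)
    weight = np.zeros(out_shape)
    w = gaussian_weight_map(patch_size)
    for patch, (y, x) in zip(patches, coords):
        out[y:y + patch_size, x:x + patch_size] += patch * w
        weight[y:y + patch_size, x:x + patch_size] += w
    return out / np.maximum(weight, 1e-8)
```

In this sketch, two overlapping constant-valued patches blend to the same constant value in the overlap region, consistent with the weighted averaging Wang describes.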
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, and Sandow.
Regarding claim 11 (dependent on claim 10), Greenen in view of Babanin and Whitaker teaches generating the plurality of material maps (Greenen, para 35: “BRDF”) and applying the digital material to a surface of a three-dimensional model (Greenen, para 29: “Once a set of content tiles 240 is obtained, that set can be used to fill a region of a virtual (graphical) environment”; para 33: “generate content tiles for purposes such as texture tiling…These assets can include characters, objects, models, animations, or other such aspects or features.”; three-dimensional, para 44: “This may include not only two-dimensional tiles, but potentially three-dimensional volumes or four-dimensional objects that may change over time, as may relate to animation or changes in appearance or behavior over time”); however, Greenen fails to explicitly teach wherein the plurality of material maps are used to apply the digital material.
However, Sandow teaches a similar method (Sandow, abstract) wherein the operations further comprise using the plurality of material maps to generate a three-dimensional model by using the plurality of material maps to apply the digital material to a surface of the three-dimensional model (Sandow, para 77: “FIG. 2B depicts an example of a texture stack 200B. As described above, texture stacks may be a series of texture maps that represent the respective characteristics of a given PBR material. These texture maps may be combined together in three-dimensional renderers with each texture map assigned to its relevant material shader to provide accurate data for each shader's physical interaction with lights within a given three-dimensional space”; para 53: “A texture map may be a two-dimensional image of a surface that may be used to cover three-dimensional objects”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the method steps taught by Sandow, wherein each texture map is applied to the rendered 3D space, with the steps taught by Greenen in view of Babanin and Whitaker in order to apply each type of texture appropriately in the rendering, ensuring that each type interacts properly with light (Sandow, see last citation).
Regarding claim 12 (dependent on claim 11), Greenen in view of Babanin, Whitaker, and Sandow teaches wherein using the plurality of material maps to apply the digital material to the surface of the three-dimensional model comprises: generating a tileable digital material from the plurality of material maps (Sandow, para 73: “At operation 250A, the electronic device may post-process one or more images. Such post-processing may create a seamlessly tileable digital material; a modular, object-based digital material; and so on”; see FIG 2A and para 71 describing the texture stack); and repeating the tileable digital material across the surface of the three-dimensional model (Greenen, para 70: “Such components can be used to generate content tiles that can be placed so as to satisfy one or more boundary conditions while being used repeatedly across a region to fill that region with a type of content (e.g., texture)”).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, Ding, and Kathuria.
Regarding claim 14 (dependent on claim 13), Greenen in view of Babanin and Whitaker teaches generating the plurality of material maps from the blended noise input using the controlled diffusion neural network and based on the spatial condition over the plurality of diffusion steps (See claim 13 rejection), but fails to teach determining, before each diffusion step, a rolled noised latent tensor by translating the blended noised latent tensor using a translation factor; generating, for each diffusion step, a rolled latent tensor from the rolled noised latent tensor and based on the spatial condition using the controlled diffusion neural network; and generating, after each diffusion step, an unrolled latent tensor by unrolling the rolled latent tensor.
However, Ding teaches a similar method (Ding, pg. 1, abstract: “denoising diffusion model”) further determining, before each diffusion step, a rolled noised latent tensor by translating a noise input (Ding, pg. 4, FIG 3 - see the shifted noised image patches in the feature space, leftmost patches outlined in yellow; the translated noised patches result in translated noise in the latent feature space; last paragraph on pg. 2: “partial features of neighboring patches are cropped”; pg. 5, Inference section: “latent diffusion model”); generating, for each diffusion step, a rolled latent tensor from the rolled noised latent tensor and generating, after each diffusion step, an unrolled latent tensor by unrolling the rolled latent tensor (Ding, pg. 4, FIG 3: see patches combined in the feature collage; pg. 4, last paragraph before section 5: “Before a feature map goes through the decoder, a split and collage operation is applied to it. Thus, the decoder outputs the predicted noise of the shifted patch”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the rolled noised latent tensor, taught by Ding, with the blended noise input and operation steps of Greenen in view of Babanin and Whitaker in order to improve the combination of the material maps when used as tiles by reducing artifacts at the image border (Ding, pg. 1, abstract: “Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space”). In combination with the teachings of Greenen in view of Babanin and Whitaker, the generated rolled latent tensor is based on the spatial condition due to the spatial condition’s influence on the performance of the controlled diffusion neural network and performed using the blended noise input of Whitaker.
Yet, Ding fails to explicitly teach generating a rolled noised latent tensor by translating the noised latent tensor using a translation factor (emphasis added). However, Kathuria teaches a method for translating image data using a translation factor (Kathuria, translate factor, Translate section on pg. 5: “In addition to making sure that the translate factor is not less than -1, you should also make sure it isn't greater than 1, or else you're just gonna get a black image since whole of the image will be shifted.”; see the attached photo below demonstrating the method).
Ding and Kathuria each disclose a method for translating an image. A person of ordinary skill in the art, before the effective filing date of the claimed invention, would have recognized that the translate method taught by Kathuria could have been substituted for the method of translating the image data to obtain patches, taught by Ding, because both serve the purpose of obtaining a shifted image displaying a portion of the original image. Furthermore, a person of ordinary skill in the art would have been able to carry out the substitution. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the translate method of Kathuria for the translate method of Ding, in the combination of Greenen in view of Babanin, Whitaker, and Ding, according to known methods to yield the predictable result of obtaining image patches for use in a diffusion network, improving the performance of the model. According to both attached images, a person of ordinary skill in the art would recognize that the method of Kathuria would produce the image patches in the method of Ding.
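For illustration only, the rolling and unrolling of a latent tensor by a translation factor, in the spirit of the combined Ding and Kathuria teachings cited above, can be sketched as follows. This is a minimal sketch assuming numpy and a circular shift (np.roll); the function names and the wrap-around behavior are illustrative assumptions by the examiner and are not drawn verbatim from the references.

```python
import numpy as np

def roll_latent(latent, translate_factor):
    # Translate (circularly shift) a latent tensor by a fraction of its
    # spatial size, per a translation factor in (-1, 1) as in Kathuria.
    h, w = latent.shape[-2:]
    dy = int(round(translate_factor * h))
    dx = int(round(translate_factor * w))
    return np.roll(latent, shift=(dy, dx), axis=(-2, -1))

def unroll_latent(latent, translate_factor):
    # Inverse translation restores the original spatial alignment after
    # the diffusion step has been applied to the rolled tensor.
    h, w = latent.shape[-2:]
    dy = int(round(translate_factor * h))
    dx = int(round(translate_factor * w))
    return np.roll(latent, shift=(-dy, -dx), axis=(-2, -1))
```

In this sketch, unrolling a rolled tensor with the same translation factor recovers the original tensor exactly, consistent with applying and then undoing a shift around each diffusion step.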
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, Rombach, and Sandow.
Regarding claim 18 (dependent on claim 17), Greenen in view of Babanin, Whitaker, and Rombach teaches generating the plurality of material maps (Greenen, para 35: “BRDF”) and wherein the one or more processing devices further perform operations comprising generating one or more objects within a three-dimensional design that display the scene portrayed by the digital image (Greenen, para 29: “Once a set of content tiles 240 is obtained, that set can be used to fill a region of a virtual (graphical) environment”; para 33: “generate content tiles for purposes such as texture tiling…These assets can include characters, objects, models, animations, or other such aspects or features.”; three-dimensional, para 44: “This may include not only two-dimensional tiles, but potentially three-dimensional volumes or four-dimensional objects that may change over time, as may relate to animation or changes in appearance or behavior over time”); however, Greenen fails to explicitly teach wherein the plurality of material maps are used to generate the one or more objects.
However, Sandow teaches a similar method (Sandow, abstract) wherein the operations further comprise using the plurality of material maps to generate one or more objects within a three-dimensional design that display the scene portrayed by the digital image (Sandow, para 77: “FIG. 2B depicts an example of a texture stack 200B. As described above, texture stacks may be a series of texture maps that represent the respective characteristics of a given PBR material. These texture maps may be combined together in three-dimensional renderers with each texture map assigned to its relevant material shader to provide accurate data for each shader's physical interaction with lights within a given three-dimensional space”; para 53: “A texture map may be a two-dimensional image of a surface that may be used to cover three-dimensional objects”). It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the method steps taught by Sandow, wherein each texture map is applied to the rendered 3D space, with the system of Greenen in view of Babanin, Whitaker, and Rombach in order to apply each type of texture appropriately in the rendering, ensuring that each type interacts properly with light (Sandow, see last citation).
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Greenen in view of Babanin, Whitaker, Rombach, and Zhang.
Regarding claim 19 (dependent on claim 17), Greenen in view of Babanin, Whitaker, and Rombach teaches wherein: the one or more processing devices further perform operations comprising generating, using a style encoder, a global condition from the digital image (Greenen, para 34: “A network such as a contrastive language-image pretraining (CLIP) network can be used to take input such as an image or text and then generate an appropriate encoding 420, or latent embedding, which can then be passed as input to the diffusion network”); and
generating, using the controlled diffusion neural network over the plurality of diffusion steps (Taught by Greenen, see claim 17 rejection), the denoised latent tensor based on the spatial condition and the blended noise input (Taught in combination with Whitaker, see details in claim 17 rejection) comprises generating, using the controlled diffusion neural network over the plurality of diffusion steps, the denoised latent tensor based on the spatial condition, the blended noise input, and the global condition (Greenen, see in FIG 4 how the encoding 420 is input to the diffusion network; latent tensors and spatial condition taught in combination with Babanin and Rombach, see claim 17 rejection).
However, Greenen in view of Babanin, Whitaker, and Rombach fails to teach generating, using the conditioning neural network, the spatial condition from the digital image comprises generating, using the conditioning neural network, the spatial condition from the digital image and a binary mask that masks a border of the digital image. Zhang teaches the generation of a spatial condition from a digital image and a binary mask of the digital image (Zhang, pg. 2: “This paper presents ControlNet, an end-to-end neural network architecture that controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions… We train several ControlNets with various datasets of different conditions, e.g., Canny edges”; FIG 16 on pg. 21: “Masked Diffusion. By diffusing images in masked areas, the Canny-edge model can be used to support pen-based editing of image contents”). While Greenen in view of Babanin is relied upon to teach the spatial condition generated by the conditioning neural network, Greenen also teaches a spatial condition, or boundary condition, to influence the output of the diffusion network (Greenen, see para 28 and FIG 2). The use of a binary mask and digital image with the conditioning neural network can be utilized to guide the inpainting of a specific region, similar to the border generated in the top left content tile in Figure 2 using the boundary condition of Greenen (Greenen, para 28: “In this example, each color or shade of an edge of a content tile 200, or of a region proximate an edge of a content tile 220, can be associated with a given boundary condition of a set of boundary conditions”).
Thus, Greenen and Zhang each disclose a method for controlling a diffusion network, with Greenen demonstrating this method at the border of the image. A person of ordinary skill in the art, before the effective filing date of the claimed invention, would have recognized that the binary mask input to a conditioning neural network, taught by Zhang, could have been substituted for the boundary condition of Greenen because both serve the purpose of controlling the output of the diffusion network at the edges. Furthermore, a person of ordinary skill in the art would have been able to carry out the substitution. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to substitute the binary mask of Zhang for the boundary condition input to the diffusion network taught by Greenen in view of Babanin, Whitaker, and Rombach, according to known methods to yield the predictable result of an improved tileable digital material by masking the border of the input image.
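For illustration only, a binary mask that masks a border of a digital image, of the kind that could accompany the image as input to a conditioning neural network as discussed above, can be sketched as follows. This is a minimal sketch assuming numpy; the border width and function name are illustrative assumptions by the examiner.

```python
import numpy as np

def border_mask(h, w, border=8):
    # Binary mask marking a border of the given width around an h-by-w
    # image; the masked (1-valued) region identifies where generation
    # should be guided at the image edges.
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[:border, :] = 1
    mask[-border:, :] = 1
    mask[:, :border] = 1
    mask[:, -border:] = 1
    return mask
```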
Allowable Subject Matter
Claims 6, 15, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter:
Regarding claims 6 and 15, while the prior art teaches, in combination, noised latent tensors at a first and second scale (based on the lower- and higher-scale noised images taught by Whitaker, such as in claim 17), the relied-upon prior art fails to teach, alone or in reasonable combination, the noised latent tensors processed via lower- and higher-scale diffusion steps. Regarding claim 20, the prior art fails to teach the claim limitations listed below. The related art described by Yuan, Xu, and Ho (cited in the Non-Final Office Action of 11/13/2025) fails to remedy the missing claim limitations in reasonable combination.
Therefore, the prior art fails to teach as a whole wherein:
Claim 6: generating, using the controlled diffusion neural network and based on the spatial condition, the plurality of material maps from the blended noise input that comprises a mixture of noise input for a diffusion step of the controlled diffusion neural network at a first scale and additional noise input for a diffusion step of the controlled diffusion neural network at a second scale that is higher in scale than the first scale.
Claim 15: determining a lower-scale noised latent tensor by processing a noised latent tensor at a first resolution via one or more lower-scale diffusion steps; determining a higher-scale noised latent tensor by processing an additional noised latent tensor at a second resolution via one or more higher-scale diffusion steps, the second resolution comprising a higher resolution than the first resolution; and blending the lower-scale noised latent tensor with the higher-scale noised latent tensor to generate a blended noise input.
Claim 20: generating, using the decoder, lower-resolution material maps from a lower-resolution version of the denoised latent tensor; generating, using the decoder, higher-resolution material maps from a higher-resolution version of the denoised latent tensor; and generating the plurality of material maps by using a mean matching operation between regions of the lower-resolution material maps and corresponding regions of the higher-resolution material maps.
In view of the foregoing, the prior art references alone or in reasonable combination are insufficient to teach the invention as a whole, as claimed in claims 6, 15, and 20.
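For illustration only, one plausible reading of the mean matching operation recited in claim 20 (matching region statistics of a higher-resolution map to corresponding regions of a lower-resolution map) can be sketched as follows. This is a minimal sketch assuming numpy; the block size, function name, and additive-shift formulation are illustrative assumptions by the examiner and are not drawn from the claims.

```python
import numpy as np

def mean_match(high_res, low_res, block=2):
    # Shift each block of the higher-resolution map so its mean matches
    # the corresponding region mean taken from the lower-resolution map
    # (here, one low-resolution pixel per high-resolution block).
    out = high_res.astype(float)
    rows, cols = low_res.shape
    for i in range(rows):
        for j in range(cols):
            region = out[i * block:(i + 1) * block, j * block:(j + 1) * block]
            region += low_res[i, j] - region.mean()
    return out
```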
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EMMA E DRYDEN whose telephone number is (571)272-1179. The examiner can normally be reached M-F 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANDREW BEE can be reached at (571) 270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EMMA E DRYDEN/Examiner, Art Unit 2677
/ANDREW W BEE/Supervisory Patent Examiner, Art Unit 2677