Prosecution Insights
Last updated: April 19, 2026
Application No. 18/538,122

METHOD FOR IMAGE GENERATION USING WAVELET DIFFUSION SCHEME

Non-Final OA: §101, §102, §103
Filed: Dec 13, 2023
Examiner: HANSEN, CONNOR LEVI
Art Unit: 2672
Tech Center: 2600 — Communications
Assignee: VINAI ARTIFICIAL INTELLIGENCE APPLICATION AND RESEARCH JOINT STOCK COMPANY
OA Round: 1 (Non-Final)
Grant Probability: 75% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 75%, above average (21 granted / 28 resolved; +13.0% vs TC avg)
Interview Lift: +29.2%, a strong lift among resolved cases with an interview
Typical Timeline: 2y 10m average prosecution; 32 applications currently pending
Career History: 60 total applications across all art units

Statute-Specific Performance

§101: 19.1% (-20.9% vs TC avg)
§102: 16.8% (-23.2% vs TC avg)
§103: 39.9% (-0.1% vs TC avg)
§112: 23.7% (-16.3% vs TC avg)
Tech Center averages are estimates • Based on career data from 28 resolved cases

Office Action

§101 §102 §103
Detailed Action

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claim 1 is objected to because of the following informalities: in line 13, “performing an inverse wavelet transform the single target to reconstruct an output image” should read “performing an inverse wavelet transform on the single target to reconstruct an output image”. Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 18-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter (i.e., a signal per se; see MPEP § 2106.03). Claim 18 recites “A computer program product for image generation via backward diffusion from a random image, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to…”. The broadest reasonable interpretation of "computer readable storage medium" encompasses a transitory signal. This is supported on page 9, paragraph 0044 of the specification: “The term "computer readable medium" as used herein refers to any medium that participates in providing data (e.g., instructions) which may be read by a computer, a processor or a like device. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.”
Transitory signals are not considered patent-eligible subject matter because they do not fall within any of the four statutory categories of subject matter for a patent: processes, machines, manufactures, and compositions of matter. Therefore, claim 18 is rejected under 35 U.S.C. 101. Dependent claims 19 and 20 do not add any tangible structure or physical embodiment to the claimed “computer readable storage medium”. Therefore, the dependent claims are rejected under 35 U.S.C. 101 for the same reason as independent claim 18.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 4-8, 10, and 12-16 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Liu et al. (US 20240169488 A1) (hereinafter Liu).

Regarding claim 1, Liu teaches a method for image generation via backward diffusion from a random image, the method comprising: obtaining the random image; transforming the random image, using a wavelet transform, to decompose the obtained random image into four wavelet subbands to leverage high frequency information of the obtained random image for further increasing the details of a generated image for a backward diffusion process (Liu, “In contrast, diffusion models of the present disclosure include wavelet U-Net 415. The downsampling blocks 405 are replaced with wavelet transform blocks 420, and the upsampling blocks 410 are replaced with inverse wavelet transform block 425.
The wavelet transform blocks 420 apply a wavelet transform, such as DWT, to an input signal.”, pg. 5, paragraph 0058, lines 1-6; “In this example, wavelet transform operation 505 is applied to input signal 500. Some implementations of the transform include a matrix multiplication between input signal 500 and a matrix representation of a wavelet, such as the Haar wavelet. In an example, the product of the transform yields low frequency signal 510 and high frequency signals 515. 'LL' (or "low low") is sometimes used to refer to low frequency signal 510, which represents an approximation of the input image signal at a reduced resolution. High frequency signals 515 may include three channels, 'LH', 'HL', and 'HH'.”, pg. 5, paragraph 0061, lines 1-11. Images input to a diffusion model are first transformed into four wavelet subbands (i.e., LL, LH, HL, HH) prior to backward diffusion.);

in the backward diffusion process, starting from each timestep t=T down to t=1, gradually generating a less-corrupted sample yt-1 from the four wavelet subbands by using a network pθ(yt-1 | yt) with parameters θ (Liu, “FIG. 7 shows a diffusion process 700 according to aspects of the present disclosure. As described above with reference to FIG. 3, a diffusion model can include both a forward diffusion process 705 for adding noise to an image (or features in a latent space) and a reverse diffusion process 710 (e.g., the denoising network) for denoising the images (or features) to obtain a denoised image.”, pg. 6, paragraph 0073, lines 1-7; “The neural network may be trained to perform the reverse process. During the reverse diffusion process 710, the model begins with noisy data xT, such as a noisy image 715, and denoises the data to obtain the p(xt-1 | xt). At each step t-1, the reverse diffusion process 710 takes xt, such as first intermediate image 720, and t as input.”, pg. 6, paragraph 0075, see Eq. (1) and Fig. 7. Backward propagation, performed based on a parameterized model, is applied to the wavelets of the input image to generate a denoised sample over a range of timesteps.);

after obtaining the clean sample y0 through T steps, concatenating four output wavelet subbands as a single target; and performing an inverse wavelet transform the single target to reconstruct an output image (Liu, “Inverse wavelet transform operation 520 may receive low frequency signal 510 and high frequency signals 515 as input. Then, inverse wavelet transform operation 520 reconstructs an image signal at an increased resolution using both the low frequency and high frequency information.”, pg. 5, paragraph 0062, lines 1-5; “At operation 615, a noise map is initialized that includes random noise. The noise map may be in a pixel space or a latent space. By initializing an image with random noise, different variations of an image including the content described by the conditional guidance can be generated. At operation 620, the system generates an image based on the noise map and the conditional guidance vector. For example, the image may be generated using a reverse diffusion process as described with reference to FIG. 3. The reverse diffusion process includes wavelet transforms and inverse wavelet transforms, and is capable of generating images with increased texture detail.”, pg. 6, paragraphs 0071 and 0072. Once each wavelet subband is denoised, an inverse wavelet transform is applied to reconstruct a final output image. This IWT process takes all wavelet subbands together as an input.).

Regarding claim 2, Liu teaches the method of claim 1, wherein y0 is a clean sample and yt is a corrupted sample at timestep t (Liu, “The neural network may be trained to perform the reverse process. During the reverse diffusion process 710, the model begins with noisy data xT, such as a noisy image 715, and denoises the data to obtain the p(xt-1 | xt).
At each step t-1, the reverse diffusion process 710 takes xt, such as first intermediate image 720, and t as input. Here, t represents a step in the sequence of transitions associated with different noise levels. The reverse diffusion process 710 outputs xt-1, such as second intermediate image 725, iteratively until xT is reverted back to x0, the original image 730.”, pg. 6, paragraph 0075, see Eq. (1)).

Regarding claim 4, Liu teaches the method of claim 1, wherein the wavelet transform is a Haar wavelet transform (Liu, “In this example, wavelet transform operation 505 is applied to input signal 500. Some implementations of the transform include a matrix multiplication between input signal 500 and a matrix representation of a wavelet, such as the Haar wavelet.”, pg. 5, paragraph 0061).

Regarding claim 5, Liu teaches the method of claim 1, wherein the network is modeled to incorporate information into a feature space through a generator to strengthen awareness of high-frequency components (Liu, “the present disclosure proposes directly changing the architecture of the model. For example, embodiments maintain the U-Net "shape" of the model, but instead of using downsampling layers, embodiments substitute wavelet transform layers. The wavelet transformations produce a reduced resolution image features in one channel, and can produce additional channels of other information, such as edge or texture detail information. Furthermore, unlike conventional downsampling layers, information isn't lost during the process. A high resolution image or image features can be constructed using an inverse wavelet transform, which retains the high frequency data.”, pg. 2, paragraph 0023; “Finally, an image decoder 350 decodes the denoised image features 345 to obtain an output image 355 in pixel space 310.”, pg. 4, paragraph 0049, lines 9-11. Images in the diffusion model are processed through an encoder and decoder.
The decoder functions as a generator by applying an inverse wavelet transform to reconstruct the image in pixel space, producing an output that retains high frequency details.).

Regarding claim 6, Liu teaches the method of claim 1, wherein the network is modeled for M down-sampling and M up-sampling blocks, plus skip connections between blocks of a same resolution, where M is a predefined number (Liu, “U-Net is an artificial neural network (ANN) architecture that comprises many convolutional layers. The layers include pooling operations which downsample an input, and up-convolution operations which up-sample the input, resulting in a schematic 'U' shape. Many U-Nets further include a series of residual blocks, as well as skip connections to propagate signals between the downsampling and upsampling paths.”, pg. 2, paragraph 0021, lines 5-12; “the present disclosure proposes directly changing the architecture of the model. For example, embodiments maintain the U-Net "shape" of the model, but instead of using downsampling layers, embodiments substitute wavelet transform layers.”, pg. 2, paragraph 0023, lines 1-6; “The up-sampled features can be combined with intermediate features having a same resolution and number of channels via a skip connection.”, pg. 4, paragraph 0055, lines 4-6. The diffusion model alters the U-Net architecture by replacing each upsampling and downsampling block with a corresponding wavelet transform or inverse wavelet transform block. The model further includes skip connections between blocks of the same resolution.).

Regarding claim 7, Liu teaches the method of claim 1, wherein the network is modeled using frequency-aware blocks in place of down-sampling and up-sampling operators (Liu, “the present disclosure proposes directly changing the architecture of the model. For example, embodiments maintain the U-Net "shape" of the model, but instead of using downsampling layers, embodiments substitute wavelet transform layers.”, pg. 2, paragraph 0023, lines 1-6).

Regarding claim 8, Liu teaches the method of claim 1, wherein the network is modeled using, at a lowest resolution, frequency-bottleneck blocks for attention on low and high-frequency components (Liu, “Many U-Nets further include a series of residual blocks, as well as skip connections to propagate signals between the downsampling and upsampling paths. The U-Net architecture includes a bottleneck in the middle (at the bottom of the "U") to preserve and learn the most important information during training, i.e., the parameters with the largest effect in the image generation process.”, pg. 2, paragraph 0021, lines 10-16; “Embodiments of denoising network 340 include an ANN with a U-Net architecture. However, instead of downsampling blocks and upsampling blocks, embodiments utilize wavelet transform blocks and inverse wavelet transform blocks, respectively, to reduce the resolution of image features and increase the resolution of image features throughout the denoising process.”, pg. 4, paragraph 0051, lines 1-7. The diffusion model encodes the input images by iteratively reducing the resolution to a bottleneck block, where the model learns and preserves important frequency information. It then decodes these features to reconstruct a high-resolution output image.).

Claim 10 corresponds to claim 1, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 1. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (Liu, “According to some aspects, computing device 1200 includes one or more processors 1205.
In some cases, a processor is an intelligent hardware device (e.g., a general purpose processing component, a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic device, a discrete gate or transistor logic component, a discrete hardware component, or a combination thereof). In some cases, a processor is configured to operate a memory array using a memory controller.”, pg. 8, paragraph 0111, lines 1-12) configured to execute the method according to claim 1. As indicated in the analysis of claim 1, Liu teaches all the limitations of claim 1. Therefore, claim 10 is rejected for the same reasons of anticipation as claim 1.

Claim 12 corresponds to claim 4, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 4. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 4. As indicated in the analysis of claim 4, Liu teaches all the limitations of claim 4. Therefore, claim 12 is rejected for the same reasons of anticipation as claim 4.

Claim 13 corresponds to claim 5, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 5. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 5.
As indicated in the analysis of claim 5, Liu teaches all the limitations of claim 5. Therefore, claim 13 is rejected for the same reasons of anticipation as claim 5.

Claim 14 corresponds to claim 6, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 6. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 6. As indicated in the analysis of claim 6, Liu teaches all the limitations of claim 6. Therefore, claim 14 is rejected for the same reasons of anticipation as claim 6.

Claim 15 corresponds to claim 7, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 7. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 7. As indicated in the analysis of claim 7, Liu teaches all the limitations of claim 7. Therefore, claim 15 is rejected for the same reasons of anticipation as claim 7.

Claim 16 corresponds to claim 8, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 8. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 8.
As indicated in the analysis of claim 8, Liu teaches all the limitations of claim 8. Therefore, claim 16 is rejected for the same reasons of anticipation as claim 8.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 11, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (US 20240169488 A1) in view of Ho et al. (“Denoising Diffusion Probabilistic Models”, 34th Conference on Neural Information Processing Systems, 2020) (hereinafter Ho).

Regarding claim 3, Liu teaches the method of claim 1. Liu does not teach wherein the network pθ(yt-1 | yt) with parameters θ is pθ(yt-1 | yt) = N(yt-1; μθ(yt, t), σt²I); and μθ(yt, t) and σt² are a mean and a variance of a parametric network model, respectively. However, Ho teaches wherein the network pθ(yt-1 | yt) with parameters θ is pθ(yt-1 | yt) = N(yt-1; μθ(yt, t), σt²I); and μθ(yt, t) and σt² are a mean and a variance of a parametric network model, respectively (Ho, “Now we discuss our choices in pθ(xt-1 | xt) = N(xt-1; μθ(xt, t), Σθ(xt, t)) for 1 < t ≤ T. First, we set Σθ(xt, t) = σt²I to untrained time dependent constants… Second, to represent the mean μθ(xt, t), we propose a specific parameterization motivated by the following analysis of Lt.
With pθ(xt-1 | xt) = N(xt-1; μθ(xt, t), σt²I), we can write: (see Eq. (8))”, pg. 3, Section 3.2 Reverse process and L1:T-1, lines 1-9).

Liu teaches a backward diffusion model which calculates a full covariance Σθ(xt, t) (Liu, “The neural network may be trained to perform the reverse process. During the reverse diffusion process 710, the model begins with noisy data xT, such as a noisy image 715, and denoises the data to obtain the p(xt-1 | xt). At each step t-1, the reverse diffusion process 710 takes xt, such as first intermediate image 720, and t as input.”, pg. 6, paragraph 0075, lines 1-6, see Eq. (1)). Ho teaches replacing a full covariance Σθ(xt, t) in a backward diffusion model with a fixed scalar variance σt²I (see above).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the backward diffusion model of Liu by replacing the full covariance with the fixed scalar variance as taught by Ho (Ho, pg. 3, Section 3.2 Reverse process and L1:T-1, lines 1-9). The motivation for doing so would have been to simplify training and stabilize optimization for the model (as taught by Ho, “We also see that learning reverse process variances (by incorporating a parameterized diagonal Σθ(xt) into the variational bound) leads to unstable training and poorer sample quality compared to fixed variances.”, pg. 6, Section 4.2 Reverse process parametrization and training objective ablation, lines 4-6). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Liu with Ho to obtain the invention according to claim 3.
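For context on the parameterization at issue in claim 3, the fixed-variance reverse process pθ(yt-1 | yt) = N(yt-1; μθ(yt, t), σt²I) amounts to a simple sampling loop. The sketch below is illustrative only and is not code from Liu or the application; the `predict_mean` callable (standing in for the trained network μθ) and the `sigmas` schedule are hypothetical placeholders.

```python
import numpy as np

def reverse_step(y_t, t, predict_mean, sigma):
    """One reverse-diffusion step: draw y_{t-1} ~ N(mu_theta(y_t, t), sigma_t^2 I).

    `sigma` is the fixed per-step scalar (Ho's untrained time-dependent
    constant), so only the mean is learned.
    """
    mu = predict_mean(y_t, t)
    # No noise is added at the final step, so y_0 is the predicted mean.
    noise = np.random.randn(*y_t.shape) if t > 1 else 0.0
    return mu + sigma * noise

def sample(shape, T, predict_mean, sigmas):
    """Run the backward process from pure noise y_T down to the clean sample y_0."""
    y = np.random.randn(*shape)      # y_T: the random image
    for t in range(T, 0, -1):        # t = T, T-1, ..., 1
        y = reverse_step(y, t, predict_mean, sigmas[t])
    return y
```

With a learned covariance Σθ(xt, t) the last argument of the normal distribution would itself be a network output; fixing it to σt²I, as Ho proposes, removes that output and leaves only the mean to train.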
Regarding claim 11, Liu teaches the system of claim 10, wherein: y0 is a clean sample and yt is a corrupted sample at timestep t (Liu, “The neural network may be trained to perform the reverse process. During the reverse diffusion process 710, the model begins with noisy data xT, such as a noisy image 715, and denoises the data to obtain the p(xt-1 | xt). At each step t-1, the reverse diffusion process 710 takes xt, such as first intermediate image 720, and t as input. Here, t represents a step in the sequence of transitions associated with different noise levels. The reverse diffusion process 710 outputs xt-1, such as second intermediate image 725, iteratively until xT is reverted back to x0, the original image 730.”, pg. 6, paragraph 0075, see Eq. (1)). Liu does not teach wherein the network pθ(yt-1 | yt) with parameters θ is pθ(yt-1 | yt) = N(yt-1; μθ(yt, t), …
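The four-subband decomposition and reconstruction that the §102 rejections of claims 1 and 4 turn on can be illustrated with a single-level 2D Haar transform. This is a generic sketch of the standard Haar DWT/IWT, not code from Liu or the application; the function names are illustrative, and the transform assumes an even-sized input.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform: split an even-sized image into the
    four subbands LL, LH, HL, HH, each at half resolution."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # horizontal detail
    hl = (a + b - c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse Haar transform: recombine the four subbands into the image.
    Because no information is discarded, reconstruction is exact."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x
```

The lossless round trip is the point the examiner draws from Liu: unlike a pooling downsample, the wavelet split keeps the high-frequency channels, so the inverse transform over all four subbands recovers the full-resolution image.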

Prosecution Timeline

Dec 13, 2023
Application Filed
Dec 10, 2025
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12530785: TRACKING DEVICE, TRACKING METHOD, AND RECORDING MEDIUM (granted Jan 20, 2026; 2y 5m to grant)
Patent 12524984: HISTOGRAM OF GRADIENT GENERATION (granted Jan 13, 2026; 2y 5m to grant)
Patent 12518363: IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, IMAGE PROCESSING SYSTEM, AND STORAGE MEDIUM WITH PIECEWISE LINEAR FUNCTION FOR TONE CONVERSION ON IMAGE (granted Jan 06, 2026; 2y 5m to grant)
Patent 12499648: IMAGE PROCESSING APPARATUS, IMAGE CAPTURING APPARATUS, CONTROL METHOD, AND STORAGE MEDIUM FOR DETECTING SUBJECT IN CAPTURED IMAGE (granted Dec 16, 2025; 2y 5m to grant)
Patent 12482257: REDUCING ENVIRONMENTAL INTERFERENCE FROM IMAGES (granted Nov 25, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 99% (+29.2%)
Median Time to Grant: 2y 10m
PTA Risk: Low
Based on 28 resolved cases by this examiner. Grant probability derived from career allow rate.
