Prosecution Insights
Last updated: April 19, 2026
Application No. 18/538,122

METHOD FOR IMAGE GENERATION USING WAVELET DIFFUSION SCHEME

Non-Final OA: §101, §102, §103
Filed: Dec 13, 2023
Examiner: HANSEN, CONNOR LEVI
Art Unit: 2672
Tech Center: 2600 — Communications
Assignee: VINAI ARTIFICIAL INTELLIGENCE APPLICATION AND RESEARCH JOINT STOCK COMPANY
OA Round: 1 (Non-Final)
Grant Probability: 75% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 75%, above average (21 granted / 28 resolved; +13.0% vs TC avg)
Interview Lift: +29.2%, a strong lift among resolved cases with an interview
Typical Timeline: 2y 10m average prosecution; 32 applications currently pending
Career History: 60 total applications across all art units

Statute-Specific Performance

§101: 19.1% (-20.9% vs TC avg)
§102: 16.8% (-23.2% vs TC avg)
§103: 39.9% (-0.1% vs TC avg)
§112: 23.7% (-16.3% vs TC avg)
Tech Center averages are estimates • Based on career data from 28 resolved cases

Office Action

§101 §102 §103
Detailed Action

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claim 1 is objected to because of the following informalities: in line 13, “performing an inverse wavelet transform the single target to reconstruct an output image” should read “performing an inverse wavelet transform on the single target to reconstruct an output image”. Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 18-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter (i.e., a signal per se; see MPEP § 2106.03). Claim 18 recites “A computer program product for image generation via backward diffusion from a random image, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to…”. The broadest reasonable interpretation of "computer readable storage medium" encompasses a transitory signal. This is supported on page 9, paragraph 0044 of the specification: “The term "computer readable medium" as used herein refers to any medium that participates in providing data (e.g., instructions) which may be read by a computer, a processor or a like device. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.”
Transitory signals are not considered patent-eligible subject matter because they do not fall within any of the four statutory categories of subject matter for a patent: processes, machines, manufactures, and compositions of matter. Therefore, claim 18 is rejected under 35 U.S.C. 101. Dependent claims 19 and 20 do not add any tangible structure or physical embodiment to the claimed “computer readable storage medium”. Therefore, the dependent claims are rejected under 35 U.S.C. 101 for the same reason as independent claim 18.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 2, 4-8, 10, and 12-16 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Liu et al. (US 20240169488 A1) (hereinafter Liu).

Regarding claim 1, Liu teaches a method for image generation via backward diffusion from a random image, the method comprising: obtaining the random image; transforming the random image, using a wavelet transform, to decompose the obtained random image into four wavelet subbands to leverage high frequency information of the obtained random image for further increasing the details of a generated image for a backward diffusion process (Liu, “In contrast, diffusion models of the present disclosure include wavelet U-Net 415. The downsampling blocks 405 are replaced with wavelet transform blocks 420, and the upsampling blocks 410 are replaced with inverse wavelet transform block 425.
The wavelet transform blocks 420 apply a wavelet transform, such as DWT, to an input signal.”, pg. 5, paragraph 0058, lines 1-6; “In this example, wavelet transform operation 505 is applied to input signal 500. Some implementations of the transform include a matrix multiplication between input signal 500 and a matrix representation of a wavelet, such as the Haar wavelet. In an example, the product of the transform yields low frequency signal 510 and high frequency signals 515. 'LL' (or "low low") is sometimes used to refer to low frequency signal 510, which represents an approximation of the input image signal at a reduced resolution. High frequency signals 515 may include three channels, 'LH', 'HL', and 'HH'.”, pg. 5, paragraph 0061, lines 1-11. Images input to a diffusion model are first transformed into four wavelet subbands (i.e., LL, LH, HL, HH) prior to backward diffusion.);

in the backward diffusion process, starting from each timestep t=T down to t=1, gradually generating a less-corrupted sample yt-1 from the four wavelet subbands by using a network pθ(yt-1 | yt) with parameters θ (Liu, “FIG. 7 shows a diffusion process 700 according to aspects of the present disclosure. As described above with reference to FIG. 3, a diffusion model can include both a forward diffusion process 705 for adding noise to an image (or features in a latent space) and a reverse diffusion process 710 (e.g., the denoising network) for denoising the images (or features) to obtain a denoised image.”, pg. 6, paragraph 0073, lines 1-7; “The neural network may be trained to perform the reverse process. During the reverse diffusion process 710, the model begins with noisy data xT, such as a noisy image 715, and denoises the data to obtain the p(xt-1 | xt). At each step t-1, the reverse diffusion process 710 takes xt, such as first intermediate image 720, and t as input.”, pg. 6, paragraph 0075, see Eq. (1) and Fig. 7. Backward propagation, performed based on a parameterized model, is applied to the wavelets of the input image to generate a denoised sample over a range of timesteps.);

after obtaining the clean sample y0 through T steps, concatenating four output wavelet subbands as a single target; and performing an inverse wavelet transform the single target to reconstruct an output image (Liu, “Inverse wavelet transform operation 520 may receive low frequency signal 510 and high frequency signals 515 as input. Then, inverse wavelet transform operation 520 reconstructs an image signal at an increased resolution using both the low frequency and high frequency information.”, pg. 5, paragraph 0062, lines 1-5; “At operation 615, a noise map is initialized that includes random noise. The noise map may be in a pixel space or a latent space. By initializing an image with random noise, different variations of an image including the content described by the conditional guidance can be generated. At operation 620, the system generates an image based on the noise map and the conditional guidance vector. For example, the image may be generated using a reverse diffusion process as described with reference to FIG. 3. The reverse diffusion process includes wavelet transforms and inverse wavelet transforms, and is capable of generating images with increased texture detail.”, pg. 6, paragraphs 0071 and 0072. Once each wavelet subband is denoised, an inverse wavelet transform is applied to reconstruct a final output image. This IWT process takes all wavelet subbands together as an input.).

Regarding claim 2, Liu teaches the method of claim 1, wherein y0 is a clean sample and yt is a corrupted sample at timestep t (Liu, “The neural network may be trained to perform the reverse process. During the reverse diffusion process 710, the model begins with noisy data xT, such as a noisy image 715, and denoises the data to obtain the p(xt-1 | xt).
At each step t-1, the reverse diffusion process 710 takes xt, such as first intermediate image 720, and t as input. Here, t represents a step in the sequence of transitions associated with different noise levels. The reverse diffusion process 710 outputs xt-1, such as second intermediate image 725, iteratively until xT is reverted back to x0, the original image 730.”, pg. 6, paragraph 0075, see Eq. (1)).

Regarding claim 4, Liu teaches the method of claim 1, wherein the wavelet transform is a Haar wavelet transform (Liu, “In this example, wavelet transform operation 505 is applied to input signal 500. Some implementations of the transform include a matrix multiplication between input signal 500 and a matrix representation of a wavelet, such as the Haar wavelet.”, pg. 5, paragraph 0061).

Regarding claim 5, Liu teaches the method of claim 1, wherein the network is modeled to incorporate information into a feature space through a generator to strengthen awareness of high-frequency components (Liu, “the present disclosure proposes directly changing the architecture of the model. For example, embodiments maintain the U-Net "shape" of the model, but instead of using downsampling layers, embodiments substitute wavelet transform layers. The wavelet transformations produce a reduced resolution image features in one channel, and can produce additional channels of other information, such as edge or texture detail information. Furthermore, unlike conventional downsampling layers, information isn't lost during the process. A high resolution image or image features can be constructed using an inverse wavelet transform, which retains the high frequency data.”, pg. 2, paragraph 0023; “Finally, an image decoder 350 decodes the denoised image features 345 to obtain an output image 355 in pixel space 310.”, pg. 4, paragraph 0049, lines 9-11. Images in the diffusion model are processed through an encoder and decoder.
The decoder functions as a generator by applying an inverse wavelet transform to reconstruct the image in pixel space, producing an output that retains high frequency details.).

Regarding claim 6, Liu teaches the method of claim 1, wherein the network is modeled for M down-sampling and M up-sampling blocks, plus skip connections between blocks of a same resolution, where M is a predefined number (Liu, “U-Net is an artificial neural network (ANN) architecture that comprises many convolutional layers. The layers include pooling operations which downsample an input, and up-convolution operations which up-sample the input, resulting in a schematic 'U' shape. Many U-Nets further include a series of residual blocks, as well as skip connections to propagate signals between the downsampling and upsampling paths.”, pg. 2, paragraph 0021, lines 5-12; “the present disclosure proposes directly changing the architecture of the model. For example, embodiments maintain the U-Net "shape" of the model, but instead of using downsampling layers, embodiments substitute wavelet transform layers.”, pg. 2, paragraph 0023, lines 1-6; “The up-sampled features can be combined with intermediate features having a same resolution and number of channels via a skip connection.”, pg. 4, paragraph 0055, lines 4-6. The diffusion model alters the U-Net architecture by replacing each upsampling and downsampling block with a corresponding wavelet transform or inverse wavelet transform block. The model further includes skip connections between blocks of the same resolution.).

Regarding claim 7, Liu teaches the method of claim 1, wherein the network is modeled using frequency-aware blocks in place of down-sampling and up-sampling operators (Liu, “the present disclosure proposes directly changing the architecture of the model. For example, embodiments maintain the U-Net "shape" of the model, but instead of using downsampling layers, embodiments substitute wavelet transform layers.”, pg. 2, paragraph 0023, lines 1-6).

Regarding claim 8, Liu teaches the method of claim 1, wherein the network is modeled using, at a lowest resolution, frequency-bottleneck blocks for attention on low and high-frequency components (Liu, “Many U-Nets further include a series of residual blocks, as well as skip connections to propagate signals between the downsampling and upsampling paths. The U-Net architecture includes a bottleneck in the middle (at the bottom of the "U") to preserve and learn the most important information during training, i.e., the parameters with the largest effect in the image generation process.”, pg. 2, paragraph 0021, lines 10-16; “Embodiments of denoising network 340 include an ANN with a U-Net architecture. However, instead of downsampling blocks and upsampling blocks, embodiments utilize wavelet transform blocks and inverse wavelet transform blocks, respectively, to reduce the resolution of image features and increase the resolution of image features throughout the denoising process.”, pg. 4, paragraph 0051, lines 1-7. The diffusion model encodes the input images by iteratively reducing the resolution to a bottleneck block, where the model learns and preserves important frequency information. It then decodes these features to reconstruct a high-resolution output image.).

Claim 10 corresponds to claim 1, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 1. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (Liu, “According to some aspects, computing device 1200 includes one or more processors 1205.
In some cases, a processor is an intelligent hardware device (e.g., a general purpose processing component, a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic device, a discrete gate or transistor logic component, a discrete hardware component, or a combination thereof). In some cases, a processor is configured to operate a memory array using a memory controller.”, pg. 8, paragraph 0111, lines 1-12) configured to execute the method according to claim 1. As indicated in the analysis of claim 1, Liu teaches all the limitations of claim 1. Therefore, claim 10 is rejected for the same reasons of anticipation as claim 1.

Claim 12 corresponds to claim 4, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 4. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 4. As indicated in the analysis of claim 4, Liu teaches all the limitations of claim 4. Therefore, claim 12 is rejected for the same reasons of anticipation as claim 4.

Claim 13 corresponds to claim 5, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 5. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 5.
As indicated in the analysis of claim 5, Liu teaches all the limitations of claim 5. Therefore, claim 13 is rejected for the same reasons of anticipation as claim 5.

Claim 14 corresponds to claim 6, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 6. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 6. As indicated in the analysis of claim 6, Liu teaches all the limitations of claim 6. Therefore, claim 14 is rejected for the same reasons of anticipation as claim 6.

Claim 15 corresponds to claim 7, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 7. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 7. As indicated in the analysis of claim 7, Liu teaches all the limitations of claim 7. Therefore, claim 15 is rejected for the same reasons of anticipation as claim 7.

Claim 16 corresponds to claim 8, with the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium configured to execute the method according to claim 8. Liu teaches the addition of a system comprising: a processor; a data bus coupled to the processor; a memory coupled to the data bus; and a computer-usable medium (see analysis of claim 10) configured to execute the method according to claim 8.
As indicated in the analysis of claim 8, Liu teaches all the limitations of claim 8. Therefore, claim 16 is rejected for the same reasons of anticipation as claim 8.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3, 11, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (US 20240169488 A1) in view of Ho et al. (“Denoising Diffusion Probabilistic Models”, 34th Conference on Neural Information Processing Systems, 2020) (hereinafter Ho).

Regarding claim 3, Liu teaches the method of claim 1. Liu does not teach wherein the network pθ(yt-1 | yt) with parameters θ is pθ(yt-1 | yt) = N(yt-1; μθ(yt, t), σt²I); and μθ(yt, t) and σt² are a mean and a variance of a parametric network model, respectively. However, Ho teaches wherein the network pθ(yt-1 | yt) with parameters θ is pθ(yt-1 | yt) = N(yt-1; μθ(yt, t), σt²I); and μθ(yt, t) and σt² are a mean and a variance of a parametric network model, respectively (Ho, “Now we discuss our choices in pθ(xt-1 | xt) = N(xt-1; μθ(xt, t), Σθ(xt, t)) for 1 < t ≤ T. First, we set Σθ(xt, t) = σt²I to untrained time dependent constants… Second, to represent the mean μθ(xt, t), we propose a specific parameterization motivated by the following analysis of Lt.
With pθ(xt-1 | xt) = N(xt-1; μθ(xt, t), σt²I), we can write: (see Eq. (8))”, pg. 3, Section 3.2 Reverse process and L1:T-1, lines 1-9).

Liu teaches a backward diffusion model which calculates a full covariance Σθ(xt, t) (Liu, “The neural network may be trained to perform the reverse process. During the reverse diffusion process 710, the model begins with noisy data xT, such as a noisy image 715, and denoises the data to obtain the p(xt-1 | xt). At each step t-1, the reverse diffusion process 710 takes xt, such as first intermediate image 720, and t as input.”, pg. 6, paragraph 0075, lines 1-6, see Eq. (1)). Ho teaches replacing a full covariance Σθ(xt, t) in a backward diffusion model with a fixed scalar variance σt²I (see above).

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the backward diffusion model of Liu by replacing the full covariance with the fixed scalar variance as taught by Ho (Ho, pg. 3, Section 3.2 Reverse process and L1:T-1, lines 1-9). The motivation for doing so would have been to simplify training and stabilize optimization for the model (as taught by Ho, “We also see that learning reverse process variances (by incorporating a parameterized diagonal Σθ(xt) into the variational bound) leads to unstable training and poorer sample quality compared to fixed variances.”, pg. 6, Section 4.2 Reverse process parametrization and training objective ablation, lines 4-6). Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine the teachings of Liu with Ho to obtain the invention according to claim 3.
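For context on the parameterization at issue in claim 3, the fixed-variance reverse process pθ(yt-1 | yt) = N(yt-1; μθ(yt, t), σt²I) amounts to a simple sampling loop. The sketch below is illustrative only and is not code from Liu or the application; the `predict_mean` callable (standing in for the trained network μθ) and the `sigmas` schedule are hypothetical placeholders.

```python
import numpy as np

def reverse_step(y_t, t, predict_mean, sigma):
    """One reverse-diffusion step: draw y_{t-1} ~ N(mu_theta(y_t, t), sigma_t^2 I).

    `sigma` is the fixed per-step scalar (Ho's untrained time-dependent
    constant), so only the mean is learned.
    """
    mu = predict_mean(y_t, t)
    # No noise is added at the final step, so y_0 is the predicted mean.
    noise = np.random.randn(*y_t.shape) if t > 1 else 0.0
    return mu + sigma * noise

def sample(shape, T, predict_mean, sigmas):
    """Run the backward process from pure noise y_T down to the clean sample y_0."""
    y = np.random.randn(*shape)      # y_T: the random image
    for t in range(T, 0, -1):        # t = T, T-1, ..., 1
        y = reverse_step(y, t, predict_mean, sigmas[t])
    return y
```

With a learned covariance Σθ(xt, t) the last argument of the normal distribution would itself be a network output; fixing it to σt²I, as Ho proposes, removes that output and leaves only the mean to train.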
Regarding claim 11, Liu teaches the system of claim 10, wherein: y0 is a clean sample and yt is a corrupted sample at timestep t (Liu, “The neural network may be trained to perform the reverse process. During the reverse diffusion process 710, the model begins with noisy data xT, such as a noisy image 715, and denoises the data to obtain the p(xt-1 | xt). At each step t-1, the reverse diffusion process 710 takes xt, such as first intermediate image 720, and t as input. Here, t represents a step in the sequence of transitions associated with different noise levels. The reverse diffusion process 710 outputs xt-1, such as second intermediate image 725, iteratively until xT is reverted back to x0, the original image 730.”, pg. 6, paragraph 0075, see Eq. (1)). Liu does not teach wherein the network pθ(yt-1 | yt) with parameters θ is pθ(yt-1 | yt) = N(yt-1; μθ(yt, t), …
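The four-subband decomposition and reconstruction that the §102 rejections of claims 1 and 4 turn on can be illustrated with a single-level 2D Haar transform. This is a generic sketch of the standard Haar DWT/IWT, not code from Liu or the application; the function names are illustrative, and the transform assumes an even-sized input.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform: split an even-sized image into the
    four subbands LL, LH, HL, HH, each at half resolution."""
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # horizontal detail
    hl = (a + b - c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse Haar transform: recombine the four subbands into the image.
    Because no information is discarded, reconstruction is exact."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x
```

The lossless round trip is the point the examiner draws from Liu: unlike a pooling downsample, the wavelet split keeps the high-frequency channels, so the inverse transform over all four subbands recovers the full-resolution image.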

Prosecution Timeline

Dec 13, 2023
Application Filed
Dec 10, 2025
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12530785: TRACKING DEVICE, TRACKING METHOD, AND RECORDING MEDIUM (granted Jan 20, 2026; 2y 5m to grant)
Patent 12524984: HISTOGRAM OF GRADIENT GENERATION (granted Jan 13, 2026; 2y 5m to grant)
Patent 12518363: IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, IMAGE PROCESSING SYSTEM, AND STORAGE MEDIUM WITH PIECEWISE LINEAR FUNCTION FOR TONE CONVERSION ON IMAGE (granted Jan 06, 2026; 2y 5m to grant)
Patent 12499648: IMAGE PROCESSING APPARATUS, IMAGE CAPTURING APPARATUS, CONTROL METHOD, AND STORAGE MEDIUM FOR DETECTING SUBJECT IN CAPTURED IMAGE (granted Dec 16, 2025; 2y 5m to grant)
Patent 12482257: REDUCING ENVIRONMENTAL INTERFERENCE FROM IMAGES (granted Nov 25, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 99% (+29.2%)
Median Time to Grant: 2y 10m
PTA Risk: Low
Based on 28 resolved cases by this examiner. Grant probability derived from career allow rate.
