Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed on 3/3/2026 have been entered and made of record.
Applicant’s arguments, see page 7, filed 3/3/2026, with respect to the claim objections have been fully considered and are persuasive. The objections to claims 15 and 17 have been withdrawn in view of the amendments.
Applicant’s arguments, see page 7, filed 3/3/2026, with respect to the rejections under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejection of claim 4 has been withdrawn due to the amendments.
Applicant's arguments filed 3/3/2026 with respect to the prior art have been fully considered but they are not persuasive.
Applicant argues:
Claim 18 was rejected under 35 U.S.C. § 102(a) over Meng ("On Distillation of Guided Diffusion Models").
Claims 1-5, 7, 9, 13-17 and 19-20 were rejected under 35 U.S.C. § 103 over Meng ("On Distillation of Guided Diffusion Models") in view of Zheng (U.S. 2024/0169500).
Claims 6, 8 and 10-12 were rejected under 35 U.S.C. § 103 over Meng ("On Distillation of Guided Diffusion Models") in view of Zheng (U.S. 2024/0169500) in view of Kim ("On Architectural Compression of Text-to-Image Diffusion Models").
Applicant respectfully traverses the rejections. In order to expedite the prosecution of this application and without agreeing with the merits of the grounds of the rejections, Applicant has amended the claims as set forth above and respectfully requests withdrawal of the rejections.
As discussed during the interview, the cited art, alone or in combination, does not disclose "the second latent diffusion machine learning model derived from the first latent diffusion machine learning model by adding or removing a particular ResNet block to an existing layer within the second latent diffusion machine learning model, wherein each of the first denoising steps execute a first number of ResNet blocks, and each of the second denoising steps execute a second number of ResNet blocks, the second number being smaller than the first number," as recited in the amended independent claims.
Furthermore, the dependent claims are patentable for at least similar reasons as articulated above with respect to their respective independent claims. The dependent claims are also patentable because of the additional features recited therein.
While the examiner and applicant discussed the prior art with respect to the proposed amendments during the interview, there was not sufficient time to review every cited reference in full. Upon further consideration, the examiner determined that Kim ("On Architectural Compression of Text-to-Image Diffusion Models") discloses the newly amended features. While applicant asserts that Kim does not disclose the newly amended features, the examiner notes that this assertion is little more than a general allegation that the amendments overcome the prior art. Applicant does not particularly point out how the newly amended subject matter distinguishes over any particular portion of Kim. The examiner believes Kim, in combination with Meng, discloses the newly amended features, as articulated in the new grounds of rejection below.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Meng et al., "On Distillation of Guided Diffusion Models," arXiv:2210.03142, April 12, 2023, in view of Bo-Kyeong Kim et al., "On Architectural Compression of Text-to-Image Diffusion Models," arXiv:2305.15798v1, May 25, 2023.
Re claim 18 Meng discloses:
accessing a first latent diffusion machine learning model (see section 3.2, noting that the first model corresponds to the teacher model; see section 2, last paragraph, noting that the models used are latent diffusion models), the first latent diffusion machine learning model trained to perform a first number of denoising steps (see section 3.2, noting that the student network is a version of the teacher model with fewer steps than the teacher; see the caption of figure 4, noting that the steps are denoising steps; see section 2, last paragraph, noting that the models have latent spaces);
accessing a second latent diffusion machine learning model (see section 3.2, noting that the second model corresponds to the student model; see section 2, last paragraph, noting that the models have latent spaces) that was derived from the first latent diffusion machine learning model, the second latent diffusion machine learning model trained to perform a second number of denoising steps (see section 3.2, noting that the student network is a version of the teacher model with fewer steps than the teacher; see the caption of figure 4, noting that the steps are denoising steps);
generating noise data (see sections B.2 and B.3, noting that during training a noise element drawn from N(0, I) is used);
processing the noise data (see sections B.2 and B.3, noting that during training a noise element drawn from N(0, I) is used) via the first latent diffusion machine learning model to generate one or more first images (see section 3.3 and algorithm 2, noting that during training the output of the teacher model is differenced from that of the student and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9);
processing the noise data (see sections B.2 and B.3, noting that during training a noise element drawn from N(0, I) is used) via the second latent diffusion machine learning model to generate one or more second images (see section 3.3 and algorithm 2, noting that the output of the teacher model is differenced from that of the student and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9);
and modifying a parameter of the second latent diffusion machine learning model based on a comparison of the one or more first images with the one or more second images (see section 3.3 and algorithm 2, noting that the output of the teacher model is differenced from that of the student and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9).
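For illustration only, the teacher-student comparison mapped above can be sketched as follows. All names, weights, and dynamics here are hypothetical; this is not Meng's actual algorithm 2, only a minimal sketch of generating noise, running both models, comparing outputs, and modifying a student parameter:

```python
import random

random.seed(0)

# Hypothetical stand-ins for the teacher (first) and student (second) models:
# each "model" is just a per-element weight applied to the noise input.
teacher_w = [1.0, 2.0, 3.0]
student_w = [0.5, 0.5, 0.5]

def run_model(w, z):
    """Toy 'denoiser': produces an 'image' from noise z using weights w."""
    return [wi * zi for wi, zi in zip(w, z)]

def mse(a, b):
    """Mean squared error between two outputs."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

# 1. Generate noise data z ~ N(0, I).
z = [random.gauss(0.0, 1.0) for _ in range(3)]

# 2. Process the noise through both models to get first/second "images".
img_teacher = run_model(teacher_w, z)
img_student = run_model(student_w, z)

# 3. Compare the outputs and modify the student's parameters so its output
#    moves toward the teacher's (one gradient step on the MSE).
loss_before = mse(img_teacher, img_student)
lr = 0.1
student_w = [
    wi - lr * 2.0 * (si - ti) * zi / len(z)
    for wi, si, ti, zi in zip(student_w, img_student, img_teacher, z)
]
loss_after = mse(img_teacher, run_model(student_w, z))
assert loss_after < loss_before  # the student moved toward the teacher
```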
Meng does not expressly disclose the second latent diffusion machine learning model derived from the first latent diffusion machine learning model by adding or removing a particular ResNet block to an existing layer within the second latent diffusion machine learning model, wherein each of the first denoising steps execute a first number of ResNet blocks, and each of the second denoising steps execute a second number of ResNet blocks, the second number being smaller than the first number.
Kim discloses the second latent diffusion machine learning model derived from the first latent diffusion machine learning model by adding or removing a particular ResNet block to an existing layer within the second latent diffusion machine learning model, wherein each of the first denoising steps execute a first number of ResNet blocks, and each of the second denoising steps execute a second number of ResNet blocks, the second number being smaller than the first number (see section 3.1.1: "In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks. We hypothesize the existence of some unnecessary pairs and use the following removal strategies, as shown in Figure 3"; see also section 3: "We reduce this per-step computation, leading to Block-removed Knowledge-distilled SDMs"; the examiner notes that blocks are removed on a per-step basis, i.e., each step will execute fewer blocks in the reduced method).
The motivation to combine is "improved computational efficiency and initializes the compact model with the original weights by benefiting from the shared dimensionality. In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks" (see section 3.1.1). One of ordinary skill in the art could have used a U-Net with a reduced number of blocks as in Kim in the method of Meng to reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng and Kim to reach the aforementioned advantage.
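The per-stage block removal described in the cited portions of Kim can be sketched with a toy model (stage names and block labels hypothetical; Kim's actual removal strategies are in its section 3.1.1 and Figure 3). The compact second model is derived from the first by removing a residual/cross-attention pair, so every denoising step executes fewer ResNet blocks:

```python
# Toy U-Net: each stage is a list of block labels,
# R = ResNet (residual) block, A = cross-attention block.
first_model = {
    "stage1": ["R", "A", "R", "A"],
    "stage2": ["R", "A", "R", "A"],
}

def remove_pair(model, stage):
    """Derive a compact model by dropping the last (R, A) pair from a stage."""
    compact = {k: list(v) for k, v in model.items()}
    compact[stage] = compact[stage][:-2]
    return compact

second_model = remove_pair(first_model, "stage2")

def resnet_blocks_per_step(model):
    """Every denoising step executes all remaining ResNet blocks once."""
    return sum(v.count("R") for v in model.values())

first_count = resnet_blocks_per_step(first_model)    # blocks per step, original
second_count = resnet_blocks_per_step(second_model)  # blocks per step, compact
assert second_count < first_count  # the second number is smaller than the first
```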
Claim(s) 1-17, 19, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Meng et al., "On Distillation of Guided Diffusion Models," arXiv:2210.03142, April 12, 2023, in view of Bo-Kyeong Kim et al., "On Architectural Compression of Text-to-Image Diffusion Models," arXiv:2305.15798v1, May 25, 2023, in further view of Zheng, US 2024/0169500.
Re claim 1 Meng discloses:
accessing a first latent diffusion machine learning model (see section 3.2, noting that the first model corresponds to the teacher model; see section 2, last paragraph, noting that the models used are latent diffusion models), the first latent diffusion machine learning model trained to perform a first number of denoising steps (see section 3.2, noting that the student network is a version of the teacher model with fewer steps than the teacher; see the caption of figure 4, noting that the steps are denoising steps; see section 2, last paragraph, noting that the models have latent spaces);
accessing a second latent diffusion machine learning model (see section 3.2, noting that the second model corresponds to the student model; see section 2, last paragraph, noting that the models have latent spaces) that was derived from the first latent diffusion machine learning model, the second latent diffusion machine learning model trained to perform a second number of denoising steps (see section 3.2, noting that the student network is a version of the teacher model with fewer steps than the teacher; see the caption of figure 4, noting that the steps are denoising steps);
generating noise data (see sections B.2 and B.3, noting that during training a noise element drawn from N(0, I) is used);
processing the noise data (see sections B.2 and B.3, noting that during training a noise element drawn from N(0, I) is used) via the first latent diffusion machine learning model to generate one or more first images (see section 3.3 and algorithm 2, noting that during training the output of the teacher model is differenced from that of the student and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9);
processing the noise data (see sections B.2 and B.3, noting that during training a noise element drawn from N(0, I) is used) via the second latent diffusion machine learning model to generate one or more second images (see section 3.3 and algorithm 2, noting that the output of the teacher model is differenced from that of the student and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9);
and modifying a parameter of the second latent diffusion machine learning model based on a comparison of the one or more first images with the one or more second images (see section 3.3 and algorithm 2, noting that the output of the teacher model is differenced from that of the student and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9).
Meng does not expressly disclose the second latent diffusion machine learning model derived from the first latent diffusion machine learning model by adding or removing a particular ResNet block to an existing layer within the second latent diffusion machine learning model, wherein each of the first denoising steps execute a first number of ResNet blocks, and each of the second denoising steps execute a second number of ResNet blocks, the second number being smaller than the first number.
Kim discloses the second latent diffusion machine learning model derived from the first latent diffusion machine learning model by adding or removing a particular ResNet block to an existing layer within the second latent diffusion machine learning model, wherein each of the first denoising steps execute a first number of ResNet blocks, and each of the second denoising steps execute a second number of ResNet blocks, the second number being smaller than the first number (see section 3.1.1: "In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks. We hypothesize the existence of some unnecessary pairs and use the following removal strategies, as shown in Figure 3"; see also section 3: "We reduce this per-step computation, leading to Block-removed Knowledge-distilled SDMs"; the examiner notes that blocks are removed on a per-step basis, i.e., each step will execute fewer blocks in the reduced method).
The motivation to combine is "improved computational efficiency and initializes the compact model with the original weights by benefiting from the shared dimensionality. In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks" (see section 3.1.1). One of ordinary skill in the art could have used a U-Net with a reduced number of blocks as in Kim in the method of Meng to reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng and Kim to reach the aforementioned advantage.
While the examiner believes it is clearly implicit that Meng is intended to operate on a computer, Meng does not expressly disclose a system comprising: at least one processor; and at least one memory component storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations. Zheng discloses a system comprising: at least one processor; and at least one memory component storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations (see paragraphs 57 and 58). The motivation to combine is to implement the invention via a general-purpose computer (see paragraph 57). Therefore, it would have been obvious before the effective filing date of the claimed invention to combine Meng, Kim, and Zheng to reach the aforementioned advantage.
Re claim 2 Meng discloses wherein the operations further comprise:
restructuring the first latent diffusion machine learning model to perform a third number of denoising steps, the first number of denoising steps being larger than the third number of denoising steps (see section 3.3, noting that after a first distillation step with 2N steps at the teacher, in the next step a network with N steps is used as the teacher);
and restructuring the second latent diffusion machine learning model to perform a fourth number of denoising steps, the second number of denoising steps being larger than the fourth number of denoising steps (see section 3.3, noting that after a first distillation step with N steps at the student, in the next step a network with N/2 steps is used as the student),
wherein processing the noise data via the first latent diffusion machine learning model comprises processing the noise data via the restructured first latent diffusion machine learning model (see section 3.2, noting that training is performed again using the new teacher to further distill the network; see sections B.2 and B.3, noting that during training a noise element drawn from N(0, I) is used),
and wherein processing the noise data via the second latent diffusion machine learning model comprises processing the noise data via the restructured second latent diffusion machine learning model (see section 3.2, noting that training is performed again using the new student to further distill the network; see sections B.2 and B.3, noting that during training a noise element drawn from N(0, I) is used).
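The 2N/N/N/2 step relationships cited above can be checked arithmetically (the concrete value of N below is hypothetical; only the halving relationships come from the cited sections):

```python
# Progressive halving of denoising steps, as in the cited 2N -> N -> N/2 pattern.
N = 8
first = 2 * N        # first number of denoising steps (original teacher, 2N)
second = N           # second number (original student, N)
third = first // 2   # third number: half the first (restructured teacher, N)
fourth = third // 2  # fourth number: half the third (restructured student, N/2)

# The relationships recited in claims 3-5:
assert third == first // 2      # claim 3: third is half the first
assert second == 2 * fourth     # claim 4: second is double the fourth
assert fourth == third // 2     # claim 5: fourth is half the third
```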
Re claim 3 Meng further discloses wherein the third number of denoising steps is half the first number of denoising steps (see section 3.2, noting that the third number is N and the first number is 2N).
Re claim 4 Meng further discloses wherein the second number of denoising steps is double the fourth number of denoising steps (see section 3.2, noting that the fourth number is N/2 and the second number is N).
Re claim 5 Meng further discloses wherein the fourth number of denoising steps is half the third number of denoising steps (see section 3.2, noting that the third number is N and the fourth number is N/2).
Re claim 6 Meng further discloses wherein the first and second latent diffusion machine learning models are stable diffusion models (see abstract, noting that the latent diffusion models are "stable diffusion models").
Meng and Zheng do not clearly disclose including cross-attention blocks and ResNet blocks. Kim discloses a diffusion network including cross-attention blocks and ResNet blocks (see figure 3, noting that Kim uses several residual blocks and cross-attention blocks). The examiner notes that Meng discloses using a U-Net architecture (see section 3.1, last paragraph), and Kim discloses a similar U-Net architecture. The motivation to combine is "improved computational efficiency and initializes the compact model with the original weights by benefiting from the shared dimensionality. In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks" (see section 3.1.1). One of ordinary skill in the art could have used a U-Net with a reduced number of blocks as in Kim in the method of Meng to reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng, Zheng, and Kim to reach the aforementioned advantage.
Re claim 7 Meng further discloses wherein the second latent diffusion machine learning model is derived from the first latent diffusion machine learning model by restructuring a UNet architecture of the first latent diffusion machine learning model (see section 3.1, last paragraph: "The model architecture we use is a U-Net model similar to the ones used in [6] for pixel-space diffusion models and [1, 28] for latent-space diffusion models"; see sections 3.1 and 3.2, noting that the model is restructured using a teacher-student model).
Re claim 8 Meng and Zheng do not expressly disclose wherein restructuring of the UNet architecture includes changing the architecture of cross attention. Kim further discloses wherein restructuring of the UNet architecture includes changing the architecture of cross-attention and ResNet blocks (see section 3.1.1: "In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks. We hypothesize the existence of some unnecessary pairs and use the following removal strategies, as shown in Figure 3").
The motivation to combine is "improved computational efficiency and initializes the compact model with the original weights by benefiting from the shared dimensionality. In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks" (see section 3.1.1). One of ordinary skill in the art could have used a U-Net with a reduced number of blocks as in Kim in the method of Meng to reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng, Zheng, and Kim to reach the aforementioned advantage.
Re claim 9 Meng further discloses wherein the first latent diffusion machine learning model includes a denoising architecture, and wherein the first number of denoising steps include a number of iterations for transmitting the output of a prior iteration as input to a current iteration of the denoising architecture (see section 2, second paragraph, equation 2, noting that the output of the previous step is used to calculate the next portion for N sampling steps corresponding to the number of iterations).
Re claim 10 Meng discloses wherein processing the noise data via the first latent diffusion machine learning model to generate one or more first images includes iteratively processing the noise data via the denoising architecture for the first number of denoising steps (see section 3.2, noting that the student network is a version of the teacher model with fewer steps than the teacher; see the caption of figure 4, noting that the steps are denoising steps; see section 2, last paragraph, noting that the models have latent spaces) to generate the one or more first images (see section 3.3; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9).
Meng and Zheng do not expressly disclose processing the noise data via the denoising architecture for the first number of denoising steps to generate first latent features, and processing the generated first latent features via a decoder of the first latent diffusion machine learning model to generate the one or more first images.
Kim discloses processing the noise data via the denoising architecture for the first number of denoising steps to generate first latent features (see section 3, first paragraph: "Conditioned on the text and time-step embeddings, the U-Net performs multiple denoising steps on latent representations. At each denoising step, the U-Net produces the noise residual to compute the latent for the next step (see the top part of Figure 3)"; note that denoising steps are performed on latent representations), and processing the generated first latent features via a decoder of the first latent diffusion machine learning model to generate the one or more first images (see section 1, second paragraph: "Within a SDM, a U-Net [50, 6] conducts an iterative sampling procedure to gradually eliminate noise from random latents and is assisted by a text encoder [43] and an image decoder [9, 64] to produce text-aligned images"; note that the latent data are processed by an image decoder to produce the image).
The motivation to combine is "improved computational efficiency and initializes the compact model with the original weights by benefiting from the shared dimensionality. In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks" (see section 3.1.1). One of ordinary skill in the art could have used a U-Net with a reduced number of blocks and the encoding and decoding method described in Kim in the method of Meng to produce the image and reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng, Zheng, and Kim to reach the aforementioned advantage.
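The denoise-then-decode pipeline cited from Kim can be sketched as follows (the denoiser, decoder, and step count are hypothetical toys; the only point illustrated is that denoising steps operate on latent features and a decoder is applied afterward to produce the image):

```python
# Toy latent-diffusion inference: denoise in latent space, then decode.
def denoise_step(latent):
    """Toy U-Net step: shrink the 'noise' in the latent features."""
    return [0.5 * v for v in latent]

def decode(latent):
    """Toy image decoder: map latents to a larger 'image' (2x upsample)."""
    return [v for v in latent for _ in (0, 1)]

noise = [1.0, -2.0, 0.5, 3.0]      # initial noise data in latent space
latent = noise
for _ in range(10):                # the first number of denoising steps
    latent = denoise_step(latent)  # generate the first latent features

image = decode(latent)             # the decoder produces the first image
assert len(image) == 2 * len(noise)  # decoded output is larger than the latent
```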
Re claim 11 Meng discloses wherein processing the noise data via the second latent diffusion machine learning model to generate one or more second images includes iteratively processing the noise data via the denoising architecture for a third number of denoising steps (see section 3.2, noting that the student network is a version of the teacher model with fewer steps than the teacher; see the caption of figure 4, noting that the steps are denoising steps; see section 2, last paragraph, noting that the models have latent spaces and that multiple iterations are performed) to generate the one or more first images (see section 3.3; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9).
Meng does not expressly disclose processing the noise data via the denoising architecture to generate second latent features, and processing the generated second latent features via the decoder of the second latent diffusion machine learning model to generate the one or more first images.
Kim discloses processing the noise data via the denoising architecture to generate second latent features (see section 3, first paragraph: "Conditioned on the text and time-step embeddings, the U-Net performs multiple denoising steps on latent representations. At each denoising step, the U-Net produces the noise residual to compute the latent for the next step (see the top part of Figure 3)"), and processing the generated second latent features via the decoder of the second latent diffusion machine learning model to generate the one or more first images (see section 1, second paragraph: "Within a SDM, a U-Net [50, 6] conducts an iterative sampling procedure to gradually eliminate noise from random latents and is assisted by a text encoder [43] and an image decoder [9, 64] to produce text-aligned images"; note that the latent data are processed by an image decoder to produce the image).
The motivation to combine is "improved computational efficiency and initializes the compact model with the original weights by benefiting from the shared dimensionality. In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks" (see section 3.1.1). One of ordinary skill in the art could have used a U-Net with a reduced number of blocks and the encoding and decoding method described in Kim in the method of Meng to produce the image and reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng, Zheng, and Kim to reach the aforementioned advantage.
Re claim 12 Kim further discloses wherein the decoder of the second latent diffusion machine learning model is the same as the decoder of the first latent diffusion machine learning model (see section 1, second paragraph: "Within a SDM, a U-Net [50, 6] conducts an iterative sampling procedure to gradually eliminate noise from random latents and is assisted by a text encoder [43] and an image decoder [9, 64] to produce text-aligned images"; note that the latent data are processed by an image decoder to produce the image). Further, regarding section 3, first paragraph, and figure 3, the examiner notes that in the teacher-student model only the U-Net is distilled and reduced; because the U-Net is not part of the decoder, the decoder network would be the same for the student and the teacher. See also section 1, last paragraph: "We compress SDMs by removing architectural blocks from the U-Net, achieving up to 51% reduction in model size and 43% improvement in latency on CPU and GPU. We also introduce an interesting finding on the minor role of innermost blocks."
Re claim 13 Meng discloses wherein the comparison of the one or more first images with the one or more second images is based on a mean squared error between the images (see algorithm 2, noting that the loss function includes a mean squared error between the outputs of the teacher and student models; see also section 3.2, noting that the student is trained to match the output of the teacher; see also section 2, first paragraph, noting that the notation ‖·‖₂² in a loss function corresponds to the mean squared error).
Re claim 14 Meng discloses wherein the parameter is changed based on the value of the mean squared error (see algorithm 2, noting that the loss function includes a mean squared error between the outputs of the teacher and student models; see also section 3.2, noting that the student is trained to match the output of the teacher; see also section 2, first paragraph, noting that the notation ‖·‖₂² in a loss function corresponds to the mean squared error).
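The correspondence between the ‖·‖₂² notation and the mean squared error noted above can be verified numerically (the two output vectors below are hypothetical examples, not values from the references):

```python
# The squared L2 norm of the difference, divided by the number of elements,
# equals the mean squared error between the two outputs.
a = [1.0, 2.0, 3.0]   # hypothetical teacher output
b = [1.5, 1.0, 3.0]   # hypothetical student output

sq_l2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))  # ||a - b||_2^2
mse = sq_l2 / len(a)                                 # mean squared error

assert sq_l2 == 1.25                  # 0.25 + 1.0 + 0.0
assert abs(mse - 1.25 / 3) < 1e-12
```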
Re claim 15 Meng and Kim do not expressly disclose wherein processing the noise data via the first latent diffusion machine learning model includes adding random noise and an output image generated during a previous iteration to the first latent diffusion machine learning model causing the generation of the output image of the current iteration. Zheng further discloses wherein processing the noise data via the first latent diffusion machine learning model includes adding random noise and an output image generated during a previous iteration to the first latent diffusion machine learning model causing the generation of the output image of the current iteration (see paragraph 25: "In some examples, a diffusion model takes a noisy image (x.sub.t) and predicts the noise corresponding to the noisy image. During reverse diffusion, the diffusion model denoises the noisy image at each step and generates a less noisy image (x.sub.t−1). Instead of predicting the random noise, embodiments of the present disclosure predict an estimated clean image or x.sub.0 and then add noise back to the estimated clean image to obtain a noisy output image. Then the diffusion model generates x.sub.t−1 based on the noisy output image. At the next iteration, the diffusion model takes noisy image x.sub.t−1 as input, predicts a new estimated clean image, and repeats the same operation for the subsequent iterations. The new estimated clean image is better than the estimated clean image previously generated (e.g., the new estimated clean image has higher image quality and is less blurry). Accordingly, at each denoising step, the diffusion model is configured to predict an estimated clean image and adds noise back to the estimated clean image."). The motivation to combine is to improve the quality of the image (see paragraph 25).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng, Kim, and Zheng to reach the aforementioned advantage.
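The predict-then-renoise iteration quoted from Zheng's paragraph 25 can be sketched as follows. The predictor and noise schedule here are hypothetical toy dynamics, not Zheng's actual equations; the sketch only illustrates predicting an estimated clean image at each step and adding random noise back before the next iteration:

```python
import random

random.seed(2)

def predict_x0(x_t):
    """Toy predictor: moves the estimate halfway toward the clean image,
    which in this toy setup is all zeros."""
    return [0.5 * v for v in x_t]

x_t = [4.0, -3.0, 2.5, -4.0]          # very noisy starting image
for t in range(8, 0, -1):
    x0_hat = predict_x0(x_t)          # estimated clean image x0
    # Add noise back (shrinking with t) to obtain the next input x_{t-1}.
    x_t = [v + random.gauss(0.0, 0.1 * t / 8) for v in x0_hat]

# After iterating, the output is close to the all-zero clean image.
assert max(abs(v) for v in x_t) < 1.0
```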
Re claim 16 Meng discloses receiving a prompt for image generation from a user; and processing the prompt via the second latent diffusion machine learning model with the modified parameter to generate one or more user-requested images (see, for example, figure 13 and the associated caption, noting that the output images are generated by the distilled model (corresponding to the trained second diffusion model) from a text prompt to generate the requested images).
Re claim 17 Meng further discloses wherein the output of the second latent diffusion machine learning model is processed through a decoder to generate the output image of the current iteration (see, for example, section 4.3, noting that the diffusion model may contain an encoder or decoder).
Re claim 19 Meng and Kim does not expressly disclose wherein processing the noise data via the first latent diffusion machine learning model includes adding random noise and an output image generated during a previous iteration to the first latent diffusion machine learning model causing the generation of the output image of the current iteration. Zheng further discloses herein processing the noise data via the first latent diffusion machine learning model includes adding random noise and an output image generated during a previous iteration to the first latent diffusion machine learning model causing the generation of the output image of the current iteration (see paragraph 25 “In some examples, a diffusion model takes a noisy image (x.sub.t) and predicts the noise corresponding to the noisy image. During reverse diffusion, the diffusion model denoises the noisy image at each step and generates a less noisy image (x.sub.t−1). Instead of predicting the random noise, embodiments of the present disclosure predict an estimated clean image or x.sub.0 and then add noise back to the estimated clean image to obtain a noisy output image. Then the diffusion model generates x.sub.t−1 based on the noisy output image. At the next iteration, the diffusion model takes noisy image x.sub.t−1 as input, predicts a new estimated clean image, and repeats the same operation for the subsequent iterations. The new estimated clean image is better than the estimated clean image previously generated (e.g., the new estimated clean image has higher image quality and is less blurry). Accordingly, at each denoising step, the diffusion model is configured to predict an estimated clean image and adds noise back to the estimated clean image.”) The motivation to combine to improve the quality of the image see paragraph 25. 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng, Kim, and Zheng to reach the aforementioned advantage.
Re claim 20, Meng discloses:
accessing a first latent diffusion machine learning model (see section 3.2, noting that the first model corresponds to the teacher model; see section 2, last paragraph, noting that the models used are latent diffusion models), the first latent diffusion machine learning model trained to perform a first number of denoising steps (see section 3.2, noting that the student network is a version of the teacher model with fewer steps than the teacher; see the caption of figure 4, noting that the steps are denoising steps; see section 2, last paragraph, noting that the models have latent spaces);
accessing a second latent diffusion machine learning model (see section 3.2, noting that the second model corresponds to the student model; see section 2, last paragraph, noting that the models have latent spaces) that was derived from the first latent diffusion machine learning model, the second latent diffusion machine learning model trained to perform a second number of denoising steps (see section 3.2, noting that the student network is a version of the teacher model with fewer steps than the teacher; see the caption of figure 4, noting that the steps are denoising steps);
generating noise data (see sections B2 and B3, noting that during training a noise element N(0, I) is used);
processing the noise data (see sections B2 and B3, noting that during training a noise element N(0, I) is used) via the first latent diffusion machine learning model to generate one or more first images (see section 3.3 and algorithm 2, noting that during training the output of the teacher model is differenced from that of the student, and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9);
processing the noise data (see sections B2 and B3, noting that during training a noise element N(0, I) is used) via the second latent diffusion machine learning model to generate one or more second images (see section 3.3 and algorithm 2, noting that the output of the teacher model is differenced from that of the student, and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9);
and modifying a parameter of the second latent diffusion machine learning model based on a comparison of the one or more first images with the one or more second images (see section 3.3 and algorithm 2, noting that the output of the teacher model is differenced from that of the student, and the network is trained such that the outputs match; the examiner further notes that the outputs of the networks are images, see figures 4, 8, and 9).
Meng does not expressly disclose the second latent diffusion machine learning model derived from the first latent diffusion machine learning model by adding or removing a particular ResNet block to an existing layer within the second latent diffusion machine learning model, wherein each of the first denoising steps execute a first number of ResNet blocks, and each of the second denoising steps execute a second number of ResNet blocks, the second number being smaller than the first number.
Kim discloses the second latent diffusion machine learning model derived from the first latent diffusion machine learning model by adding or removing a particular ResNet block to an existing layer within the second latent diffusion machine learning model, wherein each of the first denoising steps execute a first number of ResNet blocks, and each of the second denoising steps execute a second number of ResNet blocks, the second number being smaller than the first number (see section 3.1.1: "In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks. We hypothesize the existence of some unnecessary pairs and use the following removal strategies, as shown in Figure 3"; see also section 3: "We reduce this per-step computation, leading to Block-removed Knowledge-distilled SDMs"; the examiner notes that blocks are removed on a per-step basis, i.e., each step will have fewer blocks in the reduced method).
The motivation to combine is "improved computational efficiency and initializes the compact model with the original weights by benefiting from the shared dimensionality. In the original U-Net, each stage with a common spatial size consists of multiple blocks, and most stages contain pairs of residual (R) [12] and cross-attention (A) [65, 20] blocks"; see section 3.1.1. One of ordinary skill in the art could have used a U-Net with a reduced number of blocks, as in Kim, in the method of Meng to reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Meng and Kim to reach the aforementioned advantage.
While the examiner believes it is clearly implicit that Meng is intended to operate on a computer, Meng and Kim do not expressly disclose a non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to perform the claimed steps. Zheng discloses a non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to perform the claimed steps (see paragraphs 57 and 58). The motivation to combine is to implement the invention via a general-purpose computer; see paragraph 57. Therefore, it would have been obvious before the effective filing date of the claimed invention to combine Meng, Kim, and Zheng to reach the aforementioned advantage.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T MOTSINGER whose telephone number is (571)270-1237. The examiner can normally be reached 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chineyere Wills-Burns can be reached at (571) 272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEAN T MOTSINGER/Primary Examiner, Art Unit 2673