Prosecution Insights
Last updated: April 19, 2026
Application No. 18/492,572

HARDWARE-AWARE EFFICIENT ARCHITECTURES FOR TEXT-TO-IMAGE DIFFUSION MODELS

Final Rejection — §103

Filed: Oct 23, 2023
Examiner: KY, KEVIN
Art Unit: 2671
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 2 (Final)

Grant Probability: 76% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 6m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 76% — above average (420 granted / 549 resolved; +14.5% vs TC average)
Interview Lift: +25.3% — strong (resolved cases with an interview vs. without)
Typical Timeline: 2y 6m average prosecution; 33 applications currently pending
Career History: 582 total applications across all art units
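The headline figures above are simple ratios of the career counts. As a sanity check, the allow rate and the implied Tech Center average can be recomputed; the rounding convention and the derivation of the TC average are assumptions, since the page does not state them:

```python
# Recompute the examiner's headline statistics from the raw career counts.
# Rounding and the TC-average derivation are assumptions, not stated by the page.
granted = 420
resolved = 549

allow_rate = granted / resolved            # career allow rate (~0.765, shown as 76%)
tc_delta_points = 14.5                     # reported lift vs Tech Center average
tc_average = allow_rate * 100 - tc_delta_points  # implied TC average allow rate

print(f"allow rate: {allow_rate:.1%}")
print(f"implied TC average: {tc_average:.1f}%")
```

Note the dashboard appears to truncate 76.5% down to 76%.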

Statute-Specific Performance

§101: 17.6% (-22.4% vs TC avg)
§103: 46.5% (+6.5% vs TC avg)
§102: 20.8% (-19.2% vs TC avg)
§112: 9.9% (-30.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 549 resolved cases.

Office Action (§103)
DETAILED ACTION

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) (e.g., claims 19-24) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

Referring to the specification as filed, the apparatus in claims 19-24 corresponds to Fig. 1, a system-on-a-chip (SOC) 100, which may include a central processing unit (CPU) 102 or a multi-core CPU configured for text-to-image diffusion models. ¶30 further discloses “the general-purpose processor 102 may include means for receiving, means for generating, and means for training”.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7-10, 13-16, and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (US 20240281924) in view of Korviakov et al. (US 20230394285), in further view of Karpman et al. (US 11995803 B1).

Regarding claim 1, Park discloses an apparatus (Fig. 5 apparatus 500), comprising:

at least one memory (¶32 One or more embodiments of the apparatus include at least one processor; at least one memory storing instructions executable by the at least one processor); and

at least one processor coupled to the at least one memory, the at least one processor configured to (¶32 One or more embodiments of the apparatus include at least one processor; at least one memory storing instructions executable by the at least one processor):

receive a text-semantic input (Fig. 10 text prompt 1005) at a first stage of a neural network, the first stage including a first convolutional block (¶125 As an example shown in FIG. 10, the layer at 16-pixel includes 5 blocks of the interleaved attention and convolutional layers. Here, 16-pixel means 16-by-16 pixels. The layer at 32-pixel includes 5 blocks of the interleaved attention and convolutional layers. Similarly, layers at 64-pixel, 128-pixel, and 256-pixel include 5 blocks of the interleaved attention and convolutional layers) and no attention layers;

receive, at a second stage, the first output from the first stage, the second stage comprising a first down sampling block including a first attention layer and a second convolutional block (¶125 As an example shown in FIG. 10, the layer at 16-pixel includes 5 blocks of the interleaved attention and convolutional layers. Here, 16-pixel means 16-by-16 pixels. The layer at 32-pixel includes 5 blocks of the interleaved attention and convolutional layers. Similarly, layers at 64-pixel, 128-pixel, and 256-pixel include 5 blocks of the interleaved attention and convolutional layers; image generation network 1045 includes downsampling residual blocks and then upsampling residual blocks, where a layer of the downsampling residual blocks is connected to a layer of the upsampling residual blocks by a skip connection in a U-net architecture.);

receive, at a third stage, a second output from the second stage, the third stage comprising a first up sampling block including a second attention layer and a first set of convolutional blocks (¶125 As an example shown in FIG. 10, the layer at 16-pixel includes 5 blocks of the interleaved attention and convolutional layers. Here, 16-pixel means 16-by-16 pixels. The layer at 32-pixel includes 5 blocks of the interleaved attention and convolutional layers. Similarly, layers at 64-pixel, 128-pixel, and 256-pixel include 5 blocks of the interleaved attention and convolutional layers; image generation network 1045 includes downsampling residual blocks and then upsampling residual blocks, where a layer of the downsampling residual blocks is connected to a layer of the upsampling residual blocks by a skip connection in a U-net architecture.);

receive, at a fourth stage, the first output from the first stage and a third output from the third stage, the fourth stage comprising a second up sampling block including no attention layers and a second set of convolutional blocks (¶125 In some cases, skip connections 1050 in the asymmetric U-Net architecture exist between layers at the same resolution. For example, image generation network 1045 includes downsampling residual blocks and then upsampling residual blocks, where a layer of the downsampling residual blocks is connected to a layer of the upsampling residual blocks by a skip connection in a U-net architecture); and

generate an image at the fourth stage, based on the text-semantic input (¶124 where the input low-resolution image 1015 (64-pixel image) passes through 3 downsampling residual blocks and then 6 upsampling residual blocks with attention layers to generate the high-resolution image 1055 (512-pixel image)).

Park fails to specifically teach, but Korviakov teaches, generating a first output comprising a feature map at the first stage (¶17 CNN is a deep learning neural network, wherein one or more building blocks are based on a convolution operation; ¶18 The input data may be related to any kind of data, for example, image data, text data, voice data, etc.; ¶19 the device may perform a convolution operation, which may be, for example, an operation that transforms input feature maps having the first number of channels into output feature maps); and fails to specifically teach, but Karpman teaches, the first stage including a first convolutional block and no attention layers (col 5 lines 15-36 & 45-50 the base image diffusion model 120 defines a deep learning network (e.g., a convolutional neural network, a residual neural network, etc.) configured (e.g., through the training described) to generate images from random (e.g., Gaussian) noise based on text prompts and/or descriptions. The base image diffusion model 120 can include a U-net architecture (e.g., Efficient U-Net) defined from residual and multi-head attention blocks that enable the base image diffusion model 120 to progressively denoise (e.g., infill, generate, augment) image data according to cross-attention inputs based on the text prompt; self-attention layers in the base diffusion model architecture can be omitted to improve memory efficiency and inference time), and the fourth stage comprising a second up sampling block including no attention layers (col 5 lines 15-36 & 45-50 system can then pass the base image to the set of high-resolution diffusion models 116 for upsampling and output; self-attention layers in the base diffusion model architecture can be omitted to improve memory efficiency and inference time).

Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of generating a first output comprising a feature map at the first stage from Korviakov, and the teaching of the first stage including a first convolutional block and no attention layers and the fourth stage comprising a second up sampling block including no attention layers from Karpman, into the apparatus as disclosed by Park. The motivation for doing so is to improve training neural networks to perform tasks and further to improve memory efficiency and inference time.

Regarding claim 2, the combination of Park, Korviakov and Karpman discloses the apparatus of claim 1, in which the neural network comprises a text-to-image diffusion-based generative model (Karpman col. 2 lines 51-55 Text-to-image diffusion model 112 may be a probabilistic generative model used to generate image data). The motivation to combine the references is discussed above in the rejection of claim 1.
Regarding claim 3, the combination of Park, Korviakov and Karpman discloses the apparatus of claim 1, in which the neural network comprises a UNet (Park ¶124 As an example shown in FIG. 10, image generation network 1045 is rearranged to an asymmetric U-Net architecture).

Regarding claim 4, the combination of Park, Korviakov and Karpman discloses the apparatus of claim 1, in which the first stage comprises a first additional convolutional block, the second stage comprises a second additional convolutional block, the third stage comprises a third additional convolutional block, and the fourth stage comprises a fourth additional convolutional block (Park ¶125 As an example shown in FIG. 10, the layer at 16-pixel includes 5 blocks of the interleaved attention and convolutional layers. Here, 16-pixel means 16-by-16 pixels. The layer at 32-pixel includes 5 blocks of the interleaved attention and convolutional layers. Similarly, layers at 64-pixel, 128-pixel, and 256-pixel include 5 blocks of the interleaved attention and convolutional layers; image generation network 1045 includes downsampling residual blocks and then upsampling residual blocks, where a layer of the downsampling residual blocks is connected to a layer of the upsampling residual blocks by a skip connection in a U-net architecture.).

Regarding claims 7-10 (drawn to a method): The proposed combination of Park, Korviakov and Karpman, explained in the rejection of apparatus claims 1-4, renders obvious the steps of the method of claims 7-10 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claims 1-4 are equally applicable to claims 7-10.
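The dispute over claim 1 centers on where attention layers sit in the claimed four-stage U-Net: attention only in the middle down-/up-sampling stages, none in the first or fourth. A minimal structural sketch of that topology (the `Stage` record and all names are illustrative, not taken from the application or the cited art):

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    kind: str        # "conv", "down", or "up" (illustrative labels)
    conv_blocks: int
    attention: bool

# Hypothetical rendering of the claim 1 topology: attention appears only in
# the middle sampling stages; the first and fourth stages have none, and the
# fourth stage additionally takes a skip connection from the first.
CLAIMED_PIPELINE = [
    Stage("stage1", "conv", 1, attention=False),  # first convolutional block, no attention
    Stage("stage2", "down", 1, attention=True),   # down sampling block + first attention layer
    Stage("stage3", "up", 1, attention=True),     # up sampling block + second attention layer
    Stage("stage4", "up", 1, attention=False),    # up sampling block, no attention layers
]

def attention_stage_names(pipeline):
    """Names of the stages whose blocks contain an attention layer."""
    return [s.name for s in pipeline if s.attention]
```

Under this sketch, `attention_stage_names(CLAIMED_PIPELINE)` yields only the middle stages, which is the asymmetry the applicant argues over Park's interleaved-attention layers at every resolution.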
Regarding claims 13-16 (drawn to a CRM): The proposed combination of Park, Korviakov and Karpman, explained in the rejection of apparatus claims 1-4, renders obvious the steps of the computer-readable medium of claims 13-16 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claims 1-4 are equally applicable to claims 13-16. See Park ¶81-83.

Regarding claims 19-22 (drawn to an apparatus): The proposed combination of Park, Korviakov and Karpman, explained in the rejection of apparatus claims 1-4, renders obvious the steps of the system of claims 19-22 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claims 1-4 are equally applicable to claims 19-22. See Park ¶81-83.

Claims 5, 11, 17, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Park, Korviakov and Karpman as applied to claims 4, 10, 16 and 22 above, and further in view of Guo et al. (US 20230351185).

Regarding claim 5, the combination of Park, Korviakov and Karpman discloses the apparatus of claim 4, but fails to teach, where Guo teaches, in which the at least one processor is further configured to: train the neural network to obtain a converged neural network (¶44 Referring to FIG. 2, in response to each pruning algorithm pruning the neural network, the processor 130 retrains the pruned neural network (step S220). Specifically, after each pruning, the processor 130 may retrain the pruned neural network. When the neural network (model) converges, the processor 130 may use another pruning algorithm to prune the pruned neural network); and train a pruned neural network based on the converged neural network (¶44 Specifically, after each pruning, the processor 130 may retrain the pruned neural network. When the neural network (model) converges, the processor 130 may use another pruning algorithm to prune the pruned neural network. For example, the neural network is retrained after channel pruning, and when the neural network converges, weight pruning is then performed).

Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of in which the at least one processor is further configured to: train the neural network to obtain a converged neural network, and train a pruned neural network based on the converged neural network, from Guo into the apparatus as disclosed by the combination of Park, Korviakov and Karpman. The motivation for doing so is to improve techniques for optimizing neural networks.

Regarding claim 11 (drawn to a method): The proposed combination of Park, Korviakov, Karpman, and Guo, explained in the rejection of apparatus claim 5, renders obvious the steps of the method of claim 11 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 5 are equally applicable to claim 11.

Regarding claim 17 (drawn to a CRM): The proposed combination of Park, Korviakov, Karpman and Guo, explained in the rejection of apparatus claim 5, renders obvious the steps of the computer-readable medium of claim 17 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 5 are equally applicable to claim 17. See Park ¶81-83.
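Guo's cycle, as characterized in the rejection of claim 5, alternates a pruning pass with retraining to convergence before the next pass. A toy sketch of that loop, assuming simple magnitude pruning over a flat weight list (function names and the weight representation are illustrative, not Guo's):

```python
def prune_smallest(weights, fraction):
    """Zero out the smallest-magnitude `fraction` of weights (magnitude pruning)."""
    k = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned

def prune_then_retrain(weights, schedule, retrain):
    """Alternate pruning and retraining, per the prune/retrain-to-convergence
    cycle the rejection attributes to Guo: each pruning pass is followed by
    retraining until the model converges before the next pass begins."""
    for fraction in schedule:
        weights = prune_smallest(weights, fraction)
        weights = retrain(weights)  # stand-in for retraining to convergence
    return weights
```

For example, `prune_smallest([0.5, -0.1, 0.3, 0.05], 0.5)` zeros the two smallest-magnitude entries, leaving `[0.5, 0.0, 0.3, 0.0]`.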
Regarding claim 23 (drawn to an apparatus): The proposed combination of Park, Korviakov, Karpman, and Guo, explained in the rejection of apparatus claim 5, renders obvious the steps of the system of claim 23 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 5 are equally applicable to claim 23. See Park ¶81-83.

Claims 6, 12, 18, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Park, Korviakov, Karpman, and Guo as applied to claims 5, 11, 17, and 23 above, and further in view of Fukuda et al. (US 20200034702).

Regarding claim 6, the combination of Park, Korviakov, Karpman, and Guo discloses the apparatus of claim 5, in which the converged neural network comprises a teacher neural network (Guo Fig. 7, e.g., trained neural network) and the pruned neural network comprises a student neural network (Guo Fig. 7, e.g., pruned neural network), but fails to teach, where Fukuda teaches, the at least one processor is further configured to train the student neural network based on a block-wise error calculation for each stage of the student neural network relative to a same stage of the teacher neural network (¶56 At block 340, a student training section may train a student neural network with a teacher input data and the corresponding soft label output obtained at the most recent iteration of block 330. For example, in the embodiment of FIG. 4, at the first iteration, the student training section may train the student neural network, at block 340, with Input Data 1 and a soft label output that the Teacher NN1 has output in response to receiving Input Data 1. In an embodiment, the student training section, at block 340, may train the student neural network such that soft label errors between (1) a soft label output generated by the student neural network in response to receiving the teacher input data (e.g., Input Data 1) and (2) the soft label output generated by the selected teacher neural network (e.g., Teacher NN1) in response to receiving the same teacher input data, is minimized).

Therefore, it would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to have implemented the teaching of the at least one processor is further configured to train the student neural network based on a block-wise error calculation for each stage of the student neural network relative to a same stage of the teacher neural network, from Fukuda into the apparatus as disclosed by the combination of Park, Korviakov, Karpman, and Guo. The motivation for doing so is to improve training a student neural network with a teacher neural network.

Regarding claim 12 (drawn to a method): The proposed combination of Park, Korviakov, Karpman, Guo, and Fukuda, explained in the rejection of apparatus claim 6, renders obvious the steps of the method of claim 12 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 6 are equally applicable to claim 12.

Regarding claim 18 (drawn to a CRM): The proposed combination of Park, Korviakov, Karpman, Guo, and Fukuda, explained in the rejection of apparatus claim 6, renders obvious the steps of the computer-readable medium of claim 18 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 6 are equally applicable to claim 18. See Park ¶81-83.
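The block-wise error that claim 6 recites can be sketched as a per-stage loss summed over matching student/teacher feature maps. Note this is a generic distillation sketch of the *claim* language, not of Fukuda, whose quoted ¶56 minimizes soft-label errors at the output rather than intermediate block errors; all names and the list-of-lists feature representation are illustrative:

```python
def blockwise_distillation_loss(student_feats, teacher_feats):
    """Sum of per-stage mean-squared errors between the student's and
    teacher's feature maps at the same stage (block-wise error calculation)."""
    assert len(student_feats) == len(teacher_feats), "stage counts must match"
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        assert len(s) == len(t), "feature maps at a stage must align"
        total += sum((si - ti) ** 2 for si, ti in zip(s, t)) / len(s)
    return total
```

For a single two-element stage, `blockwise_distillation_loss([[1.0, 2.0]], [[1.0, 4.0]])` is ((0)² + (2)²) / 2 = 2.0. The applicant may find this stage-wise/output-wise distinction useful when arguing Fukuda's soft-label matching does not reach the claimed per-stage calculation.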
Regarding claim 24 (drawn to an apparatus): The proposed combination of Park, Korviakov, Karpman, Guo, and Fukuda, explained in the rejection of apparatus claim 6, renders obvious the steps of the system of claim 24 because these steps occur in the operation of the proposed combination as discussed above. Thus, the arguments presented above for claim 6 are equally applicable to claim 24. See Park ¶81-83.

Response to Arguments

Applicant’s arguments with respect to claims 1-24 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEVIN KY whose telephone number is (571) 272-7648. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at 571-272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KEVIN KY/
Primary Examiner, Art Unit 2671

Prosecution Timeline

Oct 23, 2023: Application Filed
Oct 15, 2025: Non-Final Rejection — §103
Dec 10, 2025: Examiner Interview Summary
Dec 10, 2025: Applicant Interview (Telephonic)
Dec 11, 2025: Response Filed
Mar 09, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597158: POSE ESTIMATION (granted Apr 07, 2026; 2y 5m to grant)
Patent 12597291: IMAGE ANALYSIS FOR PERSONAL INTERACTION (granted Apr 07, 2026; 2y 5m to grant)
Patent 12586393: KNOWLEDGE-DRIVEN SCENE PRIORS FOR SEMANTIC AUDIO-VISUAL EMBODIED NAVIGATION (granted Mar 24, 2026; 2y 5m to grant)
Patent 12586559: METHOD AND APPARATUS FOR GENERATING SPEECH OUTPUTS IN A VEHICLE (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579382: NATURAL LANGUAGE GENERATION USING KNOWLEDGE GRAPH INCORPORATING TEXTUAL SUMMARIES (granted Mar 17, 2026; 2y 5m to grant)

Based on the 5 most recent grants; study what changed in these applications to get past this examiner.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 76%
With Interview: 99% (+25.3% lift)
Median Time to Grant: 2y 6m
PTA Risk: Moderate
Based on 549 resolved cases by this examiner. Grant probability derived from career allow rate.
