Detailed Office Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner cites particular columns and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
Status: All replies and correspondence should be addressed to the Examiner’s Art Unit 2629. Receipt is acknowledged of the papers submitted on 06-05-2024 under the new application, which have been placed of record in the file. Claims 1-25 are pending.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. Applicant has complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c).
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05-23-2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5 and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over WANG JONG et al. (CN 116007937 A), hereinafter WANG et al., in view of TAGHIZADEH; Mohammad (US-20220150661-A1), hereinafter TAGHIZADEH et al., and MRINI; KHALIL et al. (US-20210279414-A1), hereinafter MRINI et al. (The foreign document and an English translation are provided in combination. Please note that paragraph references refer to the English translation and figure references refer to the figures provided in the foreign document.)
Regarding Claim 1, WANG et al. discloses a method performed by one or more computers (para. 38), comprising: identifying a data item (para. 7, disclosing that data items are identified with a fault category); processing the data item with a denoising neural network to generate a denoised version of the data item (para. 8, disclosing masking (denoising) the identified data with a denoising neural network to generate a denoised version of the data), the denoising neural network defining a self-attention mechanism, wherein generating the denoised version of the data item comprises invoking the self-attention mechanism to process a set of attention inputs to generate an attention output (please see para. 8) by: obtaining a query matrix Q that contains elements representing a set of queries q, a key matrix K that contains elements representing a set of keys k, and a value matrix V that contains elements representing a set of values v corresponding to the set of keys k (para. 91); generating a function of an attention matrix A, without reciting the attention matrix A, by calculating a product of the query matrix Q and the key matrix K (paras. 86-87 and 91); executing a first, single compiled program module that calculates, for each row in the attention matrix A (paras. 86-87; please note that X- represents a row), a respective maximum value L among the elements in the row (please see paras. 86-91, where l is the value L) and a modified exponential sum S for the elements in the row, wherein the respective maximum values L and the respective modified exponential sums S are stored in a reduced matrix R (please see paras. 85-91, where in the equations the sup notation represents the exponential value; please note that all of the mathematical operations are performed on matrices, so the outputs are likewise matrices); and executing a second, single compiled program module that both performs an element-wise softmax function on the elements of the reduced matrix R and multiplies the result of the element-wise softmax function by the value matrix V to produce the attention output (please see paras. 83-99; please note that the mask matrix is the denoising matrix and yMSA is the attention output matrix, and again all of the mathematical operations are performed on matrices, so the outputs are likewise matrices).
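For the reader's convenience, the following sketch illustrates the computation recited in claim 1. It is an editorial illustration only, not code from WANG et al., TAGHIZADEH et al., or MRINI et al.; the names Q, K, V, A, L, S, and R follow the claim's notation, and the scaling by the square root of the key dimension is an assumption consistent with claim 4.

```python
import numpy as np

def attention(Q, K, V):
    # Attention matrix A: scaled product of Q and the transpose of K.
    d = Q.shape[-1]
    A = (Q @ K.T) / np.sqrt(d)

    # "First module": for each row of A, compute the row maximum L and
    # the modified exponential sum S = sum(exp(a - L)); store both in
    # the reduced matrix R, which has only two columns.
    L = A.max(axis=-1, keepdims=True)
    S = np.exp(A - L).sum(axis=-1, keepdims=True)
    R = np.concatenate([L, S], axis=-1)          # shape (n, 2)

    # "Second module": rebuild the row-wise softmax from R and multiply
    # by the value matrix V to produce the attention output.
    P = np.exp(A - R[:, :1]) / R[:, 1:]          # numerically stable softmax
    return P @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = attention(Q, K, V)                          # shape (4, 8)
```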
The prior art of WANG et al. fails to explicitly recite the attention matrix.
However, the prior art of TAGHIZADEH et al. recites the attention matrix (paras. 46-48).
WANG et al. teaches a method performed by one or more computers, comprising: identifying a data item; processing the data item with a denoising neural network to generate a denoised version of the data item, the denoising neural network defining a self-attention mechanism, wherein generating the denoised version of the data item comprises invoking the self-attention mechanism.
WANG et al. teaches processing a set of attention inputs to generate an attention output by: obtaining a query matrix Q that contains elements representing a set of queries q, a key matrix K that contains elements representing a set of keys k, and a value matrix V that contains elements representing a set of values v corresponding to the set of keys k.
TAGHIZADEH et al. teaches the attention matrix.
WANG et al. does not recite the attention matrix.
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.
In combination, WANG et al. performs the same function as it does separately: a method performed by one or more computers, comprising identifying a data item and processing the data item with a denoising neural network to generate a denoised version of the data item.
TAGHIZADEH et al. performs the same function as it does separately: using an attention matrix in a denoising neural network to generate a denoised version of the data item.
Therefore, one of ordinary skill in the art could have combined the elements as claimed by known methods, and, in combination, each element merely performs the same function as it does separately.
The results of the combination would have been predictable, resulting in modifying the invention of WANG et al. to use the attention matrix as disclosed by TAGHIZADEH et al., thereby combining multi-channel enhancement and denoising into one model, as TAGHIZADEH et al. discusses at para. 7.
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Further regarding claim 1, the prior art of WANG et al. fails to recite the reduced matrix.
However, the prior art of MRINI et al. discloses the reduced matrix as well as an attention matrix (please see paras. 5-6, disclosing that label attention produces a word representation matrix; see also paras. 30-31 and 37-39: the encoder uses a number of word matrices to encode the matrix of word representations, the encoding is used to compress the matrix of word representations, and the compression technique is used to reduce the dimension of the matrix of word representations, which forms the reduced matrix).
WANG et al. teaches a method performed by one or more computers, comprising: identifying a data item; processing the data item with a denoising neural network to generate a denoised version of the data item, the denoising neural network defining a self-attention mechanism, wherein generating the denoised version of the data item comprises invoking the self-attention mechanism.
WANG et al. teaches processing a set of attention inputs to generate an attention output by: obtaining a query matrix Q that contains elements representing a set of queries q, a key matrix K that contains elements representing a set of keys k, and a value matrix V that contains elements representing a set of values v corresponding to the set of keys k.
MRINI et al. teaches the reduced matrix.
WANG et al. does not recite the reduced matrix.
Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.
In combination, WANG et al. performs the same function as it does separately: a method performed by one or more computers, comprising identifying a data item and processing the data item with a denoising neural network to generate a denoised version of the data item.
MRINI et al. performs the same function as it does separately: using a reduced matrix in a denoising neural network to generate a denoised version of the data item.
Therefore, one of ordinary skill in the art could have combined the elements as claimed by known methods, and, in combination, each element merely performs the same function as it does separately.
The results of the combination would have been predictable, resulting in modifying the invention of WANG et al. to use the reduced matrix as disclosed by MRINI et al., thereby enabling parsing of natural language sentences using an artificial neural network (ANN), as MRINI et al. discusses at para. 4.
Therefore, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 2, MRINI et al. discloses that the data item comprises at least one of an image, an audio signal, or a text sample (para. 4, disclosing text samples).
TAGHIZADEH et al. discloses that the data item is audio (para. 7).
Regarding Claim 3, TAGHIZADEH et al. discloses iteratively denoising the data item to generate a final denoised version of the data item, wherein the iterative denoising is guided by a conditioning input that characterizes one or more desired properties for denoising the data item (paras. 14-16).
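As an editorial illustration only (not code from TAGHIZADEH et al.; the function denoise_step, the stand-in step, and the number of steps are hypothetical), the iterative, conditioned denoising recited in claim 3 may be summarized as:

```python
import numpy as np

def iterative_denoise(x, c, denoise_step, num_steps=10):
    # Repeatedly refine the data item x; each pass is guided by the
    # conditioning input c characterizing the desired properties.
    for t in range(num_steps):
        x = denoise_step(x, c, t)
    return x  # final denoised version of the data item

# Trivial stand-in step that nudges x toward the conditioning target.
step = lambda x, c, t: x + 0.1 * (c - x)
noisy = np.random.randn(16)
final = iterative_denoise(noisy, np.zeros(16), step)
```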
Regarding Claim 4, TAGHIZADEH et al. discloses that generating the attention matrix A comprises calculating a scaled product of the query matrix Q and a transposed version of the key matrix K (paras. 46-48).
Regarding Claim 5, MRINI et al. discloses that the reduced matrix R has fewer elements than the attention matrix A, and wherein performing the element-wise softmax function on the reduced matrix R is less computationally expensive than if the element-wise softmax function were performed on the attention matrix A (paras. 30-31, disclosing that a word representation matrix may be compressed into a reduced number of dimensions compared to an input matrix (e.g., using a neural network layer); this matrix compression may be performed prior to combining the word representation matrices of different label attention heads so as to retain the ability to differentiate the output of each label attention head in the output of the label attention layer).
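To make the computational saving recited in claim 5 concrete (an editorial note; the row count is a hypothetical example): for n rows, the attention matrix A holds n x n elements, while the reduced matrix R (one maximum L and one modified exponential sum S per row) holds only 2n.

```python
n = 1024                         # hypothetical number of rows
elements_A = n * n               # 1,048,576 elements in the attention matrix A
elements_R = 2 * n               # 2,048 elements in the reduced matrix R
print(elements_A // elements_R)  # -> 512, i.e. R is 512x smaller than A
```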
Regarding Claim 10, TAGHIZADEH et al. discloses providing for output a final denoised version of the data item (para. 77).
Regarding Claim 11, MRINI et al. discloses that providing for output the final denoised version of the data item comprises storing the final denoised version of the data item in a memory device, displaying the final denoised version of the data item as an image, playing the final denoised version of the data item as an audio or video stream, presenting the final denoised version of the data item as text, or providing the final denoised version of the data item to a decoding model for conversion from an embedding space to a text, image, audio, or video data item (para. 5, disclosing a decoder configured to identify at least one span of words of an input sentence corresponding to a syntactic category based on the output matrix of word vectors; para. 40, disclosing that Decoder 120 identifies a span of words from an input sentence corresponding to one or more of the syntactic categories based on the output matrix of the encoder, that in some examples the decoder 120 may incorporate a CKY parsing algorithm, and that in some cases decoder 120 applies a softmax function to determine a likelihood of each word corresponding to a syntactic category; para. 110, disclosing that the encoder includes a plurality of layers, wherein an output of each layer prior to a final layer is provided as an input for a subsequent layer, and wherein the label attention layer includes the final layer, and that in some examples the label attention layer includes a softmax function and a dropout function).
Allowable Subject Matter
Claims 16-25 are allowed.
The following is an examiner’s statement of reasons for allowance:
After further consideration as well as an extensive search, all of the prior art cited on the 892s and 1449s, searched in NPL, and searched in PGPUB fails to recite or disclose all of the limitations of the independent claims, with the uniquely distinct features represented by the underlined, bold claim limitations recited below:
executing a first, single compiled program module that calculates, for each row in the attention matrix A, a respective maximum value L among the elements in the row and a modified exponential sum S for the elements in the row, wherein the respective maximum values L and the respective modified exponential sums S are stored in a reduced matrix R; and executing a second, single compiled program module that both performs an element-wise softmax function on the elements of the reduced matrix R and multiplies the result of the element-wise softmax function by the value matrix V to produce the attention output.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Claims 6-9 and 12-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Applicant is requested to review the prior art cited on the USPTO form 892.
The prior art of Majumdar, Somshubra (US 20240135514 A1), paras. 88-776, discloses a scene-based image editing system that implements scene-based image editing techniques using intelligent image understanding. Indeed, in one or more embodiments, the scene-based image editing system utilizes one or more machine learning models to process a digital image in anticipation of user interactions for modifying the digital image. For example, in some implementations, the scene-based image editing system performs operations that build a knowledge set for the digital image and/or automatically initiate workflows for certain modifications before receiving user input for those modifications. Based on the pre-processing, the scene-based image editing system facilitates user interactions with the digital image as if it were a real scene reflecting real-world conditions. For instance, the scene-based image editing system enables user interactions that target pre-processed semantic areas (e.g., objects that have been identified and/or masked via pre-processing) as distinct components for editing rather than targeting the individual underlying pixels. Further, the scene-based image editing system automatically modifies the digital image to consistently reflect the corresponding real-world conditions.

The scene-based image editing system utilizes machine learning to process a digital image in anticipation of future modifications. In particular, in some cases, the scene-based image editing system employs one or more machine learning models to perform preparatory operations that will facilitate subsequent modification. In some embodiments, the scene-based image editing system performs the pre-processing automatically in response to receiving the digital image. For instance, in some implementations, the scene-based image editing system gathers data and/or initiates a workflow for editing the digital image before receiving user input for such edits. Thus, the scene-based image editing system allows user interactions to directly indicate intended edits to the digital image rather than the various preparatory steps often utilized for making those edits.

The scene-based image editing system pre-processes a digital image to facilitate object-aware modifications. In particular, in some embodiments, the scene-based image editing system pre-processes a digital image in anticipation of user input for manipulating one or more semantic areas of a digital image, such as user input for moving or deleting one or more objects within the digital image. The scene-based image editing system utilizes a segmentation neural network to generate, for each object portrayed in a digital image, an object mask. In some cases, the scene-based image editing system utilizes a hole-filling model to generate, for each object (e.g., for each corresponding object mask), a content fill (e.g., an inpainting segment). In some implementations, the scene-based image editing system generates a completed background for the digital image by pre-filling object holes with the corresponding content fill. Accordingly, in one or more embodiments, the scene-based image editing system pre-processes the digital image in preparation for an object-aware modification, such as a move operation or a delete operation, by pre-generating object masks and/or content fills before receiving user input for such a modification.
Thus, upon receiving one or more user inputs targeting an object of the digital image for an object-aware modification (e.g., a move operation or a delete operation), the scene-based image editing system leverages the corresponding pre-generated object mask and/or content fill to complete the modification. For instance, in some cases, the scene-based image editing system detects, via a graphical user interface displaying the digital image, a user interaction with an object portrayed therein (e.g., a user selection of the object). In response to the user interaction, the scene-based image editing system surfaces the corresponding object mask that was previously generated. The scene-based image editing system further detects, via the graphical user interface, a second user interaction with the object (e.g., with the surfaced object mask) for moving or deleting the object. Accordingly, the system moves or deletes the object, revealing the content fill previously positioned behind the object.
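The pre-processing workflow described above may be summarized by the following editorial sketch (not code from Majumdar; segment_objects and inpaint are hypothetical stand-ins for the segmentation neural network and the hole-filling model):

```python
import numpy as np

def preprocess(image, segment_objects, inpaint):
    # Pre-generate an object mask and a content fill for every object,
    # before any user input for a move or delete is received.
    cache = {}
    for obj_id, mask in segment_objects(image).items():
        cache[obj_id] = {"mask": mask, "fill": inpaint(image, mask)}
    return cache

def delete_object(image, obj_id, cache):
    entry = cache[obj_id]
    mask = entry["mask"].astype(bool)            # (H, W) object mask
    # Deleting the object reveals the previously generated content fill.
    return np.where(mask[..., None], entry["fill"], image)
```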
The prior art of LI, Mohan et al. (US 20220157294 A1), paras. 46-182, discloses a computer-implemented method for speech recognition. The method comprises: receiving a frame of speech audio; encoding the frame of speech audio; calculating a halting probability based on the encoding of the frame of speech audio; adding the halting probability to a first accumulator variable; in response to the first accumulator variable exceeding or reaching a first threshold, calculating a context vector based on the halting probability and the encoding of the frame of speech audio; performing a decoding step using the context vector to derive a token; and executing a function based on the derived token. The executed function comprises at least one of text output or command performance. The described method reduces latency for and improves the quality, e.g. the accuracy, of speech recognition, especially online speech recognition, as compared to existing state-of-the-art speech recognition methods. Online speech recognition methods are methods where speech recognition is performed as frames of audio are received; e.g., the speech recognition method does not have access to the entirety of an utterance prior to beginning performance of the method. Therefore, online speech recognition methods can perform speech recognition and execute functions, e.g. output text or perform commands, responsive to the receipt of live audio, e.g. from a microphone, rather than audio having to be recorded and then speech recognition performed, as is the case for offline speech recognition methods. The context vector may be further based on respective encodings and halting probabilities for a preceding one or more frames of speech audio. In response to the count of the preceding one or more frames of speech audio exceeding or reaching a second threshold, the calculation of the context vector may be triggered prior to the first accumulator variable exceeding or reaching the first threshold. The use of the second threshold to trigger the calculation of the context vector early provides an upper bound on the latency of the speech recognition method; e.g., the latency in decoding is limited to a maximum number of frames equal to the second threshold. The halting probability and the context vector may be calculated using a self-attention decoder layer of a decoder neural network. The self-attention decoder layer may be a multi-head self-attention decoder layer comprising a plurality of attention heads. The halting probability and the context vector may be calculated using an attention head of the plurality of attention heads. A respective halting probability may be calculated for each of the other attention heads of the plurality of attention heads. For each of the other attention heads of the plurality of attention heads, the respective halting probability may be added to the first accumulator variable. In response to the first accumulator variable exceeding or reaching the first threshold, a context vector based on the halting probability and the encoding of the frame of speech audio may be calculated. There may be a respective accumulator variable for each of the other attention heads. For each of the other attention heads of the plurality of attention heads, the respective halting probability may be added to the respective accumulator variable.
In response to the respective accumulator variable exceeding or reaching the first threshold, a context vector may be calculated based on the respective halting probability and the encoding of the frame of speech audio. In response to a final one or more attention heads of the plurality of attention heads exceeding or reaching the first threshold, the decoding step may be performed. The remainder of the one or more attention heads may have previously exceeded or reached the first threshold. A combined context vector may be calculated based on the context vectors for each of the plurality of attention heads. The decoder neural network may include one or more further self-attention decoder layers. A respective combined context vector may be calculated using each of the one or more further self-attention decoder layers. The calculation of the halting probability may be further based on a preceding context vector. The encoding of the chunk of speech audio may be by an encoder neural network comprising one or more self-attention encoder layers. The context vector may be calculated without trimming the halting probabilities. Not trimming the halting probabilities has the advantage that attention to the last of the encodings is not lost in the case where the value of the last of the halting probabilities is not small while the accumulator variable has already approached the first threshold.
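The accumulator-and-threshold control flow described for LI et al. may be summarized by the following editorial sketch (not code from the reference; encode, halting_prob, and decode_step are hypothetical stand-ins, and the threshold values are assumptions):

```python
def online_decode(frames, encode, halting_prob, decode_step,
                  first_threshold=1.0, second_threshold=32):
    acc, pending = 0.0, []               # accumulator and buffered encodings
    for frame in frames:
        h = encode(frame)                # encode the incoming audio frame
        p = halting_prob(h)              # halting probability for this frame
        acc += p
        pending.append((p, h))
        # Decode when the accumulated probability reaches the first
        # threshold, or early when the frame-count bound is reached
        # (the second threshold caps decoding latency).
        if acc >= first_threshold or len(pending) >= second_threshold:
            ctx = sum(p_i * h_i for p_i, h_i in pending)  # context vector
            yield decode_step(ctx)       # derive a token from the context
            acc, pending = 0.0, []
```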
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PRABODH M DHARIA whose telephone number is (571) 272-7668. The examiner can normally be reached Monday - Friday, 9:00 AM to 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benjamin Lee can be reached on 571-272-2963. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Any response to this action should be mailed to:
Commissioner of Patents and Trademarks
P.O. Box 1450
Alexandria VA 22313-1450
/Prabodh M Dharia/
Primary Examiner
Art Unit 2629
03-02-2026