Prosecution Insights
Last updated: April 19, 2026
Application No. 18/424,375

EFFICIENT SPECULATIVE DECODING IN AUTOREGRESSIVE GENERATIVE ARTIFICIAL INTELLIGENCE MODELS

Final Rejection — §101, §103
Filed: Jan 26, 2024
Examiner: PATEL, SHREYANS A
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 2 (Final)
Grant Probability: 89% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 3m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 89% (359 granted / 403 resolved; +27.1% vs TC avg) — above average
Interview Lift: +7.4% among resolved cases with interview (moderate)
Typical Timeline: 2y 3m average prosecution; 46 currently pending
Career History: 449 total applications across all art units

Statute-Specific Performance

§101: 21.3% (-18.7% vs TC avg)
§103: 36.0% (-4.0% vs TC avg)
§102: 22.6% (-17.4% vs TC avg)
§112: 8.8% (-31.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 403 resolved cases.

Office Action

Grounds: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments with respect to the 35 U.S.C. 112 rejection of claims 2-3, 22-23 and 31 have been considered and found persuasive in view of the amendments, and that rejection has been withdrawn.

Applicant's arguments with respect to 35 U.S.C. 101 regarding claims 1-40 have been considered but are not persuasive for the reasons explained below. See the detailed rejection of claims 1-40 below.

Applicant's arguments with respect to 35 U.S.C. 103 regarding claims 1, 12, 21 and 32 have been considered but are not persuasive for the reasons explained below. See the detailed reasons for rejection below.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-40 are rejected under 35 U.S.C. 101. Claims 1 and 21 are rejected because they are directed to the abstract idea of verifying and selecting information. The claims describe a high-level functional process of generating candidate data (tokens), checking that data against a secondary reference (the second machine learning model), and selecting the verified result. This is a fundamental mental process and a method of organizing human activity, similar to a person brainstorming a list of possibilities and then cross-referencing them with a textbook to ensure accuracy before finalizing a response. The claims do not recite a specific technical improvement but rather focus on the logical flow of information through generic computing components. The claims fail to integrate this abstract idea into a practical application.
The recited components (processors, memory, and machine learning models) are recited at a high level of generality to perform their basic functions of data storage, processing, and output. The claims do not describe a specific improvement to the internal functioning of the computer or a technical solution to a specific technological problem. Instead, they merely automate the abstract concept of speculative generation and verification using conventional computer tools. The "beam width" and "beam search" limitations are standard data processing parameters that do not explain how the abstract logic is transformed into a technical implementation.

The claims lack an inventive concept because the elements, both individually and as an ordered combination, are well-understood, routine, and conventional in the field. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claims are (i) mere instructions to implement the idea on a computer, and/or (ii) recitations of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim elements do not provide meaningful limitations transforming the abstract idea into a patent-eligible application such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter. There is further no improvement to the computing device.

Claims 12 and 32 are rejected because they are directed to the abstract idea of mathematical comparison and selection. The core of the claims is the comparison of probability distributions to select specific tokens. This is a mathematical exercise that can be performed mentally or with the aid of pen and paper.
The claims organize this mathematical activity into a sequence of receiving, comparing, and selecting data, which falls within the category of mental processes or methods of organizing human activity. The claims do not integrate the abstract idea into a practical application because they lack a specific technological structure or a technical improvement to a computer's capability. The claims are drafted in purely functional terms, focusing on the result of comparing distributions rather than a specific way of improving computer hardware or software efficiency. There is no indication that the claimed method resolves a specific technical bottleneck (narrowed down to a specific process or method) in a way that is not routine or conventional. The use of a second machine learning model as a comparison point is a functional requirement that does not provide a technical practical application beyond the abstract comparison itself.

Claims 12 and 32 do not recite an inventive concept. The steps of receiving data, comparing statistical distributions, and outputting a selection are fundamental to the operation of any statistical analysis system. The fixed number of tokens corresponding to a defined beam width is a conventional constraint used in nearly all beam search algorithms. When viewed as an ordered combination, these elements represent nothing more than the standard application of probability theory within a generic computing environment. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claims are (i) mere instructions to implement the idea on a computer, and/or (ii) recitations of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry.
Viewed as a whole, these additional claim elements do not provide meaningful limitations transforming the abstract idea into a patent-eligible application such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter. There is further no improvement to the computing device.

Dependent claims 2-11, 13-20, 22-31 and 33-40 further recite an abstract idea performable as mathematical concepts and do not amount to significantly more than the abstract idea, as they do not provide steps beyond what is conventionally known in natural language processing.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6-22 and 26-40 are rejected under 35 U.S.C. 103 as being unpatentable over Niu et al. (US 2022/0129629) in view of Miao et al.
("SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification", May 16, 2023).

Regarding claims 1 and 21, Niu teaches a processing system comprising: at least one memory having executable instructions stored thereon; and one or more processors configured to execute the executable instructions to cause the processing system to ([0029] a computing device for implementing the dynamic blocking for paraphrase generation; computing device 300 includes a processor coupled to memory): generate, based on an input prompt ([0016] the encoder 110 receives an input of a natural language sentence 103, e.g., "the chart above illustrates how world population has changed through history." The sentence 103 (including a start token <s>) can be represented as a source sequence S as a list of tokens S=(S.sub.0, S.sub.1, . . . , S.sub.M)) and using a first machine learning model ([0016] the language model includes an encoder 110 and an autoregressive decoder 120), a set of tokens including one or more subsets of tokens, each respective subset of the one or more subsets corresponding to a respective portion of a response to the input prompt, each respective subset including a fixed number of tokens ([0017] [0021-0022] the decoder 120 generates the paraphrased sentence 109 by emitting a generated sequence of tokens represented as G=(G0, G1, . . . , GN); the decoder may perform beam search to generate a set of candidates for the next token (generating candidate subsets for the next-step portion of the output); beam search is used to generate several candidates and the top-ranked two may be kept (retaining a fixed top number)).
The difference between the prior art and the claimed invention is that Niu does not explicitly teach: each respective subset including a fixed number of tokens corresponding to a beam width for a beam search through the set of tokens; output the set of tokens to a second machine learning model for verification; receive, at the first machine learning model from the second machine learning model, information identifying a selected sequence of tokens from the generated set of tokens; and output the selected sequence of tokens as the response to the input prompt.

Miao teaches each respective subset including a fixed number of tokens corresponding to a beam width for a beam search through the set of tokens ([3.2] SpecInfer uses a three-layer MLP as the neural architecture of the matching length predictor and considers a configuration space of beam search for each SSM, where the beam width b ∈ [1,2,4] and the beam depth d ∈ [1,2,4,8,16]; therefore the MLP outputs a vector of 15 numbers, each representing the predicted matching length for a speculative configuration); output the set of tokens to a second machine learning model for verification ([Algorithm 2] the LLM takes a token tree N as an input and generates a token O(u) for each node u ∈ N (sending the speculated structure to an LLM verifier); see Algorithm 2: N = SPECULATE(S) and O = TREEPARALLELDECODE(LLM, N)); receive, at the first machine learning model from the second machine learning model, information identifying a selected sequence of tokens from the generated set of tokens ([Algorithm 2] VERIFY examines the speculated token tree N against the LLM's output O and produces a sequence of verified tokens V, which can be directly appended to the current token sequence S; see Algorithm 2 steps 16-18); and output the selected sequence of tokens as the response to the input prompt ([Algorithm 2] outputs/returns the generated (verified) sequence; after appending verified tokens, the algorithm returns the sequence if t = EOS
then return S).

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Niu with the teachings of Miao by modifying the system and method for unsupervised paraphrase generation as taught by Niu to include a fixed number of tokens corresponding to a beam width for a beam search through the set of tokens; output the set of tokens to a second machine learning model for verification; receive, at the first machine learning model from the second machine learning model, information identifying a selected sequence of tokens from the generated set of tokens; and output the selected sequence of tokens as the response to the input prompt, as taught by Miao, for the benefit of preserving the same generative performance as incremental decoding (Miao [Section 4.2]).

Regarding claims 2 and 22, Niu in view of Miao further teaches the processing system of Claim 1, wherein to generate the set of tokens, the one or more processors are configured to cause the processing system to: generate a first subset of tokens corresponding to a first portion of the response to the input prompt based on the input prompt and using the first machine learning model; and speculatively generate, based on the input prompt and the first subset of tokens and using the first machine learning model, a second subset of tokens corresponding to a second portion of the response to the input prompt ([Niu - 0016-0017] the language model includes an encoder; the encoder receives an input of a natural language sentence; the sentence, including a start token <s>, can be represented as a source sequence S as a list of tokens; the input source sequence is then encoded by the encoder as a vector representation; the decoder receives the encoded vector representation and generates a paraphrased sentence by emitting a generated sequence of tokens represented as G).
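The generate-then-speculate flow recited in claims 2 and 22 — a first subset of candidate tokens, then a second subset speculated from it — can be sketched in a few lines. Everything below (the toy scoring function, vocabulary size, and beam width) is an illustrative assumption, not the claimed or cited implementation:

```python
import numpy as np

VOCAB = 16       # toy vocabulary size (assumption)
BEAM_WIDTH = 2   # fixed number of tokens per subset, per the claim language

def draft_step(context):
    """Toy stand-in for the first (draft) model: a probability
    distribution over the next token given the current context."""
    logits = np.sin(np.add.outer(context, np.arange(VOCAB))).sum(axis=0)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def speculate(prompt, num_subsets):
    """Generate num_subsets subsets of BEAM_WIDTH candidates, each
    subset speculated from the top candidate of the previous one."""
    context, subsets = list(prompt), []
    for _ in range(num_subsets):
        p = draft_step(context)
        top = np.argsort(p)[::-1][:BEAM_WIDTH]  # keep the beam-width best
        subsets.append([int(t) for t in top])
        context.append(int(top[0]))             # continue from best candidate
    return subsets

subsets = speculate(prompt=[1, 2, 3], num_subsets=4)
```

In the claimed arrangement these subsets would then be handed to the second (target) model for verification rather than accepted outright.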
Regarding claims 6 and 26, Niu in view of Miao further teaches the processing system of Claim 1, wherein: the set of tokens comprises a tree data structure including a root node corresponding to the input prompt and one or more levels associated with the one or more subsets of tokens, each respective subset of the one or more subsets of tokens corresponds to a respective level in the tree data structure, and each level in the tree data structure has a width equal to the fixed number of tokens ([Miao – 4.1 Tree Attention] see equations 3-5; a matrix that represents the attention scores between different tokens in the input sequence; verifying a token tree is computing the attention scores for individual token sequences).

Regarding claims 7 and 27, Niu in view of Miao further teaches the processing system of Claim 6, wherein the one or more processors are further configured to cause the processing system to: generate an attention map based on the tree data structure, the attention map being configured to identify sequences of tokens in the tree data structure ([Miao – 4.1 Tree Attention] the attention mechanism is applied to the tree structures; for each node u in a token tree, its attention output is defined as the output of computing attention over Su (the token sequence represented by u)); and output the attention map to the second machine learning model for the second machine learning model to use in verifying the sequence of tokens ([Miao – 4.2 Verification] SpecInfer's verification process; decoder, see Figure 3).

Regarding claims 8 and 28, Niu in view of Miao further teaches the processing system of Claim 1, wherein the first machine learning model and the second machine learning model comprise a same generative artificial intelligence model ([Introduction] generative LLM inference with speculative inference and token tree verification; tree-based parallel decoding algorithm).
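The tree data structure of claims 6 and 26 (one level per subset, each level as wide as the beam width) and the attention map of claims 7 and 27 can be illustrated with a minimal sketch; the token values and the parent-linking rule are hypothetical choices, not taken from Niu or Miao:

```python
BEAM_WIDTH = 2
levels = [[10, 11], [12, 13], [14, 15]]  # hypothetical candidate tokens per level

# Flatten the tree; each node's parent is the first (top) candidate
# of the previous level, and level 0 hangs off the prompt root.
nodes, parent = [], []
for depth, level in enumerate(levels):
    for tok in level:
        nodes.append(tok)
        parent.append(None if depth == 0 else (depth - 1) * BEAM_WIDTH)

def ancestors(i):
    """Flattened indices of node i's ancestors."""
    out = []
    while parent[i] is not None:
        i = parent[i]
        out.append(i)
    return out

# The "attention map": mask[i][j] is True when node i may attend to
# node j, i.e. j is node i itself or one of its ancestors, so each
# row identifies one candidate token sequence through the tree.
n = len(nodes)
mask = [[j == i or j in ancestors(i) for j in range(n)] for i in range(n)]
```

Passing such a mask alongside the flattened tokens is what lets a verifier score every branch of the tree in a single forward pass.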
Regarding claims 9 and 29, Niu in view of Miao further teaches the processing system of Claim 8, wherein the second machine learning model is configured to generate the selected sequence of tokens based on maximizing a number of tokens included in the selected sequence of tokens ([Miao – Introduction] the correctness of all token sequences).

Regarding claims 10 and 30, Niu in view of Miao further teaches the processing system of Claim 1, wherein a number of the one or more subsets of tokens corresponds to a maximum number of tokens generated by a single pass through the first machine learning model ([Miao – Introduction] the correctness of all token sequences represented by a token tree).

Regarding claims 11 and 31, Niu in view of Miao further teaches the processing system of Claim 1, wherein: the first machine learning model corresponds to a draft model in a speculative decoding pipeline; and the second machine learning model corresponds to a target model in the speculative decoding pipeline ([Miao – Fig. 3] sequence-based, token-based and tree-based decoding).
Regarding claims 12 and 32, Niu teaches a processing system, comprising: at least one memory having executable instructions stored thereon; and one or more processors configured to execute the executable instructions to cause the processing system to ([0029] a computing device for implementing the dynamic blocking for paraphrase generation; computing device 300 includes a processor coupled to memory): receive an input prompt and a set of tokens generated by a first machine learning model, the set of tokens including one or more subsets of tokens, each respective subset of the one or more subsets corresponding to a respective portion of a response to the input prompt, each subset including a fixed number of tokens corresponding to a defined beam width for a beam search through the set of tokens ([0016-0017] [0021-0022] the encoder 110 receives an input of a natural language sentence 103, e.g., "the chart above illustrates how world population has changed through history." The sentence 103 (including a start token <s>) can be represented as a source sequence S as a list of tokens S=(S.sub.0, S.sub.1, . . . , S.sub.M); the language model includes an encoder 110 and an autoregressive decoder 120; the decoder 120 generates the paraphrased sentence 109 by emitting a generated sequence of tokens represented as G=(G0, G1, . . . , GN); the decoder may perform beam search to generate a set of candidates for the next token (generating candidate subsets for the next-step portion of the output); beam search is used to generate several candidates and the top-ranked two may be kept (retaining a fixed top number)); compare a probability distribution associated with each respective subset of tokens in the set of tokens ([0022] [claim 1] the subsequent token having the highest probability from a beam search; selecting a candidate token having a highest likelihood score).
The difference between the prior art and the claimed invention is that Niu does not explicitly teach: to a corresponding probability distribution generated by a second machine learning model for the respective subset of tokens; select tokens from the set of tokens based on the comparing; and output, to the first machine learning model, an indication of the selected tokens.

Miao teaches to a corresponding probability distribution generated by a second machine learning model for the respective subset of tokens ([4.2] enabling SpecInfer to examine all tokens in parallel by visiting the LLM's parameters once; this parallel decoding procedure generates an output tensor O that includes a token for each node u ∈ N; Algorithm 2 shows SpecInfer's verification process, which starts from the root of N and iteratively examines a node's speculated results against the LLM's original output; see Fig. 2); select tokens from the set of tokens based on the comparing ([Algorithm 2] VERIFY examines the speculated token tree N against the LLM's output O and produces a sequence of verified tokens V, which can be directly appended to the current token sequence S; see Algorithm 2 steps 16-18); and output, to the first machine learning model, an indication of the selected tokens ([Algorithm 2] outputs/returns the generated (verified) sequence; after appending verified tokens, the algorithm returns the sequence if t = EOS then return S).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Niu with the teachings of Miao by modifying the system and method for unsupervised paraphrase generation as taught by Niu to include a corresponding probability distribution generated by a second machine learning model for the respective subset of tokens; select tokens from the set of tokens based on the comparing; and output, to the first machine learning model, an indication of the selected tokens, as taught by Miao, for the benefit of preserving the same generative performance as incremental decoding (Miao [Section 4.2]).

Regarding claims 13 and 33, the processing system of Claim 12, wherein: the set of tokens comprises a tree data structure including a root node corresponding to the input prompt and one or more levels associated with the one or more subsets of tokens, each respective subset of the one or more subsets of tokens corresponds to a respective level in the tree data structure, and each level in the tree data structure has a width equal to the fixed number of tokens (Claim 13 contains subject matter similar to claim 6, and thus is rejected under similar rationale).

Regarding claims 14 and 34, the processing system of Claim 13, wherein: the one or more processors are further configured to cause the processing system to receive an attention map based on the tree data structure; the attention map is configured to identify sequences of tokens in the tree data structure; and the one or more processors are configured to compare the probability distribution associated with each respective subset of tokens in the set of tokens to the corresponding probability distribution generated by the second machine learning model for the respective subset of tokens based at least in part on the attention map (Claim 14 contains subject matter similar to claim 7, and thus is rejected under similar rationale).
Regarding claims 15 and 35, the processing system of Claim 12, wherein to select the tokens from the set of tokens based on the comparing, the one or more processors are configured to cause the processing system to select a sequence of tokens from the set of tokens based on maximizing a number of tokens included in the selected sequence of tokens (Claim 15 contains subject matter similar to claim 9, and thus is rejected under similar rationale).

Regarding claims 16 and 36, Miao further teaches the processing system of Claim 12, wherein to select the tokens from the set of tokens, the one or more processors are configured to cause the processing system to select the tokens based on recursive adjustment of a target distribution associated with the set of tokens ([Section 4.2] all tokens are matched against the original LLM; see equations in Section 4.2).

Regarding claims 17 and 37, Miao further teaches the processing system of Claim 16, wherein to recursively adjust the target distribution, the one or more processors are configured to cause the processing system to: determine whether to accept or reject a first token in a subset of tokens from the set of tokens (verification); and adjust a probability distribution used to verify a second token in the set of tokens subsequent to the first token based on the determination of whether to accept or reject the first token (all tokens are matched against the original LLM; see equations in Section 4.2).

Regarding claims 18 and 38, Miao further teaches the processing system of Claim 17, wherein to adjust the probability distribution, the one or more processors are configured to cause the processing system to subtract a probability value associated with the first token from the probability distribution based on determining to reject the first token (see equations in Section 4.1 Tree Attention and Section 4.2 Verification).
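The accept/reject-and-adjust behavior of claims 17, 18 and 38 reads like a rejection-sampling loop: test a speculated token against the target distribution, and on rejection subtract that token's probability mass before testing the next candidate. A minimal sketch, assuming a simple acceptance draw and renormalization (an illustration in the spirit of standard speculative sampling, not SpecInfer's exact equations):

```python
import numpy as np

rng = np.random.default_rng(0)

def verify_level(target_p, candidates):
    """Accept the first speculated candidate that survives a draw against
    the target distribution; after each rejection, zero out that token's
    probability and renormalize before testing the next candidate."""
    p = np.asarray(target_p, dtype=float).copy()
    for tok in candidates:
        if rng.random() < p[tok]:
            return tok, p                # speculated token accepted
        p[tok] = 0.0                     # subtract the rejected token's mass
        p = p / p.sum()                  # adjusted distribution for next test
    # every candidate rejected: sample afresh from the adjusted distribution
    return int(rng.choice(len(p), p=p)), p

tok, p_adj = verify_level([0.5, 0.3, 0.2], candidates=[0, 1])
```

In a tree, rejecting a node would also discard its children, matching the limitation recited in claims 19 and 39.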
Regarding claims 19 and 39, the processing system of Claim 12, wherein to select the tokens from the set of tokens, the one or more processors are configured to cause the processing system to: reject a first token at a first level of a tree data structure representing the set of tokens (verifying all sequences of a token tree against the original LLM's output); generate an adjusted probability distribution based on the rejection of the first token (incremental decoding approach); discard or ignore, from the tree data structure, children tokens of the first token at levels deeper than the first level of the tree data structure; and determine whether to accept or reject a second token at the first level of the tree data structure based on the adjusted probability distribution ([4.2 Verification] speculating its next token includes a child node).

Regarding claims 20 and 40, Miao further teaches the processing system of Claim 12, wherein to select the tokens from the set of tokens, the one or more processors are configured to cause the processing system to: reject each subset of tokens generated by the first machine learning model ([Fig. 2] [Section 3.1] fine-tuning the tokens); and sample, using the second machine learning model, a token based on a target distribution that excludes probabilities associated with each subset of tokens generated by the first machine learning model, wherein the selected tokens comprise the sampled token ([Fig. 2] [Section 3.1] converting a text corpus into a collection of prompt samples and using the LLM to generate a token sequence for each prompt).

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL, whose telephone number is (571) 270-0689. The examiner can normally be reached Monday-Friday, 8am-5pm PST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

SHREYANS A. PATEL
Primary Examiner
Art Unit 2653

/SHREYANS A PATEL/
Examiner, Art Unit 2659

Prosecution Timeline

Jan 26, 2024
Application Filed
Oct 31, 2025
Non-Final Rejection — §101, §103
Feb 03, 2026
Response Filed
Mar 04, 2026
Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586597
ENHANCED AUDIO FILE GENERATOR
2y 5m to grant Granted Mar 24, 2026
Patent 12586561
TEXT-TO-SPEECH SYNTHESIS METHOD AND SYSTEM, A METHOD OF TRAINING A TEXT-TO-SPEECH SYNTHESIS SYSTEM, AND A METHOD OF CALCULATING AN EXPRESSIVITY SCORE
2y 5m to grant Granted Mar 24, 2026
Patent 12548549
ON-DEVICE PERSONALIZATION OF SPEECH SYNTHESIS FOR TRAINING OF SPEECH RECOGNITION MODEL(S)
2y 5m to grant Granted Feb 10, 2026
Patent 12548583
ACOUSTIC CONTROL APPARATUS, STORAGE MEDIUM AND ACCOUSTIC CONTROL METHOD
2y 5m to grant Granted Feb 10, 2026
Patent 12536988
SPEECH SYNTHESIS METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 89%
With Interview: 96% (+7.4%)
Median Time to Grant: 2y 3m
PTA Risk: Moderate
Based on 403 resolved cases by this examiner. Grant probability derived from career allow rate.
