Prosecution Insights
Last updated: April 19, 2026
Application No. 19/030,632

DRAFT MODEL SELECTION FOR SPECULATIVE DECODING WITH MULTIPLE EXPERT MODELS

Non-Final OA: §101, §102, §103
Filed: Jan 17, 2025
Examiner: BOWEN, RICHARD L
Art Unit: 2165
Tech Center: 2100 — Computer Architecture & Software
Assignee: Huawei Technologies Co., Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 80% (Favorable)
Expected OA Rounds: 1-2
Expected Time to Grant: 2y 10m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 80% — above average (437 granted / 544 resolved; +25.3% vs TC avg)
Interview Lift: +27.7% across resolved cases with interview — a strong lift
Typical Timeline: 2y 10m average prosecution; 14 applications currently pending
Career History: 558 total applications across all art units

Statute-Specific Performance

§101: 14.5% (-25.5% vs TC avg)
§102: 20.5% (-19.5% vs TC avg)
§103: 41.1% (+1.1% vs TC avg)
§112: 13.5% (-26.5% vs TC avg)
Tech Center averages are estimates • Based on career data from 544 resolved cases
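As a sanity check, the career allow rate quoted above follows directly from the resolved-case counts. This is a minimal sketch using only the figures reported in this section:

```python
# Career figures quoted above for this examiner.
granted = 437
resolved = 544

allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")  # ~80.3%, displayed as 80%

# Tech Center average implied by the "+25.3% vs TC avg" delta shown above.
implied_tc_avg = allow_rate - 0.253
print(f"Implied TC average allow rate: {implied_tc_avg:.1%}")
```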

Office Action

Rejections under §101, §102, and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status: The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement: The information disclosure statement (IDS) submitted on March 26, 2025 is being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. §101 because the claimed invention is directed to an abstract idea without significantly more. As per claims 1-20, the claims recite the abstract idea of applying mathematical concepts or certain methods of organizing human activity. The following is an analysis based on the 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG).

Step 1, Statutory Category? Claims 1-10 are directed to a method; claims 11-19 are directed to a system; claim 20 is directed to a non-transitory, computer-readable medium.

Step 2A, Prong I: Judicial Exception Recited? The examiner submits that the foregoing claim limitations constitute “mathematical concepts”, as the claims cover basic mathematical concepts given the broadest reasonable interpretation. As per claim 1, the claim recites the limitations of:

Training a policy that is configured to select, for each of the at least one expert model, a draft model that is maximally aligned with the expert model, the policy being trained using a training dataset comprising inputs from the plurality of contexts; -- training using different calculations and variables is considered to be a mathematical concept.
Determining a pair of expert and draft models for processing the first user query using the trained policy; -- determining using different calculations and variables is considered to be a mathematical concept. In the alternative, these limitations are directed to certain methods of organizing human activity; namely, the training and determining are similar to using generalists and experts when problem solving. The additional features of the claims amount to applying the abstract idea on a computer.

Step 2A, Prong II: Integrated into a Practical Application? The claims recite the following additional limitations. As per claim 1, the claim recites extra-solution activity of gathering data in the following limitation: receiving input of a first user query. This limitation recites insignificant extra-solution activity, namely preliminary data gathering (retrieval/receiving of data such as 'obtaining information' as identified in MPEP 2106.05(g)), and does not provide integration into a practical application. The claim similarly recites extra-solution activity of outputting data in the following limitation: generating an output based on the first user query using the determined pair of models. This limitation recites insignificant extra-solution activity, namely post-solution data outputting per MPEP 2106.05(g), and does not provide integration into a practical application.

ADDITIONAL ELEMENTS: As per claim 1, the claim recites the following additional elements: using multiple large language models (LLMs), the multiple LLMs including at least one expert model and a plurality of draft models. This is a high-level recitation of generic computer components, computer elements used as a tool, and represents mere instructions to apply the abstract idea on a computer as in MPEP 2106.05(f), and does not provide integration into a practical application. Therefore, claim 1 does not integrate the recited abstract ideas into a practical application.
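For orientation, the claim 1 limitations analyzed above describe a selection pipeline: receive a query, use a trained policy to pick an (expert, draft) model pair, then generate with that pair. The sketch below is purely illustrative and is not the applicant's implementation; the model names, the policy's scoring heuristic, and all identifiers are hypothetical:

```python
import random

# Hypothetical registry: one expert model and its candidate draft models.
EXPERTS = ["expert-large"]
DRAFTS = ["draft-a", "draft-b", "draft-c"]

def trained_policy(query: str) -> dict[str, float]:
    """Stand-in for the trained policy: scores each draft model's expected
    alignment with the expert for this query (placeholder random scores)."""
    return {d: random.random() for d in DRAFTS}

def determine_pair(query: str) -> tuple[str, str]:
    # Select the draft model the policy scores as maximally aligned.
    scores = trained_policy(query)
    best_draft = max(scores, key=scores.get)
    return EXPERTS[0], best_draft

def generate(query: str) -> str:
    expert, draft = determine_pair(query)
    # In speculative decoding the draft model proposes tokens and the
    # expert model verifies them; here we only report the selected pair.
    return f"[{expert} verifying {draft}] response to: {query}"

print(generate("What is speculative decoding?"))
```

The eligibility analysis above maps "training" and "determining" to mathematical concepts; this sketch simply shows where those steps sit in the claimed flow.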
Step 2B: Does the claim recite additional elements or limitations that amount to an inventive concept? When considered individually or in combination, the additional limitations and elements of claim 1 do not amount to significantly more than the judicial exception, for the same reasons discussed above as to why the additional limitations do not integrate the abstract idea into a practical application. The additional elements outlined in Step 2A, performing functions as designed, simply accomplish execution of the abstract ideas. For the additional limitations identified as insignificant extra-solution activity above, the conclusions are carried over, and they also do not provide significantly more. Receiving data is an example of gathering data that has been found by the courts to be well understood, routine, and conventional; for Berkheimer support, see the court-recognized activities in MPEP 2106.05(d)(II). Outputting data of a solution, e.g., providing output data via a user interface, is also well understood, routine, and conventional, see MPEP 2106.05(d)(II). The additional elements using multiple large language models, reciting generic computer components as mere instructions to apply on a computer per MPEP 2106.05(f), are carried over and do not provide significantly more than the abstract idea. Looking at the limitations in combination and the claims as a whole does not change this conclusion, and the claim is ineligible. Therefore, the additional limitations of claim 1 do not amount to significantly more than the judicial exception.
Thus, claim 1 recites abstract ideas with additional elements recited at a high level of generality, resulting in a claim that does not integrate the abstract idea into a practical application or amount to significantly more than the judicial exception. Therefore, claim 1 is not patent eligible. Independent claims 11 and 20 have substantially similar limitations; therefore, they are rejected for the same reasons as claim 1. As per the dependent claims, these claims include additional mathematical concepts and/or certain methods of organizing human activity as described above for claim 1, along with additional insignificant extra-solution activity and/or apply-on-a-computer steps; therefore, they are rejected for similar reasons.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 8-11 and 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Ramanujan et al. (U.S. Publication No. 2025/0117626 A1, hereinafter referred to as “Ramanujan”).

Regarding claim 1, Ramanujan discloses a method for generating outputs using multiple large language models (LLMs), the multiple LLMs including at least one expert model and a plurality of draft models, wherein the method comprises: (“computing device 100 receives an inference request 110. The inference request includes an input prompt 112 and an indication 114 of a selected delta AI model.
In other words, the inference request specifies a selected delta AI model of the two or more delta AI models implemented by the computing device. As will be described in more detail below, the input prompt is input to the base AI model and a delta AI model of the set of delta AI models implemented by the computing device to generate respective result vectors, which are then combined to generate an output vector for the input prompt. As such, the input prompt takes the form of any data suitable for input to the AI models for inference. As non-limiting examples, the input prompt may be an encoded representation of text data (e.g., a natural language input typed, spoken, or otherwise provided by a human user), image data (e.g., for image classification or facial recognition), audio data (e.g., for speech recognition), and/or video data.” – “base AI model” is considered to be the “at least one expert model” and “delta AI models” are considered to be “plurality of draft models”)(e.g., figure 1A and paragraphs [0023]) training a policy that is configured to select, for each of the at least one expert model, a draft model that is maximally aligned with the expert model, the policy being trained using a training dataset comprising inputs from a plurality of contexts; (“During inferencing, an input vector is provided separately to the base AI model and a selected delta AI model. The resulting vectors output by the base AI model and selected delta AI model are combined to generate an output vector, which is then output as a response to the input vector. However, this does not preclude the computing device from concurrently generating a different output vector using the base model and a different delta AI model as a response to a different input vector—e.g., corresponding to a different user or customer. 
In this manner, the same computing device can concurrently fulfill inference requests specifying different delta AI models that adapt the base AI model in different ways.” – delta AI model is considered to be maximally aligned with the expert model, because it coincides with the inference request)(e.g., Figs 1A, 1B, 2, 3A and 3B and paragraphs [0016], [0023]-[0025] and [0081])

receiving input of a first user query; (“computing device 100 receives an inference request 110.”)(e.g., abstract and paragraph [0023])

determining a pair of expert and draft models for processing the first user query using the trained policy; and (“In other words, the inference request specifies a selected delta AI model of the two or more delta AI models implemented by the computing device. As will be described in more detail below, the input prompt is input to the base AI model and a delta AI model of the set of delta AI models implemented by the computing device to generate respective result vectors, which are then combined to generate an output vector for the input prompt.”)(e.g., figures 1A and 1B and paragraphs [0023] and [0027])

generating an output based on the first user query using the determined pair of models. (“The computing device then generates an output vector by combining the base model result vector and the delta model result vector via a combination operation. In the example of FIG. 1A, base model result vector 116 and delta model result vector 118 are combined via a combination operation 120 to generate an output vector 122. The base model result vector and delta model result vector are combined in any suitable way, depending on the implementation.”)(e.g., figures 1A and 1B and paragraph [0028]).

Regarding claim 8, Ramanujan discloses the method of claim 1.
Ramanujan further discloses wherein determining the pair of expert and draft models for processing the first user query comprises obtaining, via the trained policy, a distribution over the plurality of draft models and selecting a first one of the draft models based on the distribution. (“During inferencing, an input vector is provided separately to the base AI model and a selected delta AI model. The resulting vectors output by the base AI model and selected delta AI model are combined to generate an output vector, which is then output as a response to the input vector. However, this does not preclude the computing device from concurrently generating a different output vector using the base model and a different delta AI model as a response to a different input vector—e.g., corresponding to a different user or customer.” “SDN 730 is configured to route network traffic through a data plane 732, based on network routing information, policies, rules, and/or other configuration information set at the control plane 728 of the SDN. The SDN may be used to implement network routes along which network traffic is routed, network topologies in which to organize networking in the computing system, enforce policies and rules, and perform other functions related to networking in the computing system.”)(e.g., Figures 1A and 1B and paragraphs [0016], [0022], [0023], [0063] and [0064]).

Regarding claim 9, Ramanujan discloses the method of claim 8. Ramanujan further discloses wherein generating the output based on the first user query using the determined pair of models comprises configuring the first draft model to assist decoding for the first user query. (“In the example of FIG. 1A, the input prompt 112 is input to the base AI model 106 to thereby generate a base model result vector 116. Similarly, the input prompt is input to the selected delta AI model to thereby generate a delta model result vector.
In this example, the selected delta AI model is delta AI model 108A, which generates a delta model result vector 118. In other words, the same input prompt is separately provided both to the base AI model and the selected delta AI model, which perform inferencing to output different respective result vectors for the input prompt.”)(e.g., figures 1A and 1B and paragraph [0027]).

Regarding claim 10, Ramanujan discloses the method of claim 1. Ramanujan further discloses wherein the policy comprises a neural network. (“Other non-limiting examples of AI and/or machine learning (ML) technologies that may be used to implement a base AI model include support vector machines, multi-layer neural networks, convolutional neural networks, and/or recurrent neural networks.”)(e.g., paragraph [0021]).

Regarding claim 11, Ramanujan discloses a computing system for generating outputs using multiple large language models (LLMs), the multiple LLMs including at least one expert model and a plurality of draft models, wherein the computing system comprises: (“computing device 100 receives an inference request 110. The inference request includes an input prompt 112 and an indication 114 of a selected delta AI model. In other words, the inference request specifies a selected delta AI model of the two or more delta AI models implemented by the computing device. As will be described in more detail below, the input prompt is input to the base AI model and a delta AI model of the set of delta AI models implemented by the computing device to generate respective result vectors, which are then combined to generate an output vector for the input prompt. As such, the input prompt takes the form of any data suitable for input to the AI models for inference.
As non-limiting examples, the input prompt may be an encoded representation of text data (e.g., a natural language input typed, spoken, or otherwise provided by a human user), image data (e.g., for image classification or facial recognition), audio data (e.g., for speech recognition), and/or video data.” – “base AI model” is considered to be the “at least one expert model” and “delta AI models” are considered to be “plurality of draft models”)(e.g., figure 1A and paragraphs [0023]) a processor; a memory coupled to the processor, the memory storing computer-executable instructions that, when executed by a processor, configure the processor to: (e.g., figure 8 and paragraphs [0070] and [0074]) train a policy that is configured to select, for each of the at least one expert model, a draft model that is maximally aligned with the expert model, the policy being trained using a training dataset comprising inputs from a plurality of contexts; (“During inferencing, an input vector is provided separately to the base AI model and a selected delta AI model. The resulting vectors output by the base AI model and selected delta AI model are combined to generate an output vector, which is then output as a response to the input vector. However, this does not preclude the computing device from concurrently generating a different output vector using the base model and a different delta AI model as a response to a different input vector—e.g., corresponding to a different user or customer. 
In this manner, the same computing device can concurrently fulfill inference requests specifying different delta AI models that adapt the base AI model in different ways.” – delta AI model is considered to be maximally aligned with the expert model, because it coincides with the inference request)(e.g., Figs 1A, 1B, 2, 3A and 3B and paragraphs [0016], [0023]-[0025] and [0081])

receive input of a first user query; (“computing device 100 receives an inference request 110.”)(e.g., abstract and paragraph [0023])

determine a pair of expert and draft models for processing the first user query using the trained policy; and (“In other words, the inference request specifies a selected delta AI model of the two or more delta AI models implemented by the computing device. As will be described in more detail below, the input prompt is input to the base AI model and a delta AI model of the set of delta AI models implemented by the computing device to generate respective result vectors, which are then combined to generate an output vector for the input prompt.”)(e.g., figures 1A and 1B and paragraphs [0023] and [0027])

generate an output based on the first user query using the determined pair of models. (“The computing device then generates an output vector by combining the base model result vector and the delta model result vector via a combination operation. In the example of FIG. 1A, base model result vector 116 and delta model result vector 118 are combined via a combination operation 120 to generate an output vector 122. The base model result vector and delta model result vector are combined in any suitable way, depending on the implementation.”)(e.g., figures 1A and 1B and paragraph [0028]).

Claims 18 and 19 have substantially similar limitations as stated in claims 8 and 10, respectively; therefore, they are rejected for the same reasons.
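Claims 8 and 18 recite obtaining a distribution over the draft models and selecting one based on that distribution, while claims 5-6 (addressed under §103 below) recite a similarity score computed as a weighted sum of a similarity metric and inference speed. The following sketch illustrates how those limitations could fit together; it is a hypothetical illustration under assumed weights and statistics, not the applicant's or either reference's implementation:

```python
import math

# Hypothetical per-draft statistics; in the claims these would come from
# offline output-similarity data and measured inference speed.
drafts = {
    "draft-a": {"similarity": 0.82, "speed": 0.40},
    "draft-b": {"similarity": 0.74, "speed": 0.90},
    "draft-c": {"similarity": 0.61, "speed": 0.95},
}

# Claim 6 style score: weighted sum of the similarity metric value and
# the inference speed associated with each draft model.
W_SIM, W_SPEED = 0.7, 0.3
scores = {name: W_SIM * s["similarity"] + W_SPEED * s["speed"]
          for name, s in drafts.items()}

# Claim 8 style selection: form a distribution over the draft models
# (softmax of the scores), then select a draft based on that distribution.
total = sum(math.exp(v) for v in scores.values())
distribution = {name: math.exp(v) / total for name, v in scores.items()}
selected = max(distribution, key=distribution.get)
print(distribution, selected)
```

A greedy argmax is used here for determinism; a stochastic policy could instead sample from `distribution`.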
Regarding claim 20, Ramanujan discloses a non-transitory, computer-readable medium storing instructions for generating outputs using multiple large language models (LLMs), the multiple LLMs including at least one expert model and a plurality of draft models, wherein the instructions, when executed by a processor, configure the processor to: (“computing device 100 receives an inference request 110. The inference request includes an input prompt 112 and an indication 114 of a selected delta AI model. In other words, the inference request specifies a selected delta AI model of the two or more delta AI models implemented by the computing device. As will be described in more detail below, the input prompt is input to the base AI model and a delta AI model of the set of delta AI models implemented by the computing device to generate respective result vectors, which are then combined to generate an output vector for the input prompt. As such, the input prompt takes the form of any data suitable for input to the AI models for inference. As non-limiting examples, the input prompt may be an encoded representation of text data (e.g., a natural language input typed, spoken, or otherwise provided by a human user), image data (e.g., for image classification or facial recognition), audio data (e.g., for speech recognition), and/or video data.” – “base AI model” is considered to be the “at least one expert model” and “delta AI models” are considered to be “plurality of draft models”)(e.g., figures 1A and 8 and paragraphs [0023], [0068] and [0070])

train a policy that is configured to select, for each of the at least one expert model, a draft model that is maximally aligned with the expert model, the policy being trained using a training dataset comprising inputs from a plurality of contexts; (“During inferencing, an input vector is provided separately to the base AI model and a selected delta AI model.
The resulting vectors output by the base AI model and selected delta AI model are combined to generate an output vector, which is then output as a response to the input vector. However, this does not preclude the computing device from concurrently generating a different output vector using the base model and a different delta AI model as a response to a different input vector—e.g., corresponding to a different user or customer. In this manner, the same computing device can concurrently fulfill inference requests specifying different delta AI models that adapt the base AI model in different ways.” – delta AI model is considered to be maximally aligned with the expert model, because it coincides with the inference request)(e.g., Figs 1A, 1B, 2, 3A and 3B and paragraphs [0016], [0023]-[0025] and [0081])

receive input of a first user query; (“computing device 100 receives an inference request 110.”)(e.g., abstract and paragraph [0023])

determine a pair of expert and draft models for processing the first user query using the trained policy; and (“In other words, the inference request specifies a selected delta AI model of the two or more delta AI models implemented by the computing device. As will be described in more detail below, the input prompt is input to the base AI model and a delta AI model of the set of delta AI models implemented by the computing device to generate respective result vectors, which are then combined to generate an output vector for the input prompt.”)(e.g., figures 1A and 1B and paragraphs [0023] and [0027])

generate an output based on the first user query using the determined pair of models. (“The computing device then generates an output vector by combining the base model result vector and the delta model result vector via a combination operation. In the example of FIG. 1A, base model result vector 116 and delta model result vector 118 are combined via a combination operation 120 to generate an output vector 122.
The base model result vector and delta model result vector are combined in any suitable way, depending on the implementation.”)(e.g., figures 1A and 1B and paragraph [0028]).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 2-7 and 12-17 are rejected under 35 U.S.C.
103 as being unpatentable over Ramanujan in view of Agarwal et al. (U.S. Publication No. 2025/0124256 A1, hereinafter referred to as “Agarwal”).

Regarding claim 2, Ramanujan discloses the method of claim 1. However, Ramanujan does not appear to specifically disclose wherein training the policy comprises: obtaining an offline training dataset comprising user queries; and for each query in the training dataset: generating, based on the query, outputs from the at least one expert model and each of the plurality of draft models; determining similarity of the output of each draft model to the output of the at least one expert model; and adding the output similarity data to the training dataset.

On the other hand, Agarwal, which relates to an efficient knowledge distillation framework for training machine-learned models (title), does disclose wherein training the policy comprises: obtaining an offline training dataset comprising user queries; and (“The example method can include, in an offline process, obtaining the plurality of predictions of the teacher machine-learned sequence processing model and updating the machine-learned student sequence processing model based on the multiscale refinement objective.”)(e.g., paragraph [0017])

for each query in the training dataset: generating, based on the query, outputs from the at least one expert model and each of the plurality of draft models; (“to help align the student model and the teacher model, example implementations according to the present disclosure can train the student model by using the outputs of the teacher model to “correct” outputs organically generated by the student model. For example, the student model can include a machine-learned sequence processing model configured to generate sequences of information based on input context information. The student model can generate new data elements in the sequence.
The teacher model can process the sequence generated by the student model and generate, for each data element generated by the student model, an example of how the teacher model would have processed the sequence at that step. In this manner, for instance, the student model can explore its own output space by organically generating sequences while also receiving feedback on its performance from the teacher model.” “For instance, one component of a multiscale refinement objective can be based on a comparison of student generated value(s) 106-2 and teacher generated value(s) 110. For example, the training system(s) 116 can compare student generated value(s) 106-2 and teacher generated value(s) 110 to evaluate how well the individual output(s) of the student model 104 align with the preferred output(s) of the teacher model 108.”)(e.g., paragraphs [0040] and [0077])

determining similarity of the output of each draft model to the output of the at least one expert model; and (“For instance, one component of a multiscale refinement objective can be based on a comparison of student generated value(s) 106-2 and teacher generated value(s) 110. For example, the training system(s) 116 can compare student generated value(s) 106-2 and teacher generated value(s) 110 to evaluate how well the individual output(s) of the student model 104 align with the preferred output(s) of the teacher model 108.”)(e.g., paragraph [0077])

adding the output similarity data to the training dataset. (“For example, aligning a student distribution to a teacher distribution based on a mode-seeking divergence metric can cause the student distribution to learn to assign probability mass broadly over the output space, even for areas of the output space that might have low probability under the teacher distribution. This can increase a diversity of the output of the student model.
However, for student models with relatively limited expressivity (as compared to the teacher model), a mode-seeking divergence can help the student model focus its expressivity on a narrower range of the output space around a mode of the teacher probability distribution. This can increase aspects of correctness of the student model by prioritizing alignment with higher-probability regions of the teacher distribution. However, focusing on a narrower range of the output space can decrease output diversity.”)(e.g., paragraphs [0111]-[0113]).

Ramanujan discloses a system that responds to AI requests by incorporating two models (a base AI model and a delta model). However, Ramanujan is silent with respect to training the models offline. On the other hand, Agarwal, which also relates to knowledge distillation and to employing larger and smaller models to respond to AI requests (title and abstract), does disclose that it is known to train the models offline, and to align the student models (similar to the draft models of Ramanujan) with teacher models (similar to the base model of Ramanujan). This provides an effective manner of determining how aligned the results of the two models are, and it would be desirable for the teacher and student models to be selected and processed together when diversity of results is not desired and accuracy, with similar results produced by both models, is wanted. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of Applicant’s claimed invention to combine the offline training that considers similarity/divergence between the student/teacher models (base/draft models of Ramanujan) to enhance the manner in which models are selected to produce quicker results, by aligning the smaller and larger models to effectively process the user AI requests.

Regarding claim 3, Ramanujan in view of Agarwal discloses the method of claim 2.
Agarwal further discloses wherein the similarity of the output of a draft model to the output of the at least one expert model is represented by a similarity score that is computed using a similarity metric. (divergence metric is a measure of similarity of the outputs between the models)(e.g., figure 3 and paragraphs [0074], [0077] and [0111]-[0113]).

Regarding claim 4, Ramanujan in view of Agarwal discloses the method of claim 3. Ramanujan in view of Agarwal discloses wherein the output similarity data includes: an indication of a query from the training dataset; (“delta AI models are implemented by injecting trained decomposition matrices into the base AI model, or otherwise merging parameters of the delta AI model with the base AI model. This results in an adapted iteration of the base model that has been tailored toward a different context, although is similar in size to the base model when deployed.”)(Ramanujan: e.g., figures 1A and 1B and paragraph [0015])(“Input sequence 200 can provide context for one or more predictions by student model 104 or teacher model 108. For instance, one or more predictions by student model 104 or teacher model 108 can be conditioned on context data in input sequence 200.”)(Agarwal: e.g., paragraph [0084])

Ramanujan further discloses an identifier of a draft model; and (“The AI model fleet management service causes a network routing table to be updated at a control plane of a software-defined network (SDN), where the network routing table includes unique identifiers corresponding to different AI models, and network addresses at which the AI models are accessible.”)(e.g., paragraphs [0025] and [0067])

Agarwal further discloses a similarity score computed for the draft model. (“the reinforcement learning signal 114 can include a signal indicative of a quality (e.g. high quality, low quality) associated with one or more actions taken by the student model 104 (e.g. one or more student-generated values 106-3).
In some instances, the quality can be a performance quality associated with one or more tasks (e.g., mathematical accuracy score on a mathematical reasoning task, factual accuracy score on a text generation task, entailment score associated with a summarization task, artistic quality score on a creative content generation task, etc.). In some instances, the reinforcement learning signal 114 can include a signal indicative of a desirability of one or more outcomes caused by the student model 104 (e.g. one or more desirable properties of a generated textual sequence or generated image). The reinforcement learning signal 114 can include, for example, a numerical indicator indicative of quality of an output sequence.”)(e.g., paragraph [0074]).

Regarding claim 5, Ramanujan in view of Agarwal discloses the method of claim 3. Agarwal further discloses wherein the similarity score is computed based on an inference speed associated with the draft model. (“The student model 104 can be characterized by a computing cost (e.g. inference cost, pretraining cost, fine-tuning cost, memory usage, etc.) that is lower than a computing cost of the teacher model 108.”)(e.g., paragraphs [0067], [0106] and [0142]).

Regarding claim 6, Ramanujan in view of Agarwal discloses the method of claim 5. Agarwal further discloses wherein the similarity score is computed as a weighted sum of a value of the similarity metric and the inference speed associated with the draft model. (“The student model 104 can be characterized by a computing cost (e.g. inference cost, pretraining cost, fine-tuning cost, memory usage, etc.) that is lower than a computing cost of the teacher model 108.” “In some instances, the mixture distribution can be obtained by interpolating between a student distribution and a teacher distribution based on a weight (e.g., 0.1, 0.9, 0.5 etc.). In some instances, the mixture distribution can correspond to a weighted combination of a student distribution and a teacher distribution (e.g. 
0.1*student+0.9*reference). In some instances, the weight can be a hyperparameter that can be learned during training.” ”the objective function can be a weighted sum (e.g. (1 minus alpha)*(RL signal)+alpha*divergence). In some experiments, a student model 104 trained with alpha as high as 0.5 can achieve a higher entailment score than the teacher model 108 it learned from, even when the teacher model 108 is much larger (e.g. 38 times as many parameters) than the student model 104.”)(e.g., paragraphs [0067], [0074], [0115]-[0118] and [0165]).

Regarding claim 7, Ramanujan discloses the method of claim 1. However, Ramanujan does not appear to specifically disclose further comprising, for a new draft model: obtaining, for each query in the training dataset, an output from the new draft model; and adding the outputs from the new draft model to the offline training dataset.

On the other hand, Agarwal, which relates to an efficient knowledge distillation framework for training machine-learned models (title), does disclose further comprising, for a new draft model: obtaining, for each query in the training dataset, an output from the new draft model; and (“Model development platform 12 can provide a number of different toolkits that developer systems can employ in the development of new or adapted machine-learned models.” “One or more output(s) 3 can be iteratively or recursively generated to sequentially process and accomplish steps toward answering the question. For instance, an initial output can be executed by an external system or be processed by machine-learned model(s) 1 to complete an initial step of obtaining an answer to the question (e.g., querying a database, performing a computation, executing a script, etc.). Multiple steps can be performed, with a final output being obtained that is responsive to the question.”)(e.g., paragraphs [0203], [0204], [0229] and [0256]); and adding the outputs from the new draft model to the offline training dataset.
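The weighted-sum similarity score mapped to claims 5-6 above can be sketched in a few lines. This is a hypothetical illustration: the candidate records, the weight `alpha`, and the normalization of `similarity` and `inference_speed` to [0, 1] are assumptions, not details from the application or the cited references:

```python
from dataclasses import dataclass

@dataclass
class DraftCandidate:
    model_id: str
    similarity: float       # e.g., 1 / (1 + divergence) vs. the expert model, in [0, 1]
    inference_speed: float  # e.g., speed normalized relative to the expert, in [0, 1]

def similarity_score(c: DraftCandidate, alpha: float = 0.7) -> float:
    # Weighted sum of the similarity-metric value and the inference speed,
    # as in the claim 6 mapping (alpha is an illustrative hyperparameter).
    return alpha * c.similarity + (1.0 - alpha) * c.inference_speed

def select_draft(candidates: list[DraftCandidate], alpha: float = 0.7) -> str:
    # Pick the draft model maximally aligned with the expert while still
    # fast enough to be worth speculating with.
    best = max(candidates, key=lambda c: similarity_score(c, alpha))
    return best.model_id

candidates = [
    DraftCandidate("draft-small", similarity=0.62, inference_speed=0.95),
    DraftCandidate("draft-medium", similarity=0.81, inference_speed=0.70),
]
```

With `alpha = 0.7` the score favors alignment over speed, so `draft-medium` is selected here despite being slower; shrinking `alpha` shifts the selection toward the faster draft.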
(“The example method can include, in an offline process, obtaining the plurality of predictions of the teacher machine-learned sequence processing model and updating the machine-learned student sequence processing model based on the multiscale refinement objective.” “in an offline process, obtaining the plurality of predictions of the teacher machine-learned sequence processing model and updating the machine-learned student sequence processing model based on the multiscale refinement objective.”)(e.g., paragraphs [0017] and [0143]).

It would have been obvious to combine Agarwal with Ramanujan for the same reasons provided for claim 2, above.

Claims 12-17 have substantially similar limitations as stated in claims 2-7, respectively; therefore, they are rejected under the same subject matter.

Conclusion

The prior art made of record, listed on form PTO-892, and not relied upon is considered pertinent to applicant's disclosure.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHARD L BOWEN, whose telephone number is (571) 270-5982. The examiner can normally be reached Monday through Friday, 7:30 AM - 4:00 PM EST.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner, can be reached at (571) 270-1760. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RICHARD L BOWEN/
Primary Examiner, Art Unit 2165

Prosecution Timeline

Jan 17, 2025: Application Filed
Jan 09, 2026: Non-Final Rejection under §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602365: Method for Transmitting a Bloom Filter From a Transmitter Unit to a Receiver Unit (granted Apr 14, 2026; 2y 5m to grant)
Patent 12597044: Transforming Qualitative Survey Into Quantitative Survey Using Domain Knowledge and Natural Language Processing (granted Apr 07, 2026; 2y 5m to grant)
Patent 12596752: Information Processing Apparatus, Content Generation System, and Control Method (granted Apr 07, 2026; 2y 5m to grant)
Patent 12585921: Node Selection Apparatus and Method for Maximizing Influence Using Node Metadata in Network With Unknown Topology (granted Mar 24, 2026; 2y 5m to grant)
Patent 12585699: System, Method, and Computer Program for Multimodal Video Retrieval (granted Mar 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 80%
Grant Probability With Interview: 99% (+27.7%)
Median Time to Grant: 2y 10m
PTA Risk: Low

Based on 544 resolved cases by this examiner. Grant probability derived from career allow rate.
