Prosecution Insights
Last updated: April 19, 2026
Application No. 18/355,055

KNOWLEDGE DISTILLATION FROM NON-STREAMING TO STREAMING ENCODER

Status: Non-Final OA (§103)
Filed: Jul 19, 2023
Examiner: OGUNBIYI, OLUWADAMILOLA M
Art Unit: 2653
Tech Center: 2600 — Communications
Assignee: Qualcomm Incorporated
OA Round: 3 (Non-Final)
Grant Probability: 78% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 0m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 78% — above average (236 granted / 304 resolved; +15.6% vs TC avg)
Interview Lift: +18.6% (strong; allowance among resolved cases with an interview vs. without)
Typical Timeline: 3y 0m average prosecution; 31 applications currently pending
Career History: 335 total applications across all art units

Statute-Specific Performance

§101: 20.1% (-19.9% vs TC avg)
§103: 47.0% (+7.0% vs TC avg)
§102: 12.1% (-27.9% vs TC avg)
§112: 13.7% (-26.3% vs TC avg)
Based on career data from 304 resolved cases; comparisons are against the Tech Center average estimate.

Office Action

§103
DETAILED ACTION

Claims 1, 2, 4 – 7, 9 – 21 and 23 – 31 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 08 December 2025 has been entered.

Response to Amendment

With regard to the Final Office Action of 09 October 2025, the Applicant filed a response on 08 December 2025. Claim 8 is cancelled. New claim 31 has been added. The 35 U.S.C. 112(f) interpretation given to limitations of claim 30 is maintained.

Response to Arguments

With regard to the 35 U.S.C. 103 rejection of previous claim 8 based on the Kundu, Souvik, et al. reference, the Applicant argues (Remarks: page 9, par. 3) that this reference fails to teach or suggest the use of auxiliary non-streaming layers for the purpose of bridging the context gap between non-streaming and streaming models. The Examiner notes that the claim recites no limitation concerning "bridging the context gap between the non-streaming and streaming models," so this argument is not considered. The Applicant further argues that the auxiliary classifier of the applied reference is different from the auxiliary non-streaming layers inserted between an encoder of the non-streaming model and the encoder of the streaming model for layer-wise distillation, and that the reference does not teach removing such layers after training. The Examiner again notes that the claim limitations do not recite the layer-wise distillation described in this argument. Regarding the other limitations, the applied reference, referring to Figure 4, shows a representation passing from a teacher/non-streaming model, through an auxiliary layer, to a student/streaming model. It uses an intermediate auxiliary classifier as the claimed auxiliary non-streaming layer between the non-streaming (teacher) model and the streaming (student) model, thereby reading upon the limitations of this claim. However, for the purpose of moving the application forward, the Examiner acquiesces to the Applicant's arguments. Based on the new grounds of rejection raised by the amendment to the independent claims, the Examiner will address the claims as currently presented later in this Office Action.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: (A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and (C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word "means" (or "step") in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word "means" (or "step") are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word "means" (or "step") are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Such claim limitations are: "means for determining one or more words in a speech signal …" in claim 30; "means for taking an action based on the determined one or more words" in claim 30. Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover one or more processors as the corresponding structure described in [0012] of the Specification as performing the claimed function, and equivalents thereof.

If Applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, Applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 5, 6, 7, 10, 11, 12, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 27, 28, 29, 30 and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Doutre et al. (US 2022/0343894 A1; hereafter Doutre) in view of Wang, Lin, and Kuk-Jin Yoon, "Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks," IEEE Transactions on Pattern Analysis and Machine Intelligence 44.6 (2021): 3048-3068 (hereafter Wang).
For claim 1, Doutre discloses a device configured to automatically recognize speech (Doutre: [0015] — automatic speech recognition), the device comprising: memory configured to store (Doutre: [0005] — hardware memory) a speech signal representative of speech and a streaming model configured to recognize a portion of a first captured utterance in real-time, the streaming model comprising an on-device, real-time streaming automatic speech recognition (ASR) model (Doutre: [0016] — "a streaming ASR model refers to a speech recognition model deployed in an ASR system that may be used to transcribe speech in real-time," with the system receiving sentences of speech as input (indicating storage of the speech signal representing speech, the transcription of speech in real time or near-real-time indicating the recognition of a portion of a first captured utterance in real-time); "on-device speech recognition tasks have come to use streaming ASR models"; "a streaming ASR model refers to a speech recognition model deployed in an ASR system" (indicating memory for storing a streaming model); [0004] — a streaming ASR model) including an encoder and an inferencer (Doutre: [0007] — "The streaming ASR student model may include a conformer based encoder" (teaching the presence of an encoder at the streaming model); [0016] — "To perform transcription in real-time, a streaming ASR model produces and updates transcription results (i.e., hypotheses or predictions) on a frame-by-frame basis" (the generation of hypotheses and predictions teaches performing inferencing to infer words contained in the speech, thereby indicating the presence of an inferencer at the streaming model)); one or more processors implemented in circuitry coupled to the memory (Doutre: [0046] — a processor and a memory), the one or more processors being configured to: determine one or more words in the speech signal based on one or more transfers of learned knowledge from a non-streaming model to the [[encoder of the streaming model, but not to the inferencer of the streaming model]] (Doutre: [0031] — a non-streaming ASR model distils its knowledge to a streaming ASR model to perform speech recognition and obtain transcription (obtaining transcription means determining words); [0038] — obtaining a segment of a transcript which could be a word or a phrase (a phrase being more words)), wherein the non-streaming model is configured to recognize a second utterance of a plurality of words based on only an entirety of the second captured utterance (Doutre: [0016] — "[i]n contrast, a non-streaming ASR model leverages the full context of a given speech input to produce its transcription results" (indicating that the non-streaming model produces transcription for speech by making use of the full context, the full context being the entirety of the captured utterance)), [[wherein the one or more transfers of learned knowledge are based on a plurality of auxiliary non-streaming layers inserted between an encoder of the non-streaming model and the encoder of the streaming model for the purpose of facilitating knowledge transfer, the plurality of auxiliary non-streaming layers being separate from the streaming model and the non-streaming model,]] and wherein the auxiliary non-streaming layers [[are present during training and]] are removed for inference (Doutre: [0028] — the streaming model is deployed on the device and not on the remote system, so that all speech recognition performed on the device is performed only by the streaming model (the non-streaming model, and any other model or layer, is not present during inference); [0016] — "To perform transcription in real-time, a streaming ASR model produces and updates transcription results (i.e., hypotheses or predictions) on a frame-by-frame basis" (further indicating the use of only the streaming model for the purpose of inference, absent any other model such as the claimed non-streaming auxiliary layers)); and take an action based on the determined one or more words, wherein the action comprises at least one of processing the one or more words into text, responding to a command, or responding to a query (Doutre: [0022] — performing an operation/task based on a recognised speech command, with an utterance 12U being a query; [0024] — providing a response 122 to the query).

The reference of Doutre does not directly present teaching for the limitation: determine one or more words in the speech signal based on one or more transfers of learned knowledge from a non-streaming model to the encoder of the streaming model, but not to the inferencer of the streaming model. This can, however, be inferred from the same reference as: determine one or more words in the speech signal based on one or more transfers of learned knowledge from a non-streaming model to the encoder of the streaming model, but not to the inferencer of the streaming model (Doutre: [0028] — a transfer of the distilled knowledge from the teacher model to the student model; [0029] — the presence of an encoder with multiple layers, the model having an encoder-decoder architecture (in an encoder-decoder architecture, the encoder comes first and is responsible for receiving the input information); [0031] — "Here, the teacher model 210 distills its knowledge to the student model 152 by training the student model 152 with a plurality of student training samples 232 that include, at least in part, labels or transcriptions 212 generated by the teacher model 210. In this sense, during the training process 200, the student model 152 learns to predict its own transcriptions 154 from the transcriptions 212 produced by the teacher model" (showing that the prediction or inference of the transcription occurs after receipt of the training samples, which is suitable to teach receiving the distilled knowledge at the encoder rather than at the inferencer); [0032] — training samples 232 are sent from the teacher model to the student model (indicating that the non-streaming model transfers data to the streaming model, the training samples being transferred, which would obviously be received by the encoder of the streaming model given that the encoder of a model typically receives input information); "[s]ince the student model 150 is a streaming model that produces and updates speech recognition results that form the transcription 154 on a frame-by-frame basis" (showing the later inference of speech recognition results on a frame-by-frame basis, teaching a later transfer to the inferencer and further showing that the learned knowledge was not transferred to the inferencer but to the encoder, an encoder being capable of receiving input data and converting it into a format that the streaming model uses to make its recognition inference/prediction)).
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the above teaching, in order to address the indicated limitation of the claim and come up with the claimed invention, since the encoder of a model is well known in the art to be the first part of an encoder-decoder architecture, receiving input information so that the information can be processed into a format that the model can understand and work with, given the predictable result of separating the step of the initial processing of the input from the later generation of speech recognition predictions, producing a properly streamlined recognition process. The performing of the inference function would have to come after the encoder has received the distilled knowledge information.

The reference of Doutre provides teaching for an encoder at the non-streaming model [0040] and an encoder at the streaming model [0029], with the task of distilling knowledge from the non-streaming model to the streaming model. This reference, however, fails to teach the further limitation regarding the transfer of learned knowledge based on a plurality of auxiliary non-streaming layers inserted between the non-streaming model and the streaming model. The reference of Wang is now introduced to teach this as: determine one or more words in the speech signal based on one or more transfers of learned knowledge from a non-streaming model to the encoder of the streaming model, but not to the inferencer of the streaming model, wherein the non-streaming model is configured to recognize a second utterance of a plurality of words based on only an entirety of the second captured utterance, wherein the one or more transfers of learned knowledge are based on a plurality of auxiliary non-streaming layers inserted between an encoder of the non-streaming model and the encoder of the streaming model for the purpose of facilitating knowledge transfer, the plurality of auxiliary non-streaming layers being separate from the streaming model and the non-streaming model, and wherein the auxiliary non-streaming layers are present during training [[and are removed for inference]] (Wang: Page 3049 Col 2 Par 2 — knowledge transfer from the teacher (non-streaming) model to the student (streaming) model; Page 3056 Col 2, KD via Layer-Wise Estimation — a teacher (non-streaming) model first gets compressed through network pruning in order to be used for creating (or training) a student (streaming) model (the compression and pruning of the teacher model results in a new intermediate or auxiliary model which is based on the non-streaming model and lies between the non-streaming model and the streaming model but is part of neither) for the purpose of performing knowledge distillation (facilitating knowledge transfer)).

Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to modify the teaching of Doutre, which teaches distilling knowledge from the non-streaming model to the streaming model, by incorporating the known teaching of Wang, which introduces a plurality of auxiliary layers between the non-streaming model and the streaming model, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of transferring knowledge derived from the non-streaming model to the streaming model in situations where training is data-heavy and is improved in this manner by using small amounts of data, aligning the selected parameters of the non-streaming model with the trained streaming model (Wang: 5.2).
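To make the disputed feature concrete for readers who are not ASR specialists, the following is a minimal, editor-supplied sketch of what auxiliary non-streaming layers used for layer-wise distillation (and discarded at inference) could look like. The module names, dimensions, and the mean-squared-error objective are assumptions for illustration only; they are not taken from the claims, Doutre, or Wang.

import torch
import torch.nn as nn

class AuxiliaryBridge(nn.Module):
    """Illustrative auxiliary non-streaming layer bridging teacher and student hidden sizes."""
    def __init__(self, teacher_dim: int, student_dim: int):
        super().__init__()
        self.proj = nn.Linear(teacher_dim, student_dim)

    def forward(self, teacher_hidden: torch.Tensor) -> torch.Tensor:
        return self.proj(teacher_hidden)

def layerwise_distillation_loss(teacher_hiddens, student_hiddens, bridges):
    # Compare selected teacher-encoder layers (full context) against the
    # corresponding student-encoder layers (streaming, limited context),
    # passing the teacher activations through the auxiliary bridges.
    loss = 0.0
    for t_h, s_h, bridge in zip(teacher_hiddens, student_hiddens, bridges):
        loss = loss + nn.functional.mse_loss(bridge(t_h), s_h)
    return loss

# At inference time only the streaming student encoder (and its decoder or
# "inferencer") is deployed; the bridges and the teacher are simply not loaded,
# which is the sense in which the auxiliary layers are "removed for inference."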
For claim 2, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the non-streaming model comprises a trained non-streaming model (Doutre: [0036] — a series of training for the teacher (non-streaming) model) and wherein the one or more transfers of learned knowledge from the non-streaming model to the streaming model comprise training the streaming model using the non-streaming model (Doutre: [0028], [0030] — the streaming ASR model learns how to generate speech recognition results from a non-streaming ASR model).

For claim 4, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the encoder comprises multiple layers (Doutre: [0029] — an encoder network with multiple layers).

For claim 5, claim 4 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the encoder is configured to receive transferred knowledge at selected layers of the multiple layers (Doutre: [0029] — "[w]hen the student model 152 is an RNN-T model, the student model 152 may include an encoder network with multiple layers of unidirectional Long Short Term Memory (LSTM) cells (e.g., 8 layers of LSTMs with 2048 cells)" (indicating that the knowledge is received at the encoder of the streaming model, which has multiple layers which receive the distilled knowledge); [0031] — a non-streaming ASR model distils its knowledge to a streaming ASR model (together with the citation from [0029], this shows the transfer of the knowledge to available layers of the multiple layers at the encoder of the streaming model)).

For claim 6, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the one or more transfers of learned knowledge are from an encoder of the non-streaming model to the encoder of the streaming model (Doutre: [0031] — the non-streaming model distils its knowledge to the streaming model; [0007] — both the streaming and non-streaming models may include a conformer-based encoder).

For claim 7, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the non-streaming model comprises a non-streaming ASR model (Doutre: [0004] — the presence of non-streaming ASR models).

For claim 10, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the one or more transfers of learned knowledge are based on a knowledge distillation (KD) loss function (Wang: Page 3050 Col 1 Par 2 — a knowledge distillation loss used during distillation).

For claim 11, claim 10 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the KD loss function comprises at least one of a distance loss, a Kullback-Leibler divergence loss, or an autoregressive predictive coding loss (Wang: Page 3053 Col 1 Par 3, Page 3063 Col 1 Par 2 — distance loss).
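For context, claims 10 through 13 concern a KD loss built from a distance loss, a Kullback-Leibler divergence loss, and an autoregressive predictive coding (APC) loss, with claim 13's weighted sum of all three indicated as allowable later in this action. The editor-supplied sketch below shows one plausible form of such a combined objective; the weights, the L2/L1 distance choices, the temperature, and the one-step APC prediction head are assumptions, not taken from the application or the cited art.

import torch.nn.functional as F

def kd_loss(student_hidden, teacher_hidden,
            student_logits, teacher_logits,
            apc_prediction, future_frames,
            w_dist=1.0, w_kl=1.0, w_apc=1.0, temperature=2.0):
    # Distance loss between intermediate representations.
    dist = F.mse_loss(student_hidden, teacher_hidden)

    # KL divergence between softened teacher and student output distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # APC-style loss: predict frames a few steps ahead from the current context.
    apc = F.l1_loss(apc_prediction, future_frames)

    # Weighted sum of the three terms (cf. the allowable combination of claim 13).
    return w_dist * dist + w_kl * kl + w_apc * apc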
For claim 12, claim 10 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the KD loss function comprises at least two of a distance loss, a Kullback-Leibler divergence loss, or an autoregressive predictive coding loss (Wang: Page 3053 Col 1 Par 3, Page 3063 Col 1 Par 2 — distance loss; Page 3053 Col 1 Par 3 — a KL-divergence loss (as indicated by the cited document G. Aguilar, Y. Ling, Y. Zhang, B. Yao, X. Fan, and C. Guo, "Knowledge distillation from internal representations," in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 5, pp. 7350–7357, 2020)).

For claim 15, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the speech comprises an utterance comprising the one or more words (Doutre: [0031] — a non-streaming ASR model distils its knowledge to a streaming ASR model to perform speech recognition and obtain transcription (obtaining transcription means determining words); [0038] — obtaining a segment of a transcript which could be a word or a phrase (a phrase being more words)).

For claim 16, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, further comprising one or more microphones configured to capture the speech signal (Doutre: [0024] — a microphone to capture an utterance in an audio stream).

For claim 17, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein at least one of the one or more transfers of learned knowledge occurs prior to the streaming model being located on the device (Doutre: [0045] — an implementation of the speech recognition system 150 on computing device 500, which can be a server, indicating that the speech recognition system which contains the streaming model 152 can be located on the server, and thereby receives its knowledge transfer before being located on the device).

For claim 18, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein at least one of the one or more transfers of learned knowledge occurs after the streaming model is located on the device (Doutre: [0028] — "prior to inference or deployment of the model 152 (i.e., prior to implementation), the streaming ASR model 152 learns how to generate speech recognition results (i.e., transcripts 154 or labels) from a non-streaming ASR model (e.g., shown as teacher model 210 in FIGS. 2A-3B) located in the remote system 140," indicating that, before deployment, the streaming model which is located on the device would have been trained by a remote non-streaming model, to show that the knowledge transfer would occur after the streaming model is located on the device), the device comprising a mobile device, a smart speaker system, a vehicle, or a robot (Doutre: [0022] — user devices comprising mobile phones, smart speakers).

For claim 19, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the action comprises at least one of processing speech into text, responding to a command, or responding to a query (Doutre: [0022] — performing an operation/task based on a recognised speech command).

As for claim 20, method claim 20 and device claim 1 are related as a method detailing procedures for using the claimed device, with each claimed element's function corresponding to the claimed device parts. Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to device claim 1.
As for claim 21, method claim 21 and device claim 2 are related as a method detailing procedures for using the claimed device, with each claimed element's function corresponding to the claimed device parts. Accordingly, claim 21 is similarly rejected under the same rationale as applied above with respect to device claim 2.

As for claim 23, method claim 23 and device claim 4 are related as a method detailing procedures for using the claimed device, with each claimed element's function corresponding to the claimed device parts. Accordingly, claim 23 is similarly rejected under the same rationale as applied above with respect to device claim 4.

As for claim 25, method claim 25 and device claim 6 are related as a method detailing procedures for using the claimed device, with each claimed element's function corresponding to the claimed device parts. Accordingly, claim 25 is similarly rejected under the same rationale as applied above with respect to device claim 6.

As for claim 26, method claim 26 and device claim 7 are related as a method detailing procedures for using the claimed device, with each claimed element's function corresponding to the claimed device parts. Accordingly, claim 26 is similarly rejected under the same rationale as applied above with respect to device claim 7.

For claim 27, claim 20 is incorporated and the combination of Doutre in view of Wang discloses the method, wherein the one or more transfers of learned knowledge are based on a plurality of auxiliary non-streaming layers between the streaming model and the non-streaming model, the plurality of auxiliary non-streaming layers being separate from the streaming model and the non-streaming model (Wang: Page 3049 Col 2 Par 2 — knowledge transfer from the teacher (non-streaming) model to the student (streaming) model; Page 3056 Col 2, KD via Layer-Wise Estimation — a teacher (non-streaming) model first gets compressed through network pruning in order to be used for creating (or training) a student (streaming) model (the compression and pruning of the teacher model results in a new intermediate or auxiliary model which is based on the non-streaming model and lies between the non-streaming model and the streaming model but is part of neither) for the purpose of performing knowledge distillation (facilitating knowledge transfer)).

As for claim 28, method claim 28 and device claim 19 are related as a method detailing procedures for using the claimed device, with each claimed element's function corresponding to the claimed device parts. Accordingly, claim 28 is similarly rejected under the same rationale as applied above with respect to device claim 19.

As for claim 29, computer program product claim 29 and claim 1 are related as a computer program product storing executable instructions required for performing the claimed steps on a computer. Doutre in [0047] provides teaching for a non-transitory memory storage suitable to read upon the limitations of this claim. Accordingly, claim 29 is similarly rejected under the same rationale as applied above with respect to claim 1.

For claim 30, it is analyzed and rejected for the same reasons set forth in the rejection of claim 1 above, given that both claims have similar limitations.
For claim 31, claim 1 is incorporated and the combination of Doutre in view of Wang discloses the device, wherein the auxiliary non-streaming layers facilitate layer-wise distillation between selected layers of the non-streaming and streaming model encoders, and the one or more processors are configured to enable knowledge transfer using both labeled and unlabeled data, and to avoid output misalignment between the non-streaming and streaming models (Wang: Page 3056 Col 2, KD via Layer-Wise Estimation — knowledge distillation via layer-wise estimation (distillation) occurring by having a new layer that is a compression and pruning of the teacher model, which is then fed 1×1 to the student model (teaching alignment of both models, thereby avoiding misalignment); Doutre: [0029] — an encoder at the streaming model; [0040] — teaching for an encoder at the non-streaming model; [0032] — labelling unlabelled data (teaching the presence of labelled data and unlabelled data)).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Doutre (US 2022/0343894 A1) in view of Wang ("Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks," IEEE Transactions on Pattern Analysis and Machine Intelligence 44.6 (2021): 3048-3068) as applied to claim 1, further in view of Xia et al. (US 2021/0374603 A1; hereafter Xia).

For claim 9, claim 1 is incorporated and the combination of Doutre in view of Wang provides teaching for knowledge transfer from the plurality of auxiliary non-streaming layers. This combination of Doutre in view of Wang, however, fails to teach the further limitation of this claim, for which Xia is now introduced to teach as: the device, wherein the one or more transfers of learned knowledge are based on a modified attention mask associated with the plurality of auxiliary non-streaming layers (Xia: [0052] — multiple transformer layers that employ an attention mask whereby certain tokens can only attend to other tokens (indicating a modified nature of the attention mask)). Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Xia, which presents a modified attention mask, with the teaching of the combination of Doutre in view of Wang, which provides the transfer of learned knowledge from non-streaming layers, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of focusing on only certain layers of the non-streaming model to be applied to the streaming model. See KSR Int'l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
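The "modified attention mask" at issue in claim 9 can be pictured with a small, editor-supplied sketch: a boolean mask that restricts each frame to a limited context window, in contrast to the all-True mask of a full-context, non-streaming model. The window size and masking convention are assumptions for illustration and are not taken from Xia or the application.

import torch

def streaming_attention_mask(num_frames: int, lookahead: int = 2) -> torch.Tensor:
    # mask[i, j] is True where frame i is allowed to attend to frame j;
    # here each frame sees all past frames plus a small look-ahead window.
    idx = torch.arange(num_frames)
    return idx.unsqueeze(1) + lookahead >= idx.unsqueeze(0)

mask = streaming_attention_mask(6)
print(mask.int())
# Row i allows attention to frames 0..i+2, approximating streaming behaviour;
# a fully True mask would correspond to the non-streaming, full-context case.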
Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Doutre (US 2022/0343894 A1) in view of Wang ("Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks," IEEE Transactions on Pattern Analysis and Machine Intelligence 44.6 (2021): 3048-3068) as applied to claim 23, further in view of BUDUGUPPA et al. (US 2023/0196024 A1; hereafter Buduguppa).

For claim 24, claim 23 is incorporated, but the combination of Doutre in view of Wang fails to explicitly teach the limitation of this claim, for which Buduguppa is now introduced to teach as: the method, wherein the encoder transfers knowledge at selected layers of the multiple layers (Buduguppa: [0046] — certain layers are selected to form the starting point for training the student model). The combination of Doutre in view of Wang provides teaching for an encoder with multiple layers. It differs from the claimed invention in that the claimed invention further provides teaching of the encoder transferring knowledge at selected layers of the multiple layers. This is not new to the art, as the reference of Buduguppa is seen to teach above. Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to incorporate the teaching of Buduguppa, which selects layers for training the student model for the knowledge transfer, with the teaching of the combination of Doutre in view of Wang, which provides multiple layers at the encoder, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of having a student model with a beginning configuration that "more readily and accurately learns to mimic the output of the teacher model" (Buduguppa: [0046]).

Allowable Subject Matter

Claims 13 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter: With regard to dependent claim 13, the prior art of record fails to teach, inter alia, a device which computes a KD loss function particularly comprising a weighted sum of the distance loss, the Kullback-Leibler divergence loss, and the autoregressive predictive coding loss. Claims 13 and 14 are hereby objected to as being dependent upon a rejected base claim.

Conclusion

The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure. KRISHNAN et al. (US 2023/0401831 A1) provides teaching for training a student model by having the student model closely follow and mirror every intermediate state of the teacher model instead of waiting for a fully refined/trained teacher model [0062]. Yu et al. (US 2024/0175697 A1) provides teaching for training a student model using a teacher model, but, during inference time, only the student model is used [0063]. Li, Hao-Ting, et al. ("Layer-level knowledge distillation for deep neural network learning," Applied Sciences 9.10 (2019): 1966) provides teaching for auxiliary structure learning as a way of applying the teacher model to train the student model (3.1. Auxiliary Structure Learning (ASL)).

Any inquiry concerning this communication or earlier communications from the Examiner should be directed to OLUWADAMILOLA M. OGUNBIYI, whose telephone number is (571) 272-4708. The Examiner can normally be reached Monday – Thursday (8:00 AM – 5:30 PM Eastern Standard Time). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the Examiner by telephone are unsuccessful, the Examiner's supervisor, PARAS D. SHAH, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OLUWADAMILOLA M OGUNBIYI/
Examiner, Art Unit 2653

/Paras D Shah/
Supervisory Patent Examiner, Art Unit 2653

03/08/2026

Prosecution Timeline

Jul 19, 2023
Application Filed
May 17, 2025
Non-Final Rejection — §103
Jul 15, 2025
Interview Requested
Jul 24, 2025
Applicant Interview (Telephonic)
Jul 24, 2025
Examiner Interview Summary
Jul 28, 2025
Response Filed
Oct 07, 2025
Final Rejection — §103
Dec 08, 2025
Response after Non-Final Action
Feb 02, 2026
Request for Continued Examination
Feb 10, 2026
Response after Non-Final Action
Mar 07, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579979
NAMING DEVICES VIA VOICE COMMANDS
2y 5m to grant; granted Mar 17, 2026
Patent 12537007
METHOD FOR DETECTING AIRCRAFT AIR CONFLICT BASED ON SEMANTIC PARSING OF CONTROL SPEECH
2y 5m to grant; granted Jan 27, 2026
Patent 12508086
SYSTEM AND METHOD FOR VOICE-CONTROL OF OPERATING ROOM EQUIPMENT
2y 5m to grant; granted Dec 30, 2025
Patent 12499885
VOICE-BASED PARAMETER ASSIGNMENT FOR VOICE-CAPTURING DEVICES
2y 5m to grant; granted Dec 16, 2025
Patent 12469510
TRANSFORMING SPEECH SIGNALS TO ATTENUATE SPEECH OF COMPETING INDIVIDUALS AND OTHER NOISE
2y 5m to grant; granted Nov 11, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 78%
With Interview: 96% (+18.6%)
Median Time to Grant: 3y 0m
PTA Risk: High
Based on 304 resolved cases by this examiner. Grant probability derived from career allow rate.
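A note on the arithmetic behind these figures: the methodology is not spelled out beyond "derived from career allow rate," but the numbers are consistent with simply adding the interview lift (in percentage points) to the career allow rate, as in the editor-supplied sketch below. Treat this as an assumption, not the tool's documented formula.

career_allow_rate = 236 / 304   # 0.776, shown as 78%
interview_lift = 0.186          # +18.6 percentage points
with_interview = min(career_allow_rate + interview_lift, 1.0)
print(f"{career_allow_rate:.0%} + {interview_lift:.1%} ≈ {with_interview:.0%}")  # 78% + 18.6% ≈ 96%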
