Last updated: May 29, 2026
Application No. 18/157,277
NEURAL NETWORK DISTILLATION METHOD AND APPARATUS

Status: Non-Final OA (§101, §103)
Filed: Jan 20, 2023
Priority: Jul 24, 2020 (continuation of PCT/CN2020/104653)
Examiner: MARU, MATIYAS T
Art Unit: 2148
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Huawei Technologies Co., Ltd.
OA Round: 2 (Non-Final)
Interview: Optional

Interview lift: +7.5%. Because the interview lift is below the 15.0% threshold, a written response is recommended.
Based on 45 resolved cases, 2023–2026
Examiner Intelligence

MARU, MATIYAS T
Career allowance rate: 62% of resolved cases (28 granted / 45 resolved), +7.2% vs. TC avg
Interview lift: +7.5% (moderate) for resolved cases with an interview
Typical timeline: 4y 2m average prosecution; 21 currently pending
Statute-Specific Performance

§101: 16.0% (-24.0% vs. TC avg)
§103: 82.0% (+42.0% vs. TC avg)
§112: 2.0% (-38.0% vs. TC avg)
Comparisons are to the Tech Center average estimate; based on career data from 45 resolved cases.
Office Action

§101 §103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Note
This second non-final action is issued upon further consideration because the prior art of record did not fully address all requirements of the claimed limitations. Applicant’s arguments regarding the rejection under 35 U.S.C. § 103(a) (page 18) have been found persuasive with respect to the previously applied rejection.
The rejection under 35 U.S.C. 112(b) has been withdrawn in light of the instant amendments to the claims.

Response to Arguments
Applicant's arguments filed 02/25/2026 ("Arguments/Remarks") have been fully considered but they are not persuasive.

Argument – 1: (page: 11 – 12) Applicant contends: “Claim 1, as presented above, is not directed to a mental process. A mental process is limited to concepts that are performed in the human mind, such as observation, evaluation, or judgment…”
Regarding the above argument, the Examiner respectfully notes that the rejected claims contain limitations that recite an abstract idea, for example:
“obtaining a sample set …”: under the broadest reasonable interpretation, this limitation recites an abstract idea, namely a mental process that involves observing data to collect information in a sample set.
“determining a first distillation manner based on data features of the sample set …”: under the broadest reasonable interpretation, this limitation recites an abstract idea, namely a mental process that involves evaluating characteristics of the sample set to determine a distillation manner.

Argument – 2: (page: 12 – 14) Applicant contends: “Furthermore, even if, for the sake of argument, claim 1 recites matters within a grouping, claim 1 integrates the alleged judicial exception into a practical application, which apply, rely on, or use the alleged judicial exception by producing specific improvements in neural network technology, as expressly described throughout this application. For example, as explained in paragraphs 7, and 185-186 of this application, the recited claim limitations achieve the following concrete technical improvements: …”
Regarding the above argument, the Examiner respectfully disagrees with Applicant’s assertion that claim 1 integrates the alleged judicial exception into a practical application by producing specific improvements in neural network technology. The claim lacks the detail required to support a conclusion that it reflects a technological improvement. The cited paragraphs (¶¶ 7, 185–186) merely state that unbiased samples and stable features are used in a knowledge distillation process and that different distillation strategies may be selected to improve performance. However, these paragraphs do not provide any specific technical details or concrete mechanism explaining how the improvement in accuracy, stability, or generalization is actually achieved. MPEP 2106.04(d)(1) directs the examiner to evaluate the specification “to determine if the disclosure provides sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. The specification need not explicitly set forth the improvement, but it must describe the invention such that the improvement would be apparent to one of ordinary skill in the art. Conversely, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology. If the specification sets forth an improvement in technology, the claim must be evaluated to ensure that the claim itself reflects the disclosed improvement.”

Argument – 3: (page: 15 - 16) Applicant contends: “Furthermore, even if the claims are, assuming for the sake of argument, directed to an abstract idea, applicants respectfully submit that the elements of claim 1 recite significantly more which amounts to an inventive concept. The claims in DDR are tied to computer technology, even if there is not a technical improvement, because the claims were found to include a specific way of automating a web site creation by incorporating elements from different sources, and was found to include significantly more than the abstract idea…”
Regarding the above argument, the Examiner respectfully notes that the additional limitation merely reiterates the claim limitations in functional terms and asserts an improvement in a conclusory manner, without additional technical detail. Although it references using biased and unbiased data sets in a distillation manner, it does not provide sufficient detail for one of ordinary skill in the art to recognize how that arrangement is implemented to reduce bias or improve accuracy. Instead, it restates the intended outcome, namely that the approach improves training and output quality, without describing any specific mechanism, modification of the training process, or structural change to the neural network that would account for the improvement. Accordingly, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1–24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more.
In Step 1 of the § 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, falls within one or more statutory categories (processes).
In Step 2A, Prong 1, of the § 101 analysis set forth in MPEP 2106, the Examiner has determined that the following limitations recite a process that, under the broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components:
Regarding claim 1, 
obtaining a sample set,
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves observing data to collect information in a sample set. See MPEP 2106.04.)
determining a first distillation manner based on data features of the sample set
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves evaluating characteristics of the sample set to determine a distillation manner. See MPEP 2106.04.)

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation as a mental process but for the recitation of generic computer components, then it falls within the mental processes grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
In Step 2A, Prong 2, of the § 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:
• 	The preamble is deemed insufficient to transform the judicial exception into a patentable invention because it generally links the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h).
wherein the sample set comprises a biased data set and an unbiased data set, the biased data set comprises biased samples, and the unbiased data set comprises unbiased samples;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because this additional limitation simply links the judicial exception to a field of use and/or technological environment. See MPEP 2106.05(h).)
wherein, in the first distillation manner, a teacher model is trained by using the unbiased data set and a student model is trained by using the biased data set;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because this additional limitation simply links the judicial exception to a field of use and/or technological environment. See MPEP 2106.05(h).)
training a first neural network based on the biased data set and the unbiased data set in the first distillation manner, to obtain an updated first neural network.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because this limitation amounts to no more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f).)

In Step 2B of the § 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Regarding limitation (III) (training the first neural network in the first distillation manner), the limitation recites mere application of the abstract idea, i.e., mere instructions to implement the abstract idea on a computer. It is therefore deemed insufficient to transform the judicial exception into a patentable invention because it generally applies the judicial exception using a generic computer and/or process. See MPEP 2106.05(f).
Regarding limitations (I) and (II) (the wherein clauses defining the sample set and the first distillation manner), these additional elements are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to a technological environment. See MPEP 2106.05(h).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Regarding claim 2, which depends from claim 1, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein samples in the sample set comprise input features and actual labels, and the first distillation manner is to perform distillation by using the input features of the samples in the sample set.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claim 17 recites subject matter similar to that of claim 2 and is therefore rejected under the same rationale.
Regarding claim 3, which depends from claim 2, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
training the first neural network by using the biased data set and the unbiased data set alternately, to obtain the updated first neural network
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
wherein, in the alternate training, a quantity of batch training iterations for training the first neural network by using the biased data set and a quantity of batch training iterations for training the first neural network by using the unbiased data set are in a preset ratio, and the input features of the samples in the sample set are used as inputs of the first neural network.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claim 18 recites subject matter similar to that of claim 3 and is therefore rejected under the same rationale.
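For illustration only, the alternate-training limitation of claims 3 and 18 can be read as a fixed batch schedule in which the two batch-iteration counts stay in a preset ratio. The minimal sketch below assumes a hypothetical preset ratio of 3:1 (biased to unbiased batch iterations) and a hypothetical helper name, `alternate_schedule`; neither is taken from Applicant's specification, and the sketch does not model the neural network itself.

```python
from itertools import islice
from typing import Iterator


def alternate_schedule(biased_iters: int, unbiased_iters: int) -> Iterator[str]:
    """Yield, indefinitely, which data set each batch iteration draws from.

    Each cycle runs `biased_iters` batch iterations on the biased data set and
    then `unbiased_iters` batch iterations on the unbiased data set, so the two
    iteration counts stay in the preset ratio biased_iters:unbiased_iters.
    """
    if biased_iters <= 0 or unbiased_iters <= 0:
        raise ValueError("both iteration counts must be positive")
    while True:
        for _ in range(biased_iters):
            yield "biased"
        for _ in range(unbiased_iters):
            yield "unbiased"


# Hypothetical preset ratio of 3:1; take the first two full cycles.
first_cycles = list(islice(alternate_schedule(3, 1), 8))
print(first_cycles)
```

In a training loop, each yielded name would select the data set for the next batch, with the input features of the samples fed to the first neural network in either case.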
Regarding claim 4, which depends from claim 2, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
setting a confidence for the biased samples in the biased data set
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves evaluating samples and assigning a confidence value that reflects the evaluation. See MPEP 2106.04.)
wherein the confidence is used to represent a bias degree of the biased samples; and
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
training the first neural network based on the biased data set, the confidence of the biased samples in the biased data set, and the unbiased data set, to obtain the updated first neural network, 
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
wherein the biased samples comprise the input features as inputs of the first neural network when the first neural network is trained
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claim 19 recites subject matter similar to that of claim 4 and is therefore rejected under the same rationale.
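For illustration only, one possible reading of claim 4's training "based on ... the confidence of the biased samples" is a loss in which each biased sample's term is scaled by its confidence. The sketch below assumes binary labels, binary cross-entropy as the loss, and confidences in [0, 1] with lower values for more strongly biased samples; the claim specifies none of these, and the function name `confidence_weighted_bce` is hypothetical.

```python
import math
from typing import Sequence


def confidence_weighted_bce(
    probs: Sequence[float],
    labels: Sequence[int],
    confidences: Sequence[float],
) -> float:
    """Mean binary cross-entropy over biased samples, with each sample's term
    scaled by its confidence (assumed in [0, 1], lower = more biased)."""
    if not (len(probs) == len(labels) == len(confidences)) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    eps = 1e-12  # keeps log() finite when a probability is exactly 0 or 1
    total = 0.0
    for p, y, c in zip(probs, labels, confidences):
        p = min(max(p, eps), 1.0 - eps)
        total += c * -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Two biased samples; the second is treated as strongly biased (confidence 0.2).
loss = confidence_weighted_bce([0.9, 0.3], [1, 0], [1.0, 0.2])
print(round(loss, 4))
```

A confidence of 0 removes a sample's influence entirely, while a confidence of 1 reduces to ordinary cross-entropy for that sample.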
Regarding claim 5, which depends from claim 1, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the first distillation manner is to perform distillation based on prediction labels of the unbiased samples comprised in the unbiased data set, the prediction labels are output by an updated second neural network for the unbiased samples in the unbiased data set, and
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
the updated second neural network is obtained by training a second neural network by using the unbiased data set
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claim 20 recites subject matter similar to that of claim 5 and is therefore rejected under the same rationale.
Regarding claim 6, which depends from claim 5, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the sample set further comprises an unobserved data set, and the unobserved data set comprises a plurality of unobserved samples and 
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
wherein the training the first neural network based on the biased data set and the unbiased data set in the first distillation manner, to obtain the updated first neural network comprises: training the first neural network by using the biased data set, to obtain a trained first neural network, and training the second neural network by using the unbiased data set, to obtain the updated second neural network; 
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
acquiring a plurality of samples from the sample set, to obtain an auxiliary data set; and
This additional limitation is directed to mere data gathering and is deemed insufficient to transform the judicial exception because it is considered insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d).
Receiving or transmitting data over a network, e.g., using the Internet to gather data: Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II).
As analyzed above, this additional limitation fails to integrate the judicial exception into a practical application at Step 2A and fails to provide an inventive concept at Step 2B.
updating the trained first neural network by using the auxiliary data set and by using prediction labels of the samples in the auxiliary data set as constraints, to obtain the updated first neural network, 
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
wherein the prediction labels of the samples in the auxiliary data set comprise labels output by the updated second neural network.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 7, which depends from claim 5, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the training the first neural network based on the biased data set and the unbiased data set in the first distillation manner, to obtain the updated first neural network comprises: training the second neural network by using the unbiased data set, to obtain the updated second neural network; 
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
outputting prediction labels of the biased samples in the biased data set by using the updated second neural network; 
This additional limitation is directed to mere data gathering and is deemed insufficient to transform the judicial exception because it is considered insignificant extra-solution activity that is well-understood, routine, and conventional. See MPEP 2106.05(d).
Data gathering and outputting: see Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015).
As analyzed above, this additional limitation fails to integrate the judicial exception into a practical application at Step 2A and fails to provide an inventive concept at Step 2B.
performing weighted merging on the prediction labels of the biased samples and actual labels of the biased samples, to obtain merged labels of the biased samples; and 
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves comparing two sets of label values, assigning relative weights, and combining them to produce a merged result. See MPEP 2106.04.)
training the first neural network by using the merged labels of the biased samples, to obtain the updated first neural network.
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
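For illustration only, the "weighted merging" step of claim 7 can be read as a convex combination of each biased sample's prediction label (output by the updated second neural network) and its actual label. The sketch assumes a single scalar weight shared by all samples and a hypothetical helper name, `merge_labels`; the claim does not state how the weights are chosen.

```python
from typing import List, Sequence


def merge_labels(
    predicted: Sequence[float], actual: Sequence[float], weight: float
) -> List[float]:
    """Weighted merge of prediction labels and actual labels.

    merged_i = weight * predicted_i + (1 - weight) * actual_i
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(predicted) != len(actual):
        raise ValueError("label sequences must have equal length")
    return [weight * p + (1.0 - weight) * a for p, a in zip(predicted, actual)]


# Hypothetical prediction labels from the updated second network and the
# observed (actual) labels of three biased samples.
merged = merge_labels([0.8, 0.1, 0.6], [1.0, 0.0, 0.0], weight=0.5)
print(merged)
```

The merged labels would then serve as training targets for the first neural network; a weight of 0 reproduces the actual labels and a weight of 1 reproduces the prediction labels.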
Regarding claim 8, which depends from claim 2, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the data features of the sample set comprise a first ratio, the first ratio is a ratio of a sample quantity of the unbiased data set to a sample quantity of the biased data set, and
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
the determining the first distillation manner based on the data features of the sample set comprises: selecting the first distillation manner matching the first ratio from a plurality of distillation manners.
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves comparing the available options, evaluating which one aligns with the specified ratio, and choosing the appropriate option. See MPEP 2106.04.)
Claim 21 recites subject matter similar to that of claim 8 and is therefore rejected under the same rationale.
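For illustration only, "selecting the first distillation manner matching the first ratio from a plurality of distillation manners" (claims 8 and 21) can be read as a threshold lookup on the ratio |unbiased| / |biased|. The thresholds, the manner names, and the mapping between them in the sketch below are all hypothetical; the claims recite only that a matching manner is selected.

```python
from typing import List, Tuple

# Hypothetical rule table: (exclusive upper bound on the first ratio, manner).
# Neither the thresholds nor the mapping comes from the application.
RATIO_RULES: List[Tuple[float, str]] = [
    (0.01, "prediction_label_distillation"),
    (0.10, "input_feature_distillation"),
    (float("inf"), "extracted_feature_distillation"),
]


def select_distillation_manner(num_unbiased: int, num_biased: int) -> str:
    """Return the first manner whose bound exceeds |unbiased| / |biased|."""
    if num_biased <= 0 or num_unbiased < 0:
        raise ValueError("need a positive biased count and a non-negative unbiased count")
    first_ratio = num_unbiased / num_biased
    for upper_bound, manner in RATIO_RULES:
        if first_ratio < upper_bound:
            return manner
    raise AssertionError("unreachable: the last bound is infinite")


print(select_distillation_manner(500, 100_000))    # first ratio 0.005
print(select_distillation_manner(5_000, 100_000))  # first ratio 0.05
```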
Regarding claim 9, which depends from claim 1, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the first distillation manner comprises: training the teacher model based on features extracted from the unbiased data set, to obtain a trained teacher model, and performing knowledge distillation on the student model by using the trained teacher model and the biased data set.
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claim 22 recites subject matter similar to that of claim 9 and is therefore rejected under the same rationale.
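For illustration only, claims 9 and 22 recite "performing knowledge distillation on the student model by using the trained teacher model and the biased data set" without specifying a loss. One widely used way to realize such a step is soft-target distillation, in which the student is penalized by the KL divergence between temperature-softened teacher and student outputs, scaled by T². That technique, the temperature of 2.0, and all names below are assumptions and are not drawn from Applicant's specification.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


# Hypothetical logits for one biased sample: the trained teacher (fit on
# unbiased data) and the student being distilled on the biased data set.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
print(round(distillation_loss(student, teacher), 4))
```

The loss is zero only when the student's softened distribution matches the teacher's, so minimizing it over the biased data set pulls the student toward the teacher's behavior.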
Regarding claim 10, which depends from claim 9, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
filtering input features of some unbiased samples from the unbiased data set 
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves evaluating data for certain characteristics and selecting or removing features based on judgment or criteria. See MPEP 2106.04.)
by using a deep global balancing regression (DGBR) algorithm; 
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
training a second neural network based on the input features of some unbiased samples, to obtain an updated second neural network; and using the updated second neural network as the teacher model, using the first neural network as the student model, and performing knowledge distillation on the first neural network by using the biased data set, to obtain the updated first neural network.
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to add the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 11, which depends from claim 9, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the data features of the sample set comprise a quantity of feature dimensions of the sample set, and 
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
the determining the first distillation manner based on the data features of the sample set comprises: selecting the first distillation manner matching the quantity of the feature dimensions from a plurality of distillation manners.
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves comparing the quantity of feature dimensions against multiple available distillation manners and choosing the one that aligns with that dimensionality. See MPEP 2106.04.)
Regarding claim 12, which depends from claim 1, the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the first distillation manner is selected from a plurality of preset distillation manners, and the plurality of preset distillation manners comprise at least two distillation manners with different guiding manners of the teacher model for the student model.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 13, in Step 2A, Prong 1:
A recommendation method, comprising: obtaining information about a target user and information about a recommended object candidate;
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves obtaining information by retrieving details from user records. See MPEP 2106.04.)
predicting a probability that the target user performs an operational action on the recommended object candidate
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves evaluating the available information, forming an expectation about future behavior, and assigning a likelihood value. See MPEP 2106.04.)
the first distillation manner is determined based on data features of the sample set
(i.e.: the broadest reasonable interpretation, the claim recites abstract idea: mental process: It involves evaluating characteristics of the sample set to determine a distillation manner. See (MPEP 2106.04)).

If the claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental processes grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

In Step 2A Prong 2 of the 101-analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:
• 	The preamble is deemed insufficient to transform the judicial exception into a patentable invention because the preamble generally links the use of a judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h).
inputting the information about the target user and the information about the recommended object candidate into a recommendation model,
(i.e.: deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere data gathering, which is considered insignificant extra-solution activity, See MPEP (2106.05(g))).
wherein the recommendation model is obtained by training a first neural network by using a biased data set and an unbiased data set in a sample set in a first distillation manner, the biased data set comprises biased samples, the unbiased data set comprises unbiased samples
(i.e.: deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h)).
the biased samples in the biased data set comprise information about a first user, information about a first recommended object, and actual labels, the actual labels of the biased samples in the biased data set are used to represent whether the first user performs an operational action on the first recommended object, 
(i.e.: deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h)).
the unbiased samples in the unbiased data set comprise information about a second user, information about a second recommended object, and actual labels, and the actual labels of the biased samples in the unbiased data set are used to represent whether the second user performs an operational action on the second recommended object.
(i.e.: deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h)).
In Step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception: 
Regarding limitation (I), the additional elements considered extra-solution activity, as analyzed above, are well-understood, routine, and conventional activities; specifically, the courts have recognized the following computer functions as well-understood, routine, and conventional:
Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II).

Regarding limitations (II), (III), and (IV), the additional elements are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technology environment, see MPEP 2106.05(h). 
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
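As context for the claim 13 limitations, a recommendation model of the kind recited receives information about a user and about a recommended object candidate and outputs a probability that the user performs an operational action (for example, a click). The sketch below assumes a single logistic layer over concatenated numeric features; the `Sample` fields, the model form, and `predict_action_probability` are illustrative assumptions and do not represent the applicant's recommendation model or the first neural network as claimed.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    user_features: List[float]    # information about the user
    object_features: List[float]  # information about the recommended object
    label: int                    # actual label: 1 if the user acted on the object, else 0


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_action_probability(user: List[float], obj: List[float],
                               weights: List[float], bias: float) -> float:
    """Hypothetical logistic scorer: weighted sum of the concatenated user and
    object features, mapped to a probability by the sigmoid function."""
    x = user + obj
    if len(x) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return _sigmoid(z)


# Example sample; the label is recorded for training and is not used in scoring.
s = Sample(user_features=[1.0], object_features=[1.0], label=1)
p = predict_action_probability(s.user_features, s.object_features,
                               weights=[1.0, 1.0], bias=-2.0)
print(p)  # 0.5, since the weighted sum cancels the bias
```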
Regarding claim 14, which depends upon claim 13: the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: 
wherein the unbiased data set is obtained in response to the recommended object candidate in a recommended object candidate set being displayed at a same probability, and the second recommended object is a recommended object candidate in the recommended object candidate set.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 15, which depends upon claim 14: the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: 
the unbiased samples in the unbiased data set are obtained in response to the recommended object candidate in the recommended object candidate set being randomly displayed to the second user; or the unbiased samples in the unbiased data set are obtained in response to the second user searching for the second recommended object.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
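Claims 14 and 15 describe the unbiased data set as obtained by displaying each recommended object candidate with the same probability, for example at random. A minimal Python sketch of such uniform-exposure logging follows; the function name, the `acted` callback standing in for observed user behavior, and the one-display-per-user policy are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Tuple


def collect_uniform_exposure_log(
    candidates: List[str],
    users: List[str],
    acted: Callable[[str, str], bool],
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str, int]]:
    """Show each user one candidate drawn uniformly at random, so every
    candidate is displayed with the same probability 1/len(candidates),
    and record (user, object, label) with label = 1 if the user acted."""
    rng = rng or random.Random()
    log = []
    for user in users:
        obj = rng.choice(candidates)  # uniform draw, independent of any ranking model
        log.append((user, obj, int(acted(user, obj))))
    return log


log = collect_uniform_exposure_log(
    ["a", "b", "c", "d"], [f"u{i}" for i in range(4000)],
    acted=lambda user, obj: obj == "a", rng=random.Random(0))
print(Counter(obj for _, obj, _ in log))  # about 1,000 displays per candidate
```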
Regarding claim 16, the remaining limitations are analogous to those of claim 1 and are rejected under similar rationale. Claim 16 additionally recites:
A neural network distillation apparatus, comprising a processor, wherein the processor is coupled to a memory, the memory stores program instructions, and the program instructions stored in the memory are executed by the processor to perform:
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and thus amounts to adding the words “apply it” (or an equivalent) with the judicial exception, See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 23, the remaining limitations are analogous to those of claim 13 and are rejected under similar rationale. Claim 23 additionally recites:

A recommendation apparatus, comprising at least one processor and a memory, wherein the at least one processor is coupled to the memory, and is configured to read and execute instructions in the memory, to perform:
Deemed insufficient to transform the judicial exception into a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and thus amounts to adding the words “apply it” (or an equivalent) with the judicial exception, See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 24, which depends upon claim 23: the claim fails to resolve the deficiencies identified above because it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites: 
wherein the unbiased data set is obtained in response to the recommended object candidate in a recommended object candidate set being displayed at a same probability, and the second recommended object is a recommended object candidate in the recommended object candidate set.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1 – 2 and 16 – 17 are rejected under 35 U.S.C. 103 as being unpatentable over Faibish et al., Pub. No.: US11687433B2, in view of Fukuda et al., "Efficient knowledge distillation from an ensemble of teachers" and Sun et al., Pub. No.: US20200364542A1. 
Regarding claim 1, Faibish teaches: A neural network distillation method, comprising: obtaining a sample set, wherein the sample set comprises a biased data set and an unbiased data set, the biased data set comprises biased samples, and the unbiased data set comprises unbiased samples; 
(Faibish, col. 19 line [3 – 15], “Once the training step 1206 has completed, control proceeds to the step 1208. At the step 1208, validation processing may be performed. During validation, the weights and bias values are not being adjusted [obtaining a sample set, wherein the sample set comprises a biased data set]. Rather, validation processing is generally evaluating the predictive capabilities of the current neural network model using the weights and bias values resulting from the training [the biased data set comprises biased samples]. The validation processing of the step 1208 may include performing neural network validation using a second data set often referred to as the validation data set. The validation data set is different than the training data set and is used to provide an unbiased evaluation of the current neural network [and an unbiased data set, … and the unbiased data set comprises unbiased samples] resulting from completion of the training using the training data set.”)

a student model is trained by using the biased data set; and
(Faibish, col. 18 line [1 – 7], “(75) When a neural network is trained, such as the first neural network [a student model] discussed above, to recognize the major faults, the weights and bias values of the neurons are learned [is trained by using the biased data set] and may be adjusted during the training process in order to find optimal values for the weights and bias values of the neurons to enable accurate prediction of the desired outputs for particular corresponding inputs.”)

training a first neural network based on the biased data set 
(Faibish, col. 19 line [3 – 15], “Once the training step 1206 has completed, control proceeds to the step 1208. At the step 1208, validation processing may be performed. During validation, the weights and bias values are not being adjusted [training a first neural network based on the biased data set]. Rather, validation processing is generally evaluating the predictive capabilities of the current neural network model using the weights and bias values resulting from the training. The validation processing of the step 1208 may include performing neural network validation using a second data set often referred to as the validation data set. The validation data set is different than the training data set and is used to provide an unbiased evaluation of the current neural network resulting from completion of the training using the training data set.”)
Faibish does not teach:
a teacher model is trained by using the unbiased data set
training a first neural network based on 
determining a first distillation manner based on data features of the sample set, wherein, in the first distillation manner
in the first distillation manner, to obtain an updated first neural network.
Sun teaches:
a teacher model is trained by using the unbiased data set and 
(Sun, “[0052] … Specifically, DP-SGD uses the noisy loss for optimization, and PATE and scale PATE approaches add perturbation on the voting strategy. It is noted that in the embodiments of FIG. 6, the teachers are trained with balanced datasets [a teacher model is trained by using the unbiased data set] (i.e.: teacher model is trained with unbiased (balanced) dataset), where the training data is equally split into subsets (e.g., n=2 or 4) for each teacher, where each teacher is good at label prediction for all labels.”)

training a first neural network based on 
(Sun, “[0052] … Specifically, DP-SGD uses the noisy loss for optimization, and PATE and scale PATE approaches add perturbation on the voting strategy. It is noted that in the embodiments of FIG. 6, the teachers are trained with balanced datasets [training a first neural network based on ], where the training data is equally split into subsets (e.g., n=2 or 4) for each teacher, where each teacher is good at label prediction for all labels.”)
Sun and Faibish are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Sun with teachings of Faibish to add teacher-student training using intermediate-layer outputs and predictions from pre-trained models, enabling more efficient learning and better generalization while reducing reliance on sensitive data. (Sun, Abstract).
Faibish in view of Sun do not teach: 
determining a first distillation manner based on data features of the sample set, wherein, in the first distillation manner
in the first distillation manner, to obtain an updated first neural network.
Fukuda teaches:
determining a first distillation manner based on data features of the sample set, wherein, in the first distillation manner, 
(Fukuda, page: 3697, “We extend this proposed technique to the generalized distillation framework, where in addition to distillation of information from teacher networks [determining a first distillation manner based on data features of the sample set, wherein, in the first distillation manner], privileged information available only during training is also factored in. To illustrate the efficacy of our approach we show how an improved narrow band CNN based acoustic model can be trained by using privileged information from outputs of broadband models, instead of training the student network on only narrow band teacher models.”)

in the first distillation manner, to obtain an updated first neural network.
(Fukuda, page: 3697, “We extend this proposed technique to the generalized distillation framework, where in addition to distillation of information from teacher networks [in the first distillation manner, to obtain an updated first neural network], privileged information available only during training is also factored in. To illustrate the efficacy of our approach we show how an improved narrow band CNN based acoustic model can be trained by using privileged information from outputs of broadband models, instead of training the student network on only narrow band teacher models.”)
Fukuda, Faibish and Sun are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Fukuda with teachings of Faibish and Sun to transfer information from complex teacher models to a compact student model for improved accuracy. (Fukuda, Abstract).
Claim 16 recites analogous limitations as claim 1, so is rejected under similar rationale. 
Regarding claim 2, Faibish in view of Sun and Fukuda teach the method of claim 1.
Fukuda further teaches: wherein samples in the sample set comprise input features and actual labels, and the first distillation manner is to perform distillation by using the input features of the samples in the sample set.
(Fukuda, page: 3697, “To facilitate this, we combine the distillation framework with a simple data augmentation strategy [the first distillation manner is to perform distillation by using the input features of the samples in the sample set]. In this approach, instead of augmenting data using various kinds of signal distortions to the input acoustic features [wherein samples in the sample set comprise input features and actual labels] as is often done, we augment the training data by creating multiple copies of data with corresponding soft output targets from various teachers.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Fukuda with teachings of Sun and Faibish for the same reasons disclosed for claim 1.
Claim 17 recites analogous limitations as claim 2, so is rejected under similar rationale. 
Claim(s) 4, 8, 19 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Faibish in view of Sun and Fukuda, and in further view of Farrar et al., Pub. No.: US11250346B2. 
Regarding claim 4, Faibish in view of Sun and Fukuda teach the method of claim 2.
Faibish further teaches: wherein the biased samples comprise the input features as inputs of the first neural network when the first neural network is trained.
(Faibish, col. 19 line [3 – 15], “Once the training step 1206 has completed, control proceeds to the step 1208. At the step 1208, validation processing may be performed. During validation, the weights and bias values are not being adjusted [wherein the biased samples comprise the input features as inputs of the first neural network when the first neural network is trained]. Rather, validation processing is generally evaluating the predictive capabilities of the current neural network model using the weights and bias values resulting from the training. The validation processing of the step 1208 may include performing neural network validation using a second data set often referred to as the validation data set. The validation data set is different than the training data set and is used to provide an unbiased evaluation of the current neural network resulting from completion of the training using the training data set.”)
Faibish in view of Sun and Fukuda do not teach:
setting a confidence for the biased samples in the biased data set, wherein the confidence is used to represent a bias degree of the biased samples; and training the first neural network based on the biased data set, the confidence of the biased samples in the biased data set, and the unbiased data set, to obtain the updated first neural network, 
Farrar teaches: 
setting a confidence for the biased samples in the biased data set, wherein the confidence is used to represent a bias degree of the biased samples; and 
(Farrar, col. 11 line [35 – 48], “Similar to the bias rejection model 200, the cluster model 211, and/or the machine learning model 300, the bias scoring model 400 undergoes a training stage 402 to train the bias scoring model 400 to score a data set and, once trained, scores data sets during a scoring stage 404 based on the training from the training stage 402. During the training stage 402, the bias scoring model 400 receives one or more bias scoring training data set(s) 410 [setting a confidence for the biased samples in the biased data set, wherein the confidence is used to represent a bias degree of the biased samples]. Each bias scoring training data set 410 includes data such as biased data 412 and/or unbiased data 414 as well as a bias score 416. For instance, the bias score 416 is a numerical representation of bias within a data set. In some examples, the bias score 416 and/or the bias scoring training data set 410 originate from a scorer 140.”)
training the first neural network based on the biased data set, the confidence of the biased samples in the biased data set, and the unbiased data set, to obtain the updated first neural network, 
(Farrar, col. 12 line [5 – 13], “When the bias score 416 of the training data set 302 satisfies the score threshold 422 (e.g., exceeds the acceptable bias score value), the bias scoring model 400 approves the training data 302 set as an approved training data set 424 [training the first neural network based on the biased data set, the confidence of the biased samples in the biased data set, and the unbiased data set, to obtain the updated first neural network]. In some examples, an approved training data set 424 includes an approval indicator recognizable by the machine learning model 300 such that the machine learning model proceeds to generate an unbiased prediction value 310 (e.g., shown in FIG. 3).”)
Farrar, Sun, Faibish and Fukuda are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Farrar with teachings of Sun, Faibish and Fukuda to identify and correct bias in the training data before training the neural networks that detect state transitions. (Farrar, Abstract).
Claim 19 recites analogous limitations as claim 4, so is rejected under similar rationale. 
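The confidence step of claim 4 can be illustrated by a training loss in which each biased sample's contribution is scaled by its confidence while unbiased samples keep full weight. This minimal sketch assumes binary labels, a confidence in [0, 1] with lower values denoting a higher bias degree, and a weighted-mean binary cross-entropy; these choices and the function names are assumptions, not taken from the application or from Farrar.

```python
import math
from typing import List, Tuple


def bce(pred: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction, clipped away from 0 and 1."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def confidence_weighted_loss(biased: List[Tuple[float, int, float]],
                             unbiased: List[Tuple[float, int]]) -> float:
    """Weighted mean loss over both data sets.

    biased:   (prediction, label, confidence) triples; confidence in [0, 1],
              where a lower value stands for a higher assumed bias degree.
    unbiased: (prediction, label) pairs, each given full weight 1.0.
    """
    total, norm = 0.0, 0.0
    for pred, label, conf in biased:
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        total += conf * bce(pred, label)
        norm += conf
    for pred, label in unbiased:
        total += bce(pred, label)
        norm += 1.0
    return total / norm if norm > 0 else 0.0


# A fully distrusted biased sample (confidence 0) drops out of the loss.
print(confidence_weighted_loss([(0.9, 1, 0.0)], [(0.5, 1)]))  # ln 2, about 0.693
```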
Regarding claim 8, Faibish in view of Sun and Fukuda teach the method of claim 2.
Faibish further teaches: the determining the first distillation manner based on the data features of the sample set comprises: selecting the first distillation manner matching the first ratio from a plurality of distillation manners.
(Faibish, (col. 20 line [66 – 67] – col. 21 line [1 - 6]), “When the neural network is being trained or retrained, processing is performed to tune, adjust and select values for the weights and biases that optimize the ability of the neural network to predict outputs given particular inputs [selecting the first distillation manner matching the first ratio from a plurality of distillation manners]. Thus during training and retraining, one or more of the weights and bias values may be updated (e.g., in comparison to prior values or starting values of the weights and biases prior to training or retraining)”)
Faibish in view of Sun and Fukuda do not teach:
wherein the data features of the sample set comprise a first ratio, the first ratio is a ratio of a sample quantity of the unbiased data set to a sample quantity of the biased data set,
Farrar teaches: 
wherein the data features of the sample set comprise a first ratio, the first ratio is a ratio of a sample quantity of the unbiased data set 
(Farrar, col. 2 line [47 - 59], “In some examples, when training the clustering model the method includes segmenting the received cluster training data set into clusters based on data characteristics of the known unbiased population of data. In this example, for each cluster of the clusters based on the data characteristics of the known unbiased population of data, the method includes determining the cluster weight by for each cluster of the cluster model based on a ratio of a size of a respective cluster to a size of the known unbiased population of data [wherein the data features of the sample set comprise a first ratio, the first ratio is a ratio of a sample quantity of the unbiased data set]. In some implementations, an unsupervised machine learning algorithm segments the received cluster training data set into clusters based on the data characteristics of the known unbiased population of data”)

to a sample quantity of the biased data set, and 
(Farrar, col. 6 line [3 - 11], “Here the term “weight(s)” (e.g., bias cluster weights 214, 214 a-n and training data set weights 218, 218 a-n) refers to values, such as ratios, that map to unique clusters formed from a process of clustering [to a sample quantity of the biased data set]. For populations, each cluster may pertain to a fraction of a population and thus the value of the fraction may be a weight associated with the cluster (e.g., subset of the population).”)
Farrar, Sun, Faibish and Fukuda are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Farrar with teachings of Sun, Faibish and Fukuda to identify and correct bias in the training data before training the neural networks that detect state transitions. (Farrar, Abstract).
Claim 21 recites analogous limitations as claim 8, so is rejected under similar rationale. 
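The first ratio of claim 8 is the unbiased sample quantity divided by the biased sample quantity, and the selection step then matches that value against the available manners. A minimal sketch follows; the ratio thresholds, the manner names, and both function names are hypothetical placeholders.

```python
from typing import List, Tuple

# Hypothetical rules mapping half-open ratio ranges [low, high) to preset manners.
RATIO_RULES: List[Tuple[float, float, str]] = [
    (0.0, 0.01, "label_distillation"),
    (0.01, 0.1, "feature_distillation"),
    (0.1, float("inf"), "sample_distillation"),
]


def first_ratio(num_unbiased: int, num_biased: int) -> float:
    """Ratio of the unbiased sample quantity to the biased sample quantity."""
    if num_biased <= 0:
        raise ValueError("the biased data set must be non-empty")
    return num_unbiased / num_biased


def select_manner_by_ratio(num_unbiased: int, num_biased: int,
                           rules: List[Tuple[float, float, str]] = RATIO_RULES) -> str:
    """Return the manner whose ratio range contains the first ratio."""
    r = first_ratio(num_unbiased, num_biased)
    for low, high, name in rules:
        if low <= r < high:
            return name
    raise LookupError(f"no rule covers ratio {r}")


print(select_manner_by_ratio(50, 1000))  # ratio 0.05 -> feature_distillation
```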
Claim(s) 9, 12 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Faibish in view of Sun and Fukuda and in further view of HALL et al., Pub. No.: US20220344049A1. 
Regarding claim 9, Faibish in view of Sun and Fukuda teach the method of claim 1.
Faibish in view of Sun and Fukuda do not teach: 
wherein the first distillation manner comprises: training the teacher model based on features extracted from the unbiased data set, to obtain a trained teacher model, and performing knowledge distillation on the student model by using the trained teacher model and the biased data set
HALL teaches:
wherein the first distillation manner comprises: training the teacher model based on features extracted from the unbiased data set, to obtain a trained teacher model, and performing knowledge distillation on the student model by using the trained teacher model and the biased data set.
(HALL, “[0019] First, the set of Teacher model(s) are trained on the dataset of interest. The Teacher models can be of any neural network or model architecture, and can even be completely different architectures from each other or the Student model. They can either share the same dataset exactly, or have disjoint or overlapping subsets of the original dataset. Once the Teacher models are trained, the Student is trained using a distillation loss function to mimic the outputs of the Teacher models [performing knowledge distillation on the student model by using the trained teacher model and the biased data set]. The distillation process begins by first applying the Teacher model to a dataset that is made available to both the Teacher and Student models, known as the ‘transfer dataset.’ [wherein the first distillation manner comprises: training the teacher model based on features extracted from the unbiased data set, to obtain a trained teacher model] The transfer dataset can be hold-out, blind dataset drawn from the original dataset, or could be the original dataset itself Furthermore, the transfer dataset does not have to be completely labelled, i.e., with some portion of the data not associated with a known outcome.”)
HALL, Faibish, Sun and Fukuda are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of HALL with teachings of Faibish, Sun and Fukuda to allow a neural network to benefit from richer and more varied internal state information while preserving data confidentiality. (HALL, Abstract).
Claim 22 recites analogous limitations as claim 9, so is rejected under similar rationale. 
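The two-stage arrangement recited in claim 9, in which a teacher is trained on the unbiased data set and then guides knowledge distillation of the student on the biased data set, can be sketched as follows. This minimal sketch assumes one-layer logistic models trained by per-sample gradient descent, uses the raw input features as the "features extracted from the unbiased data set," and blends the teacher's soft prediction with the biased hard label through a hypothetical weight `alpha`; it is neither the applicant's implementation nor HALL's.

```python
import math
from typing import List, Sequence, Tuple

Data = List[Tuple[List[float], int]]  # (input features, hard label) pairs


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    # w[0] is the bias term; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_logistic(data: Data, targets: List[float], dim: int,
                   lr: float = 0.5, epochs: int = 300) -> List[float]:
    """Per-sample gradient descent on cross-entropy against targets in [0, 1]
    (hard labels or soft teacher outputs)."""
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for (x, _), t in zip(data, targets):
            g = predict(w, x) - t  # gradient of cross-entropy w.r.t. the logit
            w[0] -= lr * g
            for i, xi in enumerate(x):
                w[i + 1] -= lr * g * xi
    return w


def distill(unbiased: Data, biased: Data, dim: int,
            alpha: float = 0.5) -> Tuple[List[float], List[float]]:
    # Stage 1: fit the teacher on the unbiased set with its hard labels.
    teacher = train_logistic(unbiased, [float(y) for _, y in unbiased], dim)
    # Stage 2: fit the student on the biased set against a blend of the
    # teacher's soft prediction (weight alpha) and the biased hard label.
    targets = [alpha * predict(teacher, x) + (1 - alpha) * y for x, y in biased]
    student = train_logistic(biased, targets, dim)
    return teacher, student


unbiased = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
biased = [([-2.0], 1), ([-1.0], 1), ([1.0], 1), ([2.0], 1)]  # every logged label positive
teacher, student = distill(unbiased, biased, dim=1, alpha=1.0)
print(round(predict(student, [-2.0]), 3))
```

With alpha = 1.0 the student learns only from the teacher's soft outputs; with alpha = 0.0 the procedure reduces to ordinary training on the biased labels.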
Regarding claim 12, Faibish in view of Sun and Fukuda teach the method of claim 1.
Faibish in view of Sun and Fukuda do not teach: 
wherein the first distillation manner is selected from a plurality of preset distillation manners, and the plurality of preset distillation manners comprise at least two distillation manners with different guiding manners of the teacher model for the student model.
HALL teaches:
wherein the first distillation manner is selected from a plurality of preset distillation manners, and the plurality of preset distillation manners comprise at least two distillation manners with different guiding manners of the teacher model for the student model.
(HALL, “[0018] Another approach in AI and machine learning is known as ‘Knowledge Distillation’ [wherein the first distillation manner is selected from a plurality of preset distillation manners,] (shortened to Distillation) or ‘Student-Teacher’ models in which the distributions of the weight parameters obtained from one (or multiple) models (Teacher(s)) are used to inform the weight updates of another model (Student) via the loss function of the Student model [and the plurality of preset distillation manners comprise at least two distillation manners with different guiding manners of the teacher model for the student model]. We will use the term Distillation to describe the process of training a Student model using Teacher model(s). The idea behind this procedure is to train the Student model to mimic a set of Teacher model(s). The intuition behind this process, is that the Teacher models contain subtle but important relationships between the predicted output probabilities (soft labels) that are not present in the original predicted probabilities (hard labels) obtained directly from the model results in the absence of the distributions from the Teacher model(s).”)
HALL, Faibish, Sun and Fukuda are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of HALL with teachings of Faibish, Sun and Fukuda to allow a neural network to benefit from richer and more varied internal state information while preserving data confidentiality. (HALL, Abstract).
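For illustration, a set of preset distillation manners that differ in how the teacher guides the student can be represented as a table of loss functions: one manner guides through the teacher's output probabilities (the soft labels HALL describes in ¶[0018]) and another through intermediate feature representations. The manner names, the feature-matching variant, and the helper functions are assumptions made for this sketch.

```python
import math
from typing import Callable, Dict, Sequence

Loss = Callable[[Sequence[float], Sequence[float]], float]


def soft_label_loss(teacher_probs: Sequence[float], student_probs: Sequence[float],
                    eps: float = 1e-12) -> float:
    """Output guidance: KL(teacher || student) over class probabilities (soft labels)."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(teacher_probs, student_probs))


def feature_matching_loss(teacher_feats: Sequence[float],
                          student_feats: Sequence[float]) -> float:
    """Feature guidance: mean squared difference between intermediate representations."""
    if len(teacher_feats) != len(student_feats) or not teacher_feats:
        raise ValueError("feature vectors must be non-empty and equal in length")
    return sum((t - s) ** 2 for t, s in zip(teacher_feats, student_feats)) / len(teacher_feats)


# Hypothetical preset manners, differing in how the teacher guides the student.
PRESET_DISTILLATION_MANNERS: Dict[str, Loss] = {
    "output_guided": soft_label_loss,
    "feature_guided": feature_matching_loss,
}


def distillation_loss(manner: str, teacher_signal: Sequence[float],
                      student_signal: Sequence[float]) -> float:
    """Dispatch to the loss associated with the selected preset manner."""
    return PRESET_DISTILLATION_MANNERS[manner](teacher_signal, student_signal)


print(distillation_loss("feature_guided", [1.0, 2.0], [1.0, 0.0]))  # 2.0
```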
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Faibish in view of Sun, Fukuda, and HALL, and in further view of Farrar and Kuang et al., "Stable prediction across unknown environments."
Regarding claim 10, Faibish in view of Sun, Fukuda and HALL teach the method of claim 9.
Faibish further teaches: training a second neural network based on the input features of some unbiased samples, to obtain an updated second neural network; 
(Faibish, col. 2 line [13 – 25], “training the first neural network using a first plurality of inputs denoting the system in the first intermediate state; obtaining a plurality of sets of internal state information of the first neural network, each set of the plurality of sets denoting an internal state of the first neural network at a different point in time after the first neural network has processed at least a portion of the first plurality of inputs; and training a second neural network, using the plurality of sets of internal state information [training a second neural network based on the input features of some unbiased samples, to obtain an updated second neural network], to detect the first intermediate state. Each set of the plurality of sets of internal state information may include weights that are applied to inputs of neurons of one or more hidden layers of the first neural network. Each set of the plurality of sets of internal state information may include weights that are applied to inputs of neurons of an output layer of the first neural network.”)

Fukuda further teaches: using the updated second neural network as the teacher model, using the first neural network as the student model, and performing knowledge distillation on the first neural network by using the biased data set, to obtain the updated first neural network.
(Fukuda, page: 3697, “We extend this proposed technique to the generalized distillation framework [performing knowledge distillation on the first neural network by using the biased data set, to obtain the updated first neural network], where in addition to distillation of information from teacher networks [and using the updated second neural network as the teacher model], privileged information available only during training is also factored in. To illustrate the efficacy of our approach we show how an improved narrow band CNN based acoustic model can be trained by using privileged information from outputs of broadband models, instead of training the student network on only narrow band teacher models. Privileged information is presented to the student network [using the first neural network as the student model, and] not only via both an ensemble of teachers but also by data augmentation of training data as described earlier.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Fukuda with teachings of Faibish, Sun and HALL for the same reasons disclosed for claim 9.
Faibish in view of Sun, Fukuda and HALL do not teach:
filtering input features of some unbiased samples from the unbiased data set 
by using a deep global balancing regression (DGBR) algorithm; 
Farrar teaches: 
filtering input features of some unbiased samples from the unbiased data set 
(Farrar, col. 5 line [50 – 59], “By preventing the machine learning model 300 from training on biased data within the ML training data set 302, the machine learning model 300 is not influenced by the biased data and is therefore capable of generating an unbiased prediction value 310 (FIG. 3) during inference. Thus, the bias rejection model 200 corresponds to a filter that removes/adjusts biased data within the ML training data set 302 prior to training the ML model 300 by outputting/generating the unbiased training data set 206 for use in training the ML model 300 [filtering input features of some unbiased samples from the unbiased data set].”)
Farrar, Faibish, Sun, Fukuda and HALL are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Farrar with teachings of Faibish, Sun, Fukuda and HALL to identify and correct bias in the training data before training the neural networks that detect state transitions. (Farrar, Abstract).
Faibish in view of Sun, Fukuda, HALL and Farrar do not teach:
by using a deep global balancing regression (DGBR) algorithm; 
Kuang teaches: 
by using a deep global balancing regression (DGBR) algorithm; 
(Kuang, page: 7, “3.2 The Model 3.2.1 Framework We propose a Deep Global Balancing Regression (DGBR) algorithm [by using a deep global balancing regression (DGBR) algorithm] to identify stable features and capture non-linear structure for stable prediction. Its framework is shown in Figure 2. To identify the stable features, we propose a global balancing model, where we learn global sample weights which can be used to estimate the effect of each feature while controlling for the other features and thus identify stable features.”)
Kuang, Faibish, Sun, Fukuda, HALL and Farrar are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Kuang with teachings of Faibish, Sun, Fukuda, HALL and Farrar to ensure that the internal state information used to train a neural network remains reliable across unknown or changing environments. (Kuang, Abstract).
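The balancing idea behind the DGBR algorithm cited from Kuang can be illustrated with a minimal sketch of a global balancing term. The sketch assumes binary features and reflects one reading of the Kuang reference. Each feature is treated in turn as a binary "treatment", and sample weights should make the weighted means of the remaining features match between the rows where that feature is 1 and the rows where it is 0. The autoencoder and weighted-regression components of DGBR, and the optimization of the weights themselves, are omitted. The function name `global_balancing_loss` is hypothetical.

```python
from typing import Sequence


def global_balancing_loss(X: Sequence[Sequence[int]], w: Sequence[float]) -> float:
    """Global balancing term for binary features X under sample weights w.

    Each feature j is treated in turn as a binary 'treatment'. The term sums the
    squared gaps between the weighted means of every other feature k in the rows
    where X[i][j] == 1 and the rows where X[i][j] == 0.
    """
    n, p = len(X), len(X[0])
    loss = 0.0
    for j in range(p):
        treated = [i for i in range(n) if X[i][j] == 1]
        control = [i for i in range(n) if X[i][j] == 0]
        w_treated = sum(w[i] for i in treated)
        w_control = sum(w[i] for i in control)
        if w_treated == 0 or w_control == 0:
            # Feature j has no weighted variation, so there is nothing to balance.
            continue
        for k in range(p):
            if k == j:
                continue
            mean_treated = sum(w[i] * X[i][k] for i in treated) / w_treated
            mean_control = sum(w[i] * X[i][k] for i in control) / w_control
            loss += (mean_treated - mean_control) ** 2
    return loss
```

Under this reading, DGBR would learn the weights `w` so that this term becomes small, which decorrelates the features and helps isolate the stable ones. Here the weights are simply passed in. For example, perfectly correlated data `[[0, 0], [1, 1]]` with uniform weights gives a loss of 2.0, while the four balanced combinations of two binary features give 0.0.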
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Faibish in view of Sun, Fukuda, HALL, Farrar, Kuang and in further view of Kaiyu, et al., Pub. No.: CN111260056A. 
Faibish in view of Sun, Fukuda, HALL, Farrar and Kuang teach the method of claim 10.
Faibish in view of Sun, Fukuda, HALL, Farrar and Kuang do not teach: 
wherein the data features of the sample set comprise a quantity of feature dimensions of the sample set, and the determining the first distillation manner based on the data features of the sample set comprises: selecting the first distillation manner matching the quantity of the feature dimensions from a plurality of distillation manners
Kaiyu teaches: 
wherein the data features of the sample set comprise a quantity of feature dimensions of the sample set, and 
(Kaiyu, “[0069] In real-time applications, to ensure that the acquired first channel feature set and second channel feature set can reflect the characteristics of their respective first and second network models, distillation sites can be determined according to the type of the network model, so that the channel features extracted at that distillation site are more accurate [wherein the data features of the sample set comprise a quantity of feature dimensions of the sample set]. The implementation of determining the distillation site will be explained in subsequent embodiments.”)

the determining the first distillation manner based on the data features of the sample set comprises: selecting the first distillation manner matching the quantity of the feature dimensions from a plurality of distillation manners.
(Kaiyu, “[0020] Selecting a first channel feature from the target channel feature matching pair as a target channel feature using a random function [selecting the first distillation manner matching the quantity of the feature dimensions from a plurality of distillation manners], wherein the target channel feature matching pair is any channel feature matching pair;”)
Kaiyu, Faibish, Sun, Fukuda, HALL, Farrar and Kuang are related to the same field of endeavor (i.e.: leveraging information from one model to train or improve another). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Kaiyu with teachings of Faibish, Sun, Fukuda, HALL, Farrar and Kuang to improve a learning model’s ability to detect intermediate states by ensuring it closely replicates the teacher network’s learned features. (Kaiyu, Abstract).

Allowable Subject Matter
Claim(s) 13 – 15 and 23 – 24 would be allowable if amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action. The prior art made of record does not teach, make obvious, or suggest the claim limitations as disclosed in Applicant's claims. When reading the claims in light of the specification, as per MPEP 2111.01, none of the references of record, alone or in combination, disclose or suggest the limitations found within independent claim(s) 13 and 23 as a whole with regard to the technical features recited by the claim limitations, including:

Claim 13 recites: 
A recommendation method, comprising: obtaining information about a target user and information about a recommended object candidate; inputting the information about the target user and the information about the recommended object candidate into a recommendation model, and predicting a probability that the target user performs an operational action on the recommended object candidate, wherein the recommendation model is obtained by training a first neural network by using a biased data set and an unbiased data set in a sample set in a first distillation manner, the biased data set comprises biased samples, the unbiased data set comprises unbiased samples, the first distillation manner is determined based on data features of the sample set, the biased samples in the biased data set comprise information about a first user, information about a first recommended object, and actual labels, the actual labels of the biased samples in the biased data set are used to represent whether the first user performs an operational action on the first recommended object, the unbiased samples in the unbiased data set comprise information about a second user, information about a second recommended object, and actual labels, and the actual labels of the biased samples in the unbiased data set are used to represent whether the second user performs an operational action on the second recommended object.

Closest prior art:
Faibish et al., Pub. No.: US11687433B2. 
Faibish teaches identifying a temporary state that the system goes through before reaching the final state. The first neural network is trained with various inputs that show the system in this temporary state. After these inputs are processed, internal state information from the first neural network is collected at different times. A second neural network is then trained using this internal state information to identify the temporary state. However, Faibish does not teach generating recommendations by obtaining user and item information and predicting the likelihood that the user will interact with a given item using a trained recommendation model, where the model is produced by training a neural network on a combination of biased and unbiased samples and the training process selects an appropriate distillation manner based on data features. The biased samples contain user-item pairs, while the unbiased samples contain user-item pairs with more reliable labels, enabling the model to learn balanced patterns and improve recommendation accuracy.
Fukuda et al., "Efficient knowledge distillation from an ensemble of teachers."
Fukuda demonstrates combining information from different models, such as VGG networks and LSTM models, to enhance the training of standard CNN acoustic models for fast applications. Two methods are analyzed: updating student model weights by changing teacher labels during training, and using multiple streams of information for data augmentation. However, Fukuda does not teach generating recommendations by obtaining user and item information and predicting the likelihood that the user will interact with a given item using a trained recommendation model, where the model is produced by training a neural network on a combination of biased and unbiased samples and the training process selects an appropriate distillation manner based on data features. The biased samples contain user-item pairs, while the unbiased samples contain user-item pairs with more reliable labels, enabling the model to learn balanced patterns and improve recommendation accuracy.
Claim 23 includes limitations analogous to those of claim 13 and therefore would be allowable for the same rationale if amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action.
The dependent claim(s) 14 – 15 and 24 would be allowable because of their dependency on claim 13 if amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action.
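The inference step recited in claim 13 can be sketched as follows: obtain target-user and candidate information, feed both to a trained recommendation model, and predict the probability of an operational action. This is an illustrative sketch only. It assumes a hypothetical, already-trained model that represents the user and each candidate as feature vectors and scores a pair with a logistic function of their dot product. The names `predict_action_probability` and `rank_candidates` and the vector representation are assumptions. The distillation-based training that distinguishes the claim is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def predict_action_probability(
    user_vec: Sequence[float], item_vec: Sequence[float], bias: float = 0.0
) -> float:
    """Hypothetical trained model: probability that the user acts on the item."""
    logit = sum(u * v for u, v in zip(user_vec, item_vec)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def rank_candidates(
    user_vec: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate for the target user, highest probability first."""
    scored = [
        (name, predict_action_probability(user_vec, vec))
        for name, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For a user vector `[1.0, 0.0]`, a candidate at `[2.0, 0.0]` scores above 0.5 and is ranked ahead of a candidate at `[0.0, 1.0]`, which scores exactly 0.5.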

Claim(s) 3, 5 – 7, 18 and 20 are objected to as being dependent upon a rejected base claim and would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action. The prior art made of record does not teach, make obvious, or suggest the claim limitations as disclosed in applicant's claims.

Claim 3 recites: 
wherein the training the first neural network based on the biased data set and the unbiased data set in the first distillation manner, to obtain the updated first neural network comprises: training the first neural network by using the biased data set and the unbiased data set alternately, to obtain the updated first neural network, wherein, in the alternate training, a quantity of batch training iterations for training the first neural network by using the biased data set and a quantity of batch training iterations for training the first neural network by using the unbiased data set are in a preset ratio, and the input features of the samples in the sample set are used as inputs of the first neural network.

Closest prior art:
Faibish et al., Pub. No.: US11687433B2.
Faibish teaches identifying a temporary state that the system goes through before reaching the final state. The first neural network is trained with various inputs that show the system in this temporary state. After these inputs are processed, internal state information from the first neural network is collected at different times. A second neural network is then trained using this internal state information to identify the temporary state. However, Faibish does not teach that the first neural network is updated by alternately training on biased and unbiased data sets according to a preset ratio that controls how many batches come from each source. During this alternating process, the network receives the sample set’s input features as its inputs, allowing it to learn from both biased and unbiased patterns in a balanced manner and produce an improved updated model.
Fukuda et al., "Efficient knowledge distillation from an ensemble of teachers."
Fukuda demonstrates combining information from different models, such as VGG networks and LSTM models, to enhance the training of standard CNN acoustic models for fast applications. Two methods are analyzed: updating student model weights by changing teacher labels during training, and using multiple streams of information for data augmentation. However, Fukuda does not teach that the first neural network is updated by alternately training on biased and unbiased data sets according to a preset ratio that controls how many batches come from each source. During this alternating process, the network receives the sample set’s input features as its inputs, allowing it to learn from both biased and unbiased patterns in a balanced manner and produce an improved updated model.
Farrar et al., Pub. No.: US11250346B2.
Farrar discusses rejecting biased data using machine learning, which involves receiving a cluster training data set that consists of known unbiased data and training a clustering model to create clusters based on the characteristics of this unbiased data. Each cluster has a weight. The method then includes receiving a training data set for a machine learning model and generating weights for this training data based on the clustering model. It adjusts these weights to match the cluster weights and provides the adjusted data set to the machine learning model as unbiased training data. However, Farrar does not teach that the first neural network is updated by alternately training on biased and unbiased data sets according to a preset ratio that controls how many batches come from each source. During this alternating process, the network receives the sample set’s input features as its inputs, allowing it to learn from both biased and unbiased patterns in a balanced manner and produce an improved updated model.
Claim 18 includes limitations analogous to those of claim 3 and therefore would be allowable if amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action.
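Claim 3 recites an alternating schedule in which the numbers of batch training iterations on the biased and unbiased data sets follow a preset ratio. That schedule can be illustrated with a minimal sketch. The generator name `alternate_batches`, the `(biased, unbiased)` ratio tuple, and the choice to continue with the remaining set once the other is exhausted are illustrative assumptions, not details taken from the application.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

Batch = TypeVar("Batch")


def alternate_batches(
    biased: Sequence[Batch],
    unbiased: Sequence[Batch],
    ratio: Tuple[int, int],
) -> Iterator[Tuple[str, Batch]]:
    """Yield (source, batch) pairs in rounds of ratio[0] biased batches
    followed by ratio[1] unbiased batches."""
    n_biased, n_unbiased = ratio
    if n_biased < 1 or n_unbiased < 1:
        raise ValueError("both terms of the preset ratio must be positive")
    i = j = 0
    while i < len(biased) or j < len(unbiased):
        for _ in range(n_biased):
            if i < len(biased):
                yield "biased", biased[i]
                i += 1
        for _ in range(n_unbiased):
            if j < len(unbiased):
                yield "unbiased", unbiased[j]
                j += 1
```

Each yielded batch would drive one training iteration of the first neural network, with the samples' input features as the network inputs. For example, a ratio of `(2, 1)` over four biased and two unbiased batches produces the order: biased, biased, unbiased, biased, biased, unbiased.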

Claim 5 recites: 
wherein the first distillation manner is to perform distillation based on prediction labels of the unbiased samples comprised in the unbiased data set, the prediction labels are output by an updated second neural network for the unbiased samples in the unbiased data set, and the updated second neural network is obtained by training a second neural network by using the unbiased data set.

Closest prior art: 
Faibish et al., Pub. No.: US11687433B2. 
Faibish teaches identifying a temporary state that the system goes through before reaching the final state. The first neural network is trained with various inputs that show the system in this temporary state. After these inputs are processed, internal state information from the first neural network is collected at different times. A second neural network is then trained using this internal state information to identify the temporary state. However, Faibish does not teach that the first neural network is guided by prediction labels generated for the unbiased samples, where these labels come from an updated second neural network obtained by training a second neural network exclusively on the unbiased data set.
Fukuda et al., "Efficient knowledge distillation from an ensemble of teachers."
Fukuda demonstrates combining information from different models, such as VGG networks and LSTM models, to enhance the training of standard CNN acoustic models for fast applications. Two methods are analyzed: updating student model weights by changing teacher labels during training, and using multiple streams of information for data augmentation. However, Fukuda does not teach that the first neural network is guided by prediction labels generated for the unbiased samples, where these labels come from an updated second neural network obtained by training a second neural network exclusively on the unbiased data set.
Farrar et al., Pub. No.: US11250346B2.
Farrar discusses rejecting biased data using machine learning, which involves receiving a cluster training data set that consists of known unbiased data and training a clustering model to create clusters based on the characteristics of this unbiased data. Each cluster has a weight. The method then includes receiving a training data set for a machine learning model and generating weights for this training data based on the clustering model. It adjusts these weights to match the cluster weights and provides the adjusted data set to the machine learning model as unbiased training data. However, Farrar does not teach that the first neural network is guided by prediction labels generated for the unbiased samples, where these labels come from an updated second neural network obtained by training a second neural network exclusively on the unbiased data set.
Claim 20 includes limitations analogous to those of claim 5 and therefore would be allowable if amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action.
The dependent claim(s) 6 – 7 would be allowable because of their dependency on claim 5 if amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action.
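The first distillation manner of claim 5 trains a second neural network on the unbiased data set and then uses its prediction labels for the unbiased samples to guide the first neural network. It can be sketched as below. As a simplifying assumption, both "networks" are single-layer logistic models trained by stochastic gradient descent. The student is fit on the biased samples with their actual labels plus the unbiased samples relabeled with the teacher's predictions. That way of combining the two sets, and all function names (`train_logistic`, `teacher_prediction_labels`, `distill_student`), are hypothetical and not taken from the application.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input features, label in [0, 1])


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    samples: Sequence[Sample], epochs: int = 200, lr: float = 0.5
) -> Callable[[Sequence[float]], float]:
    """Fit a single-layer logistic model; labels may be hard (0/1) or soft."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # derivative of the log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def teacher_prediction_labels(unbiased: Sequence[Sample]) -> List[Sample]:
    """Train the 'second network' on unbiased data and relabel those samples."""
    teacher = train_logistic(unbiased)
    return [(x, teacher(x)) for x, _ in unbiased]


def distill_student(
    biased: Sequence[Sample], unbiased: Sequence[Sample]
) -> Callable[[Sequence[float]], float]:
    """Train the 'first network' on biased samples plus teacher-labelled unbiased ones."""
    return train_logistic(list(biased) + teacher_prediction_labels(unbiased))
```

Because the log-loss gradient `p - y` is defined for any label in [0, 1], the same training routine serves for both the hard-labelled teacher and the soft-labelled student.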

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Bahng, et al. "Learning de-biased representations with biased representations.", 06-2020.
Bahng proposes a framework for training a de-biased representation by encouraging it to be different from a set of representations that are biased by design.
Cho, Jang Hyun, and Bharath Hariharan. "On the efficacy of knowledge distillation.", 2019.
Cho presents a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures. Starting from the observation that more accurate teachers often do not make good teachers, the authors attempt to tease apart the factors that affect knowledge distillation performance.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATIYAS T MARU whose telephone number is (571)270-0902. The examiner can normally be reached Monday 8:00am - Friday 4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold, can be reached on (571)431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.T.M./ Examiner, Art Unit 2148
/Ryan Barrett/Primary Examiner, Art Unit 2148
Prosecution Timeline

Jan 20, 2023: Application Filed
Nov 28, 2025: Non-Final Rejection mailed (§101, §103)
Feb 25, 2026: Response Filed
May 12, 2026: Non-Final Rejection mailed (§101, §103) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/077,206
Patent 12626190
METHOD OF ANALYZING WIRELESS SIGNALS USING MULTI-TASK LEARNING-BASED SPECTRAL ANALYSIS LEARNING MODEL
3y 5m to grant; granted May 12, 2026
17/643,921
Patent 12614106
MACHINE LEARNING TECHNIQUES USING CROSS-MODEL FINGERPRINTS FOR NOVEL PREDICTIVE TASKS
4y 4m to grant; granted Apr 28, 2026
17/367,134
Patent 12586114
GENERATING DIGITAL RECOMMENDATIONS UTILIZING COLLABORATIVE FILTERING, REINFORCEMENT LEARNING, AND INCLUSIVE SETS OF NEGATIVE FEEDBACK
4y 8m to grant; granted Mar 24, 2026
17/138,890
Patent 12572796
METHODS AND SYSTEMS FOR GENERATING RECOMMENDATIONS FOR COUNTERFACTUAL EXPLANATIONS OF COMPUTER ALERTS THAT ARE AUTOMATICALLY DETECTED BY A MACHINE LEARNING ALGORITHM
5y 2m to grant; granted Mar 10, 2026
17/161,575
Patent 12567004
METHOD OF MACHINE LEARNING TRAINING FOR DATA AUGMENTATION
5y 1m to grant; granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 62%
With Interview (+7.5%): 70%
Median Time to Grant: 4y 2m (~10m remaining)
PTA Risk: Moderate
Based on 45 resolved cases by this examiner. Grant probability derived from career allowance rate.