Prosecution Insights
Last updated: April 19, 2026
Application No. 17/370,747

BANDIT-BASED TECHNIQUES FOR FAIRNESS-AWARE HYPERPARAMETER OPTIMIZATION

Non-Final Office Action (§101, §103)
Filed: Jul 08, 2021
Examiner: HAN, KYU HYUNG
Art Unit: 2123
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Feedzai - Consultadoria e Inovação Tecnológica, S.A.
OA Round: 3 (Non-Final)
Grant Probability: 43% (Moderate)
Expected OA Rounds: 3-4
Estimated Time to Grant: 4y 6m
Grant Probability with Interview: 85%

Examiner Intelligence

Career Allow Rate: 43% of resolved cases (3 granted / 7 resolved; -12.1% vs TC average)
Interview Lift: +41.7% for resolved cases with an interview (a strong lift)
Typical Timeline: 4y 6m average prosecution; 30 applications currently pending
Career History: 37 total applications across all art units

Statute-Specific Performance

§101: 38.4% (-1.6% vs TC avg)
§103: 50.9% (+10.9% vs TC avg)
§102: 4.2% (-35.8% vs TC avg)
§112: 6.6% (-33.4% vs TC avg)

Tech Center averages are estimates; figures are based on career data from 7 resolved cases.

Office Action

§101 §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/18/2025 has been entered.

Response to Remarks

Claim Rejections – 35 U.S.C. 101

Applicant's amendments have been fully considered but they are not persuasive. Applicant argues (pgs. 8-9) that claim 1 recites a technical solution to the technical problem of algorithmic bias and is therefore a practical application. Applicant argues that claim 1 recites training a machine learning model to balance fairness and performance, as it describes how to select hyperparameter combinations and integrates the selection of the hyperparameter combination into the training of the machine learning model.

Examiner respectfully disagrees. Although claim 1 does describe how to select hyperparameter combinations and integrates the combination into the training of the machine learning model, it does so based on elements that are either a mental process or other elements that do not integrate the judicial exception into a practical application, whether taken alone or in combination. Regarding the selection of hyperparameter combinations, claim 1 recites using a relative weighting that is "based at least in part on an average fairness and an average performance of a set of candidate combinations of …". This is a mental process because a human can calculate an average of fairness and performance from a set of combinations.
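For context, the averaging the Examiner characterizes as a mental process can be made concrete. A minimal Python sketch of an average-based weight update follows; the function name, the history format, and the specific update formula are illustrative assumptions, not taken from the application:

```python
def updated_relative_weighting(history):
    """Derive an updated fairness-vs-performance weighting from the
    average fairness and average performance of previously evaluated
    hyperparameter combinations (illustrative assumption, not the
    claimed formula). history: list of (fairness, performance) pairs."""
    avg_fairness = sum(f for f, _ in history) / len(history)
    avg_performance = sum(p for _, p in history) / len(history)
    total = avg_fairness + avg_performance
    # Weight the lagging objective more heavily; the two weights sum to 1.
    fairness_weight = avg_performance / total
    return fairness_weight, 1.0 - fairness_weight

# Two prior iterations where fairness lags performance on average.
w_fair, w_perf = updated_relative_weighting([(0.6, 0.9), (0.4, 0.8)])
```

Under this assumed rule, a low average fairness relative to average performance shifts weight toward the fairness objective in the next iteration.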
In addition, an amended limitation reads: "using the selected hyperparameter combination to train the machine learning model to balance the fairness of the machine learning model and the performance of the machine learning model." This is a general statement about a general improvement, in fairness and performance, to the machine learning model due to the hyperparameter selection. As the claim currently stands, the training process is a black box: the details are not sufficiently described to show how the hyperparameter combination selection produces the improvement in these areas. Examiner suggests amending the language to include more technical details regarding the hyperparameter combination selection and how it relates to the improvement in training. The foregoing applies to all independent claims and their dependent claims.

Claim Rejections – 35 U.S.C. 103

Applicant's prior art arguments have been fully considered but they are not persuasive. Applicant argues (pgs. 9-10) that Krasanakis does not disclose or suggest that its parameters are categorical; instead, Applicant argues, they appear to be conventional numerical/continuous weights for an objective function.

Examiner respectfully disagrees. First, note that the cited portions of the specification and drawings (paragraph 63 and Figs. 4-7) show only that the method may be used on hyperparameters that are categorical, not that it is primarily used on categorical hyperparameters. Similarly, while Krasanakis deals with the conventional numerical weight for the objective function (as given in page 858, column 2, equation 10), it also deals with classification/categorical hyperparameters. Indeed, on pg. 858, col. 1, paragraph 4: "The sensitive group S and the non-sensitive group S' can adhere to different misclassification biases, which skew error in different ways". Revisiting equation 10, as referenced above, the weight depends on whether the sample is in S or S'.
Therefore, Krasanakis does teach that the hyperparameters may be categorical hyperparameters, as per the amended limitation in claim 1.

Applicant argues (pgs. 9-10) that one of ordinary skill in the art would not be motivated to combine the 103 references because both Hu and Denolf teach away from dynamic determination of an updated relative weighting between the fairness evaluation metric and the performance metric. Examiner respectfully disagrees. As "dynamically" determining the weighting between the fairness and performance metric is not defined in the specification, Examiner takes the broadest reasonable interpretation of "dynamically" determining the weighting, which is to calculate the weighting during run-time, instead of using a pre-solved, or hardcoded, weighting.

Hu does indeed teach using grid search over predefined weightings to find the best alpha-beta combination (Hu, page 10, paragraph 5.1.2). However, this describes the experimental settings, where the hyperparameters are selected using the training data in the experiment, or in other words, during runtime. Hu is not choosing a hardcoded alpha-beta combination before the training/experiment even begins; rather, Hu merely limits the possible precision of the combination to intervals of 0.1. Indeed, note that all computers limit the precision that floating-point numbers can have. 64 bits is considered reasonably high precision, yet if one were to list out the possible weighting combinations, it would be a discrete set, not a continuous distribution. Hu's grid concerns this level of precision, not the dynamic/static nature of determining the relative weighting.

Regarding Denolf, the motivation to combine is based on the fact that it concerns combining multiple objective functions into one objective function to train a neural network. Moreover, the training in Denolf is done dynamically, by definition, as it requires unknown training data.
The foregoing applies to all independent claims and their dependent claims.

Claim Rejections – 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: Claims 1-18 are method claims. Claims 19-20 are machine/system/product claims. Therefore, claims 1-20 are directed to either a process, machine, manufacture, or composition of matter.

With respect to claim 1:

Step 2A – Prong 1: …

… automatically evaluating candidate combinations of hyperparameters of the machine learning model, the hyperparameters being different from model weights of the machine learning model and the hyperparameters including categorical hyperparameters, based at least in part on multi-objective optimization including scalarization and using the fairness evaluation metric and the performance metric to select a hyperparameter combination to utilize among the candidate combinations of hyperparameters, wherein evaluating the candidate combinations of hyperparameters of the machine learning model includes automatically and dynamically determining a relative weighting between the fairness evaluation metric and the performance metric for a current iteration, the updated relative weighting being determined based at least in part on an average fairness and an average performance of a set of candidate combinations of hyperparameters evaluated in one or more previous iterations; (mental process – a person can manually evaluate candidate combinations of hyperparameters using multi-objective optimization, select a hyperparameter combination, and determine a relative weighting between the fairness metric and performance metric with the assistance of pen/paper. A person can also recognize that the updated relative weighting is determined based at least in part on an average fairness and an average performance of a set of candidate combinations of hyperparameters evaluated in one or more previous iterations.)

…

Step 2A – Prong 2: This judicial exception is not integrated into a practical application.

A method, comprising: receiving a fairness evaluation metric for evaluating a fairness of a machine learning model to be trained; (Adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g).)

receiving a performance metric for evaluating performance of the machine learning model to be trained; (Adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g).)

… and using the selected hyperparameter combination to train the machine learning model to balance the fairness of the machine learning model and the performance of the machine learning model. (Adding the words "apply it" (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Examiner's note: high-level recitation of training a machine learning model.)

Step 2B: The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.

A method, comprising: receiving a fairness evaluation metric for evaluating a fairness of a machine learning model to be trained; (MPEP 2106.05(d)(II) indicates that merely "Receiving or transmitting data over a network, e.g., using the Internet to gather data" is a well-understood, routine, conventional function when it is claimed in a merely generic manner, as it is in the present claim: the fairness evaluation metric is merely received. Thereby, a conclusion that the claimed receiving step is well-understood, routine, conventional activity is supported under Berkheimer.)

receiving a performance metric for evaluating performance of the machine learning model to be trained; (MPEP 2106.05(d)(II) indicates that merely "Receiving or transmitting data over a network, e.g., using the Internet to gather data" is a well-understood, routine, conventional function when it is claimed in a merely generic manner, as it is in the present claim: the performance metric is merely received. Thereby, a conclusion that the claimed receiving step is well-understood, routine, conventional activity is supported under Berkheimer.)

… and using the selected hyperparameter combination to train the machine learning model. (Adding the words "apply it" (or an equivalent) to the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Examiner's note: high-level recitation of training a machine learning model.)

With respect to claim 2:

Step 2A – Prong 1: The method of claim 1, wherein the scalarization reduces objectives of the multi-objective optimization to a single scalar output and includes a weighted lp-norm. (mental process – a person can recognize that the scalarization reduces the multi-objective optimization to a single scalar output and includes a weighted lp-norm.)

With respect to claim 3:

Step 2A – Prong 1: The method of claim 1, wherein the fairness evaluation metric includes a measure of at least one of: group fairness. (mental process – a person can recognize that the fairness evaluation metric includes a measure of group fairness.)

With respect to claim 4:

Step 2A – Prong 1: The method of claim 1, wherein the performance metric includes a measure of performance of a predictive task.
(mental process – a person can recognize that the performance metric includes a measure of performance of a predictive task.)

With respect to claim 5:

Step 2A – Prong 1: The method of claim 1, wherein the selected hyperparameter combination is included in a Pareto frontier. (mental process – a person can recognize that the selected hyperparameter combination is included in a Pareto frontier set.)

With respect to claim 6:

Step 2A – Prong 1: The method of claim 1, wherein the updated relative weighting between the fairness evaluation metric and the performance metric includes a fairness evaluation metric weight and a performance metric weight that are inversely proportional and sums to 1. (mental process – a person can recognize that the updated relative weighting between the fairness evaluation metric and the performance metric includes a fairness evaluation metric weight and a performance metric weight that are inversely proportional and sums to 1.)

With respect to claim 7:

Step 2A – Prong 1: The method of claim 1, further comprising evaluating the fairness of the machine learning model to be trained according to the fairness evaluation metric, wherein the evaluation of the fairness of the machine learning model to be trained is based on the same predictions used to evaluate the performance of the machine learning model to be trained. (mental process – a person can recognize that the evaluation of the fairness of the model is based on the same predictions used to evaluate the performance of the model.)

With respect to claim 8:

Step 2A – Prong 1: The method of claim 1, wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric is based on a user-defined fairness-performance trade-off. (mental process – a person can recognize that the relative weighting between the two metrics is based on a user-defined trade-off.)
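The weighted lp-norm scalarization recited in claim 2 collapses the fairness and performance objectives to a single scalar. A minimal Python sketch (the function name and the example weights are assumptions for illustration):

```python
def weighted_lp_norm(objectives, weights, p=2):
    """Scalarize multiple objectives into one value via a weighted
    lp-norm: (sum_i w_i * |o_i|**p) ** (1/p)."""
    return sum(w * abs(o) ** p for o, w in zip(objectives, weights)) ** (1 / p)

# Fairness and performance objectives collapsed to a single scalar (p = 1
# gives the familiar weighted sum).
s = weighted_lp_norm([0.3, 0.4], weights=[0.5, 0.5], p=1)
```

With p = 1 this reduces to linear scalarization; larger p penalizes the worse of the two objectives more strongly.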
With respect to claim 9:

Step 2A – Prong 1: The method of claim 1, (mental process from claim 1)

Step 2A – Prong 2: This judicial exception is not integrated into a practical application. further comprising outputting a sorted set of one or more machine learning models trained using the selected hyperparameter combination. (Adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g).)

Step 2B: The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception. further comprising outputting a sorted set of one or more machine learning models trained using the selected hyperparameter combination. (MPEP 2106.05(d)(II) indicates that merely "Receiving or transmitting data over a network, e.g., using the Internet to gather data" is a well-understood, routine, conventional function when it is claimed in a merely generic manner, as it is in the present claim: the models are merely transmitted. Thereby, a conclusion that the claimed outputting step is well-understood, routine, conventional activity is supported under Berkheimer.)

With respect to claim 10:

Step 2A – Prong 1: The method of claim 9, wherein the set of one or more machine learning models are output to a graphical user interface including by at least one of: displaying an associated fairness and performance for each of the one or more machine learning models; (mental process – a person can manually display the fairness and performance of a model with the assistance of pen/paper.) or displaying at least one comparison between machine learning models in the set of one or more machine learning models, wherein the machine learning models meet at least one Pareto criterion. (mental process – a person can manually display the comparison between models with the assistance of pen/paper.)
With respect to claim 11:

Step 2A – Prong 1: The method of claim 1, wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric is performed automatically and does not require specific domain knowledge. (mental process – a person can manually determine a relative weighting between the fairness metric and the performance metric.)

With respect to claim 12:

Step 2A – Prong 1: The method of claim 1, wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric includes guiding a search towards minimizing a difference between the average fairness and the average performance. (mental process – a person can manually determine a relative weighting between the fairness metric and the performance metric by minimizing the difference between average fairness and average performance with the assistance of pen/paper.)

With respect to claim 13:

Step 2A – Prong 1: The method of claim 1, wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric includes guiding a search towards regions of higher fairness if current candidate combinations of hyperparameters correspond to machine learning model performance above a first threshold and machine learning model fairness below a second threshold. (mental process – a person can manually determine a relative weighting between the fairness metric and the performance metric by searching towards regions of higher fairness with the assistance of pen/paper.)

With respect to claim 14:

Step 2A – Prong 1: The method of claim 1, wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric includes determining an updated weighting for the fairness evaluation metric and an updated weighting for the performance metric in each iteration.
(mental process – a person can manually update the relative weighting between fairness and performance metrics with the assistance of pen/paper.)

With respect to claim 15:

Step 2A – Prong 1: The method of claim 14, wherein at least one of: the updated weighting of the fairness evaluation metric at a current iteration is increased relative to that of a previous iteration if at least one of: the average fairness of the evaluated candidate combinations of hyperparameters decreased in the previous iteration, or the average performance of the evaluated candidate combinations of hyperparameters increased in the previous iteration; (mental process – a person can recognize the changes in hyperparameters between iterations of a manual iterative process.)

With respect to claim 16:

Step 2A – Prong 1: The method of claim 14, wherein the updated weighting of at least one of the fairness evaluation metric and the updated weighting of the performance metric at a current iteration is determined based least in part on the average fairness and the average performance of already trained hyperparameter combinations. (mental process – a person can recognize what the fairness and performance numbers are for already existing hyperparameter combinations.)

With respect to claim 17:

Step 2A – Prong 1: The method of claim 1, wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric includes determining a weighting for the fairness evaluation metric based on an associated range of values for the fairness evaluation metric and a weighting for the performance metric based on an associated range of values for the performance metric. (mental process – a person can manually determine ranges of values for the weights of the fairness and performance metrics with the assistance of pen/paper.)
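Claim 15's conditional update rule (increase the fairness weight when average fairness fell, or average performance rose, in the previous iteration) can be sketched in a few lines; the step size `eta` and the clamping to 1.0 are assumptions added for illustration, not recited in the claim:

```python
def update_fairness_weight(w_fair, prev_avg_fair, curr_avg_fair,
                           prev_avg_perf, curr_avg_perf, eta=0.05):
    """Increase the fairness weight if average fairness decreased or
    average performance increased since the previous iteration
    (sketch of claim 15's rule; eta is an assumed step size)."""
    if curr_avg_fair < prev_avg_fair or curr_avg_perf > prev_avg_perf:
        w_fair = min(1.0, w_fair + eta)
    return w_fair

# Average fairness dropped between iterations, so the weight goes up.
w = update_fairness_weight(0.5, prev_avg_fair=0.7, curr_avg_fair=0.6,
                           prev_avg_perf=0.8, curr_avg_perf=0.8)
```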
With respect to claim 18:

Step 2A – Prong 1: The method of claim 1, wherein the use of the selected hyperparameter combination to optimize the machine learning model to be trained includes selecting the machine learning model based at least in part on the average fairness and performance of all sampled hyperparameter combinations. (mental process – a person can calculate the average fairness and performance metrics of all hyperparameter combinations using pen/paper.)

Claim 19 is substantially similar to claim 1, but has the following additional elements:

With respect to claim 19:

Step 2A – Prong 2: This judicial exception is not integrated into a practical application. A system, comprising: a processor configured to: (mere instructions to apply the exception using a generic computer component – the processor applies the exception) and a memory coupled to the processor and configured to provide the processor with instructions. (mere instructions to apply the exception using a generic computer component – the memory/processor applies the exception)

Claim 20 is substantially similar to claim 1, but has the following additional elements:

With respect to claim 20:

Step 2A – Prong 2: This judicial exception is not integrated into a practical application. A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for: (mere instructions to apply the exception using a generic computer component – the computer/non-transitory computer readable medium applies the exception)

With respect to claim 21:

Step 2A – Prong 1: The method of claim 1, further comprising determining candidate combinations of hyperparameters of the machine learning model including by applying at least one of the following hyperparameter tuners: Random Search, Tree Parzen Estimator, or bandit-based hyperparameter tuner.
(mental process – a person can recognize that the determining of the candidate combinations of hyperparameters includes applying at least one of Random Search, Tree Parzen Estimator, or a bandit-based hyperparameter tuner.)

Claim Rejections – 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, and 7-18 are rejected under 35 U.S.C. 103 as being unpatentable over Hu et al. ("FairNN - Conjoint Learning of Fair Representations for Fair Decisions"), hereinafter Hu, in view of Krasanakis et al. ("Adaptive Sensitive Reweighting to Mitigate Bias in Fairness-aware Classification"), hereinafter Krasanakis.

Regarding independent claim 1, Hu teaches:

A method, comprising: receiving a fairness evaluation metric for evaluating a fairness of a machine learning model to be trained; (Hu [Page 5, Paragraph 3]: "In this work, we employ Equalized Odds … Eq.Odds ∈ [0, 2], with 0 indicating no discrimination and 2 indicating maximum discrimination". Hu teaches a fairness evaluation metric called equalized odds, where the objective is to minimize the metric, as a lower score means lower discrimination.)
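The Equalized Odds metric as quoted from Hu sums the group differences in false-positive and false-negative rates, yielding a value in [0, 2]. A minimal sketch, assuming the per-group rates have already been computed:

```python
def equalized_odds(fpr_protected, fpr_other, fnr_protected, fnr_other):
    """Eq.Odds = |dFPR| + |dFNR|; ranges over [0, 2], where 0 indicates
    no discrimination and 2 indicates maximum discrimination (per Hu)."""
    return abs(fpr_protected - fpr_other) + abs(fnr_protected - fnr_other)

# Protected group has a higher FPR and FNR than the other group.
score = equalized_odds(0.30, 0.10, 0.25, 0.20)
```

Minimizing this score drives both rate gaps between the protected and non-protected groups toward zero.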
receiving a performance metric for evaluating performance of the machine learning model to be trained; (Hu [Page 8, Paragraph 2]: "The Binary Cross Entropy is used as loss function to train the classifier … where cb is the true label and ˙cb is the predicted probability of the data point b having the label cb". Hu teaches binary cross entropy as the performance metric used to evaluate the performance of the machine learning model.)

… and using the selected hyperparameter combination to train the machine learning model to balance the fairness of the machine learning model and the performance of the machine learning model. (Hu [Page 10, Paragraph 1]: "We train the auto-encoder and classifier simultaneously by minimizing the objective function Eq. 8. In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5]." Hu teaches training the machine learning model using the selected α − β hyperparameter combination in the objective function.)
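The grid search over Hu's α-β grid, which the Examiner reads as run-time selection, amounts to exhaustively scoring every combination and keeping the best one. A sketch using a placeholder objective (the lambda below stands in for Hu's Eq. 8 loss, which is not reproduced here):

```python
from itertools import product

def grid_search(objective, alphas, betas):
    """Exhaustively evaluate every (alpha, beta) pair and return the
    lowest-objective combination, as in Hu's experimental setup."""
    return min(product(alphas, betas), key=lambda ab: objective(*ab))

# Placeholder objective standing in for Hu's Eq. 8 loss; it is minimized
# at (0.9, 0.2), the pair Hu reports for the Adult Census Income Dataset.
best = grid_search(lambda a, b: (a - 0.9) ** 2 + (b - 0.2) ** 2,
                   alphas=[0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
                   betas=[0.1, 0.2, 0.3, 0.4, 0.5])
```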
Hu does not explicitly teach:

automatically evaluating candidate combinations of hyperparameters of the machine learning model, the hyperparameters being different from model weights of the machine learning model and the hyperparameters including categorical hyperparameters, based at least in part on multi-objective optimization including scalarization and using the fairness evaluation metric and the performance metric to select a hyperparameter combination to utilize among the candidate combinations of hyperparameters, wherein evaluating the candidate combinations of hyperparameters of the machine learning model includes automatically and dynamically determining an updated relative weighting between the fairness evaluation metric and the performance metric for a current iteration, the updated relative weighting being determined based at least in part on an average fairness and an average performance of a set of candidate combinations of hyperparameters evaluated in one or more previous iterations;

However, Krasanakis teaches:

automatically evaluating candidate combinations of hyperparameters of the machine learning model, the hyperparameters being different from model weights of the machine learning model and the hyperparameters including categorical hyperparameters, based at least in part on multi-objective optimization including scalarization and using the fairness evaluation metric and the performance metric to select a hyperparameter combination to utilize among the candidate combinations of hyperparameters, wherein evaluating the candidate combinations of hyperparameters of the machine learning model includes automatically and dynamically determining an updated relative weighting between the fairness evaluation metric and the performance metric for a current iteration, the updated relative weighting being determined based at least in part on an average fairness and an average performance of a set of candidate combinations of hyperparameters evaluated in one or more previous iterations; (Krasanakis [Page 858, Equation 10]: Krasanakis teaches an equation that tunes two attributes using one variable, α, to scalarize the attributes on a gradient. Krasanakis explains the equation in more detail in the following.

Krasanakis [Page 859, Column 2, Paragraph 2]: "Fairness-aware classifiers are usually able to train towards mitigating various fairness metrics. At the same time, they need to preserve the accuracy (acc) of the base classification model as much as possible. … it can employ either linear scalarization, where a linear trade-off is set between the objectives … it is easier to tune the parameters of Eq. 10 in a linear than in a constrained space." Krasanakis provides the context for Equation 10: the equation aims at balancing the fairness as well as the accuracy/efficiency of the objective function using scalarization.

Krasanakis [Page 860, Column 1, Paragraph 4]: "convergence of Algorithm 1 towards optimal weights and the impact on the objective function … measure the root mean square weight edits on each iteration of Algorithm 1". Krasanakis teaches that in determining the optimal weights and the convergence toward them, an iterative process adjusts toward the optimal weighting based on the root mean square of the weights of the fairness and performance. These weights can be mapped to hyperparameters, as they are not true weights of the model; instead, they are the weights of the objective function.

Krasanakis [Page 858, col. 1, paragraph 4]: "The sensitive group S and the non-sensitive group S' can adhere to different misclassification biases, which skew error in different ways". Krasanakis teaches categorical hyperparameters, since in equation 10, as referenced above, the weight depends on whether the sample is in S or S'.)
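The linear scalarization Krasanakis describes sets a linear trade-off between the fairness and accuracy objectives, reducing the two to a single scalar to minimize. A minimal sketch with assumed symbol names (not Krasanakis's exact Eq. 10, which is not reproduced in this record):

```python
def linear_scalarize(fairness_loss, accuracy_loss, alpha):
    """Linear trade-off between two objectives, collapsed to one scalar.
    alpha in [0, 1] tilts the combined objective toward fairness."""
    return alpha * fairness_loss + (1.0 - alpha) * accuracy_loss

# Equal weighting of an assumed fairness loss and accuracy loss.
combined = linear_scalarize(fairness_loss=0.4, accuracy_loss=0.2, alpha=0.5)
```

Tuning `alpha` in this linear space is what the quoted passage contrasts with tuning in a constrained space.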
Hu and Krasanakis are in the same field of endeavor as the present invention, as both references are directed to combining multiple objective functions into one objective function to train a neural network. It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art to combine the weighting of the fairness and performance metrics of the objective function as taught in Hu with performing that weighting in an iterative way that takes into account past iterations as taught in Krasanakis. Krasanakis provides this additional functionality. As such, it would have been obvious to one of ordinary skill in the art to modify the teachings of Hu to include the teachings of Krasanakis because the combination would allow the convergence of the weighting of performance and fairness metrics to be done iteratively, increasing the accuracy of the final weighting. This has the potential benefit of making it more likely that the fairness term of the objective function is not overshadowed by the performance metric.

Regarding dependent claim 3, Hu and Krasanakis teach: The method of claim 1,

Hu teaches: wherein the fairness evaluation metric includes a measure of at least one of: group fairness. (Hu [Page 5, Paragraph 3]: "In particular, let δFPR (δFNR) be the difference in false positive rates (false negative rates, respectively) between the protected and non-protected groups … The goal of Eq.Odds is to minimize both differences". Hu teaches that the goal of the fairness evaluation metric is to minimize the difference between the protected and non-protected groups, meaning that the evaluation includes group fairness.)

The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 4, Hu and Krasanakis teach: The method of claim 1,

Hu teaches: wherein the performance metric includes a measure of performance of a predictive task.
(Hu [Page 8, Paragraph 2]: "The Binary Cross Entropy is used as loss function to train the classifier … where cb is the true label and ˙cb is the predicted probability of the data point b having the label cb". Hu teaches binary cross entropy as the performance metric used to evaluate the performance of the machine learning model. This metric provides the measure of performance of a predictive task, namely the accuracy of the model in predicting the correct label for each data point.)

The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 5, Hu and Krasanakis teach: The method of claim 1,

Hu teaches: wherein the selected hyperparameter combination is included in a Pareto frontier. (Hu [Page 10, Paragraph 1]: "In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5]. Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset and α = 0.8, β = 0.4 for the Bank Marketing Dataset are selected." Hu teaches evaluating the candidate combinations of hyperparameters and choosing a combination of hyperparameters by determining a relative weighting between the fairness evaluation metric and the performance metric, denoted β. This is done using grid search, which automatically evaluates combinations of the possible hyperparameters and selects one of the set of best ones (the Pareto frontier).)

The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 7, Hu and Krasanakis teach: The method of claim 1,

Hu teaches: further comprising evaluating the fairness of the machine learning model to be trained according to the fairness evaluation metric, wherein the evaluation of the fairness of the machine learning model to be trained is based on the same predictions used to evaluate the performance of the machine learning model to be trained.
(Hu [Page 10, Paragraph 1]: “In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5]. Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset and α = 0.8, β = 0.4 for the Bank Marketing Dataset are selected.” Hu teaches evaluating the candidate combinations of hyperparameters and choosing a combination of hyperparameters by determining a relative weighting between the fairness evaluation metric and the performance metric, denoted β. The evaluation of the fairness of the model is based on the evaluation of the performance, as the two are inextricably linked by this β coefficient in the multi-objective function.) The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 8, Hu and Krasanakis teach: The method of claim 1, Hu teaches: wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric is based on a user-defined fairness-performance trade-off. (Hu [Page 10, Paragraph 1]: “In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5]. Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset and α = 0.8, β = 0.4 for the Bank Marketing Dataset are selected.” Hu teaches evaluating the candidate combinations of hyperparameters and choosing a combination of hyperparameters by determining a relative weighting between the fairness evaluation metric and the performance metric, denoted β. The value that β takes is based on the user-defined fairness and performance terms of the function. In short, the weight of the fairness evaluation metric relative to the performance metric is based on how both metrics are originally constructed, which has the effect of being a user-defined trade-off.)
The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 9, Hu and Krasanakis teach: The method of claim 1, Hu teaches: further comprising outputting a sorted set of one or more machine learning models trained using the selected hyperparameter combination. (Hu [Page 10, Paragraph 1]: “In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5]. Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset.” Hu teaches that for the Adult Census Income Dataset, a set of one trained machine learning model, with hyperparameters α = 0.9 and β = 0.2, is outputted.) The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 10, Hu and Krasanakis teach: The method of claim 9, Hu teaches: wherein the set of one or more machine learning models are output to a graphical user interface including by at least one of: displaying an associated fairness and performance for each of the one or more machine learning models; (Hu [Page 11, Figure 3]: Hu teaches displaying an associated fairness and performance for each of the machine learning models that were created with different methods. This graph can be interacted with by a user.) or displaying at least one comparison between machine learning models in the set of one or more machine learning models, wherein the machine learning models meet at least one Pareto criterion. (Hu [Page 11, Figure 3]: Hu teaches displaying an associated fairness and performance for each of the chosen (Pareto optimal) machine learning models that were created with different methods. The format is a bar graph, so the comparison between the models can be shown. This graph can be interacted with by a user, as it had to have been at some point in order to be generated.) The reasons to combine are substantially similar to those of claim 1.
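For reference, the grid-search-then-select mechanism that the rejections of claims 5 and 9 attribute to Hu can be sketched as follows. This is an illustrative sketch only, not code from Hu; the `evaluate` callback (returning a fairness/performance pair for a trained model) and the dictionary layout are assumed for illustration.

```python
from itertools import product

def pareto_frontier(candidates):
    """Keep candidates not dominated on (fairness, performance);
    higher is treated as better for both metrics."""
    return [
        c for c in candidates
        if not any(
            o["fairness"] >= c["fairness"]
            and o["performance"] >= c["performance"]
            and (o["fairness"] > c["fairness"] or o["performance"] > c["performance"])
            for o in candidates
        )
    ]

def grid_search(evaluate, alphas, betas):
    """Evaluate every (alpha, beta) pair and return the Pareto set.
    `evaluate` is an assumed callback returning (fairness, performance)
    for a model trained with that hyperparameter combination."""
    results = [
        {"alpha": a, "beta": b, "fairness": f, "performance": p}
        for a, b in product(alphas, betas)
        for f, p in [evaluate(a, b)]
    ]
    return pareto_frontier(results)
```

On Hu's quoted grid (α over six values, β over five) this would evaluate all 30 combinations and return only the non-dominated ones.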
Regarding dependent claim 11, Hu and Krasanakis teach: The method of claim 1, Hu teaches: wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric is performed automatically and does not require specific domain knowledge. (Hu [Page 10, Paragraph 1]: “In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5]. Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset and α = 0.8, β = 0.4 for the Bank Marketing Dataset are selected.” Hu teaches evaluating the candidate combinations of hyperparameters and choosing a combination of hyperparameters by determining a relative weighting between the fairness evaluation metric and the performance metric, denoted β. This is done using grid search, which automatically evaluates the possible hyperparameter combinations, is not specific to this domain, and does not require specific domain knowledge.) The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 12, Hu and Krasanakis teach: The method of claim 1, Hu teaches: wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric includes guiding a search towards minimizing a difference between the average fairness and the average performance. (Hu [Page 10, Paragraph 1]: “In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5]. Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset and α = 0.8, β = 0.4 for the Bank Marketing Dataset are selected.” Hu teaches evaluating the candidate combinations of hyperparameters and choosing a combination of hyperparameters by determining a relative weighting between the fairness evaluation metric and the performance metric, denoted β.
This is done using grid search, which automatically evaluates the possible hyperparameter combinations.) The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 13, Hu and Krasanakis teach: The method of claim 1, Hu teaches: wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric includes guiding a search towards regions of higher fairness if current candidate combinations of hyperparameters correspond to machine learning model performance above a first threshold and machine learning model fairness below a second threshold. (Hu [Page 13, Paragraph 2]: “perform ablation studies to evaluate how different parts influence the predictive and fairness performance of our method. In Fig. 4, α = 0 represents the outcome without KL-Divergence regularization and β = 0 without Eq.Odds regularization respectively.” Hu teaches that when β = 0, the fairness metric is not taken into consideration. At the opposite end of the spectrum, when β = 1, the determination of the weighting goes toward a region of higher fairness. The threshold depends on β.) The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 14, Hu and Krasanakis teach: The method of claim 1, Hu teaches: wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric includes determining an updated weighting for the fairness evaluation metric and an updated weighting for the performance metric in each iteration. (Hu [Page 10, Paragraph 1]: “In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5].
Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset and α = 0.8, β = 0.4 for the Bank Marketing Dataset are selected.” Hu teaches evaluating the candidate combinations of hyperparameters and choosing a combination of hyperparameters by determining a relative weighting between the fairness evaluation metric and the performance metric, denoted β. The evaluation of the fairness of the model is updated when the weighting is changed by changing the coefficient in each iteration.) The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 15, Hu and Krasanakis teach: The method of claim 14, Hu teaches: wherein at least one of: the updated weighting of the fairness evaluation metric at a current iteration is increased relative to that of a previous iteration if at least one of: the average fairness of the evaluated candidate combinations of hyperparameters decreased in the previous iteration, or the average performance of the evaluated candidate combinations of hyperparameters increased in the previous iteration; (Hu [Page 10, Paragraph 1]: “We train the auto-encoder and classifier simultaneously by minimizing the objective function Eq. 8. For training, we use the Adam optimization method.” Hu teaches using the Adam optimization method, a modified form of the popular gradient descent that handles multiple gradients (because of the multiple objectives). Increasing the fairness weighting in an iteration after the fairness decreased in the previous iteration is characteristic of gradient descent, as the solution moves in the direction opposite the gradient.)
or the updated weighting of the performance metric at a current iteration is increased relative to that of a previous iteration if at least one of: the average fairness of the evaluated candidate combinations of hyperparameters increased in the previous iteration, or the average performance of the evaluated candidate combinations of hyperparameters decreased in the previous iteration. (Hu [Page 10, Paragraph 1]: “We train the auto-encoder and classifier simultaneously by minimizing the objective function Eq. 8. For training, we use the Adam optimization method.” Hu teaches using the Adam optimization method, a modified form of the popular gradient descent that handles multiple gradients (because of the multiple objectives). Increasing the performance weighting in an iteration after the performance decreased in the previous iteration is characteristic of gradient descent, as the solution moves in the direction opposite the gradient.) The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 16, Hu and Krasanakis teach: The method of claim 14, Hu teaches: wherein the updated weighting of at least one of the fairness evaluation metric and the updated weighting of the performance metric at a current iteration is determined based at least in part on the average fairness and the average performance of already trained hyperparameter combinations. (Hu [Page 10, Paragraph 1]: “We train the auto-encoder and classifier simultaneously by minimizing the objective function Eq. 8. For training, we use the Adam optimization method.” Hu teaches using the Adam optimization method, a modified form of the popular gradient descent that handles multiple gradients (because of the multiple objectives). Gradient descent in this context involves keeping an average of the direction of the hyperparameter combinations and applying the change at the end.) The reasons to combine are substantially similar to those of claim 1.
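The update rule recited in claims 14-15 (increase the fairness weight when average fairness fell or average performance rose in the previous iteration, and symmetrically for the performance weight, with the two weights summing to 1) can be sketched as follows. This follows the claim language, not any implementation in Hu or Krasanakis; the fixed step size and tuple layout are assumptions made for illustration.

```python
def update_weights(w_fair, prev_avg, curr_avg, step=0.05):
    """One iteration of the reweighting rule recited in claim 15.
    prev_avg and curr_avg are (average_fairness, average_performance)
    tuples; step is an assumed fixed increment. Returns the new
    (fairness_weight, performance_weight), which sum to 1."""
    prev_f, prev_p = prev_avg
    curr_f, curr_p = curr_avg
    if curr_f < prev_f or curr_p > prev_p:
        w_fair += step  # fairness lost ground: weight it more heavily
    elif curr_f > prev_f or curr_p < prev_p:
        w_fair -= step  # performance lost ground: weight it more heavily
    w_fair = min(max(w_fair, 0.0), 1.0)  # clamp to a valid weight
    return w_fair, 1.0 - w_fair
```

Deriving the performance weight as 1 minus the fairness weight keeps the pair a convex combination, matching the sum-to-1 limitation discussed for claim 6 below.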
Regarding dependent claim 17, Hu and Krasanakis teach: The method of claim 1, Hu teaches: wherein the dynamic determination of the relative weighting between the fairness evaluation metric and the performance metric includes determining a weighting for the fairness evaluation metric based on an associated range of values for the fairness evaluation metric and a weighting for the performance metric based on an associated range of values for the performance metric. (Hu [Page 10, Paragraph 1]: “In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5]. Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset and α = 0.8, β = 0.4 for the Bank Marketing Dataset are selected.” Hu teaches evaluating the candidate combinations of hyperparameters and choosing a combination of hyperparameters by determining a relative weighting between the fairness evaluation metric and the performance metric, denoted β. Hu teaches an associated range of values for the fairness and the performance metrics by specifying the range of β. Note that the fairness metric coefficient is β and the performance metric coefficient is 1 − β, so specifying β specifies both.) The reasons to combine are substantially similar to those of claim 1.

Regarding dependent claim 18, Hu and Krasanakis teach: The method of claim 1, Hu teaches: wherein the use of the selected hyperparameter combination to optimize the machine learning model to be trained includes selecting the machine learning model based at least in part on the average fairness and performance of all sampled hyperparameter combinations. (Hu [Page 10, Paragraph 1]: “In order to get the best α − β combination (see Eq. (5) and 7), grid search is operated within α ∈ [0.4, 0.5, 0.6, 0.7, 0.8, 0.9] and β ∈ [0.1, 0.2, 0.3, 0.4, 0.5].
Finally, α = 0.9, β = 0.2 for the Adult Census Income Dataset and α = 0.8, β = 0.4 for the Bank Marketing Dataset are selected.” Hu teaches evaluating the candidate combinations of hyperparameters and choosing a combination of hyperparameters by determining a relative weighting between the fairness evaluation metric and the performance metric. If the average fairness and performance of all sampled hyperparameter combinations is lower than that of a particular combination, and no other combination has a greater score, then that particular combination can be chosen. In this case, the selected hyperparameter combination may be chosen in part based on the average of all sampled hyperparameter combinations.) The reasons to combine are substantially similar to those of claim 1.

Claims 2, 19, 20, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Krasanakis in view of Denolf et al. (US 20200104715 A1), hereinafter known as Denolf.

Regarding dependent claim 2, Hu and Krasanakis teach: The method of claim 1, Denolf teaches: wherein the scalarization reduces objectives of the multi-objective optimization to a single scalar output and includes a weighted lp-norm. (Denolf ¶[0040]: “Linear scalarization, g = Σwᵢfᵢ(x), where wᵢ > 0 is a weight associated with each objective function; and Lp norm, g = ∥f − z∥p, where f = {f₁(x), f₂(x), …, fₖ(x)}, and z ∈ Rᵏ is a vector of ideal cost values.” Denolf teaches using a weighted lp-norm in the scalarization of the weights.) Denolf is in the same field of endeavor as the present invention, since it is directed to combining multiple objective functions into one objective function to train a neural network.
It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine the fairness and performance objective functions as taught in Hu as modified by Krasanakis with combining different objective functions using scalarization as taught in Denolf. Denolf provides this additional functionality. As such, it would have been obvious to one of ordinary skill in the art to modify the teachings of Hu as modified by Krasanakis to include the teachings of Denolf because the combination would allow for different combinations of varying relative strengths of the fairness objective and the performance objective. This has the potential benefit of choosing the multi-objective function with the relative strength of fairness to performance that results in the most accurate and fair model.

Claim 19 is substantially similar to claim 1, but has the following additional elements: Regarding independent claim 19, Denolf teaches: A system, comprising: a processor configured to: (Denolf ¶[0031]: “the CPU 206 can be any type of general-purpose central processing unit (CPU), such as an x86-based processor, ARM®-based processor, or the like.” Denolf teaches a system comprising a processor of different possible types.) and a memory coupled to the processor and configured to provide the processor with instructions. (Denolf ¶[0032]: “The system memory 208 can store data 226 and program code (“code 228”) processed and executed by the CPU 206 to implement the software platform 204.” Denolf teaches memory coupled to the CPU that provides the processor with instructions to be executed.) The reasons to combine are substantially similar to those of claim 2.
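The two scalarizations quoted from Denolf ¶[0040] can be sketched together as a single weighted lp-norm toward an ideal point z; this is a minimal illustrative sketch, with the weight vector, ideal point, and value of p all hypothetical inputs rather than values from Denolf.

```python
def weighted_lp_norm(f, z, w, p=2.0):
    """Scalarize a vector of objective values f toward an ideal point z:
    g = (sum_i (w_i * |f_i - z_i|) ** p) ** (1 / p),
    reducing multiple objectives to a single scalar output."""
    return sum((wi * abs(fi - zi)) ** p
               for fi, zi, wi in zip(f, z, w)) ** (1.0 / p)
```

With p = 1, z = 0, and non-negative objective values, this reduces to Denolf's linear scalarization g = Σwᵢfᵢ(x).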
Claim 20 is substantially similar to claim 1, but has the following additional elements: Regarding independent claim 20, Denolf teaches: A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for: (Denolf ¶[0007]: “a non-transitory computer readable medium comprising instructions, which when executed in a computer system, causes the computer system to carry out a method of implementing a neural network includes.” Denolf teaches a computer program product embodied in a non-transitory computer readable medium comprising instructions.) The reasons to combine are substantially similar to those of claim 2.

Regarding dependent claim 21, Hu and Krasanakis teach: The method of claim 1, Denolf teaches: further comprising determining candidate combinations of hyperparameters of the machine learning model including by applying at least one of the following hyperparameter tuners: Random Search, Tree Parzen Estimator, or bandit-based hyperparameter tuner. (Denolf ¶[0051]: “A random search is conceptually similar to a grid search, except that a random search picks random values from a specified range for each hyperparameter, rather than selecting them from a grid.”) The reasons to combine are substantially similar to those of claim 2.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Krasanakis in view of Elbsat (US 20200301408 A1), hereinafter known as Elbsat.

Regarding dependent claim 6, Hu and Krasanakis teach: The method of claim 1, Hu teaches: … the updated relative weighting between the fairness evaluation metric and the performance metric includes a fairness evaluation metric weight and a performance metric weight … and sums to 1.
(Hu [Page 9, Paragraph 1]: “Therefore, Equalized Odds (Eq.Odds) is used as the constraint term and added to the classification loss … where β ∈ [0, 1), is a balancing coefficient between the classification loss Lbce and the Eq.Odds fairness regularization.” Hu teaches that the terms of the multi-objective loss function are weighted such that the sum is 1.)

Hu and Krasanakis do not explicitly teach: … that are inversely proportional … However, Elbsat teaches: … that are inversely proportional … (Elbsat ¶[0335]: “Due to the inverse relationship between φ₁′ and COP, input weighter 1508 can generate a weighting function that assigns weights that are inversely proportional to the value of φ₁′.” Elbsat teaches a weighting of a metric that is inversely proportional.) Elbsat is in the same field as the present invention, since it is directed to weighting metrics in an inversely proportional manner.

It would have been obvious, before the effective filing date of the claimed invention, to a person of ordinary skill in the art, to combine generating a multi-objective function with both the fairness and performance metrics represented in the objective function, as taught in Hu as modified by Krasanakis, with weighting the metrics inversely proportionally as taught in Elbsat. Elbsat provides this additional functionality. As such, it would have been obvious to one of ordinary skill in the art to modify the teachings of Hu as modified by Krasanakis to include the teachings of Elbsat because the combination would allow the weighting of the fairness evaluation metric to be inversely proportional. This has the potential benefit of weighting the fairness metric in the direction that maximizes it, rather than minimizes it, in the multi-objective function.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYU HYUNG HAN, whose telephone number is (703) 756-5529.
The examiner can normally be reached M-F, 9-5. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Alexey Shmatov, can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Kyu Hyung Han/
Examiner, Art Unit 2123

/ALEXEY SHMATOV/
Supervisory Patent Examiner, Art Unit 2123

Prosecution Timeline

Jul 08, 2021
Application Filed
Feb 28, 2025
Non-Final Rejection — §101, §103
Jun 09, 2025
Applicant Interview (Telephonic)
Jun 09, 2025
Examiner Interview Summary
Jun 10, 2025
Response Filed
Sep 15, 2025
Final Rejection — §101, §103
Dec 15, 2025
Applicant Interview (Telephonic)
Dec 15, 2025
Examiner Interview Summary
Dec 18, 2025
Request for Continued Examination
Jan 06, 2026
Response after Non-Final Action
Mar 06, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585928
HARDWARE ARCHITECTURE FOR INTRODUCING ACTIVATION SPARSITY IN NEURAL NETWORK
2y 5m to grant Granted Mar 24, 2026
Patent 12387101
SYSTEMS AND METHODS FOR PRUNING BINARY NEURAL NETWORKS GUIDED BY WEIGHT FLIPPING FREQUENCY
2y 5m to grant Granted Aug 12, 2025
Based on 2 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
43%
Grant Probability
85%
With Interview (+41.7%)
4y 6m
Median Time to Grant
High
PTA Risk
Based on 7 resolved cases by this examiner. Grant probability derived from career allow rate.
