Last updated: May 29, 2026
Application No. 17/475,837
FEATURE SELECTION USING MULTIVARIATE EFFECT OPTIMIZATION MODELS

Non-Final OA §103
Filed
Sep 15, 2021
Priority
May 11, 2021 — provisional 63/187,269
Examiner
KIM, HARRISON CHAN YOUNG
Art Unit
2145
Tech Center
2100 — Computer Architecture & Software
Assignee
Paypal Inc.
OA Round
4 (Non-Final)
Interview Optional

— +33.3% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 50% grant rate with +33.3% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 6 resolved cases, 2023–2026
Examiner Intelligence

KIM, HARRISON CHAN YOUNG View full profile →
Grants 50% of resolved cases
Career Allowance Rate
3 granted / 6 resolved
-5.0% vs TC avg
Strong +33% interview lift
Without
With
+33.3%
Interview Lift
resolved cases with interview
Typical timeline
3y 9m
Avg Prosecution
14 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
5.6%
-34.4% vs TC avg
§103
94.4%
+54.4% vs TC avg
Black line = Tech Center average estimate • Based on career data from 6 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
This office action, responsive to the RCE filed 9/9/2025, is made non-final. 
Claims 1-20 are pending. Claims 1, 8, and 15 are independent claims.

Reason for 2nd Non-Final Rejection
Applicant notified examiner of a citation error in the non-final rejection dated 10/20/2025. Specifically, the office action omitted a proper citation for Siddiqui et al. (US 20200349161 A1) on pages 3 and 4. This action includes the correct citation and paragraph numbers for the Siddiqui reference.

Response to Arguments
Applicant's arguments, filed 9/9/2025, regarding the 35 U.S.C. 101 rejections of the now amended claims, have been fully considered and are persuasive. Therefore, the rejections have been withdrawn.
Applicant’s arguments, filed 9/9/2025, regarding the 35 U.S.C. 103 rejections of the office action dated 7/2/2025 have been fully considered but are unpersuasive. Applicant argues that the cited references do not teach or suggest “using quantum computing to determine ground states of qubits corresponding to values for variables in a hybrid optimization model to determine which features to include in a reduced feature set”. Quantum annealing involves finding low energy state solutions to optimization problems, and Milne uses quantum annealing to select features for credit data classification on a computer (i.e., qubits are used) – see Milne, pg. 7, Section 6, ¶5 and code section, The poly object is passed to a solver and the solution is returned as a list of ones and zeros, referred to as a configuration…

    PNG
    media_image1.png
    97
    642
    media_image1.png
    Greyscale

Applicant further argues that the cited references do not teach or suggest: determining ground states of qubits corresponding to values for variables in the hybrid optimization model includes: (1) identifying multivariate higher-order interactions between different groups of the plurality of features, including identifying a measure of relevancy between feature pairs of the plurality of features and the set of labels for the plurality of data samples and a measure of redundancy between groups of three or more of the plurality of features, and (2) identifying conditional mutual information between a first feature and the set of labels for the plurality of features provided that a group of two or more other features are selected for inclusion in the reduced feature set, and then combining first results of identifying multivariate higher-order interactions and second results of identifying conditional mutual information by weighting the first results and the second results differently using the same hyperparameter value. These limitations will be discussed further below in the 35 U.S.C. 103 rejection section.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-11, 15 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Milne et al. ("Optimal feature selection in credit scoring and classification using a quantum annealer", INCLUDED IN IDS RECEIVED 9/15/21), herein Milne, in view of Perneti et al. (US 20220091915 A1), herein Perneti, Ishii (US 20130232140 A1), Beraha et al. (“Feature Selection via Mutual Information: New Theoretical Insights”, INCLUDED IN IDS), herein Beraha, and Siddiqui et al. (US 20200349161 A1), herein Siddiqui.
Regarding claim 1, Milne teaches: A method, comprising: accessing, by a computer system, a training dataset that includes: a plurality of data samples that include data values for a plurality of features; and a set of labels corresponding to the plurality of data samples (pg. 3, Section 2, ¶1-2, Assume that we have some data on past credit applicants… A credit observation, or sample, consists of an observation for each feature, some means of identifying the applicant, and the outcome – the outcome being a label of credit worthiness/unworthiness); and processing, by the computer system, the training dataset, wherein the processing includes executing a quantum computer to determine ground states of qubits corresponding to values for variables in a… optimization model (Abstract, quadratic unconstrained binary optimization (QUBO)… a QUBO implementation on a quantum annealer), wherein the ground states of the qubits indicate a subset of features, from the plurality of features, to include in a reduced feature set (pg. 7, Section 6, ¶5, The poly object is passed to a solver and the solution is returned as a list of ones and zeros, referred to as a configuration), and wherein determining the ground states of qubits corresponding to values for variables in the… optimization model includes: identifying multivariate higher-order interactions between… features, including a measure of relevancy between… features and the set of labels for the plurality of data samples and a measure of redundancy between… features… (pg. 5, Section 3, ¶7-8, We will associate the best subset with the value of x that minimizes an objective function, which we construct from two components. The first component represents the influence that features have on the marked class… The second component represents the independence – the influence that features have on the class is a measure of relevancy, and the independence of features is a measure of redundancy) weighting… differently using the same weighting parameter value (pg. 5, using a parameter α (0 ≤ α ≤ 1) to represent the relative weighting of independence (greatest at α = 0) and influence (greatest at α = 1) – relevancy is weighted by α while redundancy is weighted by 1- α) and training, by the computer system using the reduced feature set of the plurality of features, a machine learning model to classify data samples (pg. 6, Section 5, given a row vector u of new observations from a new applicant, calculate whether the vector belongs to the creditworthy class… existing data is divided into a training set and a test set).
Milne fails to teach: a hybrid model… hybrid… identifying conditional mutual information between a first feature and the set of labels for the plurality of features… and combining first results of identifying multivariate higher-order interactions and second results of identifying conditional mutual information.
However, in the same field of endeavor, Perneti teaches: hybrid… hybrid… identifying conditional mutual information between a first feature and the set of labels for the plurality of features… and combining first results of identifying multivariate higher-order interactions and second results of identifying conditional mutual information (¶49, two or more rankings of the plurality of features are determined using two or more FFS algorithms. The two or more FFS algorithms may include two or more of… a max-relevance min-redundancy (MRMR) algorithm… a conditional mutual information maximization (CMIM) algorithm – i.e., combining the max-relevancy min-redundancy algorithm and conditional mutual information algorithms).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use feature selection methods like conditional mutual information maximization and min-redundancy, max-relevancy in combination (i.e., a hybrid approach) as disclosed by Perneti in the feature selection method as disclosed by Milne to improve machine learning model performance (¶58, enables development of simpler and faster machine learning models).
	Milne in view of Perneti fails to teach: multivariate higher order interactions between different groups of the plurality of features… relevancy between feature pairs of the plurality of features… redundancy between groups of three or more of the plurality of features.
However, in the same field of endeavor, Ishii teaches: multivariate higher order interactions between different groups of the plurality of features… relevancy between feature pairs of the plurality of features… redundancy between groups of three or more of the plurality of features (¶52, For example, relevance using two or more features as arguments is calculated… ¶52, Redundancy using three or more features as arguments may be calculated).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use multivariate relevancy and redundancy terms in the objective function, as taught by Ishii, in the feature selection method disclosed by Milne in view of Perneti to select optimal features (¶24, it is possible to select appropriate features in polynomial time).
	Milne in view of Perneti and Ishii fails to teach: conditional mutual information provided that a group of two or more other features are selected for inclusion in the reduced feature set.
However, in the same field of endeavor, Beraha teaches: conditional mutual information provided that a group of two or more other features are selected for inclusion in the reduced feature set (pg. 2, Section II, Part B, This quantity intuitively represents the importance of the feature subset XA in predicting the target Y given that we are also using XĀ).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine conditional mutual information from groups of selected features as disclosed by Beraha in the method disclosed by Milne in view of Perneti and Ishii to offer more control over the feature selection process (pg. 2, Section I, explicitly control the ideal error introduced in the feature selection process).
	Milne in view of Perneti, Ishii and Beraha fails to teach: by weighting the first results and the second results differently.
	However, in the same field of endeavor, Siddiqui teaches: by weighting the first results and the second results differently (¶66, In some implementations, resource consumption model 305 may combine a plurality of such models (or any other models not expressly described herein) by weighting each of the models. For instance, resource consumption model 305 may weight any one or more of the models relatively heavier than other models).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to weight different models differently as disclosed by Siddiqui in the feature selection method disclosed by Milne in view of Perneti, Ishii and Beraha to improve model performance (¶66, to increase coverage, increase accuracy, and/or increase a combination of coverage and accuracy).

	Regarding claim 2, Milne further teaches: The method of claim 1, wherein the hybrid optimization model is a quadratic unconstrained binary optimization ("QUBO") model, and wherein determining the ground states of qubits (pg. 1, Abstract, Quadratic optimization scales exponentially with the number of features, but a QUBO implementation on a quantum annealer has the potential to be faster than classical solver – quantum annealing involves finding ground states of qubits) corresponds to a minimization of an objective function utilized in the QUBO model, wherein the minimization of the objective function corresponds to an output value that indicates the subset of features to include in the reduced feature set (pg. 5, We will associate the best subset with the value of x that minimizes an objective function – x is a vector of binary values indicating whether or not features are included in the reduced feature set, as defined on pg. 4).

Regarding claim 3, Milne further teaches: The method of claim 1, wherein the training includes: generating an updated training dataset that includes data values for the subset of features that are included in the reduced feature set, wherein the updated training dataset does not include second data values for one or more of the plurality of features that are not included in the reduced feature set (pg. 6, Section 5, ¶5, These 48 feature variables are the input to the feature selection algorithm. The feature subset at the output of the feature selector then forms the input to the classifier – the feature subset only includes information for the selected features).

Regarding claim 4, Milne further teaches: The method of claim 3, further comprising: subsequently processing, by the computer system, the training dataset based on a modified version of the hybrid optimization model (Figs. 7 and 8, comparing the performance of different feature subsets acquired by varying a hyperparameter – changing the hyperparameter creates a modified model) to select, from the plurality of features, a second subset of features to include in a second reduced feature set, wherein the second reduced feature set includes a different number of features than the reduced feature set (Figs. 7 and 8, the graph displays accuracy results for the classifier trained on different feature subsets of different sizes).

Regarding claim 8, Milne teaches: A method, comprising: accessing, by a computer system, a training dataset that includes a plurality of data samples, wherein a given one of the plurality of data samples includes: a label of an assigned classification for the given data sample; and data values for a plurality of features (pg. 3, Section 2, ¶1-2, Assume that we have some data on past credit applicants… A credit observation, or sample, consists of an observation for each feature, some means of identifying the applicant, and the outcome – the outcome being a label of credit worthiness/unworthiness); performing, by the computer system, a feature-selection operation to identify a reduced feature set from the plurality of features, wherein the feature-selection operation includes processing, via execution of a quantum computer to determine ground states of qubits corresponding to values for variables (pg. 7, Section 6, ¶5, The poly object is passed to a solver and the solution is returned as a list of ones and zeros, referred to as a configuration) in a… optimization model, the training dataset (Abstract, quadratic unconstrained binary optimization (QUBO)… a QUBO implementation on a quantum annealer), and wherein determining ground the states of qubits corresponding to values for variables in the… optimization model includes, identifying multivariate higher-order interactions between… features, including identifying a relevancy between the label for the given data sample and… features and a redundancy between… features… (pg. 5, Section 3, ¶7-8, We will associate the best subset with the value of x that minimizes an objective function, which we construct from two components. The first component represents the influence that features have on the marked class… The second component represents the independence – the influence that features have on the class is a measure of relevancy, and the independence of features is a measure of redundancy) weighting… using the same weighting parameter value (pg. 5, using a parameter α (0 ≤ α ≤ 1) to represent the relative weighting of independence (greatest at α = 0) and influence (greatest at α = 1) – relevancy is weighted by α while redundancy is weighted by 1- α); and based on the feature-selection operation, generating, by the computer system, an output value that indicates a subset of the plurality of features to include in the reduced feature set (pg. 5, Section 3, ¶7-8, We will associate the best subset with the value of x that minimizes an objective function – x is a vector of 1s and 0s that), wherein the output value is usable to identify the reduced feature set for training a machine learning model to classify data samples (pg. 6, Section 5, given a row vector u of new observations from a new applicant, calculate whether the vector belongs to the creditworthy class… existing data is divided into a training set and a test set).
Milne fails to teach: hybrid… hybrid… identifying a measure of conditional mutual information between a first feature and the set of labels for the plurality of features… and combining first results of identifying multivariate higher-order interactions and second results of identifying a measure of conditional mutual information.
However, in the same field of endeavor, Perneti teaches: hybrid… hybrid… identifying a measure of conditional mutual information between a first feature and the set of labels for the plurality of features… and combining first results of identifying multivariate higher-order interactions and second results of identifying a measure of conditional mutual information (¶49, two or more rankings of the plurality of features are determined using two or more FFS algorithms. The two or more FFS algorithms may include two or more of… a max-relevance min-redundancy (MRMR) algorithm… a conditional mutual information maximization (CMIM) algorithm – i.e., combining the max-relevancy min-redundancy algorithm and conditional mutual information algorithms).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use feature selection methods like conditional mutual information maximization and min-redundancy, max-relevancy in combination (i.e., a hybrid approach) as disclosed by Perneti in the feature selection method as disclosed by Milne to improve machine learning model performance (¶58, enables development of simpler and faster machine learning models).
	Milne in view of Perneti fails to teach: different groups of the plurality of features… feature pairs of the plurality of features… groups of three or more of the plurality of features.
However, in the same field of endeavor, Ishii teaches: Milne in view of Perneti fails to teach: different groups of the plurality of features… feature pairs of the plurality of features… groups of three or more of the plurality of features (¶52, For example, relevance using two or more features as arguments is calculated… ¶52, Redundancy using three or more features as arguments may be calculated).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use multivariate relevancy and redundancy terms in the objective function, as taught by Ishii, in the feature selection method disclosed by Milne in view of Perneti to select optimal features (¶24, it is possible to select appropriate features in polynomial time).
Milne in view of Perneti and Ishii fails to teach: conditional mutual information provided that a group of two or more other features are selected for inclusion in the reduced feature set.
However, in the same field of endeavor, Beraha teaches: conditional mutual information provided that a group of two or more other features are selected for inclusion in the reduced feature set (pg. 2, Section II, Part B, This quantity intuitively represents the importance of the feature subset XA in predicting the target Y given that we are also using XĀ).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine conditional mutual information from groups of selected features as disclosed by Beraha in the method disclosed by Milne in view of Perneti and Ishii to offer more control over the feature selection process (pg. 2, Section I, explicitly control the ideal error introduced in the feature selection process).
Milne in view of Perneti, Ishii and Beraha fails to teach: combining by weighting the first results and the second results differently.
However, in the same field of endeavor, Siddiqui teaches: combining by weighting the first results and the second results differently (¶66, In some implementations, resource consumption model 305 may combine a plurality of such models (or any other models not expressly described herein) by weighting each of the models. For instance, resource consumption model 305 may weight any one or more of the models relatively heavier than other models).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to weight different models differently as disclosed by Siddiqui in the feature selection method disclosed by Milne in view of Perneti, Ishii and Beraha to improve model performance (¶66, to increase coverage, increase accuracy, and/or increase a combination of coverage and accuracy).

Regarding claim 9, Milne further teaches: The method of claim 8, wherein the hybrid optimization model is a QUBO model, wherein the output value includes ground state spin information (pg. 1, Abstract, Quadratic optimization scales exponentially with the number of features, but a QUBO implementation on a quantum annealer has the potential to be faster than classical solver – quantum annealing involves finding ground states of qubits, i.e., spin state represented by a 1 or 0) that corresponds to a minimization of an objective function utilized in the QUBO model  corresponds to a minimization of an objective function utilized in the QUBO model, wherein the minimization of the objective function corresponds to an output value that indicates the subset of features to include in the reduced feature set (pg. 5, We will associate the best subset with the value of x that minimizes an objective function – x is a vector of binary values indicating whether or not features are included in the reduced feature set, as defined on pg. 4).

	Regarding claim 10, Milne in view of Perneti, Beraha and Siddiqui fails to teach: The method of claim 9, wherein the objective function utilized in the QUBO model is usable to evaluate the relevancy between the label for the given data sample and groups of three or more of the plurality of features.
However, in the same field of endeavor, Ishii teaches: wherein the objective function utilized in the QUBO model is usable to evaluate the relevancy between the label for the given data sample and groups of three or more of the plurality of features (¶52, For example, relevance using two or more features as arguments is calculated).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use relevancy between data labels and groups of features as taught by Ishii in the feature selection method of Milne in view of Perneti and Beraha to select optimal features (¶24, it is possible to select appropriate features in polynomial time).

Regarding claim 11, Milne further teaches: The method of claim 9, further comprising: training, by the computer system, a first machine learning model based on the reduced feature set (pg. 6, Section 5, given a row vector u of new observations from a new applicant, calculate whether the vector belongs to the creditworthy class… existing data is divided into a training set and a test set).

Regarding claim 15, it recites similar limitations as claim 8 and is rejected on the same grounds – see above.

Regarding claim 17, it recites similar limitations as claim 9 and is rejected on the same grounds – see above.

Regarding claim 18, Milne in view of Perneti, Beraha and Siddiqui fails to teach: The method of claim 15,wherein the determining further includes further identifying a measure of redundancy between groups of three or more of the plurality of features.
However, in the same field of endeavor, Ishii teaches: wherein the determining further includes further identifying a measure of redundancy between groups of three or more of the plurality of features (¶52, Redundancy using three or more features as arguments may be calculated).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use redundancy between data labels and groups of features as taught by Ishii in the feature selection method of Milne in view of Perneti and Beraha to select optimal features (¶24, it is possible to select appropriate features in polynomial time).

Regarding claim 19, Milne in view of Perneti, Ishii and Siddiqui fails to teach: The method of claim 15, wherein the determining further includes: identifying a measure of mutual information between groups of two or more features and the set of labels for the plurality of data samples.
However, in the same field of endeavor, Beraha teaches: identifying a measure of mutual information between groups of two or more features and the set of labels for the plurality of data samples (pg. 2, Section II, Part B, This quantity intuitively represents the importance of the feature subset XA in predicting the target Y given that we are also using XĀ).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine mutual information from groups of selected features as disclosed by Beraha in the method disclosed by Milne in view of Perneti and Ishii to offer more control over the feature selection process (pg. 2, Section I, explicitly control the ideal error introduced in the feature selection process).

Regarding claim 20, Milne further teaches: The method of claim 15, further comprising: generating, by the computer system, an updated training dataset that includes data values for a subset of features that are included in the reduced feature set, wherein the updated training dataset does not include second data values for one or more of the plurality of features that are not included in the reduced feature set (pg. 6, Section 5, ¶5, These 48 feature variables are the input to the feature selection algorithm. The feature subset at the output of the feature selector then forms the input to the classifier – the feature subset only includes information for the selected features); and training, by the computer system, a first machine learning model based on the updated training dataset (Figs. 7 and 8, the graph displays accuracy results for the classifier trained on different feature subsets of different sizes).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Milne in view of Perneti, Ishii, Beraha and Siddiqui as applied to claim 4 above, and further in view of Abramoff et al. (20100220906 A1), herein Abramoff.
Regarding claim 5, Milne in view of Perneti, Ishii, Beraha and Siddiqui fails to teach: The method of claim 4, further comprising: generating, by the computer system, a second updated training dataset that includes data values for the second subset of features that are included in the second reduced feature set; training, by the computer system, a second machine learning model based on the second updated training dataset; comparing, by the computer system, a performance of the machine learning model and the second machine learning model; and based on the comparing, selecting, by the computer system, one of the reduced feature set and the second reduced feature set as a final feature set for the training dataset. 
However, in the same field of endeavor, Abramoff teaches: generating, by the computer system, a second updated training dataset that includes data values for the second subset of features that are included in the second reduced feature set; training, by the computer system, a second machine learning model based on the second updated training dataset; comparing, by the computer system, a performance of the machine learning model and the second machine learning model; and based on the comparing, selecting, by the computer system, one of the reduced feature set and the second reduced feature set as a final feature set for the training dataset (¶80, By repeatedly testing with a subset of all features, which changes, and comparing the performance of the classifier with different feature subsets on the same test set, an optimal subset of features can be found).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select a final feature set as disclosed by Abramoff in the feature selection method disclosed by Milne in view of in view of Ishii, Perneti, Beraha and Siddiqui to achieve the most accurate results (¶73, creating a plurality of feature sets from the set independent components at 404, and selecting one of the plurality of feature sets that is optimal for classification according to a metric at 405).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Milne in view of Perneti, Ishii, Beraha and Siddiqui as applied to claim 1 above, and further in view of Costa et al. (20170011534 A1), herein Costa.
Regarding claim 6, Milne in view of Perneti, Ishii, Beraha and Siddiqui fails to teach: The method of claim 1, wherein the hybrid optimization model uses the Pearson correlation coefficient to determine the measure of relevancy between the feature pairs of the plurality of features and the set of labels for the plurality of features.
However, in the same field of endeavor, Costa teaches: wherein the hybrid optimization model uses the Pearson correlation coefficient to determine the measure of relevancy between the feature pairs of the plurality of features and the set of labels for the plurality of features (¶38, In the feature selection act LP103, only very informative features among all the features generated in the previous act are kept, by analyzing correlations of the features… using, for example, maximum relevance minimum redundancy, Pearson's correlation coefficient…).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the Pearson correlation coefficient as disclosed by Costa in the feature selection method of Milne in view of Ishii, Perneti, Beraha and Siddiqui to measure relevancy, thus reducing the number of features without discarding important ones (¶61, to keep a reduced number of informative features).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Milne in view of Perneti, Ishii, Beraha and Siddiqui as applied to claim 1 above, and further in view of Hafez et al. (20210319906 A1), herein Hafez.
Regarding claim 7, Milne in view of Perneti, Ishii, Beraha and Siddiqui fails to teach: The method of claim 1, wherein the hybrid optimization model uses a combination of a first relevancy measure and a second relevancy measure to determine the measure of relevancy between the feature pairs of the plurality of features and the set of labels for the plurality of data samples.
	However, in the same field of endeavor, Hafez teaches: wherein the hybrid optimization model uses a combination of a first relevancy measure and a second relevancy measure to determine the measure of relevancy between the feature pairs of the plurality of features and the set of labels for the plurality of data samples (¶249, Identify and store features which were most important in driving the predictions, based on the feature selection method(s) selected using one or more of: a) Spearman correlation between the feature and predictions, b) Pearson correlation between the feature and predictions, c) Kendall correlation between the feature and predictions – the most relevant features are ones which contribute the most to classification predictions).
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to add the second measure of relevancy of Hafez in the feature selection method disclosed by Milne in view of Perneti, Ishii, Beraha and Siddiqui to generate smaller, better subsets (¶378, allow for more robust feature importance evaluation and improved feature selection).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Milne in view of Perneti, Ishii, Beraha and Siddiqui as applied to claim 11 above, and further in view of Abramoff.
Regarding claim 12, Milne further teaches: The method of claim 11, further comprising: subsequently performing, by the computer system, a second feature-selection operation to identify, from the plurality of features, a second subset of features to include in a second reduced feature set, wherein the second feature-selection operation includes processing the training dataset using a modified version of the QUBO model, wherein the second reduced feature set includes a different number of features than the reduced feature set, training, by the computer system, a second machine learning model based on the second reduced feature set; comparing, by the computer system, a performance of the first and second machine learning models (Figures 7 and 8, comparing the performance of different subsets selected by varying a hyperparameter).
Milne in view of Perneti, Ishii, Beraha and Siddiqui fails to explicitly teach based on the comparing, selecting, by the computer system, one of the reduced feature set and the second reduced feature set as a final feature set for the training dataset.
However, in the same field of endeavor, Abramoff teaches based on the comparing, selecting, by the computer system, one of the reduced feature set and the second reduced feature set as a final feature set for the training dataset. (¶80, By repeatedly testing with a subset of all features, which changes, and comparing the performance of the classifier with different feature subsets on the same test set, an optimal subset of features can be found).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the feature selection process of Milne in view of Perneti, Ishii, Beraha and Siddiqui by adding the step to select a final feature set as disclosed by Abramoff in order to achieve the most accurate results (¶73, creating a plurality of feature sets from the set independent components at 404, and selecting one of the plurality of feature sets that is optimal for classification according to a metric at 405).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Milne in view of Perneti, Ishii, Beraha and Siddiqui as applied to claim 8 above, and further in view of Orlandi et al. (20220072425 A1), herein Orlandi.
Regarding claim 13, Milne in view of Perneti, Ishii, Beraha and Siddiqui fails to teach: The method of claim 8, wherein the optimization model uses the Spearman's rank correlation coefficient to evaluate the redundancy between the groups of three or more of the plurality of features.
However, in the same field of endeavor, Orlandi teaches: wherein the optimization model uses the Spearman's rank correlation coefficient to evaluate the redundancy between the groups of three or more of the plurality of features (¶89, A statistical approach, Spearman's Rank Correlation Coefficient can be used to eliminate these redundant features from the selection process).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the feature selection process disclosed by Milne in view of Perneti, Ishii, Beraha and Siddiqui by using the Spearman correlation coefficient, as disclosed by Orlandi, in order to reduce overfitting (¶105, the classifier can be optimized to avoid model overfitting without compromising classifier performance).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Milne in view of Perneti, Ishii, Beraha and Siddiqui as applied to claim 8 above, and further in view of Nayak et al. ("Elitism based Multi-Objective Differential Evolution for feature selection: A filter approach with an efficient redundancy measure", 2020), herein Nayak.
Regarding claim 14, Milne in view of Perneti, Ishii, Beraha and Siddiqui fails to teach: The method of claim 8, wherein the optimization model uses a combination of a first redundancy measure and a second redundancy measure to evaluate the redundancy between the groups of three or more features.
However, in the same field of endeavor, Nayak teaches: wherein the optimization model uses a combination of a first redundancy measure and a second redundancy measure to evaluate the redundancy between the groups of three or more features (Section 3.1, However, in most of the cases, people have considered only one of the redundancy measures (Xue et al., 2016, Wang et al., 2015) to identify the dependency… Considering this point, both PCC and MI have been used).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the feature selection process of Milne in view of Perneti, Ishii, Beraha and Siddiqui to use two redundancy measures as disclosed by Nayak in order to capture both linear and non-linear relationships between features (Section 3.1, for calculation of linear and nonlinear dependency respectively).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Milne in view of Perneti, Ishii, Beraha and Siddiqui as applied to claim 15 above, and further in view of Łazęcka, et al. ("Analysis of Information-Based Nonparametric Variable Selection Criteria", 2020), herein Łazęcka.
Regarding claim 16, Milne in view of Ishii and Siddiqui fails to teach: The method of claim 15, wherein the hybrid optimization model further evaluates a second measure of conditional mutual information.
However, in the same field of endeavor, Perneti teaches: wherein the hybrid optimization model further evaluates a second measure of conditional mutual information (¶49, The two or more FFS algorithms may include two or more of: ... a conditional mutual information maximization (CMIM) algorithm; ...and a conditional infomax feature extraction (CIFE) algorithm).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a second measure of conditional mutual information as disclosed by Perneti in the method disclosed by Milne in view of Ishii, Beraha and Siddiqui to improve machine learning model performance (¶58, enables development of simpler and faster machine learning models).
Łazęcka uses CIFE as a measure of conditional mutual information (Section 2, Preliminaries, Observe that, in (16), we take into account not only relevance of the candidate feature, but also the possible interactions between the already selected features and the candidate feature – (16) refers to an equation that represents CIFE).
Milne in view of Perneti, Ishii, Siddiqui and Łazęcka fails to teach: a measure of conditional mutual information between a first plurality of features and the set of labels for the plurality of data samples provided that one or more additional features are also selected for inclusion in the reduced feature set. 
However, in the same field of endeavor, Beraha teaches: a measure of conditional mutual information between a first plurality of features and the set of labels for the plurality of data samples provided that one or more additional features are also selected for inclusion in the reduced feature set (pg. 2, Section II, Part B, This quantity intuitively represents the importance of the feature subset XA in predicting the target Y given that we are also using XĀ).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine conditional mutual information from groups of selected features as disclosed by Beraha in the method disclosed by Milne in view of Perneti, Ishii, Siddiqui and Łazęcka to offer more control over the feature selection process (pg. 2, Section I, explicitly control the ideal error introduced in the feature selection process).


Conclusion 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HARRISON CHAN YOUNG KIM whose telephone number is (571)272-0713. The examiner can normally be reached Monday - Thursday 10:00 am - 7:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/HARRISON C KIM/               Examiner, Art Unit 2145



/CESAR B PAULA/               Supervisory Patent Examiner, Art Unit 2145
Read full office action
Prosecution Timeline

Show 10 earlier events
Sep 09, 2025
Request for Continued Examination
Sep 18, 2025
Response after Non-Final Action
Oct 20, 2025
Non-Final Rejection mailed — §103
Jan 03, 2026
Interview Requested
Jan 15, 2026
Non-Final Rejection mailed — §103
Mar 16, 2026
Interview Requested
Mar 31, 2026
Applicant Interview (Telephonic)
Mar 31, 2026
Examiner Interview Summary
Precedent Cases

Applications granted by this same examiner with similar technology

17/986,920
Patent 12608607
METHOD FOR PREDICTING REMAINING USEFUL LIFE OF RAILWAY TRAIN BEARING BASED ON CAN-LSTM
3y 5m to grant Granted Apr 21, 2026
Study what changed to get past this examiner. Based on 1 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

4-5
Expected OA Rounds
50%
Grant Probability
83%
With Interview (+33.3%)
3y 9m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 6 resolved cases by this examiner. Grant probability derived from career allowance rate.