Prosecution Insights
Last updated: April 19, 2026
Application No. 18/098,532

MULTILEVEL OVERSAMPLER

Status: Final Rejection (§101, §103)
Filed: Jan 18, 2023
Examiner: PHAKOUSONH, DARAVANH
Art Unit: 2121
Tech Center: 2100 — Computer Architecture & Software
Assignee: Accenture Global Solutions Limited
OA Round: 2 (Final)

Grant Probability: 50% (Moderate)
Estimated OA Rounds: 3-4
Estimated Time to Grant: 4y 0m
Grant Probability with Interview: 99%
Examiner Intelligence

Career Allow Rate: 50% (grants 50% of resolved cases; 1 granted / 2 resolved; -5.0% vs TC avg)
Interview Lift: +100.0% (strong), comparing allow rates with vs. without an interview among resolved cases
Typical Timeline: 4y 0m avg prosecution
Career History: 35 total applications across all art units; 33 currently pending

Statute-Specific Performance

§101: 31.2% (-8.8% vs TC avg)
§103: 38.1% (-1.9% vs TC avg)
§102: 14.8% (-25.2% vs TC avg)
§112: 13.2% (-26.8% vs TC avg)

Tech Center averages are estimates. Based on career data from 2 resolved cases.

Office Action

Rejections under §101 and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment/Arguments

1. Applicant’s arguments to the rejection under 35 U.S.C. 101 filed on December 8, 2025 have been fully considered but are not persuasive. Applicant asserts that the amended claims do not recite a mental process because the steps are performed “electrically” and “by one or more processors,” and because training using gradient descent allegedly cannot be performed in the human mind or with pen and paper. The Examiner disagrees.

As explained in MPEP 2106.04(a)(2)(III), the courts do not distinguish between mental processes performed entirely in the human mind and those performed with the assistance of physical aids such as pen and paper or basic computational tools. See Gottschalk v. Benson, 409 U.S. 63 (1972); CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366 (Fed. Cir. 2011); Synopsys, Inc. v. Mentor Graphics Corp., 839 F.3d 1138 (Fed. Cir. 2016). The use of a physical aid to perform a mathematical calculation does not negate the mental nature of the limitation. Nor do courts distinguish between mental processes performed by humans and the same processes performed on a generic computer. See Versata Dev. Group v. SAP Am., Inc., 793 F.3d 1306 (Fed. Cir. 2015); Mortgage Grader, Inc. v. First Choice Loan Servs. Inc., 811 F.3d 1314, 1324 (Fed. Cir. 2016).

The claims recite operations including generating values representing distance between samples, generating a sampling rate based on those values, comparing model outputs with ground-truth values, determining the differences between those values, determining gradients based on those differences, and updating model parameters using gradient-descent optimization. These steps constitute mathematical calculations, evaluations, and optimizations.
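For orientation, the sequence of operations the Action attributes to the claims (distance values, a distance-derived sample rate, and gradient-descent updates driven by differences between predictions and ground truth) can be sketched as follows. This is an illustrative reconstruction only, not the applicant’s disclosed implementation; the function name, the rate formula, and the logistic model are all assumptions.

```python
import numpy as np

def train_on_rebalanced_data(X, y, lr=0.1, threshold=0.9, max_iters=500):
    """Illustrative sketch of the recited operations (names/formulas assumed)."""
    minority = X[y == 1]
    # Values representing distance between minority samples (Euclidean).
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    # One possible scheme for a sampling rate derived from the distance values.
    rate = 1.0 / (1.0 + dists.mean())
    n_new = max(1, int(rate * len(minority)))
    # Sample synthetic minority points from a fitted normal distribution.
    rng = np.random.default_rng(0)
    synth = rng.normal(minority.mean(axis=0), minority.std(axis=0) + 1e-9,
                       size=(n_new, X.shape[1]))
    X2 = np.vstack([X, synth])
    y2 = np.concatenate([y, np.ones(n_new)])
    # Simple logistic model trained by gradient descent until threshold accuracy.
    w = np.zeros(X2.shape[1])
    b = 0.0
    for _ in range(max_iters):
        p = 1.0 / (1.0 + np.exp(-(X2 @ w + b)))   # model output
        diff = p - y2                              # difference vs. ground truth
        w -= lr * (X2.T @ diff) / len(y2)          # gradient-based weight update
        b -= lr * diff.mean()
        if ((p > 0.5) == (y2 == 1)).mean() >= threshold:
            break                                  # threshold accuracy reached
    return w, b
```

Each claimed step maps to one line above, which is the level of granularity at which the mental-process/mathematical-concept characterization operates.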
Even if such calculations are performed using basic computational tools (e.g., a calculator or spreadsheet), they remain mathematical and evaluative processes that fall within the mental process abstract idea grouping. The recitation that these operations are performed “electrically” or “by one or more processors” does not alter their character, as implementing mathematical calculations on a generic processor does not remove them from the mental process category. Accordingly, the amended claims continue to recite a mental process under Step 2A Prong 1.

Applicant further argues (Remarks, pp. 12-14) that the amended claims integrate the judicial exception into a practical application by improving bias reduction in machine-learning models. Applicant analogizes the claimed invention to an eligibility example and asserts that the claimed steps reflect an improvement to computer technology. The Examiner disagrees.

The asserted improvement relates to improving the accuracy of a machine learning model by reducing bias in training data and optimizing model parameters using gradient descent. Improving model accuracy or prediction performance constitutes an improvement in the abstract mathematical model itself, not an improvement in the functioning of a computer or in another technological field. The claims do not recite any improvement of computer architecture, processor operation, memory functionality, network operation, or other technological components. Instead, the claims use a generic computer as a tool to perform mathematical calculations on data. The recited steps of generating distance values, generating a sampling rate, comparing outputs to ground-truth values, determining gradients based on differences, updating parameters using gradient descent, and continuing iterations until a threshold accuracy is achieved reflect the application of mathematical techniques to improve prediction results.
Such improvement of the quality of the output of a mathematical model does not integrate the judicial exception into a practical application under MPEP 2106.04(d)(1). The claims do not impose a meaningful limitation on the abstract idea beyond implementing mathematical calculations in a generic computer environment.

Applicant further argues (Remarks, pp. 14-15) that the claimed subject matter solves a technical problem relating to model bias and accuracy, and that the specification describes efficiency improvements such as reduced computation time, improved accuracy relative to SMOTE, and reduced processing resource usage. The Examiner respectfully disagrees.

The asserted improvements relate to improved statistical performance of a machine-learning model and improved efficiency of a data-processing technique. Improvements in the mathematical performance of a model, including increased predictive accuracy or reduced training time resulting from application of a mathematical sampling technique, constitute improvements to the abstract mathematical method itself, not to computer functionality. The claims do not recite any specific improvement to processor architecture, memory structure, data storage configuration, or other technological infrastructure. Instead, the claims apply mathematical operations and optimization techniques using generic computer components. Alleged improvements in model accuracy, sampling efficiency, or resource usage resulting from application of mathematical techniques do not, by themselves, demonstrate integration into a practical application under MPEP 2106.04(d)(1), nor do they establish that the additional elements amount to significantly more than the judicial exception. The claims implement mathematical concepts using conventional computer components performing their ordinary functions.
Accordingly, for the reasons set forth above, the amended claims continue to recite abstract ideas under Step 2A and do not integrate the judicial exception into a practical application. Further, the additional elements, considered individually and in combination, do not amount to significantly more than the judicial exception, as they merely implement mathematical concepts using generic computer components performing well-understood, routine, and conventional functions. Therefore, the rejection of claims 1-20 under 35 U.S.C. 101 is maintained.

2. Applicant’s arguments regarding the rejection under 35 U.S.C. 103 filed on December 8, 2025 are not persuasive. Applicant argues (Remarks, pp. 15-16) that Abdi does not teach “generating a sample rate for the minority level using one or more values representing the distance,” and asserts that Abdi merely discloses an oversampling rate parameter (Orate) based on class size difference and separately discloses generating synthetic samples having the same Mahalanobis distance from their class mean. The Examiner respectfully disagrees.

Abdi discloses (p. 242, last paragraph) computing an oversampling rate parameter (Orate), which defines the quantity of synthetic minority samples to be generated. Abdi further discloses (p. 249, conclusion) that in the MDO method, synthetic samples are generated such that they have the same Mahalanobis distance from their corresponding class mean. Thus, Abdi teaches both (1) determining the quantity of minority samples to be generated (the sample rate), and (2) generating those samples according to a constraint defined by Mahalanobis distance values. Under the broadest reasonable interpretation, “generating a sample rate…using one or more values representing distance” does not require that the rate be mathematically computed from the distance value itself.
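The two teachings the Action pairs together can be sketched side by side: an Orate based on class-size difference governing quantity, and a Mahalanobis-distance constraint governing how each synthetic sample is placed. This is a rough illustration of the examiner’s characterization, not Abdi’s actual MDO algorithm; the Orate formula and the distance-preserving construction here are assumptions.

```python
import numpy as np

def mdo_sketch(X_min, X_maj, seed=0):
    """Sketch: (1) Orate from class-size difference, (2) synthetic samples
    that keep each seed's Mahalanobis distance from the class mean."""
    rng = np.random.default_rng(seed)
    # (1) Oversampling rate: synthetic samples per minority seed (assumed form).
    orate = max(1, (len(X_maj) - len(X_min)) // len(X_min))
    mean = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False) + 1e-6 * np.eye(X_min.shape[1])
    cov_inv = np.linalg.inv(cov)
    synth = []
    for x in X_min:
        d2 = (x - mean) @ cov_inv @ (x - mean)   # squared Mahalanobis distance
        for _ in range(orate):
            # Random direction, rescaled so the new point sits at the same
            # Mahalanobis distance from the class mean as the seed sample.
            v = rng.normal(size=x.shape)
            v2 = v @ cov_inv @ v
            synth.append(mean + v * np.sqrt(d2 / v2))
    return np.array(synth)
```

In this sketch Orate fixes only the output quantity while the distance values shape every generated point, which is the relationship the Action relies on for its broadest-reasonable-interpretation argument.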
Rather, the claim language encompasses generating a sampling rate that governs the quantity of synthetic samples generated in a process that uses distance values to define how those samples are produced. Abdi’s disclosure of computing Orate and applying it within a distance-constrained synthetic generation process reasonably teaches this limitation. The oversampling rate determines the quantity of output, and the output generation is governed by Mahalanobis distance values.

Applicant asserts (Remarks, p. 17) that Abdi in view of Lin fails to teach or suggest one or more recited features of the amended independent claim 1. The Examiner disagrees. As explained in the Office Action, Abdi teaches computing an oversampling rate parameter (Orate) that determines the quantity of synthetic minority samples to be generated, and further teaches generating synthetic samples governed by Mahalanobis distance. Under the broadest reasonable interpretation, this reasonably teaches generating a sample rate for the minority level using values representing distance. Lin further teaches intra-minority distance measurements that inform sampling considerations.

With respect to the newly added training limitations, Chopra teaches training a machine-learning model using gradient descent, including determining gradients based on differences between predicted outputs and corresponding ground truth values, updating weights based on those gradients, and continuing training iterations until a threshold performance condition relative to ground truth is achieved. It would have been obvious to apply such conventional gradient-based optimization techniques to the model trained on the bias-adjusted dataset of Abdi in view of Lin in order to optimize performance, yielding predictable results. Accordingly, the cited combination teaches or suggests the recited limitations of amended independent claim 1.

Applicant asserts (Remarks, p.
17) that independent claims 10 and 19, as well as dependent claims 2-9, 11-18, and 20, are patentable for at least the same reasons discussed with respect to independent claim 1. The Examiner disagrees. Independent claims 10 and 19 recite limitations similar in scope to amended independent claim 1, and for the reasons set forth above with respect to claim 1, the cited combination of Abdi, Lin, and Chopra teaches or suggests the recited subject matter. The dependent claims do not include additional limitations that would render them patentable over the cited references. Accordingly, the rejection of claims 1-20 under 35 U.S.C. 103 is maintained.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.

101 Subject Matter Eligibility Analysis

Step 1: Claims 1-20 are within the four statutory categories (a process, machine, manufacture or composition of matter).

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception.
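The Step 1 / Step 2A / Step 2B flow recited above can be summarized as a small decision function. This is only an illustrative encoding of the MPEP 2106 flowchart; the function and parameter names are hypothetical.

```python
def eligible_under_101(statutory: bool, recites_exception: bool,
                       integrates: bool, significantly_more: bool) -> bool:
    """Illustrative encoding of the MPEP 2106 eligibility flow (names assumed)."""
    if not statutory:             # Step 1: statutory category?
        return False
    if not recites_exception:     # Step 2A Prong One: recites a judicial exception?
        return True
    if integrates:                # Step 2A Prong Two: practical application?
        return True
    return significantly_more     # Step 2B: significantly more?
```

The Action’s position maps onto this flow as: exception recited (Prong One yes), no integration (Prong Two no), no significantly more (Step 2B no), hence ineligible.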
If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101. None of the claims represent an improvement to technology. Claims 1-9 are directed to a method consisting of a series of steps, meaning they are directed to the statutory category of process. Claims 10-20 are directed to storage media and processors, which are machines.

Regarding claim 1, the following claim elements are abstract ideas:

extracting one or more samples from the first data set representing the minority level (This is an abstract idea of a “mental process.” It entails identifying the minority level (e.g., minority class/label) in a data set and selecting one or more records that match it – an action that can be performed by a person reviewing records and marking those with the minority designation using pen and paper. See MPEP 2106.04(a)(2)(III).);

generating one or more values representing a distance between a sample of the minority level and the one or more samples of the plurality of samples of the minority level (This is an abstract idea of a “mental process.” It entails calculating values that represent a distance between a given sample of the minority level and one or more other samples of the minority level. A person could manually observe the data points, apply a distance formula (e.g., Euclidean distance) with pen and paper, and record the results. These steps of observing, reasoning, and performing mathematical calculations are activities that can practically be carried out in the human mind or with simple tools, and therefore fall within the mental process grouping of abstract ideas.);

generating a sample rate for the minority level using the one or more values representing the distance (This is an abstract idea of a “mental process.” It entails deriving a sampling rate for the minority level from previously computed distance values.
A person could, with observation and reasoning, review the distances, perform hand calculations (e.g., normalize, threshold, or invert the distances) using pen and paper, and assign a corresponding sample rate. Such steps – observing, reasoning, and performing arithmetic – can practically be done mentally or with simple tools and therefore fall within the mental process grouping of abstract ideas.);

generating a second data set by (i) sampling a second set of samples from a distribution using the sample rate and (ii) combining the second set of samples with the first data set (This is an abstract idea of a “mental process.” It entails (i) using observation and reasoning to apply a sampling rate to a distribution and hand-select a second set of samples – e.g., with pen-and-paper math to compute how many to take (proportions, thresholds) and simple manual randomization (coin tosses, number tables) – and (ii) combining those selected samples with the first data set by listing/appending them. These steps can be carried out in the human mind or with basic tools.);

comparing the output with a ground truth value included in a portion of the second data set not provided to the machine learning model (This is an abstract idea of a “mental process.” It entails comparing the model’s output against a known ground-truth value that was withheld from training. A person could, by observation and reasoning, write down the model’s predicted value, look up a corresponding correct answer in the unused portion of the data set, and with pen-and-paper math mark whether they match or differ.
Such reviewing, matching, and evaluating steps can be carried out in the human mind or with simple tools, and therefore fall within the mental process grouping of abstract ideas.);

adjusting the machine learning model using one or more values indicating the comparison between the output and the ground truth value (This is an abstract idea of a “mental process.” It entails reviewing values that indicate how the output compares to ground truth, and – through observation, reasoning, and pen-and-paper calculations – deciding how to adjust the model (e.g., manually increasing/decreasing a weight, learning rate, or rule). Such interpretation and hand-tuning can be performed in the human mind or with simple tools.);

training the machine learning model using one or more training algorithms, including a gradient descent algorithm (This is an abstract idea of a mathematical concept. The limitation involves computing gradients and iteratively minimizing an error function to adjust model parameters. Such operations rely on mathematical relationships and optimization calculations and therefore fall within the mathematical concepts grouping of abstract ideas.);

determining one or more gradients based on differences between the output and the ground truth value (This is an abstract idea of a mental process and mathematical concept. The limitation involves computing differences between values and deriving gradients from those differences, which are mathematical relationships used in optimization. Such evaluation and calculation can be performed in the human mind or with the aid of pen and paper or basic computational tools such as a calculator, and therefore falls within the mathematical concepts and mental process groupings of abstract ideas.);

wherein the threshold accuracy is defined by a degree of similarity between the output and the ground truth value (This is an abstract idea of a mental process.
The limitation involves defining a threshold based on degree of similarity, which requires evaluating and deciding what level of similarity is sufficient. Such judgment can be performed in the human mind or with the aid of pen and paper and therefore falls within the mental process grouping of abstract ideas.).

The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

a machine-learning model (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05.);

using bias-reduced training data to improve accuracy of the machine learning model in a computer system (This limitation amounts to insignificant extra-solution activity. It merely reflects the intended use of the processed data to improve model performance and does not impose any meaningful limitation on how the abstract idea is carried out. Accordingly, this is an instruction to apply the results of the abstract idea and does not integrate the judicial exception into a practical application.);

electrically….by the one or more processors (This is merely an instruction to apply the abstract idea using generic computer components);

obtaining a first data set that includes data of multiple attribute levels including a minority level and a majority level (This limitation amounts to adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP 2106.05(g).
Obtaining a first data set (i.e., mere data gathering in conjunction with an abstract idea) is directed to a well-understood, routine, and conventional activity of data transmission. See MPEP 2106.05(d)(II)(i).);

providing a portion of the second data set to a machine learning model (This limitation amounts to adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP 2106.05(g).);

obtaining output of the machine learning model representing a prediction using the portion of the second data set (The step of “obtaining output” is merely a generic computer function that amounts to invoking a trained model on input data and reading the result – i.e., storing/retrieving information in memory and transmitting/receiving data – which are well-understood, routine, and conventional activities. See MPEP 2106.05(d)(II).);

updating one or more weights or parameters of the machine learning model based on the determined gradients (This limitation recites a generic data processing operation involving modification of stored model values and therefore constitutes a well-understood, routine, and conventional activity. See MPEP 2106.05(d)(II)(iv).);

continuing training iterations until the output achieves a threshold accuracy (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).).

Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following additional elements, which taken alone or in combination with other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the second data set includes more samples representing the minority level than the first data set (This limitation amounts to adding insignificant extra-solution activity.
It merely states the resulting data set composition from routine sampling, duplication, and appending – i.e., data organization in memory – and does not integrate the exception into a practical application.).

Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, claim 3 recites the following abstract ideas: generating one or more values representing an average of each feature describing each sample of the first data set (This is an abstract idea of a “mental process.” It involves reviewing feature values and hand-computing averages (e.g., arithmetic mean) with pen-and-paper math – steps performable in the human mind or with simple tools. See MPEP 2106.04(a)(2)(III).); generate one or more values representing a correlation of each feature describing each sample of the first data set (This is an abstract idea of a “mental process.” It involves reviewing values and, with observation and reasoning, hand-computing correlations (e.g., Pearson/Spearman) using pen-and-paper math – steps performable in the human mind or with simple tools.).

Regarding claim 4, the rejection of claim 3 is incorporated herein. Further, claim 4 recites the following abstract ideas: wherein the one or more values representing the correlation include a covariance matrix (This recites a mathematical concept – a covariance matrix – falling within the mathematical relationships/formulas/calculations grouping. See MPEP 2106.04(a)(2)(I). Further, to the extent it merely specifies the format/content in which correlation results are kept, it is insignificant extra-solution activity – routine organization/storage of information in memory. See MPEP 2106.05(g) and MPEP 2106.05(d)(II)(iv).).

Regarding claim 5, the rejection of claim 3 is incorporated herein. Further, claim 5 recites the following abstract ideas: generating a transposition of the one or more values indicating the distance (This is an abstract idea of a “mental process” and mathematical concept.
It recites matrix/vector transposition – a mathematical operation within the mathematical relationships/formulas/calculations grouping – and, to the extent performed by observation and pen-and-paper reindexing of rows/columns, it is a mental process. Such rearrangement and calculations can be carried out in the human mind using simple tools, and therefore fall within these abstract-idea groupings. See MPEP 2106.04(a)(2)(I) and MPEP 2106.04(a)(2)(III).).

Regarding claim 6, the rejection of claim 1 is incorporated herein. Further, claim 6 recites the following abstract ideas: combining a portion of the one or more values representing the distance (This is an abstract idea of a “mental process.” It entails selecting a subset of distance values and aggregating them (e.g., sum/average/weighted mix) by observation and pen-and-paper math – steps performable in the human mind or with simple tools.); modifying the combination of the portion of the one or more values using a reciprocal of a combination of the one or more values representing the distance (This is an abstract idea of a “mental process” and mathematical concept. It recites mathematical operations – forming combinations and applying a reciprocal (e.g., 1/x) to rescale/normalize – and, to the extent performed by observation and pen-and-paper arithmetic, it is a mental process (steps performable in the human mind and/or with simple tools).); generating a matrix of values including the sample rate for the minority level using the modified combination of the portion of the one or more values (This is an abstract idea of a “mental process” and mathematical concept. It recites constructing a matrix from computed quantities and inserting the sample rate – mathematical relationships/calculations a person could carry out by observation and pen-and-paper tabulation.).

Regarding claim 7, the rejection of claim 1 is incorporated herein.
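The hand calculations attributed to claims 3-6 above (per-feature averages, a covariance matrix, and distances combined and rescaled by the reciprocal of their combination to yield sample rates) can be sketched in a few lines. The particular formulas below are illustrative assumptions, not the applicant’s disclosed method.

```python
import numpy as np

def sample_rate_matrix(X):
    """Sketch of the claims 3-6 computations as the Action characterizes them
    (averaging, covariance, reciprocal-normalized distances; scheme assumed)."""
    means = X.mean(axis=0)                      # claim 3: per-feature averages
    cov = np.cov(X, rowvar=False)               # claim 4: covariance matrix
    dists = np.linalg.norm(X - means, axis=1)   # distance values per sample
    combined = dists.sum()                      # claim 6: combine a portion
    # Modify the combination using the reciprocal of the combined distances,
    # yielding normalized per-sample rates.
    rates = dists * (1.0 / combined)
    return means, cov, rates.reshape(-1, 1)     # matrix including the rates
```

Each line corresponds to one recited element, which is why the Action treats them as arithmetic a person could perform with pen and paper.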
Further, claim 7 recites the following abstract ideas: generating a second sample rate for a second minority level of the first data set or the majority level using the one or more values representing the distance (This is an abstract idea of a “mental process.” It involves reviewing distance values for the specified class and, through observation and reasoning, performing pen and paper calculations (e.g., normalization, averaging, inversion) to derive a second sampling rate – steps performable in the human mind and/or with simple tools.).

The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the second sample rate is less than the sample rate for the minority level (This is insignificant extra-solution activity. It merely specifies a comparative parameter choice/constraint for sampling rates – i.e., routine selection and organization of values (e.g., storing a number in memory) – and does not integrate the exception into a practical application.).

Regarding claim 8, the rejection of claim 1 is incorporated herein. Further, claim 8 recites the following additional elements, which taken alone or in combination with other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein the minority level represents one or more attributes with a likelihood of bias higher than one or more attributes represented by the majority level in the first data set (This is insignificant extra-solution activity. It merely specifies a label/semantics for the data – how records are designated as “minority” versus “majority” – amounting to routine organization/representation of information in memory, and does not integrate the exception into a practical application.
See MPEP 2106.05(g) and MPEP 2106.05(d)(II).); wherein a bias of the one or more attributes of the minority level in the second data set is less than the one or more attributes of the minority level in the first data set (This is insignificant extra-solution activity. It merely states a resulting data-set property – a comparative bias level after prior processing – i.e., routine evaluation/recording and organization of information in memory, and does not integrate the exception into a practical application.).

Regarding claim 9, the rejection of claim 8 is incorporated herein. Further, claim 9 recites the following additional elements, which taken alone or in combination with other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: wherein a higher bias indicates a greater likelihood of inaccurate results from the machine learning model (This is insignificant extra-solution activity. It merely states an interpretive relationship/labeling of stored values – designating that higher “bias” corresponds to increased likelihood of inaccuracy – i.e., routine organization of information in memory and does not integrate into a practical application.).

Regarding claim 10, the following claim elements are abstract ideas:

extracting one or more samples from the first data set representing the minority level (This is an abstract idea of a “mental process.” It entails identifying the minority level (e.g., minority class/label) in a data set and selecting one or more records that match it – an action that can be performed by a person reviewing records and marking those with the minority designation using pen and paper.
See MPEP 2106.04(a)(2)(III).);

generating one or more values representing a distance between a sample of the minority level and the one or more samples of the minority level (This is an abstract idea of a “mental process.” It entails calculating values that represent a distance between a given sample of the minority level and one or more other samples of the minority level. A person could manually observe the data points, apply a distance formula (e.g., Euclidean distance) with pen and paper, and record the results. These steps of observing, reasoning, and performing mathematical calculations are activities that can practically be carried out in the human mind or with simple tools, and therefore fall within the mental process grouping of abstract ideas.);

generating a sample rate for the minority level using the one or more values representing the distance (This is an abstract idea of a “mental process.” It entails deriving a sampling rate for the minority level from previously computed distance values. A person could, with observation and reasoning, review the distances, perform hand calculations (e.g., normalize, threshold, or invert the distances) using pen and paper, and assign a corresponding sample rate.
Such steps – observing, reasoning, and performing arithmetic – can practically be done mentally or with simple tools and therefore fall within the mental process grouping of abstract ideas.);

generating a second data set by (i) sampling a second set of samples from a distribution using the sample rate and (ii) combining the second set of samples with the first data set (This is an abstract idea of a “mental process.” It entails (i) using observation and reasoning to apply a sampling rate to a distribution and hand-select a second set of samples – e.g., with pen-and-paper math to compute how many to take (proportions, thresholds) and simple manual randomization (coin tosses, number tables) – and (ii) combining those selected samples with the first data set by listing/appending them. These steps can be carried out in the human mind or with basic tools.);

comparing the output with a ground truth value included in a portion of the second data set not provided to the machine learning model (This is an abstract idea of a “mental process.” It entails comparing the model’s output against a known ground-truth value that was withheld from training. A person could, by observation and reasoning, write down the model’s predicted value, look up a corresponding correct answer in the unused portion of the data set, and with pen-and-paper math mark whether they match or differ.
Such reviewing, matching, and evaluating steps can be carried out in the human mind or with simple tools, and therefore fall within the mental process grouping of abstract ideas.);

adjusting the machine learning model using one or more values indicating the comparison between the output and the ground truth value (This is an abstract idea of a “mental process.” It entails reviewing values that indicate how the output compares to ground truth, and – through observation, reasoning, and pen-and-paper calculations – deciding how to adjust the model (e.g., manually increasing/decreasing a weight, learning rate, or rule). Such interpretation and hand-tuning can be performed in the human mind or with simple tools.);

training the machine learning model using one or more training algorithms, including a gradient descent algorithm (This is an abstract idea of a mathematical concept. The limitation involves computing gradients and iteratively minimizing an error function to adjust model parameters. Such operations rely on mathematical relationships and optimization calculations and therefore fall within the mathematical concepts grouping of abstract ideas.);

determining one or more gradients based on differences between the model output and corresponding ground truth values (This is an abstract idea of a mental process and mathematical concept. The limitation involves computing differences between values and deriving gradients from those differences, which are mathematical relationships used in optimization. Such evaluation and calculation can be performed in the human mind or with the aid of pen and paper or basic computational tools such as a calculator, and therefore falls within the mathematical concepts and mental process groupings of abstract ideas.);

wherein the threshold accuracy is defined by a degree of similarity between the model output and the ground truth values (This is an abstract idea of a mental process.
The limitation involves defining a threshold based on degree of similarity, which requires evaluating and deciding what level of similarity is sufficient. Such judgement can be performed in the human mind or with the aid of pen and paper and therefore falls within the mental process grouping of abstract ideas.). The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: A non-transitory computer-readable medium (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05.) one or more processors (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05(f).) a memory, wherein the memory is electrically coupled with the one or more processors (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05(f).) a computer system (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05.) electrically….by the one or more processors (This is merely an instruction to apply the abstract idea using generic computer components). obtaining a first data set that includes data of multiple attribute levels including a minority level and a majority level (This limitation amounts to adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP 2106.05(g).
Obtaining a first data set (i.e., mere data gathering in conjunction with an abstract idea) is directed to a well-understood, routine, and conventional activity of data transmission; see MPEP 2106.05(d)(II)(i).); providing a portion of the second data set to a machine learning model (This limitation amounts to adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP 2106.05(g).); obtaining output of the machine learning model representing a prediction using the portion of the second data set (The step of “obtaining output” is merely a generic computer function that amounts to invoking a trained model on input data and reading the result – i.e., storing/retrieving information in memory and transmitting/receiving data – which are well-understood, routine, and conventional activities. See MPEP 2106.05(d)(II).); updating one or more weights or parameters of the machine learning model based on the determined gradients (This limitation recites a generic data processing operation involving modification of stored model values and therefore constitutes a well-understood, routine, and conventional activity. See MPEP 2106.05(d)(II)(iv).); and continuing training iterations until the model output achieves a threshold accuracy (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).). Regarding claim 11, the rejection of claim 10 is incorporated herein. The claim recites similar limitations corresponding to claim 2. Therefore, the same subject matter analysis that was utilized for claim 2, as described above, is equally applicable to claim 11. Therefore, claim 11 is ineligible. Regarding claim 12, the rejection of claim 10 is incorporated herein. The claim recites similar limitations corresponding to claim 3. Therefore, the same subject matter analysis that was utilized for claim 3, as described above, is equally applicable to claim 12.
Therefore, claim 12 is ineligible. Regarding claim 13, the rejection of claim 12 is incorporated herein. The claim recites similar limitations corresponding to claim 4. Therefore, the same subject matter analysis that was utilized for claim 4, as described above, is equally applicable to claim 13. Therefore, claim 13 is ineligible. Regarding claim 14, the rejection of claim 12 is incorporated herein. The claim recites similar limitations corresponding to claim 5. Therefore, the same subject matter analysis that was utilized for claim 5, as described above, is equally applicable to claim 14. Therefore, claim 14 is ineligible. Regarding claim 15, the rejection of claim 10 is incorporated herein. The claim recites similar limitations corresponding to claim 6. Therefore, the same subject matter analysis that was utilized for claim 6, as described above, is equally applicable to claim 15. Therefore, claim 15 is ineligible. Regarding claim 16, the rejection of claim 10 is incorporated herein. The claim recites similar limitations corresponding to claim 7. Therefore, the same subject matter analysis that was utilized for claim 7, as described above, is equally applicable to claim 16. Therefore, claim 16 is ineligible. Regarding claim 17, the rejection of claim 10 is incorporated herein. The claim recites similar limitations corresponding to claim 8. Therefore, the same subject matter analysis that was utilized for claim 8, as described above, is equally applicable to claim 17. Therefore, claim 17 is ineligible. Regarding claim 18, the rejection of claim 17 is incorporated herein. The claim recites similar limitations corresponding to claim 9. Therefore, the same subject matter analysis that was utilized for claim 9, as described above, is equally applicable to claim 18. Therefore, claim 18 is ineligible. 
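For context, the gradient-descent and threshold-accuracy limitations characterized in the analysis above amount to the following kind of computation. This is a minimal, hypothetical sketch (the function and data are illustrative only and are not drawn from the application or any cited reference): a single parameter is updated from gradients of the differences between model outputs and ground-truth values until the outputs fall within a threshold similarity.

```python
def train_linear(xs, ys, lr=0.1, threshold=1e-4, max_iters=1000):
    """Fit y = w * x by gradient descent on the squared error."""
    w = 0.0  # single model parameter (weight)
    for _ in range(max_iters):
        # Differences between model output and ground-truth values.
        diffs = [w * x - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * d * x for d, x in zip(diffs, xs)) / len(xs)
        w -= lr * grad  # update the parameter using the gradient
        # Stop once every output is within the threshold similarity
        # of its ground-truth value (the "threshold accuracy").
        if max(abs(d) for d in diffs) < threshold:
            break
    return w

w = train_linear([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# w converges toward 2.0 (here the ys are exactly 2 * xs)
```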
Regarding claim 19, the following claim elements are abstract ideas: extracting one or more samples from the first data set representing the minority level (This is an abstract idea of a “mental process.” It entails identifying the minority level (e.g., minority class/label) in a data set and selecting one or more records that match it – an action that can be performed by a person reviewing records and marking those with the minority designation using pen and paper. See MPEP 2106.04(a)(2)(III).); generating one or more values representing a distance between a sample of the plurality of samples of the minority level and the one or more samples of the plurality of samples of the minority level (This is an abstract idea of a “mental process.” It entails calculating values that represent a distance between a given sample of the minority level and one or more other samples of the minority level. A person could manually observe the data points, apply a distance formula (e.g., Euclidean distance) with pen and paper, and record the results. These steps of observing, reasoning, and performing mathematical calculations are activities that can practically be carried out in the human mind or with simple tools, and therefore fall within the mental process grouping of abstract ideas.); generating a sample rate for the minority level using the one or more values representing the distance (This is an abstract idea of a “mental process.” It entails deriving a sampling rate for the minority level from previously computed distance values. A person could, with observation and reasoning, review the distances, perform hand calculations (e.g., normalize, threshold, or invert the distances) using pen and paper, and assign a corresponding sample rate.
Such steps – observing, reasoning, and performing arithmetic – can practically be done mentally or with simple tools and therefore fall within the mental process grouping of abstract ideas.); generating a second data set by (i) sampling a second set of samples from a distribution using the sample rate and (ii) combining the second set of samples with the first data set (This is an abstract idea of a “mental process.” It entails (i) using observation and reasoning to apply a sampling rate to a distribution and hand-select a second set of samples – e.g., with pen-and-paper math to compute how many to take (proportions, thresholds) and simple manual randomization (coin tosses, number tables) – and (ii) combining those selected samples with the first data set by listing/appending them. These steps can be carried out in the human mind or with basic tools.); comparing the output with a ground truth value included in a portion of the second data set not provided to the machine learning model (This is an abstract idea of a “mental process.” It entails comparing the model’s output against a known ground-truth value that was withheld from training. A person could, by observation and reasoning, write down the model’s predicted value, look up a corresponding correct answer in the unused portion of the data set, and, with pen-and-paper math, mark whether they match or differ.
Such reviewing, matching, and evaluating steps can be carried out in the human mind or with simple tools, and therefore fall within the mental process grouping of abstract ideas.); adjusting the machine learning model using one or more values indicating the comparison between the output and the ground truth value (This is an abstract idea of a “mental process.” It entails reviewing values that indicate how the output compares to ground truth, and – through observation, reasoning, and pen-and-paper calculations – deciding how to adjust the model (e.g., manually increasing/decreasing a weight, learning rate, or rule). Such interpretation and hand-tuning can be performed in the human mind or with simple tools.); training the machine learning model using one or more training algorithms, including a gradient descent algorithm (This is an abstract idea of a mathematical concept. The limitation involves computing gradients and iteratively minimizing an error function to adjust model parameters. Such operations rely on mathematical relationships and optimization calculations and therefore fall within the mathematical concepts grouping of abstract ideas.); determining one or more gradients based on differences between the model output and corresponding ground truth values (This is an abstract idea of a mental process and mathematical concept. The limitation involves computing differences between values and deriving gradients from those differences, which are mathematical relationships used in optimization. Such evaluation and calculation can be performed in the human mind or with the aid of pen and paper or basic computational tools such as a calculator, and therefore fall within the mathematical concepts and mental process groupings of abstract ideas.); wherein the threshold accuracy is defined by a degree of similarity between the model output and the ground truth values (This is an abstract idea of a mental process.
The limitation involves defining a threshold based on degree of similarity, which requires evaluating and deciding what level of similarity is sufficient. Such judgement can be performed in the human mind or with the aid of pen and paper and therefore falls within the mental process grouping of abstract ideas.). The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception: one or more processors (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05.); machine-readable media (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05.) electrically…by the one or more processors (This is merely an instruction to apply the abstract idea using generic computer components). obtaining a first data set that includes data of multiple attribute levels including a minority level and a majority level (This limitation amounts to adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP 2106.05(g). 
Obtaining a first data set (i.e., mere data gathering in conjunction with an abstract idea) is directed to a well-understood, routine, and conventional activity of data transmission; see MPEP 2106.05(d)(II)(i).); providing a portion of the second data set to a machine learning model (This limitation amounts to adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP 2106.05(g).); obtaining output of the machine learning model representing a prediction using the portion of the second data set (The step of “obtaining output” is merely a generic computer function that amounts to invoking a trained model on input data and reading the result – i.e., storing/retrieving information in memory and transmitting/receiving data – which are well-understood, routine, and conventional activities. See MPEP 2106.05(d)(II).); updating one or more weights or parameters of the machine learning model based on the determined gradients (This limitation recites a generic data processing operation involving modification of stored model values and therefore constitutes a well-understood, routine, and conventional activity. See MPEP 2106.05(d)(II)(iv).); and continuing training iterations until the model output achieves a threshold accuracy (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).). Regarding claim 20, the rejection of claim 19 is incorporated herein. The claim recites similar limitations corresponding to claim 2. Therefore, the same subject matter analysis that was utilized for claim 2, as described above, is equally applicable to claim 20. Therefore, claim 20 is ineligible. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C.
102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Abdi et al. (NPL: “To Combat Multi-Class Imbalanced Problems by Means of Over-Sampling Techniques” (Published: 2016)) in view of Lin et al. (Pub. No.: US 20150356464 A1 (Filed: 2015)) further in view of Chopra et al. (Pub. No.: US 20210256387 A1 (Filed: 2020)). Regarding claim 1, Abdi teaches: A method for training a machine-learning model using bias-reduced training data to improve accuracy of the machine learning model in a computer system, the method comprising (Abdi, Introduction “In recent years, with the accelerated developments in science and technology and availability of data, there is a need for more robust and accurate learning algorithms. Existence of imbalanced distributions among these data is very prevalent. In fact, a data set with unequal number of instances for different classes is called imbalanced data set.
This skewness in the data underlying distribution causes many problems for typical machine learning algorithms…Simply said, the key point of learning is to obtain a classifier which will provide high accuracy for the minority class without severely jeopardizing the accuracy of the majority class…Data level solutions are pre-process tasks which are applied to balance the skewed distributions directly. These solutions which can be used simply, are divided into over-sampling and under-sampling techniques…”) electrically generating, by the one or more processors, a second data set by (i) sampling a second set of samples from a distribution using the sample rate (Abdi, page 242, “We will use equation (4) to generate new synthetic examples for the minority classes.” and “Over-sampling rate parameter O_rate_i, which is the size difference between the number of majority and minority class instances, is computed in step 14” – The new “synthetic examples” are the “second set of samples.” They are sampled from a “distribution” defined by equation (4) (the Mahalanobis distance ellipsoid). The entire process is controlled by the generated O_rate_i (the sample rate), which dictates the quantity of samples created.) and (ii) combining the second set of samples with the first data set (Abdi, page 243 “After choosing suitable examples for over-sampling, they are normalized and mapped to PC space. The new synthetic samples are generated according to equation (4) and transformed to the original space. Finally, they are added by their class mean. In Fig.
4, the resulting samples together with majority and minority class instances are illustrated.” - the “generated samples” are “added” to the dataset and are shown together with the original data (majority and minority class instances), which creates the final augmented “second data set.” The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.) electrically providing, by the one or more processors, a portion of the second data set to a machine learning model (Abdi, page 245, section 4.1 “The performance of our over-sampling method for multiclass imbalanced problems is evaluated using 20 multi-class benchmark data sets… We use three different base classifiers for our experiments… The following classifiers are used: a) C4.5 decision tree… b) RIPPER [14],... c) KNN classifier… As some data sets have very small classes of data, we perform stratified five-fold cross-validation with 10 independent runs to be sure that each fold of test or train contains at least one example of each class.” – experimented on three different types of classifiers using the “over-sampled” datasets. Cross-validation requires splitting the combined data (second data set) into folds, where the training folds (a “portion”) are explicitly provided to the model for the learning process. The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.)
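The stratified cross-validation relied on in the mapping above can be sketched as follows. This is an illustrative, hypothetical splitting routine (a simple round-robin assignment; the cited reference does not disclose this code): each class's indices are spread across the folds so that every fold contains at least one example of each class where possible.

```python
from collections import defaultdict

def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so that, where possible,
    every fold contains at least one example of each class
    (a simple round-robin stratified split)."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)  # spread each class across folds
    return folds

# Imbalanced toy labels: 8 majority, 5 minority samples.
labels = ["maj"] * 8 + ["min"] * 5
folds = stratified_folds(labels, k=5)
# Each fold in turn is withheld as the test portion (holding the
# ground-truth labels); the remaining folds form the training
# portion provided to the model.
```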
electrically obtaining, by the one or more processors, output of the machine learning model representing a prediction using the portion of the second data set (Abdi, section 4.2 “Traditionally, the most widely used metrics of learning algorithms are accuracy and error rate… Precision, recall, and F-measure [45] are the most prevalently used single-class measures for two-class problems. We computed these single-class metrics for the minority class, solely. Also, MAUC and Gmean metrics are used for multi-class evaluation. Precision is a measure of exactness, i.e., from among the examples labelled as positive, how many are actually labelled correctly [31].” – All these performance metrics rely on the machine learning model providing a raw score (for MAUC) or a classification label (for Precision/Recall) for the test portion of the data. The score/label is the model’s “output representing a prediction.” The paper’s entire experimental results section (Tables 2-7) is built on successfully obtaining this prediction output from the classifiers using the balanced data.
The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.); electrically comparing, by the one or more processors, the output with a ground truth value included in a portion of the second data set not provided to the machine learning model (Abdi, page 245, section 4.1 “We use three different base classifiers for our experiments… As some data sets have very small classes of data, we perform stratified five-fold cross-validation with 10 independent runs to be sure that each fold of test or train contains at least one example of each class.” –five-fold cross-validation requires splitting the combined data into folds, where one fold (test portion) is explicitly held out and “not provided to the machine learning model” during training. This test portion contains the necessary ground truth. Section 4.2 “Precision, recall, and F-measure [45]… Precision is a measure of exactness, i.e., from among the examples labelled as positive, how many are actually labelled correctly [31]. On the other hand, recall is a measure of completeness, i.e., how many examples of the positive class are labelled correctly. F-measure incorporates both measures, precision and recall, to express the trade-off between them.” – the calculation of all performance metrics (Recall, Precision, F-measure) requires direct comparison between the model’s prediction (“output”) and the known correct label (“ground truth value”). 
The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.); electrically adjusting, by the one or more processors, the machine learning model using one or more values indicating the comparison between the output and the ground truth value (Abdi, page 245 section 4.1 “We use three different base classifiers for our experiments… The following classifiers are used: a) C4.5 decision tree [43] which uses normalized information gain splitting criterion from information theory [1]; b) RIPPER [14], Repeated Incremental Pruning to Produce Error Reduction, is a well-known rule-based algorithm… c) KNN classifier, which is an instanced based or lazy learning algorithm” and page 248, col. 1, last paragraph “We can infer that over-sampling the minority class instances properly increases the overall performance of a classifiers.” – “Pruning” is a specific mechanism for “adjusting the machine learning model” (removing or simplifying rules). The adjustment is explicitly tied to “Error Reduction” (minimizing the comparison value). The decision tree is built by recursively finding the best split. This best split is determined by a gain/purity metric (e.g., Information Gain), which is a value calculated from the comparison of the predicted classes vs. ground truth labels. The only way to increase performance during training is for the model to adjust its internal parameters in response to measured error (the value indicating the comparison). The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.).
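The metrics invoked in the mappings above (precision, recall, F-measure) reduce to direct comparisons between the model's predictions and the withheld ground-truth labels. The following is a minimal illustrative sketch with hypothetical function names and toy data, not code from any cited reference:

```python
def precision_recall_f1(predicted, ground_truth, positive="min"):
    """Score predictions against withheld ground-truth labels
    for one positive (minority) class."""
    pairs = list(zip(predicted, ground_truth))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1(
    ["min", "maj", "min", "maj"],   # model predictions
    ["min", "min", "min", "maj"])   # withheld ground truth
# p = 1.0 and r = 2/3 for this toy example: both "min" predictions
# are correct, but one true "min" sample was missed.
```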
However, Abdi does not teach, but Lin teaches the limitations: electrically obtaining, by one or more processors in the computer system, a first data set that includes data of multiple attribute levels including a minority level and a majority level (Lin, paragraph [0065] “The process begins when the computer receives an input to balance an imbalanced training data set that includes a majority data class and a minority data class, such as imbalanced training data set 118 that includes majority data class 130 and minority data class 132 in FIG. 1 (step 402).” The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.); electrically extracting, by the one or more processors, one or more samples from the first data set representing the minority level, wherein the minority level includes a plurality of samples (Lin, paragraph [0065] “Afterward, the computer generates a set of positive data samples for the minority data class of the imbalanced training data set (step 404). The computer may draw positive data samples from a data distribution that covers a feasible sample space, such as sample data space 202 in FIG. 2. “ and paragraph [0066] “ The computer uses the expectation maximization algorithm to estimate the Gaussian mixture model and uses the known data samples from the minority data class to constrain the Gaussian mixture model. 
In particular, the computer leverages the known data samples from the minority data class to calculate the appropriate number of Gaussian kernels to be included in the Gaussian mixture model.” and [0030] “ Calculated maximum Mahalanobis distance 124 is a threshold Mahalanobis distance score with regard to generated data samples within sample data space…The Mahalanobis distance identifies a degree of similarity between the generated data samples for minority data class 132 and the recorded data samples associated with majority data class 130.” The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.); electrically generating, by the one or more processors, one or more values representing a distance between a sample of the plurality of samples of the minority level and the one or more samples of the plurality of samples of the minority level (Lin, paragraph [0067] “The computer also calculates a maximum Mahalanobis distance, such as calculated maximum Mahalanobis distance 124 in FIG. 1, for the set of positive data samples (step 408). Further, the computer calculates a Mahalanobis distance from each data sample in the set of positive data samples to a center of each of the number of Gaussian kernels, such as center 212, 214, and 216 in FIG. 2, within the Gaussian mixture model (step 410). Then, the computer determines a minimum Mahalanobis distance from each data sample in the set of positive data samples to the center of a nearest Gaussian kernel in the number of Gaussian kernels (step 412). The minimum Mahalanobis distance may be, for example, calculated minimum Mahalanobis distance 126 in FIG. 1. 
The scoring method may be expressed as, for example…where x is the positive data sample to score; J is the number of Gaussian kernels to be included in the Gaussian mixture model; and μ_j and S_j are the mean vector parameter and the covariance matrix parameter, respectively, for each Gaussian kernel.”); electrically generating, by the one or more processors, a sample rate for the minority level using the one or more values representing the distance (Abdi, page 249, conclusion “In MDO method, the synthetic samples are generated in such a way that they have the same Mahalanobis distance from their corresponding class mean.” and page 242, last paragraph “Over-sampling rate parameter O_rate_i, which is the size difference between the number of majority and minority class instances, is computed in step 14.” – teaches determining an oversampling rate parameter (O_rate_i) that controls the number of synthetic minority samples generated. Abdi further teaches generating synthetic samples constrained by Mahalanobis distance. Lin teaches intra-minority distance measurements. It would have been obvious to use Lin’s distance measurements as a criterion for setting or informing the oversampling rate within Abdi’s framework in order to decide where and how much to oversample, thereby improving bias reduction.
The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.); However, Abdi in view of Lin does not teach, but Abdi in view of Lin further in view of Chopra teaches the following limitations: wherein adjusting the machine learning model comprises: training the machine learning model using one or more training algorithms, including a gradient descent algorithm (Chopra, paragraph [0038] “ In order to train a deep neural network, weights of various neurons illustrated in the example architecture 200 are iteratively adjusted by the training module 110 during training of the untrained model 114, with the goal of minimizing the model's loss function for its specified task. In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm. Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically”); determining one or more gradients based on differences between the output and the ground truth value (Chopra, paragraph [0017] “where the model's parameters are modified following each warm-up iteration according to a task-specific loss determined based on a difference between the model's predicted outputs and the ground truth data.” [0038] “In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm. 
Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically as set forth in Equation 2.” – determines a loss based on differences between predicted outputs and corresponding ground truth data and optimizes that loss using gradient descent, which involves determining gradients of the loss. Accordingly, Chopra teaches determining one or more gradients based on differences between the model output and corresponding ground truth values.); updating one or more weights or parameters of the machine learning model based on the determined gradients (Chopra, paragraph [0038] “In order to train a deep neural network, weights of various neurons illustrated in the example architecture 200 are iteratively adjusted by the training module 110 during training of the untrained model 114, with the goal of minimizing the model's loss function for its specified task. In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm. Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically as set forth in Equation 2.”); and continuing training iterations until the output achieves a threshold accuracy (Chopra, paragraph [0039] “Weights of the untrained model 114 are then iteratively updated by the training module 110 to improve the untrained model 114's performance during training until the model's output(s) from the input data 118 achieves a threshold difference relative to the ground truth data 120. “ – discloses continuing iterative training until a threshold performance condition relative to ground truth is achieved.), wherein the threshold accuracy is defined by a degree of similarity between the output and the ground truth value (Chopra, paragraph [0041] “ FIG. 
3 illustrates an example system 300 useable to generate a trained model 106, which is representative of the untrained model 114 being trained to generate outputs from input data 118 that are within a threshold similarity to ground truth data 120”). Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having Abdi, Lin, and Chopra before them, to use Lin’s intra-minority distance measurements as a criterion for setting the minority sampling rate taught by Abdi within Abdi’s bias-reduction pipeline in order to decide where and how much to oversample, thereby yielding more reliable bias reduction with equal or lower training cost, and to train the resulting machine-learning model using conventional gradient-based optimization techniques as taught by Chopra, including gradient descent with iterative weight updates based on gradients of a loss determined from differences between predicted outputs and corresponding ground-truth values, and continuing training until a threshold performance condition relative to the ground truth is achieved. One would have been motivated to incorporate such gradient-descent training techniques in order to optimize a model trained on the bias-adjusted dataset, as gradient-based optimization was a well-known and routinely used approach for training neural networks and other machine-learning models. The combination would have yielded predictable results in producing a trained model using the bias-reduced dataset. Regarding claim 2, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 1; claim 2 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Abdi further teaches: wherein the second data set includes more samples representing the minority level than the first data set (Abdi, Introduction, col.
2 “In over-sampling, the number of instances in minority classes increases to reach a desired level of balance.” – the number of instances for the minority class increases in the newly created dataset). Regarding claim 3, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 1; claim 3 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Abdi further teaches wherein generating the one or more values representing the distance between the sample of the plurality of the samples of the minority level and the one or more samples of the plurality of samples of the minority level comprises: generating one or more values representing an average of each feature describing each sample of the first data set (Abdi, page 241, section 3 “The proposed over-sampling technique is based on Mahalanobis distance [39]. The idea is that we can generate new synthetic instances which have the same Mahalanobis distance from their corresponding class mean. Fig. 1 indicates a simulated two dimensional data. The circle and ellipse contours represent equal Euclidean and Mahalanobis distance to the centre point of the considered class, respectively.” – see equation (1).); generating one or more values representing a correlation of each feature describing each sample of the first data set (Abdi, page 241, section 3 “The circle and ellipse contours represent equal Euclidean and Mahalanobis distance to the centre point of the considered class, respectively. In calculation of Mahalanobis distance, the correlations of the data are also taken into account and in this way, Mahalanobis distance differs from Euclidean distance.
The Mahalanobis distance of a multivariate vector x = (x_1, x_2, x_3, …, x_d)^T from a group of values with mean μ = (μ_1, μ_2, μ_3, …, μ_d)^T and covariance matrix S is defined as… In this way, the synthetic samples are generated toward the variation of the corresponding class and preserve the covariance structure of the original minority class samples.”). Regarding claim 4, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 3; claim 4 is therefore rejected for the same reasons as those presented for claim 3, mutatis mutandis. Abdi further teaches: wherein the one or more values representing the correlation include a covariance matrix (Abdi, page 241, section 3 “The circle and ellipse contours represent equal Euclidean and Mahalanobis distance to the centre point of the considered class, respectively. In calculation of Mahalanobis distance, the correlations of the data are also taken into account and in this way, Mahalanobis distance differs from Euclidean distance. The Mahalanobis distance of a multivariate vector x = (x_1, x_2, x_3, …, x_d)^T from a group of values with mean μ = (μ_1, μ_2, μ_3, …, μ_d)^T and covariance matrix S is defined as… In this way, the synthetic samples are generated toward the variation of the corresponding class and preserve the covariance structure of the original minority class samples.”).
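For the reader’s convenience, the Mahalanobis distance relied upon in the mappings above can be sketched in a few lines of Python; the sample values and the helper name `mahalanobis` are hypothetical illustrations, not anything taken from Abdi:

```python
import numpy as np

# Hypothetical minority-class samples (rows = samples, columns = features);
# the values are illustrative only and do not come from the cited reference.
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 3.0], [2.5, 2.5]])

mu = X.mean(axis=0)          # per-feature mean (the "class mean")
S = np.cov(X, rowvar=False)  # covariance matrix capturing feature correlations

def mahalanobis(x, mu, S):
    """D_M(x) = sqrt((x - mu)^T S^(-1) (x - mu))."""
    d = x - mu                                    # distance vector
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

d_m = mahalanobis(np.array([2.0, 2.0]), mu, S)    # distance of a query point
```

Here `np.cov(X, rowvar=False)` supplies the covariance matrix S that the claim 4 mapping identifies as the correlation values, and the `(x − μ)^T` term corresponds to the transposition addressed in claim 5.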
Regarding claim 5, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 3; claim 5 is therefore rejected for the same reasons as those presented for claim 3, mutatis mutandis. Abdi further teaches: generating a transposition of the one or more values indicating the distance (Abdi, page 241 section 3 “The circle and ellipse contours represent equal Euclidean and Mahalanobis distance to the centre point of the considered class, respectively…The Mahalanobis distance of a multivariate vector… from a group of values with mean…and covariance matrix S is defined as: D_M(x) = √((x − μ)^T S^(−1) (x − μ))” – The vector term (x − μ) represents the distance value (the difference between the sample and the mean). The formula explicitly requires and generates the term (x − μ)^T, which is the transposition of the distance vector (x − μ), to perform the required matrix multiplication. The Mahalanobis distance cannot be calculated without this transposed value.). Regarding claim 6, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 1; claim 6 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Abdi further teaches wherein generating the sample rate for the minority level using the one or more values representing the distance comprises: combining a portion of the one or more values representing the distance (Abdi, page 241 section 3 “The circle and ellipse contours represent equal Euclidean and Mahalanobis distance to the centre point of the considered class, respectively…The Mahalanobis distance of a multivariate vector… from a group of values with mean…and covariance matrix S is defined as: D_M(x) = √((x − μ)^T S^(−1) (x − μ))” – the Mahalanobis distance value D(x, μ) is the result of combining a portion of the distance vector (x − μ).
Since D(x, μ) is a value representing the distance that governs the distribution for oversampling, its necessary calculation (combination) satisfies this claim’s limitation. Page 242, col. 2, last paragraph “Over-sampling rate parameter Orate_i which is the size difference between the number of majority and minority class instances, is computed in step 14.” – The calculated sample rate (Orate_i) is then applied to the MDO function, which is entirely constrained by the combined Mahalanobis distance value. Thus, this sample rate generation process incorporates the required combination.); modifying the combination of the portion of the one or more values using a reciprocal of a combination of the one or more values representing the distance (Abdi, page 241 section 3 “The circle and ellipse contours represent equal Euclidean and Mahalanobis distance to the centre point of the considered class, respectively…The Mahalanobis distance of a multivariate vector… from a group of values with mean…and covariance matrix S is defined as: D_M(x) = √((x − μ)^T S^(−1) (x − μ))” – The term S^(−1) is the inverse of the covariance matrix. The covariance matrix (S) is a combination of the feature correlations (distance values); its inverse, S^(−1), is the reciprocal of that combination. The term (x − μ)^T (x − μ) is a combination of the distance vectors (Euclidean distance squared). This combination is modified by multiplying it by the reciprocal term S^(−1).
This multiplication is what transforms the simple Euclidean distance into the Mahalanobis distance.); generating a matrix of values including the sample rate for the minority level using the modified combination of the portion of the one or more values (Abdi, page 241 section 3 “The circle and ellipse contours represent equal Euclidean and Mahalanobis distance to the centre point of the considered class, respectively…The Mahalanobis distance of a multivariate vector… from a group of values with mean…and covariance matrix S is defined as: D_M(x) = √((x − μ)^T S^(−1) (x − μ))” and Page 243 “In steps 16 to 18 of the Algorithm 1, synthetic samples which are generated in PC space, are transformed to the original space and they are added by the calculated class mean.” – Each new synthetic sample is a vector of features (matrix of values). The Algorithm 1 generates this matrix and transforms it for use. Page 242, col. 2, last paragraph “Over-sampling rate parameter Orate_i which is the size difference between the number of majority and minority class instances, is computed in step 14.” – the sample rate (Orate_i) directly dictates how many of these matrices (synthetic samples) are generated. The entire generation process for the sample is constrained by the modified combination – the Mahalanobis distance, which is the value that determines the boundaries of where the new sample matrix can be generated.).
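The step-14 rate computation and the distance-constrained generation described above can be illustrated with the following simplified sketch; the counts and data are hypothetical, and the sketch deliberately omits Abdi’s PC-space transformation, so it is an illustration of the general idea rather than the reference’s actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class counts: one majority class and one minority class.
n_maj, n_min = 100, 30
o_rate = n_maj - n_min      # step-14 style rate: Orate_i = n_maj - n_i

# Hypothetical minority-class samples.
X = rng.normal(size=(n_min, 2))
mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))

def synthesize(seed):
    """Random point sharing the seed's Mahalanobis distance from mu."""
    target = np.sqrt((seed - mu) @ S_inv @ (seed - mu))
    v = rng.normal(size=seed.shape)               # random direction
    return mu + (target / np.sqrt(v @ S_inv @ v)) * v

new = [synthesize(X[i % n_min]) for i in range(o_rate)]
```

Each synthetic point is placed at the same Mahalanobis distance from the class mean as its seed sample, which is the constraint the mapping relies on, and the number of generated points is dictated by the computed rate.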
Regarding claim 7, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 1; claim 7 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Abdi further teaches: generating a second sample rate for a second minority level of the first data set or the majority level using the one or more values representing the distance, wherein the second sample rate is less than the sample rate for the minority level (Abdi, page 242, “Over-sampling rate parameter Orate_i which is the size difference between the number of majority and minority class instances, is computed in step 14.” At step 14: “Orate_i = n_maj − n_i.” – n_maj (majority count) is the fixed count of the largest class in the dataset. n_i (minority class count) is, in this paper’s multi-class context, the count of the specific minority class (i) currently addressed by the algorithm. The MDO algorithm is designated for multi-class imbalanced problems. It must calculate a unique Orate_i for every minority class. Calculating Orate_2 for the second minority class (where i = 2) satisfies “generating a second sample rate for a second minority level.” The smaller the minority class count (n_i), the larger the required Orate_i. Therefore, if minority class 2 has more samples than minority class 1 (meaning n_2 > n_1), the resulting second sample rate will be mathematically less than the first sample rate: Orate_2 < Orate_1.).
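The relationship between the two per-class rates follows from simple arithmetic, as the hypothetical counts below illustrate:

```python
# Hypothetical class counts: one majority class and two minority classes,
# where minority class 2 has more samples than minority class 1.
n_maj, n_1, n_2 = 100, 20, 45

o_rate_1 = n_maj - n_1   # rate for minority class 1 (Orate_1 = n_maj - n_1)
o_rate_2 = n_maj - n_2   # rate for minority class 2 (Orate_2 = n_maj - n_2)

# A larger minority count yields a smaller oversampling rate.
assert n_2 > n_1
assert o_rate_2 < o_rate_1
```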
Regarding claim 8, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 1; claim 8 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Abdi further teaches: wherein the minority level represents one or more attributes with a likelihood of bias higher than one or more attributes represented by the majority level in the first data set (Abdi, Introduction “IN recent years, with the accelerated developments in science and technology and availability of data, there is a need for more robust and accurate learning algorithms. Existence of imbalanced distributions among these data is very prevalent. In fact, a data set with unequal number of instances for different classes is called imbalanced data set. This skewness in the data underlying distribution causes many problems for typical machine learning algorithms. In particular, correctly classifying the minority class instances is a main issue in processing these data sets.” – The term “skewness” is the direct technical equivalent of “bias” in a data distribution. The problem exists because the minority class’s attributes are severely underrepresented, creating a higher likelihood of data bias that favors the majority class’s attributes.) and wherein a bias of the one or more attributes of the minority level in the second data set is less than the one or more attributes of the minority level in the first data set (Abdi, Introduction, col. 2 “In over-sampling, the number of instances in minority classes increases to reach a desired level of balance.” – The act of achieving a “desired level of balance” in the “second data set” means the distribution is no longer severely skewed. A balanced (less skewed) distribution is, by definition, one where the data bias against the minority class is less than it was in the original, imbalanced data set (“first data set”).).
Regarding claim 9, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 8; claim 9 is therefore rejected for the same reasons as those presented for claim 8, mutatis mutandis. Abdi further teaches: wherein a higher bias indicates a greater likelihood of inaccurate results from the machine learning model (Abdi, Introduction “This skewness in the data underlying distribution causes many problems for typical machine learning algorithms. In particular, correctly classifying the minority class instances is a main issue in processing these data sets. Simply said, the key point of learning is to obtain a classifier which will provide high accuracy for the minority class without severely jeopardizing the accuracy of the majority class [31].” – the “skewness” (bias) is the cause and the “problems for typical machine learning algorithms” (poor performance, high error) are the effect.). Regarding claim 10, Abdi teaches: electrically generating, by the one or more processors, a second data set by (i) sampling a second set of samples from a distribution using the sample rate (Abdi, page 242, “We will use equation (4) to generate new synthetic examples for the minority classes.” and “Over-sampling rate parameter Orate_i which is the size difference between the number of majority and minority class instances, is computed in step 14” – The new “synthetic examples” are the “second set of samples.” They are sampled from a “distribution” defined by equation (4) (the Mahalanobis distance ellipsoid). The entire process is controlled by the generated Orate_i (the sample rate), which dictates the quantity of samples created.) and (ii) combining the second set of samples with the first data set (Abdi, page 243 “After choosing suitable examples for over-sampling, they are normalized and mapped to PC space. The new synthetic samples are generated according to equation (4) and transformed to the original space.
Finally, they are added by their class mean. In Fig. 4, the resulting samples together with majority and minority class instances are illustrated.” - the “generated samples” are “added” to the dataset and are shown together with the original data (majority and minority class instances), which creates the final augmented “second data set.”); electrically providing, by the one or more processors, a portion of the second data set to a machine learning model (Abdi, page 245, section 4.1 “The performance of our over-sampling method for multiclass imbalanced problems is evaluated using 20 multi-class benchmark data sets… We use three different base classifiers for our experiments… The following classifiers are used: a) C4.5 decision tree… b) RIPPER [14],... c) KNN classifier… As some data sets have very small classes of data, we perform stratified five-fold cross-validation with 10 independent runs to be sure that each fold of test or train contains at least one example of each class.” – experimented on three different types of classifiers using the “over-sampled” datasets. Cross-validation requires splitting the combined data (second data set) into folds, where the training folds (a “portion”) are explicitly provided to the model for the learning process.); electrically obtaining, by the one or more processors, output of the machine learning model representing a prediction using the portion of the second data set (Abdi, section 4.2 “Traditionally, the most widely used metrics of learning algorithms are accuracy and error rate… Precision, recall, and F-measure [45] are the most prevalently used single-class measures for two-class problems. We computed these single-class metrics for the minority class, solely. Also, MAUC and Gmean metrics are used for multi-class evaluation.
Precision is a measure of exactness, i.e., from among the examples labelled as positive, how many are actually labelled correctly [31].” – All these performance metrics rely on the machine learning model providing a raw score (for MAUC) or a classification label (for Precision/Recall) for the test portion of the data. The score/label is the model’s “output representing a prediction.” The paper’s entire experimental results section (Tables 2-7) is built on successfully obtaining this prediction output from the classifiers using the balanced data.); electrically comparing, by the one or more processors, the output with a ground truth value included in a portion of the second data set not provided to the machine learning model (Abdi, page 245, section 4.1 “We use three different base classifiers for our experiments… As some data sets have very small classes of data, we perform stratified five-fold cross-validation with 10 independent runs to be sure that each fold of test or train contains at least one example of each class.” – five-fold cross-validation requires splitting the combined data into folds, where one fold (test portion) is explicitly held out and “not provided to the machine learning model” during training. This test portion contains the necessary ground truth. Section 4.2 “Precision, recall, and F-measure [45]… Precision is a measure of exactness, i.e., from among the examples labelled as positive, how many are actually labelled correctly [31]. On the other hand, recall is a measure of completeness, i.e., how many examples of the positive class are labelled correctly.
F-measure incorporates both measures, precision and recall, to express the trade-off between them.” – the calculation of all performance metrics (Recall, Precision, F-measure) requires direct comparison between the model’s prediction (“output”) and the known correct label (“ground truth value”).); electrically adjusting, by the one or more processors, the machine learning model using one or more values indicating the comparison between the output and the ground truth value (Abdi, page 245 section 4.1 “We use three different base classifiers for our experiments.. The following classifiers are used: a) C4.5 decision tree [43] which uses normalized information gain splitting criterion from information theory [1]; b) RIPPER [14], Repeated Incremental Pruning to Produce Error Reduction, is a well-known rule-based algorithm… c) KNN classifier, which is an instanced based or lazy learning algorithm” and page 248, col. 1, last paragraph “We can infer that over-sampling the minority class instances properly increases the overall performance of a classifiers.” – “Pruning” is a specific mechanism for “adjusting the machine learning model” (removing or simplifying rules). The adjustment is explicitly tied to “Error Reduction” (minimizing the comparison value). The decision tree is built recursively finding the best split. This best split is determined by a gain/purity metric (e.g., Information Gain), which is a value calculated from the comparison of the predicted classes vs. ground truth labels. The only way to increase performance during training is for the model to adjust its internal parameters in response to measured error (the value indicating the comparison).). However, Abdi does not teach, but Lin teaches the limitations: A non-transitory computer-readable medium storing one or more instructions executable by a computer system to perform operations comprising (Lin, paragraph [0014] “ Any combination of one or more computer readable medium(s) may be utilized. 
The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing…a computer readable storage medium may be any tangible medium that can store a program for use by or in connection with an instruction execution system, apparatus, or device.”); one or more processors; a memory, wherein the memory is electrically coupled with the one or more processors (Lin, paragraph [0022] “data processing system 100 includes communications fabric 102, which provides communications between processor unit 104, memory 106, persistent storage 108, communications unit 110, input/output (I/O) unit 112, and display 114.”); electrically obtaining, by the one or more processors, a first data set that includes data of multiple attribute levels including a minority level and a majority level (Lin, paragraph [0065] “The process begins when the computer receives an input to balance an imbalanced training data set that includes a majority data class and a minority data class, such as imbalanced training data set 118 that includes majority data class 130 and minority data class 132 in FIG. 1 (step 402).”); electrically extracting, by the one or more processors, one or more samples from the first data set representing the minority level (Lin, paragraph [0065] “Afterward, the computer generates a set of positive data samples for the minority data class of the imbalanced training data set (step 404). The computer may draw positive data samples from a data distribution that covers a feasible sample space, such as sample data space 202 in FIG. 2.
“ and paragraph [0066] “ The computer uses the expectation maximization algorithm to estimate the Gaussian mixture model and uses the known data samples from the minority data class to constrain the Gaussian mixture model. In particular, the computer leverages the known data samples from the minority data class to calculate the appropriate number of Gaussian kernels to be included in the Gaussian mixture model.” and [0030] “ Calculated maximum Mahalanobis distance 124 is a threshold Mahalanobis distance score with regard to generated data samples within sample data space…The Mahalanobis distance identifies a degree of similarity between the generated data samples for minority data class 132 and the recorded data samples associated with majority data class 130.”); electrically generating, by the one or more processors, one or more values representing a distance between a sample of the minority level and the one or more samples of the minority level (Lin, paragraph [0067] “The computer also calculates a maximum Mahalanobis distance, such as calculated maximum Mahalanobis distance 124 in FIG. 1, for the set of positive data samples (step 408). Further, the computer calculates a Mahalanobis distance from each data sample in the set of positive data samples to a center of each of the number of Gaussian kernels, such as center 212, 214, and 216 in FIG. 2, within the Gaussian mixture model (step 410). Then, the computer determines a minimum Mahalanobis distance from each data sample in the set of positive data samples to the center of a nearest Gaussian kernel in the number of Gaussian kernels (step 412). The minimum Mahalanobis distance may be, for example, calculated minimum Mahalanobis distance 126 in FIG. 1. 
The scoring method may be expressed as, for example…where x is the positive data sample to score; J is the number of Gaussian kernels to be included in the Gaussian mixture model; and μ_j and S_j are the mean vector parameter and the covariance matrix parameter, respectively, for each Gaussian kernel.”); electrically generating, by the one or more processors, a sample rate for the minority level using the one or more values representing the distance (Abdi, page 249, conclusion “In MDO method, the synthetic samples are generated in such a way that they have the same Mahalanobis distance from their corresponding class mean.” and page 242, last paragraph “Over-sampling rate parameter Orate_i which is the size difference between the number of majority and minority class instances, is computed in step 14.” – teaches determining an oversampling rate parameter (Orate) that controls the number of synthetic minority samples generated. Abdi further teaches generating synthetic samples constrained by Mahalanobis distance. Lin teaches intra-minority distance measurements. It would have been obvious to use Lin’s distance measurements as a criterion for setting or informing the oversampling rate within Abdi’s framework in order to decide where and how much to oversample, thereby improving bias reduction.
The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.); However, Abdi in view of Lin does not teach, but Abdi in view of Lin further in view of Chopra teaches the following limitations: wherein adjusting the machine learning model comprises: training the machine learning model using one or more training algorithms, including a gradient descent algorithm (Chopra, paragraph [0038] “In order to train a deep neural network, weights of various neurons illustrated in the example architecture 200 are iteratively adjusted by the training module 110 during training of the untrained model 114, with the goal of minimizing the model's loss function for its specified task. In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm. Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically”); determining one or more gradients based on differences between the output and the ground truth value (Chopra, paragraph [0017] “where the model's parameters are modified following each warm-up iteration according to a task-specific loss determined based on a difference between the model's predicted outputs and the ground truth data.” [0038] “In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm.
Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically as set forth in Equation 2.” – determines a loss based on differences between predicted outputs and corresponding ground truth data and optimizes that loss using gradient descent, which involves determining gradients of the loss. Accordingly, Chopra teaches determining one or more gradients based on differences between the model output and corresponding ground truth values.); updating one or more weights or parameters of the machine learning model based on the determined gradients (Chopra, paragraph [0038] “In order to train a deep neural network, weights of various neurons illustrated in the example architecture 200 are iteratively adjusted by the training module 110 during training of the untrained model 114, with the goal of minimizing the model's loss function for its specified task. In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm. Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically as set forth in Equation 2.”); and continuing training iterations until the output achieves a threshold accuracy (Chopra, paragraph [0039] “Weights of the untrained model 114 are then iteratively updated by the training module 110 to improve the untrained model 114's performance during training until the model's output(s) from the input data 118 achieves a threshold difference relative to the ground truth data 120. “ – discloses continuing iterative training until a threshold performance condition relative to ground truth is achieved.), wherein the threshold accuracy is defined by a degree of similarity between the output and the ground truth value (Chopra, paragraph [0041] “ FIG. 
3 illustrates an example system 300 useable to generate a trained model 106, which is representative of the untrained model 114 being trained to generate outputs from input data 118 that are within a threshold similarity to ground truth data 120”). Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having Abdi, Lin, and Chopra before them, to use Lin’s intra-minority distance measurements as a criterion for setting the minority sampling rate taught by Abdi within Abdi’s bias-reduction pipeline in order to decide where and how much to oversample, thereby yielding more reliable bias reduction with equal or lower training cost, and to train the resulting machine-learning model using conventional gradient-based optimization techniques as taught by Chopra, including gradient descent with iterative weight updates based on gradients of a loss determined from differences between predicted outputs and corresponding ground-truth values, and continuing training until a threshold performance condition relative to the ground truth is achieved. One would have been motivated to incorporate such gradient-descent training techniques in order to optimize a model trained on the bias-adjusted dataset, as gradient-based optimization was a well-known and routinely used approach for training neural networks and other machine-learning models. The combination would have yielded predictable results in producing a trained model using the bias-reduced dataset. Regarding claim 11, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 10; claim 11 is therefore rejected for the same reasons as those presented for claim 10. The claim recites similar limitations corresponding to claim 2 and is rejected for similar reasons as claim 2 using similar teachings and rationale.
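As background for the gradient-descent limitations mapped above, a training loop of the kind described (a loss based on differences between output and ground truth, gradient computation, weight updates, and a stopping threshold) can be sketched generically; the linear model, learning rate, and threshold below are hypothetical, and the sketch does not reproduce the Equation 2 referenced in Chopra:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear model y = X @ w, trained by plain gradient descent.
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                        # ground-truth values

w = np.zeros(3)
lr, threshold = 0.05, 1e-6
for _ in range(5000):
    diff = X @ w - y                  # difference between output and ground truth
    loss = float(np.mean(diff ** 2))
    if loss < threshold:              # threshold accuracy reached: stop training
        break
    grad = 2.0 * X.T @ diff / len(y)  # gradient of the loss w.r.t. the weights
    w -= lr * grad                    # weight update (gradient descent step)
```

The break condition plays the role of the claimed threshold accuracy: iterations continue until the output is within a set similarity to the ground truth.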
Regarding claim 12, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 10; claim 12 is therefore rejected for the same reasons as those presented for claim 10. The claim recites similar limitations corresponding to claim 3 and is rejected for similar reasons as claim 3 using similar teachings and rationale. Regarding claim 13, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 12; claim 13 is therefore rejected for the same reasons as those presented for claim 12. The claim recites similar limitations corresponding to claim 4 and is rejected for similar reasons as claim 4 using similar teachings and rationale. Regarding claim 14, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 12; claim 14 is therefore rejected for the same reasons as those presented for claim 12. The claim recites similar limitations corresponding to claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale. Regarding claim 15, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 10; claim 15 is therefore rejected for the same reasons as those presented for claim 10. The claim recites similar limitations corresponding to claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale. Regarding claim 16, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 10; claim 16 is therefore rejected for the same reasons as those presented for claim 10. The claim recites similar limitations corresponding to claim 7 and is rejected for similar reasons as claim 7 using similar teachings and rationale.
Regarding claim 17, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 10; claim 17 is therefore rejected for the same reasons as those presented for claim 10. The claim recites similar limitations corresponding to claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale. Regarding claim 18, Abdi in view of Lin further in view of Chopra, as outlined above, teaches all the elements of claim 17; claim 18 is therefore rejected for the same reasons as those presented for claim 17. The claim recites similar limitations corresponding to claim 9 and is rejected for similar reasons as claim 9 using similar teachings and rationale. Regarding claim 19, Abdi teaches: electrically generating, by the one or more processors, a second data set by (i) sampling a second set of samples from a distribution using the sample rate (Abdi, page 242, “We will use equation (4) to generate new synthetic examples for the minority classes.” and “Over-sampling rate parameter Orate_i which is the size difference between the number of majority and minority class instances, is computed in step 14” – The new “synthetic examples” are the “second set of samples.” They are sampled from a “distribution” defined by equation (4) (the Mahalanobis distance ellipsoid). The entire process is controlled by the generated Orate_i (the sample rate), which dictates the quantity of samples created.) and (ii) combining the second set of samples with the first data set (Abdi, page 243 “After choosing suitable examples for over-sampling, they are normalized and mapped to PC space. The new synthetic samples are generated according to equation (4) and transformed to the original space. Finally, they are added by their class mean. In Fig.
4, the resulting samples together with majority and minority class instances are illustrated.” – the generated samples are “added” to the dataset and are shown together with the original data (majority and minority class instances), creating the final augmented “second data set.”);

electrically providing, by the one or more processors, a portion of the second data set to a machine learning model (Abdi, page 245, section 4.1, “The performance of our over-sampling method for multiclass imbalanced problems is evaluated using 20 multi-class benchmark data sets… We use three different base classifiers for our experiments… The following classifiers are used: a) C4.5 decision tree… b) RIPPER [14],... c) KNN classifier… As some data sets have very small classes of data, we perform stratified five-fold cross-validation with 10 independent runs to be sure that each fold of test or train contains at least one example of each class.” – Abdi experiments on three different types of classifiers using the “over-sampled” datasets. Cross-validation requires splitting the combined data (the second data set) into folds, where the training folds (a “portion”) are explicitly provided to the model for the learning process.);

electrically obtaining, by the one or more processors, output of the machine learning model representing a prediction using the portion of the second data set (Abdi, section 4.2, “Traditionally, the most widely used metrics of learning algorithms are accuracy and error rate… Precision, recall, and F-measure [45] are the most prevalently used single-class measures for two-class problems. We computed these single-class metrics for the minority class, solely. Also, MAUC and Gmean metrics are used for multi-class evaluation.
Precision is a measure of exactness, i.e., from among the examples labelled as positive, how many are actually labelled correctly [31].” – all of these performance metrics rely on the machine learning model providing a raw score (for MAUC) or a classification label (for precision/recall) for the test portion of the data. The score or label is the model’s “output representing a prediction.” The paper’s entire experimental-results section (Tables 2-7) is built on successfully obtaining this prediction output from the classifiers using the balanced data.);

electrically comparing, by the one or more processors, the output with a ground truth value included in a portion of the second data set not provided to the machine learning model (Abdi, page 245, section 4.1, “We use three different base classifiers for our experiments… As some data sets have very small classes of data, we perform stratified five-fold cross-validation with 10 independent runs to be sure that each fold of test or train contains at least one example of each class.” – five-fold cross-validation requires splitting the combined data into folds, where one fold (the test portion) is explicitly held out and “not provided to the machine learning model” during training. This test portion contains the necessary ground truth. Section 4.2: “Precision, recall, and F-measure [45]… Precision is a measure of exactness, i.e., from among the examples labelled as positive, how many are actually labelled correctly [31]. On the other hand, recall is a measure of completeness, i.e., how many examples of the positive class are labelled correctly.
F-measure incorporates both measures, precision and recall, to express the trade-off between them.” – the calculation of all performance metrics (recall, precision, F-measure) requires a direct comparison between the model’s prediction (the “output”) and the known correct label (the “ground truth value”).);

electrically adjusting, by the one or more processors, the machine learning model using one or more values indicating the comparison between the output and the ground truth value (Abdi, page 245, section 4.1, “We use three different base classifiers for our experiments… The following classifiers are used: a) C4.5 decision tree [43] which uses normalized information gain splitting criterion from information theory [1]; b) RIPPER [14], Repeated Incremental Pruning to Produce Error Reduction, is a well-known rule-based algorithm… c) KNN classifier, which is an instanced based or lazy learning algorithm” and page 248, col. 1, last paragraph, “We can infer that over-sampling the minority class instances properly increases the overall performance of a classifiers.” – “pruning” is a specific mechanism for “adjusting the machine learning model” (removing or simplifying rules), and the adjustment is explicitly tied to “Error Reduction” (minimizing the comparison value). The decision tree is built by recursively finding the best split, which is determined by a gain/purity metric (e.g., information gain), a value calculated from the comparison of the predicted classes against the ground-truth labels. The only way to increase performance during training is for the model to adjust its internal parameters in response to measured error (the value indicating the comparison).).
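The evaluation flow mapped above (provide a training portion to a model, obtain its predictions, and compare them with held-out ground-truth labels to compute precision and recall) can be sketched as follows. This is a minimal illustration only: the tiny 1-nearest-neighbour classifier and the toy data are hypothetical, not the classifiers or datasets of the cited Abdi reference.

```python
# Sketch of the mapped flow: train/test split, prediction, and comparison
# of model output against ground-truth labels. The 1-NN classifier and
# data below are illustrative assumptions, not taken from Abdi or Lin.

def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the label of x as the label of its closest training point."""
    distances = [(abs(tx - x), y) for tx, y in zip(train_x, train_y)]
    return min(distances)[1]

def precision_recall(y_true, y_pred, positive=1):
    """Compare predictions with ground truth to compute precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# The "portion provided to the model" (training fold) ...
train_x, train_y = [0.0, 0.2, 1.0, 1.2], [0, 0, 1, 1]
# ... and the held-out test fold containing the ground truth.
test_x, test_y = [0.1, 1.1], [0, 1]

predictions = [nearest_neighbor_predict(train_x, train_y, x) for x in test_x]
p, r = precision_recall(test_y, predictions)
print(predictions, p, r)  # prints: [0, 1] 1.0 1.0
```

In a full stratified k-fold setup of the kind Abdi describes, this split-predict-compare cycle would simply repeat once per fold, with the metrics averaged across folds.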
However, Abdi does not teach the following limitations, which Lin teaches:

A system, comprising: one or more processors; and machine-readable media interoperably coupled with the one or more processors and storing one or more instructions that, when executed by the one or more processors, perform operations comprising (Lin, paragraph [0014], “A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing…any tangible medium that can store a program for use by or in connection with an instruction execution system, apparatus, or device.” and paragraph [0023], “Processor unit 104 serves to execute instructions for software applications or programs that may be loaded into memory 106. Processor unit 104 may be a set of one or more processors or may be a multi-processor core, depending on the particular implementation.”):

electrically obtaining, by the one or more processors, a first data set that includes data of multiple attribute levels including a minority level and a majority level (Lin, paragraph [0065], “The process begins when the computer receives an input to balance an imbalanced training data set that includes a majority data class and a minority data class, such as imbalanced training data set 118 that includes majority data class 130 and minority data class 132 in FIG.
1 (step 402).”);

electrically extracting, by the one or more processors, one or more samples from the first data set representing the minority level (Lin, paragraph [0065], “Afterward, the computer generates a set of positive data samples for the minority data class of the imbalanced training data set (step 404). The computer may draw positive data samples from a data distribution that covers a feasible sample space, such as sample data space 202 in FIG. 2.” and paragraph [0066], “The computer uses the expectation maximization algorithm to estimate the Gaussian mixture model and uses the known data samples from the minority data class to constrain the Gaussian mixture model. In particular, the computer leverages the known data samples from the minority data class to calculate the appropriate number of Gaussian kernels to be included in the Gaussian mixture model.” and [0030], “Calculated maximum Mahalanobis distance 124 is a threshold Mahalanobis distance score with regard to generated data samples within sample data space…The Mahalanobis distance identifies a degree of similarity between the generated data samples for minority data class 132 and the recorded data samples associated with majority data class 130.”);

electrically generating, by the one or more processors, one or more values representing a distance between a sample of the minority level and the one or more samples of the minority level (Lin, paragraph [0067], “The computer also calculates a maximum Mahalanobis distance, such as calculated maximum Mahalanobis distance 124 in FIG. 1, for the set of positive data samples (step 408). Further, the computer calculates a Mahalanobis distance from each data sample in the set of positive data samples to a center of each of the number of Gaussian kernels, such as center 212, 214, and 216 in FIG. 2, within the Gaussian mixture model (step 410).
Then, the computer determines a minimum Mahalanobis distance from each data sample in the set of positive data samples to the center of a nearest Gaussian kernel in the number of Gaussian kernels (step 412). The minimum Mahalanobis distance may be, for example, calculated minimum Mahalanobis distance 126 in FIG. 1. The scoring method may be expressed as, for example…where x is the positive data sample to score; J is the number of Gaussian kernels to be included in the Gaussian mixture model; and μ_j and S_j are the mean vector parameter and the covariance matrix parameter, respectively, for each Gaussian kernel.”);

electrically generating, by the one or more processors, a sample rate for the minority level using the one or more values representing the distance (Abdi, page 249, conclusion, “In MDO method, the synthetic samples are generated in such a way that they have the same Mahalanobis distance from their corresponding class mean.” and page 242, last paragraph, “Over-sampling rate parameter Orate_i, which is the size difference between the number of majority and minority class instances, is computed in step 14.” – Abdi teaches determining an oversampling rate parameter (Orate_i) that controls the number of synthetic minority samples generated, and further teaches generating synthetic samples constrained by Mahalanobis distance. Lin teaches intra-minority distance measurements. It would have been obvious to use Lin’s distance measurements as a criterion for setting or informing the oversampling rate within Abdi’s framework in order to decide where and how much to oversample, thereby improving bias reduction.
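The sample-rate mechanism discussed in this mapping (an oversampling rate equal to the majority/minority size difference, with synthetic minority samples generated at a fixed Mahalanobis distance from the class mean) might be sketched as follows. This is an illustrative simplification under assumed two-dimensional Gaussian data; it is not the actual MDO algorithm of Abdi or the Gaussian-mixture scoring method of Lin, and the names and data are hypothetical.

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance of x from a class mean."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(0)
majority = rng.normal(0.0, 1.0, size=(50, 2))   # 50 majority samples (toy data)
minority = rng.normal(3.0, 0.5, size=(10, 2))   # 10 minority samples (toy data)

# Oversampling rate as described in the cited mapping: the size
# difference between the majority and minority classes.
o_rate = len(majority) - len(minority)          # 40 synthetic samples needed

mean = minority.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(minority, rowvar=False))

# Generate synthetic minority samples that keep the same Mahalanobis
# distance from the class mean as an existing seed sample: scale a
# random direction so its Mahalanobis norm matches the seed's.
synthetic = []
for i in range(o_rate):
    seed = minority[i % len(minority)]
    target = mahalanobis(seed, mean, cov_inv)
    direction = rng.normal(size=2)
    scale = target / mahalanobis(mean + direction, mean, cov_inv)
    synthetic.append(mean + scale * direction)
synthetic = np.array(synthetic)

# Combining original and synthetic samples yields the balanced data set.
balanced = np.vstack([majority, minority, synthetic])
print(balanced.shape)  # prints: (100, 2)
```

A distance-informed rate of the kind the combination proposes would simply replace the fixed `o_rate` above with a value derived from the per-sample distance scores.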
The recitation that the operations are performed electrically by one or more processors merely reflects generic computer implementation of the disclosed functionality and is inherent in the operation of the computing systems taught by the cited references.)

However, Abdi in view of Lin does not teach the following limitations, which Chopra teaches:

wherein adjusting the machine learning model comprises: training the machine learning model using one or more training algorithms, including a gradient descent algorithm (Chopra, paragraph [0038], “In order to train a deep neural network, weights of various neurons illustrated in the example architecture 200 are iteratively adjusted by the training module 110 during training of the untrained model 114, with the goal of minimizing the model's loss function for its specified task. In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm. Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically”);

determining one or more gradients based on differences between the output and the ground truth value (Chopra, paragraph [0017], “where the model's parameters are modified following each warm-up iteration according to a task-specific loss determined based on a difference between the model's predicted outputs and the ground truth data.” and paragraph [0038], “In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm.
Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically as set forth in Equation 2.” – Chopra determines a loss based on differences between predicted outputs and corresponding ground truth data and optimizes that loss using gradient descent, which involves determining gradients of the loss. Accordingly, Chopra teaches determining one or more gradients based on differences between the model output and corresponding ground truth values.);

updating one or more weights or parameters of the machine learning model based on the determined gradients (Chopra, paragraph [0038], “In order to train a deep neural network, weights of various neurons illustrated in the example architecture 200 are iteratively adjusted by the training module 110 during training of the untrained model 114, with the goal of minimizing the model's loss function for its specified task. In some implementations, the retrospective loss system 104 is configured to optimize a deep neural network's loss function using a gradient descent algorithm. Mathematically, updating weights during each iteration of a gradient descent algorithm can be represented mathematically as set forth in Equation 2.”); and

continuing training iterations until the output achieves a threshold accuracy (Chopra, paragraph [0039], “Weights of the untrained model 114 are then iteratively updated by the training module 110 to improve the untrained model 114's performance during training until the model's output(s) from the input data 118 achieves a threshold difference relative to the ground truth data 120.” – Chopra discloses continuing iterative training until a threshold performance condition relative to ground truth is achieved.), wherein the threshold accuracy is defined by a degree of similarity between the output and the ground truth value (Chopra, paragraph [0041], “FIG.
3 illustrates an example system 300 useable to generate a trained model 106, which is representative of the untrained model 114 being trained to generate outputs from input data 118 that are within a threshold similarity to ground truth data 120”).

Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, having Abdi, Lin, and Chopra before them, to use Lin’s intra-minority distance measurements as a criterion for setting the minority sampling rate taught by Abdi within Abdi’s bias-reduction pipeline in order to decide where and how much to oversample, thereby yielding more reliable bias reduction at equal or lower training cost, and to train the resulting machine-learning model using conventional gradient-based optimization techniques as taught by Chopra, including gradient descent with iterative weight updates based on gradients of a loss determined from differences between predicted outputs and corresponding ground-truth values, continuing training until a threshold performance condition relative to the ground truth is achieved. One would have been motivated to incorporate such gradient-descent training techniques in order to optimize a model trained on the bias-adjusted dataset, as gradient-based optimization was a well-known and routinely used approach for training neural networks and other machine-learning models. The combination would have yielded predictable results in producing a trained model from the bias-reduced dataset.

Regarding claim 20, Abdi in view of Lin, and further in view of Chopra, as outlined above, teaches all the elements of claim 19; claim 20 is therefore rejected for the same reasons as those presented for claim 19. The claim further recites limitations similar to those of claim 2 and is rejected for similar reasons using similar teachings and rationale.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.
Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daravanh Phakousonh, whose telephone number is (571) 272-6324. The examiner can normally be reached Mon - Thurs 7 AM - 5 PM, and every other Friday 7 AM - 4 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Daravanh Phakousonh/
Examiner, Art Unit 2121

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121

Prosecution Timeline

- Jan 18, 2023: Application Filed
- Oct 02, 2025: Non-Final Rejection (§101, §103)
- Dec 08, 2025: Response Filed
- Mar 03, 2026: Final Rejection (§101, §103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12572821: ACCURACY PRIOR AND DIVERSITY PRIOR BASED FUTURE PREDICTION
Granted Mar 10, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

- Expected OA Rounds: 3-4
- Grant Probability: 50%
- With Interview: 99% (+100.0%)
- Median Time to Grant: 4y 0m
- PTA Risk: Moderate

Based on 2 resolved cases by this examiner. Grant probability derived from career allow rate.
