Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 1/29/2026 has been entered.
Response to Arguments
The 35 U.S.C. 112 rejections are withdrawn in view of the amendments.
The Remarks filed 1/29/2026 are unpersuasive with respect to the 35 U.S.C. 103 rejections.
Applicant argues that Jiang doesn’t teach preprocessing the data before training. Remarks 12. Jiang shows that this reweighting to adjust for bias happens before training on the data; see the Fig. 1 description: “Our main contribution is providing a procedure that appropriately weights examples in the dataset, and then showing that training on the resulting loss corresponds to training on the original, true, unbiased labels.” (emphasis added).
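For illustration of this point only, the following is a minimal Python sketch of pre-training reweighting of the kind Jiang describes; the function name, data, and weighting rule are hypothetical and are not drawn from Jiang:

```python
# Illustrative only; not code from Jiang. The point being illustrated:
# weights are attached to the dataset as a pre-processing step, before
# any training occurs (cf. Jiang Fig. 1).

def compute_weights(examples):
    """Assign a loss weight to each (group, label) pair (placeholder rule)."""
    # Hypothetical rule: upweight positively labeled examples of group "G".
    return [1.5 if (g == "G" and y == 1) else 1.0 for (g, y) in examples]

examples = [("G", 1), ("G", 0), ("H", 1), ("H", 0)]
weights = compute_weights(examples)  # step 1: pre-processing (reweighting)
# step 2 would be training on the reweighted loss; weighting precedes training
print(list(zip(examples, weights)))
```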
Applicant argues Frtunikj does not teach determining a count of a feature-label combination relating a sensitive feature to a label. Remarks 13-14. Sensitive features are data that are misrepresented (biased) in the dataset. Frtunikj teaches how to count misrepresentations, and Jiang teaches how to adjust for misrepresentations of sensitive features; Jiang refers to a sensitive feature as a protected class.
Applicant argues that Frtunikj uses “its counts for a fundamentally different purpose… Frtunikj counts object detection classes to identify underrepresented classes in the training dataset, then uses those count to assign importance scores…” Remarks 14. The rejection relies on the combination: Frtunikj teaches the counting used to determine bias, and Jiang teaches adjusting for that bias.
Applicant argues that the references aren’t directed to generating training data. Jiang regenerates the training set (Fig. 1), and Frtunikj generates metadata that is used to validate training data (Fig. 2). That metadata becomes part of any training data that is subsequently used for training in Frtunikj.
Applicant argues that the motivation to combine does not relate to Jiang’s fairness context. The motivation to combine is to make the model less likely to return false positives, and fairness requires not generating false classifications based on biased data.
Applicant argues Jiang has no need for Frtunikj’s counting approach. Many papers and patents use several algorithms together to balance biased data.1 Moreover, Jiang preprocesses data before training; see Jiang Fig. 1.
Applicant argues the combination of Jiang and Frtunikj does not teach “a count of sensitive feature-label combinations…” Examiner disagrees. Frtunikj paragraph 35’s event class count is the count, and the event class count by time is a frequency analysis. The sensitive feature-label combination is taught by the labeled examples of the protected class G, Jiang sec. 4.2.
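For illustration of this mapping only, the following is a minimal Python sketch of counting feature-label combinations, with a count-by-time breakdown as a frequency analysis; the records and field names are hypothetical and are not drawn from Frtunikj:

```python
# Illustrative only; data and field names are hypothetical. This mirrors
# the mapping above: a count of each sensitive feature-label combination,
# plus the same count broken out by time (cf. Frtunikj para 35).
from collections import Counter

records = [
    {"group": "A", "label": 1, "hour": 9},
    {"group": "A", "label": 0, "hour": 9},
    {"group": "B", "label": 1, "hour": 14},
    {"group": "B", "label": 1, "hour": 14},
]

# Count of each sensitive feature-label combination.
pair_counts = Counter((r["group"], r["label"]) for r in records)

# The same count by time: a simple frequency analysis over an added dimension.
pair_counts_by_hour = Counter((r["group"], r["label"], r["hour"]) for r in records)

print(pair_counts)
print(pair_counts_by_hour)
```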
Applicant repeats some of these arguments at pages 16-17; they are likewise rebutted.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 11-14 and 21-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent-eligible subject matter because Applicant claims computer-readable media, which include transitory media, and transitory media are not patent-eligible subject matter.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 2, 4-6, 8, 10-14, 16-18 and 20-23 are rejected under 35 U.S.C. 103 as being unpatentable over “Identifying and Correcting Label Bias in Machine Learning” by Jiang et al. and US 20220164602 A1 to Frtunikj et al.
Claims 9 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over “Identifying and Correcting Label Bias in Machine Learning” by Jiang et al., US 20220164602 A1 to Frtunikj et al., and US 20210241033 A1 to Yang.
Jiang teaches claims 1 and 11. A computerized system, the computerized system comprising:
determining from training data, at a bias reducing machine learning engine and prior to training, … (Jiang sec. 4.2 “the idea is that if the positive prediction rate for a protected class G is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G, then this will encourage the classifier to increase its accuracy on the positively labeled examples in G, while the accuracy on the negatively labeled examples of G may fall. Either of these two events will cause the positive prediction rate on G to increase, and thus bring h closer to the true, unbiased label function.” The sensitive feature is G, and the positive/negative label is the label value. Jiang shows that this reweighting to adjust for bias happens before training on the data; see the Fig. 1 description: “Our main contribution is providing a procedure that appropriately weights examples in the dataset, and then showing that training on the resulting loss corresponds to training on the original, true, unbiased labels.”)
determining a loss adjustment weight based on the count of the feature-label combination; (Jiang sec. 4.2 “if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G…” Increase/decrease is the loss adjustment.)
applying the loss adjustment weight to a loss function to generate an adjusted loss function; (Jiang sec. 4.2 “we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G, then this will encourage the classifier to increase its accuracy on the positively labeled examples in G, while the accuracy on the negatively labeled examples of G may fall. Either of these two events will cause the positive prediction rate on G to increase, and thus bring h closer to the true, unbiased label function.”)
training a machine learning model using the adjusted loss function to generate an adjusted machine learning model; (Jiang sec. 4.2, increasing or decreasing weights is training a model.)
deploying the adjusted machine learning model for use in a computing application. (Jiang sec. 7.4 “We see that our method consistently leads to more fair classifiers, often yielding a classifier with the lowest test violation out of all methods.”) Jiang doesn’t teach the computer system and count/frequency measurements.
Jiang’s pre-processing also does not teach determining the claimed count prior to training.
However, Frtunikj teaches A computerized system, the computerized system comprising: at least one computer processor; and computer memory storing computer-useable instructions that, when used by the at least one computer processor, cause the at least one computer processor to perform operations comprising: (Frtunikj fig. 7)
determining from training data, at a bias reducing machine learning engine and prior to training, a count of a feature-label combination (Frtunikj para 35 “the one or more trends may be used to identify one or more characteristics in the plurality labeled data logs of the training dataset that are underrepresented. Examples of such characteristics can include… an event class count; an event class count by time, an event class count by location, an instance count by environmental conditions…” Event class count is the count, and event class count by time is frequency analysis. Fig. 1 shows that this analysis is completed before training step 112.)
Jiang, the claims and Frtunikj are all directed to generating training data. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to consider feature-label count in order to make the machine learning model “less likely to” return false positives. Frtunikj para 35.
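For illustration of the mapped combination only, the following is a minimal Python sketch in which a count of each feature-label pair determines a loss adjustment weight that is applied to the loss used for training; the specific weighting rule and all names are hypothetical, not drawn from either reference:

```python
# Illustrative only; the weighting rule and names are hypothetical. Sketch of
# the combination as mapped above: counts of feature-label pairs (per the
# mapping of Frtunikj) drive loss adjustment weights applied to the loss
# (per the mapping of Jiang).
from collections import Counter

def loss_weights_from_counts(examples):
    """Map each (group, label) pair to a weight inverse to its count."""
    counts = Counter((g, y) for (_, g, y) in examples)
    total = len(examples)
    # Hypothetical rule: underrepresented pairs contribute more to the loss.
    return {pair: total / (len(counts) * n) for pair, n in counts.items()}

def adjusted_loss(per_example_losses, examples, weights):
    """Weighted mean loss: each example's loss scaled by its pair's weight."""
    total = sum(weights[(g, y)] * loss
                for loss, (_, g, y) in zip(per_example_losses, examples))
    return total / len(examples)

examples = [("x1", "G", 1), ("x2", "G", 0), ("x3", "H", 1), ("x4", "H", 1)]
w = loss_weights_from_counts(examples)
print(adjusted_loss([0.7, 0.2, 0.4, 0.1], examples, w))
```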
Jiang teaches claim 2. The computerized system of claim 1, the operations further comprising converting the training data into a table or a vector configured to associate a group of the sensitive feature to a corresponding label, and wherein the loss adjustment weight is determined based on a statistical analysis of the table or vector. (Jiang fig. 1 “our loss” is the loss adjustment based on a statistical analysis of the vector [x,y].)
Jiang teaches claims 4, 13 and 18. The computerized system of claim 1, wherein the … (Jiang fig. 1 “Our main contribution is providing a procedure that appropriately weights examples in the dataset, and then showing that training on the resulting loss corresponds to training on the original, true, unbiased labels.”) Jiang doesn’t teach a feature-label count.
However, Frtunikj teaches count of the feature-label combination is determined based on a frequency of the sensitive feature relative to the label. (Frtunikj para 35 “the one or more trends may be used to identify one or more characteristics in the plurality labeled data logs of the training dataset that are underrepresented. Examples of such characteristics can include… an event class count; an event class count by time, an event class count by location, an instance count by environmental conditions…” Event class count is the count, and event class count by time is frequency analysis.)
Jiang teaches claim 5. The computerized system of claim 1, wherein the sensitive feature comprises a gender feature, a race feature, an age feature, a socioeconomic feature, a geographical location feature, or a health feature. (Jiang sec. 7.1, ProPublica’s COMPAS dataset description: “The task is to predict recidivism based on criminal history, jail and prison time, demographics, and risk scores. The protected groups are two race-based (Black, White) and two gender-based (Male, Female).”)
Jiang teaches claim 6. The computerized system of claim 1, wherein the operations comprise: determining the loss function based on the machine learning model, wherein the loss adjustment weight is determined based on the loss function (Jiang sec. 4.2 “we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G, then this will encourage the classifier to increase its accuracy on the positively labeled examples in G, while the accuracy on the negatively labeled examples of G may fall. Either of these two events will cause the positive prediction rate on G to increase, and thus bring h closer to the true, unbiased label function.”) Jiang doesn’t teach a feature-label count.
However, Frtunikj teaches a count of a feature-label combination. (Frtunikj para 35 “the one or more trends may be used to identify one or more characteristics in the plurality labeled data logs of the training dataset that are underrepresented. Examples of such characteristics can include… an event class count; an event class count by time, an event class count by location, an instance count by environmental conditions…” Event class count is the count, and event class count by time is frequency analysis.)
Jiang teaches claims 8 and 20. The computerized system of claim 1, wherein the adjusted machine learning model is deployed to an abstraction layer of a client device or a server device, wherein the abstraction layer comprises at least one of an operating system layer, an application layer, or a hardware layer. (Jiang sec. 7.4 “We see that our method consistently leads to more fair classifiers, often yielding a classifier with the lowest test violation out of all methods.” Deploying the classifier is deploying the model to an abstraction layer of a client or server device.)
Jiang teaches claim 9. The computerized system of claim 1, wherein the operations comprise causing presentation of a graphical user interface … (Jiang Fig. 2 shows a graphic for its data.) Jiang doesn’t teach the recited controls.
However, Yang teaches comprising (i) a first control configured to receive a first user input indicative of the sensitive feature and (ii) a second control configured to receive a second user input indicative of the label. (Yang para 63 “the user may user a graphical user interface (GUI) to define which parameter of the target input serves as the sensitive parameter.” Yang para 46 “the user is using an application for creation of training data…” The training data includes labels. The user device, on which the application runs, includes a user interface (Fig. 1, 108) with a display (GUI).)
Yang, Jiang and the claims are all concerned with label bias. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to allow a user to select sensitive features in order to correct “labels of the target inputs affected by label bias…” Yang abs.
Jiang teaches claim 10. The computerized system of claim 1, wherein the operations are performed without receipt of client-side code. (Jiang fig. 1 shows no receipt of client-side code.)
Jiang teaches claim 11.
determine, at a bias reducing machine learning engine and from training data, a … (Jiang sec. 4.2 “the idea is that if the positive prediction rate for a protected class G is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G, then this will encourage the classifier to increase its accuracy on the positively labeled examples in G, while the accuracy on the negatively labeled examples of G may fall. Either of these two events will cause the positive prediction rate on G to increase, and thus bring h closer to the true, unbiased label function.” The sensitive feature is G, and the positive/negative label is the label value.)
determine a loss adjustment weight by applying a statistical test to quantify bias, the loss adjustment weight determined based on a quantified bias; (Jiang sec. 4.2 “the idea is that if the positive prediction rate for a protected class G is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G…” Increase/decrease is the loss adjustment. The quantified bias is the determined positive prediction rate for the sensitive feature being lower than an overall positive prediction rate.)
apply the loss adjustment weight to a loss function to generate an adjusted loss function; (Jiang sec. 4.2 “we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G, then this will encourage the classifier to increase its accuracy on the positively labeled examples in G, while the accuracy on the negatively labeled examples of G may fall. Either of these two events will cause the positive prediction rate on G to increase, and thus bring h closer to the true, unbiased label function.”)
train a machine learning model using the adjusted loss function to generate an adjusted machine learning model; (Jiang sec. 4.2, increasing or decreasing weights is training a model.)
deploy the adjusted machine learning model for use in a computing application. (Jiang sec. 7.4 “We see that our method consistently leads to more fair classifiers, often yielding a classifier with the lowest test violation out of all methods.”) Jiang doesn’t teach the computer system and count/frequency measurements.
However, Frtunikj teaches One or more computer-storage media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the processor to: (Frtunikj fig. 7)
determine, at a bias reducing machine learning engine and from training data, a count of a feature-label combination relating a sensitive feature to a label; (Frtunikj para 35 “the one or more trends may be used to identify one or more characteristics in the plurality labeled data logs of the training dataset that are underrepresented. Examples of such characteristics can include… an event class count; an event class count by time, an event class count by location, an instance count by environmental conditions…” Event class count is the count, and event class count by time is frequency analysis.)
Jiang, the claims and Frtunikj are all directed to generating training data. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to consider feature-label count in order to make the machine learning model “less likely to” return false positives. Frtunikj para 35.
Jiang teaches claim 12. The computer-storage media of claim 11, wherein the instructions further cause the processor to convert the training data into a table or vector configured to associate a group of the sensitive feature to a corresponding label, and wherein the loss adjustment weight is determined based on a statistical analysis of the table or vector, the statistical analysis comprising … (Jiang fig. 1 “our loss” is the loss adjustment based on a statistical analysis of the vector [x,y].) Jiang doesn’t teach chi-squared.
However, Yang teaches a chi-squared test or Fisher's exact test. (Yang para 63 “using formatting that defines the sensitive parameter (e.g., first column in a table storing the training dataset is defined as the sensitive parameter).” Yang para 72 “At 306, an analysis is performed for determining whether the statistically significant difference exists between target inputs assigned to the sensitive group and other target inputs excluded from the sensitive group.” Yang para 73 “when the target output is assigned one or several values from a pre-defined set (i.e., the output is categorical), then a Chi-squared test may be used.”)
Yang, Jiang and the claims are all concerned with label bias. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use a chi-squared test in order to correct “labels of the target inputs affected by label bias…” Yang abs.
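For illustration of this mapping only, the following is a minimal Python sketch of a chi-squared test over a sensitive-feature/label contingency table, in the manner mapped from Yang paras 72-73 for a categorical output; the table values are hypothetical:

```python
# Illustrative only; the table values are hypothetical. A chi-squared test
# of whether label frequencies differ between sensitive groups.
from scipy.stats import chi2_contingency

# Rows: sensitive groups; columns: label = 0, label = 1.
table = [[30, 10],   # group A
         [15, 45]]   # group B
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")  # a small p suggests label bias
```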
Jiang teaches claim 14. The computer-storage media of claim 11, wherein the … (Jiang sec. 4.1 “which states that training a classifier on examples with biased labels weighted by w(x, y) is equivalent to training a classifier on examples labelled according to the true, unbiased labels.” w(x, y) is a numerical transformation; see equations 1 and 2 of theorem 1 in sec. 4.1 of Jiang.) Jiang doesn’t teach a feature-label count.
However, Frtunikj teaches a count of a feature-label combination. (Frtunikj para 35 “the one or more trends may be used to identify one or more characteristics in the plurality labeled data logs of the training dataset that are underrepresented. Examples of such characteristics can include… an event class count; an event class count by time, an event class count by location, an instance count by environmental conditions…” Event class count is the count, and event class count by time is frequency analysis.)
Jiang teaches claim 16. A computer-implemented method, comprising:
accessing training data; (Jiang sec. 4.2 “(3) compute the weights for each sample based on these multipliers using the closed-form provided by Proposition 1;…”)
training a machine learning model based on the training data; (Jiang sec. 4.2 “(3) compute the weights for each sample based on these multipliers using the closed-form provided by Proposition 1;…”)
evaluating the machine learning model, wherein evaluating the machine learning model comprises:
determining a … based on a … (Jiang sec. 4.2 “the idea is that if the positive prediction rate for a protected class G is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G, then this will encourage the classifier to increase its accuracy on the positively labeled examples in G, while the accuracy on the negatively labeled examples of G may fall. Either of these two events will cause the positive prediction rate on G to increase, and thus bring h closer to the true, unbiased label function.” The sensitive feature is G, and the positive/negative label is the label value.)
determining a loss adjustment weight by applying a statistical test to quantify bias, the loss adjustment weight determined based on a quantified bias; (Jiang sec. 4.2 “the idea is that if the positive prediction rate for a protected class G is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G…” Increase/decrease is the loss adjustment. The quantified bias is the determined positive prediction rate for the sensitive feature being lower than an overall positive prediction rate.)
applying the loss adjustment weight to a loss function of the machine learning model to generate an adjusted loss function configured to reduce an error attributed to the sensitive feature; (Jiang sec. 4.2 “we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G, then this will encourage the classifier to increase its accuracy on the positively labeled examples in G, while the accuracy on the negatively labeled examples of G may fall. Either of these two events will cause the positive prediction rate on G to increase, and thus bring h closer to the true, unbiased label function.”)
re-training a machine learning model using the adjusted loss function to generate an adjusted machine learning model; and (Jiang sec. 4.2 “retrain the classifier given these weights.”)
deploying the adjusted machine learning model. (Jiang sec. 7.4 “We see that our method consistently leads to more fair classifiers, often yielding a classifier with the lowest test violation out of all methods.”) Jiang doesn’t teach a feature-label count.
However, Frtunikj teaches determining a count of a feature-label combination relating a sensitive feature to a label based on a frequency … (Frtunikj para 35 “the one or more trends may be used to identify one or more characteristics in the plurality labeled data logs of the training dataset that are underrepresented. Examples of such characteristics can include… an event class count; an event class count by time, an event class count by location, an instance count by environmental conditions…” Event class count is the count, and event class count by time is frequency analysis.)
Jiang, the claims and Frtunikj are all directed to generating training data. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to consider feature-label count in order to make the machine learning model “less likely to” return false positives. Frtunikj para 35.
Jiang teaches claim 17. The computer-implemented method of claim 16, wherein the machine learning model is re-trained until an adjusted loss output by the adjusted loss function satisfies a loss threshold. (Jiang sec. F.1 Post-calibration “first trains without consideration for fairness, and then determines appropriate thresholds for the protected groups such that fairness is satisfied in training.” Jiang fig. 1 “training on the resulting loss corresponds to training on the original, true, unbiased labels.” Jiang sec. 4.2 “(4) retrain the classifier given these weights.”)
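For illustration of this mapping only, the following is a minimal Python sketch of a reweight-and-retrain loop that stops once an adjusted loss satisfies a threshold, in the spirit of Jiang sec. 4.2 steps (3)-(4); every function named here is a hypothetical placeholder, not drawn from Jiang:

```python
# Illustrative only; train_fn, reweight_fn, and eval_fn are hypothetical
# placeholders supplied by the caller. The loop retrains on the weighted
# loss until the adjusted loss satisfies a loss threshold.

def fit_until_threshold(train_fn, reweight_fn, eval_fn, data,
                        loss_threshold=0.05, max_rounds=20):
    weights = [1.0] * len(data)       # start unweighted
    model = None
    for _ in range(max_rounds):
        model = train_fn(data, weights)        # (re)train on the weighted loss
        loss = eval_fn(model, data, weights)   # adjusted loss output
        if loss <= loss_threshold:             # loss threshold satisfied
            break
        weights = reweight_fn(model, data)     # recompute weights; retrain
    return model
```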
Jiang teaches claim 21. (Currently Amended) The computer-storage media of claim 11, wherein the statistical test to quantify bias comprises analyzing frequencies of labels for different values of the sensitive feature to quantify bias. (Jiang sec. 4.2 “the idea is that if the positive prediction rate for a protected class G is lower than the overall positive prediction rate, then the corresponding coefficient should be increased…” Comparing rates is analyzing frequencies. The protected class is the sensitive feature. The bias is the coefficient.)
Jiang teaches claim 22. (Currently Amended) The computer storage media of claim 11, wherein the loss adjustment weight is determined based on … the feature-label combination, and the loss adjustment weight varies inversely with the … (The proportionality to a degree of bias is not defined in the specification. Jiang sec. 4.2 “if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G…” Increase/decrease is the loss adjustment, and it is based on a lower positive prediction rate for the protected class compared to the overall positive prediction rate.)
Jiang doesn’t teach the count.
However Frtunikj teaches the count. (Frtunikj para 35 “the one or more trends may be used to identify one or more characteristics in the plurality labeled data logs of the training dataset that are underrepresented. Examples of such characteristics can include… an event class count; an event class count by time, an event class count by location, an instance count by environmental conditions…” Event class count is the count, and event class count by time is frequency analysis.)
Jiang teaches claim 23. (Currently Amended) The computer storage media of claim 22, wherein the loss adjustment weight comprises a coefficient, scalar or multiplier applied to the loss function. (The proportionality is not mentioned or defined in the specification. Jiang sec. 4.2 “the idea is that if the positive prediction rate for a protected class G is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G…”)
Jiang teaches claim 24. (Currently Amended) The computer storage media of claim 11, wherein the statistical test comprises a … (Statistical significance value, distributional difference, and the proportionality are not mentioned or defined in the specification. Jiang sec. 4.2 “the idea is that if the positive prediction rate for a protected class G is lower than the overall positive prediction rate, then the corresponding coefficient should be increased; i.e., if we increase the weights of the positively labeled examples of G and decrease the weights of the negatively labeled examples of G…” Increase/decrease is the loss adjustment. The quantified bias is the determined positive prediction rate for the sensitive feature being lower than an overall positive prediction rate. The different combinations and their weights are the different protected and unprotected classes and their coefficients.)
Jiang doesn’t teach chi-squared.
However, Yang teaches a chi-squared test or Fisher's exact test. (Yang para 63 “using formatting that defines the sensitive parameter (e.g., first column in a table storing the training dataset is defined as the sensitive parameter).” Yang para 72 “At 306, an analysis is performed for determining whether the statistically significant difference exists between target inputs assigned to the sensitive group and other target inputs excluded from the sensitive group.” Yang para 73 “when the target output is assigned one or several values from a pre-defined set (i.e., the output is categorical), then a Chi-squared test may be used.”)
Yang, Jiang and the claims are all concerned with label bias. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to allow a user to select sensitive features in order to correct “labels of the target inputs affected by label bias…” Yang abs.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks whose telephone number is (571)270-3377. The examiner can normally be reached Monday - Thursday 8-4 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes, can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AUSTIN HICKS/Primary Examiner, Art Unit 2142
1 “AIF360 currently contains 9 bias mitigation algorithms that span these three categories.” https://arxiv.org/pdf/1810.01943