Prosecution Insights
Last updated: May 29, 2026
Application No. 17/342,588

PREVENTING DATA VULNERABILITIES DURING MODEL TRAINING

Non-Final OA §103
Filed: Jun 09, 2021
Examiner: HAN, JOSEP
Art Unit: 2122
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Verizon Patent and Licensing Inc.
OA Round: 5 (Non-Final)
Grant Probability: 38% (At Risk)
OA Rounds: 5-6
Est. Remaining: 0m
With Interview: 62%

Examiner Intelligence

Grants only 38% of cases
Career Allowance Rate: 38% (6 granted / 16 resolved; -17.5% vs TC avg)
Strong +25% interview lift
Interview Lift: +25.0% (resolved cases with interview)
Typical timeline
Avg Prosecution: 4y 2m (14 currently pending)
Career history
Total Applications: 50 (across all art units)

Statute-Specific Performance

§101: 6.8% (-33.2% vs TC avg)
§103: 81.1% (+41.1% vs TC avg)
§102: 9.9% (-30.1% vs TC avg)
§112: 0.8% (-39.2% vs TC avg)
Based on career data from 16 resolved cases

Office Action

§103
Detailed Action

The following action is in response to the communication(s) received on 01/13/2026. As of the claims filed on the above date: Claims 1, 8, and 15 have been amended. Claims 1-3, 6-10, 13-16, and 19-26 are pending. Claims 1, 8, and 15 are independent claims.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/13/2026 has been entered.

Response to Arguments

Applicant’s arguments filed 01/13/2026 have been fully considered. Examiner’s response is set forth below.

With respect to the new matter rejections under 35 USC § 112, the amendments have overcome the new matter rejection. Thus, the new matter rejection has been withdrawn.

With respect to the patent eligibility rejections under 35 USC § 101, Applicant’s arguments are persuasive. Thus, the patent eligibility rejections have been withdrawn.

With respect to the rejections under 35 USC § 103:

Applicant asserts that Perkins addresses a fundamentally different technical problem and that Perkins eliminates “low information features” instead of addressing training data and integrity. (p.11 ¶3) Examiner respectfully submits that the distinction between the two interpretations is not shown as currently recited in the claims. In other words, the claims do not recite how eliminating the low information features is not addressing training data and integrity.

Applicant further asserts that Perkins does not remove the training data example from training entirely and that Perkins only removes the features from consideration by the model (p.11 last ¶). Examiner respectfully submits that Perkins explicitly teaches this limitation: ([p.1 ¶2] Eliminating low information features gives your model clarity by removing noisy data; [Calculating Information Gain] “select only those words that appear in the set”, [code lines 66-74] “best”, “bestwords”, “best_words_feats”), since the function “best_words_feats” removes irrelevant/low-ranked features and is used to train the model, and thus removes the data from training entirely in that model.

Applicant further asserts that Perkins does not teach the technical distinction of “vulnerable data.” (p.12 2nd ¶) Examiner respectfully submits that there is no concrete definition of “vulnerable data” in the Specification. [0064] of the Specification merely recites: “In one embodiment, these common features can be used to reduce the total amount of labeled examples processed using method 400… Thus, the use of common features enables massive filtering of labeled examples to only identify those potentially vulnerable examples.” Additionally, no amendment has been made to specify the broadest reasonable interpretation of vulnerability of data in machine learning reflected on the claim language. Thus, the technical distinction of “vulnerable data” cannot be read into the present claims.

Applicant further asserts that Perkins does not teach a common feature dictionary including features present in both the first and second feature dictionaries (p.12 ¶3). This is unpersuasive, as this is not taught by Perkins but rather by Li, via the method of identifying the shared feature center of the groups (fig.1).

Applicant further asserts that Perkins does not modify or remove entire training examples from the training dataset based on vulnerability analysis (p.12 last ¶). 
Similar to above, the distinction is not clearly reflected in the claims to suggest that Perkins does not teach this limitation. Applicant further asserts that Li addresses a different technical problem from the present claims (p.13 ¶1). Examiner respectfully disagrees, as the detection of shared features is a method in machine learning in which a person having ordinary skill in the art could reasonably implement in removing uninformative data. Applicant further asserts that Stack merely teaches sorted frequency counting and not the substantive technical elements of the claims. Examiner respectfully submits that the technical elements of the claim regarding informative word identification is taught by Perkins, while Stack more explicitly teaches the ranking threshold used to analyze the data in dictionaries. Additionally, the prior arts should not be read separately from the combination identified in the art rejection. Applicant further asserts that the prior arts are combined using impermissible hindsight. Examiner respectfully submits that the analogous endeavors and motivations to combine have been identified in the previous and current Office Actions. As currently recited, the claims can be broadly read such that Perkins teaches removing uninformative data, Li identifies shared features between two classes in a machine learning endeavor, and Stack teaches the method of ranking dictionaries in a shared natural language processing endeavor. Thus, the prior art remains teaching the present invention. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. Claims 1-6, 8-12, 15, 16-22, 25, and 26 are rejected under 35 U.S.C. 
103 as being unpatentable over Jacob Perkins et al, “Text Classification for Sentiment Analysis - Eliminate Low Information Features” (hereinafter Perkins), further in view of Stack, “Sorted Word frequency count using python” (hereinafter Stack), and further in view of Li, “Group Based Deep Shared Feature Learning for Fine-grained Image Classification” (hereinafter Li).

Regarding Claim 1, Perkins teaches: A method comprising:

receiving, by a processor, a first set of importance features and a second set of importance features used to detect potentially compromised training data, the first set of importance features associated with a first label and identifying first features used by a machine learning (ML) model to classify data with the first label, and the second set of importance features associated with a second label and identifying second features used by the ML model to classify data with the second label; (Perkins [p.1 ¶2] Eliminating low information features gives your model clarity by removing noisy data. [Calculating Information Gain] “To use it, first we need to calculate a few frequencies for each word: its overall frequency and its frequency within each class. This is done with a FreqDist for overall frequency of words, and a ConditionalFreqDist where the conditions are the class labels.”, [code line 13,14] “negfeats”, “posfeats”) [Note: Examiner is interpreting “negfeats” and “posfeats” as the two sets of importance features as disclosed; noisy data corresponds to potentially compromised training data]

generating, by the processor, a first feature dictionary based on the first set of importance features and a second feature dictionary based on the second set of importance features, (Perkins [Calculating Information Gain] “To use it, first we need to calculate a few frequencies for each word: its overall frequency and its frequency within each class. 
This is done with a FreqDist for overall frequency of words, and a ConditionalFreqDist where the conditions are the class labels.” [code lines 44-53] “word_fd”, “label_word_fd”) [Note: Examiner is interpreting “label_word_fd” containing a key “pos” as the first feature dictionary, and the key “neg” as the second feature dictionary]

wherein generating includes identifying unique importance features, calculating their occurrence frequency, and storing them…; (Perkins [Calculating Information Gain] “To use it, first we need to calculate a few frequencies for each word: its overall frequency and its frequency within each class. This is done with a FreqDist for overall frequency of words, and a ConditionalFreqDist where the conditions are the class labels.” [code lines 44-53] “word_fd”, “label_word_fd”) (Note: each word corresponds to a unique importance feature; word_fd and label_word_fd correspond to the storage of the calculated occurrence frequency)

identifying, by the processor, a subset of labeled examples in a training dataset used to train the ML model by filtering the training dataset…and determining whether importance features in each labeled example exceed a pre-configured threshold in its corresponding feature dictionary on the first feature dictionary and second feature dictionary; (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. 
We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set”, [code lines 66-74] “best”, “bestwords”, “best_words_feats”) (Note: taking the top 10000 corresponds to filtering by comparison to a pre-configured threshold for the respective feature dictionaries)

modifying, by the processor, the subset of labeled examples based on the first feature dictionary and second feature dictionary, the modifying generating a modified training data set having reduced data vulnerabilities; (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set”, [code lines 66-74] “best”, “bestwords”, “best_words_feats”) [Note: the function “best_words_feats” removes irrelevant/low-ranked features, which identifies and modifies the training data set used in the classifier, and thus corresponds to a data set that has reduced vulnerabilities]

retraining, by the processor, the ML model using the modified training data set. (Perkins [Calculating Information Gain] “Now each file is classified based on the presence of these high information words.”, [code lines 80, 13-20] “evaluate_classifier(best_words_feats)”)

and applying, by the processor, the ML model to classify a set of new, unlabeled data examples, wherein the classifying generates a set of predicted labels for the set of new, unlabeled data examples. (Perkins [Calculating Information Gain] “Now each file is classified based on the presence of these high information words.”, [code lines 80, 13-20] “evaluate_classifier(best_words_feats)”) (Note: evaluate_classifier using the best_words_feats corresponds to applying the processor to classify a set of new unlabeled data examples.)

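The Perkins passages quoted above describe scoring each word by chi-square against the class labels, keeping the top-N words in a set, and filtering each example's words by set membership. The following is a minimal, self-contained sketch of that general technique, not Perkins's actual code: the helper names `chi_square`, `best_words`, and `keep_best_features`, the toy documents, and the hand-written 2x2 chi-square formula (used here in place of NLTK's `BigramAssocMeasures.chi_sq`) are all assumptions made for illustration.

```python
from collections import Counter

def chi_square(n_ii, n_ix, n_xi, n_xx):
    """Chi-square for a 2x2 table (hypothetical stand-in for the NLTK call).

    n_ii: occurrences of the word under one label
    n_ix: occurrences of the word overall
    n_xi: total word occurrences under that label
    n_xx: total word occurrences overall
    """
    a = n_ii                        # word, this label
    b = n_ix - n_ii                 # word, other labels
    c = n_xi - n_ii                 # other words, this label
    d = n_xx - n_ix - n_xi + n_ii   # other words, other labels
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n_xx * (a * d - b * c) ** 2 / denom

def best_words(labeled_docs, top_n):
    """Return the top_n words ranked by summed per-label chi-square score."""
    word_fd = Counter()       # overall word frequency
    label_word_fd = {}        # per-label word frequency
    for words, label in labeled_docs:
        word_fd.update(words)
        label_word_fd.setdefault(label, Counter()).update(words)
    total = sum(word_fd.values())
    scores = {
        word: sum(
            chi_square(fd[word], freq, sum(fd.values()), total)
            for fd in label_word_fd.values()
        )
        for word, freq in word_fd.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:top_n])

def keep_best_features(words, best):
    """Set-membership filter: keep only the high-scoring words of one example."""
    return [w for w in words if w in best]

# Toy data (assumed, not from the record).
docs = [
    (["great", "plot", "the"], "pos"),
    (["great", "acting", "the"], "pos"),
    (["awful", "plot", "the"], "neg"),
    (["awful", "acting", "the"], "neg"),
]
best = best_words(docs, top_n=2)
filtered = [(keep_best_features(words, best), label) for words, label in docs]
```

In this toy run every labeled example is still present in `filtered`; only the low-scoring words inside each example are dropped. The sketch is descriptive of the cited technique only and takes no position on how the claim language should be read.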
Perkins does teach storing them, but not in an ordered dictionary. However, Stack further teaches: Storing them in an ordered dictionary (Stack [p.5-7]; images media_image1.png and media_image2.png not reproduced)

Stack and Perkins are analogous to the present invention because both are from the same field of endeavor of natural language processing methods. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the frequency dictionary method from Stack into Perkins’s feature filtering method. The motivation would be “keeping words in a dictionary and having a count for each of these words” (Stack p.1).

Perkins/Stack does not teach, but Li further teaches:

and identifying common importance features that appear in both the first and second feature dictionaries and label-specific importance features that appear in only one of the first or second feature dictionaries; (Li [p.2 2nd ¶] we propose a Group based Deep Shared Feature Learning Network (GSFL-Net) that can extract shared as well as discriminative features for fine-grained classification. Our modeling of shared features is based on a new group based learning wherein existing classes are divided into groups and multiple shared feature patterns are discovered (learned). The motivation behind GSFL-Net is shown in Fig. 1: The classes are divided into groups according to the distances between each class specific feature center (mi) in the feature space. In particular, Fig. 1 illustrates 4 classes and 2 groups. The main idea is that by removing the effects brought by the shared feature patterns within each group, the rest of the discriminative components can be quite effective for fine-grained classification. Note that Fig. 1 additionally shows two different kinds of feature centers: labeled m1 through m4 and s1 through s2. 
mi represents the class-specific feature center, which is computed from the discriminative feature components from the ith class. sj is the shared feature center for the jth group. [image media_image3.png not reproduced]) (Note: the shared components correspond to the common importance features; the feature components from the respective class are only in their respective class and thus correspond to the label-specific importance features.)

generating, by the processor, a common feature dictionary including features present in both the first and second feature dictionaries; (Li [p.2 2nd ¶] we propose a Group based Deep Shared Feature Learning Network (GSFL-Net) that can extract shared as well as discriminative features for fine-grained classification. Our modeling of shared features is based on a new group based learning wherein existing classes are divided into groups and multiple shared feature patterns are discovered (learned). The motivation behind GSFL-Net is shown in Fig. 1: The classes are divided into groups according to the distances between each class specific feature center (mi) in the feature space. In particular, Fig. 1 illustrates 4 classes and 2 groups. The main idea is that by removing the effects brought by the shared feature patterns within each group, the rest of the discriminative components can be quite effective for fine-grained classification. Note that Fig. 1 additionally shows two different kinds of feature centers: labeled m1 through m4 and s1 through s2. mi represents the class-specific feature center, which is computed from the discriminative feature components from the ith class. sj is the shared feature center for the jth group. 
[image media_image3.png not reproduced]) (Note: the collection of shared components corresponds to the common feature dictionary)

Li and Perkins/Stack are analogous to the present invention because both are from the same field of endeavor of machine learning classification methods. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the method of creating a shared feature center into Perkins/Stack’s feature filtering method. The motivation would be “by removing the effects brought by the shared feature patterns within each group, the rest of the discriminative components can be quite effective for fine-grained classification.” (Li p.2 2nd ¶).

Regarding Claim 2, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Perkins, via Perkins/Stack/Li, further teaches the method of claim 1, wherein each importance feature is associated with a corresponding confidence value and wherein the method further comprises filtering the first set of importance features and the second set of importance features, the filtering comprising removing an importance feature having a confidence value below a pre-configured threshold (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set”, [code lines 66-74] “best”, “bestwords”, “best_words_feats”) [Note: the confidence value is the chi-squared value below a threshold corresponding to the chi-squared value of the 10000th word.]

Regarding Claim 3, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. 
Perkins, via Perkins/Stack/Li, further teaches the method of claim 1, wherein generating a feature dictionary for a respective label and a respective set of importance features comprises: identifying a set of unique importance features for the respective label; calculating a total number of occurrences for each of the unique importance features in the respective set of importance features; ordering the set of unique importance features by the total number of occurrences to generate an ordered set of unique importance features; and storing the ordered set of unique importance features as the feature dictionary, the storing comprising associating each unique importance feature with a corresponding total number of occurrences. (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set.”) Regarding Claim 4, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Perkins, via Perkins/Stack/Li, further teaches the method of claim 3, wherein the method further comprises generating a common feature dictionary, the common feature dictionary including a set of importance features present in both the first feature dictionary and the second feature dictionary. (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set.”) [Note: the set of the top 10000 chi-squared words are features that are present in both the first feature dictionary and the second feature dictionary.] 
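For readers who want a concrete picture of the limitations recited in Claims 3 and 4, the steps (unique features, occurrence counts, ordering by count, storing feature-to-count pairs, then collecting the features present in both dictionaries) can be sketched as follows. This is an illustrative sketch only: the function names, the sample feature lists, and the choice to store a pair of counts in the common dictionary are assumptions, and nothing here is taken from the application or from any cited reference.

```python
from collections import Counter, OrderedDict

def build_feature_dictionary(importance_features):
    """Count each unique feature and store features ordered by total occurrences."""
    counts = Counter(importance_features)
    # Descending count; alphabetical tie-break keeps the order deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return OrderedDict(ordered)

def common_feature_dictionary(first, second):
    """Features present in both dictionaries, mapped to (first count, second count).

    Storing the pair of counts is an assumption; the claim only requires
    that the common dictionary include the shared features.
    """
    return OrderedDict(
        (feat, (first[feat], second[feat])) for feat in first if feat in second
    )

# Hypothetical importance features for two labels.
first_features = ["refund", "account", "refund", "login", "refund", "account"]
second_features = ["login", "password", "login", "account"]

first_dict = build_feature_dictionary(first_features)
second_dict = build_feature_dictionary(second_features)
common_dict = common_feature_dictionary(first_dict, second_dict)
label_specific_first = [f for f in first_dict if f not in second_dict]
```

Here `common_dict` holds only `account` and `login`, while `refund` is specific to the first label, which mirrors the common versus label-specific split recited in Claim 1.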
Regarding Claim 5, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Perkins, via Perkins/Stack/Li, further teaches the method of claim 4, wherein identifying a subset of labeled examples comprises filtering the labeled examples using the common feature dictionary. (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set.”) Regarding claim 6, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Perkins, via Perkins/Stack/Li, further teaches the method of claim 1, wherein identifying a subset of labeled examples comprises determining, for a respective labeled example, whether a number of importance features in the respective labeled example appearing in a corresponding feature dictionary exceeds a pre-configured threshold. (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set”, [code lines 66-74] “best”, “bestwords”, “best_words_feats) [Note: the function “best_words_feats” uses the top 10000 words as the pre-configured threshold to remove irrelevant/low-ranked features, which identifies the subset of labeled examples that contain the importance features] Regarding claim 13, Claim 13 precisely recites the methods of Claim 6, and thus it is rejected for reasons set forth in Claim 6. 
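The limitation of Claim 6, as recited above, tests whether the number of an example's importance features that appear in the feature dictionary for its label exceeds a pre-configured threshold. A minimal sketch of that test is shown below. The names `count_dictionary_hits` and `select_candidates`, the sample dictionaries, and the decision to count distinct features rather than repeated occurrences are all assumptions made for illustration.

```python
def count_dictionary_hits(example_features, feature_dictionary):
    """Number of distinct features of one example found in the dictionary."""
    return sum(1 for feat in set(example_features) if feat in feature_dictionary)

def select_candidates(labeled_examples, dictionaries, threshold):
    """Keep examples whose hit count in their own label's dictionary exceeds threshold."""
    selected = []
    for features, label in labeled_examples:
        hits = count_dictionary_hits(features, dictionaries[label])
        if hits > threshold:
            selected.append((features, label))
    return selected

# Hypothetical per-label feature dictionaries (feature -> occurrence count).
dictionaries = {
    "spam": {"free": 5, "winner": 3, "click": 2},
    "ham": {"meeting": 4, "report": 2},
}
examples = [
    (["free", "winner", "click", "now"], "spam"),  # 3 hits
    (["free", "lunch"], "spam"),                   # 1 hit
    (["meeting", "report", "agenda"], "ham"),      # 2 hits
]
subset = select_candidates(examples, dictionaries, threshold=1)
```

Note that this selection operates on whole labeled examples, whereas the top-10000 cutoff cited from Perkins operates on the vocabulary; the sketch does not attempt to resolve which reading the claim language supports.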
Independent Claim 8 recites a non-transitory computer-readable storage medium for tangibly storing computer program instructions to perform precisely the methods of Claim 1, and thus it is rejected for reasons set forth in Claim 1. Claims 9-12, dependent on Claim 8, are rejected for reasons set forth in Claims 2-5, respectively. Similarly, independent Claim 15 recites a memory; and one or more processors operatively coupled to the memory to perform precisely the methods of Claim 1, and thus it is rejected for reasons set forth in Claim 1. Claims 16-19, dependent on Claim 15, are rejected for reasons set forth in Claims 2-6, respectively. Regarding Claim 21, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Perkins, via Perkins/Stack/Li, further teaches: The method of claim 1, wherein each importance feature is associated with a corresponding confidence value and wherein generating the feature dictionaries includes filtering importance features based on their confidence values. (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set”, [code lines 66-74] “best”, “bestwords”, “best_words_feats) (Note: the confidence value is the chi-squared value below a threshold corresponding to the chi-squared value of the 10000th word, thus corresponding to filtering importance features based on their confidence scores) Regarding Claim 22, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. 
Perkins, via Perkins/Stack/Li, further teaches: The method of claim 1, wherein modifying the subset of labeled examples comprises determining whether to one of alter a label or remove an example based on comparative accuracy measurements of the ML model under different modification strategies. (Perkins [Calculating Information Gain] “One of the best metrics for information gain is chi square… Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set”, [code lines 66-74] “best”, “bestwords”, “best_words_feats) (Note: using the chi square metric for sorting by the most informative words corresponds to removing examples based on comparative accuracy measurements. Sorting the words by score corresponds to one modification strategy, and taking the top 10000 corresponds to another modification strategy, thus each corresponding to a different modification strategy. Taking the top 10000 words excludes words which are outside of the top 10000, thus corresponding to removing examples.) Regarding Claim 25, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Perkins, via Perkins/Stack/Li, further teaches: The method of claim 1, wherein generating the first and second feature dictionaries comprises calculating a total number of occurrences for each unique importance feature and ordering the unique importance features based on their total occurrences. (Perkins [Calculating Information Gain] “One of the best metrics for information gain is chi square… To use it, first we need to calculate a few frequencies for each word: its overall frequency and its frequency within each class. 
This is done with a FreqDist for overall frequency of words, and a ConditionalFreqDist where the conditions are the class labels. Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000. We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set.”) (Note: the top 10000 words corresponds to the total number of occurrences; using FreqDist to score using chi square corresponds to basing off of the total occurrences.)

Regarding Claim 26, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Li, via Perkins/Stack/Li, further teaches: The method of claim 1, wherein the common feature dictionary is used to reduce a total amount of labeled examples processed by excluding examples that do not include features in the common feature dictionary. (Li [images media_image3.png and media_image4.png not reproduced] [p.6 ¶2] A key aspect of our design is learning the values of the class specific feature center mc and the group based shared feature center sj… we perform the updates of the feature centers based on mini-batch instead of updating the centers with respect to the entire training set. In each iteration, the class specific center mc is computed by averaging the y_c^shd’s in that batch, namely, mc may not be updated if no y_c^shd is available) (Note: the shared components correspond to the common feature dictionary; by only updating mc if y^shd (outputs of the shared features) is available, performing the updates of the feature centers sj corresponds to reducing the total amount of labeled examples processed.)

Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Perkins/Stack/Li in view of A. 
Bagherjeiran, “Estimating Missing Features to Improve Multimedia Information Retrieval,” (hereinafter Bagherjeiran).

Regarding claim 7: Perkins teaches the method of claim 1 (and thus the rejection of Claim 1 is incorporated). Perkins does not teach, but Bagherjeiran teaches, modifying the subset of labeled examples comprises one or more of altering a label of a respective labeled example or removing the respective labeled example from the subset of labeled examples. (Bagherjeiran [p.8 Experiments] “Some documents had no text features. Most of these were errors such as “This page contains characters that cannot be displayed.” or captions in another language. All documents with no words in the feature vector were removed.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Perkins’s feature filtering method to implement the teachings of Bagherjeiran of removing the resulting data with no features. The motivation to do so would be to “evaluate the quality of the retrieval results” (Bagherjeiran p.8 Experiments).

Independent Claim 8 recites a non-transitory computer-readable storage medium for tangibly storing computer program instructions to perform precisely the methods of Claim 1. Claim 14, dependent on Claim 8, is rejected for reasons set forth in Claim 7.

Regarding Claim 20, Perkins respectively teaches and incorporates the claimed limitations and rejections of Claim 19. Perkins does not teach, but Bagherjeiran further teaches: modifying the subset of labeled examples comprises one or more of altering a label of a respective labeled example or removing the respective labeled example from the subset of labeled examples (Bagherjeiran [p.8 Experiments] “Some documents had no text features. Most of these were errors such as “This page contains characters that cannot be displayed.” or captions in another language. 
All documents with no words in the feature vector were removed.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Perkins’s feature filtering method to implement the teachings of Bagherjeiran of removing the resulting data with no features. The motivation to do so would be to “evaluate the quality of the retrieval results” (Bagherjeiran p.8 Experiments). Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Perkins/Stack/Li in view of Natarajan et al., “Learning with Noisy Labels,” (hereinafter Natarajan). Regarding Claim 23, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Perkins/Stack/Li does not teach, but Natarajan further teaches: The method of claim 1, wherein modifying the subset of labeled examples comprises altering a label of an example when features of the example have more matches in a feature dictionary associated with a different label than a current label. (Natarajan [Abstract] In this paper, we theoretically study the problem of binary classification in the presence of random classification noise — the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, random label noise is class-conditional — the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. [p.1 last ¶] In this paper, we consider risk minimization in the presence of class-conditional random label noise (abbreviated CCN). The data consists of iid samples from an underlying “clean” distribution D. The learning algorithm sees samples drawn from a noisy version Dρ of D — where the noise rates depend on the class label. To the best of our knowledge, general results in this setting have not been obtained before. 
To this end, we develop two methods for suitably modifying any given surrogate loss function ℓ, and show that minimizing the sample average of the modified proxy loss function ℓ̃ leads to provable risk bounds where the risk is calculated using the original loss ℓ on the clean distribution.) (Note: since the labels are flipped with a small probability, there are more matches in the different label than the current label when the learner sees which have been flipped; thus, minimizing the risk of the labels flipped with a small probability corresponds to altering the labels that have more matches in the feature dictionary than the current label)

Natarajan and Perkins/Stack/Li are analogous to the present invention because both are from the same field of endeavor of machine learning for modifying training data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the method of modifying the labels of training data from Natarajan into Perkins/Stack/Li’s feature filtering method. The motivation would be to “obtain performance bounds for empirical risk minimization in the presence of iid data with noisy labels.” (Natarajan [Abst]).

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Perkins/Stack/Li in view of Grove et al., US 10163061 B (hereinafter Grove).

Regarding Claim 24, Perkins/Stack/Li respectively teaches and incorporates the claimed limitations and rejections of Claim 1. Perkins, via Perkins/Stack/Li, further teaches The method of claim 1…; modifying the subset of labeled examples…; steps of identifying the subset of labeled examples and modifying the subset (Perkins [Calculating Information Gain] “Once we have those numbers, we can score words with the BigramAssocMeasures.chi_sq function, then sort the words by score and take the top 10000.
We then put these words into a set, and use a set membership test in our feature selection function to select only those words that appear in the set”, [code lines 66-74] “best”, “bestwords”, “best_words_feats”)

Perkins/Stack/Li does not teach, but Grove further teaches: The method of claim 1, further comprising determining whether to retrain the ML model [after some action] and upon determining to retrain, repeating [the steps of the retraining] (Grove, [Fig 3]) (Note: element 306 is determining to retrain, and element 310 is the step of repeating.)

Grove and Perkins/Stack/Li are analogous to the present invention because both are from the same field of endeavor of machine learning using training data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to implement the retraining method from Grove into Perkins/Stack/Li’s feature filtering method. The motivation would be that “When this context consists of a large amount of data, it helps to train an analytics model for it. In a continuously running solution, this model should be kept up-to-date, otherwise quality degrades.” (Grove [col.1 line 17]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSEP HAN whose telephone number is (703)756-1346. The examiner can normally be reached Mon-Fri 9am-5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.H./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122
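
For orientation, the limitation quoted in the rejections of Claims 7, 20, and 23 (removing a labeled example, or altering its label when its features have more matches in a feature dictionary associated with a different label) can be pictured roughly as below. This is only an illustrative sketch of the claim language as quoted in the action. It is not the applicant's implementation and is not code from Perkins, Bagherjeiran, or Natarajan. The dictionaries, the sample data, and the helper names `count_matches` and `modify_examples` are all hypothetical, and the tie-handling rule (leave the label unchanged) is an assumption.

```python
# Hypothetical per-label feature dictionaries; invented for illustration only.
FEATURE_DICTIONARIES = {
    "pos": {"great", "excellent", "love"},
    "neg": {"awful", "boring", "hate"},
}


def count_matches(features, dictionaries):
    """Count how many of an example's features appear in each label's dictionary."""
    return {
        label: sum(1 for feature in features if feature in words)
        for label, words in dictionaries.items()
    }


def modify_examples(examples, dictionaries):
    """Return a modified copy of a list of (features, label) pairs.

    Assumed rules for this sketch:
    - an example with no features is removed;
    - an example is relabeled only when another label's dictionary matches
      strictly more of its features than the current label's dictionary;
    - ties leave the current label unchanged.
    """
    modified = []
    for features, label in examples:
        if not features:
            continue  # remove examples that carry no features at all
        matches = count_matches(features, dictionaries)
        best_label = max(matches, key=matches.get)
        if matches[best_label] > matches.get(label, 0):
            label = best_label  # alter the label
        modified.append((features, label))
    return modified


examples = [
    (["great", "love", "plot"], "neg"),  # 2 "pos" matches vs 0 "neg": relabeled
    (["boring", "awful"], "neg"),        # consistent with its label: unchanged
    ([], "pos"),                         # no features: removed
    (["great", "awful"], "pos"),         # tie: unchanged
]
cleaned = modify_examples(examples, FEATURE_DICTIONARIES)
```

Note that the approach quoted from Natarajan modifies the surrogate loss function rather than rewriting individual labels, so a sketch like this one tracks the claim wording, not that reference.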

Prosecution Timeline

May 11, 2025
Response after Non-Final Action
Jun 23, 2025
Non-Final Rejection mailed — §103
Sep 22, 2025
Response Filed
Oct 16, 2025
Final Rejection mailed — §103
Jan 13, 2026
Request for Continued Examination
Jan 25, 2026
Response after Non-Final Action
Mar 03, 2026
Non-Final Rejection mailed — §103
Mar 03, 2026
Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585965
INTERACTIVE MACHINE-LEARNING FRAMEWORK
3y 11m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on the 1 most recent grant.

Prosecution Projections

5-6
Expected OA Rounds
38%
Grant Probability
62%
With Interview (+25.0%)
4y 2m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 16 resolved cases by this examiner. Grant probability derived from career allowance rate.
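
The projection figures appear to follow from simple arithmetic on the examiner's career numbers. The sketch below assumes that the grant probability is the career allowance rate (6 granted out of 16 resolved, as shown in the Examiner Intelligence panel) and that the interview lift is added in percentage points. The page does not state its formula or rounding rule, so both are assumptions, and the variable names are hypothetical.

```python
# Figures taken from the page; how the page combines them is an assumption.
granted = 6                   # cases granted by this examiner
resolved = 16                 # resolved cases by this examiner
interview_lift_points = 25.0  # "+25.0%" interview lift, read as percentage points

# Career allowance rate as a percentage.
allowance_rate = granted / resolved * 100

# Assumed additive model: allowance rate plus interview lift.
with_interview = allowance_rate + interview_lift_points

print(f"Grant probability: {allowance_rate:.1f}%")
print(f"With interview:    {with_interview:.1f}%")
```

Under these assumptions the unrounded values are 37.5% and 62.5%, which the page displays as 38% and 62%.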