Last updated: May 04, 2026

Application No. 18/459,140

DETECTING POISONED TRAINING DATA FOR ARTIFICIAL INTELLIGENCE MODELS USING VARIABLE CLUSTERING CRITERIA

Non-Final OA §103

Filed

Aug 31, 2023

Examiner

CHEN, ALAN S

Art Unit

2125

Tech Center

2100 — Computer Architecture & Software

Assignee

DELL PRODUCTS, L.P.

OA Round

1 (Non-Final)

Interview Optional

Interview lift is +6.3%, below the 15.0% threshold, so a written response is recommended.

Based on 1130 resolved cases, 2023–2026

Examiner Intelligence

CHEN, ALAN S

Grants 91% — above average

Career Allowance Rate

1029 granted / 1130 resolved

+36.1% vs TC avg

+6.3% (moderate)

Interview Lift

Typical timeline

2y 9m

Avg Prosecution

19 currently pending

Career history

1149

Total Applications

across all art units

Statute-Specific Performance

§101: 12.7% (-27.3% vs TC avg)

§103: 20.9% (-19.1% vs TC avg)

§102: 37.5% (-2.5% vs TC avg)

§112: 19.9% (-20.1% vs TC avg)

Based on career data from 1130 resolved cases

Office Action

§103

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 9-12 and 17 are rejected under 35 USC 103 as being unpatentable over Universal Detection of Backdoor Attacks via Density-Based Clustering and Centroids Analysis to Guo et al. (hereinafter Guo) in view of Casting out Demons: Sanitizing Training Data for Anomaly Sensors to Cretu et al. (hereinafter Cretu).

Per claim 1, Guo discloses A method of managing an artificial intelligence (AI) model (Abstract… Clustering and Centroids Analysis (CCA-UD) is a defense method applied to Deep Neural Network (DNN) models to detect and manage backdoor-poisoned training data, "The goal of the proposed defence is to reveal whether a Deep Neural Network model is subject to a backdoor attack by inspecting the training dataset"; Section III.A… "Upon detection of the poisoned samples, Alice may remove them from the training set and use the clean dataset to train a sanitised model"), the method comprising:

obtaining a candidate training data set usable to update an instance of the AI model (Section IV.A… the defender Alice inspects the training dataset D_α^tr, which is used to train model instance F_α, and upon sanitization uses the clean data to update/retrain that model instance, "Upon detection of the poisoned samples, Alice may remove them from the training set and use the clean dataset to train a sanitised model"; obtaining an inspected training dataset, e.g., a candidate dataset D_α^tr, that contains potentially poisoned and benign data used to train an instance of the model F_α, "Alice aims at revealing the presence of poisoned samples in the training dataset D_α^tr, if any... Alice may remove them from the training set and use the clean dataset to train a sanitised model");

… performing an analysis of the candidate training data set and a reference … training data set to obtain a score reflecting a likelihood that the candidate training data set comprises poisoned training data (Section IV.B.2... performing a cluster analysis on the candidate data by computing the centroid of each candidate cluster, calculating its deviation β_i^k from a benign validation reference dataset D_val, and computing a Misclassification Ratio MR_i^k which serves as the score reflecting the likelihood of poisoned data, "The corresponding misclassification ratio is computed as follows: MR_i^k = …"), the analysis using variable clustering criteria to obtain the score (Section III.B... the analysis uses variable clustering criteria via density-based clustering such as DBSCAN, which dynamically determines cluster sizes and boundaries based on varying spatial sample densities, "DBSCAN splits a set of points into K clusters...where K is automatically determined by counting the areas with high sample density");

making a first determination regarding whether the score exceeds a score threshold (Section IV.B.2... comparing the calculated misclassification ratio score to a predefined score threshold, "For a given threshold θ, if MR_i^k ≥ 1 - θ ...");

in a first instance of the first determination in which the score exceeds the score threshold, treating the candidate training data set as comprising poisoned training data (Section IV.B.2... treating the candidate cluster as poisoned if it meets or exceeds the threshold, "For a given threshold θ, if MR_i^k ≥ 1 - θ, the corresponding C_i^k is judged to be poisoned and its elements are added to P_i"); and

in a second instance of the first determination in which the score does not exceed the score threshold, treating the candidate training data set as not comprising poisoned training data (Section IV.B.2... treating the candidate data as not poisoned if it does not exceed the threshold, "Otherwise, the cluster is considered benign and its elements are added to B_i").
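For readers mapping the claim language onto the cited rule, the threshold decision quoted from Guo above can be sketched as follows. This is an illustration only and not part of the Office action: the function name, the cluster identifiers, and the sample values are hypothetical, and the misclassification ratios are taken as precomputed inputs because the quoted formula is elided in the action.

```python
from typing import Dict, List, Tuple


def split_clusters_by_mr(
    mr_by_cluster: Dict[str, float], theta: float
) -> Tuple[List[str], List[str]]:
    """Apply the quoted rule: a cluster is judged poisoned when MR >= 1 - theta.

    mr_by_cluster maps a hypothetical cluster id to its precomputed
    misclassification ratio; how MR is computed is not reproduced here.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta is assumed to lie in [0, 1]")
    cutoff = 1.0 - theta
    poisoned: List[str] = []
    benign: List[str] = []
    for cluster_id, mr in mr_by_cluster.items():
        # Clusters at or above the cutoff go to the poisoned set,
        # all others to the benign set.
        (poisoned if mr >= cutoff else benign).append(cluster_id)
    return poisoned, benign


# Hypothetical ratios for three clusters of one class.
poisoned_ids, benign_ids = split_clusters_by_mr(
    {"c0": 0.97, "c1": 0.20, "c2": 0.50}, theta=0.1
)
```

Note that the quoted rule uses "≥" while the claim recites a score that "exceeds" the threshold; the sketch follows the quoted rule.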
Guo does not expressly disclose, but Cretu does teach: identifying a historical training data set, the historical training data set being obtained prior to the candidate training data set and the historical training data set already having been used to train the instance of the AI model (Cretu: Section 2.2... teaches dividing training data chronologically into historical training data subsets ("epochs") obtained prior to incoming data, which have already been used to successfully train prior instances of the model ("micro-models"), to serve as the baseline reference for testing and sanitizing candidate data, "consider a large training dataset T partitioned into a number of smaller disjoint subsets... where md_i is the micro-dataset starting at time (i-1) * g...In order to create the ensemble of classifiers, we use each of the "epochs" md_i to compute a micro-model, M_i").

Guo and Cretu are analogous art because they are both from within the same field of endeavor, specifically cybersecurity for machine learning models and artificial intelligence systems. They address the same problem-solving area of identifying, sanitizing, and removing poisoned, anomalous, or malicious data within training datasets to prevent model corruption. Guo cites the necessity of a clean benign validation dataset to assess candidate data (Guo: Section IV.A), the acquisition of which in dynamic real-world environments without "absolute ground truth" is a core problem identified and solved by the historical baseline training subsets of Cretu (Cretu: Section 1.1 and 2.2).

Before the effective filing date of the claimed invention, it would have been obvious to a person having ordinary skill in the art to implement Guo's static benign validation dataset using the historical training data subsets taught by Cretu that have already been obtained and used to successfully train prior model instances.
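The chronological partitioning quoted from Cretu can be pictured with a minimal sketch. This is an illustration only and not part of the Office action. It assumes timestamped records and assumes that micro-dataset md_i covers the half-open interval from (i-1)*g up to i*g; the quoted passage states only the start time and that the subsets are disjoint. The function name and sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_into_epochs(
    records: Iterable[Tuple[float, object]], g: float
) -> Dict[int, List[object]]:
    """Group (timestamp, item) records into disjoint micro-datasets.

    Index i is 1-based, so md_i starts at time (i-1)*g as in the quoted
    passage. The end of each interval (i*g, exclusive) is an assumption.
    """
    if g <= 0:
        raise ValueError("granularity g must be positive")
    epochs: Dict[int, List[object]] = defaultdict(list)
    for t, item in records:
        if t < 0:
            raise ValueError("timestamps are assumed to be non-negative")
        i = int(t // g) + 1
        epochs[i].append(item)
    return dict(epochs)


# Hypothetical records with granularity g = 3.0 time units.
micro_datasets = partition_into_epochs(
    [(0.0, "a"), (2.5, "b"), (3.0, "c"), (7.9, "d")], g=3.0
)
```

Under the examiner's reading, each micro-dataset would then be used to train one micro-model M_i; that training step is outside the scope of this sketch.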
The suggestion/motivation for doing so would have been to resolve the well-known difficulty of acquiring an absolute, manually verified ground-truth reference dataset. As expressly recognized by Cretu, "'ground truth' for large, realistic data sets is extremely hard to determine" (Cretu: Section 1.1). By evaluating incoming candidate training data against the AI model's own historically vetted training phases (the "epochs" and "micro-models" of Cretu), a person having ordinary skill in the art would predictably enable Guo's centroid-scoring method to operate continuously and reliably without requiring the manual or external acquisition of a pristine reference dataset, thereby fully automating a robust AI data sanitization pipeline.

Per claim 9, Guo combined with Cretu discloses claim 1, Guo further disclosing treating the candidate training data as comprising poisoned training data comprises one selected from a list consisting of: removing the candidate training data set from consideration as training data for the AI model (Guo: Section IV.A... "Alice may remove them from the training set and use the clean dataset to train a sanitised model"); treating the candidate training data set as being part of a malicious attack; discarding the candidate training data set; identifying a data source of the candidate training data set; and treating the data source of the candidate training data set as a potentially malicious data source.

Per claim 10, Guo combined with Cretu discloses claim 1, Guo further disclosing treating the candidate training data set as not comprising poisoned training data comprises one selected from a list consisting of: updating the instance of the AI model using the candidate training data to obtain a new instance of the AI model (Guo: Section IV.A... "use the clean dataset to train a sanitised model", which constitutes obtaining a new updated instance of the AI model using the approved data); and adding the candidate training data set to the historical training data set to obtain an updated historical training data set.

Per claim 11, Guo combined with Cretu discloses claim 1, Cretu further disclosing prior to obtaining the candidate training data set: making an identification that a re-training condition is met for the AI model, wherein the candidate training data set is obtained in response to the identification (Cretu: Section 2.2... training models over consecutive time intervals or "epochs", where "md_i is a micro-dataset starting at time (i-1)*g and, g is the granularity for each micro-dataset", the passage of the time interval serves as the re-training condition met to obtain the next candidate data set). The rationale to combine this teaching of Cretu with Guo is the same as the parent claim.

Claim 12 is substantially similar in scope and spirit as claim 1. Therefore, the rejection of claim 1 is applied accordingly.

Claim 17 is substantially similar in scope and spirit as claim 1. Therefore, the rejection of claim 1 is applied accordingly.

Allowable Subject Matter

Claims 2-8, 13-16 and 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is the statement of reasons for the indication of allowable subject matter: The prior art disclosed by the applicant and cited by the Examiner fails to teach or suggest, alone or in combination, all the limitations of the independent claims 1, 12 and 17, further including the particular notable limitations of: performing the analysis comprises: performing, using the variable clustering criteria, a cluster analysis of the historical training data set to obtain a set of clusters; identifying a first data value of the candidate training data set; making a second determination regarding whether the first data value falls within the set of clusters; in a first instance of the second determination in which the first data value falls within the set of clusters: modifying the score to indicate a higher likelihood of the candidate training data set comprising poisoned training data; and in a second instance of the second determination in which the first data value does not fall within the set of clusters: modifying the score to indicate a lower likelihood of a candidate training data set comprising poisoned training data; and approving the first data value for AI model training purposes.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to density-based detection of poisoned training data.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN CHEN whose telephone number is (571) 272-4143. The examiner can normally be reached M-F 10-7.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALAN CHEN/
Primary Examiner, Art Unit 2125

Prosecution Timeline

Aug 31, 2023

Application Filed

Mar 16, 2026

Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/665,370

Patent 12614068

SYSTEMS AND METHODS FOR TRAINING NEURAL NETWORKS WITH SPARSE DATA

4y 2m to grant Granted Apr 28, 2026

18/146,075

Patent 12614113

USING CONSISTENCY METADATA FOR FILTERING OF MACHINE LEARNING DATA ACROSS JOBS

3y 4m to grant Granted Apr 28, 2026

17/886,055

Patent 12608600

SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION

3y 8m to grant Granted Apr 21, 2026

17/943,176

Patent 12608615

JOINTLY PRUNING AND QUANTIZING DEEP NEURAL NETWORKS

3y 7m to grant Granted Apr 21, 2026

17/808,314

Patent 12596942

BLACK-BOX EXPLAINER FOR TIME SERIES FORECASTING

3y 9m to grant Granted Apr 07, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

91%

Grant Probability

97%

With Interview (+6.3%)

2y 9m (~0m remaining)

Median Time to Grant

Low

PTA Risk

Based on 1130 resolved cases by this examiner. Grant probability derived from career allowance rate.
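
The projection figures above can be cross-checked with a short calculation. This is a minimal sketch that assumes the interview lift is added to the career allowance rate as percentage points, which is consistent with the displayed 91% and 97% but is not stated on the page; the function name is hypothetical.

```python
def allowance_rate_pct(granted: int, resolved: int) -> float:
    """Return the allowance rate as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


# Figures shown on this page: 1029 granted out of 1130 resolved cases.
career_rate = allowance_rate_pct(1029, 1130)

# Assumption: the +6.3% interview lift is additive in percentage points.
with_interview = career_rate + 6.3

print(f"Career allowance rate: {career_rate:.1f}%")
print(f"With interview: {with_interview:.1f}%")
```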