Last updated: May 28, 2026
Application No. 17/916,793
ARTIFICIAL INTELLIGENCE (AI) METHOD FOR CLEANING DATA FOR TRAINING AI MODELS

Final Rejection §102§103
Filed
Oct 03, 2022
Priority
Apr 03, 2020 — AU 2020901043 +1 more
Examiner
SALOMON, PHENUEL S
Art Unit
2146
Tech Center
2100 — Computer Architecture & Software
Assignee
Presagen Pty Ltd.
OA Round
2 (Final)
Interview Optional

— +17.8% interview lift. Examiner has a relatively high allowance rate (72%); +17.8% interview lift. A written response may suffice.
Based on 723 resolved cases, 2023–2026
Examiner Intelligence

SALOMON, PHENUEL S View full profile →
Grants 72% — above average
Career Allowance Rate
523 granted / 723 resolved
+17.3% vs TC avg
Strong +18% interview lift
Without
With
+17.8%
Interview Lift
resolved cases with interview
Typical timeline
3y 4m
Avg Prosecution
15 currently pending
Career history
742
Total Applications
across all art units
Statute-Specific Performance

§101
2.5%
-37.5% vs TC avg
§103
86.1%
+46.1% vs TC avg
§102
7.9%
-32.1% vs TC avg
§112
0.9%
-39.1% vs TC avg
Black line = Tech Center average estimate • Based on career data from 723 resolved cases
Office Action

§102 §103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

DETAILED ACTION
2.	This office action is in response to the amendment filed on 11/17/2025. Claim 1-35 are pending and have been considered below.

3.	The objection to Claims 11-12 and 19 are moot pursuant to amendment.

Claim Rejections – 35 USC § 102
4.	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

5.	Claim(s) 1-6, 8, 16-18, 20-24, 26-30, and 33-35 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Dalek et al. (US 2019/0370384).

Claim 1. Dalek discloses a computation method for cleaning a dataset for generating an Artificial Intelligence (AI) model, the method comprising:
generating a cleansed training data set comprising: dividing a training dataset into a plurality (k) of training subsets (The cluster process 316 may cluster the samples into subgroups using unsupervised Machine Learning. The clustering algorithm used by this block needs to be able to determine the number of clusters on its own, and isolate outliers) ([0050]);
training, for each training subset, a plurality (n) of Artificial Intelligence (AI) models on two or more of the remaining (k-1) training subsets and using the plurality of trained Al models to obtain an estimated label for each sample in the training subset for each trained Al model ([0033]);
removing or relabeling samples in each training subset which are consistently incorrectly predicted by the plurality of trained Al models (performing a plurality of classification processes on the set of data to automatically identify, by each classification process, a label group in the set of data; voting to determine a selected label group for the set of data based on the label group in the set of data identified by each of the plurality of classification processes) ([0033]); generating a final Al model by training one or more Al models using the cleansed training dataset (generating a curated labeled dataset, the curated labeled dataset including the set of data and the selected label group) ([0033]); deploying the final Al model (building a classifier from the curated dataset which acts as a gatekeeper for extending the curated dataset) ([0033]).

Claim 2. Dalek discloses the method as claimed in claim 1, wherein the plurality of Artificial Intelligence (AI) models comprises a plurality of model architectures (..creation of dedicated classifiers..)(abstract).

Claim 3. Dalek discloses the method as claimed in claim 1,

Claim 4. Dalek discloses the method as claimed in claim 1, wherein removing or relabeling samples in each training subset comprises: obtaining a count of the number of times each sample in each training subset is either correctly predicted, incorrectly predicted or passes a threshold confidence level, by the plurality of trained AI models; removing or relabeling samples in each training subset which are consistently wrongly predicted by comparing the predictions with a consistency threshold (“This initial classifier is then applied to all entries in the .Majority class and each sample is placed in bins based on the class confidence level of the classifier" ([0056]… "Once the machine learning model has been trained and validated, the model may be used for predictions with a confidence threshold. ") ([0025]).

Claim 5. Dalek discloses the method as claimed in claim 4, wherein the consistency threshold is estimated from the distribution of counts (“This initial classifier is then applied to all entries in the Majority class and each sample is placed in bins based on the class confidence level of the classifier" ([0056]… "Once the machine learning model has been trained and validated, the model may be used for predictions with a confidence threshold. ") ([0025]).

Claim 6. Dalek discloses the method as claimed in claim 5, wherein the consistency threshold is determined using an optimization method to identify a threshold count thatThis initial classifier is then applied to all entries in the Majority class and each sample is placed in bins based on the class confidence level of the classifier" ([0056]… "Once the machine learning model has been trained and validated, the model may be used for predictions with a confidence threshold. ") ([0025]).

Claim 8. Dalek discloses the method as claimed in claim 1, further comprising: after generating the cleansed training set and prior to generating a final AI model: iteratively retraining the plurality of trained AI models using the cleansed dataset; and generating an updated cleansed training set until a pre-determined level of performance is achieved or until there are no further samples with a count below the consistency threshold (These dedicated classifiers are retrained as the dataset grows ...) (abstract).

Claim 16. Dalek discloses a computational method for labeling a dataset for generating an Artificial Intelligence (AI) model, the method comprising:
dividing a labeled training dataset into a plurality (k) of training subsets wherein there are C labels ("The cluster process 316 may cluster the samples into subgroups using unsupervised Machine Learning. The clustering algorithm used by this block needs to be able to determine the number of clusters on its own, and isolate outliers" ([0050]) " ... (a clusterer executed by the computer system that processes the set of data to automatically identify a label group in the set of data ... ") ([0034]);
training, for each training subset, a plurality (n) of Artificial Intelligence (AI) models on two or more of the remaining (k-1) training subsets ("The method involves receiving a set of data; performing a plurality of classification processes on the set of data to automatically identify, by each classification process, a label group in the set of data ... ") ([0033]);
obtaining a plurality of label estimates for each sample in an unlabeled dataset using the plurality of trained AI models (" ... a clusterer executed by the computer system that processes the set of data to automatically identify a label group in the set of data ...) ([0034]);
repeating the dividing, training and obtaining steps C times ("This curated labelled data is then used to generate a dedicated classifier for the dataset which is used in further voting iterations") ([0034]);
assigning a label for each sample in the unlabeled dataset by using a voting strategy to combine the plurality of estimated labels for the sample (" ... voting to determine a selected label group for the set of data based on the label group in the set of data identified by each of the plurality of classification processes ... " ([0033]) "When new data is received the process is reiterated where the voting weight of the dedicated classifier is increased as the dataset grows. ") ([0034]).

Claim 17. Dalek discloses the method as claimed in claim 16, wherein the plurality of Artificial Intelligence (AI) models comprises a plurality of model architectures (" ... creation of dedicated classifiers ... " [which is considered to correspond to a plurality of model architectures] (abstract).

Claim 18. Dalek discloses the method as claimed in claim 16, wherein training, for each training subset, a plurality of Artificial Intelligence (AI) models on two or more of the remaining (k-1) training subsets comprises: training, for each training subset, a plurality of Artificial Intelligence (AI) models on all of the remaining (k-1) training subsets ("The method involves receiving a set of data; performing a plurality of classification processes on the set of data to automatically identify, by each classification process, a label group in the set of data...) ([0033]).

Claim 20. Dalek discloses the method as claimed in claim 16 wherein dividing, training, obtaining and repeating the dividing and training steps C times comprises: generating C temporary datasets from the unlabeled dataset, wherein each sample in the temporary dataset is assigned a temporary label from the C labels, such that each of the plurality of temporary datasets are distinct datasets ("The cluster process 316 may cluster the samples into subgroups using unsupervised Machine Learning. The clustering algorithm used by this block needs to be able to determine the number of clusters on its own, and isolate outliers" ([0050]), and repeating the dividing, training and obtaining steps C times comprises performing the dividing, training and obtaining steps for each of the temporary datasets, such that for each temporary datasets the dividing step comprises combining the temporary dataset with the labeled training dataset and then dividing into a plurality (k) of training subsets ("The cluster process 316 may cluster the samples into subgroups using unsupervised Machine Learning. The clustering algorithm used by this block needs to be able to determine the number of clusters on its own, and isolate outliers" ([0050]) " ... (a clusterer executed by the computer system that processes the set of data to automatically identify a label group in the set of data ... ") ([0034])… ("The method involves receiving a set of data; performing a plurality of classification processes on the set of data to automatically identify, by each classification process, a label group in the set of data ... ") ([0033])…, and the training and obtaining step comprises training, for each training subset, a plurality (n) of Artificial Intelligence (AI) models on two or more of the remaining (k-1) training subsets and using the plurality of trained AI models to obtain an estimated label for each sample in the training subset for each trained AI model ("The method involves receiving a set of data; performing a plurality of classification processes on the set of data to automatically identify, by each classification process, a label group in the set of data ... ") ([0033])

Claim 21. Dalek discloses the method as claimed in claim 20 wherein assigning a temporary label from the C labels is assigned randomly (" ... voting to determine a selected label group for the set of data based on the label group in the set of data identified by each of the plurality of classification processes ... " ([0033]).

Claim 22. Dalek discloses the method as claimed in claim 20 wherein assigning a temporary label from the C labels is estimated by an AI model trained on the training data ([0023]-[0024]).

Claim 23. Dalek discloses the method as claimed in claim 20 wherein assigning a temporary label from the C labels is assigned from the set of C labels in random order such that each label occurs once in the set of C temporary datasets (" ... voting to determine a selected label group for the set of data based on the label group in the set of data identified by each of the plurality of classification processes ... " ([0033]) "When new data is received the process is reiterated where the voting weight of the dedicated classifier is increased as the dataset grows. ") ([0034]).

Claim 24. Dalek discloses the method as claimed in claim 20 wherein the steps of combining the temporary dataset with the labeled training dataset further comprises splitting the temporary dataset into a plurality of subsets, and combining each subset with the labeled training dataset and dividing into a plurality (k) of training subsets and performing the training step ("The cluster process 316 may cluster the samples into subgroups using unsupervised Machine Learning. The clustering algorithm used by this block needs to be able to determine the number of clusters on its own, and isolate outliers" ([0050]) " ... (a clusterer executed by the computer system that processes the set of data to automatically identify a label group in the set of data ... ") ([0034]).

Claim 26. Dalek discloses the method as claimed in claim 16 wherein C is 1 and the voting strategy is a majority inferred strategy (" ... the voting block 314 may wait until all classifiers have cast their votes before deciding on the final label. Initially, the weight of each classifier is set to a fixed value and a majority vote is performed) ([0049]).

Claim 27. Dalek discloses the method as claimed in claim 16 wherein C is 1 and the voting strategy is a maximum confidence strategy ("This initial classifier is then applied to all entries in the Majority class and each sample is placed in bins based on the class confidence level of the classifier. ") [It is considered these bins correspond to the number of times a label (bin) is estimated by the model (classifier)] ([0056]).

Claim 28. Dalek discloses the method as claimed in claim 16, wherein C is greater than 1, and the voting strategy is a consensus based strategy based on the number of times each label is estimated by plurality of models ("This initial classifier is then applied to all entries in the Majority class and each sample is placed in bins based on the class confidence level of the classifier. ") [It is considered these bins correspond to the number of times a label (bin) is estimated by the model (classifier)] ([0056]).

Claim 29. Dalek discloses the method as claimed in claim 28 wherein C is greater than 1 and the voting strategy counts the number of times each label is estimated for a sample, and assigns the label with the highest count that is more than a threshold amount of the second highest count ("This initial classifier is then applied to all entries in the Majority class and each sample is placed in bins based on the class confidence level of the classifier. ") [It is considered these bins correspond to the number of times a label (bin) is estimated by the model (classifier)] ([0056]).

Claim 30. Dalek discloses the method as claimed in claim 16 wherein C is greater than 1 and the voting strategy is configured to estimate the label which is reliably estimated by a plurality of models ("... voting, by each classification process, to determine a selected label group for the minority class samples ... ") (Claim 1).

Claim 33 represents the system of claim 1 and is rejected along the same rationale.

Claim 34 represents the system of claim 1 and is rejected along the same rationale and Dalek further discloses the one or more processors are further configured to receive input data via the communications interface, process the input data using the stored final trained AI model to generate a model result, and the communications interface is configured to send the model result to a user interface or data storage device ([0030]).

Claim 35 represents the system of claim 16 and is rejected along the same rationale.

Claim Rejections - 35 USC § 103
6.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

7.	Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Dalek et al. (US 2019/0370384) in view of Yadav et al. (US 2016/0359886).

Claim 7. Dalek discloses the method as claimed in claim 6, but fails to explicitly disclose wherein determining a consistency threshold comprises: generating a histogram of the counts where each bin of the histogram comprises the number of samples in the training dataset with the same count where the number of bins is the number of training subsets multiplied by number of AI models; generating a cumulative histogram from the histogram; calculating a weighted difference between each pair of adjacent bins in the cumulative histogram; setting the consistency threshold as the bin that
However, Yadav discloses generating a histogram of the counts where each bin of the histogram comprises the number of samples in the training dataset with the same count where the number of bins is the number of training subsets multiplied by number of AI models; generating a cumulative histogram from the histogram; calculating a weighted difference between each pair of adjacent bins in the cumulative histogram; setting the consistency threshold as the bin thatDalek. One would have been motivated to do so to facilitate behavior that does not conform to an expected pattern or data.

8.	Claim(s) 9-10, 13-15, 25 is/are rejected under 35 U.S.C. 103 as being unpatentable over Dalek et al. (US 2019/0370384) in view of McCourt, Jr (US 2020/0019884)

Claim 9. Dalek discloses the method as claimed in claim 1, wherein prior to generating the cleansed dataset the training dataset is tested for positive predictive power and the training dataset is only cleaned if the positive predictive power is within a predefined range, wherein estimating the positive predictive power comprises:
dividing a training dataset into a plurality of validation subsets ("The cluster process 316 may cluster the samples into subgroups using unsupervised Machine Learning. The clustering algorithm used by this block needs to be able to determine the number of clusters on its own, and isolate outliers" ([0050]) " ... (a clusterer executed by the computer system that processes the set of data to automatically identify a label group in the set of data ... ") ([0034]);
training, for each validation subset, a plurality of Artificial Intelligence (AI) models on two or more of the remaining (k-1) validation subsets ("The method involves receiving a set of data; performing a plurality of classification processes on the set of data to automatically identify, by each classification process, a label group in the set of data ... ") ([0033]);
obtaining a first count of the number of times each sample in the validation dataset is either correctly predicted, incorrectly predicted, or passes a threshold confidence level, by the plurality of trained AI models (“This initial classifier is then applied to all entries in the Majority class and each sample is placed in bins based on the class confidence level of the classifier" ([0056]… "Once the machine learning model has been trained and validated, the model may be used for predictions with a confidence threshold. ") ([0025]);
randomly assigning a label or outcome to each sample (" ... voting to determine a selected label group for the set of data based on the label group in the set of data identified by each of the plurality of classification processes ... " ([0033]) "When new data is received the process is reiterated where the voting weight of the dedicated classifier is increased as the dataset grows. ") ([0034]);
training, for each validation subset, a plurality of Artificial Intelligence (AI) models on two or more of the remaining (k-1) validation subsets (The method involves receiving a set of data; performing a plurality of classification processes on the set of data to automatically identify, by each classification process, a label group in the set of data ... ") ([0033]);
estimating the positive predictive power by comparing the first count and the second count ([0056]).
Dalek does not explicitly disclose obtaining a second count of the number of times each sample in the validation dataset is either correctly predicted, incorrectly predicted, or passes a threshold confidence level, by the plurality of trained AI models when random assigned labels are used.
However, McCourt discloses obtaining a second count of the number of times each sample in the validation dataset is either correctly predicted, incorrectly predicted, or passes a threshold confidence level, by the plurality of trained AI models when random assigned labels are used (claim 1). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature in Dalek. One would have been motivated to do so to facilitate classification of trained data.

Claim 10. Dalek and McCourt disclose the method as claimed in claim 9, McCourt further discloses wherein the method is repeated for each dataset in a plurality of datasets and the step of generating a final AI model by training one or more AI models using the cleansed training dataset comprises: generating an aggregated dataset using the plurality of cleaned datasets; generating a final AI model by training one or more AI models using the aggregated dataset ([0144]-[0146]). One would have been motivated to do so to facilitate classification of trained data.

Claim 13. Dalek discloses the method as claimed inThis initial classifier is then applied to all entries in the .Majority class and each sample is placed in bins based on the class confidence level of the classifier" ([0056]… "Once the machine learning model has been trained and validated, the model may be used for predictions with a confidence threshold. ") ([0025]); and Dalek does not explicitly disclose the step of removing or relabeling samples in each training set with a count below a consistency threshold comprises is performed separately for each noisy class and each correct class, and the consistency threshold is a per-class consistency threshold.
However, McCourt discloses removing or relabeling samples in each training set with a count below a consistency threshold comprises is performed separately for each noisy class and each correct class, and the consistency threshold is a per-class consistency threshold ("One way to more accurately determine a (e.g., any) binary classifier's accuracy when the ground truth labels include inaccuracies is to flag all of the incorrect labels (e.g., bad points) of training data which do not fit the model and either correct (e.g., label as the opposite) or remove the training data for these bad points") ([0142]). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature in Dalek. One would have been motivated to do so to facilitate classification of trained data.

Claim 14. Dalek discloses the method claimed inThe cluster process 316 may cluster the samples into subgroups using unsupervised Machine Learning. The clustering algorithm used by this block needs to be able to determine the number of clusters on its own, and isolate outliers") ([0050]); randomizing the class labels in the training set ([0052]); training an AI model on the training set withvalidation set and a second metric for the test set ([0023]-[0024]); Dalek does not explicitly disclose excluding the dataset if the first metric and the second metric are not within a predefined range.
However, McCourt discloses excluding the dataset if the first metric and the second metric are not within a predefined range ("One way to more accurately determine a (e.g., any) binary classifier's accuracy when the ground truth labels include inaccuracies is to flag all of the incorrect labels (e.g., bad points) of training data which do not fit the model and either correct (e.g., label as the opposite) or remove the training data for these bad points") ([0142]). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature in Dalek. One would have been motivated to do so to facilitate classification of trained data.

Claim 15. Dalek discloses the method claimed in claim 1, further comprising assessing the transferability of a dataset comprising: splitting the dataset into a training set, validation set and test set (The cluster process 316 may cluster the samples into subgroups using unsupervised Machine Learning. The clustering algorithm used by this block needs to be able to determine the number of clusters on its own, and isolate outliers") ([0050]); training an AI model on the training set, and testing the AI model using the validation set and test sets ([0055]); Dalek does not explicitly disclose for each epoch in a plurality of epochs, estimating a first metric of the validation set and a second metric of the test set; and estimating the correlation of the first metric and the second metric over the plurality of epochs.
However, McCourt discloses for each epoch in a plurality of epochs, estimating a first metric of the validation set and a second metric of the test set; and estimating the correlation of the first metric and the second metric over the plurality of epochs ([0057]). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature in Dalek. One would have been motivated to do so to facilitate classification of trained data.

Claim 25. Dalek discloses the method as claimed in claim 24 but fails to explicitly disclose wherein the size of each subset is less than the 20% of the size of the training set.
However, McCourt discloses wherein the size of each subset is less than the 20% of the size of the training set ([0112]). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature in Dalek. One would have been motivated to do so to facilitate classification of trained data.

9.	Claim(s) 31-32 is/are rejected under 35 U.S.C. 103 as being unpatentable over Dalek et al. (US 2019/0370384) in view of XU et al. Weakly supervised histopathology cancer image segmentation and classification (22 February 2014) (Hereinafter XU).

Claim 31. Dalek discloses the method as claimed in claim 16, but fails to explicitly disclose wherein the dataset is a healthcare dataset.
However, XU discloses wherein the dataset is a healthcare dataset (… we conduct experiments on two medical image datasets….. Datasets. Colon histopathology images with four cancer types are used…) (section 4 Experiments). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature in Dalek. One would have been motivated to do so to facilitate a better classification of trained data.
Claim 32. Dalek discloses the method as claimed in claim 31 but fails to explicitly disclose wherein the healthcare dataset comprises a plurality of healthcare images.
However, XU discloses wherein the healthcare dataset comprises a plurality of healthcare images (… we conduct experiments on two medical image datasets….. Datasets. Colon histopathology images with four cancer types are used…) (section 4 Experiments). Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate this feature in Dalek. One would have been motivated to do so to facilitate a better classification of trained data.

Allowable Subject Matter
Claims 11-12 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments
10.	Applicant’s arguments filed 11/17/2025 have been fully considered but they are not persuasive.
As per claim 1, Applicants argue “However, the dividing of the training set and training a plurality of models are not disclosed as being done in an order which would enable the training of the plurality of AI models on the two or more remaining (k-1) training subsets, and using the plurality of trained AI models to obtain an estimated label for each sample in the training subset for each trained AI model.”
In response, Examiner respectfully disagrees and submits:
First, the claims do not require a specific temporal sequence beyond what is logically inherent in performing cross-partition training and prediction. Under the broadest reasonable interpretation (BRI) consistent with the specification (see MPEP §2111), the claimed steps encompass training multiple models on complementary subsets of a dataset and generating predictions for samples using the trained models. The order in which dataset partitioning, model training, and prediction occur is dictated by functional necessity and would have been understood by one of ordinary skill in the art.
The cited reference(s) disclose partitioning training data into subsets and training multiple models on respective subsets for validation, ensemble learning, or robustness improvement. Inherent in such techniques—such as k-fold cross-validation or ensemble partition training—is that a model trained on one subset (or combination of subsets) is used to generate predictions for data not used in its training. Thus, training on (k-1) subsets and generating predictions for the remaining subset is a well-understood and conventional machine learning workflow.
Furthermore, to the extent Applicant argues that the reference fails to explicitly state that each trained model is used to estimate labels for samples in a different subset, such functionality is inherent in multi-model validation and ensemble frameworks. See In re Best, 562 F.2d 1252, 1255 (CCPA 1977) (where the claimed and prior art processes are substantially the same, inherent characteristics are presumed). Training multiple models on partitioned data necessarily implies their use in evaluating or predicting labels across dataset partitions.

Applicant argues that there is no disclosure or suggestion of taking the entire dataset and dividing this into k datasets and training n models for each k datasets,” At best, Dalek describes a single minority class dataset and trains n models on this one dataset.
The Examiner respectfully disagrees.
First, under the broadest reasonable interpretation (BRI) consistent with the specification (see MPEP §2111), the claimed limitation does not require an explicit disclosure of dividing the dataset into “k datasets” using the same terminology. Rather, the limitation broadly encompasses partitioning or segmenting a dataset into subsets and training multiple models using those subsets.
Dalek discloses generating multiple models using subsets of data, including minority-class data, for purposes such as improving classification robustness and addressing imbalance. The use of segmented or resampled datasets for training multiple models constitutes a form of dataset partitioning. The fact that Dalek emphasizes minority class datasets does not negate that the dataset is being partitioned or manipulated into subsets for separate model training. Training multiple models on defined subsets of data reasonably corresponds to dividing data into k subsets and training n models accordingly.
Moreover, even assuming arguendo that Dalek explicitly describes training multiple models on a minority dataset derived from a larger dataset, it would have been obvious to one of ordinary skill in the art to partition the entire dataset into multiple subsets—including majority and minority class partitions—and train corresponding models as a predictable variation. Partitioning datasets (e.g., folds, shards, stratified subsets, bootstrap samples) and training multiple models are well-understood and routine machine learning techniques used for ensemble learning, imbalance handling, and performance optimization. The selection of a particular number of partitions (k) or models (n) represents the optimization of a result-effective variable, which is considered obvious absent a showing of criticality. See MPEP §2144.05.

Applicant argues that Dalek’s disclosure of generating a curated labeled dataset does not teach or suggest the claimed feature of “removing or relabeling samples in each training subset which are consistently incorrectly predicted by the plurality of trained AI models.” The Examiner respectfully disagrees.
Dalek discloses generating a curated labeled dataset through iterative evaluation and refinement of training data, including identifying misclassified or low-confidence samples and modifying the dataset accordingly. The process of curating labeled data necessarily involves identifying incorrectly labeled or problematic samples and either correcting (relabeling) or excluding (removing) such samples to improve model performance.
Under the broadest reasonable interpretation (BRI) consistent with the specification (see MPEP §2111), the claimed limitation encompasses detecting samples that are repeatedly mispredicted by trained models and taking corrective action such as removal or relabeling. Dalek’s disclosure of dataset curation based on model evaluation corresponds to this functionality, as identifying and addressing misclassified samples inherently involves evaluating prediction outcomes and modifying the dataset to improve training quality.
Furthermore, even assuming arguendo that Dalek does not explicitly describe the specific phrasing “consistently incorrectly predicted by a plurality of trained AI models,” the use of multiple models to identify unreliable or noisy samples represents a well-known data cleansing and ensemble validation technique in machine learning. 
Additionally, the selection of criteria for identifying problematic samples (e.g., repeated misclassification across multiple models) constitutes optimization of a result-effective variable aimed at improving dataset quality and model accuracy. Such optimization is considered obvious absent evidence of criticality. See MPEP §2144.05.

As per claim 2, Applicant argue “The use of different model architectures ensures the models have different weights and biases, and so the models will give statistically different results. Thus, using different architectures is likely to reduce biases, as the voting will be diversified among the different architectures. This is not disclosed or suggested by Dalek, and thus, claim 2 is not anticipated by Dalek for this additional reason.

In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., The use of different model architectures ensures the models have different weights and biases, and so the models will give statistically different results. Thus, using different architectures is likely to reduce biases, as the voting will be diversified among the different architectures) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

As per claim 3, Applicant argues that the claimed method increases bias mitigation by ensuring that each training subset is trained on by n–1 models, and asserts that Dalek does not disclose training a plurality of AI models on at least two training subsets.
The Examiner respectfully disagrees.
First, under the broadest reasonable interpretation (BRI) consistent with the specification (see MPEP §2111), the claims require training multiple AI models on partitioned subsets of a dataset. Dalek discloses generating multiple models using segmented or curated data subsets to improve robustness and model performance. Training multiple models on defined subsets of data inherently involves training on at least two subsets, whether explicitly described as such or as part of a resampling, stratification, or imbalance-handling framework.
Second, Applicant’s argument regarding bias mitigation reflects an asserted advantage or intended result of the claimed configuration, not a structural or procedural distinction over the prior art. The fact that a claimed arrangement may improve bias mitigation does not render the arrangement patentable if the underlying steps—dataset partitioning and multi-model training—are themselves well-known and conventional techniques. See MPEP §2144.04 (discovery of an additional benefit of a known process does not render the process patentable).
Moreover, training multiple models on complementary subsets of a dataset—such as in k-fold cross-validation, ensemble learning, or bootstrap aggregation (bagging)—is a well-understood and routine technique in machine learning. In such frameworks, each subset is necessarily excluded from one model’s training and included in others, resulting in multiple models being trained on overlapping subsets. Ensuring that each subset is trained on by n–1 models is a predictable outcome of standard partition-and-train methodologies. The specific numeric relationship between subsets and models represents the optimization of a result-effective variable (i.e., number of folds or models), which is considered obvious absent evidence of criticality. See MPEP §2144.05.

As per claim 4, Applicant argues that Dalek differs from the claimed invention because Dalek’s classes or labels indicate noisiness (i.e., whether data is “good” or “bad”), and that Dalek trains an AI model on a small pre-labeled dataset to extrapolate noise prediction, whereas the claimed method does not require a pre-labeled initial dataset. The Examiner respectfully disagrees.
First, Applicant’s argument improperly focuses on the semantic nature of Dalek’s labels (e.g., “good” or “bad”) rather than on the underlying training methodology. Dalek discloses using labeled data to train models that evaluate dataset quality and identify problematic samples. The particular meaning assigned to the labels (noise quality versus another characteristic) does not materially distinguish the structural or procedural aspects of the training process. Under MPEP §2111, claim limitations are interpreted according to their broadest reasonable interpretation, and the claims do not exclude training using labels indicative of data quality or reliability.
Second, even if Dalek begins with a pre-labeled dataset for training purposes, the use of an initial labeled subset to bootstrap model training is a well-understood and routine technique in machine learning. It would have been obvious to one of ordinary skill in the art to employ labeled subsets—whether large or small—to train models for evaluating or refining broader datasets. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398 (2007) (obviousness analysis considers common sense and routine variations). The fact that Dalek uses a pre-labeled dataset to train a predictive model does not distinguish it from the claimed method if the claimed method similarly performs model training and dataset refinement operations.
Third, to the extent Applicant argues that the claimed method “does not require” a pre-labeled initial dataset, the absence of an express requirement does not affirmatively exclude the presence of one. A claim that is silent regarding a particular feature does not exclude prior art embodiments that include that feature. See MPEP §2111.04 (limitations not positively recited in the claim are not read into the claim). Accordingly, even if Dalek employs a pre-labeled dataset, such disclosure does not avoid the rejection where the claim does not prohibit or exclude such an implementation.
Finally, the alleged difference in purpose—predicting noise versus performing another form of dataset evaluation—does not render the claimed method patentably distinct when the underlying technical steps (model training, evaluation of data samples, and dataset refinement) are substantially similar. A change in intended use or recognition of a different advantage does not confer patentability where the structural and procedural steps are otherwise taught or suggested in the prior art. See MPEP §2144.04.

As per claim 7, 
In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., specify how to assess where the changes in the assessment by only one AI model out of the total number of AI models…) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).


As per claim 8, Applicant argues that Dalek trains AI models on a dataset that is progressively curated by folding predicted labels into the original dataset (i.e., an augmented dataset), whereas claim 8 recites training newly trained AI models on a “cleansed dataset,” and therefore Dalek does not teach or suggest the claimed method. The Examiner respectfully disagrees.
First, the distinction drawn by Applicant between an “augmented dataset” and a “cleansed dataset” does not reflect a meaningful structural or procedural difference in the context of machine learning workflows. Dalek discloses iteratively refining a dataset by incorporating prediction results and updating labels or excluding unreliable samples to improve model performance. Whether characterized as “curating,” “augmenting,” or “cleansing,” the underlying operation involves modifying the dataset based on model predictions to improve data quality and subsequent training.
Second, training models on a dataset that has been refined or updated through evaluation inherently constitutes training on a dataset that has been altered to improve reliability—i.e., a cleansed dataset. The fact that Dalek describes incorporating predicted labels into the dataset does not preclude the dataset from simultaneously being filtered, corrected, or otherwise improved. Dataset refinement through relabeling or removal of noisy samples is a conventional data cleansing technique in machine learning.
Third, to the extent Applicant asserts that the claimed “cleansed dataset” excludes the incorporation of predicted labels, the claims do not expressly prohibit updating labels or incorporating model-generated labels. Under the broadest reasonable interpretation (see MPEP §2111), a “cleansed dataset” encompasses any dataset that has been processed to improve data quality, including through relabeling, correction, or removal of samples. Dalek’s iterative curation process reasonably falls within this scope.
Finally, even assuming arguendo that Dalek’s approach differs slightly in terminology or sequence, modifying a dataset through relabeling or removal and then retraining models on the resulting dataset represents a routine and predictable variation of known data refinement techniques.

As per claims 9-15, 
In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., using the plurality of specific cleaned datasets that are generated using an initial plurality of datasets and applying the Untrainable Data Cleansing ("UDC") method for each dataset separately) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

As per claims 16 and 20, Supra claim 1 rebuttal.
As per claims 33-35, Supra claim 1 rebuttal.

Conclusion
11.	THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

12.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Phenuel S. Salomon whose telephone number is (571) 270-1699.  The examiner can normally be reached on Mon-Fri 7:00 A.M. to 4:00 P.M. (Alternate Friday Off) EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-3800.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHENUEL S SALOMON/Primary Examiner, Art Unit 2146
Read full office action
Prosecution Timeline

Oct 03, 2022
Application Filed
Jul 15, 2025
Non-Final Rejection mailed — §102, §103
Nov 17, 2025
Response Filed
Feb 23, 2026
Final Rejection mailed — §102, §103
May 26, 2026
Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/075,797
Patent 12632720
EVALUATING SYSTEM GENERATED HISTORICAL TRANSACTION TIMELINE IMAGES
5y 7m to grant Granted May 19, 2026
17/309,923
Patent 12625267
PLANE DETECTION METHOD AND DEVICE BASED ON LASER SENSOR
4y 10m to grant Granted May 12, 2026
18/097,292
Patent 12626191
KNOWLEDGE GRAPH FUSION METHOD BASED ON ITERATIVE COMPLETION
3y 3m to grant Granted May 12, 2026
17/137,285
Patent 12619867
Automated Creation of Machine-learning Modeling Pipelines
5y 4m to grant Granted May 05, 2026
17/568,644
Patent 12602348
DATA ACTOR AND DATA PROCESSING METHOD THEREOF
4y 3m to grant Granted Apr 14, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
72%
Grant Probability
90%
With Interview (+17.8%)
3y 4m (~0m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 723 resolved cases by this examiner. Grant probability derived from career allowance rate.