DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on April 11, 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendments
This office action is responsive to the Preliminary Amendment to the claims received on March 29, 2024.
Currently Pending Claim(s) 1-12
Independent Claim(s) 1 and 12
Amended Claim(s) 1-12
Canceled Claim(s) 13-14
Drawings
The drawings are objected to because the unlabeled rectangular box(es) shown in Fig. 8 and Fig. 9 should be provided with descriptive text labels.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
Applicant is reminded of the proper language and format for an abstract of the disclosure. The form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided. The submitted Abstract includes the legal phraseology “the said class” in the last sentence.
Additionally, the abstract of the disclosure does not commence on a separate sheet in accordance with 37 CFR 1.52(b)(4) and 1.72(b). A new abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text.
See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 2, 6, and 9 are rejected under 35 U.S.C. 112(b) for failing to clearly define the claimed invention.
Regarding claim 2, the limitation "the input data" has insufficient antecedent basis. Additionally, it is unclear from the claims what this input data is and what it is used for.
Regarding claim 6, “the previous input data for the class” has insufficient antecedent basis. Additionally, it is unclear from the claims what this input data is and what it is used for.
Claim 9 recites the limitation "the decision logic." There is insufficient antecedent basis for this limitation in the claim.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1 and 9-12 are rejected under 35 U.S.C. 102 as being anticipated by Ghesu et al. (Quantifying and leveraging predictive uncertainty for medical image assessment. arXiv:2007.04258v1), hereafter Ghesu.
Regarding claim 1, Ghesu teaches A system (SYS) for providing training data ([Section 3] “We propose a model for joint sample classification and predictive uncertainty estimation, following the Dempster-Shafer theory of evidence.”),
a memory that stores a plurality of instructions; and a processor coupled to the memory and configured to execute the plurality of instructions (Ghesu teaches a system which runs machine learning classifiers. This system would require at least a processor and memory to perform operations.) to:
receive at least one classification result for a class from plural pre-defined classes wherein the classification result has been produced by a trained machine learning model in response to processing an input image (Throughout all of Section 3, Ghesu denotes the classification of an image as yk, where k denotes the image index in an image dataset of size N, and y denotes the classification of the image. [Section 3.2] “Let us assume a labeled training dataset is given as {Ik, yk}, k = 1, …, N, consisting of N pairs of images Ik with a binary class assignment yk ∈ {0, 1}. We propose to use a parametric model – a deep convolutional neural network to estimate the per-class evidence values from the image data.” Additionally, probabilistic predictions output from the model are used in Section 3.2.);
analyze input data comprising a received classification result score and an uncertainty value associated with the classification result score (Fig. 6 shows the values û and p̂ determined using the experiments explained in Section 4. û is an uncertainty value, and p̂ is an estimated probability of belonging to a class output by a classification model.), and
output, per the received classification result, an associated indication whether the input image is or is not useful for re-training the model with respect to the class, wherein the usefulness is based on the uncertainty value associated with the classification result score of the input image (Ghesu teaches Uncertainty-driven bootstrapping for determining whether or not input images are useful for training. [Section 4.1.5] “Using uncertainty driven bootstrapping one can filter the training data, i.e., remove a fraction of training cases with highest uncertainty, with the goal to reduce label noise. On the example of pleural effusion, based on the ChestX-Ray8 dataset, we show that one can retrain the system on the remaining data and achieve better performance on an unseen dataset. Performance is reported as a triple of values [AUC; F1-score (for the positive class); F1-score (for the negative class)].”),
wherein the analysis is based on a deployment experience set, and wherein the deployment experience set comprises all images processed by the model since the last training (Ghesu teaches performing analysis for a training set, such as a ChestX-Ray8 dataset, throughout the experiment in Section 4. The entire data set is run through the model to obtain predictions and calculate the uncertainty for each sample, then a fraction of the data set is determined to be usable or unusable for retraining. Additionally, the method of uncertainty driven bootstrapping (See section 3.3) is applicable to any data set.).
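For illustration only, and not part of the record: the uncertainty-driven bootstrapping relied upon above (Ghesu, Sections 3.3 and 4.1.5) can be sketched in a few lines. The function name, the sample values, and the choice of ϵ = 0.4 are hypothetical, not taken from Ghesu.

```python
# Sketch of uncertainty-driven bootstrapping: discard the fraction epsilon
# of training samples with the highest predictive uncertainty and keep the
# remainder for retraining. Names and values are illustrative only.
import numpy as np

def bootstrap_filter(uncertainties, epsilon):
    """Return a boolean mask: True = keep the sample for retraining."""
    u = np.asarray(uncertainties, dtype=float)
    n_drop = int(np.floor(epsilon * len(u)))
    keep = np.ones(len(u), dtype=bool)
    if n_drop > 0:
        # indices of the n_drop largest uncertainty values
        keep[np.argsort(u)[-n_drop:]] = False
    return keep

u_hat = [0.05, 0.90, 0.10, 0.75, 0.20]   # per-sample uncertainty estimates
print(bootstrap_filter(u_hat, epsilon=0.4).tolist())
# [True, False, True, False, True] -- the two most uncertain samples are dropped
```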
Regarding claim 9, Ghesu teaches the system of claim 1, wherein the decision logic is configured to trigger retraining by a training system of the model, based on a training data set including one or more such input images indicated as useful for retraining (Ghesu teaches retraining a classification model after removing the images that are not useful for training. Dϵ is the dataset reduced in size for retraining after a fraction, denoted by ϵ, of images are removed. [Section 3.3] “The hypothesis is that by retraining the model on dataset Dϵ one can increase the robustness during training and improve its performance on unseen data.”).
Regarding claim 10, Ghesu teaches the system of claim 1, wherein the trained machine learning model is a classifier of any one of: artificial neural network model, support vector machine, decision tree, random forest, k-nearest neighbor, naive Bayes, linear discriminate analysis, and ensemble and boosting techniques (The background of Section 2 teaches known classifier models for classification of medical images. Examples taught in Section 2.1 include convolution neural networks, ensembles, dense neural networks, etc. [Section 3.2] “We propose to use a parametric model – a deep convolutional neural network to estimate the per-class evidence values from the image data.”).
Regarding claim 11, Ghesu teaches the system of claim 1, wherein the input image is of any one of: X-ray, magnetic resonance, ultrasound, nuclear (Ghesu teaches experiments for training classifiers on x-rays [Fig. 6], ultrasound images [Fig. 8], MRI images [Fig. 11], and determining the uncertainty for each image. The use of nuclear images would be performed similarly and not provide a meaningful change or inventive concept to overcome the art of Ghesu.).
Regarding claim 12, Ghesu teaches a computer-implemented method for providing training data ([Section 3] “We propose a model for joint sample classification and predictive uncertainty estimation, following the Dempster-Shafer theory of evidence.”), the method comprising:
receiving at least one classification result for a class from plural pre-defined classes, the classification result produced by a trained machine learning model in response to processing an input image (Throughout all of Section 3, Ghesu denotes the classification of an image as yk, where k denotes the image index in an image dataset of size N, and y denotes the classification of the image. [Section 3.2] “Let us assume a labeled training dataset is given as {Ik, yk}, k = 1, …, N, consisting of N pairs of images Ik with a binary class assignment yk ∈ {0, 1}. We propose to use a parametric model – a deep convolutional neural network to estimate the per-class evidence values from the image data.” Additionally, probabilistic predictions output from the model are used in Section 3.2.);
analyzing input data comprising the received classification result score and an uncertainty value associated with the classification result score (Fig. 6 shows the values û and p̂ determined using the experiments explained in Section 4. û is an uncertainty value, and p̂ is an estimated probability of belonging to a class output by a classification model.), and
outputting per the received classification result, an associated indication whether the input image is or is not useful for re-training the model with respect to the class, wherein the usefulness is based on the uncertainty value associated with the classification result score of the input image (Ghesu teaches Uncertainty-driven bootstrapping for determining whether or not input images are useful for training. [Section 4.1.5] “Using uncertainty driven bootstrapping one can filter the training data, i.e., remove a fraction of training cases with highest uncertainty, with the goal to reduce label noise. On the example of pleural effusion, based on the ChestX-Ray8 dataset, we show that one can retrain the system on the remaining data and achieve better performance on an unseen dataset. Performance is reported as a triple of values [AUC; F1-score (for the positive class); F1-score (for the negative class)].”), and
wherein the analysis is based on a deployment experience set, wherein the deployment experience set comprises all images processed by the model since the last training (Ghesu teaches performing analysis for a training set, such as a ChestX-Ray8 dataset, throughout the experiment in Section 4. The entire data set is run through the model to obtain predictions and calculate the uncertainty for each sample, then a fraction of the data set is determined to be usable or unusable for retraining.).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Ghesu et al. (Quantifying and leveraging predictive uncertainty for medical image assessment. arXiv:2007.04258v1) and further in view of Kim et al. (US 2020/0034661 A1), hereafter Kim.
Regarding claim 2, Ghesu teaches the system of claim 1 which operates on a medical image data set, such as the ChestX-Ray8 dataset. In Section 3.3, uncertainty driven bootstrapping is taught generally and could be applied to any data set. However, Ghesu does not specifically teach that new data is processed and added to the training dataset. Thus, Ghesu fails to teach wherein the input data for the images that have been processed by the model since the last training is addable to the deployment experience set.
However, Kim teaches wherein the input data for the images that have been processed by the model since the last training is addable to the deployment experience set (Fig. 5 shows a flowchart of Kim’s invention. New sensor data is read in, evaluated for uncertainty, and added to the training data if the uncertainty is not too high.).
Ghesu and Kim are analogous art to the claimed invention, because both teach methods of evaluating training data for uncertainty and determining if training samples are useful or not for training machine learning models. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ghesu’s method by adding new training data into the experience training set. This modification would allow Ghesu’s method to be applied to systems that continue to learn from data over time (For example, Kim teaches applying a similar method to autonomous vehicles [0099-0108] where machine learning models are constantly being improved by recording sensor data, determining if the sensor data is useful for training, and adding useful data to the training set [Fig. 5].).
Regarding claim 6, Ghesu teaches the system of claim 1, including calculating uncertainty values and determining which samples have the most uncertainty. Although this method inherently involves outliers being the samples with the most uncertainty (see Fig. 1 and the discussion in Section 2.3), Ghesu does not specifically teach performing an outlier analysis. Thus, Ghesu fails to teach wherein the analysis by the system includes the system performing an outlier analysis in respect of the uncertainty value relative to uncertainty values of the previous input data for the class.
However, Kim teaches wherein the analysis by the system includes the system performing an outlier analysis in respect of the uncertainty value relative to uncertainty values of the previous input data for the class (Fig. 7 shows identifying data that is most near class boundaries, such as the data in 725 and 724. Kim teaches identifying data samples closest to class boundaries, which have higher uncertainty [Fig. 7, 0210-0222], and determining the uncertainty of data to compare to criterion (first and second reference values) for determining if data is useful for retraining [Fig. 9, 0224-0238]. Additionally, Kim teaches an example of determining the model confidence on a prediction for a sample based on the prediction’s distance from a class [Fig. 12]; thus, outliers in the data are identified as having higher uncertainty. Kim also compares confidence values to predetermined reference values to determine outliers which are outside of class domains [Fig. 12, 0254].).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ghesu’s invention by performing an outlier analysis to identify outliers with high uncertainty that may be not useful for retraining. This modification would further expand upon Ghesu’s teachings (Ghesu shows how uncertainty is represented graphically by data outliers in Fig. 1.) and apply them to open-set recognition models with more than one class (See Kim Fig. 7 and 0214-0216. Kim teaches that models trained on specific domains cannot grasp data outside of those domains, such as outliers. Open set-recognition models can be improved by indicating when data with high-uncertainty is outside of the known domains since models cannot provide high prediction confidence on data outside of the known domains.).
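For illustration only: one simple form of the outlier analysis discussed above, comparing a new uncertainty value against the uncertainty values previously observed for the class, is a z-score test. The function name, the sample values, and the threshold z_max = 2.0 are assumptions, not taken from Kim or Ghesu.

```python
# Sketch of an outlier analysis on uncertainty values: a new sample's
# uncertainty is flagged as an outlier if it lies more than z_max standard
# deviations from the mean of the class's previously observed uncertainties.
import statistics

def is_uncertainty_outlier(u_new, previous_uncertainties, z_max=2.0):
    mu = statistics.mean(previous_uncertainties)
    sigma = statistics.pstdev(previous_uncertainties)
    if sigma == 0:
        # degenerate history: any deviation at all counts as an outlier
        return u_new != mu
    return abs(u_new - mu) / sigma > z_max

class_history = [0.10, 0.12, 0.11, 0.09, 0.13]
print(is_uncertainty_outlier(0.85, class_history))  # True: far outside the usual range
print(is_uncertainty_outlier(0.11, class_history))  # False
```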
Claims 3-5 are rejected under 35 U.S.C. 103 as being unpatentable over Ghesu (Quantifying and leveraging predictive uncertainty for medical image assessment. arXiv:2007.04258v1) in view of Erenrich et al. (US 10,325,224 B1), hereafter Erenrich.
Regarding claim 3, Ghesu teaches the system of claim 1, wherein the analysis by the system is based on at least one criterion (Ghesu teaches determining usable and unusable images by a criterion. Ghesu teaches a threshold ut, which is the maximum uncertainty an image label prediction can have to be considered useful. [4.1.3] “…set a threshold ut and configure the system to not output its prediction on any cases with an expected uncertainty larger than ut.” [4.3.1] “To evaluate the efficacy of uncertainty-driven sampling rejection with high predictive uncertainty, we measured the classification performance of the trained model in different coverage settings.” Additionally, Ghesu teaches that a criterion could be a fraction of the data set, where the fraction of samples with the highest uncertainty are considered to be not useful. [4.1.3] “Formally, we refer to the degree of sample rejection using the term coverage, as an expected percentile of cases to be rejected.” [4.1.5] “Using uncertainty driven bootstrapping one can filter the training data, i.e., remove a fraction of training cases with highest uncertainty, with the goal to reduce label noise.” Also note that using a criterion, such as a threshold or percentage, to determine how many samples are not useful is well-known in the art as shown across multiple references not used in this rejection. For example, see Step S511 in US 2020/0034661 A1 from the IDS.), and
wherein the criterion is adaptable (Table 2 shows examples of changing the coverage value. Section 4.1.5 teaches changing the criterion using uncertainty-driven bootstrapping to retrain on only useful samples. Not useful samples were a fraction (criterion) of the total samples with the highest uncertainty.).
Ghesu teaches using an adaptable criterion for determining what fraction of the samples are useful, and this would change the number of useful samples based on the size of the training set. However, Ghesu does not teach changing the criterion itself based on the size of the training set. Thus, Ghesu fails to teach wherein the criterion is adaptable based on the size of the deployment experience set.
However, Erenrich teaches wherein the criterion is adaptable based on the size of the deployment experience set (See Fig. 8 and Col. 22, line 12 – Col. 23, line 25. Erenrich teaches selecting batches of training data with the most uncertainty (i.e. the least useful data for retraining) for manual labeling by a human. The amount of uncertain data samples manually labeled is dependent upon the size of the training set. [Col. 23, lines 5-9] “The threshold number may be determined according to at least one of batch size, estimated time for a user to label an example, system processing speed, training dataset size, and other factors.”).
Ghesu and Erenrich are analogous art to the claimed invention, because both teach methods of determining the uncertainty of predictions and determining which data samples are useful for training a machine learning model. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ghesu’s method by changing the criterion based on the size of the training data set. This modification would allow manual labeling of uncertain samples without large amounts of downtime during labeling, since the number of uncertain samples to be labeled is based on the dataset size (Erenrich [Col. 23, lines 17-25] “Process 800 may provide an efficient and streamlined method of selecting training examples for labeling in a supervised active machine learning training process. The process 800 may be used to select training examples to accelerate the machine learning process by selecting examples for which the model is least certain. The process 800 may further be used to select optimal sizes of training example batches so as to avoid user downtime.” [Col. 23, lines 5-9] “The threshold number may be determined according to at least one of batch size, estimated time for a user to label an example, system processing speed, training dataset size, and other factors.”).
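For illustration only: a size-dependent criterion of the kind attributed to Erenrich above might look like the following sketch. The function name, the base fraction, and the cap are hypothetical; Erenrich's actual threshold also depends on batch size, estimated labeling time, and processing speed (Col. 23, lines 5-9).

```python
# Sketch of a selection criterion that adapts to dataset size: the number
# of high-uncertainty samples sent for manual labeling scales with the
# training set size, capped to avoid oversized labeling batches.
def batch_size_for_labeling(dataset_size, base_fraction=0.05, max_batch=500):
    return min(max_batch, max(1, int(base_fraction * dataset_size)))

print(batch_size_for_labeling(1_000))    # 50
print(batch_size_for_labeling(100_000))  # 500 (capped)
```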
Regarding claim 4, Ghesu and Erenrich teach the system of claim 3, and Ghesu further teaches comprising a counter configured to track, per class, the size of a subset of the deployment experience set ([Section 4.1.5] “Performance is reported as a triple of values [AUC; F1-score (for the positive class); F1-score (for the negative class)].” In Figs. 4 and 7, Ghesu shows results of retraining the network (performance on the y-axis) with different criteria (coverage on the x-axis). This would require that the number of samples per subclass can be counted. In Section 3.3, Ghesu also teaches indexing training samples using the variable k.).
Regarding claim 5, Ghesu and Erenrich teach the system of claim 3, and Ghesu further teaches wherein the criterion is relaxed so as to increase the number of future input images indicatable as useful for retraining (In Section 4.1.5, Ghesu discusses how changing the criterion to change the amount of images considered useful affects the robustness of the model. Table 2 shows changes in ROC-AUC as the criterion is changed. Fig. 3 shows an increase in F1-scores as the criterion is relaxed.).
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Ghesu (Quantifying and leveraging predictive uncertainty for medical image assessment. arXiv:2007.04258v1) in view of Kim (US 2020/0034661 A1), and further in view of Porikli et al. (US 8,140,450 B2), hereafter Porikli.
Regarding claim 7, Ghesu teaches the system of claim 1, but neither Ghesu nor Kim teaches wherein the system is configured to identify from a plurality of not useful images with similar classification result score and uncertainty value a new class, not among the pre-defined classes.
However, Porikli teaches wherein the system is configured to identify from a plurality of not useful images with similar classification result score and uncertainty value a new class ([Col. 5, lines 15-22] “FIG. 1 shows our method. At each iteration 162 of active learning, we estimate 110 class membership probabilities 111 for all data (vectors) in the active pool 100. Data 132 with a largest estimated value of discrete entropy or uncertainty 111, i.e., lowest probability, are selected 130 to be labeled 140 by the user 141. After user labels 132 are obtained, the corresponding labeled training data are used to update and train 145 the classifier 160.”), not among the pre-defined classes ([Col. 2, lines 27-31] “The method can be extremely important in real-world problems, where the number of classes might not be known beforehand. Therefore, the number of classes can increase with time, because the method does not rely on the number of classes.”).
Ghesu and Porikli are analogous art to the claimed invention, because both teach methods of determining the amount of uncertainty associated with a classifier prediction for a sample and using the uncertainty to determine if a sample is useful for training the classifier. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ghesu’s method by re-labeling the samples which are not useful for training the classifier. This modification would allow only data with high uncertainty to be manually checked to ensure that labels are correct (Porikli [Col. 2, lines 19-23] “The method uses uncertainty sampling, wherein only unlabeled data that are hardest to classify are selected. Unlike most conventional methods, the present method is computationally efficient and can easily handle a large number of classes.”). Additionally, this method would expand upon Ghesu’s method by applying it to classifiers which classify data into more than 2 classes (Porikli [Col. 2, lines 4-7] “A confidence based active learning method uses a conditional error as a metric of uncertainty of each example. However, that method only focuses only on the binary classification problem.”).
Regarding claim 8, Ghesu, Kim, and Porikli teach the system of claim 7, and Kim further teaches where the identifying is based on an n-dimensional outlier analysis of the uncertainty value, where n is greater or equal than 2, wherein the uncertainty value is based on a latent space of the machine learning model associated with the different classes of the model (Fig. 7 shows identifying data that is most near class boundaries, such as the data in 725 and 724. Kim teaches identifying data samples closest to class boundaries, which have higher uncertainty [Fig. 7, 0210-0222], and determining the uncertainty of data to compare to criterion (first and second reference values) for determining if data is useful for retraining [Fig. 9, 0224-0238]. Additionally, Kim teaches an example of determining the model confidence on a prediction for a sample based on the prediction’s distance from a class domain [Fig. 12]; thus, outliers in the data are identified as having higher uncertainty. Kim also compares confidence values to predetermined reference values to determine outliers which are outside of class domains [Fig. 12, 0254].).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Ghesu’s invention by performing an outlier analysis to identify outliers with high uncertainty that may be not useful for retraining. This modification would further expand upon Ghesu’s teachings (Ghesu shows how uncertainty is represented graphically by data outliers in Fig. 1.) and apply them to open-set recognition models with more than one class (See Kim Fig. 7 and 0214-0216. Kim teaches that models trained on specific domains cannot grasp data outside of those domains, such as outliers. Open set-recognition models can be improved by indicating when data with high-uncertainty is outside of the known domains since models cannot provide high prediction confidence on data outside of the known domains.).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cao et al. (US 7,512,582 B2) teaches a method where a first classifier selects a portion of data with the most uncertainty, and a second classifier is used to label that data. Similarly, the first classifier can label data that the second classifier is uncertain about.
Northcutt et al. (Confident Learning: Estimating Uncertainty in Dataset Labels. Journal of Artificial Intelligence Research. 70. 1373-1411.) teaches methods of estimating the level of uncertainty in classifier predictions when labeling data; the data samples with the highest uncertainty are filtered out from future training to improve the classifier performance.
Hond et al. (WO 2022/023697 A1) teaches a method for determining a degree of dissimilarity between received data and existing data that a model has already been trained on. The method involves using the dissimilarity to determine the expected performance of the model, and the expected performance is compared to a threshold.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC JAMES SHOEMAKER whose telephone number is (571)272-6605. The examiner can normally be reached Monday through Friday from 8am to 5pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, JENNIFER MEHMOOD, can be reached at (571)272-2976. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Eric Shoemaker/
Patent Examiner
/JENNIFER MEHMOOD/ Supervisory Patent Examiner, Art Unit 2664