DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Hogan et al. (U.S. 2022/0084636, hereinafter Hogan).
As to Claim 1, Hogan teaches a method of determining presence of an analyte in a sample, comprising the steps of:
a) obtaining mass spectrometry (MS) data from a sample (Hogan (¶0073 line 1-4), “After processing, the samples were analyzed by LC/Q-TOF for metabolite discovery”);
b) extracting, by a computer, features from the MS data (Hogan (¶0075 line 5-8), “Untargeted metabolomics identified a total of 3,366 ion features. Of these, 48 ion features were removed since they showed "zero" values for all samples tested, leaving 3,318 ion features for analysis”);
c) inputting, by a computer, the features extracted in step b) into a trained prediction model, wherein the prediction model is trained to predict presence of an analyte in said sample (Hogan (¶0096 line 1-5), “As noted, ion features showing zero values through all samples tested were removed from the dataset. The remaining dataset was partitioned without normalization into a training set used to develop machine learning models”); and
d) generating an output, wherein the output comprises prediction of the presence of the analyte in said sample (Hogan (¶0056 line 7-10), “uses this learned model on new inputs (the metabolic profiles of new samples) to make predictions of new outputs (biomarker identification in new samples)”).
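For context only, the claimed steps a)-d) describe a feature-extraction-plus-classifier pipeline. A minimal sketch follows; the data, features, and model are placeholders and are not drawn from the application or from Hogan:

```python
import numpy as np

# Purely illustrative sketch of claimed steps a)-d). The feature extraction
# and the "trained" model below are hypothetical stand-ins.
def extract_features(ms_data):
    # step b): toy statistical features per spectrum (peak max and peak area)
    return np.stack([ms_data.max(axis=1), ms_data.sum(axis=1)], axis=1)

def trained_model(features):
    # step c): placeholder classifier that thresholds total peak area
    return (features[:, 1] > 10.0).astype(int)

# step a): hypothetical MS data, 5 spectra of 100 intensity values each
ms_data = np.abs(np.random.default_rng(2).normal(size=(5, 100)))
features = extract_features(ms_data)
output = trained_model(features)  # step d): 1 = analyte predicted present
```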
As to Claim 2, which depends from Claim 1, Hogan teaches wherein the features comprise statistical features (Hogan (¶0050 line 6-8), “typically features of interest are filtered after data acquisition applying different statistical methods followed by their identification”) and morphological features (Hogan (¶0055 line 1-6), “the presently contemplated methods and systems provide not only the feature importance, but also the direction of the difference (relative abundance of the differentiating compound). Furthermore, these methods and systems provide the necessary infrastructure to automate potential biomarker identification”).
As to Claim 3, which depends from Claim 2, Hogan teaches wherein the statistical features comprise Peak_Max, Peak_Area, Peak_Ratio, and/or Peak_Shift (Hogan (¶0090 last 4 lines), “Data were directly exported from Progenesis for machine learning analysis using peak area filters of 0; 5,000; 10,000 and 20,000 relative abundance values”).
As to Claim 4, which depends from Claim 2, Hogan teaches wherein the morphological features comprise: updown-difference, similarity, jaggedness, modality, symmetry, and/or FWHM (Hogan (¶0055 line 1-6), “the presently contemplated methods and systems provide not only the feature importance, but also the direction of the difference (relative abundance of the differentiating compound). Furthermore, these methods and systems provide the necessary infrastructure to automate potential biomarker identification”).
As to Claim 5, which depends from Claim 4, Hogan teaches wherein the morphological features are extracted using normalized MS data (Hogan (¶0050 last 5 lines), “Metabolomics platforms generate a large amount of data that is also complex, therefore highlighting the need for appropriate data processing tools that allow the uniform and normalized preparation of chromatographic and spectral data for data analysis”).
As to Claim 6, which depends from Claim 1, Hogan teaches wherein in step d) the output further comprises feature importance (Hogan (¶0055 line 1-6), “the presently contemplated methods and systems provide not only the feature importance, but also the direction of the difference (relative abundance of the differentiating compound). Furthermore, these methods and systems provide the necessary infrastructure to automate potential biomarker identification”).
As to Claim 7, which depends from Claim 6, Hogan teaches wherein the feature importance is obtained by calculating a Shapley Additive exPlanation (SHAP) value for each extracted feature (Hogan (¶0060 line 1-3), “The Shapley Additive exPlanations (SHAP) method was often used to quantify an impact of features on the models”), and sorting the features by the SHAP value (Hogan (¶0061 line 4-5), “The top k features with highest overall importance to the machine learning models were used”).
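For reference, the compute-SHAP-then-sort procedure recited in Claim 7 can be illustrated on a toy linear model, for which (assuming independent features) the exact SHAP value of feature i on a sample x is w[i] * (x[i] - mean of column i). The data and weights below are hypothetical and do not reflect Hogan's models:

```python
import numpy as np

# Hypothetical data: 100 samples, 4 extracted features, and a linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
w = np.array([0.5, -2.0, 0.1, 1.0])  # hypothetical model weights

# Exact SHAP values for a linear model with independent features:
# one value per feature per sample, shape (100, 4).
shap_values = w * (X - X.mean(axis=0))

# Global importance: mean absolute SHAP value per feature, sorted descending.
importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(importance)[::-1]
print(ranking)  # feature indices ordered by overall importance
```

A useful sanity check on this construction is the "local accuracy" property: per sample, the SHAP values sum to the model output minus its mean.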
As to Claim 8, Hogan teaches a method of building a machine learning pipeline, comprising the steps of:
a) extracting features from mass spectrometry (MS) or liquid-chromatography mass spectrometry (LC-MS) data regarding presence of an analyte (Hogan (¶0075 line 5-8), “Untargeted metabolomics identified a total of 3,366 ion features. Of these, 48 ion features were removed since they showed "zero" values for all samples tested, leaving 3,318 ion features for analysis”);
b) constructing, by one or more computing devices that implement a machine learning program, two or more machine learning models using an active learning workflow (Hogan (¶0097 line 1-5), “All models were developed on the training set, and their final performance reported on the holdout test set and/or the prospective cohort. Within the training set, cross-validation was used to develop the models to avoid overfitting to the training set”);
c) optimizing, by the one or more computing devices, the machine learning model (Hogan (¶0097 line 1-5), “All models were developed on the training set, and their final performance reported on the holdout test set and/or the prospective cohort. Within the training set, cross-validation was used to develop the models to avoid overfitting to the training set”); and
d) selecting, by the one or more computing devices, a best model (Hogan (¶0097 last 7 lines), “grid search was used to find the best set of hyperparameters for model training; the same hyperparameter settings were used across all k folds. The resulting k models (one from each fold) were used to make k sets of predictions on the test set, which were then averaged using a simple mean to make the final prediction for each sample in the test set”); wherein the features in step a) comprise statistical (Hogan (¶0050 line 6-8), “typically features of interest are filtered after data acquisition applying different statistical methods followed by their identification”) and morphological features (Hogan (¶0055 line 1-6), “the presently contemplated methods and systems provide not only the feature importance, but also the direction of the difference (relative abundance of the differentiating compound). Furthermore, these methods and systems provide the necessary infrastructure to automate potential biomarker identification”).
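For reference, the scheme quoted from Hogan ¶0097 (k-fold cross-validation within the training set, grid search over hyperparameters, and a simple mean of the k fold-models' predictions on the test set) can be sketched as follows. The classifier and data are hypothetical placeholders, not Hogan's actual models:

```python
import numpy as np

# Hypothetical 1-D two-class data; the "model" is a toy threshold classifier.
rng = np.random.default_rng(1)
X_train = rng.normal(loc=np.repeat([0.0, 1.0], 40), size=80)
y_train = np.repeat([0, 1], 40)
X_test = rng.normal(loc=np.repeat([0.0, 1.0], 10), size=20)

def fit_predict(x_tr, y_tr, x_te, shift):
    # "Training": place a threshold midway between class means, offset by
    # the hyperparameter `shift`; predict class 1 above the threshold.
    t = (x_tr[y_tr == 0].mean() + x_tr[y_tr == 1].mean()) / 2 + shift
    return (x_te > t).astype(float)

k = 4
folds = np.array_split(rng.permutation(len(X_train)), k)

# Grid search: pick the hyperparameter with the best mean cross-validated
# accuracy; the same setting is used across all k folds.
best_shift, best_acc = None, -1.0
for shift in (-0.5, 0.0, 0.5):
    accs = []
    for i in range(k):
        val = folds[i]
        trn = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X_train[trn], y_train[trn], X_train[val], shift)
        accs.append((pred == y_train[val]).mean())
    if np.mean(accs) > best_acc:
        best_shift, best_acc = shift, float(np.mean(accs))

# One model per fold, then average the k prediction sets on the test
# samples with a simple mean, as in the quoted passage.
fold_preds = []
for i in range(k):
    trn = np.concatenate([folds[j] for j in range(k) if j != i])
    fold_preds.append(fit_predict(X_train[trn], y_train[trn], X_test, best_shift))
final_scores = np.mean(fold_preds, axis=0)  # one averaged score per test sample
```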
As to Claim 9, which depends from Claim 8, Hogan teaches wherein the active learning workflow comprises at least one of: (i) label balancing, and (ii) even score distribution (Hogan (¶0097 last 7 lines), “the resulting k models (one from each fold) were used to make k sets of predictions on the test set, which were then averaged using a simple mean to make the final prediction for each sample in the test set”).
As to Claim 10, which depends from Claim 9, Hogan teaches wherein the label balancing comprises randomly providing positive rate of training dataset (Hogan (¶0097 line 5-8), “the training dataset was randomly partitioned into k=4 equal sized subsamples consisting of an approximately equal percentage of each class”).
As to Claim 11, which depends from Claim 9, Hogan teaches wherein the even score distribution evaluates at least one of the following: accuracy, sensitivity, specificity, area under curve (AUC), and F1 (Hogan (¶0101 line 1-4), “The primary measure of model performance was the area under the receiver operating characteristic curve (AUC), which illustrates the diagnostic discriminative performance of the models”).
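For reference, the AUC statistic cited in the quoted passage equals the probability that a randomly chosen positive sample outscores a randomly chosen negative one (ties counted as half). A minimal sketch with hypothetical scores:

```python
# AUC as a rank statistic: fraction of (positive, negative) pairs in which
# the positive sample receives the higher score, counting ties as 0.5.
def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking gives AUC = 1.0; chance-level ranking gives 0.5.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```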
As to Claim 12, which depends from Claim 8, Hogan teaches wherein the features comprise statistical features and morphological features, and wherein the morphological features are extracted using normalized MS data (Hogan (¶0055 line 1-6), “the presently contemplated methods and systems provide not only the feature importance, but also the direction of the difference (relative abundance of the differentiating compound). Furthermore, these methods and systems provide the necessary infrastructure to automate potential biomarker identification”).
As to Claim 13, which depends from Claim 8, Hogan teaches wherein the machine learning model in step c) comprises training set optimization (Hogan (¶0075 line 5-8), “Untargeted metabolomics identified a total of 3,366 ion features. Of these, 48 ion features were removed since they showed "zero" values for all samples tested, leaving 3,318 ion features for analysis”).
As to Claim 14, Hogan teaches a system, comprising:
a) at least one processor (Hogan (¶0025 line 2), processors);
b) a memory, storing program instructions that when executed by the at least one processor cause the at least one processor to perform a machine learning pipeline, the machine learning pipeline (Hogan (¶0025 line 3), storage medium) is configured to perform at least one of the following modes:
i) training mode:
(A) receive mass spectrometry data of a sample (Hogan (¶0073 line 1-4), “After processing, the samples were analyzed by LC/Q-TOF for metabolite discovery”);
(B) extract at least one feature from the mass spectrometry data, wherein the at least one feature is a statistical feature (Hogan (¶0050 line 6-8), “typically features of interest are filtered after data acquisition applying different statistical methods followed by their identification”) and/or a morphological feature (Hogan (¶0055 line 1-6), “the presently contemplated methods and systems provide not only the feature importance, but also the direction of the difference (relative abundance of the differentiating compound). Furthermore, these methods and systems provide the necessary infrastructure to automate potential biomarker identification”);
(C) optimize training dataset by active learning strategy (Hogan (¶0097 line 1-5), “All models were developed on the training set, and their final performance reported on the holdout test set and/or the prospective cohort. Within the training set, cross-validation was used to develop the models to avoid overfitting to the training set”); and
(D) select a best prediction model (Hogan (¶0097 last 7 lines), “grid search was used to find the best set of hyperparameters for model training; the same hyperparameter settings were used across all k folds. The resulting k models (one from each fold) were used to make k sets of predictions on the test set, which were then averaged using a simple mean to make the final prediction for each sample in the test set”);
ii) prediction mode:
(A) receive mass spectrometry data of a sample (Hogan (¶0056 line 7-10), “uses this learned model on new inputs (the metabolic profiles of new samples) to make predictions of new outputs (biomarker identification in new samples)”);
(B) extract at least one feature from the mass spectrometry data (Hogan (¶0094 line 4-11), “Machine learning is a class of techniques that uses data to learn a model that maps an input (the metabolic profile of a sample; includes mass-to-charge ratio (m/z) and retention time for each sample) to its associated output (the influenza infection outcome of the sample) and uses this learned model on new inputs (the metabolic profiles of new samples) to make predictions of new outputs (the influenza outcomes of new samples)”), wherein the at least one feature is a statistical feature (Hogan (¶0050 line 6-8), “typically features of interest are filtered after data acquisition applying different statistical methods followed by their identification”) and/or a morphological feature (Hogan (¶0055 line 1-6), “the presently contemplated methods and systems provide not only the feature importance, but also the direction of the difference (relative abundance of the differentiating compound). Furthermore, these methods and systems provide the necessary infrastructure to automate potential biomarker identification”); and
(C) generate an output of determining whether an analyte is present in the sample (Hogan (¶0056 line 7-10), “uses this learned model on new inputs (the metabolic profiles of new samples) to make predictions of new outputs (biomarker identification in new samples)”).
As to Claims 15-20, the claims are rejected for the same reasons as Claims 2-7, respectively.
As to Claim 21, the claim is rejected for the same reasons as Claim 14.
As to Claim 22, the claim is rejected for the same reasons as Claims 15-17.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Farkas et al. (U.S. 2019/0293620) teaches a method of training a spectral analysis machine learning model.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NHAT HUY T NGUYEN whose telephone number is (571)270-7333. The examiner can normally be reached M-F: 12:00-8:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at 571-270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NHAT HUY T NGUYEN/Primary Examiner, Art Unit 2147