Last updated: May 29, 2026
Application No. 15/931,022
MODEL-BASED FEATURIZATION AND CLASSIFICATION

Non-Final OA: §101, §102, §103
Filed: May 13, 2020
Priority: May 13, 2019 (provisional 62/847,223, +2 more)
Examiner: KRIANGCHAIVECH, KETTIP
Art Unit: 1686
Tech Center: 1600, Biotechnology & Organic Chemistry
Assignee: Grail, Inc.
OA Round: 5 (Non-Final)
Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 21% grant rate with a +32.8% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 48 resolved cases, 2023–2026
Examiner Intelligence

KRIANGCHAIVECH, KETTIP
Career Allowance Rate: 21% (10 granted / 48 resolved), -39.2% vs TC avg
Interview Lift: +32.8% (resolved cases with interview vs. without)
Typical timeline
4y 8m
Avg Prosecution
16 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101: 27.2% (-12.8% vs TC avg)
§103: 48.6% (+8.6% vs TC avg)
§102: 7.4% (-32.6% vs TC avg)
§112: 0.8% (-39.2% vs TC avg)
Based on career data from 48 resolved cases
Office Action

§101 §102 §103
DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
	
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.   Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.   Applicant's submission filed on 12/01/2025 has been entered.

Applicant's response, filed on 12/01/2025, has been fully considered.  The following rejections and/or objections are either reiterated or newly applied.  They constitute the complete set presently being applied to the instant application.

Status of claims
Canceled:
2-3, 6, 8-9, 15, 18-20, 22, 26-27, 29, 32-33, 35-39, 41-46, 48-92, 93-138, 140-216
Pending:
1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139, 217
New:
217
Amended:
none
Withdrawn:
none
Examined:
1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139, 217
Independent:
1, 47, 139, 217
Allowable:
none



Priority
As detailed on the 10/22/2020 filing receipt, this application claims priority to as early as 05/13/2019.  

Drawings
The drawings filed May 13, 2020 are accepted.

Withdrawn Objections/Rejections
The rejection of claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 93 and 139 under 35 U.S.C. §112(a), in the Office action mailed 05/30/2025 is withdrawn in view of the Remarks filed 12/01/2025.
The rejection of claims 1, 4-5, 7, 10-14, 16, 21, 23-25, 28, 30-31, 34 and 40 under 35 U.S.C. §103 over Zhang and Li, in the Office action mailed 05/30/2025 is withdrawn in view of the Remarks filed 12/01/2025. However, a new rejection is applied.
The rejection of claim 17 under 35 U.S.C. §103 over Zhang and Li, in view of Zheng, in the Office action mailed 05/30/2025 is withdrawn in view of the Remarks filed 12/01/2025. However, a new rejection is applied.
The rejection of claims 47 and 93 under 35 U.S.C. §103 over Zhang and Li, in the Office action mailed 05/30/2025 is withdrawn in view of the Remarks filed 12/01/2025. However, a new rejection is applied.
The rejection of claim 139 under 35 U.S.C. §103 over Zhang and Li, in the Office action mailed 05/30/2025 is withdrawn in view of the Remarks filed 12/01/2025. However, a new rejection is applied.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C.  103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C.  103 are summarized as follows:
1.  Determining the scope and contents of the prior art.
2.  Ascertaining the differences between the prior art and the claims at issue.
3.  Resolving the level of ordinary skill in the pertinent art.
4.  Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1, 4-5, 7, 10-14, 16, 21, 23-25, 28, 30-31, 34 and 40 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (U.S. Patent Application Publication No. 2018/0341745 A1, published Nov. 29, 2018; cited on the 05/11/2023 “Notice of References Cited” form 892) in view of Li ("CancerDetector: ultrasensitive and non-invasive cancer detection at the resolution of individual reads using cell-free DNA methylation sequencing data." Nucleic Acids Research 46.15 (2018): e89, published 2018; cited on the 05/11/2023 “Notice of References Cited” form 892) and Xia ("A convolutional neural network based ensemble method for cancer prediction using DNA methylation data." Proceedings of the 2019 11th International Conference on Machine Learning and Computing. Feb. 2019; cited on the 05/29/2025 IDS).

Regarding claim 1, Zhang teaches in claim 27 a method of diagnosing a cancer in an individual in need thereof, comprising: a) processing an extracted genomic DNA with a deaminating agent to generate a treated genomic DNA comprising deaminated nucleotides, wherein the extracted genomic DNA is obtained from a biological sample from the individual; b) generating a methylation profile of one or more biomarkers selected from Table 58 from the treated genomic DNA; and c) diagnosing whether the individual has a cancer by comparing the methylation profile to a reference CpG methylation profile obtained from a cancer CpG methylation profile database, wherein a correlation between the methylation profile and the reference CpG methylation profile determines the presence of cancer in the individual.
Zhang further teaches in claim 28 the method of claim 27, wherein the reference CpG methylation profile obtained from the cancer CpG methylation profile database is generated by the steps of: a) generating CpG methylation data from a set of biological samples by a sequencing method, wherein the set comprises a first cancerous biological sample, a second cancerous biological sample, a third cancerous biological sample, a first normal biological sample, a second normal biological sample, and a third normal biological sample; wherein the first, second, and third cancerous biological samples are different; and wherein the first, second, and third normal biological samples are different.
According to Zhang, "...data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model. A “known sample” is a sample that has been pre-classified, such as, for example, a suitable control (e.g., biomarkers) from a non-diseased or non-cancer “normal” sample and/or suitable control (e.g., biomarkers from a known tumor tissue type or stage, or cancer status" (Para. [0276]). This corresponds to the claim limitation of training, using a plurality of reference sequence reads.
The disease state is interpreted to be equivalent to being normal, cancerous, or cancer type.
The recited "sequence reads" reads on Zhang's CpG methylation data.
The recited "reference sequence reads" reads on Zhang's data that are generated using samples such as “known samples” or “control”, or on Zhang's reference CpG methylation profile.
Therefore, the teachings of Zhang correspond to the claim limitation of generating a first plurality of reference sequence reads from a first reference sample, the first reference sample from first subject having a first disease state, wherein the first disease state is cancer; generating a second plurality of reference sequence reads from a second reference sample, the second sample from second subject having a second disease state, wherein the second disease state is non-cancer.
Zhang teaches that a number of methods are utilized to measure, detect, determine, identify, and characterize the methylation status/level of a biomarker (i.e., a region/fragment of DNA or a region/fragment of genome DNA (e.g., CpG island-containing region/fragment)) (Para. [0179]). Zhang also teaches that a panel comprises 1000 or more biomarkers (Para. [0159]), which would correspond to the claim limitation of reference sequence reads from over 1,000 nucleic acid fragments.
Zhang further teaches in some instances, a difference within each said pair of dataset is calculated and the differences are then input into the machine learning/classification program (103) (Paragraph [0150]).  In some cases, a pair-wise methylation difference dataset from the first, second, and third pair of datasets is generated and then analyzed in the presence of a control dataset or a training dataset (104) by the machine learning/classification method (103) to generate the cancer CpG methylation profile database (105) (Paragraph [0150]).  In some cases, the machine learning method comprises identifying a plurality of markers and a plurality of weights based on a top score (e.g., a t-test value, a β test value), and classifying the samples based on the plurality of markers and the plurality of weights (Paragraph [0150]).  In some cases, the cancer CpG methylation profile database (105) comprises a set of CpG methylation profiles and each CpG methylation profile represents a cancer type (Paragraph [0150]).  Zhang also discusses the identification of a cancer type specific signature was achieved by comparing a pair-wise methylation difference between a particular cancer type versus its surrounding normal tissue, difference between two different cancer types, as well as difference between two different normal tissues (Paragraph [0366]).  All of 485,000 CpG methylation sites were investigated in a training cohort of 1100 tumor samples and 231 matched adjacent-normal tissue samples (Paragraph [0366]).  
Zhang describes two models. The machine learning/classification method that generates the cancer CpG methylation profile database (Paragraph [0150]) corresponds to the first probabilistic model. The machine learning model that identifies a plurality of markers and a plurality of weights based on a top score (e.g., a t-test value, a β test value), and classifies the samples based on the plurality of markers and the plurality of weights (Paragraph [0150]), corresponds to the second probabilistic model.
Zhang further discusses that the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model (Para.  [0095]).  The recited "probabilistic model" corresponds to Zhang's classification model, such as the logistic regression model.
Zhang's teaching corresponds to the claim limitation of training, using the first plurality of reference sequence reads, a first probabilistic model, the first probabilistic model associated with the first disease state and training, using the second plurality of reference sequence reads, a second probabilistic model, the second probabilistic model associated with a second disease state.
Zhang further teaches in particular embodiments, the biomarker panels of the present invention may show a statistical difference in different cancer statuses of at least p<0.05, p<10−2, p<10−3, p<10−4 or p<10−5 (Paragraph [0239]). Diagnostic tests that use these biomarkers may show an ROC of at least 0.6, at least about 0.7, at least about 0.8, or at least about 0.9 (Paragraph [0239]). The biomarkers are differentially methylated in unaffected individual (or a normal control individual) and cancer, and the biomarkers for each cancer type are differentially methylated, and, therefore, are useful in aiding in the determination of cancer status. In certain embodiments, the biomarkers are measured in a patient sample using the methods described herein and compared, for example, to predefined biomarker levels and correlated to cancer status. In other embodiments, the correlation of a combination of biomarkers in a patient sample is compared, for example, to a predefined biomarker panel (Paragraph [0239]). In yet another embodiment, the methylation profile of one or more genes in a patient sample are compared to the methylation profile of genes identified differentially methylated correlated to a tumor type or state or cancer status (Paragraph [0239]). In particular embodiments, the measurement(s) may then be compared with a relevant diagnostic amount(s), cut-off(s), or multivariate model scores that distinguish a positive cancer status from a negative cancer status (Paragraph [0239]). The diagnostic amount(s) represents a measured amount of epigenetic biomarker(s) above which or below which a patient is classified as having a particular cancer status. As is well understood in the art, by adjusting the particular diagnostic cut-off(s) used in an assay, one can increase sensitivity or specificity of the diagnostic assay depending on the preference of the diagnostician (Paragraph [0239]).
In particular embodiments, the particular diagnostic cut-off can be determined, for example, by measuring the amount of biomarker hypermethylation or hypomethylation in a statistically significant number of samples from patients with the different cancer statuses, and drawing the cut-off to suit the desired levels of specificity and sensitivity (Paragraph [0239]).  
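For illustration only, the trade-off described in Paragraph [0239] between sensitivity and specificity as the diagnostic cut-off moves can be sketched in Python. The function name, the scores, and the labels below are hypothetical and do not appear in Zhang; the sketch assumes that a sample is called positive when its score is at or above the cut-off.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff is called positive.

    scores: hypothetical per-sample methylation scores
    labels: True for known cancer samples, False for known normal samples
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)      # cancer called cancer
    fn = sum(1 for s, y in pairs if y and s < cutoff)       # cancer called normal
    tn = sum(1 for s, y in pairs if not y and s < cutoff)   # normal called normal
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)  # normal called cancer
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: three known cancer samples and three known normal samples.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, True, False, False, False]

low = sensitivity_specificity(scores, labels, 0.35)   # permissive cut-off
high = sensitivity_specificity(scores, labels, 0.85)  # strict cut-off
```

With these made-up values, the lower cut-off catches every cancer sample at the cost of one false positive, while the higher cut-off removes false positives but misses two cancer samples.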
The recited "probability values" reads on Zhang's statistically significant values utilized to determine cancer status, such as a positive cancer status or a negative cancer status.  
Zhang teaches that various methodologies described herein include a step that involves comparing a value, level, feature, characteristic, property, etc. to a suitable control, referred to interchangeably herein as an appropriate control, a control sample, or as a control (Paragraph [0177]). In some embodiments, a control is a value, level, feature, characteristic, property, etc., determined in a cell, a tissue, an organ, or a sample obtained from a patient. In some instances, the cell, tissue, organ, or sample is a normal cell, tissue, organ, or sample. In some cases, the cell, tissue, organ, or sample is a cancerous cell, tissue, organ, or sample (Paragraph [0177]). For example, the biomarkers of the present invention are assayed for their methylation level in a sample from an unaffected individual or a normal control individual, or the subject's unaffected family member (Paragraph [0177]). In another embodiment, a control is a value, level, feature, characteristic, property, etc. determined prior to initiating a therapy (e.g., a cancer treatment) on a patient, or in between a therapeutic regimen (Paragraph [0177]). In further embodiments, a control is a predefined value, level, feature, characteristic, property, etc. (Paragraph [0177]). This corresponds to the claim limitation of identifying one or more features by comparing the first probability value and the second probability value for each sequence read.
Zhang also teaches "...the pattern recognition method comprises a linear combination of methylation levels, or a nonlinear combination of methylation levels to extract the probability that a biological sample is from a patient who exhibits no evidence of disease, who exhibits systemic cancer, or who exhibits biochemical recurrence, as well as to distinguish these disease states and types, particularly the primary tumor type" (Para. [0274]), which also corresponds to the claim limitation of comparing probability values.
Therefore, Zhang teaches the claim limitation of applying the sequence read to the first probabilistic model to determine a first probability value, the first probability value being a probability that the sequence read originated from a sample associated with the first disease state and applying the sequence read to the second probabilistic model to determine a second probability value, the second probability value being a probability that the sequence read originated from a sample associated with the second disease state.
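As a reading aid only, the recited steps of applying one sequence read to two probabilistic models and comparing the two resulting probability values can be sketched in Python. This sketch is not taken from Zhang or from the application: it assumes an independent-site model in which each CpG site has its own methylation rate, and the function names, site identifiers, rates, and threshold are all hypothetical.

```python
import math


def read_log_likelihood(read, site_rates):
    """Log-probability of a read's methylation calls under an independent-site model.

    read: dict mapping CpG site id -> 1 (methylated) or 0 (unmethylated)
    site_rates: dict mapping CpG site id -> assumed methylation rate in (0, 1)
    """
    total = 0.0
    for site, call in read.items():
        p = site_rates[site]
        total += math.log(p if call == 1 else 1.0 - p)
    return total


def count_informative_reads(reads, cancer_rates, normal_rates, threshold=1.0):
    """Feature: number of reads whose cancer-vs-normal log-likelihood ratio exceeds threshold."""
    count = 0
    for read in reads:
        llr = read_log_likelihood(read, cancer_rates) - read_log_likelihood(read, normal_rates)
        if llr > threshold:
            count += 1
    return count


# Hypothetical per-site methylation rates for the two disease-state models.
cancer_rates = {"cpg1": 0.9, "cpg2": 0.8}
normal_rates = {"cpg1": 0.1, "cpg2": 0.2}

# Three hypothetical reads covering both sites.
reads = [{"cpg1": 1, "cpg2": 1}, {"cpg1": 0, "cpg2": 0}, {"cpg1": 1, "cpg2": 0}]
feature = count_informative_reads(reads, cancer_rates, normal_rates)
```

Here only the fully methylated read clears the assumed threshold, so the feature value for this toy sample is 1. A per-sample count of this kind is one possible form of feature; other comparisons of the two probability values are equally conceivable.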
Zhang teaches the recited training samples include a first subset of training samples obtained from subjects diagnosed with cancer and a second subset of training samples obtained from subjects not diagnosed with cancer at least with Zhang's Claim 28. Zhang's Claim 28 step a includes "a) generating CpG methylation data from a set of biological samples by a sequencing method, wherein the set comprises a first cancerous biological sample, a second cancerous biological sample, a third cancerous biological sample, a first normal biological sample, a second normal biological sample, and a third normal biological sample..." and with "In some embodiments, data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model." (Para. [0276]).
Zhang teaches the recited training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a likelihood of presence of cancer in an individual based on a feature set for a test sample of the individual at least with "Once trained, the classification model recognizes patterns in data generated using unknown samples. In some instances, the classification model is then used to classify the unknown samples into classes. This is useful, for example, in predicting whether or not a particular biological sample is associated with a certain biological condition (e.g., diseased versus non-diseased)." (Para. [0276]).

Zhang teaches the recited each of a plurality of training samples different from the first reference sample and the second reference sample with “The samples were divided into five equal parts and 4 of the parts were used for training and the fifth part was used to test the results.” (para. [0379]).
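For illustration, the division of samples into five equal parts, with four parts used for training and one for testing, can be sketched in Python as below. The function name and the sample identifiers are hypothetical. The quoted passage of Zhang describes a 4:1 split; rotating the held-out part so that each part is tested once is an assumption of this sketch.

```python
def k_fold_splits(samples, k=5):
    """Yield (train, test) lists; each of the k parts serves once as the test set."""
    # Deal samples into k parts; parts are equal when len(samples) is divisible by k.
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


# Ten hypothetical sample identifiers split into five parts of two samples each.
samples = list(range(10))
splits = list(k_fold_splits(samples, k=5))
```

Each split keeps the test samples out of the corresponding training set, which is the property the cited limitation relies on (training samples different from the held-out samples).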
Zhang teaches the claim limitation of generating the first reference sample from a first subject having a first disease state, wherein the first disease state is cancer with “(i) a first pair of CpG methylation datasets generated from the first cancerous biological sample and the first normal biological sample, wherein CpG methylation data generated from the first cancerous biological sample form a first dataset within the first pair of datasets, CpG methylation data generated from the first normal biological sample form a second dataset within the first pair of datasets, and the first cancerous biological sample and the first normal biological sample are from the same biological sample source” (para. [0006]). Zhang’s teaching of “the first cancerous biological sample and the first normal biological sample are from the same biological sample source” corresponds to the recited “first subject”.
Zhang teaches the claim limitation of predicting a likelihood that a given sequence read originates from one subject having the first disease state with “(2) analyze the pair-wise methylation difference dataset with a control dataset by a machine learning method to generate the cancer CpG methylation profile database, wherein (i) the machine learning method comprises: identifying a plurality of markers and a plurality of weights based on a top score, and classifying the samples based on the plurality of markers and the plurality of weights; and (ii) the cancer CpG methylation profile database comprises a set of CpG methylation profiles and each CpG methylation profile represents a cancer type.” (para. [0006]).
Zhang teaches the claim limitation of generating the second reference sample from a second subject having a second disease state, wherein the second disease state is non-cancer with “a. obtaining a fourth pair of CpG methylation datasets, with the first processor, generated from a fourth cancerous biological sample and a fourth normal biological sample, wherein CpG methylation data generated from the fourth cancerous biological sample form a seventh dataset within the fourth pair of datasets, CpG methylation data generated from the first normal biological sample form an eighth dataset within the fourth pair of datasets, and the fourth cancerous biological sample and the fourth normal biological sample are from the same biological sample source” (para. [0051]). Zhang’s teaching of “the fourth cancerous biological sample and the fourth normal biological sample are from the same biological sample source” corresponds to the recited “second subject”.
Zhang teaches the claim limitation of predicting a likelihood that a given sequence read originates from one subject having the second disease state with “e. analyzing the second pair-wise methylation difference dataset with the cancer CpG methylation profile database described above, wherein a correlation between the second pair-wise methylation difference dataset and a CpG methylation profile within the cancer CpG methylation profile database determines a cancer type of the individual.” (para. [0051]).

Zhang does not explicitly teach two models for detecting a first and a second disease state. Zhang teaches a machine learning model that is trained using data that are generated from known samples. According to Zhang, "...data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model. A “known sample” is a sample that has been pre-classified, such as, for example, a suitable control (e.g., biomarkers) from a non-diseased or non-cancer “normal” sample and/or suitable control (e.g., biomarkers from a known tumor tissue type or stage, or cancer status" (Para. [0276]). Zhang further discusses that the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model (Para. [0095]). Therefore, the recited "reference sequence reads" corresponds to Zhang's data that are generated using samples such as “known samples” or “control”. Likewise, the recited "probabilistic model" corresponds to Zhang's classification model, such as the logistic regression model, because the model taught by Zhang uses statistical techniques for the prediction of cancer. Zhang’s classification model corresponds to the recited probabilistic model and a machine learning classifier. Zhang's teaching corresponds to the claim limitation of generating a first plurality of reference sequence reads and training a first probabilistic model associated with the first disease state. Although Zhang does not explicitly teach two models for detecting a first and second disease state, it would have been obvious to repeat the procedure and model taught by Zhang with a second set of data for the purpose of predicting a second disease state of non-cancer.
However, Li teaches two probabilistic models for predicting a likelihood that a given sequence read originates from one subject having the first or second disease state of non-cancer and cancer with Figure 3 (page 4). Figure 3 depicts the likelihood of the sequence read being tumor or normal. Li teaches calculating the class-specific likelihood of each cfDNA sequencing read (Page 4, col. 1, para. 3) and then predicting tumor-derived cfDNA fraction (Page 4, col. 2, para. 2). Li also teaches in Figure 2 a probabilistic framework to infer the tumor-derived cfDNA fraction (i.e. tumor fraction), denoted as 0 ≤ θ < 1, by classifying cfDNA reads into two classes (class T for tumor-derived DNAs and class N for normal plasma cfDNAs), based on a set of markers associated with the methylation patterns of two classes (Page 4, col. 2, para. 2). This meets the claim limitation of a first and second probabilistic model and then predicting the likelihood of the presence of cancer in an individual.
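To make the two-class framework concrete, the following Python sketch estimates a tumor fraction θ from per-read class likelihoods by treating each read as drawn from a mixture, θ·P(read | T) + (1 − θ)·P(read | N). It is a simplified stand-in and not Li's implementation: the grid search, the function name, and the likelihood values are assumptions made for illustration, and Li's actual likelihood model and inference procedure may differ.

```python
import math


def estimate_tumor_fraction(read_likelihoods, grid_size=1000):
    """Grid-search the theta in [0, 1) that maximizes the two-class mixture log-likelihood.

    read_likelihoods: list of (p_tumor, p_normal) pairs, one per read, each value > 0
    """
    best_theta, best_ll = 0.0, -math.inf
    for step in range(grid_size):
        theta = step / grid_size  # candidate tumor fraction, never reaches 1
        ll = sum(math.log(theta * pt + (1.0 - theta) * pn) for pt, pn in read_likelihoods)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


# Hypothetical class-specific likelihoods: 2 tumor-like reads and 8 normal-like reads.
reads = [(0.72, 0.02)] * 2 + [(0.02, 0.72)] * 8
theta = estimate_tumor_fraction(reads)

# A sample made up only of normal-like reads should yield a tumor fraction of zero.
theta_normal = estimate_tumor_fraction([(0.02, 0.72)] * 5)
```

For the mixed toy sample the estimate lands a little below the raw proportion of tumor-like reads (2 of 10), because each read's likelihood under the other class is small but not zero.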

Zhang and Li do not explicitly teach applying the first probabilistic model to the training sequence read to determine a first probability value, the first probability value being a probability that the training sequence read originated from a sample associated with the first disease state, and applying the second probabilistic model to the training sequence read to determine a second probability value, the second probability value being a probability that the training sequence read originated from a sample associated with the second disease state; and for each training sample, generating a feature set of one or more features for the training sample by comparing the first probability value and the second probability value for each training sequence read; and training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a likelihood of presence of cancer in an individual based on a feature set for a test sample of the individual of claim 1. However, Xia teaches this claim limitation with “In this paper, we introduce a convolutional neural network based ensemble method for cancer prediction using DNA methylation data. We first conduct t-test to choose a set of significantly differential methylation points. Then, the selected feature was feed into Naive Bayesian Classifier, k-Nearest Neighbor, Decision Tree, Random Forest and Gradient Boosting Decision Tree five basic classifiers for the first stage classification. Here we use S-fold cross validation method by dividing the whole datasets into S groups and choose S-1 groups as training sets, the left one as test sets at each time. Finally, a convolutional neural network is used to ensemble the predictions of the first stage classifiers and extract the internal relationship among different classifiers to predict a more reliable result. The flowchart of the proposed ensemble method is shown in Fig. 1.” (Page 192, col. 1, section 2, Method) and Figure 1. Fig. 1 depicts a flowchart of the proposed convolutional neural network based ensemble method (page 192).

Therefore, it would have been prima facie obvious to combine the teachings of Zhang, Li and Xia to arrive at the claimed invention. Li demonstrated that CancerDetector provides high sensitivity and specificity in detecting tumor cfDNAs (Abstract) and Xia discussed that the use of a convolutional neural network to ensemble the predictions of the first stage classifiers and extract the internal relationship among different classifiers provides for a more reliable prediction (Page 192, col. 1, section 2, Method). A person of ordinary skill in the art would have been motivated to combine the method of Zhang with the method of Li to include training probabilistic models for calculating the likelihood that the sequence read is normal or tumor to better identify sequence reads that are of tumor type. A person of ordinary skill in the art would have also been motivated to combine the method of Zhang with the method of Xia to include training a machine learning classifier with the feature sets generated from the training sample of the probabilistic models to predict a more reliable result. Furthermore, there would have been a reasonable expectation of success because Zhang, Li and Xia are in the same field of endeavor of determining the likelihood of cancer or non-cancer.


With respect to claim 4, Zhang teaches in Tables 44, 45, 52, 53A, and 55A the training sets for various types of organs for both normal and cancerous samples. The tables list approximately 10 different types of cancer used for training the model. This corresponds to the claim limitation of wherein the method further comprises: generating a plurality of reference sequence reads from a third, fourth, fifth, sixth, seventh, eighth, ninth, and/or tenth reference sample, each of the third, fourth, fifth, sixth, seventh, eighth, ninth, and/or tenth reference samples having a different disease state, and wherein each of the different disease states is a different type of cancer and training, using the third, fourth, fifth, sixth, seventh, eighth, ninth, and/or tenth plurality of reference sequence reads, a third, fourth, fifth, sixth, seventh, eighth, ninth, and/or tenth probabilistic model, wherein each of the third, fourth, fifth, sixth, seventh, eighth, ninth, and/or tenth probabilistic models are each associated the different types of cancer.

With respect to claim 5, Zhang teaches in some embodiments, the cancer type is a solid cancer type or a hematologic malignant cancer type (Paragraph [0022]). In some embodiments, the cancer type is a metastatic cancer type or a relapsed or refractory cancer type (Paragraph [0022]). In some embodiments, the cancer type comprises acute myeloid leukemia (LAML or AML), acute lymphoblastic leukemia (ALL), adrenocortical carcinoma (ACC), bladder urothelial cancer (BLCA), brain stem glioma, brain lower grade glioma (LGG), brain tumor, breast cancer (BRCA), bronchial tumors, Burkitt lymphoma, cancer of unknown primary site, carcinoid tumor, carcinoma of unknown primary site, central nervous system atypical teratoid/rhabdoid tumor, central nervous system embryonal tumors, cervical squamous cell carcinoma, endocervical adenocarcinoma (CESC) cancer, childhood cancers, cholangiocarcinoma (CHOL), chordoma, chronic lymphocytic leukemia, chronic myelogenous leukemia, chronic myeloproliferative disorders, colon (adenocarcinoma) cancer (COAD), colorectal cancer, craniopharyngioma, cutaneous T-cell lymphoma, endocrine pancreas islet cell tumors, endometrial cancer, ependymoblastoma, ependymoma, esophageal cancer (ESCA), esthesioneuroblastoma, Ewing sarcoma, extracranial germ cell tumor, extragonadal germ cell tumor, extrahepatic bile duct cancer, gallbladder cancer, gastric (stomach) cancer, gastrointestinal carcinoid tumor, gastrointestinal stromal cell tumor, gastrointestinal stromal tumor (GIST), gestational trophoblastic tumor, glioblastoma multiforme glioma (GBM), hairy cell leukemia, head and neck cancer (HNSD), heart cancer, Hodgkin lymphoma, hypopharyngeal cancer, intraocular melanoma, islet cell tumors, Kaposi sarcoma, kidney cancer, Langerhans cell histiocytosis, laryngeal cancer, lip cancer, liver cancer, Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBCL), malignant fibrous histiocytoma bone cancer, medulloblastoma, medulla epithelioma, melanoma, Merkel cell carcinoma, Merkel cell skin carcinoma, mesothelioma (MESO), metastatic squamous neck cancer with occult primary, mouth cancer, multiple endocrine neoplasia syndromes, multiple myeloma, multiple myeloma/plasma cell neoplasm, mycosis fungoides, myelodysplastic syndromes, myeloproliferative neoplasms, nasal cavity cancer, nasopharyngeal cancer, neuroblastoma, Non-Hodgkin lymphoma, nonmelanoma skin cancer, non-small cell lung cancer, oral cancer, oral cavity cancer, oropharyngeal cancer, osteosarcoma, other brain and spinal cord tumors, ovarian cancer, ovarian epithelial cancer, ovarian germ cell tumor, ovarian low malignant potential tumor, pancreatic cancer, papillomatosis, paranasal sinus cancer, parathyroid cancer, pelvic cancer, penile cancer, pharyngeal cancer, pheochromocytoma and paraganglioma (PCPG), pineal parenchymal tumors of intermediate differentiation, pineoblastoma, pituitary tumor, plasma cell neoplasm/multiple myeloma, pleuropulmonary blastoma, primary central nervous system (CNS) lymphoma, primary hepatocellular liver cancer, prostate cancer such as prostate adenocarcinoma (PRAD), rectal cancer, renal cancer, renal cell (kidney) cancer, renal cell cancer, respiratory tract cancer, retinoblastoma, rhabdomyosarcoma, salivary gland cancer, sarcoma (SARC), Sezary syndrome, skin cutaneous melanoma (SKCM), small cell lung cancer, small intestine cancer, soft tissue sarcoma, squamous cell carcinoma, squamous neck cancer, stomach (gastric) cancer, supratentorial primitive neuroectodermal tumors, T-cell lymphoma, testicular cancer testicular germ cell tumors (TGCT), throat cancer, thymic carcinoma, thymoma (THYM), thyroid cancer (THCA), transitional cell cancer, transitional cell cancer of the renal pelvis and ureter, trophoblastic tumor, ureter cancer, urethral cancer, uterine cancer, uterine cancer, uveal melanoma (UVM), vaginal cancer, vulvar cancer, Waldenstrom macroglobulinemia, or Wilm's tumor (Paragraph [0022]).
This corresponds to the claim limitation of wherein the first disease state is selected from the group including breast cancer, uterine cancer, cervical cancer, ovarian cancer, bladder cancer, urothelial cancer of renal pelvis and ureter, renal cancer other than urothelial, prostate cancer, anorectal cancer, colorectal cancer, squamous cell cancer of esophagus, esophageal cancer other than squamous, gastric cancer, hepatobiliary cancer arising from hepatocytes, hepatobiliary cancer arising from cells other than hepatocytes, pancreatic cancer, human-papillomavirus-associated head and neck cancer, head and neck cancer not associated with human papillomavirus, lung adenocarcinoma, small cell lung cancer, squamous cell lung cancer and lung cancer other than adenocarcinoma or small cell lung cancer, neuroendocrine cancer, melanoma, thyroid cancer, sarcoma, multiple myeloma, lymphoma, and leukemia.  

With respect to claim 7, Zhang teaches in Table 44 the training and testing sets for various types of organs for both normal and cancerous samples.  This corresponds to the claim limitation of wherein the first disease state comprises a first tissue of origin and the second disease state comprises a second tissue of origin.  

With respect to claim 10, Zhang further teaches in some embodiments, the machine learning method comprises a semi-supervised learning method or an unsupervised learning method (Paragraph [0045]).  In some embodiments, the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model (Paragraph [0045]).  This corresponds to the claim limitation of wherein the first probabilistic model or second probabilistic model is a constant model, a binomial model, an independent site model, a neural net model, or a Markov model.  
With respect to claim 11, Zhang teaches in Figure 33A and 33B the methylation percentage of colon cancer cfDNA and colon cancer pool genomic DNA and normal cfDNA and normal pool genomic DNA.  FIG.  33A-FIG.  33B show two different probes that differentiate primary colon cancer from normal sample (Paragraph [0137]).  FIG.  33A shows probe Cob-2 which targets the CpG site cg10673833 and the methylation profiles from the cfDNA samples of three colon cancer patients, normal cfDNA sample, primary colon cancer tissue reference sample (genomic DNA), and normal lymphocyte reference sample (genomic DNA) (Paragraph [0137]).  Two of the three patients (2043089 and 2042981) have primary colon cancer (Paragraph [0137]).  The remainder patient (2004651) has metastatic colon cancer (Paragraph [0137]).  FIG.  33B shows probe Brb-2 which targets the CpG site cg07974511 and the methylation profiles from the cfDNA samples of two primary colon cancer patients (2043089 and 2042981), normal cfDNA sample, primary colon cancer tissue reference sample (genomic DNA), and normal lymphocyte reference sample (genomic DNA) (Paragraph [0137]).  Zhang also teaches that in some embodiments, the methylation index for each genomic site (e.g., a CpG site) refers to the proportion of sequence reads showing methylation at the site over the total number of reads covering that site (Paragraph [0174]).  Zhang further teaches that in some instances, a methylation profile comprises a set of methylation index of a CpG site, a set of methylation density of CpG sites in a region, a set of distribution of CpG sites over a contiguous region, a set of pattern or level of methylation of one or more individual CpG site(s) within a region that contains more than one CpG site, a set of absent CpG methylation, a set of non-CpG methylation, or a combination thereof (Paragraph [0165]).  The methylation percentage and methylation index correspond to rate of methylation.  
The incorporation of the methylation index into the methylation profile that is used in the machine learning model is equivalent to the first probabilistic model or second probabilistic model being parameterized by products of the rates of methylation.  This corresponds to the claim limitation of determining rates of methylation for each of a plurality of CpG sites within the first plurality of reference sequence reads or second plurality of reference sequence reads, wherein the first probabilistic model or second probabilistic model is parameterized by products of the rates of methylation.  
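
For illustration only, the methylation index described in Paragraph [0174] (the proportion of reads showing methylation at a site over the total number of reads covering that site) can be sketched as follows. The function name, the input layout, and the toy reads are assumptions made for this sketch and do not come from Zhang.

```python
from collections import defaultdict

def methylation_index(reads):
    """Per-site methylation index.

    reads: iterable of dicts mapping a CpG site id to 1 (methylated)
    or 0 (unmethylated). This layout is a hypothetical convenience.
    """
    methylated = defaultdict(int)
    covered = defaultdict(int)
    for read in reads:
        for site, state in read.items():
            covered[site] += 1        # read covers this site
            methylated[site] += state # count only methylated calls
    return {site: methylated[site] / covered[site] for site in covered}

# Toy data (invented): four reads over two hypothetical sites.
toy_reads = [
    {"site_a": 1, "site_b": 0},
    {"site_a": 1},
    {"site_a": 0, "site_b": 0},
    {"site_a": 1, "site_b": 1},
]
toy_index = methylation_index(toy_reads)
```

In this toy input, site_a is covered by four reads (three methylated) and site_b by three reads (one methylated).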

With respect to claim 12, Zhang teaches in one embodiment, the correlated results for each methylation panel are rated by their correlation to the disease or tumor type positive state, such as for example, by p-value test or t-value test or F-test.  Rated (best first, i.e.  low p- or t-value) markers are then subsequently selected and added to the methylation panel until a certain diagnostic value is reached (Paragraph [0232]).  Zhang also teaches that other methods include the step of specifying a significance level to be used for determining the biomarkers that will be included in the biomarker panel (Paragraph [0232]).  Biomarkers that are differentially methylated between the classes at a univariate parametric significance level less than the specified threshold are included in the panel (Paragraph [0232]).  Zhang also teaches in FIG.  12 illustrates heat map comparing differential expression of hyper-methylated genes in either breast cancer or liver cancer compared with matched normal tissue (Paragraph [0116]).  Zhang further teaches in FIG.  13A-FIG.  13C illustrate RNA-seq data from TCGA as a discovery cohort to calculate the differential expression of hypermethylated genes in either breast cancer or liver cancer compared with matched normal tissue (Paragraph [0117]).  In some instances, hypermethylation is the average methylation state corresponding to an increased presence of 5-mCyt at one or a plurality of CpG dinucleotides within a DNA sequence of a test DNA sample, relative to the amount of 5-mCyt found at corresponding CpG dinucleotides within a normal control DNA sample (Paragraph [0173]), which corresponds to the threshold level.  
In some cases, hypomethylation is the average methylation state corresponding to a decreased presence of 5-mCyt at one or a plurality of CpG dinucleotides within a DNA sequence of a test DNA sample, relative to the amount of 5-mCyt found at corresponding CpG dinucleotides within a normal control DNA sample (Paragraph [0173]).  This corresponds to the claim limitation of determining for each sequence read of the first plurality of reference sequence reads, the second plurality of sequence reads, or the plurality of training sequence reads, whether the sequence read is hypomethylated or hypermethylated by determining whether at least a threshold number of CpG sites with at least a threshold percentage of the CpG sites are unmethylated or are methylated, respectively.  
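
As a hedged sketch of the recited per-read determination, a read may be labeled hypermethylated or hypomethylated when it covers at least a threshold number of CpG sites and at least a threshold percentage of those sites are methylated or unmethylated, respectively. The threshold values (5 sites, 80%) and the function name are assumptions chosen for illustration; neither the claim language quoted above nor Zhang specifies them.

```python
def classify_read(states, min_sites=5, min_fraction=0.8):
    """Label one read from its CpG states (1 = methylated, 0 = unmethylated).

    min_sites and min_fraction are hypothetical thresholds.
    """
    n = len(states)
    if n < min_sites:
        return "indeterminate"           # too few CpG sites on the read
    frac_methylated = sum(states) / n
    if frac_methylated >= min_fraction:
        return "hypermethylated"
    if 1.0 - frac_methylated >= min_fraction:
        return "hypomethylated"
    return "indeterminate"

labels = [
    classify_read([1, 1, 1, 1, 1]),
    classify_read([0, 0, 0, 0, 0, 1]),
    classify_read([1, 0, 1, 0, 1]),
    classify_read([1, 1, 1]),
]
```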

With respect to claim 13, Zhang teaches that Markers were ranked with the lowest p values by t-statistic and the largest difference in a mean methylation fraction between each comparison and the top ten markers in each group were selected for further validation analysis (Paragraph [0401]).  This corresponds to the claim limitation of determining for each sequence read of the first plurality of reference sequence reads, the second plurality of sequence reads, or the plurality of training sequence reads, whether the sequence read is anomalous methylated; and filtering the first plurality of reference sequence reads with p-value filtering by removing sequence reads from the first plurality of reference sequence reads having below a threshold p-value.  
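
The marker ranking quoted from Paragraph [0401] can be sketched roughly as below. This is a simplification under stated assumptions: it ranks by the magnitude of a Welch t-statistic (used here as a proxy for a small p-value, since no p-value is computed) and breaks ties by the difference in mean methylation fraction. The data and all names are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic for two samples (each needs non-zero variance)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_markers(cancer, normal, top_n=10):
    """Return the top_n (marker, t, mean_difference) tuples.

    cancer / normal: dicts mapping marker id -> list of methylation fractions.
    """
    scored = []
    for marker in cancer:
        t = welch_t(cancer[marker], normal[marker])
        diff = mean(cancer[marker]) - mean(normal[marker])
        scored.append((marker, t, diff))
    # Larger |t| first, then larger |mean difference|.
    scored.sort(key=lambda s: (abs(s[1]), abs(s[2])), reverse=True)
    return scored[:top_n]

# Toy methylation fractions (invented).
cancer = {"m1": [0.9, 0.8, 0.85], "m2": [0.5, 0.6, 0.4], "m3": [0.2, 0.3, 0.25]}
normal = {"m1": [0.1, 0.2, 0.15], "m2": [0.5, 0.4, 0.6], "m3": [0.3, 0.2, 0.35]}
top_two = rank_markers(cancer, normal, top_n=2)
```

Note that this sketch selects markers; it does not by itself filter individual sequence reads.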

With respect to claim 28, Zhang teaches in FIG.  34 shows the analysis of cfDNA from breast cancer patients (Paragraph [0510]).  Four probes were used (Brb-3, Brb-4, Brb-8, and Brb-13 (Paragraph [0510])).  The methylation ratio of cfDNA primary breast cancer was compared to normal cfDNA sample, primary breast cancer tissue reference sample (genomic DNA), and normal lymphocyte reference sample (genomic DNA) (Paragraph [0510]).  All four probes were able to detect the presence of breast cancer in cfDNA samples (Paragraph [0510]).  The ability to detect the presence of breast cancer corresponds to reaching or exceeding threshold value.  This corresponds to the claim limitation of wherein comparing the first probability value and the second probability value comprises determining a ratio of the first probability value and the second probability value, and wherein the set of one or more features comprise sequence read counts of sequence reads that exceed a ratio threshold value.  

With respect to claim 31, Zhang teaches in some embodiments, a number of methods are utilized to measure, detect, determine, identify, and characterize the methylation status/level of a biomarker (i.e., a region/fragment of DNA or a region/fragment of genome DNA (e.g., CpG island-containing region/fragment)) in the development of a disease or condition (e.g., cancer) and thus diagnose the onset, presence or status of the disease or condition (Paragraph [0179]).  Zhang teaches that FIG.  14 shows graphs that illustrate methylation patterns correlate with gene expression profiles and cancer behaviors.  The mRNA expression of differentially methylated genes in breast cancer and liver cancer was determined using qPCR.  The mRNA expression in tumor samples was normalized to expression in nearby normal tissue derived from the same patient (Paragraph [0118]).  Zhang also teaches Quantitative MethyLight uses bisulfite to convert genomic DNA and the methylated sites are amplified using PCR with methylation independent primers (Paragraph [0192]).  Detection probes specific for the methylated and unmethylated sites with two different fluorophores provides simultaneous quantitative measurement of the methylation (Paragraph [0192]).  Zhang teachings correspond to the claim limitation of determining, for each feature of the set of one or more features, a measure of the feature in distinguishing between the first disease state and the second disease state.  

	Regarding claim 34, Zhang teaches the recited training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a presence or absence of a disease, a disease type, and/or a disease tissue of origin at least with "Once trained, the classification model recognizes patterns in data generated using unknown samples. In some instances, the classification model is then used to classify the unknown samples into classes. This is useful, for example, in predicting whether or not a particular biological sample is associated with a certain biological condition (e.g., diseased versus non-diseased)." (Para. [0276]).
Zhang teaches in claims 28(f)(1) and (2), and 30-32, a method of classifying cancer type by machine learning.  
(1) the machine learning method comprises: identifying a plurality of markers and a plurality of weights based on a top score, and classifying the samples based on the plurality of markers and the plurality of weights; and
(2) the cancer CpG methylation profile database comprises a set of CpG methylation profiles and each CpG methylation profile represents a cancer type.
30.  The method of claim 28, wherein the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model.
31.  The method of claim 28, wherein the CpG methylation data is generated from an extracted genomic DNA treated with a deaminating agent.
32.  The method of claim 27, wherein the comparing further comprises determining the cancer type of the individual.
Zhang also teaches that in specific embodiments, provided herein include methods for determining the risk of developing cancer in a patient (Paragraph [0263]).  Biomarker methylation percentages, amounts or patterns are characteristic of various risk states, e.g., high, medium or low (Paragraph [0263]).  The risk of developing cancer is determined by measuring the methylation status of the relevant biomarkers and then either submitting them to a classification algorithm or comparing them with a reference amount, i.e., a predefined level or pattern of methylated (and/or unmethylated) biomarkers that is associated with the particular risk level (Paragraph [0263]).  

With respect to claim 40, Zhang teaches in some instances, the methylation profile is generated from a biological sample isolated from an individual.  In some embodiments, the biological sample is a biopsy (Paragraph [0180]).  In some instances, the biological sample is a tissue sample (Paragraph [0180]).  In other instances, the biological sample is a cell-free biological sample.  In other instances, the biological sample is a circulating tumor DNA sample (Paragraph [0180]).  In one embodiment, the biological sample is a cell free biological sample containing circulating tumor DNA (Paragraph [0180]).  This corresponds to the claim limitation of wherein the first reference sample or the second reference sample is a cell free nucleic acid sample or a tissue nucleic acid sample from a subject having a known disease state.  

Zhang does not teach wherein the first probabilistic model or the second probabilistic model is parameterized by a sum of a plurality of mixture components each associated with a product of the rates of methylation of claim 14; wherein training the first probabilistic model or second probabilistic model comprises: determining, for the probabilistic model a set of parameters that maximizes a total log-likelihood of the first plurality of reference sequence reads or second plurality of reference sequence reads deriving from subjects associated with the first disease state or the second disease state associated with the probabilistic model of claim 16; wherein the one or more features comprise a count of outlier sequence reads of the plurality of training sequence reads where the first probability value is greater than the second probability value of claim 21; wherein the one or more features includes a total count of outlier sequence reads of claim 23; wherein the one or more features includes a total count of anomalously methylated sequence reads of claim 24; and wherein the one or more features comprise a count of fragments including one or more particular methylation patterns of claim 25.  However, these limitations were known in the art at the time of the effective filing date of the invention, as taught by Li.

With respect to claim 14, Li teaches that the key to their method is to focus on the joint methylation states of multiple adjacent CpG sites on an individual cfDNA sequencing read, in order to exploit the pervasive nature of DNA methylation for signal amplification (Page 2, Column 1, Paragraph 2).  Traditional DNA methylation analysis focuses on the methylation rate of an individual CpG site in a cell population (Page 2, Column 1, Paragraph 2).  This rate, often called the β-value, is the proportion of cells in which the CpG site is methylated (see an example in Figure 1) (Page 2, Column 1, Paragraph 2).  However, such population-average measures are not sensitive enough to capture an abnormal methylation signal affecting only a small proportion of the cfDNAs.  Figure 1 illustrates this point: the average methylation rates of the individual CpG sites are βnormal = 1 for normal plasma cfDNAs, and βtumor = 0 for tumor cfDNAs; assuming the presence of 1% tumor cfDNAs, the traditional measure yields βmixed= 0.99, which is hard to differentiate from βnormal = 1 (Page 2, Column 1, Paragraph 2).  However, based on the pervasive nature of DNA methylation, we came up with a new way to differentiate disease specific cfDNA reads from normal cfDNA reads (Page 2, Column 1, Paragraph 2).  If we average the methylation values of all CpG sites in a given read (denoted α-value), we see a striking difference (0 and 1) between the abnormally methylated cfDNAs and the normal cfDNAs (αtumor = 0% and αnormal = 100%) (Page 2, Column 1, Paragraph 2).  In other words, given the pervasive nature of DNA methylation, the joint methylation states of multiple adjacent CpG sites may easily distinguish cancer-specific cfDNA reads from normal cfDNA reads (Page 2, Column 1, Paragraph 2).  Inspired by the α-value, we realized that the key to exploiting pervasive methylation is to estimate whether the joint probability of all CpG sites in a read follows the DNA methylation signature of a disease.  
We therefore propose a novel, read-based probabilistic approach, termed ‘CancerDetector’, that can sensitively identify a trace amount of tumor cfDNAs out of all cfDNAs in plasma (Page 2, Column 1, Paragraph 2).  The abnormally methylated cfDNAs and the normal cfDNAs corresponds to the mixture components.  This corresponds to the claim limitation of wherein the first probabilistic model or the second probabilistic model is parameterized by a sum of a plurality of mixture components each associated with a product of the rates of methylation.  
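
The contrast Li draws between the site-level β-value and the read-level α-value can be reproduced with a small worked example. The sketch assumes 100 reads of 4 CpG sites each, with 1% tumor reads fully unmethylated and normal reads fully methylated; the read length and read count are assumptions for illustration, while the 1% mixture follows the quoted passage.

```python
# 99 fully methylated "normal" reads and 1 fully unmethylated "tumor" read.
normal_reads = [[1, 1, 1, 1] for _ in range(99)]
tumor_reads = [[0, 0, 0, 0] for _ in range(1)]
mixed_reads = normal_reads + tumor_reads

def beta_values(reads):
    """Population-average methylation rate of each CpG site (beta-value)."""
    n_sites = len(reads[0])
    return [sum(r[i] for r in reads) / len(reads) for i in range(n_sites)]

def alpha_value(read):
    """Average methylation over all CpG sites of one read (alpha-value)."""
    return sum(read) / len(read)

betas = beta_values(mixed_reads)              # every site is close to 0.99
alphas = [alpha_value(r) for r in mixed_reads]  # 99 reads at 1.0, one at 0.0
```

Each β-value stays near 0.99 and is hard to separate from 1, whereas the single tumor read stands out with an α-value of 0.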

With respect to claim 16, Li teaches the development of a probabilistic framework to infer the tumor-derived cfDNA fraction (i.e.  tumor fraction), denoted as 0 ≤ θ < 1, by classifying cfDNA reads into two classes (class T for tumor-derived DNAs and class N for normal plasma cfDNAs), based on a set of markers associated with the methylation patterns of two classes (Page 4, Column 2, Paragraph 2).  We denote the methylation patterns of all K markers as M = {(m_1^T, m_1^N), ..., (m_k^T, m_k^N), ..., (m_K^T, m_K^N)} (Page 4, Column 2, Paragraph 2).  We also denote the methylation sequencing data of a patient’s cfDNAs as a set of N reads R = {r^(1), ..., r^(N)} that in total cover M CpG sites (Page 4, Column 2, Paragraph 2).  For a read that is aligned to the region of marker k, we assume that it can come from one of two classes with the class-specific likelihood P(r | m_k^c), where m_k^c is the methylation pattern of class c.  Let θ be the tumor-derived cfDNA fraction, so the fraction of normal cfDNA is 1 − θ (Page 4, Column 2, Paragraph 2).  We want to estimate θ by maximizing the log-likelihood log P(R | θ, M).  This is a maximum likelihood estimation problem (Page 4, Column 2, Paragraph 2).  Assuming the independence of each read (as widely adopted in literatures (25,26)), P(R | θ, M) = ∏_{i=1}^{N} P(r^(i) | θ, M) (Page 4, Column 2, Paragraph 2).  This corresponds to the claim limitation of determining, for the probabilistic model a set of parameters that maximizes a total log-likelihood of the first plurality of reference sequence reads or second plurality of reference sequence reads deriving from subjects associated with the first disease state or the second disease state associated with the probabilistic model.  
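
A minimal sketch of the maximum likelihood estimation described in the quoted passage follows. It assumes a two-component mixture in which each read has likelihood θ·P(r | tumor) + (1 − θ)·P(r | normal), reads are independent, and each class likelihood is a product of per-site Bernoulli terms with a single methylation rate per class. The Bernoulli form, the grid search, and all names and numbers are assumptions for illustration and are not asserted to be Li's implementation.

```python
from math import log

def read_likelihood(read, rate):
    """P(read | class) as a product over CpG sites (independent-site assumption)."""
    p = 1.0
    for state in read:
        p *= rate if state == 1 else (1.0 - rate)
    return p

def log_likelihood(theta, reads, rate_tumor, rate_normal):
    """log P(R | theta) = sum over reads of the log of the two-class mixture."""
    total = 0.0
    for read in reads:
        mix = (theta * read_likelihood(read, rate_tumor)
               + (1.0 - theta) * read_likelihood(read, rate_normal))
        total += log(mix)
    return total

def estimate_theta(reads, rate_tumor, rate_normal, steps=1000):
    """Grid search over 0 <= theta < 1 for the maximizing tumor fraction."""
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, reads, rate_tumor, rate_normal))

# Toy data (invented): 90 methylated reads and 10 unmethylated reads.
toy_reads = [[1, 1, 1, 1]] * 90 + [[0, 0, 0, 0]] * 10
theta_hat = estimate_theta(toy_reads, rate_tumor=0.1, rate_normal=0.9)
```

With these toy inputs the estimate lands near 0.1, matching the share of unmethylated reads.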

With respect to claim 21, Li estimated a global tumor fraction (θ) across all cancer-specific markers (Page 5, Column 1, Paragraph 2).  The tumor fraction (θ) can also be estimated only for a single marker (Page 5, Column 1, Paragraph 2).  Ideally, for an early-stage cancer patient, the estimated θ should be a small number (e.g., <20%), either across all markers or in individual markers.  However, in real cancer patient data, we observed a number of markers with individually estimated tumor fractions far larger than the global tumor fraction (Page 5, Column 1, Paragraph 2).  This corresponds to the first probability value is greater than the second probability value.  Therefore, cfDNA fragments harboring aberrant methylation in these ‘outlier’ markers obviously do not come from cancerous cells, but likely from normal cells (e.g.  white blood cells) due to inter-individual variance (e.g.  age, environment exposure, or other diseases the person may have) (Page 5, Column 1, Paragraph 2).  Consequently, including these ‘confounding’ markers would impair the accuracy of tumor fraction estimation (Page 5, Column 1, Paragraph 2).  We therefore design an iterative algorithm to adjust the global tumor fraction after identifying and removing ‘germline’ markers.  We denote θk as the tumor fraction at the marker k, to distinguish from the global fraction θ obtained using all markers (Page 5, Column 1, Paragraph 2).  Li teaches in the Initialization––Let M denote the set of markers used for θ estimation.  Initially, we put all markers into M (Page 5, Column 1, Paragraph 2).  In order to arrive at the set of markers a count including the total count of the outlier reads are required.  Therefore, this corresponds to the claim limitation of wherein the set of one or more features comprise a count of outlier sequence reads of the plurality of training sequence reads where the first probability value is greater than the second probability value.  

With respect to claim 23, Li estimated a global tumor fraction (θ) across all cancer-specific markers (Page 5, Column 1, Paragraph 2).  The tumor fraction (θ) can also be estimated only for a single marker (Page 5, Column 1, Paragraph 2).  Ideally, for an early-stage cancer patient, the estimated θ should be a small number (e.g., <20%), either across all markers or in individual markers.  However, in real cancer patient data, we observed a number of markers with individually estimated tumor fractions far larger than the global tumor fraction (Page 5, Column 1, Paragraph 2).  Therefore, cfDNA fragments harboring aberrant methylation in these ‘outlier’ markers obviously do not come from cancerous cells, but likely from normal cells (e.g.  white blood cells) due to inter-individual variance (e.g.  age, environment exposure, or other diseases the person may have) (Page 5, Column 1, Paragraph 2).  Consequently, including these ‘confounding’ markers would impair the accuracy of tumor fraction estimation (Page 5, Column 1, Paragraph 2).  We therefore design an iterative algorithm to adjust the global tumor fraction after identifying and removing ‘germline’ markers.  We denote θk as the tumor fraction at the marker k, to distinguish from the global fraction θ obtained using all markers (Page 5, Column 1, Paragraph 2).  Li teaches in the Initialization––Let M denote the set of markers used for θ estimation.  Initially, we put all markers into M (Page 5, Column 1, Paragraph 2).  In order to arrive at the set of markers a count including the total count of the outlier reads are required.  Therefore, this corresponds to the claim limitation of wherein the set of one or more features includes a total count of outlier sequence reads.
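
The iterative adjustment of the global tumor fraction quoted above for claims 21 and 23 can be sketched loosely as follows. The sketch starts with all markers in the set, flags markers whose individual fraction θk is far larger than the global fraction, removes them, and repeats. Two simplifications are assumptions of this sketch and are not taken from Li: the global fraction is recomputed as the mean of the retained per-marker fractions rather than by re-running the likelihood maximization, and "far larger" is modeled by a hypothetical fixed margin.

```python
def refine_global_fraction(theta_by_marker, margin=0.2):
    """Iteratively drop markers whose fraction exceeds the global one by margin.

    theta_by_marker: dict mapping marker id -> individually estimated fraction.
    margin: hypothetical cutoff for "far larger than the global fraction".
    Returns (global_fraction, sorted list of removed markers).
    """
    kept = dict(theta_by_marker)  # initialization: all markers are included
    removed = []
    while True:
        global_theta = sum(kept.values()) / len(kept)
        outliers = [k for k, t in kept.items() if t > global_theta + margin]
        if not outliers:
            return global_theta, sorted(removed)
        for k in outliers:
            removed.append(k)
            del kept[k]

# Toy per-marker fractions (invented).
toy_fractions = {"k1": 0.05, "k2": 0.04, "k3": 0.06, "k4": 0.90, "k5": 0.55}
global_theta, removed_markers = refine_global_fraction(toy_fractions)
```

The loop terminates because every pass removes at least one marker and the largest retained value can never exceed the mean of a single-element set.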

With respect to claim 24, Li teaches after the removal of PCR duplicates, the numbers of methylated and unmethylated cytosines were counted for each CpG site (Page 5, Column 2, Paragraph 4).  The methylation level of a CpG cluster is calculated as the ratio between the number of methylated cytosines and the total number of cytosines within the cluster (Page 5, Column 2, Paragraph 4).  However, if the total number of cytosines in the reads aligned to the CpG cluster is treated as NA (Not Available) (Page 5, Column 2, Paragraph 4).  This corresponds to the claim limitation of wherein the set of one or more features includes a total count of anomalously methylated sequence reads.  

With respect to claim 25, Li teaches in Figure 2 the methylation patterns of cfDNA fragments and fragment counts.  Li also teaches the development of a probabilistic framework to infer the tumor-derived cfDNA fraction (i.e.  tumor fraction), denoted as 0 ≤ θ < 1, by classifying cfDNA reads into two classes (class T for tumor-derived DNAs and class N for normal plasma cfDNAs), based on a set of markers associated with the methylation patterns of two classes (Page 4, Column 2, Paragraph 2).  We denote the methylation patterns of all K markers as M = {(m_1^T, m_1^N), ..., (m_k^T, m_k^N), ..., (m_K^T, m_K^N)} (Page 4, Column 2, Paragraph 2).  We also denote the methylation sequencing data of a patient’s cfDNAs as a set of N reads R = {r^(1), ..., r^(N)} that in total cover M CpG sites (Page 4, Column 2, Paragraph 2).  For a read that is aligned to the region of marker k, we assume that it can come from one of two classes with the class-specific likelihood P(r | m_k^c), where m_k^c is the methylation pattern of class c.  Let θ be the tumor-derived cfDNA fraction, so the fraction of normal cfDNA is 1 − θ (Page 4, Column 2, Paragraph 2).  We want to estimate θ by maximizing the log-likelihood log P(R | θ, M).  This is a maximum likelihood estimation problem (Page 4, Column 2, Paragraph 2).  Assuming the independence of each read (as widely adopted in literatures), P(R | θ, M) = ∏_{i=1}^{N} P(r^(i) | θ, M) (Page 4, Column 2, Paragraph 2).  This corresponds to the claim limitation of wherein the set of one or more features comprise a count of fragments including one or more particular methylation patterns.  

With respect to claim 30, Li also teaches the development of a probabilistic framework to infer the tumor-derived cfDNA fraction (i.e.  tumor fraction), denoted as 0 ≤ θ < 1, by classifying cfDNA reads into two classes (class T for tumor-derived DNAs and class N for normal plasma cfDNAs), based on a set of markers associated with the methylation patterns of two classes (Page 4, Column 2, Paragraph 2).  We denote the methylation patterns of all K markers as M = {(m_1^T, m_1^N), ..., (m_k^T, m_k^N), ..., (m_K^T, m_K^N)} (Page 4, Column 2, Paragraph 2).  We also denote the methylation sequencing data of a patient’s cfDNAs as a set of N reads R = {r^(1), ..., r^(N)} that in total cover M CpG sites (Page 4, Column 2, Paragraph 2).  For a read that is aligned to the region of marker k, we assume that it can come from one of two classes with the class-specific likelihood P(r | m_k^c), where m_k^c is the methylation pattern of class c.  Let θ be the tumor-derived cfDNA fraction, so the fraction of normal cfDNA is 1 − θ (Page 4, Column 2, Paragraph 2).  We want to estimate θ by maximizing the log-likelihood log P(R | θ, M).  This is a maximum likelihood estimation problem (Page 4, Column 2, Paragraph 2).  Assuming the independence of each read (as widely adopted in literatures), P(R | θ, M) = ∏_{i=1}^{N} P(r^(i) | θ, M) (Page 4, Column 2, Paragraph 2).  Li teaches that θ is the tumor-derived cfDNA fraction, so the fraction of normal cfDNA is 1 − θ (Page 4, Column 2, Paragraph 2) and 0 ≤ θ < 1, which corresponds to the threshold value for determining whether cfDNA is tumor-derived DNA or normal plasma.  This corresponds to the claim limitation of determining a log-likelihood ratio of the first probability value to the second probability value; and determining, for one or more threshold values, a count of the sequence reads having a log-likelihood ratio exceeding the threshold value.  
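
For reference, the recited feature (a log-likelihood ratio per read, followed by a count of reads whose ratio exceeds each of one or more threshold values) can be sketched as below. The probability pairs and threshold values are invented for illustration, and the function names are hypothetical.

```python
from math import log

def log_likelihood_ratio(p_first, p_second):
    """log of (first probability value / second probability value)."""
    return log(p_first) - log(p_second)

def count_exceeding(llrs, thresholds):
    """For each threshold, count the reads whose ratio is strictly greater."""
    return {t: sum(1 for x in llrs if x > t) for t in thresholds}

# Toy (first, second) probability values for four reads (invented).
pairs = [(0.6561, 0.0001), (0.0001, 0.6561), (0.0729, 0.0009), (0.01, 0.01)]
llrs = [log_likelihood_ratio(a, b) for a, b in pairs]
counts = count_exceeding(llrs, thresholds=[0, 2, 5])
```

The resulting per-threshold counts form a small feature vector for the sample, one entry per threshold.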

It would have been prima facie obvious to combine the teachings of Zhang and Li to achieve the claimed invention.  Li’s method of analyzing cell-free DNA methylation sequences based on α-value is particularly advantageous when tumor fractions and sequencing coverages are low (Page 8, Column 2, Paragraph 2) and could reduce the cost of cancer detection (Page 9, Column 1, Paragraph 1).  A person of ordinary skill in the art would have been motivated to modify the method of Zhang to incorporate the method of Li when analyzing small amounts of tumor cfDNAs because Li’s method provides a robust and sensitive estimate of the tumor fraction (Page 9, Column 1, Paragraph 1) when volumes are low while also reducing costs (Page 9, Column 1, Paragraph 1).  Furthermore, there would have been a reasonable expectation of success, since both Zhang and Li teach methods that pertain to the analysis of methylated DNA sequences.
Claim(s) 17 is/are rejected under 35 U.S.C.  103 as being unpatentable over Zhang (U.S.  Patent No 2018 / 0341745 A1, published Nov.  29, 2018; cited on the 05/11/2023 “Notice of References Cited” form 892), in view of Li (CancerDetector: ultrasensitive and non-invasive cancer detection at the resolution of individual reads using cell-free DNA methylation sequencing data.  Nucleic acids research 46.15 (2018): e89-e89, published 2018; cited on the 05/11/2023 “Notice of References Cited” form 892) and Xia ("A convolutional neural network based ensemble method for cancer prediction using DNA methylation data." Proceedings of the 2019 11th International Conference on Machine Learning and Computing. Feb. 2019.; cited on the 05/29/2025 IDS Document) as applied to claims 1, 4-5, 7, 10-14, 16, 21, 23-25, 28, 30-31, 34 and 40 above and further in view of Zheng (Prediction of genome-wide DNA methylation in repetitive elements.  Nucleic acids research 45.15 (2017): 8697-8711, published 2017; cited on the 05/11/2023 “Notice of References Cited” form 892).

Zhang, Li and Xia are applied to claims 1, 4-5, 7, 10-14, 16, 21, 23-25, 28, 30-31, 34 and 40 above.

Zhang does not teach wherein the method further comprises: for each of a plurality of windows: selecting a plurality of the first plurality of reference sequence reads derived from the window and utilizing the sequence reads derived from the window to train the first probabilistic model for the window; and selecting a plurality of the second plurality of reference sequence reads derived from the window and utilizing the sequence reads to train the probabilistic model for each window of claim 17.  However, this limitation was known in the art at the time of the effective filing date of the invention, as taught by Zheng.

With respect to claim 17, Zheng teaches that for a given flanking window size, we generated these predictors and trained a model to predict methylation levels of the target CpGs (Page 8699, Column 1, Paragraph 2).  Zheng also teaches, in Figure 1, determining the window used for training the machine learning model.  This corresponds to the claim limitation of, for each of a plurality of windows: selecting a plurality of the first plurality of reference sequence reads derived from the window and utilizing the sequence reads derived from the window to train the first probabilistic model for the window; and selecting a plurality of the second plurality of reference sequence reads derived from the window and utilizing the sequence reads to train the probabilistic model for each window.  
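
The windowed training recited in claim 17 can be illustrated with a deliberately simple sketch: reads are grouped by fixed-size genomic window, and a separate model is fit per window. Here the per-window "model" is just one methylation rate, which is an assumption made to keep the example short; the window size, input layout, and names are likewise hypothetical and are not drawn from Zheng or from the application.

```python
from collections import defaultdict

def train_per_window(reads, window_size=1000):
    """Fit one methylation rate per genomic window.

    reads: iterable of (start_position, [CpG states]) tuples, where each
    state is 1 (methylated) or 0 (unmethylated). A read is assigned to the
    window that contains its start position.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for start, states in reads:
        window = start // window_size
        methylated[window] += sum(states)
        total[window] += len(states)
    return {w: methylated[w] / total[w] for w in total}

# Toy reference reads for one disease state (invented).
first_state_reads = [(100, [1, 1, 1]), (950, [1, 0]), (1200, [0, 0, 0, 1])]
first_state_models = train_per_window(first_state_reads)
```

Running the same function on reads from the second disease state would yield the second set of per-window models.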

Thus, it would have been obvious to combine the teachings of Zhang and Zheng to achieve the claimed invention.  Zheng’s proposed algorithm can be applied to the widely used methylation profiling platforms and extend RE CpG coverage in a highly cost-effective manner and promotes genome-wide, locus-specific repetitive element (RE) methylation association analyses in large human population and clinical studies by providing extended coverage of locus-specific RE methylation (Page 8707, Column 2, Paragraph 2).  This allows for more precise investigations into the tumorigenic (and potentially other etiological) roles of RE methylation, improving the accuracy of epigenetic studies (Page 8707, Column 2, Paragraph 2).  A person of ordinary skill in the art would have been motivated to modify the method of Zhang to incorporate the method of Zheng for methylation profiling genome wide because Zheng’s method is highly cost-effective.  Furthermore, there would have been a reasonable expectation of success, since both Zhang and Zheng teach methods that pertain to the analysis of methylated DNA sequences.


Claim(s) 47 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (U.S.  Patent No 2018 / 0341745 A1, published Nov.  29, 2018; cited on the 05/11/2023 “Notice of References Cited” form 892) in view of Li (CancerDetector: ultrasensitive and non-invasive cancer detection at the resolution of individual reads using cell-free DNA methylation sequencing data. Nucleic acids research 46.15 (2018): e89-e89, published 2018; cited on the 05/11/2023 “Notice of References Cited” form 892) and Xia ("A convolutional neural network based ensemble method for cancer prediction using DNA methylation data." Proceedings of the 2019 11th International Conference on Machine Learning and Computing. Feb. 2019.; cited on the 05/29/2025 IDS Document). 

Regarding claim 47, Zhang teaches methods, systems, platforms, non-transitory computer-readable medium, services, and kits for determining a cancer type in an individual (Abstract).  Zhang also teaches that, in some embodiments, described herein is a computing system comprising a processor, a memory module, an operating system configured to execute machine readable instructions, and a computer program including instructions executable by the processor to create an analysis application for generating a cancer CpG methylation profile database (Paragraph [0025]).  Zhang also teaches that, in some embodiments, a computer-readable medium refers to any storage device used for storing data accessible by a computer, as well as any other means for providing access to data by a computer (Paragraph [0284]).  The client software or web browser provides a user interface for a user of the invention to input data and information and receive access to data and information (Paragraph [0285]).  This corresponds to the claim limitation of a system comprising a computer processor and a memory, the memory storing computer program instructions that when executed by the computer processor cause the processor to perform steps of claim 47. 

Zhang teaches in claim 27 A method of diagnosing a cancer in an individual in need thereof, comprising: a) processing an extracted genomic DNA with a deaminating agent to generate a treated genomic DNA comprising deaminated nucleotides, wherein the extracted genomic DNA is obtained from a biological sample from the individual; b) generating a methylation profile of one or more biomarkers selected from Table 58 from the treated genomic DNA; and c) diagnosing whether the individual has a cancer by comparing the methylation profile to a reference CpG methylation profile obtained from a cancer CpG methylation profile database, wherein a correlation between the methylation profile and the reference CpG methylation profile determines the presence of cancer in the individual.
Zhang further teaches in claim 28 The method of claim 27, wherein the reference CpG methylation profile obtained from the cancer CpG methylation profile database is generated by the steps of: a) generating CpG methylation data from a set of biological samples by a sequencing method, wherein the set comprises a first cancerous biological sample, a second cancerous biological sample, a third cancerous biological sample, a first normal biological sample, a second normal biological sample, and a third normal biological sample; wherein the first, second, and third cancerous biological samples are different; and wherein the first, second, and third normal biological samples are different.
According to Zhang, "....data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model.  A “known sample” is a sample that has been pre-classified, such as, for example, a suitable control (e.g., biomarkers) from a non-diseased or non-cancer “normal” sample and/or suitable control (e.g., biomarkers from a known tumor tissue type or stage, or cancer status" (Para.  [0276]).  This corresponds to the claim limitation of training, using plurality of reference sequence reads.
The disease state is interpreted to be equivalent to being normal, cancerous, or cancer type.
The recited "sequence reads" reads on Zhang's CpG methylation data.
The recited "reference sequence reads" reads on Zhang's data that are generated using samples such as “known samples” or “control” or reference CpG methylation profile.
Therefore, the teachings of Zhang correspond to the claim limitation of generating a first plurality of reference sequence reads from a first reference sample, the first reference sample from a subject having a first disease state, wherein the first disease state is cancer; generating a second plurality of reference sequence reads from a second reference sample, the second sample from a subject having a second disease state, wherein the second disease state is non-cancer.
	Zhang teaches a number of methods are utilized to measure, detect, determine, identify, and characterize the methylation status/level of a biomarker (i.e., a region/fragment of DNA or a region/fragment of genome DNA (e.g., CpG island-containing region/fragment)) (Para.  [0179]).  Zhang also teaches a panel comprises 1000 or more biomarkers (Para.  [0159]), which would correspond to the claim limitation of reference sequence reads from over 1,000 nucleic acid fragments.
Zhang further teaches in some instances, a difference within each said pair of dataset is calculated and the differences are then input into the machine learning/classification program (103) (Paragraph [0150]).  In some cases, a pair-wise methylation difference dataset from the first, second, and third pair of datasets is generated and then analyzed in the presence of a control dataset or a training dataset (104) by the machine learning/classification method (103) to generate the cancer CpG methylation profile database (105) (Paragraph [0150]).  In some cases, the machine learning method comprises identifying a plurality of markers and a plurality of weights based on a top score (e.g., a t-test value, a β test value), and classifying the samples based on the plurality of markers and the plurality of weights (Paragraph [0150]).  In some cases, the cancer CpG methylation profile database (105) comprises a set of CpG methylation profiles and each CpG methylation profile represents a cancer type (Paragraph [0150]).  Zhang also discusses the identification of a cancer type specific signature was achieved by comparing a pair-wise methylation difference between a particular cancer type versus its surrounding normal tissue, difference between two different cancer types, as well as difference between two different normal tissues (Paragraph [0366]).  All of 485,000 CpG methylation sites were investigated in a training cohort of 1100 tumor samples and 231 matched adjacent-normal tissue samples (Paragraph [0366]).  
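The pair-wise difference and top-score marker selection summarized above can be pictured with the following minimal sketch. It is an illustration only and is not taken from Zhang: the function names `paired_t_scores` and `top_markers`, the dictionary layout, and the use of a paired t-statistic as the "top score" are all assumptions made for the example.

```python
import math
import statistics


def paired_t_scores(tumor, normal):
    """Paired t-statistic per CpG site (hypothetical helper).

    tumor / normal: dict mapping a sample-source id to a list of
    methylation levels, one per CpG site; ids match tumor to normal.
    """
    ids = sorted(tumor)
    n_sites = len(tumor[ids[0]])
    scores = []
    for j in range(n_sites):
        # Pair-wise methylation difference for site j within each matched pair.
        diffs = [tumor[i][j] - normal[i][j] for i in ids]
        sd = statistics.stdev(diffs)
        if sd > 0:
            scores.append(statistics.mean(diffs) / (sd / math.sqrt(len(diffs))))
        else:
            scores.append(0.0)
    return scores


def top_markers(scores, k):
    """Indices of the k sites with the largest absolute score."""
    return sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)[:k]
```

The selected site indices would then serve as the markers passed to whichever classification method is chosen; the weighting step described in Paragraph [0150] is not modeled here.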
Zhang describes two models.  The first probabilistic model corresponds to the machine learning/classification method that generates the cancer CpG methylation profile database (Paragraph [0150]).  The second probabilistic model corresponds to the machine learning model that identifies a plurality of markers and a plurality of weights based on a top score (e.g., a t-test value, a β test value) and classifies the samples based on the plurality of markers and the plurality of weights (Paragraph [0150]).  
Zhang further discusses that the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model (Para.  [0095]).  The recited "probabilistic model" reads on Zhang's classification model, such as the logistic regression model.
Zhang's teaching corresponds to the claim limitation of training, using the first plurality of reference sequence reads, a first probabilistic model, the first probabilistic model associated with the first disease state and training, using the second plurality of reference sequence reads, a second probabilistic model, the second probabilistic model associated with a second disease state.
Zhang further teaches in particular embodiments, the biomarker panels of the present invention may show a statistical difference in different cancer statuses of at least p<0.05, p<10−2, p<10−3, p<10−4 or p<10−5 (Paragraph [0239]).  Diagnostic tests that use these biomarkers may show an ROC of at least 0.6, at least about 0.7, at least about 0.8, or at least about 0.9 (Paragraph [0239]).  The biomarkers are differentially methylated in unaffected individual (or a normal control individual) and cancer, and the biomarkers for each cancer type are differentially methylated, and, therefore, are useful in aiding in the determination of cancer status.  In certain embodiments, the biomarkers are measured in a patient sample using the methods described herein and compared, for example, to predefined biomarker levels and correlated to cancer status.  In other embodiments, the correlation of a combination of biomarkers in a patient sample is compared, for example, to a predefined biomarker panel (Paragraph [0239]).  In yet another embodiment, the methylation profile of one or more genes in a patient sample are compared to the methylation profile of genes identified differentially methylated correlated to a tumor type or state or cancer status (Paragraph [0239]).  In particular embodiments, the measurement(s) may then be compared with a relevant diagnostic amount(s), cut-off(s), or multivariate model scores that distinguish a positive cancer status from a negative cancer status (Paragraph [0239]).  The diagnostic amount(s) represents a measured amount of epigenetic biomarker(s) above which or below which a patient is classified as having a particular cancer status.  As is well understood in the art, by adjusting the particular diagnostic cut-off(s) used in an assay, one can increase sensitivity or specificity of the diagnostic assay depending on the preference of the diagnostician (Paragraph [0239]).  
In particular embodiments, the particular diagnostic cut-off can be determined, for example, by measuring the amount of biomarker hypermethylation or hypomethylation in a statistically significant number of samples from patients with the different cancer statuses, and drawing the cut-off to suit the desired levels of specificity and sensitivity (Paragraph [0239]).  
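The trade-off between sensitivity and specificity described in Paragraph [0239] can be illustrated with a short sketch. It is not Zhang's procedure: the function names, the convention that a score strictly above the cut-off is called positive, and the use of scores from normal samples to place the cut-off are assumptions made for the example.

```python
import math


def cutoff_for_specificity(normal_scores, target_specificity):
    """Smallest observed normal score that keeps at least the target
    share of normal samples at or below the cut-off (hypothetical helper)."""
    ranked = sorted(normal_scores)
    k = max(0, math.ceil(target_specificity * len(ranked)) - 1)
    return ranked[k]


def sensitivity_specificity(cancer_scores, normal_scores, cutoff):
    """Sensitivity and specificity when scores above the cut-off are positive."""
    true_pos = sum(s > cutoff for s in cancer_scores)
    true_neg = sum(s <= cutoff for s in normal_scores)
    return true_pos / len(cancer_scores), true_neg / len(normal_scores)
```

Raising the target specificity moves the cut-off upward and generally lowers sensitivity, which is the adjustment the cited paragraph leaves to the diagnostician.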
The recited "probability values" reads on Zhang's statistically significant values utilized to determine cancer status, such as a positive cancer status or a negative cancer status.  
	Zhang teaches Various methodologies described herein include a step that involves comparing a value, level, feature, characteristic, property, etc.  to a suitable control, referred to interchangeably herein as an appropriate control, a control sample, or as a control (Paragraph [0177]).  In some embodiments, a control is a value, level, feature, characteristic, property, etc., determined in a cell, a tissue, an organ, or a sample obtained from a patient.  In some instances, the cell, tissue, organ, or sample is a normal cell, tissue, organ, or sample.  In some cases, the cell tissue, organ, or sample is a cancerous cell, tissue, organ, or sample (Paragraph [0177]).  For example, the biomarkers of the present invention is assayed for their methylation level in a sample from an unaffected individual or a normal control individual, or the subject's unaffected family member (Paragraph [0177]).  In another embodiment, a control is a value, level, feature, characteristic, property, etc.  determined prior to initiating a therapy (e.g., a cancer treatment) on a patient, or in between a therapeutic regimen (Paragraph [0177]).  In further embodiments, a control is a predefined value, level, feature, characteristic, property, etc.  (Paragraph [0177]).  This corresponds to the claim limitation of identifying one or more features by comparing the first probability value and the second probability value for each sequence read.  
Zhang also teaches "...the pattern recognition method comprises a linear combination of methylation levels, or a nonlinear combination of methylation levels to extract the probability that a biological sample is from a patient who exhibits no evidence of disease, who exhibits systemic cancer, or who exhibits biochemical recurrence, as well as to distinguish these disease states and types, particularly the primary tumor type" (Para. [0274]), which also corresponds to the claim limitation of comparing probability values.
Therefore, Zhang teaches the claim limitation of applying the sequence read to the first probabilistic model to determine a first probability value, the first probability value being a probability that the sequence read originated from a sample associated with the first disease state and applying the sequence read to the second probabilistic model to determine a second probability value, the second probability value being a probability that the sequence read originated from a sample associated with the second disease state.
	Zhang teaches that the recited training samples include a first subset of training samples obtained from subjects diagnosed with cancer and a second subset of training samples obtained from subjects not diagnosed with cancer at least with Zhang's Claim 28.  Zhang's Claim 28, step a, includes "a) generating CpG methylation data from a set of biological samples by a sequencing method, wherein the set comprises a first cancerous biological sample, a second cancerous biological sample, a third cancerous biological sample, a first normal biological sample, a second normal biological sample, and a third normal biological sample..." and with "In some embodiments, data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model." (Para. [0276]). 
	Zhang teaches the recited training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a likelihood of presence of cancer in an individual based on a feature set for a test sample of the individual at least with "Once trained, the classification model recognizes patterns in data generated using unknown samples. In some instances, the classification model is then used to classify the unknown samples into classes. This is useful, for example, in predicting whether or not a particular biological sample is associated with a certain biological condition (e.g., diseased versus non-diseased)." (Para. [0276]).

Zhang teaches the recited each of a plurality of training samples different from the first reference sample and the second reference sample with “The samples were divided into five equal parts and 4 of the parts were used for training and the fifth part was used to test the results.” (para. [0379]).
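The five-part split quoted from paragraph [0379] can be sketched as follows. This is an illustration under assumptions, not Zhang's code: the function name `five_fold_splits` is hypothetical, samples are assigned to parts by position, and the sketch rotates the held-out part, which the quoted passage does not state.

```python
def five_fold_splits(samples, n_folds=5):
    """Yield (training, test) lists: four parts train, one part tests."""
    # Assign samples to parts by position; parts are equal when
    # len(samples) is a multiple of n_folds.
    folds = [samples[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        training = [s for i, fold in enumerate(folds) if i != held_out for s in fold]
        yield training, folds[held_out]
```

Within each split, the test part shares no samples with the training parts.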
Zhang teaches the claim limitation of generating the first reference sample from a first subject having a first disease state, wherein the first disease state is cancer with “(i) a first pair of CpG methylation datasets generated from the first cancerous biological sample and the first normal biological sample, wherein CpG methylation data generated from the first cancerous biological sample form a first dataset within the first pair of datasets, CpG methylation data generated from the first normal biological sample form a second dataset within the first pair of datasets, and the first cancerous biological sample and the first normal biological sample are from the same biological sample source” (para. [0006]). Zhang’s teaching of “the first cancerous biological sample and the first normal biological sample are from the same biological sample source” corresponds to the recited “first subject”.
Zhang teaches the claim limitation of predict a likelihood that a given sequence read originates from one subject having the first disease state with “(2) analyze the pair-wise methylation difference dataset with a control dataset by a machine learning method to generate the cancer CpG methylation profile database, wherein (i) the machine learning method comprises: identifying a plurality of markers and a plurality of weights based on a top score, and classifying the samples based on the plurality of markers and the plurality of weights; and (ii) the cancer CpG methylation profile database comprises a set of CpG methylation profiles and each CpG methylation profile represents a cancer type.” (para. [0006]).
Zhang teaches the claim limitation of generating the second reference sample from a second subject having a second disease state, wherein the second disease state is non-cancer with “a. obtaining a fourth pair of CpG methylation datasets, with the first processor, generated from a fourth cancerous biological sample and a fourth normal biological sample, wherein CpG methylation data generated from the fourth cancerous biological sample form a seventh dataset within the fourth pair of datasets, CpG methylation data generated from the first normal biological sample form an eighth dataset within the fourth pair of datasets, and the fourth cancerous biological sample and the fourth normal biological sample are from the same biological sample source” (para. [0051]). Zhang’s teaching of “the fourth cancerous biological sample and the fourth normal biological sample are from the same biological sample source” corresponds to the recited “second subject”.
Zhang teaches the claim limitation of predict a likelihood that a given sequence read originates from one subject having the second disease state with “e. analyzing the second pair-wise methylation difference dataset with the cancer CpG methylation profile database described above, wherein a correlation between the second pair-wise methylation difference dataset and a CpG methylation profile within the cancer CpG methylation profile database determines a cancer type of the individual.” (para. [0051]).

Zhang does not explicitly teach two models for detecting a first and a second disease state. Zhang teaches a machine learning model that is trained using data that are generated from known samples.  According to Zhang, "....data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model.  A “known sample” is a sample that has been pre-classified, such as, for example, a suitable control (e.g., biomarkers) from a non-diseased or non-cancer “normal” sample and/or suitable control (e.g., biomarkers from a known tumor tissue type or stage, or cancer status" (Para.  [0276]).  Zhang further discusses that the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model (Para.  [0095]).  Therefore, the recited "reference sequence reads" corresponds to Zhang's data that are generated using samples such as “known samples” or “control”.  The recited "probabilistic model" corresponds to Zhang's classification model, such as the logistic regression model, because the model taught by Zhang uses statistical techniques for the prediction of cancer.  Zhang’s classification model corresponds to the recited probabilistic model and a machine learning classifier. Zhang's teaching corresponds to the claim limitation of generating a first plurality of reference sequence reads and training a first probabilistic model associated with the first disease state.  Although Zhang does not explicitly teach two models for detecting a first and second disease state, it would have been obvious to repeat the procedure and model taught by Zhang with a second set of data for the purpose of predicting a second disease state of non-cancer.  
However, Li teaches two probabilistic models for predicting a likelihood that a given sequence read originates from one subject having the first or second disease state of non-cancer and cancer with Figure 3 (page 4). Figure 3 depicts the likelihood of the sequence read being tumor or normal. Li teaches calculating the class-specific likelihood of each cfDNA sequencing read (Page 4, col. 1, para. 3) and then predicting tumor-derived cfDNA fraction (Page 4, col. 2, para. 2). Li also teaches in Figure 2, a probabilistic framework to infer the tumor-derived cfDNA fraction (i.e. tumor fraction), denoted as 0 ≤ θ < 1, by classifying cfDNA reads into two classes (class T for tumor-derived DNAs and class N for normal plasma cfDNAs), based on a set of markers associated with the methylation patterns of two classes. (Page 4, col. 2, para. 2). This meets the claim limitation of a first and second probabilistic model and then predicting the likelihood of the presence of cancer in an individual. 
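The two-class, read-level framework attributed to Li above can be pictured with the following simplified sketch. It is not Li's implementation: treating each CpG on a read as an independent Bernoulli draw with a single class-specific methylation level, estimating θ by a grid search over [0, 1), and the function names are all assumptions made for illustration.

```python
import math


def read_likelihood(read, class_methylation_level):
    """P(read | class) for a read given as a list of 0/1 CpG states.

    Assumes independent CpGs and a class level strictly between 0 and 1.
    """
    p = 1.0
    for is_methylated in read:
        p *= class_methylation_level if is_methylated else 1.0 - class_methylation_level
    return p


def estimate_tumor_fraction(reads, tumor_level, normal_level, grid_size=1000):
    """Grid value of theta maximizing the mixture log-likelihood
    sum over reads of log(theta * P(r|T) + (1 - theta) * P(r|N))."""
    pairs = [(read_likelihood(r, tumor_level), read_likelihood(r, normal_level)) for r in reads]
    best_theta, best_ll = 0.0, -math.inf
    for step in range(grid_size):
        theta = step / grid_size
        ll = sum(math.log(theta * p_t + (1.0 - theta) * p_n) for p_t, p_n in pairs)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```

In this sketch the two class-specific likelihoods play the role of the first and second probability values, and θ summarizes them at the sample level.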

Zhang and Li do not explicitly teach applying the first probabilistic model to the training sequence read to determine a first probability value, the first probability value being a probability that the training sequence read originated from a sample associated with the first disease state, and applying the second probabilistic model to the training sequence read to determine a second probability value, the second probability value being a probability that the training sequence read originated from a sample associated with the second disease state; and for each training sample, generating a feature set of one or more features for the training sample by comparing the first probability value and the second probability value for each training sequence read; and training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a likelihood of presence of cancer in an individual based on a feature set for a test sample of the individual. However, Xia teaches this claim limitation with “In this paper, we introduce a convolutional neural network based ensemble method for cancer prediction using DNA methylation data. We first conduct t-test to choose a set of significantly differential methylation points. Then, the selected feature was feed into Naive Bayesian Classifier, k-Nearest Neighbor, Decision Tree, Random Forest and Gradient Boosting Decision Tree five basic classifiers for the first stage classification. Here we use S-fold cross validation method by dividing the whole datasets into S groups and choose S-1 groups as training sets, the left one as test sets at each time. Finally, a convolutional neural network is used to ensemble the predictions of the first stage classifiers and extract the internal relationship among different classifiers to predict a more reliable result. The flowchart of the proposed ensemble method is shown in Fig. 1.” (Page 192, col. 1, section 2. Method) and Figure 1. Fig. 1 depicts a flowchart of the proposed convolutional neural network based ensemble method (page 192).
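The recited step of generating a feature set by comparing the first and second probability values for each training sequence read can be illustrated with the sketch below. It is one hypothetical reading of that limitation and is not drawn from the claims, Zhang, Li, or Xia: the log-ratio comparison, the threshold values, and the function name `sample_features` are assumptions.

```python
import math


def sample_features(read_probabilities, thresholds=(1.0, 2.0, 3.0)):
    """One feature per threshold: the number of reads whose log ratio of
    first to second probability value exceeds that threshold.

    read_probabilities: list of (p_first, p_second) pairs, both > 0.
    """
    log_ratios = [math.log(p_first / p_second) for p_first, p_second in read_probabilities]
    return [sum(ratio > t for ratio in log_ratios) for t in thresholds]
```

A feature vector of this kind, computed for each training sample, is the sort of input a downstream machine-learning classifier would be trained on.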

Therefore, it would have been prima facie obvious to combine the teachings of Zhang, Li, and Xia to arrive at the claimed invention.  Li demonstrated that CancerDetector provides high sensitivity and specificity in detecting tumor cfDNAs (Abstract) and Xia discussed that the use of convolutional neural network to ensemble the predictions of the first stage classifiers and extract the internal relationship among different classifiers provides for a more reliable prediction (Page 192, col. 1, section 2. Method). A person of ordinary skill in the art would have been motivated to combine the method of Zhang with the method of Li to include training probabilistic models for calculating the likelihood that the sequence read is normal or tumor to better identify sequence reads that are of tumor type.  A person of ordinary skill in the art would have also been motivated to combine the method of Zhang with the method of Xia to include training a machine learning classifier with the feature sets generated from the training sample of the probabilistic models to predict a more reliable result.  Furthermore, there would have been a reasonable expectation of success because Zhang, Li and Xia are in the same field of endeavor of determining the likelihood of cancer or non-cancer.  


Claim(s) 139 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (U.S. Patent Application Publication No. 2018/0341745 A1, published Nov. 29, 2018; cited on the 05/11/2023 “Notice of References Cited” form 892) in view of Li (CancerDetector: ultrasensitive and non-invasive cancer detection at the resolution of individual reads using cell-free DNA methylation sequencing data. Nucleic acids research 46.15 (2018): e89-e89, published 2018; cited on the 05/11/2023 “Notice of References Cited” form 892) and Xia ("A convolutional neural network based ensemble method for cancer prediction using DNA methylation data." Proceedings of the 2019 11th International Conference on Machine Learning and Computing. Feb. 2019; cited on the 05/29/2025 IDS Document). 

Regarding claim 139, Zhang further teaches in claim 28 The method of claim 27, wherein the reference CpG methylation profile obtained from the cancer CpG methylation profile database is generated by the steps of: a) generating CpG methylation data from a set of biological samples by a sequencing method, wherein the set comprises a first cancerous biological sample, a second cancerous biological sample, a third cancerous biological sample, a first normal biological sample, a second normal biological sample, and a third normal biological sample; wherein the first, second, and third cancerous biological samples are different; and wherein the first, second, and third normal biological samples are different.
Zhang also teaches "....data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model.  A “known sample” is a sample that has been pre-classified, such as, for example, a suitable control (e.g., biomarkers) from a non-diseased or non-cancer “normal” sample and/or suitable control (e.g., biomarkers from a known tumor tissue type or stage, or cancer status" (Para.  [0276]).  Zhang further discusses that the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model (Para.  [0095]).  Therefore, the recited "reference sequence reads" reads on Zhang's data that are generated using samples such as “known samples” or “control”.  This corresponds to the claim limitation of generating, for each disease state of a plurality of disease states, a plurality of reference sequence reads sequenced from reference samples having the disease state and generating, for each training sample of a plurality of training samples.
The recited "each disease state associated with a tissue of origin cancer type" reads on Zhang's known tumor tissue type.
Zhang also teaches Hierarchal clustering of samples according to differential methylation of CpG sites in this fashion was able to distinguish cancer tissue of origin as well as from normal tissue in the TCGA training cohort (Table 45) (Paragraph [0408]).  This corresponds to the claim limitation of generating a first plurality of reference sequence reads from reference samples having one of a plurality of disease states each associated with a tissue of origin cancer type.
Zhang teaches a number of methods are utilized to measure, detect, determine, identify, and characterize the methylation status/level of a biomarker (i.e., a region/fragment of DNA or a region/fragment of genome DNA (e.g., CpG island-containing region/fragment)) (Para.  [0179]).  Zhang also teaches a panel comprises 1000 or more biomarkers (Para.  [0159]), which would correspond to the claim limitation of reference sequence reads from over 1,000 nucleic acid fragments and training sequence reads sequenced from over 1,000 nucleic acid fragments in the training sample.

Zhang teaches the recited each of a plurality of training samples different from the first reference sample and the second reference sample with “The samples were divided into five equal parts and 4 of the parts were used for training and the fifth part was used to test the results.” (para. [0379]).
Zhang teaches in claims 28(e)(1) and (2), and 30-32, a method of classifying cancer type by machine learning.  
(1) the machine learning method comprises: identifying a plurality of markers and a plurality of weights based on a top score, and classifying the samples based on the plurality of markers and the plurality of weights.  
This corresponds to the claim limitation of applying the probabilistic model to the sequence read to determine a value based at least on a first probability that the sequence read originated from a sample associated with the disease state associated with the probabilistic model.
(2) the cancer CpG methylation profile database comprises a set of CpG methylation profiles and each CpG methylation profile represents a cancer type.
30.  The method of claim 28, wherein the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model.  
31.  The method of claim 28, wherein the CpG methylation data is generated from an extracted genomic DNA treated with a deaminating agent.
32.  The method of claim 27, wherein the comparing further comprises determining the cancer type of the individual.  This corresponds to a disease state.
Zhang further teaches in some instances, a difference within each said pair of dataset is calculated and the differences are then input into the machine learning/classification program (103) (Paragraph [0150]).  In some cases, a pair-wise methylation difference dataset from the first, second, and third pair of datasets is generated and then analyzed in the presence of a control dataset or a training dataset (104) by the machine learning/classification method (103) to generate the cancer CpG methylation profile database (105) (Paragraph [0150]).  In some cases, the machine learning method comprises identifying a plurality of markers and a plurality of weights based on a top score (e.g., a t-test value, a β test value), and classifying the samples based on the plurality of markers and the plurality of weights (Paragraph [0150]).  In some cases, the cancer CpG methylation profile database (105) comprises a set of CpG methylation profiles and each CpG methylation profile represents a cancer type (Paragraph [0150]).  Zhang also discusses the identification of a cancer type specific signature was achieved by comparing a pair-wise methylation difference between a particular cancer type versus its surrounding normal tissue, difference between two different cancer types, as well as difference between two different normal tissues (Paragraph [0366]).  All of 485,000 CpG methylation sites were investigated in a training cohort of 1100 tumor samples and 231 matched adjacent-normal tissue samples (Paragraph [0366]).  
Zhang describes two models.  The first probabilistic model corresponds to the machine learning/classification method that generates the cancer CpG methylation profile database (Paragraph [0150]).  The second probabilistic model corresponds to the machine learning model that identifies a plurality of markers and a plurality of weights based on a top score (e.g., a t-test value, a β test value) and classifies the samples based on the plurality of markers and the plurality of weights (Paragraph [0150]).  This corresponds to the claim limitation of training a classifier as a machine-learning model using the sets of features for the training samples, the classifier being trained to predict, for an input set of features derived from sequence reads.
Zhang also teaches that in specific embodiments, provided herein include methods for determining the risk of developing cancer in a patient.  Biomarker methylation percentages, amounts or patterns are characteristic of various risk states, e.g., high, medium or low (Paragraph [0263]).  The risk of developing cancer is determined by measuring the methylation status of the relevant biomarkers and then either submitting them to a classification algorithm or comparing them with a reference amount, i.e., a predefined level or pattern of methylated (and/or unmethylated) biomarkers that is associated with the particular risk level (Paragraph [0263]).  
The recited "training, for each disease state of the plurality of disease states" reads on Zhang's “train” a classification model.  
	Zhang further teaches in claim 28: "The method of claim 27, wherein the reference CpG methylation profile obtained from the cancer CpG methylation profile database is generated by the steps of: a) generating CpG methylation data from a set of biological samples by a sequencing method, wherein the set comprises a first cancerous biological sample, a second cancerous biological sample, a third cancerous biological sample, a first normal biological sample, a second normal biological sample, and a third normal biological sample; wherein the first, second, and third cancerous biological samples are different; and wherein the first, second, and third normal biological samples are different."  This corresponds to the claim limitation of sequenced from nucleic acid fragments in a test sample of a test subject, a disease state or a tissue of origin associated with a disease state of the plurality of disease states. 
	Zhang also teaches the recited training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a presence or absence of a disease, a disease type, and/or a disease tissue of origin at least with "Once trained, the classification model recognizes patterns in data generated using unknown samples. In some instances, the classification model is then used to classify the unknown samples into classes. This is useful, for example, in predicting whether or not a particular biological sample is associated with a certain biological condition (e.g., diseased versus non-diseased)." (Para. [0276]).
	Zhang teaches the recited training samples include a first subset of training samples obtained from subjects diagnosed with cancer and a second subset of training samples obtained from subjects not diagnosed with cancer at least with Zhang's Claim 28.  Step a) of Zhang's Claim 28 includes "a) generating CpG methylation data from a set of biological samples by a sequencing method, wherein the set comprises a first cancerous biological sample, a second cancerous biological sample, a third cancerous biological sample, a first normal biological sample, a second normal biological sample, and a third normal biological sample..." and with "In some embodiments, data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model." (Para. [0276]).

Zhang teaches the recited each of a plurality of training samples different from the first reference sample and the second reference sample with “The samples were divided into five equal parts and 4 of the parts were used for training and the fifth part was used to test the results.” (para. [0379]).
Zhang teaches the claim limitation of generating the first reference sample from a first subject having a first disease state, wherein the first disease state is cancer with “(i) a first pair of CpG methylation datasets generated from the first cancerous biological sample and the first normal biological sample, wherein CpG methylation data generated from the first cancerous biological sample form a first dataset within the first pair of datasets, CpG methylation data generated from the first normal biological sample form a second dataset within the first pair of datasets, and the first cancerous biological sample and the first normal biological sample are from the same biological sample source” (para. [0006]). Zhang’s teaching of “the first cancerous biological sample and the first normal biological sample are from the same biological sample source” corresponds to the recited “first subject”.
Zhang teaches predict a likelihood that a given sequence read originates from one subject having the first disease state with “(2) analyze the pair-wise methylation difference dataset with a control dataset by a machine learning method to generate the cancer CpG methylation profile database, wherein (i) the machine learning method comprises: identifying a plurality of markers and a plurality of weights based on a top score, and classifying the samples based on the plurality of markers and the plurality of weights; and (ii) the cancer CpG methylation profile database comprises a set of CpG methylation profiles and each CpG methylation profile represents a cancer type.” (para. [0006]).
Zhang teaches the claim limitation of generating the second reference sample from a second subject having a second disease state, wherein the second disease state is non-cancer with “a. obtaining a fourth pair of CpG methylation datasets, with the first processor, generated from a fourth cancerous biological sample and a fourth normal biological sample, wherein CpG methylation data generated from the fourth cancerous biological sample form a seventh dataset within the fourth pair of datasets, CpG methylation data generated from the first normal biological sample form an eighth dataset within the fourth pair of datasets, and the fourth cancerous biological sample and the fourth normal biological sample are from the same biological sample source” (para. [0051]). Zhang’s teaching of “the fourth cancerous biological sample and the fourth normal biological sample are from the same biological sample source” corresponds to the recited “second subject”.
Zhang teaches the claim limitation of predict a likelihood that a given sequence read originates from one subject having the second disease state with “e. analyzing the second pair-wise methylation difference dataset with the cancer CpG methylation profile database described above, wherein a correlation between the second pair-wise methylation difference dataset and a CpG methylation profile within the cancer CpG methylation profile database determines a cancer type of the individual.” (para. [0051]).

Zhang does not explicitly teach two models for detecting a first and a second disease state and, for each training sample, generating a feature set of one or more features for the training sample by comparing the first probability value and the second probability value for each training sequence read; and training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a likelihood of presence of cancer in an individual based on a feature set for a test sample of the individual.  Zhang teaches a machine learning model that is trained using data that are generated from known samples.  According to Zhang, "....data that are generated using samples such as “known samples” or “control” are then used to “train” a classification model.  A “known sample” is a sample that has been pre-classified, such as, for example, a suitable control (e.g., biomarkers) from a non-diseased or non-cancer “normal” sample and/or suitable control (e.g., biomarkers from a known tumor tissue type or stage, or cancer status" (Para.  [0276]).  Zhang further discusses that the machine learning method utilizes an algorithm selected from one or more of the following: a principal component analysis, a logistic regression analysis, a nearest neighbor analysis, a support vector machine, and a neural network model (Para.  [0095]).  Therefore, the recited "reference sequence reads" corresponds to Zhang's data that are generated using samples such as “known samples” or “control”, while the recited "probabilistic model" corresponds to Zhang's classification model, such as the logistic regression model, because the model taught by Zhang uses statistical techniques for the prediction of cancer.  Zhang’s classification model corresponds to the recited probabilistic model and a machine learning classifier. 
Zhang's teaching corresponds to the claim limitation of generating a first plurality of reference sequence reads and training a first probabilistic model associated with the first disease state.  Although Zhang does not explicitly teach two models for detecting a first and second disease state, it would have been obvious to repeat the procedure and model taught by Zhang with a second set of data for the purpose of predicting a second disease state of non-cancer.  Moreover, Li teaches two probabilistic models for predicting a likelihood that a given sequence read originates from one subject having the first or second disease state of non-cancer and cancer with Figure 3 (page 4). Figure 3 depicts the likelihood of the sequence read being tumor or normal. Li teaches calculating the class-specific likelihood of each cfDNA sequencing read (Page 4, col. 1, para. 3) and then predicting tumor-derived cfDNA fraction (Page 4, col. 2, para. 2). Li also teaches in Figure 2 a probabilistic framework to infer the tumor-derived cfDNA fraction (i.e. tumor fraction), denoted as 0 ≤ θ < 1, by classifying cfDNA reads into two classes (class T for tumor-derived DNAs and class N for normal plasma cfDNAs), based on a set of markers associated with the methylation patterns of two classes (Page 4, col. 2, para. 2). This meets the claim limitation of a first and second probabilistic model and then predicting the likelihood of the presence of cancer in an individual. 
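For illustration only, the kind of two-class read-level framework attributed to Li above can be sketched as follows. This is a simplified sketch under stated assumptions, not Li's actual model: it assumes each read is a list of 0/1 methylation calls, it assumes an independent-site Bernoulli likelihood for each class, and the per-site rates, the read data, and the function names `read_log_likelihood` and `estimate_tumor_fraction` are all hypothetical. The tumor fraction θ is chosen by a plain grid search over 0 ≤ θ < 1.

```python
import math

def read_log_likelihood(read, rates):
    """Log-likelihood of one read under an assumed independent-site model.

    read  : list of 0/1 methylation calls, one per CpG site
    rates : per-site methylation rates for one class (hypothetical values)
    """
    total = 0.0
    for call, rate in zip(read, rates):
        total += math.log(rate if call == 1 else 1.0 - rate)
    return total

def estimate_tumor_fraction(reads, tumor_rates, normal_rates, steps=1000):
    """Grid-search theta in [0, 1) that maximizes the two-class mixture likelihood."""
    # Class-specific likelihood of every read under class T and class N.
    per_read = [
        (math.exp(read_log_likelihood(r, tumor_rates)),
         math.exp(read_log_likelihood(r, normal_rates)))
        for r in reads
    ]
    best_theta, best_ll = 0.0, -math.inf
    for i in range(steps):
        theta = i / steps
        ll = sum(math.log(theta * p_t + (1.0 - theta) * p_n)
                 for p_t, p_n in per_read)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Hypothetical data: 2 fully methylated reads and 8 unmethylated reads.
tumor_rates = [0.9, 0.9, 0.9]
normal_rates = [0.1, 0.1, 0.1]
reads = [[1, 1, 1]] * 2 + [[0, 0, 0]] * 8
theta_hat = estimate_tumor_fraction(reads, tumor_rates, normal_rates)
```

With these made-up inputs the estimate lands close to the share of tumor-like reads (2 of 10), which is the behavior the mixture formulation is meant to capture.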

Zhang and Li do not explicitly teach applying the first probabilistic model to the training sequence read to determine a first probability value, the first probability value being a probability that the training sequence read originated from a sample associated with the first disease state, and applying the second probabilistic model to the training sequence read to determine a second probability value, the second probability value being a probability that the training sequence read originated from a sample associated with the second disease state; and for each training sample, generating a feature set of one or more features for the training sample by comparing the first probability value and the second probability value for each training sequence read; and training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a likelihood of presence of cancer in an individual based on a feature set for a test sample of the individual. However, Xia teaches this claim limitation with “In this paper, we introduce a convolutional neural network based ensemble method for cancer prediction using DNA methylation data. We first conduct t-test to choose a set of significantly differential methylation points. Then, the selected feature was feed into Naive Bayesian Classifier, k-Nearest Neighbor, Decision Tree, Random Forest and Gradient Boosting Decision Tree five basic classifiers for the first stage classification. Here we use S-fold cross validation method by dividing the whole datasets into S groups and choose S-1 groups as training sets, the left one as test sets at each time. Finally, a convolutional neural network is used to ensemble the predictions of the first stage classifiers and extract the internal relationship among different classifiers to predict a more reliable result. The flowchart of the proposed ensemble method is shown in Fig. 1.” (Page 192, col.1, section 2. Method) and Figure 1. Fig. 1 depicts a flowchart of the proposed convolutional neural network based ensemble method (page 192).
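The two-stage structure quoted from Xia (first-stage classifiers evaluated under S-fold cross validation, followed by a second-stage model that combines their predictions) can be sketched as below. This is an illustrative sketch only and deliberately swaps in simpler techniques: two toy first-stage classifiers (nearest centroid and 1-nearest neighbor) stand in for Xia's five classifiers, and a small logistic regression stands in for Xia's convolutional neural network. All function names and data values are hypothetical.

```python
import math

def s_fold_indices(n, s):
    """Split indices 0..n-1 into s contiguous groups of near-equal size."""
    bounds = [round(i * n / s) for i in range(s + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(s)]

def centroid_classifier(train_x, train_y):
    """First-stage stand-in 1: predict the class with the nearer centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0 = mean([x for x, y in zip(train_x, train_y) if y == 0])
    c1 = mean([x for x, y in zip(train_x, train_y) if y == 1])
    return lambda x: 1 if math.dist(x, c1) < math.dist(x, c0) else 0

def nearest_neighbor_classifier(train_x, train_y):
    """First-stage stand-in 2: predict the label of the closest training sample."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
        return train_y[nearest]
    return predict

def out_of_fold_predictions(xs, ys, trainers, s):
    """Collect, per sample, first-stage predictions from models that never saw it."""
    meta = [None] * len(xs)
    for test_idx in s_fold_indices(len(xs), s):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        models = [t([xs[i] for i in train_idx], [ys[i] for i in train_idx])
                  for t in trainers]
        for i in test_idx:
            meta[i] = [m(xs[i]) for m in models]
    return meta

def train_second_stage(meta, ys, epochs=500, lr=0.5):
    """Second-stage stand-in: logistic regression fitted by gradient descent."""
    w, b = [0.0] * len(meta[0]), 0.0
    for _ in range(epochs):
        for feats, y in zip(meta, ys):
            z = b + sum(wi * f for wi, f in zip(w, feats))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            w = [wi - lr * err * f for wi, f in zip(w, feats)]
            b -= lr * err
    def predict(feats):
        z = b + sum(wi * f for wi, f in zip(w, feats))
        return 1 if z > 0 else 0
    return predict

# Hypothetical methylation-like features; label 1 = cancer, 0 = normal.
xs = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9],
      [0.15, 0.25], [0.85, 0.75], [0.05, 0.1], [0.95, 0.9]]
ys = [0, 1, 0, 1, 0, 1, 0, 1]
meta = out_of_fold_predictions(
    xs, ys, [centroid_classifier, nearest_neighbor_classifier], s=4)
second_stage = train_second_stage(meta, ys)
final = [second_stage(m) for m in meta]
```

Note that the inputs to every stage here are sample-level feature vectors; the sketch does not score individual sequence reads.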

Therefore, it would have been prima facie obvious to combine the teachings of Zhang, Li and Xia to arrive at the claimed invention.  Li demonstrated that CancerDetector provides high sensitivity and specificity in detecting tumor cfDNAs (Abstract) and Xia discussed that the use of a convolutional neural network to ensemble the predictions of the first stage classifiers and extract the internal relationship among different classifiers provides for a more reliable prediction (Page 192, col.1, section 2. Method). A person of ordinary skill in the art would have been motivated to combine the method of Zhang with the method of Li to include training probabilistic models for calculating the likelihood that the sequence read is normal or tumor to better identify sequence reads that are of tumor type.  A person of ordinary skill in the art would have also been motivated to combine the method of Zhang with the method of Xia to include training a machine learning classifier with the feature sets generated from the training sample of the probabilistic models to predict a more reliable result.  Furthermore, there would have been a reasonable expectation of success because Zhang, Li and Xia are in the same field of endeavor of determining the likelihood of cancer or non-cancer.  


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 217 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Xia ("A convolutional neural network based ensemble method for cancer prediction using DNA methylation data." Proceedings of the 2019 11th International Conference on Machine Learning and Computing. Feb. 2019.; cited on the 05/29/2025 IDS Document). 

Regarding independent claim 217, Xia teaches a non-transitory computer-readable storage medium with “With the rapid development of computer science and machine learning technologies, computer-aid cancer prediction has achieved increasingly progress.” (abstract).
Xia teaches the claim limitation of a plurality of models, each model comprising: a function that transforms an input sequence read into a prediction that the input sequence read originates from one subject having one disease state of a plurality of disease states, and a set of learned parameters for the function of the model, wherein the set of learned parameters are learned from a plurality of reference sequence reads associated with the disease state of the model; a classifier configured as a machine-learning model, the classifier comprising: a function that transforms an input set of features for a sample based on outputs of the plurality of models applied to sequencing reads of the sample into a prediction of one disease state for the sample, and a set of learned parameters for the function of classifier, wherein the set of learned parameters are learned by: applying each model to each training sequence read of each training sample of a plurality of training samples to determine a prediction that the training sequence read originated from a sample with the disease state associated with the model, generating a set of features for each training sample based on the predictions for the training sequence reads across the disease states, and training the classifier using the sets for the training samples with “In this study, we introduce a convolutional neural network based multi-model ensemble method for cancer prediction using DNA methylation data. We first choose five basic machine learning methods as the first stage classifiers and conduct prediction individually. Then, a convolutional neural network is used to find the high-level features among the classifiers and gives a credible prediction result.” (abstract) and “In this paper, we introduce a convolutional neural network based ensemble method for cancer prediction using DNA methylation data. We first conduct t-test to choose a set of significantly differential methylation points. 
Then, the selected feature was feed into Naive Bayesian Classifier, k-Nearest Neighbor, Decision Tree, Random Forest and Gradient Boosting Decision Tree five basic classifiers for the first stage classification. Here we use S-fold cross validation method by dividing the whole datasets into S groups and choose S-1 groups as training sets, the left one as test sets at each time. Finally, a convolutional neural network is used to ensemble the predictions of the first stage classifiers and extract the internal relationship among different classifiers to predict a more reliable result. The flowchart of the proposed ensemble method is shown in Fig. 1.” (Page 192, col.1, section 2. Method) and Figure 1. Fig. 1 depicts a Flowchart of the proposed convolutional neural network based ensemble method (page 192).


Response to 35 USC §103 (Remarks filed 12/01/2025, pages 20-22)

Applicant cancelled independent claim 93 and added new independent claim 217. No claims are amended. Claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139, and 217 are pending.
Applicant argues that Li does not teach a "classifier" trained with "the feature sets of the training samples" derived from outputs of the "plurality of probabilistic models," and "configured to predict a likelihood of presence of cancer in an individual based on a feature set for a test sample of the individual" of claims 1, 47 and 139.
In response, Applicant’s remarks have been fully considered and are persuasive. Therefore, the 103 rejection in the office action mailed 05/30/2025 is withdrawn and a new rejection is applied as discussed above. As discussed, Xia teaches this claim limitation with “In this paper, we introduce a convolutional neural network based ensemble method for cancer prediction using DNA methylation data. We first conduct t-test to choose a set of significantly differential methylation points. Then, the selected feature was feed into Naive Bayesian Classifier, k-Nearest Neighbor, Decision Tree, Random Forest and Gradient Boosting Decision Tree five basic classifiers for the first stage classification. Here we use S-fold cross validation method by dividing the whole datasets into S groups and choose S-1 groups as training sets, the left one as test sets at each time. Finally, a convolutional neural network is used to ensemble the predictions of the first stage classifiers and extract the internal relationship among different classifiers to predict a more reliable result. The flowchart of the proposed ensemble method is shown in Fig. 1.” (Page 192, col.1, section 2. Method) and Figure 1. Fig. 1 depicts a Flowchart of the proposed convolutional neural network based ensemble method (page 192).


Claim rejections - 101
35 USC 101 reads: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
For each rejection below, dependent claims are rejected similarly as not remedying the rejection, unless otherwise noted.

Judicial exceptions (JEs) to 101 patentability
Claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217 are rejected under 35 USC 101 because the claimed inventions are not directed to patent eligible subject matter.  After consideration of relevant factors with respect to each claim as a whole, each claim is directed to one or more judicially-recognized exceptions to patentability (JEs), i.e. an abstract idea, a natural phenomenon, a law of nature and/or a product of nature, as identified below.  As set forth below, it is not clear that any element or combination of elements in addition to the JE(s), i.e. any "additional elements," either integrates the identified JE(s) into a practical application or is a non-conventional additional element, such that it is not clear that any claim is directed to significantly more than the identified JE(s).
MPEP 2106 organizes JE analysis into Steps 1, 2A (1st prong & 2nd prong) and 2B as analyzed below.  MPEP 2106 and the following USPTO website provide further explanation and case law citations: www.uspto.gov/patent/laws-and-regulations/examination-policy/examination-guidance-and-training-materials.

Analysis of claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217

Step 1: Are the claims directed to a 101 process, machine, manufacture, or composition of matter (MPEP 2106.03)?
Independent claim 1 is directed to a 101 process, here a "method," with process steps such as "generating..."  and "training..."
Independent claim 47 is directed to a 101 machine or manufacture, here a "system," with non-transitory elements such as "computer processor."
Independent claim 139 is directed to a 101 process, here a "method," with process steps such as "generating..."  and "training..."
Independent claim 217 is directed to a 101 machine or manufacture, here a "non-transitory computer readable storage medium." 

[Step 1: claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217: YES]

Step 2A, 1st prong: Do the claims recite a judicially-recognized exception (JE), e.g. a law of nature, a natural phenomenon or product, or an abstract idea (MPEP 2106.04.II.A.1 & .04(a))?
	The MPEP at 2106.I, 2nd para. explains that JEs have been court-recognized as occurring in at least four types: abstract ideas, laws of nature and natural phenomena (including natural products).
MPEP § 2106.04(a)(2) further explains that abstract ideas may be grouped as:
• mathematical concepts (mathematical formulas or equations, mathematical relationships and mathematical calculations);
• certain methods of organizing human activity (fundamental economic practices or principles, managing personal behavior or relationships or interactions between people);
and/or
• mental processes (procedures for observing, evaluating, analyzing/judging and organizing information).

Regarding the instant claims and with respect to Step 2A, 1st prong, at least preliminarily these claims recite JEs in the form of abstract ideas (mental processes and mathematical concepts) and a law of nature as follows. 

Mental processes recited include:
Claims 1 and 47 recite:  comparing the first probability value and the second probability value..., determine a first probability value..., determine a second probability value..., predict a likelihood that a given sequence read originates from one subject… and predict a likelihood of presence of cancer... Comparing, determining and predicting are acts of evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper.
Claim 5 recites:  the first disease state is selected from the group...  The process of selecting is an act of evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper.
Claim 11 recites: determining rates of methylation for each of a plurality of CpG sites...  The process of determining involves evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper. 
Claim 12 recites: determining for each sequence read of the first plurality of reference sequence reads... and determining whether at least a threshold number of CpG sites... The process of determining involves evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper. 
Claim 13 recites: determining for each sequence read of the first plurality of reference sequence reads... and filtering the first plurality of reference sequence... The process of determining and filtering involves evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper. 
Claim 16 recites: determining, for the probabilistic model...  The process of determining involves evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper. 
Claim 17 recites: selecting a plurality of the first plurality of reference sequence reads... and selecting a plurality of the second plurality of reference sequence reads...  The process of selecting is an act of evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper.
Claim 21 recites a count of outlier sequence reads of the plurality of training sequence reads where the first probability value is greater than the second probability value.  This limitation involves counting and comparing values, which are acts of evaluating, analyzing and observing data that could be practically performed in the human mind and/or with pen and paper.
Claim 23 recites: a total count of outlier sequence reads.  This limitation involves counting reads, which is an act of evaluating, analyzing and observing data that could be practically performed in the human mind and/or with pen and paper.
Claim 24 recites: a total count of anomalously methylated sequence reads.  This limitation involves counting reads, which is an act of evaluating, analyzing and observing data that could be practically performed in the human mind and/or with pen and paper.
Claim 25 recites: a count of fragments. This limitation involves counting, which is an act of evaluating, analyzing and observing data that could be practically performed in the human mind and/or with pen and paper.
Claim 28 recites: comparing the first probability value... and determining a ratio... The process of determining and comparing involves evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper. 
Claim 30 recites: determining a log-likelihood ratio of the first probability value..., determining, for one or more threshold values... and log-likelihood ratio exceeding the threshold value.  These elements are acts of evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper. 
Claim 31 recites: determining, for each feature...  Determining is an act of evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper. 
Claim 34 recites: predict the likelihood... The process of predicting is an act of evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper.
Claim 139 recites: determine a probability value..., predict a likelihood that a given sequence read originates from one subject… and predict a likelihood of presence of cancer... Determining and predicting are acts of evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper.  
Claim 217 recites: determine a prediction that the training sequence read originated from a sample with the disease state… Determining and predicting are acts of evaluating and analyzing data that could be practically performed in the human mind and/or with pen and paper.  

Mathematical concepts recited include:
Claims 1 and 47 recite:  probabilistic model..., probability value..., training a machine-learning classifier... predict a likelihood that a given sequence read originates from one subject… and a likelihood... These elements are mathematical formulas and concepts.  A series of calculations are required to determine the probability value and a likelihood.  
Claim 10 recites: the first probabilistic model or second probabilistic model is a constant model, a binomial model, an independent site model, a neural net model, or a Markov model. These elements are mathematical formulas and concepts.
Claim 12 recites: ...a threshold number of CpG sites... and ...a threshold percentage... These elements are mathematical formulas and concepts.
Claim 13 recites: ... p-value... and a threshold p-value... These elements are mathematical formulas and concepts.
Claim 14 recites: the first probabilistic model or the second probabilistic model is parameterized by a sum of a plurality of mixture components each associated with a product of the rates of methylation. These elements are mathematical formulas and concepts.
Claim 16 recites: the first probabilistic model or the second probabilistic model... and, for the probabilistic model, a set of parameters that maximizes a total log-likelihood... These elements are mathematical formulas and concepts.
Claim 17 recites: the first probabilistic model and probabilistic model.  These elements are mathematical formulas and concepts.
Claim 21 recites: a count of outlier sequence reads of the plurality of training sequence reads where the first probability value is greater than the second probability value. These elements are mathematical formulas and concepts.
Claim 23 recites: a total count of outlier sequence reads, which involves mathematical formulas and concepts.
Claim 24 recites: a total count of anomalously methylated sequence reads, which involves mathematical formulas and concepts.
Claim 25 recites: a count of fragments, which involves mathematical formulas and concepts.
Claim 28 recites: probability value, ratio, ratio threshold value and read counts are mathematical formulas and concepts.
Claim 30 recites:  a log-likelihood ratio of the first probability value... and threshold values, a count of the sequence reads probability value... and log-likelihood ratio exceeding the threshold value are mathematical formulas and concepts.
Claim 31 recites: a measure of the feature is involved with mathematical formulas and concepts.
Claim 34 recites: training the machine-learning classifier to predict the likelihood... is involved with mathematical formulas and concepts.
Claim 139 recites:  probabilistic model..., probability value..., training a machine-learning classifier..., predict a likelihood that a given sequence read originates from one subject… and a likelihood... These elements are mathematical formulas and concepts.  A series of calculations are required to determine the probability value and a likelihood.  
Claim 217 recites: a function that transforms an input sequence read into a prediction…, …a classifier configured as a machine-learning model, the classifier comprising: a function that transforms an input set of features… and training the classifier using the sets for the training samples. These elements are mathematical formulas and concepts.
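To make concrete the kind of calculation these limitations describe, the ratio and threshold-count features summarized for claims 21, 28 and 30 can be sketched as follows. This is a minimal illustration based only on the claim language as paraphrased in this action, not on the applicant's implementation; the function name `llr_features`, the probability values, and the thresholds are hypothetical.

```python
import math

def llr_features(first_probs, second_probs, thresholds):
    """Count reads whose log-likelihood ratio exceeds each threshold.

    first_probs[i]  : probability of read i under the first (e.g. cancer) model
    second_probs[i] : probability of read i under the second (e.g. non-cancer) model
    thresholds      : hypothetical cut-offs; 0 counts reads where first > second
    """
    llrs = [math.log(p1) - math.log(p2)
            for p1, p2 in zip(first_probs, second_probs)]
    # One feature per threshold: the number of reads above that threshold.
    return [sum(1 for v in llrs if v > t) for t in thresholds]

# Hypothetical per-read probabilities for one training sample.
first_probs = [0.729, 0.001, 0.081, 0.009]
second_probs = [0.001, 0.729, 0.009, 0.081]
features = llr_features(first_probs, second_probs, thresholds=[0, 3, 7])
```

The resulting list is one sample-level feature vector; a vector of this kind, computed for each training sample, is what would then be supplied to a classifier.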

Law of Nature
Claims 1 and 47 recite "...generating a first plurality of reference sequence reads from over 1,000 nucleic acid fragments in a first reference sample, the first reference sample from a first subject having a first disease state, wherein the first disease state is cancer; generating a second plurality of reference sequence reads from over 1,000 nucleic acid fragments in a second reference sample, the second reference sample from a second subject having a second disease state, wherein the second disease state is non-cancer... and training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a likelihood of presence of cancer in an individual based on a feature set for a test sample of the individual." The claim element recites a correlation between sequence reads from nucleic acid fragments and disease states, which is a law of nature because it describes a consequence of natural processes in the human body, e.g., the naturally-occurring relationship between sequence reads from nucleic acid fragments and disease states of cancer or non-cancer.

Claim 217 recites: determine a prediction that the training sequence read originated from a sample with the disease state… The claim element recites a correlation between sequence reads from a subject and disease states, which is a law of nature because it describes a consequence of natural processes in the human body, e.g., the naturally-occurring relationship between sequence reads and disease states of cancer or non-cancer.

[Step 2A, 1st prong: claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217: YES]

Step 2A, 2nd prong: Are the above-identified JEs integrated into a practical application (MPEP 2106.04.II.A.2 & .04(d))?
Generally regarding Step 2A, 2nd prong
MPEP 2106.04(d).I lists the following considerations for evaluating whether additional elements integrate a judicial exception into a practical application:
An improvement in the functioning of a computer, or an improvement to other technology or technical field, as discussed in MPEP §§ 2106.04(d)(1) and 2106.05(a);
Applying or using a judicial exception to affect a particular treatment or prophylaxis for a disease or medical condition, as discussed in MPEP § 2106.04(d)(2);
Implementing a judicial exception with, or using a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim, as discussed in MPEP § 2106.05(b);
Effecting a transformation or reduction of a particular article to a different state or thing, as discussed in MPEP § 2106.05(c); and
Applying or using the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception, as discussed in MPEP § 2106.05(e).
Additionally, the courts have also identified limitations that did not integrate a judicial exception into a practical application:
Merely reciting a phrase such as "apply it" (or an equivalent) along with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, as discussed in MPEP 2106.05(f);
Adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP 2106.05(g); and
Generally linking the use of a judicial exception to a particular technological environment or field of use, as discussed in MPEP 2106.05(h).

In Step 2A, 1st prong above, claim steps and/or elements were identified as part of one or more judicial exceptions (JEs).
In Step 2B below, any remaining steps and/or elements are therefore in addition to the identified JE(s).  Any such additional steps and additional elements are further discussed in Step 2B.
Here in Step 2A, 2nd prong, no additional step or element clearly demonstrates integration of the JE(s) into a practical application.
At this point in examination it is not yet the case that any of the Step 2A, 2nd prong considerations enumerated above clearly demonstrates integration of the identified JE(s) into a practical application.  Referring to the considerations above, none of 1. an improvement, 2. treatment, 3. a particular machine or 4. a transformation is clear in the record.  
For example, regarding the first consideration at MPEP 2106.04(d)(1), the record does not yet clearly disclose an explanation of improvement over the previous state of the technology field.  An explanation of improvement requires detailed explanation applicable to all embodiments reasonably within the claim scope.  In particular, such an explanation of improvement over the previous state of technology may include: identification of the technology field, the particular improvement, as particular as possible identification of any asserted improvements, explanation of a clear difference from the technology field, explanation that reasonably all embodiments within the claim scope result in the asserted improvement, and an extension of the explanation as far as possible to include the result of an identified practical application.  The claims do not yet clearly result in such an improvement (e.g. specification: para. [00202]).  See MPEP 2106.04(d) and (d)(1).
[Step 2A, 2nd prong: claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217: NO]

Step 2B: Do the claims recite a non-conventional arrangement of additional elements (i.e. elements in addition to any identified JE) (MPEP 2106.05)?
All elements of claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217 are part of one or more identified JEs (as described above), except for elements identified here as conventional elements in addition to the above JEs:
Elements of the following claims are additional elements but nonetheless are conventional elements of a laboratory or computing environment, conventional data gathering elements or conventional post-processing elements: 
claim 1: the recited "generating a first plurality of reference sequence reads...," "generating a second plurality of reference sequence reads...," "generating a plurality of training sequence reads...," "applying the first probabilistic model..." "training samples obtained from subjects...," and "applying the second probabilistic model..." step/element, as evidenced by MPEP 2106.05(g), e.g. "insignificant extra solution activity" since the recitation is a conventional element of a laboratory and/or computing environment, conventional data gathering/input elements, and/or conventional post-processing or output elements.
claim 4: the recited "generating a plurality of reference sequence reads..." step/element, as evidenced by MPEP 2106.05(g), e.g. "insignificant extra solution activity" and “insignificant computer implementation” (penultimate para.), since the recitation is a conventional element of a laboratory and/or computing environment, conventional data gathering/input elements, and/or conventional post-processing or output elements.
claim 30: the recited "generating the feature set..." step/element, as evidenced by MPEP 2106.05(g), e.g. "insignificant extra solution activity" since the recitation is a conventional element of a laboratory and/or computing environment, conventional data gathering/input elements, and/or conventional post-processing or output elements.
claim 47: the recited "a computer processor and a memory, the memory storing computer program instructions," "accessing a first plurality of reference sequence reads...," "accessing a second plurality of reference sequence reads...," "accessing a plurality of training sequence reads...," "training samples obtained from subjects...," "applying the first probabilistic model...," "applying the second probabilistic model..." and "nucleic acid fragments obtained from the individual" step/element, as evidenced by MPEP 2106.05(g), e.g. "insignificant extra solution activity" since the recitation is a conventional element of a laboratory and/or computing environment, conventional data gathering/input elements, and/or conventional post-processing or output elements.
claim 139: the recited "generating, for each of a plurality of disease states...," "generating, for each of a plurality of training samples...," "training samples obtained from subjects...," and "applying each probabilistic model..." step/element, as evidenced by MPEP 2106.05(g), e.g. "insignificant extra solution activity" since the recitation is a conventional element of a laboratory and/or computing environment, conventional data gathering/input elements, and/or conventional post-processing or output elements.
claim 217: the recited "non-transitory computer readable storage medium," "input sequence read...," "a set of learned parameters...," "reference sequence reads...," "input set of features...," "outputs of the plurality of model...," "training sequence read of each training sample of a plurality of training samples..." and "generating a set of features for each training sample based on the predictions for the training sequence reads across the disease states" step/element, as evidenced by MPEP 2106.05(g), e.g. "insignificant extra solution activity" since the recitation is a conventional element of a laboratory and/or computing environment, conventional data gathering/input elements, and/or conventional post-processing or output elements.

Claims 5, 7, 10-14, 16-17, 21, 23-25, 28, 31, 34 and 40 do not recite additional elements.
Claims 5, 7 and 40 are providing information on what the data represents and do not change the character of the data obtaining step beyond mere data gathering activity.
The above listed additional elements are data gathering and outputting steps. Data gathering steps are not an abstract idea; they are extra-solution activity, as they collect the data needed to carry out the abstract idea. Data gathering does not impose any meaningful limitation on the abstract idea, or how the abstract idea is performed. Data gathering steps are not sufficient to integrate an abstract idea into a practical application (MPEP 2106.05(g)). Although claim 47 recites a computer processor and a memory and claim 217 recites a non-transitory computer readable storage medium, these limitations equate to generic computer components for data gathering and outputting. Limitations that equate to mere data gathering and outputting via generic computer components, such as receiving data at a computer or outputting data, amount to insignificant extra-solution activity as set forth by the courts in Mayo, 566 U.S. at 79, 101 USPQ2d at 1968 and OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015). Also, storing and retrieving information in memory were identified by the courts as well-understood, routine and conventional in Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93. The use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more as identified by the courts in Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Auto, LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit).
Additionally, the method of generating sequence reads is known in the art, as disclosed by Zhang (U.S. Patent No. 2018/0341745 A1, published Nov. 29, 2018; cited on the 05/11/2023 “Notice of References Cited” form 892). Zhang teaches generating sequences via nucleic acid amplification procedures, which are well known in the art (para. [0210]). Although the machine learning architecture described in independent claims 1, 47, 139 and 217 is not an additional element, it appears to correspond to a machine learning ensemble that is well known and conventional as discussed by Re ("1 Ensemble methods: a review 3." (1); 2001.; cited on the attached 892 form).
[Step 2B: claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217: NO]

Summary and conclusion regarding claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217
Summing up the above 101 JE analysis of claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139 and 217, each viewed as a whole and considering all elements individually and in combination, no claim recites limitations that transform the claim, finally interpreted as directed to the above-identified JE(s), into patent eligible subject matter.
The claims have all been examined to identify the presence of one or more judicial exceptions.  Each additional element in the claims has been addressed, alone and in combination, to determine whether the additional elements integrate the judicial exception into a practical application.  Each additional limitation in the claims has been addressed, alone and in combination, to determine whether those additional limitations provide an inventive concept which provides significantly more than those exceptions.  Individually, the limitations of the claims and the claims as a whole have been found to be patent ineligible under 35 U.S.C. 101.



Response to 35 USC § 101 Arguments (Remarks filed 12/01/2025, pages 15-20) 
Applicant cancelled independent claim 93 and added new independent claim 217. No claims are amended. Claims 1, 4-5, 7, 10-14, 16-17, 21, 23-25, 28, 30-31, 34, 40, 47, 139, and 217 are pending.
Applicant argues that (1) under Step 2A, Prong Two, the additional elements integrate any alleged judicial exception into a practical application, and (2) under Step 2B, the additional elements are non-routine and unconventional activity that amount to an inventive concept.
Under step 2A, prong 1, Applicant states that claim 1 recites limitations of "training, using the first plurality of reference sequence reads, a first probabilistic model to predict a likelihood that a given sequence read originates from one subject having the first disease state, the first probabilistic model associated with the first disease state," "training, using the second plurality of reference sequence reads, a second probabilistic model to predict a likelihood that a given sequence read originates from one subject having the second disease state, the second probabilistic model associated with the second disease state," and "training a machine-learning classifier with the feature sets of the training samples, wherein the machine-learning classifier is configured to predict a likelihood of presence of cancer in an individual based on a test sample comprising nucleic acid fragments obtained from the individual."  Applicant argues that these limitations cannot be characterized as directed to any abstract idea. Applicant states that the various limitations rely on mathematical principles, but do not themselves expressly recite the mathematical concepts. In particular, Applicant argues that although these steps relating to training of various models (e.g., the probabilistic models and the machine-learning classifier) are based in mathematics, the steps do not themselves expressly recite mathematical concepts. Applicant references Claim 1 of Example 48 of the USPTO July 2024 Subject Matter Eligibility Examples. Applicant indicates that Example 48 identifies the deployment of the DNN as an additional element beyond the abstract idea judicial exception, Id., p. 20. Applicant states that training of such models, as recited in the claims, is itself not a mathematical step and cannot be characterized as reciting any mathematical concepts.
In response, Applicant’s arguments are not persuasive. The training of the probabilistic models and the machine-learning classifier as recited in claims 1, 47 and 139 of the instant application is not considered an additional element, but rather the abstract idea of mathematical concepts for the 101 analysis. As indicated in the 05/30/2025 Final office action, claims 1, 47 and 139 of the instant application recite probabilistic model..., probability value..., training a machine-learning classifier... predict a likelihood that a given sequence read originates from one subject… and a likelihood... These elements are mathematical formulas and concepts, and a series of calculations is required to determine the probability value and a likelihood. Also, the limitations of training probabilistic models and a machine-learning classifier are mathematical concepts as determined by the court in Recentive Analytics, Inc. v. Fox Corp. In Recentive Analytics, Inc. v. Fox Corp., the court found that machine learning training is directed to an abstract idea at step one of Alice.

Applicant further argues that these limitations are not mental processes. Applicant discusses that the various steps of claim 1 relate to training of computer-based models, rooted in computer functionality. Applicant states that the human mind is not structured to implement the computer-based models, formed by parameters and one or more functions for transforming an input into an output. Applicant references the Specification paragraphs [0162]-[0163] and further states that the human mind cannot practically perform the computations at the scale required for the machine-learning model.

	In response, Applicant’s arguments are not persuasive. As mentioned, the steps of training machine learning models are mathematical concepts. The process of transforming an input into an output by machine learning models is a mathematical process because it would require performing a series of mathematical calculations. Although the machine learning model may be a computer-based model and a general-purpose computer can perform calculations at a rate and accuracy that can far outstrip the mental performance of a skilled artisan, the nature of the activity is essentially the same, and constitutes an abstract idea. See Bancorp Servs., L.L.C. v. Sun Life Assur. Co. of Canada (U.S.), 687 F.3d 1266, 1278 (Fed. Cir. 2012) (holding that “the fact that the required calculations could be performed more efficiently via a computer does not materially alter the patent eligibility of the claimed subject matter”); see also SiRF Tech., Inc. v. Int’l Trade Comm’n, 601 F.3d 1319, 1333 (Fed. Cir. 2010) (holding that in order for the addition of a machine to impose a meaningful limit on the scope of a claim, it must play a significant part in permitting the claimed method to be performed, rather than function solely as an obvious mechanism for permitting a solution to be achieved more quickly, i.e., through the utilization of a computer for performing calculations). As indicated in the 05/30/2025 Final office action, claims 1, 47 and 139 also recite the abstract idea of mental processes. For instance, the recited “predict a likelihood that a given sequence read originates from one subject having the first disease state…” is both a mental process and a mathematical concept. Predicting is a mental process because predicting requires analyzing data and then making a judgment to determine the likelihood that a sequence read originates from a subject with the first disease state or second disease state.
Predicting is a mathematical concept because a series of calculations is performed to determine the likelihood of an event occurring. Another example of a mental process recited in claim 1 is “determine a first probability value…” The process of determining is a mental process because it involves analyzing, evaluating and judging data. As discussed in MPEP 2106.04(a)(2)(III), mental processes contain limitations that can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions. Although claim 47 recites performing the method as part of a method executed on a computer, there are no additional limitations to indicate that anything other than a generic computer is required. Merely requiring that the steps are carried out with a generic computer does not negate the mental nature of these steps and equates rather to merely using a computer as a tool to perform the mental process.

Under step 2A, prong 2, Applicant argues that the additional elements mentioned under step 2A, prong 1 arguments integrate any of the alleged judicial exceptions into a practical application. Applicant argues that the additional elements embody an improvement to the technical field of machine-learning. Applicant states that the additional elements specify a multitiered machine-learning architecture. The additional elements specify leveraging at least two probabilistic models trained on separate, different training datasets from different subjects with different disease states. The probabilistic models are then deployed to output features for training samples in a separate training dataset. The output features for the training samples are then used in the training of the downstream machine-learning classifier. This architecture yields an improvement to machine-learning technology.
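For illustration only, the multitiered arrangement Applicant describes (per-disease-state probabilistic models, whose outputs become features for a downstream classifier) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the claimed or disclosed implementation: the per-position Bernoulli read model, the two features, the logistic-regression classifier, the function names, and the toy binary "reads" are all hypothetical choices made for this example.

```python
import math

def train_read_model(reference_reads, pseudocount=1.0):
    """Tier 1 (assumed form): per-position Bernoulli model, P(symbol == '1')."""
    n = len(reference_reads)
    length = len(reference_reads[0])
    return [
        (sum(r[i] == "1" for r in reference_reads) + pseudocount) / (n + 2 * pseudocount)
        for i in range(length)
    ]

def log_likelihood(model, read):
    """Log-probability of one read under a per-position Bernoulli model."""
    return sum(math.log(p if c == "1" else 1.0 - p) for p, c in zip(model, read))

def featurize(sample_reads, model_a, model_b):
    """Feature set for one sample, built from the outputs of both tier-1 models."""
    ratios = [log_likelihood(model_a, r) - log_likelihood(model_b, r) for r in sample_reads]
    frac_favoring_a = sum(x > 0 for x in ratios) / len(ratios)
    mean_log_ratio = sum(ratios) / len(ratios)
    return [frac_favoring_a, mean_log_ratio]

def predict(weights, bias, features):
    """Logistic output: estimated likelihood that the sample has disease state A."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_classifier(feature_sets, labels, lr=0.5, epochs=200):
    """Tier 2 (assumed form): logistic regression fit by per-sample gradient descent."""
    weights, bias = [0.0] * len(feature_sets[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(feature_sets, labels):
            grad = predict(weights, bias, x) - y
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

# Toy reference reads (hypothetical), one set per disease state.
cancer_model = train_read_model(["1110", "1111", "1101"])
normal_model = train_read_model(["0000", "0001", "0100"])

# Separate toy training samples; label 1 = cancer, 0 = non-cancer.
training_samples = [
    (["1111", "1110", "0000"], 1),
    (["1101", "1111", "1011"], 1),
    (["0000", "0001", "1111"], 0),
    (["0000", "0100", "0010"], 0),
]
feature_sets = [featurize(reads, cancer_model, normal_model) for reads, _ in training_samples]
labels = [label for _, label in training_samples]
weights, bias = train_classifier(feature_sets, labels)

# Score two unseen toy test samples.
score_cancer_like = predict(weights, bias, featurize(["1111", "1111", "1110"], cancer_model, normal_model))
score_normal_like = predict(weights, bias, featurize(["0000", "0000", "0001"], cancer_model, normal_model))
```

In this sketch the two read-level models never see the training samples, and the classifier never sees raw reads; it only sees the features derived from the models' outputs, which mirrors the separation of datasets described in the argument above.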

In response, Applicant’s arguments are not persuasive. As discussed in the Response to Arguments section of the office action mailed 05/30/2025, the additional elements of the independent claims 1, 47 and 139 as indicated above include data gathering steps that serve as inputs into the machine learning or probabilistic models. Examples of additional elements in claim 1 include generating reference sequence reads and generating a feature set, which equate to data gathering activities (as indicated in the 101 rejections section). Data gathering is an insignificant extra-solution activity. As explained by the Supreme Court, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood or conventional (see MPEP 2106.05(g)). The claim elements do not integrate the JE into a practical application because the asserted improvement to machine learning amounts to an improvement to the abstract idea of training a probabilistic or machine learning model and is in itself an improvement to the JE. The additional elements of the claims do not apply, rely on or use the JE in a way that imposes a meaningful limit on the claims to provide an improvement. The processes of training the machine learning and/or probabilistic models are mathematical concepts, and the models themselves are also mathematical concepts and formulas. Overall, the claims of the instant application recite JEs that are not integrated into a practical application because the additional elements do not rely on or use the JE in a meaningful way. Also, improvements to machine learning are improvements to the JE, and the JE alone cannot provide the improvement. As stated in MPEP 2106.05(a), the judicial exception alone cannot provide the technical improvement. The improvement can be provided by one or more additional elements as seen in Diamond v. Diehr, 450 U.S. 175, 187 and 191-92, 209 USPQ 1, 10 (1981) in subsection II.
In addition, the improvement can be provided by the additional element(s) in combination with the recited judicial exception as seen in Finjan, Inc. v. Blue Coat Sys., Inc., 879 F.3d 1299, 1303-04, 125 USPQ2d 1282, 1285-87 (Fed. Cir. 2018).
Also, Applicant’s argument of improvement is a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art. From the asserted improvement, it is not clear how the claimed invention improves over existing technology and it is also not clear how one would gauge the improvement since there are no metrics for comparison between the claimed technology and previous technology. Overall, one of ordinary skill in the art cannot gauge whether the improvements asserted are delivered by the claims because the details provided in the specification do not provide sufficient details such that the improvement would be apparent, do not explain the details of an unconventional technical solution expressed in the claim, or identify technical improvements realized by the claim over the prior art.  As stated in MPEP 2106.05(a) and MPEP 2106.04(d), the disclosure must provide sufficient details such that one of ordinary skill in the art would recognize the claimed invention as providing an improvement. Furthermore, if the specification explicitly sets forth an improvement but in a conclusory manner (i.e., a bare assertion of an improvement without the detail necessary to be apparent to a person of ordinary skill in the art), the examiner should not determine the claim improves technology. An indication that the claimed invention provides an improvement can include a discussion in the specification that identifies a technical problem and explains the details of an unconventional technical solution expressed in the claim, or identifies technical improvements realized by the claim over the prior art. (see MPEP 2106.05(a) and MPEP 2106.04(d)).  

Under Step 2B, Applicant argues that the additional elements are non-routine, unconventional, and not well-understood activity in the technological field, thereby amounting to an inventive concept. Applicant argues that the additional elements are not widely prevalent and the prior art neither teaches nor suggests the additional elements, specifically the machine-learning architecture contemplated in claim 1.

In response, Applicant’s arguments are not persuasive. As discussed in the Response to Arguments section of the office action mailed 05/30/2025, under step 2B of the 101 analysis the claims are evaluated to determine whether the claims recite a non-conventional arrangement of additional elements (i.e. elements in addition to any JE). The additional elements of the independent claims 1, 47 and 139 (as discussed above in the 101 rejections section) include data gathering steps that serve as inputs into the machine learning or probabilistic models. Examples of additional elements in claim 1 include generating reference sequence reads and generating a feature set, which equate to data gathering activities (as indicated in the 101 rejections section). Data gathering is an insignificant extra-solution activity. As explained by the Supreme Court, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood or conventional (see MPEP 2106.05(g)). The method of generating sequence reads is known in the art, as disclosed by Zhang. Zhang teaches generating sequences via nucleic acid amplification procedures, which are well known in the art (para. [0210]). Additionally, the machine learning model as discussed previously is a mathematical concept and not an additional element. Furthermore, the machine learning architecture described in claim 1 appears to correspond to a machine learning ensemble that is well known and conventional as discussed by Re ("1 Ensemble methods: a review 3." (1); 2001.; cited on the attached 892 form).


Conclusion
	No claims are allowed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KETTIP KRIANGCHAIVECH whose telephone number is (571)272-1735.  The examiner can normally be reached 8:30am-5:00pm EDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Larry D. Riggs can be reached on (571) 270-3062.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/K.K./Examiner, Art Unit 1686                                                                                                                                                                                                        
/LARRY D RIGGS II/Supervisory Patent Examiner, Art Unit 1686
Prosecution Timeline

Sep 27, 2024: Non-Final Rejection mailed (§101, §102, §103)
Feb 25, 2025: Applicant Interview (Telephonic)
Feb 25, 2025: Examiner Interview Summary
Feb 27, 2025: Response Filed
May 30, 2025: Final Rejection mailed (§101, §102, §103)
Dec 01, 2025: Request for Continued Examination
Dec 02, 2025: Response after Non-Final Action
Feb 18, 2026: Non-Final Rejection mailed (§101, §102, §103; current)
Precedent Cases

Applications granted by this same examiner with similar technology

16/988,965 (Patent 12597484): TRAIT PREDICTION COORDINATION FOR GENOMIC APPLICATION ENVIRONMENT; 5y 8m to grant; granted Apr 07, 2026
18/513,357 (Patent 12584844): FLOW CYTOMETRY IMMUNOPROFILING OF PERIPHERAL BLOOD; 2y 4m to grant; granted Mar 24, 2026
16/631,405 (Patent 12512185): DNA-BASED DATA STORAGE AND RETRIEVAL; 5y 11m to grant; granted Dec 30, 2025
16/347,104 (Patent 12415981): AUTOMATED COLLECTION OF A SPECIFIED NUMBER OF CELLS; 6y 4m to grant; granted Sep 16, 2025
16/237,959 (Patent 12364989): HIGH THROUGHPUT METHOD AND SYSTEM FOR ANALYZING THE EFFECTS OF AGENTS ON PLANARIA; 6y 6m to grant; granted Jul 22, 2025
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 21%
With Interview (+32.8%): 54%
Median Time to Grant: 4y 8m (~0m remaining)
PTA Risk: High
Based on 48 resolved cases by this examiner. Grant probability derived from career allowance rate.