DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
Claim(s) 1-20 are currently pending and under examination herein.
Claim(s) 1-20 are rejected.
Priority
Domestic priority is currently acknowledged to PCT/US20/59977, filed on 11/11/2020, US Provisional Application 63/104,323, filed on 10/22/2020, and US Provisional Application 62/934,848, filed on 11/13/2019. The effective filing date of claims 1-20 in this action is therefore established as 11/13/2019. In future actions, the effective filing date of one or more claims may change due to amendments to the claims or further analysis of the disclosures of the priority applications.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/12/2022 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The drawings filed on 05/12/2022 are accepted.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 3-4, 10, and 20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claim 3 recites the limitation - a feature set for the predictive model comprises one or more categories. It is unclear if the “categories” referenced in claim 3 are the same as the “gene alteration categories” referenced in claim 1. Claim 4 is also rejected because it depends from claim 3, and thus contains the above issues due to said dependence.
Claim 10 recites the limitation - labels indicate whether a set of genes in the training dataset is from a cancer subject in the cohort of study subjects. It is unclear if “cancer subject” refers to a subject with cancer, a specific type of cancer, or something else. Claim 20 also recites “cancer subject” and is indefinite for the same reason.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. In accordance with MPEP § 2106, claims found to recite statutory subject matter (Step 1: YES) are then analyzed to determine if the claims recite any concepts that equate to an abstract idea (Step 2A, Prong 1). In the instant application, the claims recite the following limitations that equate to an abstract idea:
Claim 1 recites the limitation - applying a predictive model to the subject sample dataset to generate one or more cancer origin site classifications, the predictive model having been trained using a training dataset generated from sequence reads corresponding to genetic material from a cohort of study subjects with known cancers, the training dataset comprising one or more genes, one or more gene alteration categories corresponding to the one or more genes, and one or more labels characterizing tumor origin sites for the known cancers of the study subjects in the cohort. Based on the broadest reasonable interpretation, applying a predictive model to generate a classification could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea. Additionally, this describes a natural correlation, which classifies the limitation as a law of nature. Claims 6-12 depend on claim 1, and thus contain the above issues due to said dependence.
Claim 2 recites the limitation - wherein the predictive model is a random forest classification model. Based on the broadest reasonable interpretation, using a random forest as the predictive model could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea. Claim 3 depends from claim 2, and thus contains the above issues due to said dependence.
Claim 4 recites the limitation - wherein classifier scores for the predictive model were calibrated using multinomial logistic regression to match empirically observed classification probabilities. Based on the broadest reasonable interpretation, calibrating score using multinomial logistic regression could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea.
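Purely for illustration of the kind of mathematical operation the claim 4 limitation encompasses (and not as a characterization of the applicant's disclosed implementation), such calibration can be sketched with generic tooling; the dataset, classifier, and parameters below are hypothetical:

```python
# Hypothetical sketch: calibrating raw classifier scores with multinomial
# logistic regression so that outputs track empirically observed class
# probabilities. All data and parameters are illustrative stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, random_state=0)

# Uncalibrated scores from a generic random forest classifier.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
raw = rf.predict_proba(X_cal)

# A multinomial logistic regression is fit on held-out (score, label) pairs;
# its predict_proba outputs are the calibrated classification probabilities.
calibrator = LogisticRegression(max_iter=1000).fit(raw, y_cal)
calibrated = calibrator.predict_proba(raw)  # rows sum to 1, one column per class
```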
Claim 5 recites the limitation - training the predictive model using supervised or unsupervised learning. Based on the broadest reasonable interpretation, training the predictive model using supervised or unsupervised classification could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea.
Claim 13 recites the limitation - the predictive model is further configured to generate a confidence score for each cancer origin site classification. Based on the broadest reasonable interpretation, using the predictive model to generate a confidence score could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea. Claim 14 depends from claim 13, and thus contains the above issues due to said dependence.
Claim 15 recites the limitation - apply a predictive model to the subject sample dataset to generate one or more cancer origin site classifications, the predictive model having been trained using a training dataset generated using sequence reads corresponding to genetic material from a cohort of study subjects with known cancers, the training dataset comprising one or more genes, one or more gene alteration categories corresponding to the one or more genes, and one or more labels characterizing tumor origin sites for the known cancers of the study subjects in the cohort. Based on the broadest reasonable interpretation, applying a trained predictive model to generate a classification could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea. Additionally, this describes a natural correlation, which classifies the limitation as a law of nature. Claims 17-18 depend from claim 15, and thus contain the above issues due to said dependence.
Claim 16 recites the limitation - the predictive model is a random forest classification model. Based on the broadest reasonable interpretation, using a random forest as the predictive model could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea.
Claim 19 recites the limitation - the predictive model is further configured to generate a confidence score for each cancer origin site classification, wherein each confidence score corresponds to a likelihood of a cancer origin site for a tumor. Based on the broadest reasonable interpretation, the predictive model generating confidence scores could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea.
Claim 20 recites the limitation - train, using the plurality of sample genetic sequences, a classification model configured to generate likelihoods for corresponding cancer origin sites. Based on the broadest reasonable interpretation, training the model could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea. Claim 20 also recites apply the classification model to the genetic sequence to determine a set of likelihoods for a corresponding set of origin sites of cancers, each likelihood indicating a probability measure that the genetic sequence correlates with a presence of cancer at a corresponding origin site. Based on the broadest reasonable interpretation, applying the model to determine likelihoods could include equations that could practically be done by the human mind. This draws the limitation to a mathematical concept and a mental process, which classifies the limitation as an abstract idea. Additionally, this describes a natural correlation, which classifies the limitation as a law of nature.
These limitations recite concepts of analyzing, organizing and identifying information that are so generically recited that they can be practically performed in the human mind as claimed, which falls under the “Mental processes” and “Mathematical concepts” groupings of abstract ideas. These recitations are similar to the concepts of collecting information, analyzing it and displaying certain results of the collection and analysis in Electric Power Group, LLC v. Alstom (830 F.3d 1350, 119 USPQ2d 1739 (Fed. Cir. 2016)), organizing and manipulating information through mathematical correlations in Digitech Image Techs., LLC v. Electronics for Imaging, Inc. (758 F.3d 1344, 111 USPQ2d 1717 (Fed. Cir. 2014)) and comparing information regarding a sample or test to a control or target data in Univ. of Utah Research Found. v. Ambry Genetics Corp. (774 F.3d 755, 113 USPQ2d 1241 (Fed. Cir. 2014)) and Association for Molecular Pathology v. USPTO (689 F.3d 1303, 103 USPQ2d 1681 (Fed. Cir. 2012)) that the courts have identified as concepts that can be practically performed in the human mind or as mathematical relationships. Therefore, these limitations fall under the “Mental processes” and “Mathematical concepts” groupings of abstract ideas. Additionally, the limitations describe natural correlations, which fall under natural laws. This is similar to a correlation between the presence of myeloperoxidase in a bodily sample (such as blood or plasma) and cardiovascular disease risk (Cleveland Clinic Foundation v. True Health Diagnostics, LLC, 859 F.3d 1352, 1361, 123 USPQ2d 1081, 1087 (Fed. Cir. 2017)) that the courts have identified as a law of nature. As such, claims 1-20 recite an abstract idea and law of nature (Step 2A, Prong 1: YES).
Claims found to recite a judicial exception under Step 2A, Prong 1 are then further analyzed to determine if the claims as a whole integrate the recited judicial exception into a practical application or not (Step 2A, Prong 2). These judicial exceptions are not integrated into a practical application because the claims do not recite an additional element that reflects an improvement to technology (MPEP § 2106.04(d)(1)). Rather, the claims provide insignificant extra-solution activity (MPEP § 2106.05(g)) and provide mere instructions to apply a judicial exception (MPEP § 2106.05(f)). Specifically, the claims recite the following additional elements:
Claim 1 recites a method for classifying tumor origin sites, the method comprising sequencing genetic material in a tissue sample from a subject to generate a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories; and storing, in one or more data structures, an association between the subject and the one or more cancer origin site classifications.
Claim 3 recites wherein a feature set for the predictive model comprises one or more categories selected from a group consisting of mutations, indels, focal amplifications and deletions, broad copy number gains and losses, structural rearrangements, mutation signatures, mutation rate, and sex.
Claim 6 recites generating the training dataset.
Claim 7 recites generating the training dataset further comprises acquiring, from a sequencing device, the sequence reads corresponding to the genetic material from the cohort of study subjects, and using the sequence reads to generate the training dataset.
Claim 8 recites the cohort excludes study subjects with rare cancers not in the top 30 most common cancer types.
Claim 9 recites the training dataset comprises gene alteration categories comprising one or more selected from a group consisting of gene amplification (AMP), chromosome gain, homozygous deletion, hotspot, allele, chromosome loss, promoter, signature, structural variant (SV), truncation, and variant of unknown significance (VUS).
Claim 10 recites the one or more labels indicate whether a set of genes in the training dataset is from a cancer subject in the cohort of study subjects.
Claim 11 recites the predictive model is configured to accept data on genes and gene alterations as inputs and to provide one or more cancer origin site classifications as output.
Claim 12 recites the one or more cancer origin site classifications identify at least one of an internal organ of the subject or a cancer type.
Claim 14 recites each confidence score corresponds with a likelihood of a cancer origin site for a tumor.
Claim 15 recites A system for classifying tumor origin sites, the system comprising a computing device having one or more processors configured to: acquire, from a sequencing device, sequence reads corresponding to genetic material in a tissue sample from a subject; generate, using the sequence reads, a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories; and store, in one or more data structures, an association between the subject and the one or more cancer origin site classifications.
Claim 17 recites the one or more processors are further configured to train the predictive model such that it is configured to accept data on genes and gene alterations as inputs and to provide one or more cancer origin site classifications as output.
Claim 18 recites the one or more processors are further configured to generate the training dataset using the sequence reads corresponding to the genetic material from the study subjects in the cohort.
Claim 20 recites A system for determining sites of origin for cancer based on sequencing of genes, the system comprising one or more processors configured to: obtain a training dataset comprising a plurality of sample-derived genetic sequences corresponding to a plurality of cancer subjects, each sample defining a set of genes and a category, the category of each sample defining at least one alteration to the set of genes and/or at least one genomic alteration in the sample; and acquire, via a sequencer, a genetic sequence corresponding to a subject, the genetic sequence including a set of genes and a category, the category of the genetic sequence defining a nature of alteration to the set of genes in the genetic sequence.
There are no limitations that indicate that the claimed modeling, training, and calibrating require anything other than generic computing systems. As such, these limitations equate to mere instructions to implement the abstract idea on a generic computer that the courts have stated does not render an abstract idea eligible in Alice Corp., 573 U.S. at 223, 110 USPQ2d at 1983. See also 573 U.S. at 224, 110 USPQ2d at 1984. There is no indication that these steps are affected by the judicial exception in any way and thus do not integrate the recited judicial exception into a practical application. As such, claims 1-20 are directed to an abstract idea and natural law (Step 2A, Prong 2: NO).
Claims found to be directed to a judicial exception are then further evaluated to determine if the claims recite an inventive concept that provides significantly more than the judicial exception itself (Step 2B). The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claims recite conventional additional elements that equate to mere instructions to apply the recited exception in a generic way or in a generic computing environment. The claims also recite conventional additional elements that represent insignificant extra-solution activities. The instant claims recite the following additional elements:
Claim 1 recites a method for classifying tumor origin sites, the method comprising sequencing genetic material in a tissue sample from a subject to generate a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories; and storing, in one or more data structures, an association between the subject and the one or more cancer origin site classifications.
Claim 3 recites wherein a feature set for the predictive model comprises one or more categories selected from a group consisting of mutations, indels, focal amplifications and deletions, broad copy number gains and losses, structural rearrangements, mutation signatures, mutation rate, and sex.
Claim 6 recites generating the training dataset.
Claim 7 recites generating the training dataset further comprises acquiring, from a sequencing device, the sequence reads corresponding to the genetic material from the cohort of study subjects, and using the sequence reads to generate the training dataset.
Claim 8 recites the cohort excludes study subjects with rare cancers not in the top 30 most common cancer types.
Claim 9 recites the training dataset comprises gene alteration categories comprising one or more selected from a group consisting of gene amplification (AMP), chromosome gain, homozygous deletion, hotspot, allele, chromosome loss, promoter, signature, structural variant (SV), truncation, and variant of unknown significance (VUS).
Claim 10 recites the one or more labels indicate whether a set of genes in the training dataset is from a cancer subject in the cohort of study subjects.
Claim 11 recites the predictive model is configured to accept data on genes and gene alterations as inputs and to provide one or more cancer origin site classifications as output.
Claim 12 recites the one or more cancer origin site classifications identify at least one of an internal organ of the subject or a cancer type.
Claim 14 recites each confidence score corresponds with a likelihood of a cancer origin site for a tumor.
Claim 15 recites A system for classifying tumor origin sites, the system comprising a computing device having one or more processors configured to: acquire, from a sequencing device, sequence reads corresponding to genetic material in a tissue sample from a subject; generate, using the sequence reads, a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories; and store, in one or more data structures, an association between the subject and the one or more cancer origin site classifications.
Claim 17 recites the one or more processors are further configured to train the predictive model such that it is configured to accept data on genes and gene alterations as inputs and to provide one or more cancer origin site classifications as output.
Claim 18 recites the one or more processors are further configured to generate the training dataset using the sequence reads corresponding to the genetic material from the study subjects in the cohort.
Claim 20 recites A system for determining sites of origin for cancer based on sequencing of genes, the system comprising one or more processors configured to: obtain a training dataset comprising a plurality of sample-derived genetic sequences corresponding to a plurality of cancer subjects, each sample defining a set of genes and a category, the category of each sample defining at least one alteration to the set of genes and/or at least one genomic alteration in the sample; and acquire, via a sequencer, a genetic sequence corresponding to a subject, the genetic sequence including a set of genes and a category, the category of the genetic sequence defining a nature of alteration to the set of genes in the genetic sequence.
As discussed above, there are no additional limitations to indicate that the claimed modeling, training, and calibrating require anything other than generic computer components in order to carry out the recited abstract idea in the claims. Claims that amount to nothing more than an instruction to apply the abstract idea or natural law using a generic computer do not render an abstract idea or natural law eligible. Alice Corp., 573 U.S. at 223, 110 USPQ2d at 1983. See also 573 U.S. at 224, 110 USPQ2d at 1984. MPEP 2106.05(f) discloses that mere instructions to apply the judicial exception cannot provide an inventive concept to the claims. As specified in MPEP 2106.05(g), extra-solution activities can be understood as incidental to the primary process or product that are merely a nominal or tangential addition to the claim. Insignificant extra-solution activities include mere data gathering, selecting a particular data source or type of data to be manipulated, and displaying information. Additionally, Shendure et al. (2017, Nature, Vol. 550: 345–353) teaches that sequencing DNA (Page 346, Column 2, Paragraph 4: By 2004, creating additional improvements was an increasingly marginal exercise) and manipulating or storing data derived from sequenced DNA (Page 345, Column 2, Paragraph 4: Sequence data grew exponentially, motivating the creation of central data repositories that engendered a spirit of data sharing) are conventional laboratory practices.
The additional elements do not comprise an inventive concept when considered individually or as an ordered combination that transforms the claimed judicial exception into a patent-eligible application of the judicial exception. Therefore, the claims do not amount to significantly more than the judicial exception itself (Step 2B: No). As such, claims 1-20 are not patent eligible.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-3, 6, and 9-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Marquard et al. (2015, BMC Medical Genomics, Vol. 8, No. 58: 1-13, IDS document 05/12/2022). Italicized text from reference art.
Below the list of applicable claims is reproduced:
Claim 1. A method for classifying tumor origin sites, the method comprising: i. sequencing genetic material in a tissue sample from a subject to generate a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories; ii. applying a predictive model to the subject sample dataset to generate one or more cancer origin site classifications, the predictive model having been trained using a training dataset generated from sequence reads corresponding to genetic material from a cohort of study subjects with known cancers, the training dataset comprising one or more genes, one or more gene alteration categories corresponding to the one or more genes, and one or more labels characterizing tumor origin sites for the known cancers of the study subjects in the cohort; and iii. storing, in one or more data structures, an association between the subject and the one or more cancer origin site classifications.
Claim 2. The method of claim 1, wherein the predictive model is a random forest classification model.
Claim 3. The method of claim 2, wherein a feature set for the predictive model comprises one or more categories selected from a group consisting of mutations, indels, focal amplifications and deletions, broad copy number gains and losses, structural rearrangements, mutation signatures, mutation rate, and sex.
Claim 6. The method of claim 1, further comprising generating the training dataset.
Claim 9. The method of claim 1, wherein the training dataset comprises gene alteration categories comprising one or more selected from a group consisting of gene amplification (AMP), chromosome gain, homozygous deletion, hotspot, allele, chromosome loss, promoter, signature, structural variant (SV), truncation, and variant of unknown significance (VUS).
Claim 10. The method of claim 1, wherein the one or more labels indicate whether a set of genes in the training dataset is from a cancer subject in the cohort of study subjects.
Claim 11. The method of claim 1, wherein the predictive model is configured to accept data on genes and gene alterations as inputs and to provide one or more cancer origin site classifications as output.
Claim 12. The method of claim 11, wherein the one or more cancer origin site classifications identify at least one of an internal organ of the subject or a cancer type.
Claim 13. The method of claim 11, wherein the predictive model is further configured to generate a confidence score for each cancer origin site classification.
Claim 14. The method of claim 13, wherein each confidence score corresponds with a likelihood of a cancer origin site for a tumor.
Claim 15. A system for classifying tumor origin sites, the system comprising a computing device having one or more processors configured to: i. acquire, from a sequencing device, sequence reads corresponding to genetic material in a tissue sample from a subject; ii. generate, using the sequence reads, a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories; iii. apply a predictive model to the subject sample dataset to generate one or more cancer origin site classifications, the predictive model having been trained using a training dataset generated using sequence reads corresponding to genetic material from a cohort of study subjects with known cancers, the training dataset comprising one or more genes, one or more gene alteration categories corresponding to the one or more genes, and one or more labels characterizing tumor origin sites for the known cancers of the study subjects in the cohort; and iv. store, in one or more data structures, an association between the subject and the one or more cancer origin site classifications.
Claim 16. The system of claim 15, wherein the predictive model is a random forest classification model.
Claim 17. The system of claim 15, wherein the one or more processors are further configured to train the predictive model such that it is configured to accept data on genes and gene alterations as inputs and to provide one or more cancer origin site classifications as output.
Claim 18. The system of claim 15, wherein the one or more processors are further configured to generate the training dataset using the sequence reads corresponding to the genetic material from the study subjects in the cohort.
Claim 19. The system of claim 15, wherein the predictive model is further configured to generate a confidence score for each cancer origin site classification, wherein each confidence score corresponds to a likelihood of a cancer origin site for a tumor.
Claim 20. A system for determining sites of origin for cancer based on sequencing of genes, the system comprising one or more processors configured to: i. obtain a training dataset comprising a plurality of sample-derived genetic sequences corresponding to a plurality of cancer subjects, each sample defining a set of genes and a category, the category of each sample defining at least one alteration to the set of genes and/or at least one genomic alteration in the sample; ii. train, using the plurality of sample genetic sequences, a classification model configured to generate likelihoods for corresponding cancer origin sites; iii. acquire, via a sequencer, a genetic sequence corresponding to a subject, the genetic sequence including a set of genes and a category, the category of the genetic sequence defining a nature of alteration to the set of genes in the genetic sequence; and iv. apply the classification model to the genetic sequence to determine a set of likelihoods for a corresponding set of origin sites of cancers, each likelihood indicating a probability measure that the genetic sequence correlates with a presence of cancer at a corresponding origin site.
Regarding Claim 1, Marquard et al. teaches (Claim 1.i) sequencing genetic material in a tissue sample from a subject to generate a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories (Page 4, Column 1, Paragraph 2: Samples were paired-end multiplex sequenced. Somatic variants between tumour and matched germ-line were determined). Marquard et al. also teaches (Claim 1.ii) applying a predictive model to the subject sample dataset to generate a cancer origin (Page 7, Column 2, Paragraph 3: we applied the classifier to point mutation calls from whole exome sequencing non-small cell lung cancer patients in a cohort study). Marquard et al. also teaches (Claim 1.ii) the predictive model having been trained using a dataset generated from sequences from a cohort of study subjects with known cancers, the training dataset comprising genes, gene alteration categories, labels characterizing tumor origin sites (Page 2, Column 2, Paragraphs 3-4: Somatic mutation training data - The resulting data set consisted of 7,769 specimens from 28 different primary sites. Each row corresponded to a single unique mutation identified by its genomic position). Marquard et al. also teaches (Claim 1.iii) storing an association between the subject and a cancer origin in data structures (Page 8, Column 1, Paragraph 1: When specimens were analysed individually, we found that the majority of the subregions and metastases were proposed to be of the same origin as the pooled specimens).
Regarding Claim 2, Marquard et al. teaches the predictive model is a random forest classification model (Page 3, Column 1, Paragraph 6: We considered four commonly used machine learning methods including random forests. We found that random forests provided the best performance).
Regarding Claim 3, Marquard et al. teaches the features for the predictive model include mutations (non-synonymous mutations), indels (non-synonymous mutations), focal amplifications and deletions (copy number aberrations), broad copy number gains and losses (copy number aberrations), and mutation rate (base substitution frequency) (Page 3, Column 1, Paragraphs 2-5).
Regarding Claim 6, Marquard et al. teaches generating the training dataset (Page 2, Column 2, Paragraphs 3-4: Somatic mutation training data - The resulting data set consisted of 7,769 specimens from 28 different primary sites. Each row corresponded to a single unique mutation identified by its genomic position).
Regarding Claim 9, Marquard et al. teaches the training dataset comprises gene alteration categories including chromosome gain (copy number aberrations), hotspot (base substitution frequency), chromosome loss (copy number aberrations), structural variant (non-synonymous mutations), truncation (trinucleotide base substitution frequency), and variant of unknown significance (non-synonymous mutations) (Page 3, Column 1, Paragraphs 2-5).
Regarding Claim 10, Marquard et al. teaches the labels indicate whether a set of genes in the training dataset is from a cancer subject in the cohort of study subjects (Page 2, Column 2, Paragraph 3: Sample ID - the same sample name matched to more than one tumor ID).
Regarding Claim 11, Marquard et al. teaches the predictive model is configured to accept data on genes and gene alterations as inputs (Page 5, Figure 1: Somatic point mutation data is used to determine the mutation status of a set of cancer genes and to calculate classes of base substitutions) and to provide one or more cancer origin site classifications as output (Page 3, Column 2, Paragraph 1: When applied to a new data sample, we define the classification score as the proportion of the trees that voted for the given primary site).
Regarding Claim 12, Marquard et al. teaches the cancer origin site classifications identify at least one of an internal organ of the subject or a cancer type (Page 4, Column 2, Paragraph 1: Primary sites covered by both classifiers are breast, endometrium, kidney, large intestine, lung and ovary).
Regarding Claim 13, Marquard et al. teaches the predictive model is further configured to generate a confidence score for each cancer origin site classification (Page 6, Column 2, Paragraph 1: We found that the confidence score was indeed a strong indicator of accuracy, and that a large fraction of tumors could be classified with high confidence).
Regarding Claim 14, Marquard et al. teaches that each confidence score corresponds with a likelihood of a cancer origin site for a tumor (Page 6, Column 2, Paragraph 1: We found that the confidence score was indeed a strong indicator of accuracy, and that a large fraction of tumors could be classified with high confidence).
Regarding Claim 15, Marquard et al. teaches (Claim 15.i) a system to acquire, from a sequencing device, sequence reads corresponding to genetic material in a tissue sample from a subject (Page 4, Column 1, Paragraph 2: Raw paired end reads in FastQ format generated by the Illumina pipeline were aligned – software listed indicates the implementation on a computer). Marquard et al. also teaches (Claim 15.ii) a system to generate, using the sequence reads, a dataset comprising subject genes and gene alteration categories (Page 4, Column 1, Paragraph 2: Somatic variants between tumor and matched germ-line were determined – software listed indicates the implementation on a computer). Marquard et al. also teaches (Claim 15.iii) a system to apply a predictive model to the subject sample dataset to generate one or more cancer origin site classifications, the predictive model having been trained using a training dataset generated using sequence reads corresponding to genetic material from a cohort of study subjects with known cancers (Page 7, Column 2, Paragraph 3: we applied the classifier to point mutation calls from whole exome sequencing non-small cell lung cancer patients in a cohort study – software listed in the methods indicates the implementation on a computer). Marquard et al. also teaches (Claim 15.iii) a system wherein the training dataset comprises genes, alteration categories corresponding to the genes, and labels characterizing tumor origin sites (Page 2, Column 2, Paragraphs 3-4: Somatic mutation training data - The resulting data set consisted of 7,769 specimens from 28 different primary sites. Each row corresponded to a single unique mutation identified by its genomic position. Software listed in the methods indicates the implementation on a computer). Marquard et al. also teaches (Claim 15.iv) a system to store, in one or more data structures, an association between the subject and the one or more cancer origin site classifications (Page 8, Column 1, Paragraph 1: When specimens were analysed individually, we found that the majority of the subregions and metastases were proposed to be of the same origin as the pooled specimens. Software listed in the methods indicates the implementation on a computer).
Regarding Claim 16, Marquard et al. teaches the predictive model is a random forest classification model (Page 3, Column 1, Paragraph 6: We considered four commonly used machine learning methods including random forests. We found that random forests provided the best performance. Software listed indicates the implementation on a computer).
Regarding Claim 17, Marquard et al. teaches the processors are configured to train the predictive model such that it is configured to accept data on genes and gene alterations as inputs (Page 5, Figure 1: Somatic point mutation data is used to determine the mutation status of a set of cancer genes and to calculate classes of base substitutions - software listed indicates the implementation on a computer) and to provide one or more cancer origin site classifications as output (Page 3, Column 2, Paragraph 1: When applied to a new data sample, we define the classification score as the proportion of the trees that voted for the given primary site - software listed indicates the implementation on a computer).
Regarding Claim 18, Marquard et al. teaches the processors are configured to generate the training dataset using the sequence reads corresponding to the genetic material from the study subjects in the cohort (Page 2, Column 2, Paragraphs 3-4: Somatic mutation training data - The resulting data set consisted of 7,769 specimens from 28 different primary sites. Each row corresponded to a single unique mutation identified by its genomic position. Software listed in the methods indicate the implementation on a computer).
Regarding Claim 19, Marquard et al. teaches the predictive model is further configured to generate a confidence score for each cancer origin site classification, wherein each confidence score corresponds to a likelihood of a cancer origin site for a tumor (Page 6, Column 2, Paragraph 1: We found that the confidence score was indeed a strong indicator of accuracy, and that a large fraction of tumors could be classified with high confidence – software listed indicates the implementation on a computer).
Regarding Claim 20, Marquard et al. teaches (Claim 20.i) a system to obtain a training dataset comprising genetic sequences corresponding to cancer subjects, each sample defining a set of genes and a category, the category defining at least one alteration to the set of genes and/or at least one genomic alteration in the sample (Page 2, Column 2, Paragraphs 3-4: Somatic mutation training data - The resulting data set consisted of 7,769 specimens from 28 different primary sites. Each row corresponded to a single unique mutation identified by its genomic position. Software listed in the methods indicates the implementation on a computer). Marquard et al. also teaches (Claim 20.ii) a system to train, using the plurality of sample genetic sequences, a classification model configured to generate likelihoods for corresponding cancer origin sites (Page 3, Column 2, Paragraph 3: Random forest classifiers were trained. When applied to a new data sample, classification score is defined as the proportion of the trees that voted for the given primary site). Marquard et al. also teaches (Claim 20.iii) a system to acquire, via a sequencer, a genetic sequence corresponding to a subject, the genetic sequence including a set of genes and a category, the category of the genetic sequence defining a nature of alteration to the set of genes in the genetic sequence (Page 4, Column 1, Paragraph 2: Samples were paired-end multiplex sequenced. Somatic variants between tumour and matched germ-line were determined). Marquard et al. also teaches (Claim 20.iv) a system to apply the classification model to the genetic sequence to determine a set of likelihoods for a set of origin sites, each likelihood indicating a probability measure that the genetic sequence correlates with a presence of cancer at a corresponding origin site (Page 3, Column 2, Paragraph 1: When applied to a data sample, the classification score is defined as the proportion of the trees that voted for the given primary site).
Therefore, claims 1-3, 6, and 9-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Marquard et al.
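For clarity, the classification score cited throughout the rejection above (the proportion of random-forest trees that vote for a given primary site, Marquard et al., Page 3, Column 2, Paragraph 1) can be sketched as follows. This is an illustrative sketch only: the feature matrix, the three hypothetical primary sites, and all array shapes are invented and are not drawn from the reference.

```python
# Illustrative sketch: per-site classification score as the fraction of
# random-forest trees voting for that site. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 5))      # 60 specimens, 5 mutation-derived features
y_train = rng.integers(0, 3, size=60)   # 3 hypothetical primary sites (0, 1, 2)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

x_new = rng.normal(size=(1, 5))         # one new tumour specimen
# Poll each tree individually; the score for a site is the vote fraction.
votes = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
scores = {int(site): float(np.mean(votes == site)) for site in forest.classes_}
# The predicted origin site is the one with the highest vote fraction.
print(scores)
```

Because every tree casts exactly one vote, the per-site scores sum to one, which is what lets the reference treat them as confidence values.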
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-6 and 9-20 are rejected under 35 U.S.C. 103 as being unpatentable over Marquard et al. (2015, BMC Medical Genomics, Vol.8, No 58: 1-13, IDS document 05/12/2022), as applied to Claims 1-3, 6, 9-20 in the 35 USC 102 rejection above, in view of Osisanwo et al. (2017, International Journal of Computer Trends and Technology, Vol. 48, No. 3: 128-138). Italicized text from reference art.
Below the list of applicable claims is reproduced:
Claim 1. A method for classifying tumor origin sites, the method comprising: i. sequencing genetic material in a tissue sample from a subject to generate a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories; ii. applying a predictive model to the subject sample dataset to generate one or more cancer origin site classifications, the predictive model having been trained using a training dataset generated from sequence reads corresponding to genetic material from a cohort of study subjects with known cancers, the training dataset comprising one or more genes, one or more gene alteration categories corresponding to the one or more genes, and one or more labels characterizing tumor origin sites for the known cancers of the study subjects in the cohort; and iii. storing, in one or more data structures, an association between the subject and the one or more cancer origin site classifications.
Claim 2. The method of claim 1, wherein the predictive model is a random forest classification model.
Claim 3. The method of claim 2, wherein a feature set for the predictive model comprises one or more categories selected from a group consisting of mutations, indels, focal amplifications and deletions, broad copy number gains and losses, structural rearrangements, mutation signatures, mutation rate, and sex.
Claim 4. The method of claim 3, wherein classifier scores for the predictive model were calibrated using multinomial logistic regression to match empirically observed classification probabilities.
Claim 5. The method of claim 1, further comprising training the predictive model using supervised or unsupervised learning.
Claim 6. The method of claim 1, further comprising generating the training dataset.
Claim 9. The method of claim 1, wherein the training dataset comprises gene alteration categories comprising one or more selected from a group consisting of gene amplification (AMP), chromosome gain, homozygous deletion, hotspot, allele, chromosome loss, promoter, signature, structural variant (SV), truncation, and variant of unknown significance (VUS).
Claim 10. The method of claim 1, wherein the one or more labels indicate whether a set of genes in the training dataset is from a cancer subject in the cohort of study subjects.
Claim 11. The method of claim 1, wherein the predictive model is configured to accept data on genes and gene alterations as inputs and to provide one or more cancer origin site classifications as output.
Claim 12. The method of claim 11, wherein the one or more cancer origin site classifications identify at least one of an internal organ of the subject or a cancer type.
Claim 13. The method of claim 11, wherein the predictive model is further configured to generate a confidence score for each cancer origin site classification.
Claim 14. The method of claim 13, wherein each confidence score corresponds with a likelihood of a cancer origin site for a tumor.
Claim 15. A system for classifying tumor origin sites, the system comprising a computing device having one or more processors configured to: i. acquire, from a sequencing device, sequence reads corresponding to genetic material in a tissue sample from a subject; ii. generate, using the sequence reads, a subject sample dataset comprising one or more subject genes and one or more subject gene alteration categories; iii. apply a predictive model to the subject sample dataset to generate one or more cancer origin site classifications, the predictive model having been trained using a training dataset generated using sequence reads corresponding to genetic material from a cohort of study subjects with known cancers, the training dataset comprising one or more genes, one or more gene alteration categories corresponding to the one or more genes, and one or more labels characterizing tumor origin sites for the known cancers of the study subjects in the cohort; and iv. store, in one or more data structures, an association between the subject and the one or more cancer origin site classifications.
Claim 16. The system of claim 15, wherein the predictive model is a random forest classification model.
Claim 17. The system of claim 15, wherein the one or more processors are further configured to train the predictive model such that it is configured to accept data on genes and gene alterations as inputs and to provide one or more cancer origin site classifications as output.
Claim 18. The system of claim 15, wherein the one or more processors are further configured to generate the training dataset using the sequence reads corresponding to the genetic material from the study subjects in the cohort.
Claim 19. The system of claim 15, wherein the predictive model is further configured to generate a confidence score for each cancer origin site classification, wherein each confidence score corresponds to a likelihood of a cancer origin site for a tumor.
Claim 20. A system for determining sites of origin for cancer based on sequencing of genes, the system comprising one or more processors configured to: i. obtain a training dataset comprising a plurality of sample-derived genetic sequences corresponding to a plurality of cancer subjects, each sample defining a set of genes and a category, the category of each sample defining at least one alteration to the set of genes and/or at least one genomic alteration in the sample; ii. train, using the plurality of sample genetic sequences, a classification model configured to generate likelihoods for corresponding cancer origin sites; iii. acquire, via a sequencer, a genetic sequence corresponding to a subject, the genetic sequence including a set of genes and a category, the category of the genetic sequence defining a nature of alteration to the set of genes in the genetic sequence; and iv. apply the classification model to the genetic sequence to determine a set of likelihoods for a corresponding set of origin sites of cancers, each likelihood indicating a probability measure that the genetic sequence correlates with a presence of cancer at a corresponding origin site.
Regarding Claims 1-3, 6, and 9-20, Marquard et al. teaches the limitations of these claims as set forth in the 35 U.S.C. 102(a)(1) rejection above; that claim-by-claim analysis is incorporated here and is not repeated.
Marquard et al. does not teach the classifier scores for the predictive model were calibrated using multinomial logistic regression to match empirically observed classification probabilities (Claim 4). Marquard et al. also does not teach training the predictive model using supervised or unsupervised learning (Claim 5).
Regarding Claim 4, Osisanwo et al. teaches calibrating the classifier scores for the predictive model using multinomial logistic regression to match empirically observed classification probabilities (Page 129, Column 2, Paragraph 5 and Page 130, Column 1, Paragraph 1: The logistic regression function is used for class probabilities and makes stronger more detailed predictions).
Regarding Claim 5, Osisanwo et al. teaches training the predictive model using supervised (Page 129, Column 2, Paragraph 3: supervised machine learning algorithms which deal with classification include random forest) and unsupervised learning (Page 130, Column 2, Paragraph 2: K means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to modify Marquard et al. with Osisanwo et al. because Marquard et al. suggests calibrating classifiers to reflect the tissue-specific probability of the tumor being detected (Page 11, Column 2, Paragraph 3). Osisanwo et al. teaches a method of calibration (logistic regression) that is tailored for probabilities (see “Regarding Claim 4” of the present rejection). Additionally, Marquard et al. tests several machine learning algorithms to predict cancer origin (Page 3, Column 1, Paragraph 6). Osisanwo et al. teaches multiple learning algorithms, classifying them into supervised and unsupervised (see “Regarding Claim 5” of the present rejection), which are applicable to making predictions. Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to combine the methods of both references indicated above.
Furthermore, one of ordinary skill in the art would predict that the method taught by Osisanwo et al. could be readily added to the methods of Marquard et al. with a reasonable expectation of success because both utilize the same data analysis approach, machine learning algorithms that can be used for making predictions. Accordingly, claims 1-6 and 9-20 taken as a whole would have been prima facie obvious before the effective filing date and are rejected under 35 U.S.C. 103.
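The calibration recited in claim 4 (mapping raw classifier scores to empirically observed classification probabilities via multinomial logistic regression) can be sketched as follows. The score matrix and origin labels are synthetic, and scikit-learn's LogisticRegression, whose default lbfgs solver fits a multinomial model when there are more than two classes, stands in for whatever implementation an applicant might actually use.

```python
# Illustrative sketch (hypothetical data): calibrating raw per-site
# classifier scores with a multinomial logistic regression so that the
# outputs track empirically observed class probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
raw_scores = rng.dirichlet(np.ones(3), size=200)  # uncalibrated per-site score vectors
true_site = rng.integers(0, 3, size=200)          # observed tumour origins

# Fit the calibrator on (score vector -> observed origin) pairs; with
# three classes and the default lbfgs solver this is a multinomial model.
calibrator = LogisticRegression(max_iter=1000)
calibrator.fit(raw_scores, true_site)

calibrated = calibrator.predict_proba(raw_scores[:1])
print(calibrated)  # one calibrated probability per site; each row sums to 1
```

In practice the calibrator would be fit on held-out scores rather than the training scores, so that the calibrated probabilities reflect out-of-sample accuracy.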
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Marquard et al., as applied to Claims 1-6 and 9-20 above, in view of Soh et al. (2017, Genome Medicine, Vol. 9, No. 104: 1-11, IDS document 05/12/2022). Italicized text from reference art.
Below the list of applicable claims is reproduced:
Claim 7. The method of claim 6, wherein generating the training dataset further comprises acquiring, from a sequencing device, the sequence reads corresponding to the genetic material from the cohort of study subjects, and using the sequence reads to generate the training dataset.
Claim 8. The method of claim 1, wherein the cohort excludes study subjects with rare cancers not in the top 30 most common cancer types.
Marquard et al. teaches the limitations of claims 1 and 6 as set forth in the rejections above.
Marquard et al. does not teach generating the training dataset further comprises acquiring, from a sequencing device, the sequence reads corresponding to the genetic material from the cohort of study subjects, and using the sequence reads to generate the training dataset (Claim 7). Marquard et al. also does not teach the cohort excludes study subjects with rare cancers not in the top 30 most common cancer types (Claim 8).
Regarding Claim 7, Soh et al. teaches generating the training dataset further comprises acquiring, from a sequencing device, the sequence reads corresponding to the genetic material from the cohort of study subjects, and using the sequence reads to generate the training dataset (Page 4, Column 2, Paragraph 1: To minimise this dependency of gene selection on a single random partitioning of the data into training and test sets, we derived our results from an ensemble of training and test sets (i.e., one dataset split to accomplish training and testing)).
Regarding Claim 8, Soh et al. teaches excluding study subjects with rare cancers not in the top 30 most common cancer types (Page 2, column 2, Paragraph 4: We first identified 28 cancer types for our study based on availability).
It would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention to modify Marquard et al. with Soh et al. because both references utilize the same data analysis approach, machine learning, to make predictions on cancer type/origin. Additionally, Marquard et al. suggests the possibility of bias in their training data set (Page 11, Column 1, Paragraph 2). Soh et al. teaches the use of an additional machine learning algorithm (SVMs with a linear kernel) not employed by Marquard et al., which was found to be the most effective algorithm tested and could have led to a reduction in bias (Page 5, Column 2, Paragraph 3). Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date to combine the methods of both references indicated above.
Furthermore, one of ordinary skill in the art would predict that the method taught by Soh et al. could be readily added to the method of Marquard et al. with a reasonable expectation of success because the two references utilize the same data analysis approach, machine learning, to make the same type of prediction (identifying cancer type/origin). Accordingly, claims 7 and 8, taken as a whole, would have been prima facie obvious before the effective filing date and are rejected under 35 U.S.C. 103.
Conclusion
No Claims are allowed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BLAKE H ELKINS whose telephone number is (571) 272-2649. The examiner can normally be reached Monday-Friday, 8:00 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karlheinz Skowronek can be reached at (571) 272-9047. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/B.H.E./Examiner, Art Unit 1687
/Karlheinz R. Skowronek/Supervisory Patent Examiner, Art Unit 1687