DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365(c) is acknowledged. Priority of US application 62/607,007 filed 12/18/2017 is acknowledged.
Information Disclosure Statement
The IDS filed 12/22/2025 has been considered and entered. Consequently, a 1449 form is attached.
Claim Rejections - 35 USC § 112 –First Paragraph
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claim 40 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 40 recites “analyzing the patient sample by amplifying DNA from the sample to generate multiple copies of nucleic acid sequences and performing targeted sequencing analysis and variant calling on the multiple copies of nucleic acid sequences to identify a candidate variant, wherein the identifying the candidate variant further provides features associated with the targeted sequencing analysis and the variant calling” as part of the configured machine learning classifier. “Analyzing the patient sample by amplifying DNA from the sample to generate multiple copies of nucleic acid sequences and performing targeted sequencing analysis” is considered a wet-lab experiment, and “to identify a candidate variant” is considered to be achieved by software other than the machine learning classifier. The disclosure contains no description of these features as functions of the machine learning classifier. Such a limitation is also inconsistent with the ordinary understanding of the relationship between a machine learning classifier and a set of biological experiments.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 21-40 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Step 1: Process, Machine, Manufacture or Composition
Claims 21-39 are drawn to a process, here a “method” for identifying somatic mutations from DNA isolated from a patient sample.
Claim 40 is drawn to a machine or manufacture, here a “system” for identifying somatic mutations from DNA isolated from a patient sample, comprising: a computer system having a processor, memory and a plurality of lines of instructions; and a machine learning classifier executed by the processor of the computer system.
Step 2A Prong One: Identification of an Abstract Idea
The claim(s) recite(s):
Variant calling on the multiple copies of nucleic acid sequences to identify a candidate variant, wherein the identifying the candidate variant further provides features associated with the targeted sequencing analysis and the variant calling (claims 21 and 40);
Variant calling requires sequence data analysis and decision-making. Therefore, this step equates to an abstract idea of a mental process.
Evaluating, using a computer having a machine learning classifier, the candidate variant and the features against a random forest classifier comprising at least one thousand decision trees trained to detect somatic mutations in the candidate variant, wherein each decision tree evaluates a unique combination of selected information supporting the candidate variant and classifies the candidate variant as somatic or not somatic (claims 21 and 40);
This step recites an assessing/judging activity that can be performed in the human mind. Other than the recitation that the step is performed on a computer with decision trees, nothing would prevent this step from being accomplished in the human mind. Therefore, this step equates to an abstract idea of mental processes.
Wherein the machine learning classifier generates a confidence score that represents a proportion of the decision trees that would classify the candidate variant as somatic and a final classification of the candidate variant as somatic or not somatic based on the confidence score (claims 21 and 40).
This step recites a mathematical operation of generating a confidence score and a decision-making process based on the confidence score. Therefore, this step equates to an abstract idea of mathematical concepts and mental processes.
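The arithmetic recited in this limitation can be illustrated with a minimal sketch in which the confidence score is the fraction of decision trees voting "somatic". The function name `confidence_score`, the 0.5 threshold, and the vote counts are assumptions made purely for illustration; they are not taken from the claims or the specification.

```python
from typing import List, Tuple


def confidence_score(tree_votes: List[bool], threshold: float = 0.5) -> Tuple[float, str]:
    """Return (proportion of trees voting somatic, final label).

    tree_votes: one boolean per decision tree, True meaning "somatic".
    threshold:  hypothetical cutoff; the claims do not specify a value.
    """
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    # Confidence score = proportion of trees that classify the variant as somatic.
    score = sum(tree_votes) / len(tree_votes)
    # Final classification is derived from the confidence score.
    label = "somatic" if score >= threshold else "not somatic"
    return score, label


# Hypothetical forest of 1,000 trees in which 870 vote "somatic".
votes = [True] * 870 + [False] * 130
score, label = confidence_score(votes)
print(score, label)
```

The sketch involves only counting, a division, and a comparison, which is the sense in which the step is characterized above as a mathematical operation followed by a decision.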
Dependent claims provide further limits on biological samples (claims 30, 32-34); decision trees (claims 22, 31, 35); model training (claims 23-24, 27-29); features (claims 25, 36-37); and classification output (claims 26, 38-39). Additional abstract ideas are identified at least in the model training claims.
Step 2A Prong Two: Consideration of Practical Application
The claims result in a process of assessing and reporting the identified variants as somatic or not somatic mutations with a confidence score. The claims do not recite any additional elements that integrate the abstract idea/judicial exception into a practical application.
This judicial exception is not integrated into a practical application because the claims do not meet any of the following criteria:
An additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or technical field;
an additional element that applies or uses a judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition;
an additional element implements a judicial exception with, or uses a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim;
an additional element effects a transformation or reduction of a particular article to a different state or thing; and
an additional element applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception.
Step 2B: Consideration of Additional Elements and Significantly More
The claimed method also recites "additional elements" that are not limitations drawn to an abstract idea. The recited additional elements are drawn to:
Analyzing the patient sample by amplifying DNA from the sample to generate multiple copies of nucleic acid sequences and performing targeted sequencing analysis (claims 21 and 40).
A random forest classifier comprising at least one thousand decision trees trained to detect somatic mutations in the candidate variant (claims 21 and 40).
Generating, using the computer, a report that describes the candidate variant as somatic or not somatic based on the evaluating (claims 21 and 40).
A computer system having a processor, memory and a plurality of lines of instructions (claim 40).
Displaying on a user interface connected to an output device, a report that describes the candidate variant as including the mutation or structural alteration (claim 39).
The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the above identified additional elements are conventional and well-known.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because it is routine and conventional to perform the acts of sequencing samples and analyzing nucleic acids. Random forest classifiers are routinely used in biomedical data modeling. A random forest of at least one thousand decision trees is also reported as well-understood, routine, and conventional. Other elements of the method include analyzing and reporting the analytic results, which is a recitation of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim element(s) do not provide meaningful limitation(s) to transform the abstract idea recited in the instantly presented claims into a patent eligible application of the abstract idea such that the claim(s) amounts to significantly more than the abstract idea itself.
The additional elements are similar to the following laboratory techniques, which the courts have recognized as well-understood, routine, conventional activity in the life science arts when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity (MPEP §2106.05(d).II):
v. Analyzing DNA to provide sequence information or detect allelic variants, Genetic Techs. Ltd., 818 F.3d at 1377; 118 USPQ2d at 1546; and
vii. Amplifying and sequencing nucleic acid sequences, University of Utah Research Foundation v. Ambry Genetics, 774 F.3d 755, 764, 113 USPQ2d 1241, 1247 (Fed. Cir. 2014).
The conventionality of recited additional elements as a whole is further demonstrated in the following references:
Spinella et al. ("SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing." BMC genomics 17.1 (2016): 912. Newly cited);
Hagemann, Ian S., Catherine E. Cottrell, and Christina M. Lockwood ("Design of targeted, capture-based, next generation sequencing tests for precision cancer therapy." Cancer genetics 206.12 (2013): 420-431. Newly cited); and
Princen et al. (“Methods for Improving Inflammatory Bowel Disease Diagnosis”, US 20130225439 A1, Date Published 2013-08-29. Newly cited).
Spinella discloses SNooPer, a machine learning approach that uses Random Forest classification models to accurately call somatic variants in low-depth sequencing data, wherein a Fisher's exact test (page 4, Fig. 1) is used to call somatic variant candidates. Notably, Spinella reports that the time needed to build the Random Forest increased linearly with the number of trees, up to 1,000 trees (page 7, col 2, last para).
Hagemann (page 423, col 1, 5th para) teaches targeted sequencing with amplification of DNA from sample in clinical cancer testing.
Princen discloses a machine learning method for improving inflammatory bowel disease diagnosis. In particular, Princen’s model comprises two separate random forest classifiers, “one for IBD vs. non-IBD and the other for UC vs. CD” [006], and the random forest can have thousands of decision trees [350].
These references demonstrate the conventionality of acquiring DNA sequences and modeling/predicting sample sequences through random forest classifiers comprising at least 1000 decision trees.
Therefore, the claim(s) are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 21-25, 26-28, 29, 30-31 and 40 are rejected on the ground of non-statutory double patenting as being unpatentable over claims 1-5, 7-9, 6, 10-11 and 16 of U.S. Patent No. US11972841B2. Although the claims at issue are not identical, they are not patentably distinct from each other because the only difference between instant claim 21 and reference claim 1 is that reference claim 1 recites exome sequencing while instant claim 21 recites targeted sequencing. However, the reference disclosure does teach “we have focused on coding regions within exome or targeted analyses with >150× coverage” (col 28, section “Discussion”, 2nd para). Hence, compared to the reference claims, the instant claims recite another embodiment of sample handling and sequence acquisition.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 21-40 are rejected under 35 U.S.C. 103 as being unpatentable over Spinella et al. ("SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing." BMC genomics 17.1 (2016): 912. Newly cited), in view of Hagemann, Ian S., Catherine E. Cottrell, and Christina M. Lockwood ("Design of targeted, capture-based, next generation sequencing tests for precision cancer therapy." Cancer genetics 206.12 (2013): 420-431. Newly cited) and Princen et al. (“Methods for Improving Inflammatory Bowel Disease Diagnosis”, US 20130225439 A1, Date Published 2013-08-29. Newly cited).
Claim 21 is directed to a method for identifying somatic mutations from a patient sample by first performing variant calling and then validating the candidate variant through a random forest classifier. Regarding claim 21, Spinella discloses SNooPer, a machine learning approach that uses Random Forest classification models to accurately call somatic variants in low-depth sequencing data, wherein a Fisher's exact test (page 4, Fig. 1) is used to call somatic variant candidates.
Particularly, Spinella provides “for the development and assessment of SNooPer, we used a series of real NGS datasets from 40 unrelated childhood acute lymphoblastic leukemia (cALL) patients (Fig. 2). All study subjects were French-Canadians of European descent from the established Quebec cALL (QcALL) cohort [39]. For each patient, bone marrow and blood samples were collected at diagnosis prior to treatment (patient tumor) and at remission (matched patient normal). DNA was extracted using standard protocols [40] and sequenced on the Life Technologies SOLiD 4 System to constitute Dataset 1 (mean coverage on targeted region =30X). 12 cALL patient genomes (6 tumor normal), overlapping Dataset 1, were also sequenced by Illumina, Inc. on the HiSeq 2000 (mean coverage =90X) and considered as orthogonal validation for Dataset 1. Finally, 2 samples sequenced at higher depth on the Illumina system (HiSeq 2500, mean coverage of 200X), overlapping Datasets 1 and 2, were used as validation for Dataset 2 (Fig. 2 and Additional file 2 for details)” (page 3, col 2, last para through page 5, col 1, 1st para).
Spinella provides “SNooPer expects both normal and tumor files in SAMtools mpileup format (Pileup T vs. Pileup N in Fig. 1). To call variants as somatic, a Fisher's exact test is applied to compare the distribution of reads supporting the reference and the alternative allele between normal and tumor samples” (page 3, col 1, 4th para), which teaches analyzing patient sequence data and identifying variants.
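The comparison described in the quoted passage can be sketched as a two-sided Fisher's exact test on a 2x2 table of read counts (tumor vs. normal, alternative vs. reference allele). This is an illustrative sketch only, not SNooPer's implementation; the function name, argument names, and read counts below are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(tumor_alt: int, tumor_ref: int,
                           normal_alt: int, normal_ref: int) -> float:
    """Two-sided Fisher's exact p-value for the table
    [[tumor_alt, tumor_ref], [normal_alt, normal_ref]]."""
    row_tumor = tumor_alt + tumor_ref
    row_normal = normal_alt + normal_ref
    col_alt = tumor_alt + normal_alt
    total_tables = comb(row_tumor + row_normal, col_alt)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x alternative-allele reads in the tumor row.
        return comb(row_tumor, x) * comb(row_normal, col_alt - x) / total_tables

    observed = table_probability(tumor_alt)
    lowest = max(0, col_alt - row_normal)
    highest = min(row_tumor, col_alt)
    # Sum the probabilities of every table no more likely than the observed one.
    p_value = sum(
        p for p in map(table_probability, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for floating-point ties
    )
    return min(1.0, p_value)


# Hypothetical counts: 8 of 30 tumor reads support the alternative allele,
# while none of 30 normal reads do.
p_value = fisher_exact_two_sided(tumor_alt=8, tumor_ref=22, normal_alt=0, normal_ref=30)
print(p_value)
```

A small p-value indicates that the allele distribution differs between tumor and normal reads, which is the basis on which a position would be put forward as a somatic candidate.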
Spinella discloses a machine learning approach that uses Random Forest classification models to evaluate and to validate the somatic variants called in previous step (page 4, Fig. 1).
Spinella discloses that the prediction outputs (class labels) of the Random Forest classification models are somatic variants (or not) with a p-value (page 4, Fig. 1).
Spinella provides “SNooPer Models 1A and 1B, trained using 300 and 1,000 trees respectively, showed very similar performances with AUCs of 0.6310 and 0.6517” (page 7, col 2, last para), which teaches a random forest model comprising one thousand decision trees.
Since Spinella also provides “the time taken by the Random Forest increased linearly with the number of trees built: 0.58, 8.43, 24.45, 50.45 and 83.22 min were needed to build 10, 100, 300, 600 and 1,000 trees, respectively (these periods of time excluded the time taken for the calculation of features which relies on the size of the training dataset, not on the number of trees used)” (page 7, col 2, last para), the range of decision tree numbers by Spinella overlaps with the claim limitation “a random forest classifier comprising at least one thousand decision trees”.
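The linear trend stated in the quoted passage can be checked with an ordinary least-squares fit to the five reported (trees, minutes) pairs. The data points are those quoted from Spinella; the helper name `fit_line` is hypothetical, and the fit is only an illustrative check, not part of the reference.

```python
from typing import List, Tuple

# Build times quoted from Spinella (page 7, col 2, last para).
trees = [10, 100, 300, 600, 1000]
minutes = [0.58, 8.43, 24.45, 50.45, 83.22]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares: return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(trees, minutes)
print(f"minutes per tree: {slope:.4f}, R^2: {r_squared:.4f}")
```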
Spinella provides “at each node, Log2(total number of attributes) + 1 features are randomly selected. The oob error rate is used as an unbiased estimate of the classification error as trees are added to the forest during training. The classification error rate is also controlled by default using a 10-fold cross-validation estimator. Informative features for the classification are selected by measuring information gain or Kullback–Leibler divergence. ROC and PR curves (Training curves) and the related Areas Under the Curves (AUCs) are calculated for each training run (Additional file 3)” (page 3, col 1, last line through col 2, 1st para), which teaches each decision tree evaluates a unique combination of selected information.
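The per-node feature selection in the quoted passage can be sketched as follows. Truncating Log2(total number of attributes) to an integer before adding 1 is an assumption, since the passage does not say how a non-integer logarithm is handled; the feature names are placeholders, and 15 is the feature count Spinella reports.

```python
import math
import random
from typing import List


def features_per_node(total_attributes: int) -> int:
    """Number of features sampled at each node: Log2(total attributes) + 1.

    Truncation of the logarithm to an integer is an assumption.
    """
    return int(math.log2(total_attributes)) + 1


def sample_node_features(feature_names: List[str], rng: random.Random) -> List[str]:
    """Randomly pick the candidate features considered at one tree node."""
    k = features_per_node(len(feature_names))
    return rng.sample(feature_names, k)


# Placeholder names standing in for the 15 features reported by Spinella.
feature_names = [f"feature_{i}" for i in range(1, 16)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
node_a = sample_node_features(feature_names, rng)
node_b = sample_node_features(feature_names, rng)
print(features_per_node(len(feature_names)), node_a, node_b)
```

Because the subset is redrawn at every node, different trees end up evaluating different combinations of features, which is the point relied on above.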
Spinella discloses that the prediction outputs (class labels) of the Random Forest classification models are somatic variants (or not) with a p-value (page 4, Fig. 1. “Circles represent the output following either the training or calling phases”), which teaches reporting model outputs.
However, Spinella’s DNA sequencing is whole genome sequencing, not targeted sequencing with amplification of DNA from sample. Spinella does not teach “confidence scores” based on decision tree vote counting.
Hagemann provides “this review focuses on capture-based methods, amplicon sequencing is an alternative methodology that has found clinical applications, including in cancer testing (15). Whereas capture methods enrich for sequences of interest and amplify them with a limited number of PCR cycles using common primers, amplicon methods instead begin with exponential amplification of the regions of interest using sequence-specific primers” (page 423, col 1, 5th para), which teaches targeted sequencing with amplification of DNA from sample in clinical cancer testing.
Princen disclosed a method for Improving Inflammatory Bowel Disease Diagnosis. Particularly, Princen provides “the present invention provides a diagnostic model that comprises two separate random forests, one for IBD vs. non-IBD and the other for UC vs. CD” [006], which teaches random forest classifiers. Princen provides “In some aspects, random forest modeling of the present invention comprises an ensemble method, wherein the forest comprises thousands of decision trees” [350], which teaches the random forest comprises at least one thousand decision trees.
Princen provides “Once the forest is built during training, new examples are classified by taking a vote across all the decision trees. In the simplest case for a 2-class random forest classifier, the class with the most votes wins. In other instances, the cutoff for a winning number of votes, as established during training is preset to optimize performance measures. In some instances, if false positives are more costly than false negatives, the cutoff might be set higher” [348], which suggests a confidence score based on decision tree votes.
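The effect of the vote cutoff described in the quoted passage can be sketched as below: raising the cutoff above a simple majority trades false positives for false negatives. The function names, the 0.5 and 0.7 cutoffs, and the vote counts are all hypothetical and are not taken from Princen.

```python
from typing import List, Tuple


def classify_by_votes(positive_votes: int, n_trees: int, cutoff: float = 0.5) -> bool:
    """Positive class wins when its share of tree votes exceeds the cutoff."""
    return positive_votes / n_trees > cutoff


def count_errors(samples: List[Tuple[int, bool]], n_trees: int, cutoff: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) at the given cutoff."""
    false_pos = false_neg = 0
    for positive_votes, truth in samples:
        predicted = classify_by_votes(positive_votes, n_trees, cutoff)
        if predicted and not truth:
            false_pos += 1
        elif truth and not predicted:
            false_neg += 1
    return false_pos, false_neg


# Hypothetical examples: (positive votes out of 1,000 trees, true label).
samples = [(920, True), (610, False), (580, True), (300, False), (760, True)]
print(count_errors(samples, 1000, cutoff=0.5))  # simple majority
print(count_errors(samples, 1000, cutoff=0.7))  # stricter cutoff
```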
Regarding claim 22, Princen provides “the present invention provides a diagnostic model that comprises two separate random forests, one for IBD vs. non-IBD and the other for UC vs. CD” [006], which teaches random forest classifiers. Princen provides “In some aspects, random forest modeling of the present invention comprises an ensemble method, wherein the forest comprises thousands of decision trees” [350], and “a random forest is generally a classifier made up of many decision trees with a random component to building each tree. Each decision tree addresses the same classification problem, but with a different collection of examples and a different subset of features randomly selected from the dataset of examples provided” [347]. It is obvious that the at least one thousand decision trees comprise a plurality of decision trees for each mutation/variant.
Regarding claim 23, Spinella provides “an orthogonal validation (re-sequencing) dataset is used to train the RF classification model that is subsequently used to call somatic variations in the test dataset.” (page 4, Fig. 1), which teaches using a training data set of sequences that include known mutations and/or structural alterations for model training.
Regarding claim 24, Spinella provides “SNooPer does not rely on user-defined parameters and in doing so, allows versatility and flexibility to cope with complex datasets. Here, the model is directly built around the data itself therefore limiting any bias or subjectivity in somatic mutation calling. Firstly, although systematic errors in the training dataset are likely to exist, the use of an independently sequenced (different technology, mapping) validation dataset will teach SNooPer to recognize systematic errors from the original dataset and to classify them as false positive. Therefore, this method leads to a by-default elimination of systematic errors associated to each sequencing platform. Furthermore, rather than using standardized filters, the importance of each feature for variant classification is directly measured from the data. While any RF algorithm includes by default attribute selection, we also provided the possibility for users to perform a dimensionality reduction of features based on information gain. In doing so we reduce the chance of false positive occurrence due to a strong yet biased feature, which may, in part, explain SNooPer’s superior performance compared to other tested callers. Moreover, SNooPer can accommodate reduced training datasets, such as the one constituted of 250 false and true positives used here, compensate the balance bias using cost sensitive training, and still outperform other commonly used somatic variant callers” (page 8, col 2, last para through page 9, col 10, 1st para), which teaches optimizing parameters in order to achieve high accuracy in classification of variants.
Regarding claim 25, Spinella provides “for each variant position, 15 features expected to be informative for the identification of true somatic mutations are extracted and/or calculated from the mpileup files. The complete list of features and their descriptions are presented in Additional file 1: Table S1). These features are divided into five main groups: i) quality bias of alternative bases (related to base and mapping phred quality values), ii) coverage and VAF, iii) location along the read, iv) strand bias, and v) others. When appropriate, features are evaluated with respect to reference bases at the same position (vs_ref )” (Spinella: page 2, col 2, 2nd para), which teaches selection of features categories.
Regarding claim 26, Spinella discloses that the prediction outputs (class labels) of the Random Forest classification models are somatic variants (or not) with a p-value (page 4, Fig. 1. “Circles represent the output following either the training or calling phases”), which teaches a report describing candidate variants.
Regarding claim 27, Spinella provides “SNooPer expects both normal and tumor files in SAMtools mpileup format (Pileup T vs. Pileup N in Fig. 1). To call variants as somatic, a Fisher's exact test is applied to compare the distribution of reads supporting the reference and the alternative allele between normal and tumor samples” (page 3, col 1, 4th para) and Fig. 1 (page 4), which teaches comparing sequencing reads to a reference to detect indicia of the structural alteration and validating the detected structural alteration using a random forest classification model.
Regarding claim 28, Spinella provides “for each variant position, 15 features expected to be informative for the identification of true somatic mutations are extracted and/or calculated from the mpileup files. The complete list of features and their descriptions are presented in Additional file 1: Table S1). These features are divided into five main groups: i) quality bias of alternative bases (related to base and mapping phred quality values), ii) coverage and VAF, iii) location along the read, iv) strand bias, and v) others. When appropriate, features are evaluated with respect to reference bases at the same position (vs_ref )” (Spinella: page 2, col 2, 2nd para), which teaches quality score and read coverage as parameters in training data.
Regarding claim 29, Spinella provides “an orthogonal validation (re-sequencing) dataset is used to train the RF classification model that is subsequently used to call somatic variations in the test dataset.” (page 4, Fig. 1), which teaches the training data has known SNVs, and the method contains detecting at least one SNV in the DNA; validating the detected SNV as present in the DNA using a classification model; and providing a report that describes the DNA as including the SNV.
Regarding claim 30, Hagermann provides “Several types of DNA sequence variants have been described (Table 1). It is mandatory to select a sequencing platform capable of detecting the breadth of alterations appropriate for the clinical indication. For example, paired-end sequencing greatly facilitates detection of some structural variants, including translocations that are prevalent and may be clinically actionable in cancer” (page 421, col 1, last para), which teaches paired-end sequencing and suggesting reference mapping for structural boundary.
Regarding claim 31, Spinella provides “optionally, SNooPer can integrate two additional filters input as BED format files (Bedtools step in Fig. 1) to exclude overlaps with any provided germline dataset (e.g. common polymorphisms from 1000 Genomes dataset [37]) or blacklisted genomic regions (e.g. poorly mappable regions from the RepeatMasker sequence [38])”, which suggest the decision tree classifier to differentiate the variates as germline (vs somatic).
Regarding claim 32-33, Spinella provides “for the development and assessment of SNooPer, we used a series of real NGS datasets from 40 unrelated childhood acute lymphoblastic leukemia (cALL) patients (Fig. 2). All study subjects were French-Canadians of European descent from the established Quebec cALL (QcALL) cohort [39]. For each patient, bone marrow and blood samples were collected at diagnosis prior to treatment (patient tumor) and at remission (matched patient normal). DNA was extracted using standard protocols [40] and sequenced on the Life Technologies SOLiD 4 System to constitute Dataset 1 (mean coverage on targeted region =30X). 12 cALL patient genomes (6 tumor normal), overlapping Dataset 1, were also sequenced by Illumina, Inc. on the HiSeq 2000 (mean coverage =90X) and considered as orthogonal validation for Dataset 1” (page 3, col 2, last para through page 5, col 1, 1st para), which teaches getting blood samples from patients with cancer.
Regarding claim 34, Hagemann provides “recurrent translocations, such as the BCR-ABL1 fusion in chronic myelogenous leukemia and acute lymphoblastic leukemia and ALK rearrangements in non-small-cell lung cancer, are well-known to play significant roles in cancer and in many cases are clinically actionable” (page 426, col 1, 2nd para), which teaches lung cancer.
Regarding claim 35, Spinella provides “at each node, Log2(total number of attributes) + 1 features are randomly selected. The oob error rate is used as an unbiased estimate of the classification error as trees are added to the forest during training. The classification error rate is also controlled by default using a 10-fold cross-validation estimator. Informative features for the classification are selected by measuring information gain or Kullback–Leibler divergence. ROC and PR curves (Training curves) and the related Areas Under the Curves (AUCs) are calculated for each training run (Additional file 3)” (page 3, col 1, last line through col 2, 1st para), which teaches each decision tree evaluates a unique combination of selected information.
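For illustration only, the per-node feature sampling described in the quoted passage can be sketched as follows. This is a minimal sketch and not Spinella's implementation: the function names, the placeholder feature names, and the choice to truncate the logarithm to an integer before adding 1 are assumptions made here, since the quoted text does not state how the value is rounded.

```python
import math
import random


def features_per_node(total_attributes: int) -> int:
    """Number of features tried at each node: Log2(total attributes) + 1.

    Assumption: the logarithm is truncated to an integer before adding 1;
    the quoted passage does not specify the rounding.
    """
    return int(math.log2(total_attributes)) + 1


def sample_node_features(feature_names, rng):
    """Randomly select, without replacement, the features considered at one node."""
    k = features_per_node(len(feature_names))
    return rng.sample(feature_names, k)


# Hypothetical placeholder names standing in for the 15 features of Spinella.
features = [f"feature_{i}" for i in range(15)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
subset = sample_node_features(features, rng)
```

Because a fresh random subset is drawn at every node, different trees in the forest generally evaluate different combinations of features, which is the point relied upon above.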
Regarding claim 36, Spinella provides “for each variant position, 15 features expected to be informative for the identification of true somatic mutations are extracted and/or calculated from the mpileup files. The complete list of features and their descriptions are presented in Additional file 1: Table S1). These features are divided into five main groups: i) quality bias of alternative bases (related to base and mapping phred quality values), ii) coverage and VAF, iii) location along the read, iv) strand bias, and v) others.” (Spinella: page 2, col 2, 2nd para), which teaches that variant detection needs mapping or alignment information, and that SNV detection relies on the NGS sequencing error rate.
Regarding claim 37, Spinella provides “for each variant position, 15 features expected to be informative for the identification of true somatic mutations are extracted and/or calculated from the mpileup files. The complete list of features and their descriptions are presented in Additional file 1: Table S1). These features are divided into five main groups: i) quality bias of alternative bases (related to base and mapping phred quality values), ii) coverage and VAF, iii) location along the read, iv) strand bias, and v) others.” (Spinella: page 2, col 2, 2nd para), and “optionally, SNooPer can integrate two additional filters input as BED format files (Bedtools step in Fig. 1) to exclude overlaps with any provided germline dataset (e.g. common polymorphisms from 1000 Genomes dataset [37]) or blacklisted genomic regions (e.g. poorly mappable regions from the RepeatMasker sequence [38]). Using the default parameters of quality filters, the algorithm only considers positions presenting at least one read (mapping quality value - MQV ≥10) supporting the alternative allele (base quality value – BQV ≥20), and requires a minimum coverage of 8X in both the tumor sample and its normal counterpart. Features extraction (S1 Table) is then make for each putative somatic variants that passed these filters.” (page 3, col 1, 4th para), which teaches that the plurality of alignment characteristics comprises mapping quality and mismatches, the sequence quality information comprises coverage and base quality, and the information related to alterations comprises allele frequency, nearby sequence complexity, and presence of an alteration in a matching normal specimen.
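For illustration only, the default quality filters quoted above (MQV ≥10, BQV ≥20, minimum coverage of 8X in both tumor and normal) can be sketched as follows. This is a minimal sketch under stated assumptions and not Spinella's code: the `Read` type and function name are hypothetical, coverage is taken to be the number of reads overlapping the position, and the supporting read is looked for in the tumor sample.

```python
from dataclasses import dataclass

# Default thresholds taken from the quoted passage.
MIN_MQV = 10       # minimum mapping quality value of a supporting read
MIN_BQV = 20       # minimum base quality value of the alternative base
MIN_COVERAGE = 8   # minimum coverage in both tumor and matched normal


@dataclass
class Read:
    """Hypothetical per-read record at one genomic position."""
    base: str
    mapping_quality: int
    base_quality: int


def passes_default_filters(tumor_reads, normal_reads, alt_base):
    """Return True if the position satisfies the quoted default filters.

    Assumptions: coverage is the read count at the position, and the
    supporting read is required in the tumor sample.
    """
    if len(tumor_reads) < MIN_COVERAGE or len(normal_reads) < MIN_COVERAGE:
        return False
    return any(
        r.base == alt_base
        and r.mapping_quality >= MIN_MQV
        and r.base_quality >= MIN_BQV
        for r in tumor_reads
    )


# Usage: 8X tumor coverage with one high-quality read supporting allele "T".
tumor = [Read("A", 40, 30)] * 7 + [Read("T", 30, 25)]
normal = [Read("A", 40, 30)] * 8
position_passes = passes_default_filters(tumor, normal, "T")
```

Only positions that pass such filters would proceed to feature extraction in the workflow described in the quoted passage.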
Regarding claim 38, Princen provides “In some embodiments, a single tree might be built using a random ⅔ of the available samples (e.g., training set), with a random ⅔ of the features selected to make a decision split at each node of the tree. Once the forest is built during training, new examples are classified by taking a vote across all the decision trees. In the simplest case for a 2-class random forest classifier, the class with the most votes wins. In other instances, the cutoff for a winning number of votes, as established during training is preset to optimize performance measures. In some instances, if false positives are more costly than false negatives, the cutoff might be set higher” [348], which suggests a confidence score between 0.5 and 1.0, and overlaps with the claim limitation of > 0.75 for the confidence score.
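For illustration only, the vote counting described in the quoted passage can be sketched as follows. This is a minimal sketch and not Princen's implementation: the function names and class labels are hypothetical, and the 0.75 cutoff is taken from the claim limitation discussed above rather than from Princen. It shows why the winning class of a 2-class forest carries a vote fraction of at least 0.5, and how a preset cutoff raises the bar for a positive call.

```python
from collections import Counter


def vote_confidence(tree_votes):
    """Return the majority label and the fraction of trees that voted for it."""
    counts = Counter(tree_votes)
    label, n_votes = counts.most_common(1)[0]
    return label, n_votes / len(tree_votes)


def call_with_cutoff(tree_votes, positive_label="somatic", cutoff=0.75):
    """Call the positive class only if its vote fraction exceeds the cutoff.

    Assumption: the cutoff of 0.75 mirrors the claimed threshold; a higher
    cutoff trades false positives for false negatives.
    """
    fraction = tree_votes.count(positive_label) / len(tree_votes)
    return fraction > cutoff, fraction


# Usage: 8 of 10 trees vote "somatic" (labels are hypothetical).
votes = ["somatic"] * 8 + ["artifact"] * 2
label, confidence = vote_confidence(votes)
is_called, fraction = call_with_cutoff(votes)
```

With a simple majority rule the same variant would be called at any fraction above 0.5; raising the cutoff only narrows the range of accepted vote fractions.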
Regarding claim 39, Spinella discloses the prediction output (class label) of the Random Forest classification models are somatic variants (or not) with a p-value plus the “Training report” (page 4, Fig. 1. “Circles represent the output following either the training or calling phases”), which teaches a report that describes the candidate variant as including the mutation or structural alteration.
Claim 40 is the “system” version of the claim 21 method. Since Hagemann also discloses a sample analyzing sequencing system with a computer implemented bioinformatic pipeline (page 424, Fig. 1), the art applied to claim 21 also teaches claim 40.
It would have been prima facie obvious to modify Spinella’s SNooPer pipeline, a machine learning approach that uses Random Forest classification models to accurately call somatic variants in low-depth sequencing data, wherein the cancer NGS datasets are generated by whole-genome sequencing (page 5, Fig. 2), with Hagemann’s capture-based methods, wherein capture methods enrich for sequences of interest and amplify them with a limited number of PCR cycles using common primers (page 423, col 1, 5th para). One would have been motivated to do so because “although whole-genome sequencing has been used for cancer testing, the majority of the sequence obtained is intronic and/or is derived from genes with no known significance in cancer. Furthermore, the low depth of coverage achieved in typical whole-genome sequencing (5-30X coverage of each nucleotide position on average) (1) is insufficient to identify variants present in a subset of cells-for example, in the face of nontumor cell admixture or tumor heterogeneity. Therefore, there is emerging interest in target enrichment sequencing strategies, an approach whereby genes of interest are selected (“targeted”) for sequencing” (Hagemann: page 420, col 1, 1st para through col 2, 1st para).
One would reasonably expect success as “targeted sequencing has emerged as a cost effective approach to tumor genetic profiling” (Hagemann: page 420, col 2, 2nd para), and, for Spinella’s SNooPer pipeline, input data from targeted exon sequencing is no different from input data from WGS.
It would have been prima facie obvious to modify the combined pipeline of Spinella and Hagemann, a machine learning approach that uses Random Forest classification models to accurately call somatic variants, wherein the Random Forest classification models have up to 1000 decision trees (Spinella: page 7, col 2, last para), with Princen’s decision trees, which implement a confidence score based on decision tree votes (Princen: [347]). One would have been motivated to do so because Princen’s method provides a straightforward and objective confidence score for the called variant.
One would reasonably expect success because the combined pipeline of Spinella and Hagemann and Princen’s method use the same machine learning approach (Random Forest classification), and Princen’s method of counting tree votes to obtain a confidence score is a natural component of the combined random forest classification pipeline that could be implemented by a programmer of ordinary skill in the art.
Conclusion
No claims are allowed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GUOZHEN LIU whose telephone number is (571)272-0224. The examiner can normally be reached Monday-Friday 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Larry D Riggs, can be reached at (571) 270-3062. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GL/
Patent Examiner
Art Unit 1686
/Anna Skibinsky/
Primary Examiner, AU 1635