DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant’s amendment and response, filed 10/28/2025, have been entered and carefully considered, but are not completely persuasive.
Claims 1-7 and 9-17 are pending in this application. Claim 7 stands withdrawn from consideration as being drawn to a non-elected invention; election was made without traverse on 6/18/2025. Claim 8 has been canceled. Claim 17 is newly added and falls within the elected group.
Claims 1-6 and 9-17 are under examination.
The rejections under 35 USC 102 over Van der Auwera, Matukumalli or Stokes have been withdrawn; however, new prior art rejections under 35 USC 103 are set forth below.
Claim Interpretation
The claims in this application are given their broadest reasonable interpretation (BRI) using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.
The term “a machine learning unit,” as now recited, has the broadest reasonable interpretation as: a collection of statistical algorithms which learn from training data, and generalize to unseen data, by processing input data, applying weights to certain parameters, and using activation functions to generate an output. Machine learning focuses on prediction tasks, based on known properties learned from training data. Classification models attempt to provide labels to instances of data, and are trained to predict the preassigned labels of a set of examples. (Wikipedia, 2026).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION. —The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-6 and 9-17 remain rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The claims have been heavily amended, but some indefiniteness remains, or was introduced by amendment.
Claim 1 fails to particularly point out and distinctly claim how the ML is trained with the dataset at hand, how or why the training acts to change weights in any particular direction, and how that process affects any desired results. The claim fails to point out and distinctly claim a type of ML model, and how that model acts to perform “base-calling.” The actual processes carried out in claim 1 are unclear with respect to predicting whether a label of “germline/somatic” should be applied to a particular base, and the ultimate limitation in step (b) to “base calling”. Base calling is a distinct process separate from identifying the germline/somatic source of a particular polynucleotide. Base calling is the process of converting signal data into a corresponding string of nucleotide sequence data comprising A, T, G, C or X. Identifying the source of a nucleotide within a sequence read is not the same as identifying the nucleotide itself. Therefore, the nature of the actual method being claimed is unclear. It is entirely unclear how the identification of a label is then applied to the unspecified process of base calling in any way that may influence the ultimate bases called. The claim fails to particularly point out and distinctly claim how the identification of the label affects, changes, or supports any particular base call at any particular location. Claim 1 ends with “using” the “trained” ML to “call the base at each of the genomic base positions in the test sample”, but without any validation or error correction, interpretation of results, or analysis of the ultimate classifications of the source of each nucleotide in the sequence reads. The claim fails to point out how the quantitative measure and the label values are determined and applied to the ML model, and why that application changes the weights and coefficients of specific parameters of that model based on the information present in the training dataset.
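For illustration only, the distinction drawn above between base calling and source labeling can be sketched as two functions with different inputs and outputs. This is a minimal sketch under stated assumptions: the function names, the per-channel intensity input, the margin rule, and the allele-fraction threshold are all hypothetical and are not taken from the claims or the specification.

```python
def call_base(intensities, min_margin=0.2):
    """Base calling: convert per-channel signal into A, C, G, T, or X (no call).

    `intensities` maps each base to a signal value; the margin rule is an
    illustrative assumption, not a description of any real base caller.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    return best if top - second >= min_margin else "X"


def label_source(allele_fraction, germline_floor=0.3):
    """Source labeling: tag an already-called base as germline or somatic.

    The fixed threshold is a toy assumption used only to show that the
    output is a class label, not a nucleotide identity.
    """
    return "germline" if allele_fraction >= germline_floor else "somatic"


clear_signal = call_base({"A": 0.9, "C": 0.05, "G": 0.03, "T": 0.02})
ambiguous_signal = call_base({"A": 0.5, "C": 0.45, "G": 0.03, "T": 0.02})
high_fraction = label_source(0.48)
low_fraction = label_source(0.02)
```

The first function returns a nucleotide identity from signal data; the second returns a class label for a base whose identity is already known. Nothing in the second function changes the output of the first, which is the gap the rejection identifies.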
While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
The term “the one or more classification models further…” in claims 3-4 now lacks antecedent basis in claim 1 from which each depends. Claim 1 now only recites a single classification model. It is unclear if these limitations refer to additional models to be added to the method of claim 1, or whether the model of claim 1 somehow performs these limitations. The metes and bounds of claims 3-4 are unclear with respect to how the trained classification model “further calls” a relative frequency or probability of a particular call, or all calls in the output of the classifier based on the data at hand in claim 1. It is unclear if this is applied to all sequence bases in the sequence read data, or only for bases showing a difference from the training set (reference data), or particular known SNV sites associated with disease. While breadth is not the same as indefiniteness, the claim lacks the minimally sufficient set of steps required to achieve the desired result.
The metes and bounds of claim 6 are unclear with respect to how the ML acts on the test sequence read data, with certain characteristics, to carry out base-calling. Claim 1 ends with using the trained ML, but without any validation or error correction, interpretation of results, or analysis of the ultimate classifications. The claim fails to point out how the quantitative measure and the label values are determined and applied to the ML model, and how that application uses the weights and coefficients of specific parameters of that model to perform accurate base calling. The actual processes carried out in claim 6 are unclear with respect to predicting whether a label of “germline/somatic” should be applied to a particular base, and the ultimate limitation in step (d) to “base calling”. Base calling is a distinct process separate from identifying the germline/somatic source of a particular polynucleotide. Base calling is the process of converting signal data into a corresponding string of nucleotide sequence data comprising A, T, G, C or X. Identifying the source of a nucleotide within a sequence read is not the same as identifying the nucleotide itself. It is entirely unclear how the identification of a label is then applied to the unspecified process of base calling in any way that may influence the ultimate bases called. The claim fails to particularly point out and distinctly claim how the identification of the label affects, changes, or supports any particular base call at any particular location. Therefore, the nature of the actual method being claimed is unclear.
With respect to claims 9-10, neither claim 9 nor claim 1 provides reference data from samples from persons with a disease, or cancer samples. It is unclear if the cfDNA is from the test subject with the disease, or the training data set has data from cfDNA of persons with a disease. The ML of claim 1 is not trained to detect differences present in base calling procedures for samples from individuals with any disease or more particularly cancer. The ML is only trained to identify that a given nucleotide in a given sequence read should have a label of “germline or somatic”. This is not commensurate with identifying the presence of a disease, such as cancer. The claim fails to particularly point out and distinctly claim how the results of the base calling on a test sample are implemented in any way to make any associations or conclusions, nor does the claim clarify how the source of the sample affects any process of claim 1. It is entirely unclear how the identification of a label is then applied to the unspecified process of base calling in any way that may influence the ultimate bases called, and the claim fails to particularly point out how to use that information to make any association, correlation, or diagnosis of a disease, more particularly cancer. The claim fails to particularly point out and distinctly claim how the identification of the label affects, changes, or supports any particular base call at any particular location and fails to particularly point out how any particular base call that results from the method should be assessed for correlation with a disease. Therefore, the nature of the actual method being claimed is unclear.
The metes and bounds of claim 11 are unclear, with respect to how any detected SNV/ SNP are to be associated with somatic or germline DNA. The ML of claim 1 is not trained to detect differences present in SNV that would indicate whether an identified SNV is germline or not. The claim fails to particularly point out and distinctly claim how the results of the base calling on a test sample are implemented in any way to make this association and provide the desired result.
The metes and bounds of claims 12-13 are unclear with respect to how any detected SNV are to be associated with a change in any biological pathway activity. The ML of claim 1 is not trained to detect changes in biological activity in any known pathway. The ML is not trained to detect any cancer-specific biological activity pathway differences or physiological changes. The claim fails to particularly point out and distinctly claim how the base calling output of the ML of claim 1 is to be acted upon in any way to achieve these goals.
The metes and bounds of claim 16 are unclear with respect to how the confidence levels are obtained from the results which are the calling of a base at a position of a sequence read. Claim 1 does not address any confidence or error estimations, and claim 6 fails to particularly point out and distinctly claim how these levels are to be calculated or obtained. It is unclear whether the data gathered in claim 1 provides the required information with which the confidence could be calculated.
Applicant’s arguments:
Applicant’s arguments and amendments have been considered with respect to this rejection, however some indefiniteness remains, or was introduced by amendment.
MPEP 2173: “During prosecution, applicant has an opportunity and a duty to amend ambiguous claims to clearly and precisely define the metes and bounds of the claimed invention. The claim places the public on notice of the scope of the patentee’s right to exclude. See, e.g., Johnson & Johnston Assoc. Inc. v. R.E. Serv. Co., 285 F.3d 1046, 1052, 62 USPQ2d 1225, 1228 (Fed. Cir. 2002) (en banc). As the Federal Circuit stated in Halliburton Energy Servs., Inc. v. M-I LLC, 514 F.3d 1244, 1255, 85 USPQ2d 1654, 1663 (Fed. Cir. 2008).”
“The essential inquiry pertaining to this requirement is whether the claims set out and circumscribe a particular subject matter with a reasonable degree of clarity and particularity. "As the statutory language of ‘particular[ity]' and 'distinct[ness]' indicates, claims are required to be cast in clear—as opposed to ambiguous, vague, indefinite—terms. It is the claims that notify the public of what is within the protections of the patent, and what is not." Packard, 751 F.3d at 1313, 110 USPQ2d at 1788. Definiteness of claim language must be analyzed, not in a vacuum, but in light of:
(A) The content of the particular application disclosure;
(B) The teachings of the prior art; and
(C) The claim interpretation that would be given by one possessing the ordinary level of skill in the pertinent art at the time the invention was made.”
While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-6 and 9-17 remain rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea of mental steps, mathematic concepts, organizing human activity, or a natural law without significantly more.
The claims have been significantly amended, but the basis of the rejection is essentially the same.
Applicant is directed to MPEP 2106 and the Federal Register notice (FR89, no 137 (7/17/2024) p 58128-58138) for the most current and complete guidelines in the analysis of patent-eligible subject matter. The current MPEP is the primary source for the USPTO’s patent eligibility guidance.
With respect to step (1): YES, the claims are drawn to one of the statutory categories: methods. The claimed methods are not limited to computer-implemented embodiments.
With respect to step (2A) (1): YES, the claims recite an abstract idea, law of nature and/or natural phenomenon. The claims explicitly recite elements that, individually and in combination, constitute one or more judicial exceptions (JE). MPEP 2106.04(a).
Mathematic concepts, Mental Processes or Elements in Addition (EIA) in the claim(s) include:
1. (Currently amended) A method, comprising:
obtaining sequence read data from a test sample having cell-free DNA (cfDNA) molecules comprising germline DNA and somatic DNA and
(EIA- data gathering step, and a description of the data provided. Specification [0018-0026]: The sequencing and analysis of the mixture occurs outside the bounds of the claim, and is the source of the test data.)
using a classification model generated by a machine learning unit to generate a plurality of base calls, the plurality of base calls comprising a base called from the germline DNA or the somatic DNA at each of a plurality of genomic base positions in the sequence read data of the test sample,
(Mathematic concept of applying the test data set to a mathematic classification model, trained outside the bounds of the claim. The trained model only distinguishes between germline/somatic contributions and non-germline/somatic contributions. The ML is not specifically identified. The type of classification model is not specifically identified. How or why the ML changes any weights is unspecified. In the specification at [0026-0027] ML include neural networks (MLP, RBNN, neuro-fuzzy) and SVM. The trained models are classification models. Base calling described at [0035]; MPEP 2106.04(a) section I.)
the classification model generated based on:
(a) a training data set provided to the machine learning unit, the training dataset comprising, for each mixture in a plurality of mixtures, wherein each mixture in the plurality comprises polynucleotides from a plurality of different subjects, values indicating:
(i) a quantitative measure of each of a plurality of bases at each of a plurality of genomic base positions from sequence reads of a plurality of polynucleotides in the mixture, and
(ii) a plurality of class labels, each class label classifying a particular base as representative of a germline contribution or a somatic contribution in the mixture at a particular genomic base position; and
(EIA related to data gathering of training data from a plurality of subjects, and a description of the data gathered.)
(b) an iterative learning process comprising, in each iteration, one or more adjustments to one or more weights associated with the values of the training data set provided to the machine learning unit to generate the classification model configured for calling [[a]] the base at each of [[a]] the plurality of genomic base positions in [[a]] the test sample.
(Mathematic concept of training a machine learning algorithm by applying the values of the training set to the selected ML, to adjust the weights and coefficients of the relevant parameters, for the intended use of base calling. “Base calling” is not the same as identifying whether a base is germline/ somatic. Base calling is the identification of A, T, G, C, or X as the nature of a nucleotide at a particular position in a polynucleotide sequence. The type of ML is unlimited. No limitation on how or why the ML operates on the provided data to adjust any particular weight or parameter. No application of test or validation data to the trained ML is required. No error or confidence parameters are provided. In the specification at [0026-0027] ML include neural networks (MLP, RBNN, neuro-fuzzy) and SVM. The trained models are classification models. Base calling described at [0035]; MPEP 2106.05(a) section I.)
2. (EIA- modifying the data gathering step by providing different information. The combining of polynucleotides takes place outside the limitations of the claim, and describes how the data was made. [0032])
3. (Mathematic concept of calculating a frequency value of one or more bases in a test sample, by any means. No specificity is provided as to how the frequency is determined, or what to do with the calculated information. [0016, 0034, 0053])
4. (Mathematic concept of calculating a probability of the presence of one or more bases at a site in the test sample. No specificity is provided as to how the probability is determined, or what to do with the calculated information. [0016, 0039, 0053])
5. (Mathematic concept modification, setting forth the use of known ML without particular detail as to how each network acts on the provided data, in training, validation or testing. Fig 2, [0026-0028, 0034, 0039-0040])
6. (Mathematic concept of applying the test data to the trained classification model, to perform base calling. [0053])
9. (EIA- data gathering modification, limiting the source of the test sample, as one with unspecified disease. No indication how this affects the ML. [0016, 0018, 0030])
10. (EIA- data gathering modification, modifying aspects of the source of the test sample. [0016, 0018, 0030, 0054])
11. (Mental Process in a computing environment, of observation of SNV in the output of the ML by any means, and making the correlation and judgement as to whether a given SNV is associated with cancer, without restriction as to how the association is made. This recites a natural law: naturally occurring changes in genetic data are correlated to naturally occurring phenotypes: cancer. This is a genotype/phenotype relationship. [0016, 0018, 0030, 0054])
12. (Mental Process in a computing environment, of associating an observed SNV with known activity/ involvement in known processes, without restriction as to how the association is made. This recites a natural law: naturally occurring changes in genetic data are correlated to naturally occurring phenotypes, such as pathway activity. This is a genotype/ phenotype relationship. [0055])
13. (Mental Process in a computing environment of observing cancer-related changes in pathway activities, without restriction as to how the association to cancer is made. This recites a natural law: naturally occurring changes in genetic data are correlated to naturally occurring phenotypes, such as cancer. This is a genotype/phenotype relationship. [0055])
14. (Mental Process in a computing environment of observation of SNV, and making correlations and judgements as to whether the SNV is linked to a phenotype, without restriction as to how the association is made. This recites a natural law: naturally occurring changes in genetic data are correlated to naturally occurring phenotypes, such as cancer. This is a genotype/ phenotype relationship. [0030, 0054, 0055])
15. (Mathematic concept of comparing result values of the base-calling to result values of other methods and calculating an accuracy, without restriction as to how the calculation is made. Alternatively, a mental process in a computing environment of observing values for the ML output and “other method” output, and making a judgment as to which is more accurate without restriction on how the comparison is made. [0007, 0017, 0051])
16. (Mathematic concept of calculating confidence levels for the output of the test results. [0017, 0051])
17. (EIA- modifying the data gathering step by providing different information. The combining of polynucleotides takes place outside the limitations of the claim, and describes how the data was made. [0032])
With respect to step 2A (2): NO. The claims were examined further to determine whether they integrated any JE into a practical application (MPEP 2106.04(d)). The claimed additional elements are analyzed alone, or in combination to determine if the JE is integrated into a practical application (MPEP 2106.05(a-c, e, f and h)).
Claim(s) 1, 2, 6, 9-10, 17 recite the additional non-abstract element(s) (EIA) of data gathering:
The data gathering limitations merely receive previously generated data. Receipt of data, necessary to carry out the JE, does not change how the JE is performed. The JE can act on any set of sequence read data, that meets the description of having come from a sample of cfDNA that also comprises germline and somatic DNA.
Data gathering steps are not an abstract idea; they are extra-solution activity, as they collect the data needed to carry out the JE. The data gathering does not impose any meaningful limitation on the JE, or how the JE is performed. The additional limitation (data gathering) must have more than a nominal or insignificant relationship to the identified judicial exception. (MPEP 2106.04/.05, citing Intellectual Ventures I LLC v. Symantec Corp., McRO, TLI Communications, OIP Techs. Inc. v. Amazon.com Inc., Electric Power Group LLC v. Alstom S.A.).
Dependent claim(s) 3-6, 11-16 each recite(s) an abstract limitation to the JE reciting additional mathematic concepts, or mental processes. Additional abstract limitations cannot provide a practical application of the JE as they are a part of that JE.
In combination, the limitations of data gathering, for the purpose of carrying out the JE, merely provide extra-solution activity, and fail to integrate the JE into a practical application.
With respect to step 2B: NO, the claims do not provide a specific inventive concept. The claims recite a JE, do not integrate that JE into a practical application, and thus are probed for a specific inventive concept. The judicial exception alone cannot provide that inventive concept or practical application (MPEP 2106.05). The additional elements were considered individually and in combination to determine if they provide significantly more than the judicial exception. (MPEP 2106.05.A i-vi).
With respect to claim(s) 1, 2, 6, 9-10, 17: The limitation(s) identified above as non-abstract elements (EIA) related to data gathering do not rise to the level of significantly more than the judicial exception.
Stokes (2014- of record in parent) provides training and test set data meeting the conditions. Stokes discloses providing sequence read data from any source, including cell-free DNA sequence reads, with quantitative elements and labels. Test cfDNA is also provided by Stokes.
Manning (2013- of record in parent) provides sequence read training and test set data for use in ML classifiers.
Ehrich (2013- of record in parent) provides cell free DNA training data and test data, for use in ML classifiers. The samples of Ehrich are those from several individuals, and comprise cfDNA, cfDNA sequence read data and information.
Matukumalli (2006- of record in parent) provides training data from DNA sequence read information, for use in training ML to detect SNP/ SNV.
Van Der Auwera (2013) provides test and training data, comprising sequence read data having certain labels and information.
Eltoukhy (filed 2014) provides test and training data, comprising sequence read data having labels and information.
Talasaz (2014) provides test and training data, comprising sequence read data having qualitative and quantitative information.
These elements meet the BRI of the identified data gathering limitations. As such, the prior art recognizes that this data gathering element was routine, well understood and conventional in biotechnology and bioinformatics (as in Alice Corp., CyberSource v. Retail Decisions, Parker v. Flook).
In the specification at [0055, 0057] it is disclosed that the steps identified as data gathering can be met using the well-known and commercially available protocols, kits and equipment of:
“digital sequences from Guardant Health,”
Sanger sequencing machines,
Illumina HiSeq2500,
hand-held or desktop sequencers, et al.
Previously sequenced sequence read data can be acquired from public databases.
These elements meet the BRI of the identified data gathering limitations, and underscore the assertion that these elements were well-understood, routine and conventional in biotechnology and bioinformatics.
Activities such as data gathering do not improve the functioning of a computer itself, or comprise an improvement to any other technical field. The limitations do not require or set forth a particular machine, they do not effect a transformation of matter, nor do they provide an unconventional step (citing McRO and Trading Technologies Int’l v. IBG). Data gathering steps constitute a general link to a technological environment. Simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception is insufficient to provide significantly more (as discussed in Alice Corp.).
Dependent claim(s) 3-6, 11-16 each recite a limitation requiring additional mathematic concepts or mental processes. Additional abstract limitations cannot provide significantly more than the JE as they are a part of that JE (MPEP 2106.05).
In combination, the data gathering steps providing the information required to be acted upon by the JE, fail to rise to the level of significantly more than that JE. The data gathering steps provide the data for the JE. No non-routine step or element has clearly been identified.
The claims have all been examined to identify the presence of one or more judicial exceptions. Each additional limitation in the claims has been addressed, alone and in combination, to determine whether the additional limitations integrate the judicial exception into a practical application. Each additional limitation in the claims has been addressed, alone and in combination, to determine whether those additional limitations provide an inventive concept which provides significantly more than those exceptions. For these reasons, the claims, when the limitations are considered individually and as a whole, are rejected under 35 USC § 101 as being directed to non-statutory subject matter.
Applicant’s arguments:
Applicant’s arguments have been carefully considered but are not completely persuasive.
Applicant disputes the categorization or identification of abstract ideas and/or a natural law in the claims. The Examiner has specifically identified each limitation in the claim, and the category of judicial exception it encompasses. The abstract ideas identified in the independent claims are the same as those identified in MPEP 2106.04 as mathematic correlations, mathematic calculations, and mathematical relationships, or as mental processes: concepts performed in the human mind, including observations, evaluations, judgements and opinions.
It is noted that the claims are not limited to a computer-implemented method. Thus, any arguments with respect to improvements in computer technology are unpersuasive. Similarly, arguments alleging “integration into a computing environment” are unpersuasive.
The examiner acknowledges Applicant’s arguments which set forth that the claims lead to an improvement in the technology of “base calling”. According to the guidance set forth in MPEP 2106, this is an improvement to the judicial exception itself, and is not reflected back into a specific technological environment or practically applied process.
An improvement in the judicial exception itself is not an improvement in the technology. For example, in In re Board of Trustees of Leland Stanford Junior University, 989 F.3d 1367, 1370, 1373 (Fed. Cir. 2021) (Stanford I), Applicant argued that the claimed process was an improvement over prior processes because it “yields a greater number of haplotype phase predictions,” but the Court found it was not “an improved technological process” and instead was an improved “mathematical process.” The court explained that such claims were directed to an abstract idea because they describe “mathematically calculating alleles’ haplotype phase,” like the “mathematical algorithms for performing calculations” in prior cases. Notably, the Federal Circuit found that the claims did not reflect an improvement to a technological process, which would render the claims eligible (FR89 no.137, p58137, 7/17/2024).
Here, Applicant has provided an improved mathematical process of calculating statistical measures indicating whether a particular nucleotide in a sequence read has a germline/somatic source, which is ultimately used in a mathematic process of base calling.
In the claims, the EIA identified as data gathering steps do not affect how the steps of the abstract idea are performed; they provide the data which is acted upon by the limitations of the JE. These data gathering steps do not apply, rely on, or use the steps identified as making up the JE. Rather, the mathematic calculation steps avail themselves of the data gathered. The data gathering in the claims constitutes insignificant pre-solution activity. See MPEP § 2106.05(g):
MPEP 2106.05(g): “The term ‘extra-solution activity’ can be understood as activities incidental to the primary process or product that are merely a nominal or tangential addition to the claim...”
“An example of pre-solution activity is a step of gathering data for use in a claimed process, e.g., a step of obtaining information about credit card transactions, which is recited as part of a claimed process of analyzing and manipulating the gathered information by a series of steps in order to detect whether the transactions were fraudulent.”
See also CyberSource Corp. v. Retail Decisions, Inc., 654 F.3d 1366, 1372 (Fed. Cir. 2011) ("[E]ven if some physical steps are required to obtain information from the database ... such data-gathering steps cannot alone confer patentability.").
Applicant is encouraged to review the original disclosure and the priority document to determine the minimally sufficient set of steps required to achieve the desired results, including structural elements of the ML (inputs, layers, outputs) and how the ML acts on the data gathered to achieve those results. (For example, the priority document [62/213448] discloses neural networks having specific labeled inputs, layers, and particular outputs: one each of A, T, C and G. The training data has known “expected outcomes.” The training can have particular mixtures of known data sources with known outcomes. Certain vectors are generated: for each base at each locus tested, a vector for each mixture, vectors representing GC content, entropy, read direction etc.
“Using the training set, the machine learning algorithm will generate a model to classify the sample according to base identity at one or more loci. This is also referred to as "calling" a base. The model developed may employ information from any part of a test vector. That is, it may use not only information about tally vectors from the locus in question, but tally vectors from other loci proximal or distal to the test locus or non-sequence read information included as a feature of the vector.” P7
“As shown in FIG. 2B, a neuron in an artificial neural network is a set of input values (xi) and associated weights (wi) and a function (g) that sums the weights and maps the results to an output (y). A bias (constant term) is also provided to each neuron. Neurons are organized into layers. The input layer is composed not of full neurons, but rather consists simply of the values in a data record, that constitute inputs to the next layer of neurons. The next layer is called a hidden layer; there may be several hidden layers. The final layer is the output layer, where (in a preferred mode for this invention) there is one node for each class. A single sweep forward through the network results in the assignment of a value to each output node, and the record is assigned to whichever class's node had the highest value.
In the training phase, the correct class for each record is known (this is termed supervised training), and the output nodes can therefore be assigned "correct" values -- "1" for the node corresponding to the correct class, and "0" for the others. (In practice it has been found better to use values of 0.9 and 0.1, respectively.) It is thus possible to compare the network's calculated values for the output nodes to these "correct" values, and calculate an error term for each node (the "Delta" rule). These error terms are then used to adjust the weights in the hidden layers so that, hopefully, the next time around the output values will be closer to the "correct" values.
The neural networks uses an iterative learning process in which data cases (rows) are presented to the network one at a time, and the weights associated with the input values are adjusted each time. After all cases are presented, the process often starts over again. During this learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of input samples… Once a network has been structured for a particular application, that network is ready to be trained. To start this process, the initial weights are chosen randomly. Then the training, or learning, begins.
The network processes the records in the training data one at a time, using the weights and functions in the hidden layers, then compares the resulting outputs against the desired outputs. Errors are then propagated back through the system, causing the system to adjust the weights for application to the next record to be processed. This process occurs over and over as the weights are continually tweaked. During the training of a network the same set of data is processed many times as the connection weights are continually refined… Finally, the process applies the neural network to perform calling of SNVs from somatic sources, in a mixture of somatic and germline cell-free DNA data (8). This can be done by applying the input values in a feed-forward manner through the neural network to arrive at the SNV callings.” P14-16)
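For illustration only, the training procedure quoted above (random initial weights, records presented one at a time, target values of 0.9 and 0.1, error terms propagated back to adjust the weights, and a feed-forward pass that assigns the class with the highest output) can be sketched as follows. This is a minimal sketch and not the applicant's implementation: the class name `TinyNet`, the single hidden layer of six neurons, the learning rate, the epoch count, and the four toy tally vectors are all assumptions introduced here.

```python
import math
import random

BASES = ["A", "C", "G", "T"]  # one output node per class, as in the quoted passage


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Hypothetical one-hidden-layer network; weights start out random."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds one neuron's input weights plus a trailing bias term.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
               for row in self.w_out]
        return hidden, out

    def train_record(self, x, target, lr=0.5):
        """Present one record, back-propagate its error, return squared error."""
        hidden, out = self.forward(x)
        # Error term for each output node (delta rule for a sigmoid unit).
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        # Error terms propagated back to the hidden layer.
        d_hid = [h * (1.0 - h) * sum(d_out[k] * self.w_out[k][j]
                                     for k in range(len(out)))
                 for j, h in enumerate(hidden)]
        for k, row in enumerate(self.w_out):
            for j, v in enumerate(hidden + [1.0]):
                row[j] += lr * d_out[k] * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] += lr * d_hid[j] * v
        return sum((t - o) ** 2 for t, o in zip(target, out))


def soft_target(base):
    # 0.9 for the correct class and 0.1 for the others, per the quoted passage.
    return [0.9 if b == base else 0.1 for b in BASES]


# Toy training set (assumed): fraction of reads supporting A, C, G, T at a locus.
TRAINING = [
    ([0.91, 0.03, 0.03, 0.03], "A"),
    ([0.03, 0.91, 0.03, 0.03], "C"),
    ([0.03, 0.03, 0.91, 0.03], "G"),
    ([0.03, 0.03, 0.03, 0.91], "T"),
]


def train(net, records, epochs=3000):
    """Process the same records many times; return the summed error per epoch."""
    return [sum(net.train_record(x, soft_target(label)) for x, label in records)
            for _ in range(epochs)]


def call_base(net, tally):
    """Feed-forward pass; the record is assigned to the highest-valued node."""
    _, out = net.forward(tally)
    return BASES[out.index(max(out))]


net = TinyNet(n_in=4, n_hidden=6, n_out=4)
errors = train(net, TRAINING)
```

The sketch shows the kind of structural detail (inputs, layers, outputs, and the weight-update loop) that the quoted disclosure describes at a level above the bare term "machine learning unit."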
Further, with respect to the arguments regarding the alleged improvement, it is unclear that the independent claims recite all the necessary and sufficient steps required to achieve that improvement. MPEP 2106.05(a): “An important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. McRO, 837 F.3d at 1314-15, 120 USPQ2d at 1102-03; DDR Holdings, 773 F.3d at 1259, 113 USPQ2d at 1107.”
The MPEP sets forth that “if the examiner concludes the disclosed invention does not improve technology, the burden shifts to applicant to provide persuasive arguments supported by any necessary evidence to demonstrate that one of ordinary skill in the art would understand that the disclosed invention improves technology. Any such evidence submitted under 37 CFR 1.132 must establish what the specification would convey to one of ordinary skill in the art and cannot be used to supplement the specification.” Applicant’s arguments cannot take the place of evidence.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 1-6, 9-17 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Eltoukhy (2018).
The earliest effective filing date for the rejected claims is 9/2/2015.
Eltoukhy, H. et al. Methods and systems for detecting genetic variants. US 9,920,366 B2, published 3/20/2018, with priority to at least 12/24/2014, and possibly as early as 12/28/2013.
The applied reference has a common inventor (Eltoukhy) with the instant application. Based upon the earlier effectively filed date of the reference, it constitutes prior art under 35 U.S.C. 102(a)(2). This rejection under 35 U.S.C. 102(a)(2) might be overcome by: (1) a showing under 37 CFR 1.130(a) that the subject matter disclosed in the reference was obtained directly or indirectly from the inventor or a joint inventor of this application and is thus not prior art in accordance with 35 U.S.C. 102(b)(2)(A); (2) a showing under 37 CFR 1.130(b) of a prior public disclosure under 35 U.S.C. 102(b)(2)(B) if the same invention is not being claimed; or (3) a statement pursuant to 35 U.S.C. 102(b)(2)(C) establishing that, not later than the effective filing date of the claimed invention, the subject matter disclosed in the reference and the claimed invention were either owned by the same person or subject to an obligation of assignment to the same person or subject to a joint research agreement.
With respect to claim 1 as amended, Eltoukhy obtains cfDNA samples comprising germline and somatic mutations, as set forth at columns 16-17.
“For example, a sample of about 30 ng DNA can contain about 10,000 (10.sup.4) haploid human genome equivalents and in the case of cfDNA, about 200 billion (2×10.sup.11) individual polynucleotide molecules. Similarly, a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents and, in the case of cfDNA, about 600 billion individual molecules.
A sample can comprise nucleic acids from different sources. For example, a sample can comprise germline DNA or somatic DNA. A sample can comprise nucleic acids carrying mutations. For example, a sample can comprise DNA carrying germline mutations and/or somatic mutations. A sample can also comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations).”
Eltoukhy obtains samples from different individuals, sequences them, and obtains sequence read data. (Col 16-17)
Eltoukhy generates and provides training set data from sequence reads, which comprise quantitative measures, and class labels (tags, barcodes), meeting the BRI of the test set limitation of claim 1. The source of the training sequence reads can be from individuals that are healthy, or who have cancer, or other diseases. Cell-free DNA is specifically disclosed (col 5). Known minor allele frequencies can be used as a label. Quantitative measures, such as the statistical significance level, a binomial distribution, exponential distributions, etc., are disclosed in col 6. Paired vs unpaired reads can also be a quantitative measure, as set forth in col 9-10. These meet the requirements of the test set, step (a)(1) and (2).
Tags are further discussed at col 17, and can comprise nucleic acids, chemical compounds, fluorescent probes or radioactive probes. Tags can be oligonucleotides. Tags can have known or unknown sequences.
With respect to claim 1 and training a ML model to call a base in a test sample, Eltoukhy provides base calling, using machine learning, starting at column 32.
“After the sequence read coverage ratios have been determined, a stochastic modeling algorithm can be optionally applied to convert the normalized ratios for each window region into discrete copy number states. In some cases, this algorithm may comprise a Hidden Markov Model. In other cases, the stochastic model may comprise dynamic programming, support vector machine, Bayesian modeling, probabilistic modeling, trellis decoding, Viterbi decoding, expectation maximization, Kalman filtering methodologies, or neural networks.” Col 36.
“Once merged, a consensus sequence can be called at a given genomic locus among the genomic loci. At any given genomic loci, any of the following can be determined: i) genetic variants among the calls; ii) frequency of a genetic alteration among the calls; iii) total number of calls; and iv) total number of alterations among the calls. The calling can comprise calling at least one nucleic acid base at the given genomic locus. The calling can also comprise calling a plurality of nucleic acid bases at the given genomic locus. In some cases, the calling can comprise phylogenetic analysis, voting (e.g., biased voting), weighing, assigning a probability to each read at the locus in a family, or calling the base with the highest probability. The consensus sequence can be generated by evaluating a quantitative measure or a statistical significance level for each of the sequence reads. If a quantitative measure is performed, the method can comprise use of a binomial distribution, exponential distribution, beta distribution, or empirical distribution. However, frequency of the base at the particular location can also be used for calling, for example, if 51% or more of the reads is a “A” at the location, then the base may be called an “A” at that particular location. The method can further comprise mapping a consensus sequence to a target genome.” Col 43
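The frequency-based calling described at the end of the passage quoted above can be sketched as follows. This is an illustrative sketch only and is not taken from Eltoukhy: the function name `call_consensus` and the choice to return `None` when no base reaches the threshold are assumptions, while the 51% threshold comes from the quoted text.

```python
from collections import Counter


def call_consensus(bases, min_fraction=0.51):
    """Call the most frequent base at a locus if it reaches min_fraction.

    bases: the base observed at this locus in each read of a family.
    Returns the called base, or None if no base is frequent enough.
    """
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_fraction else None


# Six of ten reads show "A" (60%), so "A" is called.
majority_call = call_consensus(["A"] * 6 + ["C"] * 4)
# A 50/50 split does not reach 51%, so no call is made.
tied_call = call_consensus(["A"] * 5 + ["C"] * 5)
```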
The methods of Eltoukhy assess whether a variant is more likely present at a germline level, or whether it resulted from a somatic mutation, at col 37.
“The methods disclosed herein can be used to determine whether a sequence variant is more likely present in the germline level or resulted from a somatic cell mutation, e.g., in a cancer cell. For example, a sequence variant in a gene detected at levels arguably consistent with heterozygosity in the germline is more likely the product of a somatic mutation if CNV is also detected in that gene. In some cases, to the extent we expect that a gene duplication in the germline bears a variant consistent with genetic dose (e.g., 66% for trisomy at a locus), detection gene amplification with a sequence variant dose that deviates significantly from this expected amount indicates that the CNV is more likely present as a result of somatic cell mutation.”
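The dose reasoning in the passage quoted above can be illustrated with a short calculation. This is a sketch under stated assumptions and is not Eltoukhy's method: the function names, the normal approximation to the binomial, and the cutoff of three standard errors are introduced here for illustration. Only the 66% expected dose for a duplicated variant at a trisomic locus (two of three copies) comes from the quoted text.

```python
import math


def expected_germline_fraction(variant_copies, total_copies):
    """Expected variant allele fraction if the variant is germline."""
    return variant_copies / total_copies


def dose_z_score(alt_reads, depth, expected_fraction):
    """Standardized distance between observed and expected allele fraction."""
    observed = alt_reads / depth
    std_err = math.sqrt(expected_fraction * (1.0 - expected_fraction) / depth)
    return (observed - expected_fraction) / std_err


def deviates_from_germline_dose(alt_reads, depth, expected_fraction, z_cutoff=3.0):
    """True when the observed dose is far from the germline expectation."""
    return abs(dose_z_score(alt_reads, depth, expected_fraction)) > z_cutoff


trisomy_dose = expected_germline_fraction(2, 3)  # about 0.667
consistent = deviates_from_germline_dose(667, 1000, trisomy_dose)
inconsistent = deviates_from_germline_dose(300, 1000, trisomy_dose)
```

In this sketch, an observed fraction near two thirds is treated as consistent with a germline duplication, while a fraction of 30% at the same depth is flagged as deviating from it.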
Associations of disease phenotypes with somatic or germline variants are disclosed at col 38, and Example 3.
As such, claim 1 is anticipated.
With respect to claims 2 and 17, mixtures of samples are disclosed.
With respect to claim 3, allele frequencies can be used.
With respect to claim 4, a probability can be determined.
With respect to claim 5, neural networks and SVM are disclosed.
With respect to claim 6, test data sets are generated in the same way as above, and applied to the trained model.
With respect to claims 9-10, the individual could have a disease, such as cancer.
With respect to claim 11, the called bases are used to determine the presence of SNV associated with cancer.
With respect to claims 12-13, biological relevance of called SNV can be determined.
With respect to claim 14, clinical phenotypes can be determined.
With respect to claim 15, the methods were more accurate than hand curation.
With respect to claim 16, confidence levels can be provided.
Applicant’s arguments:
Applicant’s arguments that Eltoukhy does not discriminate between contributions from germline or somatic sources are not persuasive, as Eltoukhy does indeed make such discriminations.
New Grounds of Rejection
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-6, 9-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Van der Auwera (2013) and Talasaz (2014).
The earliest filing date for the pending claims in this application is 9/2/2015.
Van der Auwera et al. (Oct 2013) From FastQ data to high confidence variant calls: The Genome Analysis Toolkit Best Practices Pipeline. Current Protocols in Bioinformatics, 11.10.1-11.10.33.
Talasaz et al. Systems and methods to detect rare mutations and copy number variations. WO 2014/039556 A1, published 13 March 2014 (with priority available as early as 4 Sept 2013).
Van der Auwera is directed to a pipeline or workflow for analysis of sequence read data, from samples that have undergone sequencing. “The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.” (abstract). Van der Auwera uses trained machine learning models in both base calling processes, and in variant calling processes. Fig 1.
Talasaz is directed to the identification and analysis of rare mutations in cell free DNA (cfDNA) using neural networks, to identify germline variants and distinguish them from somatic variants. “The present disclosure provides a system and method for the detection of rare mutations and copy number variations in cell free polynucleotides… and application of bioinformatics tools to detect rare mutations and copy number variations as compared to a reference.” (abstract)
With respect to claim 1 and obtaining sequence read data from a test sample, having cfDNA which include germline and somatic DNA, Talasaz obtains these sequence reads from cell-free samples as set forth at [0003, 202-204, 0224-0252, 0293-0298]. Talasaz notes that in base calling, technical limitations can result in the presence of incorrect base calls at a locus, which can complicate the analysis of cell free samples that contain germline DNA as well as somatic DNA at [0207-0208].
To generate the sequence reads, Talasaz discloses high throughput sequencing of the cell free sample, at [00253-00257]. Base calling is disclosed at [00258-00267] including the generation of confidence scores for each called base, and the frequency of each base.
Talasaz indicates that germline and somatic linked variations can be distinguished multiple ways using one or more parameters, including: comparisons of quality scores [0294], mapping locations, mapping scores [0294], frequencies of alleles in wild-type/ healthy genomes [0258], tagging, manipulation of depth of coverage [0297], consensus reads, analysis of clonality, confidence scoring, variant base scoring/ frequency [0298].
This data is applied to a statistical model, algorithm or machine learning process to make the most likely base call based on the calculated parameters. Talasaz discloses the use of neural networks, SVM, HMM, dynamic programming, and other strategies in analysis of the sequence reads [0275-0276, 0297-0303]. Talasaz applies these methods to detection of cancer in a sample, beginning at [00305] and Examples 1-8. Example 7 is directed to digital sequencing technologies able to identify and quantify rare tumor-derived nucleic acids among germline fragments [0374]. Talasaz provides some ground truth information for somatic mutations from the COSMIC database. Example 8 illustrates the ability to detect somatic mutations down to 0.1% sensitivity in certain predefined mixtures of germline and somatic DNA [0381]. Talasaz also demonstrated concordance between mutations identified using cell free DNA samples, and mutations identified from matched tumor biopsy samples [0386]. The calling of the bases uses the measure of a contribution from germline or somatic sources in the mixture as required for the labels of step (a)(ii).
Talasaz does not specifically set forth the structure of the machine learning model, or the specific training set comprising quantitative measures and class labels per se, or an iterative learning process.
In the same field of bioinformatics: base calling, with respect to claim 1, the protocol / pipeline/ workflow of Van der Auwera is a method, acting on sequence read data (FastQ and BAM/SAM files). Fig 1. Pooled samples are contemplated, for example in the description of UnifiedGenotyper, p3.
With respect to claim 1 and providing a training set, that comprises quantitative measures of bases at a plurality of base positions from sequence reads in a mixture, and a plurality of class labels, Van der Auwera provides:
In a base quality score recalibration process, original base quality scores for each base for each sequence read are obtained. Running the BaseRecalibrator provides a recalibrated quality score. This meets the BRI of the quantitative measure of step (a)(i). P8. After recalibration, a file is generated which allows comparison of before and after recalibration quality scores. Running the GenomeAnalysisTK command creates a file “recal-reads.bam” with “exquisitely accurate base substitution, insertion and deletion quality scores.” Depth of sequencing is an additional quantitative measure provided by the sequence reads. GATK determines the presence of alternate alleles, either de novo, or using VCF files from reference material. P11.
With respect to class labels, Van der Auwera provides:
GATK provides confidence threshold values (Phred-scaled). The Phred score meets the BRI of a class label. The tags of Van der Auwera meet the BRI of a plurality of class labels.
“For each training set, we use key-value tags to qualify whether the set contains known sites, training sites, and/or truth sites. We also use a tag to specify the prior likelihood that those sites are true (using the Phred scale).” P13
The tags meet the BRI of the class labels, as they contain a label “classifying a base as representative of a germline contribution or a somatic contribution.” (the likelihood that a base call is true, ground truth information, known information) The processed sequence data is the test set, meeting the BRI of the test set limitation.
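For reference, the Phred scale mentioned above relates a score Q to an error probability P by Q = -10 log10(P). The conversion can be sketched as follows. The function names are hypothetical, and reading a Phred-scaled prior of 15 as "the probability that a site is true" is an illustrative assumption rather than a statement of how GATK applies the tag.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred-scaled score to the probability of being wrong."""
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p):
    """Convert an error probability to a Phred-scaled score."""
    return -10.0 * math.log10(p)


def phred_prior_true(q):
    """Probability that a site is true, given a Phred-scaled prior."""
    return 1.0 - phred_to_error_probability(q)


q30_error = phred_to_error_probability(30)    # 1 error in 1,000
one_percent = error_probability_to_phred(0.01)  # Phred 20
prior_q15 = phred_prior_true(15)              # roughly 0.968
```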
With respect to claim 1 and training a ML model to generate one or more classification models for iteratively calling a base at each position, Van der Auwera provides:
“The GATK callers (HaplotypeCaller and UnifiedGenotyper) are by design very lenient in calling variants in order to achieve a high degree of sensitivity. This is a good thing because it minimizes the chance of missing real variants, but it does mean that we need to refine the call set to reduce the amount of false positives, which can be quite large. The best way to perform this refinement is to use variant quality score recalibration (VQSR). In the first step of this two-step process, the program uses machine-learning methods to assign a well calibrated probability to each variant call in a raw call set. We can then use this variant quality score in the second step to filter the raw call set, thus producing a subset of calls with our desired level of quality, fine-tuned to balance specificity and sensitivity.” P12
“To calculate the variant quality score, the program builds an adaptive error model using training sets (explained further below). The model provides a continuous, covarying estimate of the relationship between variant call annotations and the probability that a variant call is a true genetic variant, rather than a sequencing or data-processing artifact. The program then applies this adaptive error model to both known and novel variation discovered in the call set of interest, and annotates each variant with a quality score called VQSLOD. This is the log odds ratio of the variant call being a true positive versus being a false positive according to the training model.” P12.
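The log odds score described in the passage quoted above can be illustrated with a deliberately simplified model. This sketch is not GATK's implementation: it assumes a single annotation per variant, fits one Gaussian to annotation values from sites taken to be true and one to values from sites taken to be artifacts, and uses the natural logarithm. The function names, the toy annotation values, and the filtering threshold of zero are all assumptions.

```python
import math
from statistics import NormalDist


def log_density(model, value):
    """Log of the Gaussian density, computed directly to avoid underflow."""
    z = (value - model.mean) / model.stdev
    return -0.5 * z * z - math.log(model.stdev * math.sqrt(2.0 * math.pi))


def log_odds(value, true_model, artifact_model):
    """Log odds that a call is a true variant rather than an artifact."""
    return log_density(true_model, value) - log_density(artifact_model, value)


def filter_calls(calls, true_model, artifact_model, threshold=0.0):
    """Keep (name, annotation) pairs whose log odds meet the threshold."""
    return [name for name, value in calls
            if log_odds(value, true_model, artifact_model) >= threshold]


# Toy annotation values (assumed) for training sites and for likely artifacts.
true_model = NormalDist.from_samples([20.0, 22.0, 25.0, 19.0, 24.0])
artifact_model = NormalDist.from_samples([2.0, 3.0, 1.5, 4.0, 2.5])

raw_calls = [("variant_1", 21.0), ("variant_2", 3.0)]
kept = filter_calls(raw_calls, true_model, artifact_model)
```

The two steps mirror the quoted description: a score is first assigned to each raw call, and the raw call set is then filtered on that score.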
“We have found that SNPs and indels, being different classes of variation, can have different “signatures” that indicate whether they are real or artifactual. For that reason, we run the variant recalibration process separately on SNPs and indels, in a sequential manner. This allows the program to build separate error models for each, ensuring that their specific signatures do not get mixed up. Conveniently, the variant recalibration tools are capable of analyzing and recalibrating one type of variation without affecting the other, so we do not need to separate the variants into different files.” P13
“The Unified Genotyper calls SNPs and indels separately by considering each variant locus independently. The model it uses to do so has been generalized to work with data from organisms of any ploidy.”
In KSR Int'l v. Teleflex, the Supreme Court, in rejecting the rigid application of the teaching, suggestion, and motivation test by the Federal Circuit, indicated that “The principles underlying [earlier] cases are instructive when the question is whether a patent claiming the combination of elements of prior art is obvious. When a work is available in one field of endeavor, design incentives and other market forces can prompt variations of it, either in the same field or a different one. If a person of ordinary skill can implement a predictable variation, § 103 likely bars its patentability.” KSR Int'l v. Teleflex Inc., 127 S. Ct. 1727, 1740 (2007).
Applying the KSR standard of obviousness to Van der Auwera and Talasaz, the examiner concluded that the combination of the protocol of Van der Auwera for base calling, with the analysis of germline and somatic variants, using labels and information related to allele frequency and labels representing contribution from germline or somatic variants as taught by Talasaz, represented a combination of known elements which yield the predictable result of trained ML models, which are able to perform more accurate base calling, even in the presence of rare variants. The use of the workflow of Van der Auwera provided all the pre-processing and ML analysis required to achieve the requisite specificity and selectivity, and would easily have included allelic frequency or ground truth information provided by Talasaz. Such a combination is merely a "predictable use of prior art elements according to their established functions." KSR Int'l, 127 S. Ct. at 1740.
With respect to claims 2 and 17, Talasaz provides pluralities of mixtures of samples from different subjects as set forth above.
With respect to claim 3, the variant quality score of Van der Auwera provides relative frequency values. Talasaz also addresses allelic frequency.
With respect to claim 4, the VQSLOD of Van der Auwera provides a log odds probability that the call is a true positive.
With respect to claim 5, the labels and ground truth data used by Van der Auwera make the machine learning methods supervised methods. Talasaz also discloses supervised methods, including neural networks, SVM as set forth above.
With respect to claim 6, test data from samples can be analyzed the same way and applied to the trained model in both references.
With respect to claims 9-10, Talasaz analyses cfDNA from subjects with a disease, which can be cancer.
With respect to claim 11, Talasaz identifies SNV associated with somatic sources.
With respect to claims 12-14, Talasaz notes that affected molecular/biological pathways can be identified and analyzed, including changes as a result of cancer, or associations between variants and diseases [0071, 0156].
With respect to claim 15, the protocol of Van der Auwera was shown to be more accurate than previous methods.
With respect to claim 16, confidence levels are provided.
Claim(s) 1-6, 9-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Stokes (2014) and Talasaz (2014).
The earliest filing date for the pending claims in this application is 9/2/2015.
Stokes, M. E. Novel extensions of label propagation for biomarker discovery in genomic data. Dissertation, University of Pittsburgh, 2014. 134 pages. Of record in parent application.
Talasaz et al. Systems and methods to detect rare mutations and copy number variations. WO 2014/039556 A1, published 13 March 2014 (with priority available as early as 4 Sept 2013).
Stokes is directed to machine learning based methods of detecting SNV relevant to a disease in a DNA sample of any kind wherein cell free sequencing read data is obtained (Section 2.2, 2.3 GWAS data, 2.4, etc.).
Talasaz is directed to the identification and analysis of rare mutations in cell free DNA (cfDNA) using neural networks, to identify germline variants and distinguish them from somatic variants. “The present disclosure provides a system and method for the detection of rare mutations and copy number variations in cell free polynucleotides… and application of bioinformatics tools to detect rare mutations and copy number variations as compared to a reference.” (abstract)
With respect to claim 1 and obtaining sequence read data from a test sample, having cfDNA which include germline and somatic DNA, Talasaz obtains these sequence reads from cell-free samples as set forth at [0003, 202-204, 0224-0252, 0293-0298]. Talasaz notes that in base calling, technical limitations can result in the presence of incorrect base calls at a locus, which can complicate the analysis of cell free samples that contain germline DNA as well as somatic DNA at [0207-0208].
To generate the sequence reads, Talasaz discloses high throughput sequencing of the cell free sample, at [00253-00257]. Base calling is disclosed at [00258-00267] including the generation of confidence scores for each called base, and the frequency of each base.
Talasaz indicates that germline and somatic linked variations can be distinguished multiple ways using one or more parameters, including: comparisons of quality scores [0294], mapping locations, mapping scores [0294], frequencies of alleles in wild-type/ healthy genomes [0258], tagging, manipulation of depth of coverage [0297], consensus reads, analysis of clonality, confidence scoring, variant base scoring/ frequency [0298].
This data is applied to a statistical model, algorithm or machine learning process to make the most likely base call based on the calculated parameters. Talasaz discloses the use of neural networks, SVM, HMM, dynamic programming, and other strategies in analysis of the sequence reads [0275-0276, 0297-0303]. Talasaz applies these methods to detection of cancer in a sample, beginning at [00305] and Examples 1-8. Example 7 is directed to digital sequencing technologies able to identify and quantify rare tumor-derived nucleic acids among germline fragments [0374]. Talasaz provides some ground truth information for somatic mutations from the COSMIC database. Example 8 illustrates the ability to detect somatic mutations down to 0.1% sensitivity in certain predefined mixtures of germline and somatic DNA [0381]. Talasaz also demonstrated concordance between mutations identified using cell free DNA samples, and mutations identified from matched tumor biopsy samples [0386]. The calling of the bases uses the measure of a contribution from germline or somatic sources in the mixture as required for the labels of step (a)(ii).
Talasaz does not specifically set forth the structure of the machine learning model, or the specific training set comprising quantitative measures and class labels per se, or an iterative learning process.
In the same field of bioinformatics, and base calling, Stokes provides the analysis of cell free DNA sequence reads, using quantitative and qualitative labels in a trained ML model.
Cell free DNA sequence data is obtained by Stokes, and each base at each position in each sequence read is assigned a weight. Multiple sources of data for the training are disclosed at section 3.4.2. Quantitative measures and class labels include minor allele frequency, substitution effect, conservation scores such as Phylo-P and GERP++. See also Section 4, datasets. Stokes also provides synthetic training sets with known proportions of alleles, modeled either as common, or rare SNV. Section 4.1.1. A semi-synthetic training set is disclosed in 4.1.2. These meet the BRI of the elements of step (a) 1 and 2.
With respect to claim 1, and training the ML model with the training set, for the purpose of calling a base, Stokes provides applying the training sets to their LP model: a feed forward neural network. Figures 2 and 3 demonstrate the training structure configured accordingly with the prior knowledge being applied as training data described in section 3.4.2, 3.4.2.2.
Stokes trains the models using label propagation, LP (section 3.2, Fig 2, 3.3, 3.4). Stokes then assigns class labels which can come from a separate training set or the same set. Stokes' parameter alpha reads on the predetermined weight, or alternatively defines a set of predetermined weights for each position. Any value greater than zero and less than 1 results in the generation of a set of values that weights the reads/positions in the training data.
“Instead of using the counts derived from direct observations, I use partial pseudocounts derived from the LP method to fill in the contingency table. By running LP before filling in the contingency table, I allow some information to diffuse around the network, softening the hard labels. The amount of diffusion is controlled by the parameter α. At α = 0, the algorithm relies solely on the initial labeling and allows no diffusion. This setting keeps the hard labeling, and produces a count table that is derived only from observations. At α = 1, the propagation process dominates, resulting in diffuse, uniform labeling. This leads to an uninformative count table where every column has the same distribution of counts. An intermediate setting for α between 0 and 1 allows for some diffusion, while being sensitive to the initial labeling.” (p37)
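The role of the parameter α in the passage quoted above can be illustrated with a generic label propagation update, in which each node's label is a blend of its neighbors' labels (weight α) and its own initial label (weight 1 - α). This is a minimal sketch of a standard formulation and is not asserted to be the exact update used by Stokes: the function name, the row normalization of the adjacency matrix, the fixed iteration count, and the three-node toy network are assumptions.

```python
def propagate_labels(adjacency, initial, alpha, iterations=50):
    """Iterate labels = alpha * (neighbor average) + (1 - alpha) * initial.

    adjacency: square matrix of non-negative edge weights.
    initial:   one row per node, one column per class (hard labels).
    """
    n = len(adjacency)
    n_classes = len(initial[0])
    # Row-normalize so each node averages over its neighbors.
    transition = []
    for row in adjacency:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])
    labels = [list(row) for row in initial]
    for _ in range(iterations):
        spread = [[sum(transition[i][j] * labels[j][c] for j in range(n))
                   for c in range(n_classes)] for i in range(n)]
        labels = [[alpha * spread[i][c] + (1.0 - alpha) * initial[i][c]
                   for c in range(n_classes)] for i in range(n)]
    return labels


# Toy chain: node 0 is a labeled case, node 2 a labeled control, node 1 unlabeled.
ADJACENCY = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
HARD_LABELS = [[1, 0], [0, 0], [0, 1]]  # columns: case, control

no_diffusion = propagate_labels(ADJACENCY, HARD_LABELS, alpha=0.0)
soft_labels = propagate_labels(ADJACENCY, HARD_LABELS, alpha=0.5)
```

With α = 0 the hard labels are returned unchanged, and with an intermediate α the unlabeled middle node acquires partial weight for both classes. These partial weights are what the quoted passage refers to as pseudocounts.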
The outputs are “pseudocounts” which are equivalent to tally vectors. These pseudocounts are used to adjust the LP SVM resulting in SNP identification in the sample.
“The soft labeling method results in a contingency table based on partial pseudocounts, to which I applied the chi squared test. The result is a likelihood that the phenotype distributions are different across the SNP states. This approach provides a single score for each SNP, as well as a readily interpretable probability value is a measure of a SNP’s ability to discriminate between cases and controls.” (p37, Fig 3).
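The chi squared test on a table of partial pseudocounts, as described in the passage quoted above, can be sketched as follows. This is an illustrative sketch only: it assumes a table with two phenotype rows (cases, controls) and three SNP-state columns, which gives two degrees of freedom, for which the chi squared tail probability has the closed form exp(-x/2). The function names and the pseudocount values are hypothetical.

```python
import math


def chi_squared_statistic(table):
    """Pearson chi squared statistic for a table of (possibly fractional) counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_value_two_df(statistic):
    """Upper-tail probability of chi squared with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


# Rows: cases, controls. Columns: three SNP states. Values are pseudocounts.
associated = [[30.5, 15.2, 4.3],
              [10.1, 20.4, 19.5]]
unassociated = [[10.0, 20.0, 30.0],
                [5.0, 10.0, 15.0]]

p_associated = p_value_two_df(chi_squared_statistic(associated))
p_unassociated = p_value_two_df(chi_squared_statistic(unassociated))
```

A small p-value indicates that the phenotype distribution differs across SNP states, which corresponds to the single per-SNP score described in the quotation.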
In KSR Int'l v. Teleflex, the Supreme Court, in rejecting the rigid application of the teaching, suggestion, and motivation test by the Federal Circuit, indicated that “The principles underlying [earlier] cases are instructive when the question is whether a patent claiming the combination of elements of prior art is obvious. When a work is available in one field of endeavor, design incentives and other market forces can prompt variations of it, either in the same field or a different one. If a person of ordinary skill can implement a predictable variation, § 103 likely bars its patentability.” KSR Int'l v. Teleflex Inc., 127 S. Ct. 1727, 1740 (2007).
Applying the KSR standard of obviousness to Stokes and Talasaz, the examiner concluded that the combination of the protocol of Stokes for base calling, with the analysis of germline and somatic variants, using labels and information related to allele frequency and labels representing contribution from germline or somatic variants as taught by Talasaz, represented a combination of known elements that yields the predictable result of trained ML models, which are able to perform more accurate base calling, even in the presence of rare variants. The use of the workflow of Stokes provided all the pre-processing and ML analysis required to achieve the requisite specificity and selectivity, and would easily have included allelic frequency or ground truth information provided by Talasaz. Such a combination is merely a "predictable use of prior art elements according to their established functions." KSR Int’l, 127 S. Ct. at 1740.
With respect to claims 2 and 17, synthetic data combinations of training sets are provided by both Stokes and Talasaz.
With respect to claim 3, relative allele frequency is calculated in both references.
With respect to claim 4, a probability or likelihood is provided in both references.
With respect to claim 5, Stokes provides a neural network that is semi-supervised. Talasaz provides neural networks, machine learning, SVM, etc.
With respect to claim 6, both Stokes and Talasaz generate a test data set in the same manner as above, and apply it to the trained model.
With respect to claim 8, cell free DNA is disclosed by Talasaz and Stokes.
With respect to claim 9, cfDNA molecules from individuals with a disease are provided at Table 4 and section 4 of Stokes, and throughout Talasaz.
With respect to claim 12, related pathway information can be linked and retrieved as set forth by Talasaz and Stokes.
With respect to claim 14, clinical phenotypes can be revealed by both references.
With respect to claim 15, accuracy is measured and, in some cases, improved in both references.
With respect to claim 16, confidence levels can be obtained, and are specifically provided by Talasaz.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-6 and 9-17 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 11,302,416. Although the claims at issue are not identical, they are not patentably distinct from each other because both sets of claims: 1) obtain the same type of sequence read data from cell-free DNA samples, which have germline and somatic DNA, and 2) apply a trained machine learning unit to the data to generate base calls. The training used a training set comprising quantitative and qualitative measures related to each base of the sequence read, including contributions from germline or somatic sources. Both ML processes are iterative, and then detect SNVs and classify them as likely to have originated from a germline or somatic source. The instant application is generic with respect to the structure of the ML.
The addition of the germline/somatic limitations redirects the claims to the same invention as that issued from the parent application, which became the ‘416 patent.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARY K ZEMAN whose telephone number is 571-272-0723. The examiner can normally be reached on 8am-2pm M-F. Email may be sent to mary.zeman@uspto.gov if the appropriate permissions have been filed.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Larry Riggs, can be reached at 571-270-3062. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARY K ZEMAN/ Primary Examiner, Art Unit 1686