DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
Claims 1, 2, 7-9, 13-14, 17-18, 21, 24-27, and 30-35 are pending and examined on the merits.
Claims 3-6, 10-12, 15-16, 19-20, 22-23, and 28-29 are canceled.
Priority
The instant application is the National Stage entry of PCT/US2019/058646, International Filing Date: 10/29/2019. As such, the effective filing date assigned to each of claims 1, 2, 7-9, 13-14, 17-18, 21, 24-27, 30-35 is 10/29/2019.
In this action, all claims are examined as though they had an effective filing date of 10/29/2019. In future actions, the effective filing date of one or more claims may change, due to amendments to the claims, or further analysis of the disclosure(s) of the priority application(s).
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/12/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the list of cited references was considered in full by the examiner. A signed copy of the corresponding 1449 form has been included with this Office action.
Drawings
The drawings filed 07/27/2022 are accepted.
Specification
The amendments to the specification filed 01/12/2023 have been accepted.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION. —The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 9 and 24 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 9 recites “The method of claim 8, further comprising: foregoing adding, using the one or more processors, a variant to the set of one or more genomic variants based on the path” in lines 1-3. However, it is unclear what is encompassed by the recitation of “foregoing adding”; specifically, it is unclear whether a variant is to be added or not added, because none of the claims from which claim 9 depends recites adding a variant. For purposes of examination, said claim will be interpreted to mean “adding, …, a variant”. See MPEP 2173.06, subsection II.
Claim 24 recites “The method of claim 1, wherein the plurality of categories comprises an insertion, a deletion, a substitution, a rearrangement, or any combination thereof” in lines 1-2. There is insufficient antecedent basis for “the plurality of categories” in the claim, because claim 1, from which claim 24 depends, does not introduce a “plurality of categories”. As such, claim 24 fails to particularly point out and distinctly claim the subject matter which the inventor regards as the invention.
Claim Rejections - 35 USC § 112(d)
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS. - Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claim 9 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Claim 9 recites “The method of claim 8, further comprising: foregoing adding, using the one or more processors, a variant to the set of one or more genomic variants based on the path”, which does not further limit claim 8 because the recited “foregoing adding …” does not specify a further limitation of the method of claim 8. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 2, 7-9, 13-14, 17-18, 21, 24-27, 30-35 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
The Supreme Court has established a two-step framework for this analysis, wherein a claim does not satisfy § 101 if (1) it is “directed to” a patent-ineligible concept, i.e., a law of nature, natural phenomenon, or abstract idea, and (2), if so, the particular elements of the claim, considered “both individually and as an ordered combination,” do not add enough to “transform the nature of the claim into a patent-eligible application.” Elec. Power Grp., LLC v. Alstom S.A., 830 F.3d 1350, 1353 (Fed. Cir. 2016) (quoting Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 134 S. Ct. 2347, 2355 (2014)). Applicant is also directed to MPEP 2106.
Step 1: The instantly claimed invention is directed to a method (claims 1, 2, 7-9, 13-14, 17-18, 21, and 24-27) and to a system (claims 30-35). Therefore, the instantly claimed invention falls into one of the four statutory categories. [Step 1: YES]
Step 2A: First, it is determined in Prong One whether a claim recites a judicial exception, and if so, it is then determined in Prong Two whether the recited judicial exception is integrated into a practical application of that exception.
Step 2A, Prong 1: Under the MPEP § 2106.04, the Step 2A (Prong 1) analysis requires determining whether a claim recites an abstract idea, law of nature, or natural phenomenon.
Claim(s) 1, 2, 7-9, 13-14, 17-18, 21, 24-27, and 30-35 recite the following steps which fall under the mathematical concepts, mental processes, and/or certain methods of organizing human activity groupings of abstract ideas:
Claims 1, 30, and 31 recite identifying a subset of the plurality of sequence reads associated with the individual; the limitation “identifying,” given the plain meaning of identifying, encompasses observation, evaluation, judgment, and opinion (See MPEP 2106.04(a)(2), subsection III.) performable in the human mind (mental process), since the human mind is capable of identifying based on known information/sequences.
Claims 1, 30, and 31 further recite constructing a first graph representation of at least a portion of the reference sequence and constructing a second graph representation based on the subset of the plurality of sequence reads associated with the individual and the first graph representation. The limitation of constructing a representation is considered a mathematical calculation, as further specified in claim 2 (each of the first and the second graph representations is a De Bruijn graph), and, as such, falls into the mathematical concepts grouping of abstract ideas. Said limitation is also considered a mental process of constructing a representation using mathematical operations, for example, by using pen and paper or by performing the operations on a generic computer. See MPEP 2106.04(a)(2), subsection III.C.
Claims 1, 30, and 31 further recite identifying a set of one or more genomic variants based on the second graph representation; the limitation “identifying,” given the plain meaning of identifying, encompasses observation, evaluation, judgment, and opinion (See MPEP 2106.04(a)(2), subsection III.) performable in the human mind (mental process), since the human mind is capable of identifying based on known information/sequences.
Claim 2 recites that each of the first and the second graph representations is a De Bruijn graph (mathematical calculation/mathematical concepts).
Claim 7 recites traversing a path diverging from a reference node in the second graph representation until a traversal termination condition of a plurality of traversal termination conditions is met; the limitation of traversing is considered a mental process of moving along a path, for example, by using pen and paper, until one or more criteria are met.
Claims 9 and 14 recite adding a variant to the set of one or more genomic variants based on the path (mental process of adding data to a set).
Claim 17 recites clustering the plurality of candidate variants when one or more predefined criteria are met; the limitation clustering is considered a mental process of grouping data based on criteria.
Claim 18 recites updating the plurality of candidate variants by removing candidate variants belonging to a problematic cluster or by decomposing one or more candidate variants in the plurality of candidate variants (mental process of updating by removing and decomposing data).
Claim 25 recites identifying a variant of interest from the set of genomic variants (mental process of identifying data).
Claim 26 recites directing a treatment based on the variant of interest (mental process of directing/guiding a treatment).
Claim 32 recites identifying (mental process of identifying data based on known data) one or more candidate variants based on the second graph representation; and adding (mental process of adding data) the identified one or more candidate variants to the set of one or more genomic variants.
Claim 33 recites determining if one or more of the plurality of termination conditions is met; incrementing the node length by a predefined value; and repeating steps (g)-(j); and, when one or more of the plurality of termination conditions are met, foregoing repeating steps (g)-(j) (mental process of determining whether criteria are met).
Claim 34 recites identifying (mental process) one or more candidate variants based on the second graph representation; determining (mental process) if the one or more candidate variants were identified by a previously-iterated second graph representation; and when the one or more candidate variants were previously identified, determining (mental process) that the set of genomic variants is not to be updated to include the previously identified one or more candidate variants.
Claims 8, 13, 21, 24, and 34 provide further details of the abstract ideas, namely the traversal termination conditions, the categories (see the rejection for lack of antecedent basis above), and the clusters.
Additionally, claims 1, 2, 7-9, 13-14, 17-18, 21, 24-27, and 30-35 recite a correlation between nucleic acid molecules of an individual and the identification of one or more variants and, as such, fall into the judicial exception of laws of nature and natural phenomena. See MPEP 2106.04(b), subsection I.
The identified claims recite a law of nature, a natural phenomenon (product of nature) or fall into one of the groups of abstract ideas of mathematical concepts, mental processes, and/or certain methods of organizing human activity for the reasons set forth above. See MPEP 2106.04 (a)(2) III and MPEP 2106.04 (b) I. Therefore, claims are directed to one or more judicial exception(s) and require further analysis in Prong Two. [Step 2A, Prong 1: YES]
Step 2A, Prong 2: Under the MPEP § 2106.04, the Step 2A, Prong 2 analysis requires identifying whether there are any additional elements recited in the claim beyond the judicial exception(s), and evaluating those additional elements to determine whether they integrate the exception into a practical application of the exception. This judicial exception is not integrated into a practical application for the following reasons.
The additional elements of claim(s) 1, 2, 7-9, 13-14, 17-18, 21, 24-27, and 30-35 include the following:
Claim 1 recites providing a plurality of nucleic acid molecules obtained from a sample from an individual; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules; capturing amplified nucleic acid molecules from the amplified nucleic acid molecules; sequencing, by a sequencer, the captured nucleic acid molecules to obtain a plurality of sequence reads that represent the captured nucleic acid molecules.
Claim 17 recites obtaining a plurality of candidate variants.
Claim 27 recites providing an output indicative of a diagnosis based on the variant of interest.
Claim 30 recites an electronic device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: receiving a plurality of sequence reads.
Claim 31 recites a non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device having a display, cause the electronic device to: receive a plurality of sequence reads.
The additional elements of an electronic device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions, a display, and a non-transitory computer-readable storage medium are generic computer components and/or processes. There are no limitations that indicate that the processor, input module, processing module, or output module in the computer-implemented system require anything other than generic computing systems. The courts have found the use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general-purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application. See MPEP 2106.05(f).
Furthermore, the additional elements of obtaining variants, receiving reads, and providing an output amount to necessary data gathering and outputting. The courts have found that limitations amounting to necessary data gathering and outputting are insignificant extra-solution activity that does not integrate a recited judicial exception into a practical application. See Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015) (see MPEP 2106.05(g)).
Furthermore, the additional elements of providing a plurality of nucleic acid molecules obtained from a sample from an individual; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules; capturing amplified nucleic acid molecules from the amplified nucleic acid molecules; and sequencing, by a sequencer, the captured nucleic acid molecules to obtain a plurality of sequence reads that represent the captured nucleic acid molecules, as recited in claim 1, only serve to collect the information for use by the abstract idea.
Therefore, the additionally recited elements amount to generic computer components and/or insignificant extra-solution activity and, as such, the claims as a whole do not integrate the abstract idea into a practical application. See MPEP 2106.05(g). Thus, claims 1, 2, 7-9, 13-14, 17-18, 21, 24-27, and 30-35 are directed to an abstract idea. [Step 2A, Prong 2: NO]
Step 2B: In the second step it is determined whether the claimed subject matter includes additional elements that amount to significantly more than the judicial exception. An inventive concept cannot be furnished by an abstract idea itself. See MPEP § 2106.05.
The additional elements of claim(s) 1, 2, 7-9, 13-14, 17-18, 21, 24-27, and 30-35 include the following:
Claim 1 recites providing a plurality of nucleic acid molecules obtained from a sample from an individual; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules; capturing amplified nucleic acid molecules from the amplified nucleic acid molecules; sequencing, by a sequencer, the captured nucleic acid molecules to obtain a plurality of sequence reads that represent the captured nucleic acid molecules.
Claim 17 recites obtaining a plurality of candidate variants.
Claim 27 recites providing an output indicative of a diagnosis based on the variant of interest.
Claim 30 recites an electronic device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: receiving a plurality of sequence reads.
Claim 31 recites a non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device having a display, cause the electronic device to: receive a plurality of sequence reads.
The additional elements of an electronic device comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions, a display, and a non-transitory computer-readable storage medium are conventional computer components and/or processes. The courts have found the use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general-purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not provide significantly more. See Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit).
Furthermore, the additional elements of obtaining variants, receiving reads, and providing an output amount to necessary data gathering and outputting. The courts have found that limitations amounting to necessary data gathering and outputting are insignificant extra-solution activity that does not amount to significantly more (see MPEP 2106.05(g)).
Furthermore, the additional elements of providing a plurality of nucleic acid molecules obtained from a sample from an individual; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules; capturing amplified nucleic acid molecules from the amplified nucleic acid molecules; and sequencing, by a sequencer, the captured nucleic acid molecules to obtain a plurality of sequence reads that represent the captured nucleic acid molecules, as recited in claim 1, amount to conventional methods and systems for next-generation sequencing library preparation.
This position is supported by Head et al. (Library construction for next-generation sequencing: Overviews and challenges, Published in final edited form as: Biotechniques. 2014 Feb 1; 56(2):61, pages 1-31). Head reviews NGS library preparation from DNA and discloses that “DNA is extracted from sample tissue/cells and fragmented. RNA is converted to cDNA by reverse transcription. DNA Fragments are converted into the library by ligation to sequencing adapters containing specific sequences designed to interact with the NGS platform, either the surface of the flow-cell (Illumina) or beads (Ion Torrent). The next step involves clonal amplification of the library, by either cluster generation for Illumina or microemulsion PCR for Ion Torrent. The final step generates the actual sequence via the chemistries for each technology. One difference between the two technologies is that Illumina allows sequencing from both ends of the library insert (i.e., paired end sequencing)” (pg. 9 last para-pg. 16; Figure 1).
This position is further supported by Bohannon et al. (Calling Variants in the Clinic: Informed Variant Calling Decisions Based on Biological, Clinical, and Laboratory Variables, Computational and Structural Biotechnology Journal, 2019 Apr 8;17:561–569). Bohannon reviews sample preparation, library preparation, and sequencing, alongside diverse biological and clinical variables, and evaluates their effect on variant caller selection and optimization (abstract). Bohannon discloses that, before a sample can be sequenced, it must be processed to convert whole genomic DNA molecules into a collection of DNA fragments that are appropriate for accurate genomic sequencing and analysis. Bohannon further discloses that library preparation workflows can vary depending on experimental goals, and that some of the most important steps that might affect variant caller performance include DNA amplification, isolation of specific genomic regions of interest (e.g., in the case of exome or targeted sequencing), and ligation of index sequences and/or UMIs (Fig. 3). Bohannon further discloses that library preparation for WES and targeted sequencing, when compared to WGS, often includes several extra steps related to capturing the specific fragments of interest. In particular, in WES, biotinylated probes hybridize to (or “capture”) DNA fragments associated with known exons, which are then isolated using streptavidin beads (pg. 563-567). Accordingly, the combination of the above additional elements with conventional computer components and/or processes is also well-understood, routine, and conventional.
Therefore, the additional elements are not sufficient to amount to significantly more than the judicial exception.
Taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception(s). Even when viewed as a combination, the additional elements fail to transform the exception into a patent-eligible application of that exception. Thus, the claims as a whole do not amount to significantly more than the exception itself. [Step 2B: NO]
Therefore, the instantly rejected claims are not drawn to eligible subject matter as they are directed to an abstract idea without significantly more.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1, 2, 7-9, 13-14, 17-18, 21, 24-27, and 30-35 are rejected under 35 U.S.C. 103 as being unpatentable over Ashutosh et al. (JP 6882373 B2), in view of Van Rooyen et al. (US 20170270245 A1), and further in view of Peng et al. (IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth, Bioinformatics, Vol. 28, no. 11, 2012, pages 1420–1428, April 11, 2012).
Regarding claims 1, 30, and 31, Ashutosh discloses a method for finding variants from targeted sequencing panels or identifying a sequence variant in an enriched sample [0009]. Ashutosh further discloses that the method is a computer implemented method comprising a processor, memory executable program, and a monitor [0010]; reading on limitations of a computer enabled method for identifying a set of genomic variants in an individual.
Ashutosh further discloses (a) obtaining (i) a plurality of sequence reads from a sample that has been enriched for a genomic region (claim 1), where the genomic region is a genomic region of the human genome (claim 5); reading on limitations of (a) providing a plurality of nucleic acid molecules obtained from a sample from an individual.
Ashutosh further discloses that enriched genomic regions may be enriched from the initial genomic sample using any convenient method, for example using hybridization to oligonucleotide probes or using ligation-based methods [0041]; reading on limitations of (b) ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules.
Ashutosh further discloses that the genomic DNA is amplified [0041-0042]; reading on limitations of (c) amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules.
Ashutosh further discloses capturing amplified nucleic acid molecules (pg. 27, first para.); reading on limitations of (d) capturing amplified nucleic acid molecules from the amplified nucleic acid molecules.
Ashutosh further discloses sequencing the DNA by ligation [0046]; reading on limitations of (e) sequencing, by a sequencer, the captured nucleic acid molecules to obtain a plurality of sequence reads that represent the captured nucleic acid molecules.
Ashutosh further discloses that sequence reads can be assembled to obtain a plurality of discrete sequence assemblies that each correspond to a potential variant, and that sequence reads can be assembled by aligning each read to a reference sequence, such as a reference genome [0049]. Ashutosh further discloses that the method is a computer-implemented method (claims 17-19). Ashutosh further discloses targeted sequencing/enrichment, which focuses on a subset of genomic regions [0046]. Ashutosh further discloses that genomic regions are composed of a genomic region of interest (e.g., a cancer-associated gene) [0057]; reading on limitations of identifying, using one or more processors, a subset of the plurality of sequence reads associated with the individual.
Ashutosh further discloses that assembling the sequence reads may comprise making a directed graph, such as a de Bruijn graph, by collecting overlapping k-mers from the sequencing reads, including subsequences of length k within the reads, in a target region; splitting each k-mer into two overlapping (k-1)-mers; and assigning a vertex or node of the graph to each (k-1)-mer and an edge connecting the two nodes in the graph to the k-mer… Thus, each sequence read is represented in the graph as a path through the k-mers, and a potential sequence contig may be represented in the graph by joining multiple paths through the k-mers [0050].
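For clarity of the record only, the de Bruijn graph construction summarized in Ashutosh [0050] can be illustrated with the minimal Python sketch below. It assumes error-free, single-stranded reads over the alphabet A/C/G/T and ignores reverse complements; the function names (build_de_bruijn_graph, read_as_path) are hypothetical and are not taken from Ashutosh or from the instant application. Each k-mer is split into its two overlapping (k-1)-mers, which become nodes, and the k-mer itself becomes the edge joining them, so that each read maps to a path through the graph.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Map each edge (prefix (k-1)-mer, suffix (k-1)-mer) to the number of reads supporting it."""
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Split the k-mer into two overlapping (k-1)-mer nodes; the k-mer is the edge.
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


def read_as_path(read, k):
    """A read corresponds to the walk through its consecutive (k-1)-mer nodes."""
    return [read[i:i + k - 1] for i in range(len(read) - k + 2)]


graph = build_de_bruijn_graph(["ACGTA", "CGTAC"], 3)
path = read_as_path("ACGTA", 3)
print(path)                  # ['AC', 'CG', 'GT', 'TA']
print(graph[("CG", "GT")])   # 2 (the k-mer CGT occurs in both reads)
```

Because every consecutive pair of (k-1)-mers in a read's path is an edge of the graph, overlapping reads share edges, and the edge counts record how many reads support each k-mer.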
Further regarding claims 30 and 31, Ashutosh discloses a computer system which includes one or more processors, memory, program, instructions, and display [0076].
Van Rooyen discloses a genomics analysis platform for executing a sequence analysis pipeline (claim 1) where the processing engine includes a variant call module for processing the mapped, aligned, and/or sorted reads, such as with respect to a reference genome, to thereby produce an HMI readout and/or variant call file for use with and/or detailing the variations between the sequenced genetic data and the reference genomic reference data [0011].
Van Rooyen further discloses that the variant call module employs a De Bruijn graph function [0251], where, for each given active region, all the overlapping reads may be assembled into a “De Bruijn graph” (DBG) matrix [0252]. Van Rooyen further discloses that in performing a variant call function, as disclosed herein, a De Bruijn Graph may be formulated, and when all of the reads in a pile up are identical (for example, identical to the reference), the DBG will be linear. However, where there are differences (between sample and reference), the graph will form “bubbles” that are indicative of regions of differences resulting in multiple paths diverging from matching the reference alignment and then later re-joining in matching alignment… From this DBG, various paths may be extracted, which form candidate haplotypes … likelihood calculation of each read against candidate haplotypes using HMM … calculating absolute probability of each genotype (for example, identifying a variant) [0287].
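The bubble structure described in Van Rooyen [0287] can likewise be illustrated with a short sketch, offered only as an aid to understanding and not as Van Rooyen's implementation. Under simplifying assumptions (error-free, single-stranded reads; a repeat-free reference; hypothetical function names build_graph, enumerate_paths, and spell), a reference sequence and a read carrying a single substitution produce two paths that diverge from and later rejoin the reference; enumerating the simple paths between the start and end nodes and spelling them out yields the candidate haplotypes. The HMM likelihood and genotype probability steps are not shown.

```python
from collections import defaultdict


def build_graph(sequences, k):
    """Adjacency between (k-1)-mer nodes, one edge per distinct k-mer."""
    adj = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            adj[kmer[:-1]].add(kmer[1:])
    return adj


def enumerate_paths(adj, source, sink, max_nodes):
    """Depth-first enumeration of simple paths from source to sink (bounded length)."""
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == sink:
            paths.append(path)
            continue
        if len(path) >= max_nodes:
            continue
        for nxt in sorted(adj.get(node, ())):
            if nxt not in path:  # keep paths simple (no revisited nodes)
                stack.append(path + [nxt])
    return paths


def spell(path):
    """Sequence spelled by a node path: first node plus the last base of each later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Reference GATTACG and a read with a T->C substitution at the fourth base.
adj = build_graph(["GATTACG", "GATCACG"], 3)
haplotypes = sorted(spell(p) for p in enumerate_paths(adj, "GA", "CG", max_nodes=10))
print(haplotypes)  # ['GATCACG', 'GATTACG']
```

When the reads are identical to a repeat-free reference, the graph is linear and only one path (the reference haplotype) is returned, consistent with the linear DBG described in [0287].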
Van Rooyen further discloses that, given all of the reads, the DBG may be generated using each reference region as a backbone (for example, first graph representation). Then all of the candidate variant positions can be aligned to universal coordinates. Specifically, FIG. 20 illustrates a flow chart setting forth the process of generating a DBG and using the same to produce candidate haplotypes. More specifically, the De Bruijn graph may be employed in order to create the candidate variant list of SNPs and INDELs. Given that there are N regions that are being jointly processed by MRJD, N de-bruijn graphs may be constructed. In such an instance, every graph may use one reference region as a backbone and all of the reads (for example, second graph representation) corresponding to the N regions [0455]. Van Rooyen further discloses that only the bubbles that are representative of the candidate variants need be extracted from the graphs [0446].
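The backbone-and-overlay arrangement that is mapped above to the first and second graph representations (Van Rooyen [0455]) may be sketched as follows, again as a hypothetical illustration under simplifying assumptions (fixed k, error-free single-stranded sequences, no repeats in the reference); the names kmer_edges, build_backbone, augment_with_reads, and divergent_edges are not taken from the reference or the application. The first graph contains only reference edges, all of whose nodes have the same length k-1; the second graph adds read-derived edges with support counts; edges that leave a reference node but are absent from the backbone mark where a candidate variant bubble begins.

```python
from collections import defaultdict


def kmer_edges(seq, k):
    """Yield one (prefix, suffix) edge per k-mer; every node has length k-1."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield kmer[:-1], kmer[1:]


def build_backbone(reference, k):
    """First graph: edges derived from the reference sequence only."""
    return set(kmer_edges(reference, k))


def augment_with_reads(backbone, reads, k):
    """Second graph: backbone edges plus read edges, with per-edge read support."""
    support = defaultdict(int)
    for read in reads:
        for edge in kmer_edges(read, k):
            support[edge] += 1
    return set(backbone) | set(support), dict(support)


def divergent_edges(backbone, edges):
    """Non-reference edges that leave a reference node (start of a candidate bubble)."""
    ref_nodes = {node for edge in backbone for node in edge}
    return sorted(e for e in edges if e not in backbone and e[0] in ref_nodes)


backbone = build_backbone("GATTACG", 3)
edges, support = augment_with_reads(backbone, ["GATCACG", "GATCACG"], 3)
print(divergent_edges(backbone, edges))  # [('AT', 'TC')]
print(support[("AT", "TC")])             # 2
```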
Van Rooyen further discloses that a graph matrix may be formed by taking all possible N base k-mers, e.g., 10 base k-mers, which can be generated from each given read by sequentially walking the length of the read in ten base segments, where the beginning of each new ten base segment is offset by one base from the last generated 10 base segment. This procedure may then be repeated by doing the same for every read in the pile up within the active window (for example, iteratively constructing the second graph). … The generated k-mers may then be aligned with one another such that areas of identical matching between the generated k-mers are matched to the areas where they overlap so as to build up a data structure, e.g., graph, that may then be scanned and the percentage of matching and mismatching may be determined (for example, same node length for reference sequence) [0265] (FIG. 20); see also the stopping criteria of [0478-0479]. Van Rooyen further discloses that a candidate variant list may be created, where a joint pileup may be formed and a De Bruijn graph (DBG) or other assembly graph may be constructed ([0444], FIG. 20); reading on limitations of constructing, using the one or more processors, a first graph representation of at least a portion of the reference sequence, wherein the first graph representation comprises a plurality of reference nodes and wherein each reference node of the plurality of reference nodes has a same node length; (h) constructing, using the one or more processors, a second graph representation based on the subset of the plurality of sequence reads associated with the individual and the first graph representation; and (i) identifying, using the one or more processors, a set of one or more genomic variants based on the second graph representation.
Peng discloses the IDBA-UD algorithm, which is based on the de Bruijn graph approach for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths (abstract). Peng further discloses that IDBA-UD iterates the value of k from kmin to kmax. In each iteration, an ‘accumulated de Bruijn graph’ Hk for a fixed k is constructed from the set of input reads and the contigs (Ck−s and LCk−s) constructed in previous iterations, i.e., these contigs are treated as input reads for constructing Hk. In each iteration, IDBA-UD also progressively increases the value of depth cutoff thresholds for removing some low-depth contigs so as to get longer confident contigs (Ck) in Hk (pg. 1421, col. 2, para. 3 - pg. 1422, col. 1, para. 1-2; Figure 1) (for example, the representation is constructed iteratively until one or more of a plurality of termination conditions is met).
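The iterative construction Peng describes can likewise be illustrated by a simplified Python sketch. It assumes a fixed step between successive values of k, a depth cutoff raised by one per iteration, and contigs taken as maximal non-branching paths (unitigs) of k-mers that pass the cutoff. The function names and the two stop conditions (k exceeding k_max, or no k-mer surviving the cutoff) are choices made here, and the other stages of IDBA-UD described in Peng are omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def kmer_counts(seqs: List[str], k: int) -> Counter:
    """Count every k-mer occurring in the input sequences."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def unitigs(kmers: List[str]) -> List[str]:
    """Merge k-mers into contigs along maximal non-branching paths.

    Nodes are (k-1)-mers. Paths start at nodes that do not have exactly one
    predecessor and one successor. Isolated cycles in which every node is
    one-in/one-out are not reported (a limitation of this sketch).
    """
    succ: Dict[str, Set[str]] = defaultdict(set)
    pred: Dict[str, Set[str]] = defaultdict(set)
    for km in kmers:
        succ[km[:-1]].add(km[1:])
        pred[km[1:]].add(km[:-1])
    contigs = []
    for node in set(succ) | set(pred):
        if len(pred[node]) == 1 and len(succ[node]) == 1:
            continue  # interior of a simple path; reached from its start node
        for nxt in succ[node]:
            path, cur = node + nxt[-1], nxt
            while len(pred[cur]) == 1 and len(succ[cur]) == 1:
                cur = next(iter(succ[cur]))
                path += cur[-1]
            contigs.append(path)
    return sorted(contigs)


def iterative_assembly(reads: List[str], k_min: int, k_max: int,
                       step: int = 2, cutoff: int = 1) -> List[str]:
    """Rebuild the graph for k = k_min, k_min + step, ... up to k_max.

    Contigs from the previous iteration are fed back in as extra reads, and the
    depth cutoff rises by one each iteration. Iteration stops when k would
    exceed k_max or when no k-mer reaches the current cutoff.
    """
    contigs: List[str] = []
    for k in range(k_min, k_max + 1, step):
        counts = kmer_counts(reads + contigs, k)
        solid = [km for km, c in counts.items() if c >= cutoff]
        if not solid:
            break  # termination condition: nothing survives the cutoff
        contigs = unitigs(solid)
        cutoff += 1
    return contigs


print(iterative_assembly(["AACCGGTTAG"] * 3, k_min=5, k_max=7))
```

Feeding the previous contigs back in as reads is the feature the sketch is meant to show; real assemblers add error correction and repeat handling that are out of scope here.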
Regarding claim 2, Ashutosh further discloses that assembling the sequence reads may comprise making a directed graph, such as a de Bruijn graph, collecting overlapping k-mers from the sequencing reads [0050]. Additionally, Van Rooyen discloses performing variant calling using a “De Bruijn graph” (DBG) [0252]; reading on limitations of wherein each of the first and the second graph representations is a De Bruijn graph.
Regarding claim 7, Van Rooyen discloses that where the same k-mers are generated from a multiplicity of reads, e.g., where each k-mer has the same sequence pattern, they may be accounted for in the graph by increasing the count for that pathway where the k-mer overlaps an already existing k-mer pathway. Hence, where the same k-mer is generated from a multiplicity of overlapping reads having the same sequence, the pattern of the pathway between the graph will be repeated over and over again and the count for traversing this pathway through the graph will be increased incrementally in correspondence therewith. In such an instance, the pattern is only recorded for the first instance of the k-mer, and the count is incrementally increased for each k-mer that repeats that pattern (for example, the termination condition is k-mer overlap detection) [0264], [0272], [0381]; reading on limitations of traversing a path diverging from a reference node in the second graph representation until a traversal termination condition of a plurality of traversal termination conditions is met.
Regarding claims 8 and 13, Van Rooyen discloses that operations start at cell (0,0), with M state calculations beginning 10 clock cycles before I and D state calculations begin. The next cell to traverse should be cell (1,0) (for example, the ambiguous node). However, there is a ten cycle latency after the start of I and D calculations before the results from cell (0,0) will be available. The hardware, therefore, inserts nine “dead” cycles into the compute pipeline … Such processing may continue until the end of the last full diagonal in the swath 35 a, which, in this example (that has a read length of 35 and haplotype length of 14), will occur after the diagonal that begins with the cell at (hap, rd) coordinates of (13,0) is completed (for example, the determination that the path includes a cycle or the path includes a dead end; for the dead end determination, see also FIG. 16, “ending cell”) [0381-0386]. Van Rooyen further discloses that if there are a lot of divergent pathways (bubbles), instead of extracting complete source to sink haplotypes from start to finish, e.g., from the beginning of the sequence to the end, only the sequences associated with the individual bubbles need be extracted, e.g., only the bubbles need to be aligned to the reference. Accordingly, the bubbles are extracted from the DBG, the sequences aligned to the reference, and from these alignments, specific SNPs, insertions, deletions, and the like may be determined, with respect as to why the sequences of the various bubbles differ from the reference [0417]; reading on limitations of wherein the traversal termination comprises at least one of (1) a determination that the path includes an ambiguous node, or (2) a determination that the path includes a cycle, in claim 8, and wherein the traversal termination comprises at least one of (1) a determination that the path includes a dead end, or (2) a determination that the path joins a reference node that is not an ambiguous node, in claim 13.
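As a non-authoritative illustration of the traversal termination conditions mapped above for claims 7, 8 and 13, the following Python sketch walks a branch that diverges from a reference node and stops at the first condition met. The adjacency-dict graph, the function name `traverse_branch`, and the stop labels are hypothetical and are not taken from Van Rooyen or from the claims.

```python
from typing import Dict, List, Set, Tuple


def traverse_branch(graph: Dict[str, List[str]], reference_nodes: Set[str],
                    start: str, first: str) -> Tuple[List[str], str]:
    """Follow a branch that leaves reference node `start` through `first`.

    Returns the path walked and the first termination condition met:
    'rejoined_reference', 'dead_end', 'ambiguous_node' or 'cycle'.
    """
    path = [start, first]
    seen = {start, first}
    node = first
    while True:
        if node in reference_nodes:
            return path, "rejoined_reference"  # bubble closes on the reference
        successors = graph.get(node, [])
        if not successors:
            return path, "dead_end"  # no outgoing edge
        if len(successors) > 1:
            return path, "ambiguous_node"  # branch splits again; stop here
        node = successors[0]
        if node in seen:
            return path + [node], "cycle"  # walked back onto this path
        path.append(node)
        seen.add(node)


# Reference R1 -> R2 -> R3 with an alternate branch R1 -> A -> B -> R3.
graph = {"R1": ["R2", "A"], "R2": ["R3"], "A": ["B"], "B": ["R3"]}
print(traverse_branch(graph, {"R1", "R2", "R3"}, "R1", "A"))
# (['R1', 'A', 'B', 'R3'], 'rejoined_reference')
```

A branch that rejoins the reference corresponds to a bubble whose sequence can then be aligned against the reference, as in [0417].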
Regarding claim 9, Van Rooyen discloses that the De Bruijn graph may be employed in order to create the candidate variant list of SNPs and INDELs [0445], [0452]; reading on limitations of adding a variant to the set of one or more genomic variants based on the path.
Regarding claim 14, Van Rooyen discloses creating a candidate variant list after constructing a De Bruijn graph (DBG), where the DBG is used to extract the candidate variants from the joint pileups. The construction of the DBG is performed in such a manner as to generate bubbles, indicating variations, representing alternate pathways through the graph where each alternate path is a candidate haplotype [0444]. See, for instance, FIGS. 20 and 21; reading on limitations of adding a variant to the set of one or more genomic variants based on the path.
Regarding claim 17, Van Rooyen discloses that after a list of active/candidate variant positions (SNPs/INDELs) is generated, each of these candidate variants may be processed and evaluated by the MRJD pre-processing engine(s) (an algorithm to determine observed differences) [0438]. Van Rooyen further discloses that MRJD may be employed to determine an initial answer, with respect to one or more, e.g., all, homologous regions, and then single region detection may be applied back to one or more, e.g., all, single or non-homologous regions, e.g., employing the same basic analysis, and thus, better accuracy may be achieved…a maximum cutoff may be introduced to keep the calculations manageable [0415-0416].
Van Rooyen further discloses that a connection matrix may be computed so as to define the order of processing of identified candidate variants that are obtained from the De Bruijn graph. This matrix may be constructed and employed in conjunction with or as a sorting function to determine which candidate variants to process first. This connection matrix, therefore, may be a function of the mean read length and the insert size of the paired-end reads. Accordingly, for a given candidate variant, other candidate variant positions that are at integral multiples of the insert size or within the read length have higher weights compared to the candidate variants at other positions [0450]; reading on limitations of wherein identifying the set of one or more genomic variants based on the second graph representation comprises: obtaining a plurality of candidate variants; and clustering the plurality of candidate variants when one or more predefined criteria are met.
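The insert-size and read-length weighting that Van Rooyen describes for the connection matrix at [0450] can be sketched as follows. The specific weight values, the tolerance used to treat a distance as an integral multiple of the insert size, and the rule of ordering candidates by total connection weight are assumptions for this illustration; Van Rooyen states only that the matrix is a function of the mean read length and insert size and that it may be used to decide which candidate variants to process first.

```python
from typing import List


def connection_weight(pos_a: int, pos_b: int, read_len: int, insert_size: int,
                      tol: int = 10, high: float = 1.0, low: float = 0.1) -> float:
    """Weight linking two candidate variant positions (hypothetical values).

    Pairs within one read length, or whose separation is within `tol` bases of
    a non-zero integral multiple of the insert size, get the higher weight.
    Assumes insert_size > 0.
    """
    d = abs(pos_a - pos_b)
    if d <= read_len:
        return high
    r = d % insert_size
    if d >= insert_size - tol and min(r, insert_size - r) <= tol:
        return high
    return low


def connection_matrix(positions: List[int], read_len: int, insert_size: int,
                      tol: int = 10) -> List[List[float]]:
    """Symmetric matrix of pairwise weights; the diagonal is zero."""
    n = len(positions)
    return [[0.0 if i == j else
             connection_weight(positions[i], positions[j], read_len, insert_size, tol)
             for j in range(n)] for i in range(n)]


def processing_order(positions: List[int], read_len: int, insert_size: int,
                     tol: int = 10) -> List[int]:
    """Indices of candidates, most strongly connected first."""
    m = connection_matrix(positions, read_len, insert_size, tol)
    return sorted(range(len(positions)), key=lambda i: sum(m[i]), reverse=True)


print(processing_order([1000, 1100, 1500, 5000], read_len=150, insert_size=400))
```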
Regarding claim 18, Van Rooyen discloses that only the bubbles need be extracted from the graphs that are representative of the candidate variants [0446] … all candidate variants may be mapped, such as to a universal coordinate system, so as to produce the candidate list, and the candidate variant list may be sent as an input to a pruning module [0448]. Van Rooyen discloses that the final genotype matrix may be updated based on a user-defined confidence metric of variants which is computed using the intermediate genotype matrix. The various steps of these processes are set forth in the process flow diagram of FIG. 24 [0461-0462]. Van Rooyen further discloses a “growing the tree” algorithm, where a branching tree graph of joint haplotypes/diplotypes may be built in such a manner that as the tree grows, the underlying algorithm functions to both grow and prune the tree at the same time as more and more calculations are made… Hence, as the tree grows and is pruned, not all of the hypothesized haplotypes need to be calculated [0418]. Van Rooyen further discloses that with respect to the growing of the tree function, when there is disagreement between two references, or between the references and the reads, as to what base is present at given positions being resolved, it must be determined which base actually belongs in which position, and in view of such disagreements it must be determined which differences may be caused by SNPs, Indels, or the like, versus which are machine errors [0419], FIG. 27; reading on limitations of updating the plurality of candidate variants by removing candidate variants belonging to a problematic cluster or by decomposing one or more candidate variants in the plurality of candidate variants.
Regarding claim 21, Van Rooyen discloses that a graph matrix may be formed by taking all possible N base k-mers, e.g., 10 base k-mers, which can be generated from each given read by sequentially walking the length of the read in ten base segments, where the beginning of each new ten base segment is offset by one base from the last generated 10 base segment. This procedure may then be repeated by doing the same for every read in the pile up within the active window [0265]. Van Rooyen further discloses analyzing specific regions (active windows) using a De Bruijn graph, where only the portions of reads falling within the "active window" boundaries are considered by the DBG, while external portions are discarded to focus on the area of interest [0261-0262]. Peng discloses removing dead-ends with a length that exceeds a threshold (pg. 1422, col. 2, Algorithm 1); reading on limitations of wherein the plurality of termination conditions comprises a determination that node length exceeds a threshold, a determination that no nodes or edges are associated with a reconsideration list, or any combination thereof.
Regarding claim 24, Van Rooyen discloses determining one or more transition probabilities for the sequence of nucleotides of the read of genomic sequence going from one state to another, such as from a match state to an indel state, or match state to a delete state, and/or back again such as from an insert or delete state back to a match state [0036]. Van Rooyen further discloses that the De Bruijn graph may be employed in order to create the candidate variant list of SNPs and INDELs [0445]; reading on limitations of wherein the plurality of categories comprises an insertion, a deletion, a substitution, a rearrangement, or any combination thereof.
Regarding claims 25 and 26, Van Rooyen discloses variant calling, variant quality score recalibration, variant filtering, and variant annotation from known variant databases (for example, method steps of identifying a variant of interest) [0581]. Van Rooyen further discloses that once the subject's genome has been reconstructed and/or a VCF has been generated, such data may then be subjected to tertiary processing (for example, calling, variant quality score recalibration, variant filtering, variant annotation from known variant databases to identify variant of interest) so as to interpret it, such as for determining what the data means with respect to identifying what diseases this person may or may have the potential for suffer from and/or for determining what treatments or lifestyle changes this subject may want to employ so as to ameliorate and/or prevent a diseased state. For example, the subject's genetic sequence and/or their variant call file may be analyzed to determine clinically relevant genetic markers that indicate the existence or potential for a diseased state and/or the efficacy of a proposed therapeutic or prophylactic regimen may have on the subject. This data may then be used to provide the subject with one or more therapeutic or prophylactic regimens so as to better the subject's quality of life, such as treating and/or preventing a diseased state [0151]; reading on limitations of identifying a variant of interest from the set of genomic variants; and directing a treatment based on the variant of interest.
Regarding claim 27, Van Rooyen further discloses that once one or more of an individual's genetic variations are determined, such variant call file information can be used to develop medically useful information, which in turn can be used to determine, e.g., using various known statistical analysis models, health related data and/or medical useful information, e.g., for diagnostic purposes, e.g., diagnosing a disease or potential therefore, clinical interpretation (e.g., looking for markers that represent a disease variant) [0152]; reading on limitations of providing an output indicative of a diagnosis based on the variant of interest.
Regarding claim 32, Ashutosh discloses identifying sequence variation and outputting a report indicating a sequence variant (claim 1). Van Rooyen discloses that a candidate variant list may be created as follows: first, a joint pileup may be formed and a De Bruijn graph (DBG) or other assembly graph may be constructed ([0444], FIG. 20). Van Rooyen further discloses comparing each base of a subject's sequenced genetic code, such as in read format, with the various known or generated candidate haplotypes of a reference sequence and determining the probability that any given base at a position being considered either matches or doesn't match the relevant haplotype, e.g., the read includes an SNP, an insertion, or a deletion, thereby resulting in a variation of the base at the position being considered [0299]. Van Rooyen further discloses calculating the probability of every candidate variant [0464-0470]. Van Rooyen further discloses that the bubbles are extracted from the DBG, the sequences aligned to the reference, and from these alignments, specific SNPs, insertions, deletions, and the like may be determined, with respect as to why the sequences of the various bubbles differ from the reference [0417]; reading on limitations of identifying the one or more genomic variants includes: (j) identifying, using the one or more processors, one or more candidate variants based on the second graph representation; and (k) adding, using the one or more processors, the identified one or more candidate variants to the set of one or more genomic variants.
Regarding claim 33, Van Rooyen discloses that, given all of the reads, the DBG may be generated using each reference region as a backbone. Then all of the candidate variant positions can be aligned to universal coordinates. Specifically, FIG. 20 illustrates a flow chart setting forth the process of generating a DBG and using the same to produce candidate haplotypes. More specifically, the De Bruijn graph may be employed in order to create the candidate variant list of SNPs and INDELs. Given that there are N regions that are being jointly processed by MRJD, N De Bruijn graphs may be constructed. In such an instance, every graph may use one reference region as a backbone and all of the reads corresponding to the N regions [0445]. Van Rooyen further discloses that in performing a variant call function, as disclosed herein, an active region identification operation may be implemented, such as for identifying places where multiple reads in a pile up within a given region disagree with the reference, and for generating a window around the identified active region, so that only these regions may be selected for further processing. Additionally, localized haplotype assembly may take place, such as where, for each given active region, all the overlapping reads in the pile up may be assembled into a “De Bruijn graph” (DBG) matrix. From this DBG, various paths through the matrix may be extracted, where each path constitutes a candidate haplotype, e.g., hypotheses, for what the true DNA sequence may be on at least one strand [0273]. Van Rooyen further discloses that the initial starting point is with the root of the tree being G0, and the joint diplotype having all events unresolved.
Then a particular event, e.g., an initial bubble, is selected as the origin for determination, whereby the initial event is to be resolved for all of the haplotypes, where the event may be a first point of divergence from the reference, such as with respect to the potential presence of an SNP or Indel at position one…. representing the potential resolutions for the event at position one [0422].
Peng discloses the IDBA-UD algorithm, which is based on the de Bruijn graph approach for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths (abstract). Peng further discloses that IDBA-UD iterates the value of k from kmin to kmax. In each iteration, an ‘accumulated de Bruijn graph’ Hk for a fixed k is constructed from the set of input reads and the contigs (Ck−s and LCk−s) constructed in previous iterations, i.e., these contigs are treated as input reads for constructing Hk. In each iteration, IDBA-UD also progressively increases the value of depth cutoff thresholds for removing some low-depth contigs so as to get longer confident contigs (Ck) in Hk (pg. 1421, col. 2, para. 3 - pg. 1422, col. 1, para. 1-2; Figure 1); reading on limitations of determining, using the one or more processors, if one or more of the plurality of termination conditions is met; when none of the plurality of termination conditions is met: incrementing, using the one or more processors, the node length by a predefined value; and repeat steps (g)-(j); when one or more of the plurality of termination conditions are met: foregoing repeating steps (g)-(j).
Regarding claim 34, Van Rooyen discloses that when there is disagreement between two references, or between the references and the reads, as to what base is present at given positions being resolved, it must be determined which base actually belongs in which position, and in view of such disagreements it must be determined which differences may be caused by SNPs, Indels, or the like, versus which are machine errors [0419]. Van Rooyen further discloses elimination of equivalent mirror-image representations across regions [0488] FIG. 27; reading on limitations of wherein the problematic cluster includes a reference node having more than one instance in the first graph representation.
Regarding claim 35, Van Rooyen discloses that as the tree grows and is pruned, not all of the hypothesized haplotypes need to be calculated [0418]. Van Rooyen further discloses that if certain nodes end up having small probabilities, as compared to the true node, it is because they are not supported by a majority of the reads, e.g., in the pileup, and thus, these nodes may be pruned off, thereby discarding the nodes of low probabilities, but in a manner that preserves the true node(s) … once the event one position has been determined, the next event position may be determined, and the processes herein described may then be repeated for that new position with respect to any of the surviving nodes that have not heretofore been pruned [0425-0426]; reading on limitations of identifying one or more candidate variants based on the second graph representation; determining if the one or more candidate variants were identified by a previously-iterated second graph representation; and when the one or more candidate variants were previously identified, that the set of genomic variants is not to be updated to include the previously identified one or more candidate variants.
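The grow-and-prune behavior cited from Van Rooyen [0418], [0425-0426] can be sketched as a simple breadth-wise search in Python. The caller-supplied `score` function (returning a log-probability for a partial hypothesis), the relative cutoff `ratio`, and the data layout are hypothetical; the sketch only shows that low-probability nodes are discarded relative to the best node at each event position and are never extended. The `support` values in the usage lines are toy numbers, not data from any reference.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, ...]


def grow_and_prune(events: Sequence[Sequence[str]],
                   score: Callable[[Hypothesis], float],
                   ratio: float = 0.05) -> List[Hypothesis]:
    """Extend surviving hypotheses one event position at a time.

    Each event must offer at least one resolution. After each extension,
    children whose probability is below `ratio` times that of the best child
    are pruned and are never extended at later event positions.
    """
    survivors: List[Hypothesis] = [()]
    for options in events:
        children = [h + (o,) for h in survivors for o in options]
        scored = [(score(c), c) for c in children]
        best = max(s for s, _ in scored)
        cutoff = best + math.log(ratio)  # log(ratio * best probability)
        survivors = [c for s, c in scored if s >= cutoff]
    return survivors


# Toy per-position support for each resolution (illustrative values only).
support = {(0, "ref"): 0.01, (0, "SNP"): 0.99, (1, "ref"): 0.99, (1, "del"): 0.01}


def toy_score(h: Hypothesis) -> float:
    return sum(math.log(support[(i, o)]) for i, o in enumerate(h))


print(grow_and_prune([["ref", "SNP"], ["ref", "del"]], toy_score))
# [('SNP', 'ref')]
```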
In KSR Int'l v. Teleflex, the Supreme Court, in rejecting the rigid application of the teaching, suggestion, and motivation test by the Federal Circuit, indicated that “The principles underlying [earlier] cases are instructive when the question is whether a patent claiming the combination of elements of prior art is obvious. When a work is available in one field of endeavor, design incentives and other market forces can prompt variations of it, either in the same field or a different one. If a person of ordinary skill can implement a predictable variation, § 103 likely bars its patentability.” KSR Int'l v. Teleflex Inc., 127 S. Ct. 1727, 1740 (2007).
Applying the KSR standard to Ashutosh, Van Rooyen and Peng, the examiner concludes that the combination of Ashutosh, Van Rooyen and Peng represents the use of known techniques to improve similar methods. Ashutosh, Van Rooyen and Peng each use De Bruijn graph representations for sequencing data. Ashutosh discloses the specifics of library preparation and sequencing of genomic data and of using such data in constructing a De Bruijn graph. In the same field of research, Van Rooyen and Peng provide the specifics of the De Bruijn graph representations. Specifically, Van Rooyen provides the first and second graph representations, and Peng provides the known conditional iteration of the representation. Combining the sequencing and graphical representation of Ashutosh with the representations of Van Rooyen and the known conditional termination of Peng's representation would have handled sequencing errors by resolving repeats, reducing disassembly, and iteratively pruning bubbles. One of ordinary skill in the art before the effective filing date of the claimed invention would have had a reasonable expectation of success in combining the methods of Ashutosh, Van Rooyen and Peng to achieve more accurate identification of complex variants. Therefore, the invention would have been prima facie obvious to one of skill in the art before the effective filing date of the claimed invention, absent evidence to the contrary.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GHAZAL SABOUR whose telephone number is (703)756-1289. The examiner can normally be reached M-F 7:30-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Larry D. Riggs can be reached at (571) 270-3062. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/G.S./ Examiner, Art Unit 1686
/LARRY D RIGGS II/ Supervisory Patent Examiner, Art Unit 1686