Prosecution Insights
Last updated: April 19, 2026
Application No. 17/697,301

SYSTEMS AND METHODS FOR GENERATING GRAPH REFERENCES

Non-Final OA: §101, §103
Filed: Mar 17, 2022
Examiner: ANDERSON-FEARS, KEENAN NEIL
Art Unit: 1687
Tech Center: 1600 — Biotechnology & Organic Chemistry
Assignee: Seven Bridges Genomics Inc.
OA Round: 1 (Non-Final)
Grant Probability: 6% (At Risk)
Predicted OA Rounds: 1-2
Time to Grant: 5y 1m
Grant Probability With Interview: 56%

Examiner Intelligence

Career Allow Rate: 6% (1 granted / 16 resolved; -53.7% vs TC avg). Grants only 6% of cases.
Interview Lift: +50.0% for resolved cases with interview.
Typical Timeline: 5y 1m average prosecution; 45 applications currently pending.
Career History: 61 total applications across all art units.

Statute-Specific Performance

§101: 32.6% (-7.4% vs TC avg)
§103: 33.2% (-6.8% vs TC avg)
§102: 12.7% (-27.3% vs TC avg)
§112: 15.2% (-24.8% vs TC avg)

Tech Center averages are estimates. Based on career data from 16 resolved cases.

Office Action

Rejections: §101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 08/05/2022, 08/05/2022, and 02/26/2023 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Status

Claims 1-38 are pending. Claims 1-38 are rejected.

Priority

The instant application claims benefit of priority to U.S. Provisional Application 63/162,400, filed March 17, 2021. As such, the effective filing date of claims 1-38 is 03/17/2021.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-38 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas without significantly more. The claims recite a method, system, and CRM for generating a graph reference construct using variant information from genomic information. The judicial exception is not integrated into a practical application because, while claims 1-38 attempt to integrate the exception into a practical application, the additional elements are either generically recited computer elements that do not add a meaningful limitation to the abstract idea, or insignificant extra-solution activities that simply implement the abstract idea on a computer.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the computer elements only store and retrieve information in memory and perform basic calculations that are well-understood, routine, and conventional computer functions, as recognized by the decisions cited in MPEP § 2106.05(d).

Framework with which to Analyze Subject Matter Eligibility:

Step 1: Are the claims directed to a category of statutory subject matter (a process, machine, manufacture, or composition of matter)? [see MPEP § 2106.03]

The claims are directed to statutory subject matter, specifically a method (claims 1-36), a system (claim 37), and a CRM (claim 38). Additionally, claims 2, 6, 8, 10-11, 13-17, 24-25, 27, and 32 contain contingent limitations, with MPEP § 2111.04 providing the following direction: "The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met." While the claims are contingent, the examiner has examined them in order to facilitate compact prosecution.

Step 2A Prong One: Do the claims recite a judicially recognized exception, i.e., an abstract idea, a law of nature, or a natural phenomenon? [see MPEP § 2106.04(a)]

With respect to the Step 2A Prong One evaluation, the instant claims are found herein to recite abstract ideas that fall into the groupings of mental processes and mathematical concepts.
Claim 1: Generating a graph reference construct, filtering variants, identifying a first subset of variants, and identifying a filtered set of variants from the first subset are processes of calculating, comparing/contrasting, and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 2: Determining whether a first length exceeds a threshold, and excluding variants based on said threshold, are processes of calculating, comparing/contrasting, and selecting information that can be performed with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 3: The structural variant being an insertion, and determining whether the first length is at least 5000 bp, are processes of calculating and comparing/contrasting information that can be performed with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 4: The structural variant being a deletion, and determining whether the first length is at least 90000 bp, are processes of calculating and comparing/contrasting information that can be performed with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 5: Aligning the first structural variant to the reference sequence is a process of comparing/contrasting information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 6: Determining whether the reference sequence includes a subsequence, and excluding the first structural variant if it does, are processes of comparing/contrasting and identifying that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.
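For illustration only, the length-based filtering characterized above for claims 2-4 can be sketched in a few lines of Python. The 5000 bp and 90000 bp figures come from the claim language; the variant representation, the field names, and the assumption that variants at or above the threshold are the ones excluded are all editorial assumptions, not the applicant's implementation.

```python
# Hedged sketch of the length-threshold filtering recited in claims 2-4.
# Representation and filter direction are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class StructuralVariant:
    kind: str      # "INS" (insertion) or "DEL" (deletion) -- assumed labels
    length: int    # length of the inserted/deleted sequence, in bp

# Thresholds taken from the claim language (claims 3 and 4).
INSERTION_THRESHOLD = 5_000
DELETION_THRESHOLD = 90_000

def passes_length_filter(sv: StructuralVariant) -> bool:
    """Keep a variant only if its length stays below the claimed threshold."""
    if sv.kind == "INS":
        return sv.length < INSERTION_THRESHOLD
    if sv.kind == "DEL":
        return sv.length < DELETION_THRESHOLD
    return True  # non-structural variants pass through unchanged

variants = [StructuralVariant("INS", 300),
            StructuralVariant("INS", 6_000),
            StructuralVariant("DEL", 120_000)]
kept = [v for v in variants if passes_length_filter(v)]
# kept retains only the 300 bp insertion
```

Nothing here turns on the computer: as the Office Action notes, each comparison could be done by hand, which is the examiner's point in calling these steps mental processes.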
Claim 7: Aligning the first structural variant to one or more variants is a process of comparing/contrasting information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 8: Determining whether a second structural variant includes a subsequence, and excluding the first structural variant or second structural variant if it does, are processes of comparing/contrasting and identifying that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 9: Aligning the first structural variant to a decoy sequence is a process of comparing/contrasting information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 10: Determining whether a decoy sequence includes a subsequence, and masking the decoy sequence if it does, are processes of comparing/contrasting and identifying that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 11: Determining whether the reference sequence construct includes a subsequence, and excluding the first structural variant if it does, are processes of comparing/contrasting and identifying that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 12: Determining whether the first subsequence has a length that is greater than a second specified threshold is a process of comparing/contrasting information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.
Claim 13: Determining whether a second structural variant includes a second subsequence, and excluding one of the first or second structural variant if it does, are processes of comparing/contrasting and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 14: Determining whether the second subsequence has a length greater than a second threshold is a process of comparing/contrasting information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 15: The second specified threshold being at least 150 bp is merely further limiting the data itself, which is an abstract idea, specifically a mental process.

Claim 16: Identifying the shortest variant, and excluding the shortest variant, are processes of comparing/contrasting, identifying, and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 17: Determining whether a decoy sequence associated with the reference sequence construct includes a third sequence, and masking the decoy sequence if it does, are processes of comparing/contrasting, identifying, and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 18: Generating an initial graph reference construct is a process of calculating that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 19: Generating a plurality of graph reads using the initial graph reference construct is a process of comparing/contrasting and calculating information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.
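Claim 16's "identify the shortest variant and exclude it" step is simple enough to sketch directly. How the variants come to be grouped (e.g., by a shared subsequence per claims 13-14) is not modeled here; the grouping and the string representation are assumptions for illustration.

```python
# Hedged sketch of claim 16: drop the single shortest variant from a group
# of variants assumed (per claims 13-14) to share a common subsequence.
def exclude_shortest(group):
    """Return the group without its shortest member (by sequence length)."""
    shortest = min(group, key=len)
    return [v for v in group if v is not shortest]

group = ["ACGTACGT", "ACG", "ACGTAC"]
survivors = exclude_shortest(group)
print(survivors)  # ['ACGTACGT', 'ACGTAC']
```

Again, the operation is a comparison and a selection, which is exactly why the Office Action classifies it as a mental process.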
Claim 20: Generating the first and second subsets of graph reads are processes of comparing/contrasting, identifying, selecting, and calculating information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 21: Traversing the initial graph reference construct with a sliding window with a skip is a process of selecting and identifying information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 22: Aligning at least some of the graph reads to the initial graph reference construct, determining a quality of alignment, and determining whether that quality exceeds a threshold are processes of comparing/contrasting and calculating that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 23: Identifying a first group of the at least some of the plurality of graph reads is a process of identifying information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 24: The first group including a first and second graph read is merely further limiting the data itself, which is an abstract idea, specifically a mental process. Excluding at least one multiply-alignable variant from the filtered set when the quality of a first or second alignment does not exceed a threshold is a process of identifying and selecting information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 25: Including the multiply-alignable variant in the first combination is merely further limiting the data itself, which is an abstract idea, specifically a mental process.
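The "sliding window with a skip" of claim 21 can be sketched as follows. A real traversal would walk paths of the graph reference construct; a linear string stands in here purely for illustration, and the window and skip sizes are invented for the example.

```python
# Hedged sketch of claim 21's sliding-window-with-skip traversal, applied
# to a linear sequence rather than an actual graph path (an assumption).
def sliding_window_with_skip(seq, window, skip):
    """Yield fixed-length windows, advancing by `skip` positions each time."""
    for start in range(0, len(seq) - window + 1, skip):
        yield seq[start:start + window]

reads = list(sliding_window_with_skip("ACGTACGTAC", window=4, skip=3))
print(reads)  # ['ACGT', 'TACG', 'GTAC']
```

Each window would then be realigned to the construct and scored, with low-quality alignments driving the exclusions described for claims 22 and 24.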
Claim 26: Generating an initial graph reference construct, traversing the initial graph reference construct, aligning the plurality of graph reads to the initial graph reference, and excluding at least some of the first set of variants based on the quality of alignment are processes of comparing/contrasting, identifying, selecting, and calculating information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 27: The one or more graph reads being associated with a same combination of first subset of variants is merely further limiting the data itself, which is an abstract idea, specifically a mental process. Determining whether the qualities of alignment are above or below a threshold, and excluding at least one variant if they are below, are processes of comparing/contrasting, identifying, and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 28: Processing some of the alternative sequences, aligning the first alternative sequences to the reference sequence, identifying one or more differences between the first alternative sequence and the reference, and including at least some of the one or more differences as first variants are processes of comparing/contrasting, calculating, identifying, and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 29: Constructing an updated reference sequence to not include the alternative sequences is a process of selecting/updating information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.
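The "graph reference construct" at the heart of these claims is, in the model the examiner later cites from Eizenga et al., a reference-ordered DAG with bubbles at variant sites. A toy construction is sketched below; the node/edge layout, the SNV-only scope, and all names are editorial assumptions, not the claimed method.

```python
# Hedged sketch: build a tiny reference DAG with a "bubble" at each SNV
# site, in the style described by the Eizenga et al. review cited later.
def build_bubble_graph(reference, snvs):
    """Return (nodes, edges): reference segments as nodes, SNV alternates
    as parallel bubble branches. `snvs` maps 0-based position -> alt base."""
    nodes, edges, nid = {}, [], 0
    prev_ids, pos = [], 0
    for site in sorted(snvs):
        if pos < site:                      # linear segment before the site
            nodes[nid] = reference[pos:site]
            edges += [(p, nid) for p in prev_ids]
            prev_ids, nid = [nid], nid + 1
        branch_ids = []                     # bubble: ref base vs. alt base
        for base in (reference[site], snvs[site]):
            nodes[nid] = base
            edges += [(p, nid) for p in prev_ids]
            branch_ids.append(nid)
            nid += 1
        prev_ids, pos = branch_ids, site + 1
    if pos < len(reference):                # trailing linear segment
        nodes[nid] = reference[pos:]
        edges += [(p, nid) for p in prev_ids]
    return nodes, edges

nodes, edges = build_bubble_graph("ACGTA", {2: "T"})
# nodes: {0: 'AC', 1: 'G', 2: 'T', 3: 'TA'}; both branches rejoin at node 3
```

Every path from the first node to the last spells a valid haplotype, which is the property that makes such a construct useful as a read-mapping reference.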
Claim 30: Obtaining an alternative aligned position for the inverted sequence patch is a process of comparing/contrasting, selecting, and calculating information that can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.

Claim 31: Normalizing the first variants with respect to the reference sequence is a verbal articulation of a mathematical process and is therefore an abstract idea, specifically a mathematical concept.

Claim 32: The differences including first and second differences, the first difference associated with a first subsequence of the first alternative sequence and the second difference associated with a second subsequence of the reference sequence construct, are merely further limiting the data itself, which is an abstract idea, specifically a mental process. Processing the first and second differences, determining whether the first subsequence includes one or more regions that are included in the second subsequence, and removing the one or more regions if they are, are processes of comparing/contrasting, identifying, and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 33: The first and second differences comprising insertion and deletion events is merely further limiting the data itself, which is an abstract idea, specifically a mental process.

Claim 34: Including second variants in the plurality of variants is merely further limiting the data itself, which is an abstract idea, specifically a mental process.

Claim 35: Annotating second variants with information is a process of appending information which can be done with pen and paper or within the human mind and is therefore an abstract idea, specifically a mental process.
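Claim 31's "normalizing the first variants with respect to the reference sequence" is not defined in the Office Action; one common meaning in variant processing is left-aligning an indel to its leftmost equivalent position, sketched below under that assumption. The function, its representation of a deletion, and the example are all editorial.

```python
# Hedged sketch of one plausible reading of claim 31's normalization:
# left-align a deletion to its leftmost equivalent reference position.
# The claim does not specify this algorithm; it is an assumption.
def left_align_deletion(ref, pos, length):
    """Shift a deletion of `length` bases at 0-based `pos` leftward while
    the resulting sequence after deletion stays identical."""
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos

# Deleting "CA" at position 3 of GCACACT yields GCACT, the same result as
# deleting at position 1, so the normalized (leftmost) position is 1.
print(left_align_deletion("GCACACT", 3, 2))  # 1
```

Whatever the claimed normalization actually is, the examiner's point stands that it is a deterministic rewriting of positions, hence classified as a mathematical concept.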
Claim 36: Averaging the first and second allele frequencies to obtain an averaged allele frequency is a verbal articulation of a mathematical process and is therefore an abstract idea, specifically a mathematical concept.

Claim 37: Generating a graph reference construct, filtering variants, identifying a first subset of variants, and identifying a filtered set of variants from the first subset are processes of calculating, comparing/contrasting, and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Claim 38: Generating a graph reference construct, filtering variants, identifying a first subset of variants, and identifying a filtered set of variants from the first subset are processes of calculating, comparing/contrasting, and selecting information that can be done with pen and paper or within the human mind and are therefore abstract ideas, specifically mental processes.

Step 2A Prong Two: If the claims recite a judicial exception under prong one, then is the judicial exception integrated into a practical application? [see MPEP § 2106.04(d) and MPEP § 2106.05(a)-(c) & (e)-(h)]

Because the claims do recite judicial exceptions, direction under Step 2A Prong Two provides that the claims must be examined further to determine whether they integrate the abstract ideas into a practical application. The following claims recite the following additional elements in the form of non-abstract elements:

Claim 1: A computing device is a generic and nonspecific element of computers that does not improve the functioning of any computer or technology described herein [See MPEP § 2106.04(d)(1) and MPEP § 2106.05(d)].
Obtaining a plurality of variants associated with a reference sequence construct is an insignificant extra-solution activity, specifically necessary data gathering (see performing clinical tests on individuals to obtain input for an equation, In re Grams, 888 F.2d 835, 839-40, 12 USPQ2d 1824, 1827-28 (Fed. Cir. 1989), and determining the level of a biomarker in blood, Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; see also PerkinElmer, Inc. v. Intema Ltd., 496 Fed. App'x 65, 73, 105 USPQ2d 1960, 1966 (Fed. Cir. 2012) (assessing or measuring data derived from an ultrasound scan, to be used in a diagnosis)) [See MPEP § 2106.05(g)].

Claim 34: Obtaining second variants associated with the reference sequence construct is an insignificant extra-solution activity, specifically necessary data gathering (see In re Grams, 888 F.2d 835, 839-40, 12 USPQ2d 1824, 1827-28 (Fed. Cir. 1989); Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; see also PerkinElmer, Inc. v. Intema Ltd., 496 Fed. App'x 65, 73, 105 USPQ2d 1960, 1966 (Fed. Cir. 2012)) [See MPEP § 2106.05(g)].

Claim 37: A system, computer hardware processor, non-transitory computer readable storage medium, and instructions are generic and nonspecific elements of computers that do not improve the functioning of any computer or technology described herein [See MPEP § 2106.04(d)(1) and MPEP § 2106.05(d)]. Obtaining a plurality of variants associated with a reference sequence construct is an insignificant extra-solution activity, specifically necessary data gathering (see performing clinical tests on individuals to obtain input for an equation, In re Grams, 888 F.2d 835, 839-40, 12 USPQ2d 1824, 1827-28 (Fed. Cir. 1989), and determining the level of a biomarker in blood, Mayo, 566 U.S. at 79, 101 USPQ2d at 1968.
See also PerkinElmer, Inc. v. Intema Ltd., 496 Fed. App'x 65, 73, 105 USPQ2d 1960, 1966 (Fed. Cir. 2012) (assessing or measuring data derived from an ultrasound scan, to be used in a diagnosis)) [See MPEP § 2106.05(g)].

Claim 38: A non-transitory computer readable storage medium, instructions, and computer hardware processor are generic and nonspecific elements of computers that do not improve the functioning of any computer or technology described herein [See MPEP § 2106.04(d)(1) and MPEP § 2106.05(d)]. Obtaining a plurality of variants associated with a reference sequence construct is an insignificant extra-solution activity, specifically necessary data gathering (see In re Grams, 888 F.2d 835, 839-40, 12 USPQ2d 1824, 1827-28 (Fed. Cir. 1989); Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; see also PerkinElmer, Inc. v. Intema Ltd., 496 Fed. App'x 65, 73, 105 USPQ2d 1960, 1966 (Fed. Cir. 2012)) [See MPEP § 2106.05(g)].

Step 2B: If the claims do not integrate the judicial exception, do the claims provide an inventive concept? [see MPEP § 2106.05]

Because the additional claim elements do not integrate the abstract idea into a practical application, the claims are further examined under Step 2B, which evaluates whether the additional elements, individually and in combination, amount to significantly more than the judicial exception itself by providing an inventive concept. The claims do not recite additional elements that are sufficient to amount to significantly more than the judicial exception because the claims recite additional elements that are generic, conventional, nonspecific, or insignificant extra-solution activity.
These additional elements include the following.

The additional elements of a computing device, system, computer hardware processor, non-transitory computer readable storage medium, and instructions are all generic and nonspecific elements of a computer that are well-understood, routine, and conventional within the art and therefore do not improve the functioning of any computer or technology described therein (receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); performing repetitive calculations, Flook, 437 U.S. at 594, 198 USPQ2d at 199 (recomputing or readjusting alarm limit values); and storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)) [See MPEP § 2106.05(d)(II)]. Therefore, taken both individually and as a whole, the additional elements do not amount to significantly more than the judicial exception by providing an inventive concept.

The additional elements of obtaining a plurality of variants associated with a reference sequence construct (conventional: Specification, Page 10, Lines 26-31), and obtaining second variants associated with the reference sequence construct (conventional: Specification, Page 10, Lines 26-31), are insignificant extra-solution activities, specifically mere data gathering (see receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v.
Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network); and storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015)) [See MPEP § 2106.05(g)]. Therefore, taken both individually and as a whole, the additional elements do not amount to significantly more than the judicial exception by providing an inventive concept.

Therefore, claims 1-38, when the limitations are considered individually and as a whole, are rejected under 35 USC § 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2.
Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2-36 contain contingent limitations, with MPEP § 2111.04 providing the following direction: "The broadest reasonable interpretation of a method (or process) claim having contingent limitations requires only those steps that must be performed and does not include steps that are not required to be performed because the condition(s) precedent are not met." While the claims are contingent, the examiner has examined them in order to facilitate compact prosecution.

Claims 1, 5, 7, 18-21, 28-29, 31 and 34-38 are rejected under 35 U.S.C. 103 as being unpatentable over Herman et al. (BMC Bioinformatics (2015) 1-26), Eizenga et al. (Annual Review of Genomics and Human Genetics (2020) 139-162), Rakocevic et al. (Nat Genet (2019) 354-362), and De Summa et al. (BMC Bioinformatics (2017) 57-65).

Claim 1 is directed to a method for generating a graph reference construct from a plurality of variants and a reference sequence through filtering. Claim 37 is directed to a system for generating a graph reference construct from a plurality of variants and a reference sequence through filtering. Claim 38 is directed to a CRM for generating a graph reference construct from a plurality of variants and a reference sequence through filtering.

Herman et al. teaches in the abstract “In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities.
Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased”, on page 2, column 1, paragraph 3 “A common approach to tackling the issue of alignment uncertainty has been to attempt to annotate particular regions of the alignment as unreliable, and to remove these before carrying out subsequent analysis. Filtering methods have in some cases been observed to yield improved inference for phylogenies and positive selection”, and on page 23, column 2, paragraph 2 “Java software implementing the minimum-risk alignment summary algorithm and computation of marginal topology probabilities is available for download at http://statalign.github.io/WeaveAlign. A platform-independent jar archive containing version 1.2.1 of WeaveAlign is included in Additional file 2, along with datasets and example results”, which inherently reads on the generic computer components and CRM.

Eizenga et al. teaches on page 143 in paragraph 1 “Methods to construct pangenomic data structures mirror the classes of pangenomic models. A pangenome may simply be a collection of sequences, in which case construction is similar to the genome or metagenome assembly problem, or it may include information about the alignment of sequences or genomes within it… A pangenome can be represented as a collection of sequences… Rather than collecting unique sequences that represent a collection of genomes, we can consider small variants between the collection and a reference genome. Such a model directly implies a directed acyclic graph, ordered along the reference genome, with bubbles at the sites of variation. This pangenome construction approach is used in diverse graph genome read mappers, including GenomeMapper (124), Seven Bridges’ Graph Genome Pipeline (115), PanVC (Pan-Genomic Variant Calling) (139), and Gramtools (87).
The VG (Variation Graph) toolkit, specifically VG construct (50), can be applied to transform VCF (Variant Call Format) files and reference sequences into genome graphs. Some methods, such as the journaled string tree (114), and methods based on elastic degenerate texts (12), such as SOPanG (Shift-Or for Pan-Genome) (29), transform the variant set and reference into a structure optimized for online sequence queries of the pangenome”.

Rakocevic et al. teaches on page 356, column 2, paragraph 2 “Graph Genome Pipeline (Supplementary Fig. 9) calls variants, including SVs, using a reassembly variant caller and variant call filters, as suggested previously”, in the Figure 3 description on page 358 “We show BWA-GATK results with Hard Filtering…”, and on page 363 “The graph reference genome factors into variant calling in that: (1) k-mers present in the graph are prioritized during a k-mer filtering preprocessing step; (2) variants in the graph are given a higher prior probability. For the benchmarking experiments in the main text, we used the standard Graph Genome Pipeline that employs a set of previously proposed hard variant filters to filter raw variant calls. For the PrecisionFDA Truth Challenge benchmarking experiment only (Supplementary Fig. 14), we used a logistic regression model to filter variants based on the GATK best practices”. The description of the hard filtering process is given in De Summa et al.
and allows for the individual to set filtering thresholds and decide exactly how to perform such steps. As such, in view of the above, the combination reads on: obtaining a plurality of variants associated with a reference sequence construct for at least a portion of a genome; and generating the graph reference construct using the plurality of variants and the reference sequence construct, the generating comprising: filtering the plurality of variants to obtain a filtered set of variants, the filtered set of variants being a subset of the plurality of variants, the filtering comprising a plurality of filtering stages including a first filtering stage and a second filtering stage different from and performed subsequent to the first filtering stage, the first filtering stage comprising identifying a first subset of variants from among the plurality of variants at least in part by excluding one or more structural variants from the plurality of variants, the one or more structural variants including a first structural variant; the second filtering stage comprising identifying the filtered set of variants from among the first subset of variants at least in part by excluding one or more multiply-alignable variants from the first subset of variants; generating the graph reference construct using the filtered set of variants and the reference sequence construct; and outputting the generated graph reference construct.

It would have been obvious at the time of filing to modify the teachings of Herman et al. for the graph reference construct creation with the teachings of Eizenga et al. for the creation of a pangenome and filtering, and with the teachings of Rakocevic et al. for the exact type of filtering that one skilled in the art would want, as Eizenga et al. serves as a review of graph construct methods within the art and all other references are featured within its descriptions of various methods. Additionally, De Summa et al. is cited by Rakocevic et al.
as the primary paper from which their filtering method stems. One would have had a reasonable expectation of success given that Herman et al. specifically says that filtering techniques are commonly used and can improve inference, with Eizenga et al. grouping the papers into similar methods for generating graph constructs or pan genomes, Rakocevic et al. shows an application of such methods using filtering techniques described in De Summa et al. in greater detail. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful. Claim 5 is directed to the method of claim 1 but further specifies the alignment of a structural variant to the reference construct. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eizenga et al. teaches on page 143, paragraph 1 “A pangenome may simply be a collection of sequences, in which case construction is similar to the genome or metagenome assembly problem, or it may include information about the alignment of sequences or genomes within it. This alignment could be compressed into a set of variants found against a set of reference sequences. If this alignment is based on k-mers, then it implies a de Bruijn graph. If it is a complete, gapped alignment, covering small and large variation, then the pangenome model can be thought of as a whole-genome alignment”, reading on wherein identifying the first subset of variants from among the plurality of variants comprises: aligning the first structural variant to the reference sequence construct. Claim 7 is directed to the method of claim 1 but further specifies the alignment of a subset of variants to one or more variants where the variants differ. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Herman et al. 
teaches on page 3, column 2, paragraph 3 “Once a set of plausible alignments has been generated, a common issue that arises is how to represent and/or summarize this set in a useful fashion. In a Bayesian context this entails representing the approximation to the posterior distribution over alignments, given a collection of samples. We shall present here a graph-based formulation that allows for a compact representation of this distribution”, and in the abstract “In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities”, it would be inherent then that any alignment of variants to the DAG would necessarily be an alignment of variants to variants that differ from each other, thereby reading on wherein identifying the first subset of variants from among the plurality of variants comprises: aligning the first structural variant to one or more variants of the plurality of variants, the one or more variants being different from the first structural variant. Claim 18 is directed to the method of claim 1 but further specifies using the subset of variants to generate an initial graph reference. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Herman et al. teaches in the abstract “In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. 
Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased”, on page 2, column 1, paragraph 3 “A common approach to tackling the issue of alignment uncertainty has been to attempt to annotate particular regions of the alignment as unreliable, and to remove these before carrying out subsequent analysis. Filtering methods have in some cases been observed to yield improved inference for phylogenies and positive selection”, and on page 23, column 2, paragraph 2 “Java software implementing the minimum-risk alignment summary algorithm and computation of marginal topology probabilities is available for download at http://statalign.github.io/WeaveAlign. A platform-independent jar archive containing version 1.2.1 of WeaveAlign is included in Additional file 2, along with datasets and example results”, which inherently reads on the generic computer components and CRM. Eizenga et al. teaches on page 143 in paragraph 1 “Methods to construct pangenomic data structures mirror the classes of pangenomic models. A pangenome may simply be a collection of sequences, in which case construction is similar to the genome or metagenome assembly problem, or it may include information about the alignment of sequences or genomes within it… A pangenome can be represented as a collection of sequences… Rather than collecting unique sequences that represent a collection of genomes, we can consider small variants between the collection and a reference genome. Such a model directly implies a directed acyclic graph, ordered along the reference genome, with bubbles at the sites of variation. This pangenome construction approach is used in diverse graph genome read mappers, including GenomeMapper (124), Seven Bridges’ Graph Genome Pipeline (115), PanVC (Pan-Genomic Variant Calling) (139), and Gramtools (87). 
The VG (Variation Graph) toolkit, specifically VG construct (50), can be applied to transform VCF (Variant Call Format) files and reference sequences into genome graphs. Some methods, such as the journaled string tree (114), and methods based on elastic degenerate texts (12), such as SOPanG (Shift-Or for Pan-Genome) (29), transform the variant set and reference into a structure optimized for online sequence queries of the pangenome”. Rakocevic et al. teaches on page 356, column 2, paragraph 2 “Graph Genome Pipeline (Supplementary Fig. 9) calls variants, including SVs, using a reassembly variant caller and variant call filters, as suggested previously”, in Figure 3 description on page 358 “We show BWA-GATK results with Hard Filtering…”, and on page 363 “The graph reference genome factors into variant calling in that: (1) k-mers present in the graph are prioritized during a k-mer filtering preprocessing step; (2) variants in the graph are given a higher prior probability. For the benchmarking experiments in the main text, we used the standard Graph Genome Pipeline that employs a set of previously proposed hard variant filters to filter raw variant calls. For the PrecisionFDA Truth Challenge benchmarking experiment only (Supplementary Fig. 14), we used a logistic regression model to filter variants based on the GATK best practices”. The hard filtering process is described in De Summa et al. and allows the individual to set filtering thresholds and decide exactly how to perform such steps; in view of the above, this reads on wherein identifying the filtered set of variants from among the first subset of variants comprises: generating an initial graph reference construct using at least some of the first subset of variants. Claim 19 is directed to the method of claim 18 and thus claim 1, but further specifies generating graph reads that are associated with paths of the graph reference.
Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Herman et al. teaches in the abstract “In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased”, reading on wherein identifying the filtered set of variants from among the first subset of variants further comprises: generating a plurality of graph reads using the initial graph reference construct, wherein each of at least some of the plurality of graph reads are associated with a respective path in the initial graph reference construct. Claim 20 is directed to the method of claim 19 and thus claim 1, but further specifies generating graph reads over intervals. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Rakocevic et al. teaches on page 363, column 1, paragraph 8, “We use a sliding search window approach to locate substantial spatial clusters of loci belonging to an aggregate of k-mer lists determined for the read. These clusters represent candidate match regions. 
We allow for gaps between loci”, reading on wherein the plurality of graph reads comprise a first subset of graph reads and a second subset of graph reads, and wherein generating the plurality of graph reads comprises: generating the first subset of graph reads by traversing the initial graph reference construct over a first interval; and generating the second subset of graph reads by traversing the initial graph reference construct over a second interval, wherein the first interval and the second interval at least partially overlap. Claim 21 is directed to the method of claim 19 and thus claim 1, but further specifies traversing the graph reference using a sliding window with a skip. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Rakocevic et al. teaches on page 363, column 1, paragraph 8, “We use a sliding search window approach to locate substantial spatial clusters of loci belonging to an aggregate of k-mer lists determined for the read. These clusters represent candidate match regions. We allow for gaps between loci”, reading on wherein generating the plurality of graph reads comprises traversing the initial graph reference construct using a sliding window with a skip. Claim 28 is directed to the method of claim 1 but further specifies aligning sequences, identifying differences, and including those differences as variants. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Herman et al. teaches in the abstract “In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. 
Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased”, on page 2, column 1, paragraph 3 “A common approach to tackling the issue of alignment uncertainty has been to attempt to annotate particular regions of the alignment as unreliable, and to remove these before carrying out subsequent analysis. Filtering methods have in some cases been observed to yield improved inference for phylogenies and positive selection”, and on page 23, column 2, paragraph 2 “Java software implementing the minimum-risk alignment summary algorithm and computation of marginal topology probabilities is available for download at http://statalign.github.io/WeaveAlign. A platform-independent jar archive containing version 1.2.1 of WeaveAlign is included in Additional file 2, along with datasets and example results”, which inherently reads on the generic computer components and CRM. Eizenga et al. teaches on page 143 in paragraph 1 “Methods to construct pangenomic data structures mirror the classes of pangenomic models. A pangenome may simply be a collection of sequences, in which case construction is similar to the genome or metagenome assembly problem, or it may include information about the alignment of sequences or genomes within it… A pangenome can be represented as a collection of sequences… Rather than collecting unique sequences that represent a collection of genomes, we can consider small variants between the collection and a reference genome. Such a model directly implies a directed acyclic graph, ordered along the reference genome, with bubbles at the sites of variation. This pangenome construction approach is used in diverse graph genome read mappers, including GenomeMapper (124), Seven Bridges’ Graph Genome Pipeline (115), PanVC (Pan-Genomic Variant Calling) (139), and Gramtools (87). 
The VG (Variation Graph) toolkit, specifically VG construct (50), can be applied to transform VCF (Variant Call Format) files and reference sequences into genome graphs. Some methods, such as the journaled string tree (114), and methods based on elastic degenerate texts (12), such as SOPanG (Shift-Or for Pan-Genome) (29), transform the variant set and reference into a structure optimized for online sequence queries of the pangenome”. Rakocevic et al. teaches on page 356, column 2, paragraph 2 “Graph Genome Pipeline (Supplementary Fig. 9) calls variants, including SVs, using a reassembly variant caller and variant call filters, as suggested previously”, in Figure 3 description on page 358 “We show BWA-GATK results with Hard Filtering…”, and on page 363 “The graph reference genome factors into variant calling in that: (1) k-mers present in the graph are prioritized during a k-mer filtering preprocessing step; (2) variants in the graph are given a higher prior probability. For the benchmarking experiments in the main text, we used the standard Graph Genome Pipeline that employs a set of previously proposed hard variant filters to filter raw variant calls. For the PrecisionFDA Truth Challenge benchmarking experiment only (Supplementary Fig. 
14), we used a logistic regression model to filter variants based on the GATK best practices”, and it would be inherent to any alignment of a pangenome, particularly those represented as a DAG, that alternative sequences would in fact be variants thereby reading on wherein obtaining the plurality of variants comprises: obtaining a plurality of alternative sequences associated with the reference sequence construct; processing at least some of the plurality of alternative sequences, the processing comprising, for a first alternative sequence of the plurality of alternative sequences: aligning the first alternative sequence to the reference sequence construct to obtain an aligned position; identifying one or more differences between the first alternative sequence and the reference sequence construct at the aligned position; and including at least some of the one or more differences as first variants in the plurality of variants. Claim 29 is directed to the method of claim 28 and thus claim 1, but further specifies constructing an updated reference that does not include alternative sequences. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Rakocevic et al. teaches on page 356, column 2, paragraph 2 “Graph Genome Pipeline (Supplementary Fig. 9) calls variants, including SVs, using a reassembly variant caller and variant call filters, as suggested previously”, in Figure 3 description on page 358 “We show BWA-GATK results with Hard Filtering…”, and on page 363 “The graph reference genome factors into variant calling in that: (1) k-mers present in the graph are prioritized during a k-mer filtering preprocessing step; (2) variants in the graph are given a higher prior probability. For the benchmarking experiments in the main text, we used the standard Graph Genome Pipeline that employs a set of previously proposed hard variant filters to filter raw variant calls. 
For the PrecisionFDA Truth Challenge benchmarking experiment only (Supplementary Fig. 14), we used a logistic regression model to filter variants based on the GATK best practices”, and it would be inherent to any preprocessing that included filtering that any filtered sequences would not be used to construct the reference thereby reading on further comprising, after processing the at least some of the plurality of alternative sequences, constructing an updated reference sequence construct that does not include the plurality of alternative sequences. Claim 31 is directed to the method of claim 28 and thus claim 1, but further specifies normalizing the variants. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Herman et al. teaches on page 12, column 2, paragraph 3 “One other class of loss function worth mentioning here is the so-called modeller version of each of the aforementioned scores, Lmf(A||A’), which involve normalizing Lf (A||A’) by the length of the predicted alignment”, reading on further comprising left normalizing the first variants with respect to the reference sequence construct before including the first variants in the plurality of variants. Claim 34 is directed to the method of claim 28 and thus claim 1, but further specifies obtaining additional variants. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eizenga et al. teaches in Figure 1 the representation of multiple variants within the graph structure “In a de Bruijn graph, sequences are represented without bias, but variants may correspond to larger graph structures. (ii) An acyclic sequence graph is equivalent to the multiple sequence alignment. 
(iii) A generic sequence graph can compactly represent a structural variant (shown in orange), using edges between the forward and reverse strands of the graph to indicate the presence of an inversion”, reading on wherein obtaining the plurality of variants further comprises: obtaining second variants associated with the reference sequence construct; and including the second variants in the plurality of variants. Claim 35 is directed to the method of claim 34 and thus claim 1, but further specifies annotating the variants with information. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eizenga et al. teaches on page 142, paragraph 5 “Variation graphs further structure this model by embedding the linear sequences of the pangenome as paths (50). [Variation graphs are similar to the variant graphs used in textual research to model a collection of revisions of the same text (123).] Paths provide a stable coordinate system that is unaffected by the manner in which the graph was built, thus supporting the coordination of positions, annotations, and alignments between variation graphs and linear reference genomes”, reading on further comprising annotating the second variants with information indicative of sources of the second variants. Claim 36 is directed to the method of claim 34 and thus claim 1, but further specifies variants having allele frequencies and calculating an average allele frequency. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Rakocevic et al.
teaches in Figure 1, section C, both individual and combined allele frequencies, reading on wherein at least some of the first variants are associated respectively with first allele frequencies and at least some of the second variants are associated respectively with second allele frequencies; and further comprising, for a shared variant included in both the at least some of the first variants and the at least some of the second variants, averaging the first and second allele frequencies associated with the shared variant to obtain an averaged allele frequency. Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Herman et al. (BMC Bioinformatics (2015) 1-26), Eizenga et al. (Annual Review Genomics and Human Genetics (2020) 139-162), Rakocevic et al. (Nat Genet (2019) 354-362), and De Summa et al. (BMC Bioinformatics (2017) 57-65) as applied to claims 1 and 37-38 above, and further in view of Duan et al. (Genome Biology (2019) 1-11). Claim 9 is directed to the method of claim 1 but further specifies alignment of variants to a decoy sequence. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Duan et al. teaches on page 8, column 2, paragraph 2 “We explored the repetitive elements of these sequences by RepeatMasker and compared them with that of reference genome (both the primary assembly sequences and decoy sequences (hs38d1)) to characterize the compositions of repetitive sequences in fully unaligned sequences.
Finally, we aligned these fully unaligned sequences to the patch sequence, alternative loci and decoy sequences (hs38d1) as well as existing assembled individual genomes to determinate whether the fully unaligned sequences could be identified in other individuals”, reading on wherein identifying the first subset of variants from among the plurality of variants comprises: aligning the first structural variant to a decoy sequence associated with the reference sequence construct. It would have been obvious at the time of first filing to have modified the teachings of Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. for the method of claims 1 and 37-38, with the teachings of Duan et al. for alignment of variants to decoy sequences as the latter provides a pipeline for the generation of a pangenome as well as states in the conclusion “HUPAN is a useful tool for capturing complexity of the human genome, and the constructed pangenome can be an important resource for a wide range of human genome-related biomedical studies”. One would have had a reasonable expectation of success given that the software is freely available and it is merely adding a known technique (decoy alignment) into an existing process (filtering). Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful. Claims 2-4, 6, 8, 11-16, 22-27, and 32-33 are rejected under 35 U.S.C. 103 as being unpatentable over Herman et al. (BMC Bioinformatics (2015) 1-26), Eizenga et al. (Annual Review Genomics and Human Genetics (2020) 139-162), Rakocevic et al. (Nat Genet (2019) 354-362), and De Summa et al. (BMC Bioinformatics (2017) 57-65) as applied to claims 1 and 37-38 above, and further in view of Eren et al. (PLoS ONE (2013) 1-6). Claim 2 is directed to the method of claim 1 but further specifies determining a length of the variant and comparing it with a threshold to determine whether or not to remove said variant.
Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Rakocevic et al. teaches on page 356, column 2, paragraph 2 “Graph Genome Pipeline (Supplementary Fig. 9) calls variants, including SVs, using a reassembly variant caller and variant call filters, as suggested previously”, in Figure 3 description on page 358 “We show BWA-GATK results with Hard Filtering…”, and on page 363 “The graph reference genome factors into variant calling in that: (1) k-mers present in the graph are prioritized during a k-mer filtering preprocessing step; (2) variants in the graph are given a higher prior probability. For the benchmarking experiments in the main text, we used the standard Graph Genome Pipeline that employs a set of previously proposed hard variant filters to filter raw variant calls. For the PrecisionFDA Truth Challenge benchmarking experiment only (Supplementary Fig. 14), we used a logistic regression model to filter variants based on the GATK best practices”, the description of the hard filtering process is given in De Summa et al. and allows for the individual to set filtering thresholds and decide exactly how they want to perform such steps. Eren et al. teaches on page 2, column 2, paragraph 2 “To compare results of complete overlap quality filtering with quality-based filtering, we analyzed raw sequence data using two recently published methods that rely on Q-scores. We used the method described by Bokulich et al. with suggested parameters p= 0.75, q=3, r=3, and n = 0, where p defines the ratio of the trimmed read length relative to the original read length…”, reading on determining whether a first length of the first structural variant exceeds a first specified threshold; and upon determining that the first length exceeds the first specified threshold, excluding the first structural variant from the plurality of variants. 
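For illustration only, the claim 2 limitation mapped above — determining whether a structural variant's length exceeds a specified threshold and excluding the variant when it does — can be sketched as below. The record fields, the length convention, and the threshold value are all hypothetical and are not drawn from the application or from any cited reference.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    """Hypothetical variant record; the field names are illustrative only."""
    ref: str            # reference allele
    alt: str            # alternate allele
    is_structural: bool

def variant_length(v: Variant) -> int:
    # Treat the longer allele as the variant's length, a common
    # convention for indel-style records (an assumption here).
    return max(len(v.ref), len(v.alt))

def exclude_long_structural_variants(variants, threshold):
    """Drop structural variants whose length exceeds `threshold`;
    keep all other variants unchanged."""
    return [v for v in variants
            if not (v.is_structural and variant_length(v) > threshold)]

variants = [
    Variant("A", "T", is_structural=False),              # SNP, never excluded here
    Variant("A", "A" + "G" * 6000, is_structural=True),  # ~6 kb insertion
    Variant("ACGT", "A", is_structural=True),            # small deletion
]
kept = exclude_long_structural_variants(variants, threshold=5000)
# Only the ~6 kb insertion exceeds the 5000 bp threshold and is removed.
assert len(kept) == 2
```

A real pipeline would read such records from a VCF file rather than constructing them inline; the sketch only shows that the recited step reduces to a single length comparison per variant.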
It would have been obvious at the time of first filing to have modified the teachings of Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. for the method of claim 1 with the teachings of Eren et al. for the use of length in filtering as the latter teaches in the abstract “Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes”. One would have had a reasonable expectation of success given that the former references allow for filtering using any method the user chooses, and this is just another method. Therefore, it would have been obvious at the time of first filing to have modified the teachings of each and to be successful. Claim 3 is directed to the method of claim 2 and thus claim 1, but further specifies the use of a hard threshold of 5000 bp. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Rakocevic et al. teaches in the abstract “Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels)”. Eren et al. teaches on page 2, column 2, paragraph 2 “To compare results of complete overlap quality filtering with quality-based filtering, we analyzed raw sequence data using two recently published methods that rely on Q-scores. We used the method described by Bokulich et al. with suggested parameters p= 0.75, q=3, r=3, and n = 0, where p defines the ratio of the trimmed read length relative to the original read length…”. It would have been obvious at the time of first filing to optimize the length threshold and to try multiple filters, as the user can implement filtering based on whichever criteria they wish, drawing on the many parameters governing sequence alignment, including length, to optimize the graph reference construct being generated.
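Purely as a hypothetical sketch, the two-stage structure recited in claim 1 — a first stage excluding structural variants by length (with a tunable threshold of the kind discussed for claims 2-3) and a subsequent second stage excluding multiply-alignable variants — might look like the following. The substring-count check here is a placeholder standing in for whatever alignment-based multiplicity test an implementation would actually use; none of the names or values come from the application or the cited references.

```python
def is_multiply_alignable(allele: str, reference: str) -> bool:
    # Placeholder check (an assumption): an allele whose sequence occurs
    # at more than one reference locus could align to multiple positions.
    return reference.count(allele) > 1

def two_stage_filter(variants, reference, sv_length_threshold):
    # Stage 1: exclude structural variants over the length threshold.
    stage1 = [(ref, alt, is_sv) for (ref, alt, is_sv) in variants
              if not (is_sv and max(len(ref), len(alt)) > sv_length_threshold)]
    # Stage 2: exclude variants whose alternate allele is multiply alignable.
    return [(ref, alt, is_sv) for (ref, alt, is_sv) in stage1
            if not is_multiply_alignable(alt, reference)]

reference = "ACTTACTTGCA"     # toy reference sequence
variants = [
    ("A", "G", False),        # SNP; "G" occurs once in the toy reference
    ("A", "A" * 9001, True),  # long SV, removed at stage 1 (9001 > 5000)
    ("C", "ACTT", True),      # "ACTT" occurs twice, removed at stage 2
]
kept = two_stage_filter(variants, reference, sv_length_threshold=5000)
assert kept == [("A", "G", False)]
```

In a full implementation the surviving variants would then be used to generate and output the graph reference construct, per the remaining limitations of claim 1.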
Claim 4 is directed to the method of claim 2 and thus claim 1, but further specifies the use of a hard threshold of 90000 bp. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Rakocevic et al. teaches in the abstract “Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels)”. Eren et al. teaches on page 2, column 2, paragraph 2 “To compare results of complete overlap quality filtering with quality-based filtering, we analyzed raw sequence data using two recently published methods that rely on Q-scores. We used the method described by Bokulich et al. with suggested parameters p= 0.75, q=3, r=3, and n = 0, where p defines the ratio of the trimmed read length relative to the original read length…”. It would have been obvious at the time of first filing to optimize the length threshold and to try multiple filters, as the user can implement filtering based on whichever criteria they wish, based on many parameters governing sequence alignment, including length, to optimize the graph reference construct being generated. Claim 6 is directed to the method of claim 1 but further specifies the matching subsequences and the potential dropping if the subsequences match. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al. 
teaches on page 2, columns 1-2, paragraph 2 “The required quality control operations for each paired-end sequencing read include: 1) compute the reverse complement for the second read; 2) Within the last 30 nt of both the forward read and the reverse complemented second read search for the initial 6 nucleotides of the distal V6 primer; 3) Discard sequence pairs that do not perfectly match the initial 6 nt of distal primer in both reads; 4) Within the initial 40 nt of both the forward read and the reverse complemented second read search for the initial 10 nucleotides that matches a consensus sequence for the four proximal primers (967F-AQ, 967F-UC3, 967F-PP and 967F-UC12; see Table S1 for details); 5) If either of the reads fail to match the consensus, discard the pair; 6) Trim the proximal (including its barcode) and distal primers from each read; 7) Retain the sequence pair as a quality-passed V6 sequence if they share 100% consensus between the primer sites”, reading on wherein identifying the first subset of variants from among the plurality of variants comprises: determining whether the reference sequence construct includes a subsequence, wherein the subsequence is identical to at least a portion of the first structural variant; and upon determining that the reference sequence construct includes the subsequence, excluding the first structural variant from the plurality of variants. Claim 8 is directed to the method of claim 1 but further specifies the matching subsequences and the potential dropping if the subsequences match. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al. 
teaches on page 2, columns 1-2, paragraph 2 “The required quality control operations for each paired-end sequencing read include: 1) compute the reverse complement for the second read; 2) Within the last 30 nt of both the forward read and the reverse complemented second read search for the initial 6 nucleotides of the distal V6 primer; 3) Discard sequence pairs that do not perfectly match the initial 6 nt of distal primer in both reads; 4) Within the initial 40 nt of both the forward read and the reverse complemented second read search for the initial 10 nucleotides that matches a consensus sequence for the four proximal primers (967F-AQ, 967F-UC3, 967F-PP and 967F-UC12; see Table S1 for details); 5) If either of the reads fail to match the consensus, discard the pair; 6) Trim the proximal (including its barcode) and distal primers from each read; 7) Retain the sequence pair as a quality-passed V6 sequence if they share 100% consensus between the primer sites”, reading on wherein identifying the first subset of variants from among the plurality of variants comprises: determining whether a second structural variant includes a subsequence, wherein the subsequence is identical to at least a portion of the first structural variant; and upon determining that the second structural variant includes the subsequence, excluding one of the first structural variant or the second structural variant from the plurality of variants. Claim 11 is directed to the method of claim 1 but further specifies determining inclusion of subsequences and removal of variants if subsequences are found. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al. 
teaches the paired-end quality-control operations quoted above (page 2, columns 1-2, paragraph 2), reading on wherein identifying the first subset of variants from among the plurality of variants further comprises, upon determining that the first length does not exceed the first specified threshold: determining whether the reference sequence construct includes a first subsequence, wherein the first subsequence is identical to at least a first portion of the first structural variant; and upon determining that the reference sequence construct includes the first subsequence, excluding the first structural variant from the plurality of variants. Claim 12 is directed to the method of claim 11 and thus claim 1, but further specifies determining whether sequence length is greater than a threshold. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al.
teaches on page 2, column 2, paragraph 2 “To compare results of complete overlap quality filtering with quality-based filtering, we analyzed raw sequence data using two recently published methods that rely on Q-scores. We used the method described by Bokulich et al. with suggested parameters p= 0.75, q=3, r=3, and n = 0, where p defines the ratio of the trimmed read length relative to the original read length…”, reading on wherein determining whether the reference sequence construct includes the first subsequence comprises determining whether the first subsequence has a length that is greater than a second specified threshold. Claim 13 is directed to the method of claim 11 and thus claim 1, but further specifies determining the presence of a second subsequence and filtering it out if it is present. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al. teaches the paired-end quality-control operations quoted above (page 2, columns 1-2, paragraph 2), reading on further comprising: upon determining that the reference sequence construct
does not include the first subsequence, determining whether a second structural variant includes a second subsequence, wherein the second subsequence is identical to at least a second portion of the first structural variant; and upon determining that the second structural variant includes the second subsequence, excluding one of the first structural variant or the second structural variant from the plurality of variants. Claim 14 is directed to the method of claim 13 and thus claim 1, but further specifies determining whether the second subsequence has a length greater than a specified threshold. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al. teaches the Q-score-based filtering parameters quoted above (page 2, column 2, paragraph 2), reading on wherein determining whether the second structural variant includes the second subsequence comprises determining whether the second subsequence has a length that is greater than the second specified threshold. Claim 15 is directed to the method of claim 14 and thus claim 1, but further specifies that the threshold is at least 150 bp. Eren et al. teaches the same Q-score-based filtering parameters quoted above (page 2, column 2, paragraph 2).
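The quality-control workflow quoted from Eren et al. (reverse-complement the second read, require the distal primer start and the proximal consensus within fixed windows, discard non-matching pairs) can be illustrated with a short sketch. This is an illustrative assumption only: the primer sequences and function names below are hypothetical placeholders, not the actual V6 primers from the reference.

```python
# Minimal sketch of the quoted paired-end quality-control steps.
# Primer sequences here are hypothetical placeholders, not the actual
# V6 primers described by Eren et al.

COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]

def qc_pass(read1: str, read2: str,
            distal6: str = "GGATTC",        # hypothetical first 6 nt of distal primer
            proximal10: str = "ACGTACGTAC"  # hypothetical proximal consensus
            ) -> bool:
    """Apply the quoted steps 1-5: reverse-complement read 2, require the
    distal primer start within the last 30 nt of both reads, and require
    the proximal consensus within the first 40 nt of both reads."""
    r2 = revcomp(read2)                      # step 1
    for read in (read1, r2):
        if distal6 not in read[-30:]:        # steps 2-3: discard on mismatch
            return False
        if proximal10 not in read[:40]:      # steps 4-5: discard on mismatch
            return False
    return True
```

Trimming of the primers and the 100% overlap consensus check (steps 6-7) would follow the same pattern on the retained pairs.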
It would have been obvious at the time of first filing to optimize the length threshold and to try multiple filters, as the user can implement filtering based on any of the many parameters governing sequence alignment, including length, to optimize the graph reference construct being generated. Claim 16 is directed to the method of claim 13 and thus claim 1, but further specifies identifying the shortest variant and excluding it. Eren et al. teaches the paired-end quality-control operations quoted above (page 2, columns 1-2, paragraph 2) and the Q-score-based filtering parameters quoted above (page 2, column 2, paragraph 2), reading on wherein excluding one of the first structural variant or the second structural variant from the plurality of variants comprises: identifying a shortest variant from among the first structural variant and the second structural variant; and excluding the shortest variant from the plurality of variants. Claim 22 is directed to the method of claim 19 and thus claim 1, but further specifies determining whether a quality of alignment between the read and reference exceeds a threshold. Eren et al. teaches the same passages, reading on further comprising aligning at least some of the plurality of graph reads to the initial graph reference construct, the aligning comprising, for each graph read of the at least some of the plurality of graph reads: determining a quality of alignment between the graph read and the graph reference construct; and determining whether the quality of alignment exceeds a threshold. Claim 23 is directed to the method of claim 22 and thus claim 1, but further specifies identifying reads that contain one or more variants. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Herman et al. teaches in the abstract “In this work we present a framework for representing a set of sampled alignments as a directed acyclic graph (DAG) whose nodes are alignment columns; each path through this DAG then represents a valid alignment. Since the probabilities of individual columns can be estimated from empirical frequencies, this approach enables sample-based estimation of posterior alignment probabilities. Moreover, due to conditional independencies between columns, the graph structure encodes a much larger set of alignments than the original set of sampled MSAs, such that the effective sample size is greatly increased”, reading on further comprising identifying a first group of the at least some of the plurality of graph reads, wherein each graph read included in the first group of the at least some of the plurality of graph reads includes a first combination of one or more variants of the first subset of variants. Claim 24 is directed to the method of claim 23 and thus claim 1, but further specifies determining quality of alignments for graph reads, comparing them to thresholds and excluding them if they exceed said thresholds.
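The subsequence-sharing filters recited in claims 13-16 (when two structural variants share a sufficiently long subsequence, drop the shorter one) can be sketched as follows. The helper names and the default threshold are illustrative assumptions, not taken from the record; the claimed threshold of at least 150 bp would be passed as the `threshold` parameter.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Dynamic-programming longest common substring, O(len(a) * len(b))."""
    best, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i
        prev = cur
    return a[best_end - best:best_end]

def filter_pair(var1: str, var2: str, threshold: int = 150) -> list:
    """If the two variants share a subsequence longer than the threshold,
    exclude the shortest variant (claim 16 style); otherwise keep both."""
    shared = longest_common_substring(var1, var2)
    if len(shared) > threshold:
        return [max(var1, var2, key=len)]
    return [var1, var2]
```

The same containment test against the reference sequence construct (claims 11-12) reduces to `longest_common_substring(variant, reference)` with the variant excluded when the shared portion exceeds the threshold.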
Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al. teaches the paired-end quality-control operations and the Q-score-based filtering parameters quoted above (page 2). Herman et al. teaches the DAG alignment framework quoted above (abstract), which in view of the above reads on wherein the first group of the at least some of the plurality of graph reads includes a first graph read and a second graph read; and further comprising: upon determining that neither a first quality of alignment determined for the first graph read nor a second quality of alignment determined for the second graph read exceeds the specified threshold, excluding at least one multiply-alignable variant from the filtered set of variants. Claim 25 is directed to the method of claim 24 and thus claim 1, but further specifies that a multiply-alignable variant be included in the variants. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Herman et al. teaches the DAG alignment framework quoted above (abstract); it is inherent that reads for a graph reference in a pangenome contain variants which are alignable to multiple paths of said graph, with each read containing variant(s) of the genetic sequence, thereby reading on wherein the at least one multiply-alignable variant is included in the first combination of the one or more variants.
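The claim 26-27 style procedure (generate reads by traversing an initial graph reference, score each read's alignment, and exclude variants whose reads all align below a threshold) might look like the following toy sketch. The graph model (reference plus single-base alternatives), the match-fraction "quality," and all names are deliberately simplified assumptions for illustration, not the claimed implementation.

```python
from itertools import product

def graph_reads(reference, variants):
    """Traverse a toy graph (linear reference plus single-base alternative
    alleles at given positions) and emit every path sequence together with
    the set of variant positions it uses."""
    reads = []
    positions = sorted(variants)
    for choices in product([False, True], repeat=len(positions)):
        seq = list(reference)
        used = set()
        for pos, take_alt in zip(positions, choices):
            if take_alt:
                seq[pos] = variants[pos]
                used.add(pos)
        reads.append((frozenset(used), "".join(seq)))
    return reads

def alignment_quality(read, reference):
    """Toy quality score: fraction of matching bases against the reference."""
    matches = sum(a == b for a, b in zip(read, reference))
    return matches / max(len(read), len(reference))

def filtered_variants(reference, variants, threshold):
    """Keep a variant only if at least one graph read containing it aligns
    at or above the threshold; otherwise exclude it (claim 27 style)."""
    kept = set()
    for pos in variants:
        quals = [alignment_quality(seq, reference)
                 for used, seq in graph_reads(reference, variants) if pos in used]
        if any(q >= threshold for q in quals):
            kept.add(pos)
    return kept
```

A real graph aligner would score paths through the graph itself rather than against the linear reference; the sketch only mirrors the generate-traverse-align-exclude flow of the claim.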
Claim 26 is directed to the method of claim 1 but further specifies the generation of a graph reference and the aligning of reads while excluding other reads depending on the quality of the alignment. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al. teaches the paired-end quality-control operations and the Q-score-based filtering parameters quoted above (page 2). Herman et al. teaches the DAG alignment framework quoted above (abstract), which in view of the above reads on wherein identifying the filtered set of variants from among the first subset of variants comprises: generating an initial graph reference construct using the first subset of variants; traversing the initial graph reference construct to generate a plurality of graph reads; aligning the plurality of graph reads to the initial graph reference construct to determine qualities of alignment for each of at least some of the plurality of graph reads; and excluding at least some of the one or more of the first set variants from the second set of variants based on the qualities of alignment. Claim 27 is directed to the method of claim 26 and thus claim 1, but further specifies the use of a threshold value for quality of alignment. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al.
teaches the paired-end quality-control operations and the Q-score-based filtering parameters quoted above (page 2). Herman et al. teaches the DAG alignment framework quoted above (abstract), which in view of the above reads on wherein one or more of the plurality of graph reads are associated with a same combination of one or more of the first subset of variants; and further comprising: determining whether each of the qualities of alignment determined for the one or more of the plurality of graph reads is below a specified threshold; and upon determining that each of the qualities of alignment is below the specified threshold, excluding at least one variant from the filtered set of variants. Claim 32 is directed to the method of claim 28 and thus claim 1, but further specifies identifying differences between subsequences and, if subsequences are shared, removing them. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al.
teaches the paired-end quality-control operations quoted above (page 2, columns 1-2, paragraph 2), reading on wherein the at least some of the one or more differences include consecutive first and second differences, wherein the first difference is associated with a first subsequence of the first alternative sequence, and wherein the second difference is associated with a second subsequence of the reference sequence construct; and further comprising processing the first and second differences before including them as first variants in the plurality of variants, the processing comprising: determining whether the first subsequence includes one or more regions that are included in the second subsequence; and upon determining that the first subsequence includes the one or more regions that are included in the second subsequence, removing the one or more regions from both the first and second subsequences. Claim 33 is directed to the method of claim 32 and thus claim 1, but further specifies that the differences comprise insertion and deletion events.
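The region-removal step recited in claim 32 resembles standard indel normalization, in which regions shared by the two subsequences (here, the common prefix and suffix of the insertion and deletion alleles) are trimmed from both before the difference is recorded as a variant. A minimal sketch, with illustrative names:

```python
def trim_shared(first_sub: str, second_sub: str):
    """Remove regions shared at the ends of a pair of difference
    subsequences, as in common indel normalization: trim the shared
    prefix, then the shared suffix, from both alleles."""
    # shared prefix
    p = 0
    while p < min(len(first_sub), len(second_sub)) and first_sub[p] == second_sub[p]:
        p += 1
    a, b = first_sub[p:], second_sub[p:]
    # shared suffix
    s = 0
    while s < min(len(a), len(b)) and a[-1 - s] == b[-1 - s]:
        s += 1
    if s:
        a, b = a[:-s], b[:-s]
    return a, b
```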
Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Rakocevic et al. teaches in the abstract “Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels)”. Herman et al. teaches the DAG alignment framework quoted above (abstract), which in view of the above reads on wherein the first and second differences respectively comprise insertion and deletion events. Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Herman et al. (BMC Bioinformatics (2015) 1-26), Eizenga et al. (Annual Review Genomics and Human Genetics (2020) 139-162), Rakocevic et al. (Nat Genet (2019) 354-362), and De Summa et al. (BMC Bioinformatics (2017) 57-65) as applied to claims 1 and 37-38 above, and further in view of Eren et al. (PloS one (2013) 1-6) and Duan et al. (Genome Biology (2019) 1-11). Claim 10 is directed to the method of claim 1 but further specifies determining subsequence similarity to a decoy sequence and subsequently masking the decoy if similarity is found. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Eren et al.
teaches the paired-end quality-control operations quoted above (page 2, columns 1-2, paragraph 2). Duan et al. teaches on page 8, column 2, paragraph 2 “We explored the repetitive elements of these sequences by RepeatMasker and compared them with that of reference genome (both the primary assembly sequences and decoy sequences (hs38d1)) to characterize the compositions of repetitive sequences in fully unaligned sequences. Finally, we aligned these fully unaligned sequences to the patch sequence, alternative loci and decoy sequences (hs38d1) as well as existing assembled individual genomes to determinate whether the fully unaligned sequences could be identified in other individuals”, reading on wherein identifying a first subset of variants from among the plurality of variants comprises: determining whether a decoy sequence associated with the reference sequence construct includes a subsequence, wherein the subsequence is identical to at least a portion of the first structural variant; and upon determining that the decoy sequence includes the subsequence, masking the decoy sequence.
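The decoy-masking limitation of claim 10 (mask a decoy sequence that contains a subsequence of the structural variant, so reads supporting the variant are not absorbed by the decoy) can be sketched as follows. The minimum-length cutoff and the mask-with-'N' convention are assumptions for illustration, not drawn from the cited references.

```python
def mask_decoy(decoy: str, variant: str, min_len: int = 30) -> str:
    """If the decoy contains a subsequence of the structural variant of at
    least min_len bases, mask the longest such region with 'N' characters;
    otherwise return the decoy unchanged."""
    # search from the longest candidate subsequence down to min_len
    for length in range(len(variant), min_len - 1, -1):
        for start in range(len(variant) - length + 1):
            sub = variant[start:start + length]
            idx = decoy.find(sub)
            if idx != -1:
                return decoy[:idx] + "N" * length + decoy[idx + length:]
    return decoy
```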
It would have been obvious at the time of first filing to have modified the teachings of Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. for the method of claims 1 and 37-38 with the teachings of Eren et al. for the use of length in filtering, as the latter teaches in the abstract “Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes”, and with the teachings of Duan et al. for alignment of variants to decoy sequences, as the latter provides a pipeline for the generation of a pangenome and states in the conclusion “HUPAN is a useful tool for capturing complexity of the human genome, and the constructed pangenome can be an important resource for a wide range of human genome-related biomedical studies”. One would have had a reasonable expectation of success given that the software is freely available and it is merely the addition of a known technique (decoy alignment) into an existing process (filtering); the former references allow for filtering using any method the user chooses, and this is simply another such method. Therefore, it would have been obvious at the time of first filing to have combined these teachings with a reasonable expectation of success. Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Herman et al. (BMC Bioinformatics (2015) 1-26), Eizenga et al. (Annual Review Genomics and Human Genetics (2020) 139-162), Rakocevic et al. (Nat Genet (2019) 354-362), De Summa et al. (BMC Bioinformatics (2017) 57-65) and Eren et al. (PloS one (2013) 1-6) as applied to claims 1, 11, and 13 above, and further in view of Duan et al. (Genome Biology (2019) 1-11). Claim 17 is directed to the method of claim 13 and thus claim 1, but further specifies identifying a decoy sequence that includes a subsequence identical to a variant and, if found, masking said sequence. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al.
teach the method of claim 1 as previously described. Eren et al. teaches the paired-end quality-control operations quoted above (page 2, columns 1-2, paragraph 2). Duan et al. teaches the decoy-sequence analysis quoted above (page 8, column 2, paragraph 2), reading on further comprising: upon determining that the second structural variant does not include the second subsequence, determining whether a decoy sequence associated with the reference sequence construct includes a third subsequence, wherein the third subsequence is identical to at least a third portion of the first structural variant; and upon determining that the decoy sequence includes the third subsequence, masking the decoy sequence. It would have been obvious at the time of first filing to have modified the teachings of Herman et al., Eizenga et al., Rakocevic et al., De Summa et al., and Eren et al. for the method of claim 13, including the discarding of sequences that match other sequences, with the teachings of Duan et al. for alignment of variants to decoy sequences, as the latter provides a pipeline for the generation of a pangenome and states in the conclusion “HUPAN is a useful tool for capturing complexity of the human genome, and the constructed pangenome can be an important resource for a wide range of human genome-related biomedical studies”. One would have had a reasonable expectation of success given that the software is freely available and it is merely the addition of a known technique (decoy alignment) into an existing process (filtering); the former references allow for filtering using any method the user chooses, and this is simply another such method. Therefore, it would have been obvious at the time of first filing to have combined these teachings with a reasonable expectation of success. Claim 30 is rejected under 35 U.S.C. 103 as being unpatentable over Herman et al. (BMC Bioinformatics (2015) 1-26), Eizenga et al. (Annual Review Genomics and Human Genetics (2020) 139-162), Rakocevic et al.
(Nat Genet (2019) 354-362), and De Summa et al. (BMC Bioinformatics (2017) 57-65) as applied to claims 1, 28, and 37-38 above, and further in view of Schoniger et al. (Bulletin of Mathematical Biology (1992) 521–536). Claim 30 is directed to the method of claim 28 and thus claim 1, but further specifies obtaining an alternative aligned position for the inverted sequence patch. Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. teach the method of claim 1 as previously described. Schoniger et al. teaches in the abstract “A dynamic programming algorithm to find all optimal alignments of DNA subsequences is described. The alignments use not only substitutions, insertions and deletions of nucleotides but also inversions (reversed complements) of substrings of the sequences. The inversion alignments themselves contain substitutions, insertions and deletions of nucleotides. We study the problem of alignment with non-intersecting inversions. To provide a computationally efficient algorithm we restrict candidate inversions to the K highest scoring inversions”, reading on wherein the first alternative sequence includes an inverted sequence patch; and wherein aligning the first alternative sequence to the reference sequence construct to obtain the aligned position comprises obtaining an alternative aligned position for the inverted sequence patch. It would have been obvious at the time of first filing to modify the teaching of Herman et al., Eizenga et al., Rakocevic et al., and De Summa et al. for the method of claims 1 and 37-38 with the teachings of Schoniger et al. for the use of inversions, as the latter is a method for incorporating inversions into alignments, a well-understood and necessary part of alignment.
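As background for the alignment technique cited from Schoniger et al., the underlying machinery is standard dynamic-programming alignment. The minimal sketch below implements ordinary Needleman-Wunsch global alignment scoring only; Schoniger et al.'s extension additionally scores reversed-complement (inversion) blocks, which is omitted here, and the scoring parameters are illustrative rather than taken from the reference.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score between sequences a and b via dynamic programming.

    Standard Needleman-Wunsch recurrence over substitutions, insertions,
    and deletions. Schoniger et al. extend this style of recurrence with
    an extra term that scores non-intersecting inversion (reversed-
    complement) blocks; that extension is not implemented here.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning the prefix a[:i] with the prefix b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,  # substitution (match/mismatch)
                dp[i - 1][j] + gap,      # deletion from a
                dp[i][j - 1] + gap,      # insertion into a
            )
    return dp[n][m]
```

For example, `needleman_wunsch("ACGT", "AGT")` scores three matches plus one gap under these parameters.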
One would have had a reasonable expectation of success given that the alignment algorithm is provided and would merely be an incorporation of well-understood methods into another well-understood method for a predictable outcome. Therefore, it would have been obvious at the time of first filing to a person skilled in the art to have modified the teachings of each and to be successful.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KEENAN NEIL ANDERSON-FEARS whose telephone number is (571)272-0108. The examiner can normally be reached M-Th, alternate F, 8-5.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Karlheinz Skowronek, can be reached at 571-272-9047. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/K.N.A./ Examiner, Art Unit 1687
/OLIVIA M. WISE/ Supervisory Patent Examiner, Art Unit 1685
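For readers unfamiliar with the paired-end quality-control steps quoted from Eren et al. in the rejection above, the sketch below illustrates the core operations: reverse-complementing the second read, requiring both reads to carry the proximal and distal primer prefixes, trimming the primers, and retaining only pairs whose trimmed sequences agree. The helper names and primer prefixes are hypothetical placeholders, not the published V6 primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string (step 1 of the quoted pipeline)."""
    return seq.translate(COMPLEMENT)[::-1]

def qc_pass(read1, read2, proximal_prefix, distal_prefix):
    """Return the trimmed insert if the read pair passes QC, else None.

    Simplified from Eren et al.'s quoted steps: both the forward read and
    the reverse-complemented second read must begin with the proximal
    primer prefix and contain the distal primer prefix; primers are then
    trimmed and the pair is kept only on 100% agreement between primer
    sites. Primer prefixes here are illustrative, not the real primers.
    """
    rc2 = reverse_complement(read2)
    for r in (read1, rc2):
        if not r.startswith(proximal_prefix):
            return None  # steps 4-5: proximal primer consensus not found
        if distal_prefix not in r:
            return None  # steps 2-3: distal primer prefix missing
    # step 6: trim the proximal prefix and everything from the distal prefix on
    trimmed1 = read1[len(proximal_prefix):read1.index(distal_prefix)]
    trimmed2 = rc2[len(proximal_prefix):rc2.index(distal_prefix)]
    # step 7: retain the pair only if the trimmed sequences agree exactly
    return trimmed1 if trimmed1 == trimmed2 else None
```

With a toy proximal prefix `"AAT"` and distal prefix `"GGC"`, a properly mated pair yields the shared insert, while a pair whose second read is not the reverse complement of the first fails QC.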

Prosecution Timeline

Mar 17, 2022
Application Filed
Feb 02, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592298
Hardware Execution and Acceleration of Artificial Intelligence-Based Base Caller
Granted Mar 31, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on the most recent grant.

Prosecution Projections

1-2
Expected OA Rounds
6%
Grant Probability
56%
With Interview (+50.0%)
5y 1m
Median Time to Grant
Low
PTA Risk
Based on 16 resolved cases by this examiner. Grant probability derived from career allow rate.
