DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Application
This action is in reply to the correspondence received through June 12, 2024.
Claims 1-18 are pending.
Information Disclosure Statement
The information disclosure statements submitted June 12, 2024 and their contents have been considered.
Claim Objections
Claims 5, 10-13, and 17 are objected to because of the following informalities: these claims include periods within the sentences instead of solely at the end. Appropriate correction is required.
Claim Rejections - 35 U.S.C. § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-18 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter. Claims 1-18 are directed to an abstract idea without significantly more as required by the Alice test as discussed below.
Step 1
Claims 1-18 are directed to a process, machine, manufacture, or composition of matter.
Step 2A
Claims 1-18 are directed to abstract ideas, as explained below.
Prong one of the Step 2A analysis requires identifying the specific limitation(s) in the claim under examination that the examiner believes recites an abstract idea; and determining whether the identified limitation(s) falls within at least one of the groupings of abstract ideas of mathematical concepts, mental processes, and certain methods of organizing human activity.
The claims recite the following limitations that are directed to abstract ideas. Claim 15 recites receiving pharmacogenomic data representing at least one pharmacogenomic annotation in association with at least one gene; receiving at least one genomic variation of the at least one gene, searching the pharmacogenomic data for at least one association with each genomic variation, and returning the associated data, the associated data being a haplotype or diplotype and a phenotype; and generating at least one report comprising the associated data with the genomic variation associated. Claims 1 and 18 recite similar features as claim 15. Claims 2-14, 16, and 17 further specify features of the identified abstract ideas or characteristics of the data used thereby.
These limitations describe abstract ideas that correspond to concepts identified as abstract ideas by the courts as mental processes—such as concepts performed in the human mind (including an observation, evaluation, judgment, or opinion)—because the claimed features identified above are concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).
These limitations describe abstract ideas that correspond to concepts identified as abstract ideas by the courts as certain methods of organizing human activity—such as fundamental economic principles or practices (including hedging, insurance, mitigating risk), commercial or legal interactions (including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; business relations), managing personal behavior or relationships or interactions between people (including social activities, teaching, and following rules or instructions)—because the claimed features identified above manage personal behavior or relationships or interactions between people including following rules or instructions.
Thus, the concepts set forth in claims 1-18 recite abstract ideas.
Prong two of Step 2A requires identifying whether there are any additional elements recited in the claim beyond the judicial exception(s), and evaluating those additional elements to determine whether they integrate the exception into a practical application of the exception. “Integration into a practical application” requires an additional element or a combination of additional elements in the claim to apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the exception. Further, “integration into a practical application” uses the considerations laid out by the Supreme Court and the Federal Circuit to evaluate whether the judicial exception is integrated into a practical application, such as considerations discussed in M.P.E.P. § 2106.05(a)-(h).
The claims recite the following additional elements beyond those identified above as being directed to an abstract idea. Claim 1 recites that its method is computer-implemented, a data processor, a database configuration engine, and a display generator configured to generate a display. Claim 15 recites similar features as claim 1. Claim 18 recites similar features as claim 1 and further recites a non-transitory computer readable medium and a processor. Several of the dependent claims further specify additional computer elements (e.g., non-transitory memory and a processor) or particular file types or formats (binary, text, FASTQ, BAM, etc.).
The identified judicial exception(s) are not integrated into a practical application for the following reasons.
First, evaluated individually, the additional elements do not integrate the identified abstract ideas into a practical application. The additional computer elements identified above (the computer, processors, displays, non-transitory computer readable medium, and non-transitory memory) are recited at a high level of generality. Inclusion of these elements amounts to mere instructions to implement the identified abstract ideas on a computer. See M.P.E.P. § 2106.05(f). The use of conventional computer elements to generate a display or to use particular file types or formats for storing data is the insignificant, extra-solution activity of mere data gathering or outputting in conjunction with a law of nature or abstract idea. See M.P.E.P. § 2106.05(g). To the extent that the claims transform data, the mere manipulation of data is not a transformation. See M.P.E.P. § 2106.05(c). Inclusion of a computing system in the claims amounts to generally linking the use of the judicial exception to a particular technological environment or field of use. See M.P.E.P. § 2106.05(h). Thus, taken alone, the additional elements do not amount to significantly more than a judicial exception.
Second, evaluating the claim limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. See M.P.E.P. § 2106.05(a). Their collective functions merely provide an implementation of the identified abstract ideas on a computer system in the general field of use of personalized medicine in pharmacogenomics. See M.P.E.P. § 2106.05(h).
Thus, claims 1-18 recite mathematical concepts, mental processes, or certain methods of organizing human activity without including additional elements that integrate the exception into a practical application of the exception.
Accordingly, claims 1-18 are directed to abstract ideas.
Step 2B
Claims 1-18 do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea.
The analysis above describes how the claims recite the additional elements beyond those identified above as being directed to an abstract idea, as well as why the identified judicial exception(s) are not integrated into a practical application. These findings are hereby incorporated into the analysis of the additional elements when considered both individually and in combination. Additional features of these analyses are discussed below.
Evaluated individually, the additional elements do not amount to significantly more than a judicial exception. In addition to the factors discussed regarding Step 2A, prong two, these additional computer elements also provide conventional computer functions that do not add meaningful limits to practicing the abstract idea. Generic computer components recited as performing generic computer functions that are well-understood, routine, and conventional activities amount to no more than implementing the abstract idea with a computerized system. The use of generic computer components to generate a display is the well-understood, routine, and conventional computer function of receiving or transmitting data over a network, e.g., the Internet, and does not impose any meaningful limit on the computer implementation of the identified abstract ideas. See M.P.E.P. § 2106.05(d)(II). Similarly, the use of generic computer components to use particular file types or formats for storing data is the well-understood, routine, and conventional computer function of receiving, processing, and storing data and does not impose any meaningful limit on the computer implementation of the identified abstract ideas. See M.P.E.P. § 2106.05(d)(II). Thus, taken alone, the additional elements do not amount to significantly more than a judicial exception.
Evaluating the claim limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. In addition to the factors discussed regarding Step 2A, prong two, there is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions amount to mere instructions to implement the identified abstract ideas on a computer.
Thus, claims 1-18, taken individually and as an ordered combination of elements, are not directed to eligible subject matter since they are directed to an abstract idea without significantly more.
Claim Rejections - 35 U.S.C. § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-4 and 11 are rejected under 35 U.S.C. § 102(a)(1)-(2) as being anticipated by Lefkofsky et al. (U.S. Pub. No. 2021/0350904 A1) (hereinafter “Lefkofsky”).
Claims 1, 15, and 18: Lefkofsky, as shown, discloses the following limitations:
a data processor configured to receive pharmacogenomic data representing at least one pharmacogenomic annotation in association with at least one gene (see at least ¶ [0141]: a knowledge database 40 may be generated for accumulating a cohort of patient molecular data, such as NGS results, and clinical information. The accumulated patient information may be analyzed to identify insights from the information such as potential biomarkers or trends in pharmacogenomics. The knowledge database 40 (KDB) may include treatment implications, diagnostic implications, and prognostic implications. The KDB 40 may include structured data regarding drug-gene interactions, including pharmacogenetic interactions, and precision medicine findings reported in the psychiatric and basic science literature. The KDB 40 may include clinically annotated pharmacogenomic classifications for key pharmacodynamic and pharmacokinetic results related to the treatment of depression and other psychiatric diseases. The KDB 40 of therapeutic and prognostic evidence, which includes therapeutic response and resistance information, may include information from a combination of external sources, which may include sources such as CPIC guidelines, FDA labeling, PharmGKB, Dutch Pharmacogenetics Working Group (DPWG), and/or other proprietary databases or information sources that are either public or available by subscription or upon request, as well as literature sources or novel findings from analyzing a repository of clinical and genetic, genomic, or other-omic information. The KDB 40 may be maintained over time by individuals with experience, education, and training in the relevant field. In some embodiments, clinical actionability entries in the KDB 40 are structured by both (1) the disease and/or the drug-gene interaction to which the evidence applies and (2) the level or strength of evidence; see also at least ¶¶ [0119], [0150]-[0151], [0180]-[0181], and [0323]);
a database configuration engine configured to receive at least one genomic variation of the at least one gene and to search the pharmacogenomic data for at least one association with each genomic variation to return the associated data, the associated data being a haplotype or diplotype and a phenotype (see at least ¶ [0143]: KDB 40 may contain a data dictionary including clinical data elements, imaging information, molecular data such as but not limited to DNA sequence information, single nucleotide variants (SNV), insertion/deletions (indels), copy number variation (CNV), fusion variants, RNA expression (including miRNA expression), microbiome information, haplotypes or alleles including star alleles, haplotype groups or diplotypes including star allele combinations, phenotype of each haplotype or haplotype group, associated medication, associated risk, associated prognosis, associated diagnostic feature, therapy classification (including standard administration, dose adjustment, contraindication, etc.), incidental germline findings including variants associated with additional health implications, epigenetic values, proteomic values, analyses thereof (such as features extracted from images), and/or combinations of one or more of the data elements or data types included in the KDB 40 (for example, a CYP2D6 gene variant combined with a CYP2C19 gene variant and an associated phenotype, or clinical data matched with molecular and/or image data, aggregated from multiple patient records and analyzed to determine associations among the data types). 
The KDB 40 may also include behavioral indicators including patient activities or movement patterns and related scores or indicators derived from an image of a patient's face; see also at least ¶ [0137]: the terms “genomic alteration,” “mutation,” and “variant” refer to a detectable change in the genetic material of one or more cells; see also at least ¶ [0145]: the analytic power of NGS stands out above conventional methods of processing genetic variants or alleles which have pharmacogenetic importance. Because the entirety of the normal human genome may be referenced for each of the targeted genes (described in more detail below), NGS may identify previously unobserved variant calls even if the variant was not targeted by the NGS panel; see also at least ¶ [0162]: the system 10 can process an actionable gene database to explicitly generate an importance for each gene from the evidence for, and the impact of, a mutation, allele, haplotype, diplotype, or predicted phenotype for helping a clinician make a drug choice. Genes not in the actionable gene database have a weight of 0. Conversely, a gene of weight 1 has a highest confidence that it affects the action of an FDA-approved drug for the cancer type in question for that variant. Other factors for adjusting a gene weighting may include evidence of being a driver gene or having an associated drug interaction in the metric and an assessment of DNA variation at the variant level, rather than just at the gene level; see also at least ¶¶ [0119], [0132], [0141], [0180]-[0181], and [0323]);
a report generator configured to generate at least one report comprising the associated data with the genomic variation associated (see at least ¶ [0175]: analytics using one or more analytics modules 36 may include one or more therapy engines that can generate reports listing predicted drugs that may be used to effectively treat a patient, predicted effective dosage amounts for one or more drugs, potential drug side effects, and/or other treatment predictions based on patient genetic data and real-world clinical data; see also at least ¶¶ [0141], [0143], and [0180]-[0181]); and
a display generator configured to generate a display based on the at least one report, the display further comprising at least one interface element representing the associated data with the genomic variation associated (see at least ¶ [0187]: FIG. 2 is a graphical user interface (GUI) 50 that can be implemented in system 10 to provide patient information for depression (or psychiatric disorders or illnesses, including mood disorders, bipolar disorder, schizophrenia, personality disorders, etc.). As shown, a provider can login to the provided platform (such as account name 54). Here, a physician can view a patient summary table 52 that includes a list view of each of his patients. In some aspects, patient summary table 52 includes patient names, attending physician, report type, report date, and status of a test, order, and/or patient. Various report types can be generated for patients. For example, a risk assessment report may include an analysis of patient risk factors (such as family history, genetics, etc.). Alternatively, a diagnostic report may include test results post-treatment, predictions of disorder progression, further subtyping of the patient's disease, identification of a possible misdiagnosis of the patient, prognostic implications, etc. Reports may be based on single data types (for example, DNA reports) or combinations of multiple data types (for example, clinical and molecular data). Patient summary table 52 can include additional data not represented in FIG. 2. As shown, a provider can select an individual patient to view additional information. The corresponding table row can indicate selection via highlighting or other means; see also at least ¶¶ [0188]-[0193]).
Lefkofsky discloses various architectures for implementing the features addressed above (see at least ¶ [0119]: the disclosed subject matter may be implemented as a system, method, apparatus, or article of manufacture using programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer or processor based device to implement aspects detailed herein; see also at least ¶¶ [0152] and [0319]).
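For orientation only, the data flow recited in claims 1, 15, and 18 (receive pharmacogenomic annotations, search them for each genomic variation, return a diplotype and phenotype, and generate a report) can be sketched as follows. This is a minimal illustration under stated assumptions: the `PgxAnnotation` type, the function names, the gene and variation identifiers, and the table entries are all hypothetical placeholders and are not taken from the application or from Lefkofsky.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PgxAnnotation:
    """Hypothetical annotation record: one gene/variation association."""
    gene: str
    diplotype: str
    phenotype: str


# Hypothetical pharmacogenomic data keyed by (gene, variation identifier).
# The entries are placeholders, not real clinical annotations.
PGX_DATA: Dict[Tuple[str, str], PgxAnnotation] = {
    ("GENE_A", "var-001"): PgxAnnotation("GENE_A", "*1/*2", "intermediate metabolizer"),
    ("GENE_B", "var-007"): PgxAnnotation("GENE_B", "*1/*1", "normal metabolizer"),
}


def search_annotations(gene: str, variation: str) -> Optional[PgxAnnotation]:
    """Return the annotation associated with a genomic variation, if any."""
    return PGX_DATA.get((gene, variation))


def generate_report(variations: List[Tuple[str, str]]) -> List[str]:
    """Build one report line per variation that has an associated annotation."""
    lines = []
    for gene, variation in variations:
        hit = search_annotations(gene, variation)
        if hit is not None:
            lines.append(f"{gene} {variation}: {hit.diplotype}, {hit.phenotype}")
    return lines


# One variation has an annotation; the other is absent from the table.
report = generate_report([("GENE_A", "var-001"), ("GENE_C", "var-999")])
```

In this sketch a variation with no stored association is simply omitted from the report; a real system might instead flag it for review.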
Claim 2: Lefkofsky discloses the limitations as shown in the rejections above. Further, Lefkofsky, as shown, discloses the following limitations:
wherein the phenotype comprises adverse drug reactions, metabolizing status, efficacy indications, dosing data, alternative drug data, pharmacogenomic indication, or prescribing data (see at least ¶ [0143]; see also at least ¶ [0162]: the system 10 can process an actionable gene database to explicitly generate an importance for each gene from the evidence for, and the impact of, a mutation, allele, haplotype, diplotype, or predicted phenotype for helping a clinician make a drug choice. Genes not in the actionable gene database have a weight of 0. Conversely, a gene of weight 1 has a highest confidence that it affects the action of an FDA-approved drug for the cancer type in question for that variant. Other factors for adjusting a gene weighting may include evidence of being a driver gene or having an associated drug interaction in the metric and an assessment of DNA variation at the variant level, rather than just at the gene level; see also at least ¶ [0175]: analytics using one or more analytics modules 36 may include one or more therapy engines that can generate reports listing predicted drugs that may be used to effectively treat a patient, predicted effective dosage amounts for one or more drugs, potential drug side effects, and/or other treatment predictions based on patient genetic data and real-world clinical data; see also at least ¶¶ [0119], [0132], [0141], [0180]-[0181], and [0323]).
Claim 3: Lefkofsky discloses the limitations as shown in the rejections above. Further, Lefkofsky, as shown, discloses the following limitations:
wherein the report generator is configured to receive at least one text-based file representing at least one genetic sequence and generate at least one binary file representing at least one genetic sequence, at least one index file for the at least one binary file, and at least one text file for the at least one binary file (see at least ¶ [0129]: this message may contain sample identifiers, as well as the location of BAM files. A BAM file (.bam) is the binary version of a SAM file. A SAM file (.sam) is a tab-delimited text file that contains sequence alignment data (such as the raw sequencing data). When a message is received, a service may be triggered to evaluate the sequencing data for pharmacogenomics factors; see also at least ¶ [0130]: the term “BAM File” or “Binary file containing Alignment Maps” refers to a file storing sequencing data aligned to a reference sequence (e.g., a reference genome or exome). In some embodiments, a BAM file is a compressed binary version of a SAM (Sequence Alignment Map) file that includes, for each of a plurality of unique sequence reads, an identifier for the sequence read, information about the nucleotide sequence, information about the alignment of the sequence to a reference sequence, and optionally metrics relating to the quality of the sequence read and/or the quality of the sequence alignment; see also at least ¶ [0131]: BAM files can be generated by aligning raw molecular data to a reference genome. For example, raw molecular data can be stored in BCL, FASTA, and/or FASTQ file formats. A suitable process can align the raw molecular data to a human reference sequence and generate aligned sequence reads. The aligned sequence reads can be stored in SAM and/or BAM file formats; see also at least ¶¶ [0187]-[0188] and [0197]).
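The cited passages describe a SAM file as a tab-delimited text file and a BAM file as its binary counterpart. As a minimal sketch of the text side only, the following parses one SAM alignment line into its eleven mandatory fields using the Python standard library. The example read is invented for illustration; producing the binary file and its index is typically left to dedicated tooling and is not shown here.

```python
from typing import Dict, List

# The eleven mandatory SAM alignment fields, in column order.
SAM_FIELDS: List[str] = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ",
    "CIGAR", "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]


def parse_sam_line(line: str) -> Dict[str, str]:
    """Split one tab-delimited SAM alignment line into its mandatory fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    # zip() stops after the 11 mandatory fields, so optional tag columns are ignored.
    return dict(zip(SAM_FIELDS, columns))


# Hypothetical read, made up for illustration.
example = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
record = parse_sam_line(example)
```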
Claim 4: Lefkofsky discloses the limitations as shown in the rejections above. Further, Lefkofsky, as shown, discloses the following limitations:
a machine learning engine configured to predict at least one genomic variant, wherein at least one of the at least one genomic variation is determined as the at least one genomic variant (see at least ¶ [0181]: the therapy engines may also comprise one or more machine learning algorithms or neural networks. A machine learning algorithm (MLA) or a neural network (NN) may be trained from a training data set. For a depression disease state, an exemplary training data set may include the clinical and molecular details of a patient such as those curated from the Electronic Health Record or genetic sequencing reports. MLAs include supervised algorithms (such as algorithms where the features/classifications in the data set are annotated) using linear regression, logistic regression, decision trees, classification and regression trees, Naïve Bayes, nearest neighbor clustering; unsupervised algorithms (such as algorithms where no features/classification in the data set are annotated) using Apriori, means clustering, principal component analysis, random forest, adaptive boosting; and semi-supervised algorithms (such as algorithms where certain features/classifications in the data set are annotated) using generative approach (such as a mixture of Gaussian distributions, mixture of multinomial distributions, hidden Markov models), low density separation, graph-based approaches (such as mincut, harmonic function, manifold regularization), heuristic approaches, or support vector machines. NNs include conditional random fields, convolutional neural networks, attention based neural networks, long short term memory networks, or other neural models where the training data set includes a plurality of samples and RNA expression data for each sample. While MLA and neural networks identify distinct approaches to machine learning, the terms may be used interchangeably herein. 
Thus, a mention of MLA may include a corresponding NN or a mention of NN may include a corresponding MLA; see also at least ¶ [0182]: training may include identifying common clinical traits or genetic traits that patients of the overall cohort or patient database may exhibit, labeling these traits as they occur in patient records, and training the MLA to identify patterns in the outcomes of patients based on their treatments as well as their clinical and genetic information. Outputs from one or more analytics module 36 can be provided to display device 16 via communication network 18. Further, provider 12 can input additional data via display device 16 (such as a prescribed treatment), and the data can be transmitted to server 20; see also at least ¶¶ [0119], [0132], [0141], [0162], [0180], and [0323]).
Claim 11: Lefkofsky discloses the limitations as shown in the rejections above. Further, Lefkofsky, as shown, discloses the following limitations:
at least one processor (see at least ¶ [0119]: the disclosed subject matter may be implemented as a system, method, apparatus, or article of manufacture using programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer or processor based device to implement aspects detailed herein; see also at least ¶¶ [0152] and [0319]); and
at least one non-transitory memory storing computer-executable instructions (see at least ¶¶ [0119], [0152], and [0319]) which, when executed, cause the at least one processor to perform a method, the method comprising:
generating at least one annotated variant training dataset, the generating comprising: receiving at least one annotated variant dataset, annotated based on protein functional domain data, sequence ontology data, and at least one prediction score (see at least ¶ [0127]: “molecular data” includes information such as the sequence and/or amount (e.g., expression level, or duplication/deletion information) of one or more proteins, DNA, or RNA samples of a subject, a control subject, or a cohort. By way of example but not by way of limitation, in some embodiments, molecular data includes DNA sequence information including but not limited to whole exome genetic data, single nucleotide variants (SNV), insertion/deletions (indels), copy number variation (CNV), fusion variants, RNA expression data (including miRNA expression), microbiome information, haplotypes or alleles information including star alleles, haplotype groups or diplotypes including star allele combinations, mass array data, microarray data. Whole exome genetic data pertaining to any of the exons in the human genome may further include intronic regions targeted, for example, by intron-specific probes spiked into a whole exome panel. Molecular data as used herein also includes targeted panels of DNA or RNA data (including sequence data and/or expression level data), and targeted panels of protein data. 
By way of example but not by way of limitation, a targeted panel includes an assay designed for evaluating or analyzing only specific genetic sequences such as specific genes, parts of genes, or specific non-coding sequences (e.g., introns or promoter regions), or specific proteins, as opposed to whole genome RNA analysis for example; see also at least ¶ [0219]: the drugs, doses, and other therapies selected for listing in the patient report may be based on data associated with the patient, including the patient's genetic sequencing results, a molecular profile including RNA or protein expression levels, the patient's medical records, therapy notes, patient-reported outcomes (PROs), similarities to other patients' sequencing information or other molecular profiles and/or reported successful treatment regimens of molecularly similar patients); and
applying k-nearest neighbour (kNN) imputation to the at least one annotated variant dataset to generate one or more values for missing data (see at least ¶ [0181]: the therapy engines may also comprise one or more machine learning algorithms or neural networks. A machine learning algorithm (MLA) or a neural network (NN) may be trained from a training data set. For a depression disease state, an exemplary training data set may include the clinical and molecular details of a patient such as those curated from the Electronic Health Record or genetic sequencing reports. MLAs include supervised algorithms (such as algorithms where the features/classifications in the data set are annotated) using linear regression, logistic regression, decision trees, classification and regression trees, Naïve Bayes, nearest neighbor clustering; unsupervised algorithms (such as algorithms where no features/classification in the data set are annotated) using Apriori, means clustering, principal component analysis, random forest, adaptive boosting; and semi-supervised algorithms (such as algorithms where certain features/classifications in the data set are annotated) using generative approach (such as a mixture of Gaussian distributions, mixture of multinomial distributions, hidden Markov models), low density separation, graph-based approaches (such as mincut, harmonic function, manifold regularization), heuristic approaches, or support vector machines. NNs include conditional random fields, convolutional neural networks, attention based neural networks, long short term memory networks, or other neural models where the training data set includes a plurality of samples and RNA expression data for each sample. While MLA and neural networks identify distinct approaches to machine learning, the terms may be used interchangeably herein. 
Thus, a mention of MLA may include a corresponding NN or a mention of NN may include a corresponding MLA; see also at least ¶ [0182]; see also at least ¶¶ [0119], [0132], [0141], [0162], [0180], and [0323]); and
training the machine learning model using the at least one annotated variant training dataset (see at least ¶ [0182]: training may include identifying common clinical traits or genetic traits that patients of the overall cohort or patient database may exhibit, labeling these traits as they occur in patient records, and training the MLA to identify patterns in the outcomes of patients based on their treatments as well as their clinical and genetic information. Outputs from one or more analytics module 36 can be provided to display device 16 via communication network 18. Further, provider 12 can input additional data via display device 16 (such as a prescribed treatment), and the data can be transmitted to server 20; see also at least ¶¶ [0119], [0132], [0141], [0162], [0180]-[0181], and [0323]).
wherein the at least one annotated variant dataset is annotated using a Variant Effect Predictor (VEP) (see at least ¶ [0143]; see also at least ¶ [0145]: the analytic power of NGS stands out above conventional methods of processing genetic variants or alleles which have pharmacogenetic importance. Because the entirety of the normal human genome may be referenced for each of the targeted genes (described in more detail below), NGS may identify previously unobserved variant calls even if the variant was not targeted by the NGS panel; see also at least ¶ [0162]: the system 10 can process an actionable gene database to explicitly generate an importance for each gene from the evidence for, and the impact of, a mutation, allele, haplotype, diplotype, or predicted phenotype for helping a clinician make a drug choice. Genes not in the actionable gene database have a weight of 0. Conversely, a gene of weight 1 has a highest confidence that it affects the action of an FDA-approved drug for the cancer type in question for that variant. Other factors for adjusting a gene weighting may include evidence of being a driver gene or having an associated drug interaction in the metric and an assessment of DNA variation at the variant level, rather than just at the gene level; see also at least ¶¶ [0119], [0132], [0141], [0180]-[0181], and [0323]).
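For reference, k-nearest-neighbour (kNN) imputation as recited in claim 11 fills each missing value from the most similar records in the dataset. The following is a minimal sketch under stated assumptions: purely numeric features, a root-mean-square distance computed over the columns two rows share, and an unweighted mean of the k nearest observed values. None of these choices is taken from the application or from Lefkofsky, and the sample rows are invented.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Root-mean-square distance over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare, so treat the rows as maximally distant
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing value with the mean of that column in the k nearest rows."""
    completed = []
    for i, row in enumerate(rows):
        filled = list(row)
        for col, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows in which this column is observed.
            candidates = [
                (_distance(row, other), other[col])
                for j, other in enumerate(rows)
                if j != i and other[col] is not None
            ]
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if not nearest:
                raise ValueError(f"no observed values available for column {col}")
            filled[col] = sum(nearest) / len(nearest)
        completed.append(filled)
    return completed


# Hypothetical feature rows; None marks a missing value.
rows = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
imputed = knn_impute(rows, k=2)
```

Here the first row's missing value is taken from the second and third rows, which are closest on the two observed columns; the distant fourth row does not contribute.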
Statement Regarding the Prior Art
Claim 5 recites that the machine learning engine is configured to detect genomic variants leading to altered protein function, the machine learning engine comprising: a non-transitory memory storing one or more features from an annotated variant dataset of at least one variant; a variant validator configured to determine one or more validated variants of the annotated variant dataset, each validated variant matching one or more known variants of a known variant dataset, each known variant leading to altered protein function; a machine learning model configured to assign a classification to one or more predicted variants of variants of the annotated variant dataset not selected as validated variants, each predicted variant leading to altered protein function, the assigning by the machine learning model based on at least one of the one or more features stored in the memory; and a loss-of-function detector configured to determine one or more sequence ontology variants of the variants of the annotated variant dataset not selected as validated variants and not classified as predicted variants, each sequence ontology variant being a loss-of-function variant, the determining by the loss-of-function detector based on at least one of the features stored in the memory; wherein the annotated variant dataset is generated using a Variant Effect Predictor (VEP); wherein each sequence ontology variant is determined by filtering based on sequence ontology data; and wherein the loss-of-function variant is a splice acceptor variant, a splice donor variant, a stop gained variant, a frameshift variant, a stop loss variant, or a start loss variant.
Claim 12 recites features in which each prediction score is generated using LoFtool, DEOGEN2, MPC, BayesDel_addAF, FATHMM, integrated_fitCons, or LIST.S2; wherein the protein functional domain data is Interpro domain data; wherein the sequence ontology data represents a splice acceptor variant, a splice donor variant, a stop gained variant, a frameshift variant, a stop lost variant, a start lost variant, or a combination thereof; wherein generating at least one annotated variant training dataset further comprises: generating a LoF indicator feature using the sequence ontology data, the LoF indicator feature representing a loss-of-function variant; and wherein generating at least one annotated variant training dataset further comprises: generating an Interpro indicator feature using the Interpro domain data, the Interpro indicator feature representing an effect on an Interpro domain.
Claim 13 recites features in which the machine learning model is a random forest classifier having decision trees, the machine learning model configured to assign a classification based on bootstrap aggregation using the decision trees; wherein the kNN imputation is kNN imputation with weighted mean; and wherein generating at least one annotated variant training dataset further comprises: removing data from the at least one annotated variant dataset, wherein the data corresponds to a variant having a percentage greater than or equal to 40%, collectively, of missing values for the annotations, the removing performed before kNN imputation is applied to the at least one annotated variant dataset; and removing data from the at least one annotated variant dataset, wherein the data corresponds to a feature having a percentage greater than or equal to 40%, collectively, of missing values for variants represented in the at least one annotated variant dataset, the removing performed before kNN imputation is applied to the at least one annotated variant dataset.
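For illustration only, the two removal steps recited in claim 13 could be sketched as follows. This is a minimal sketch and not the applicant's implementation: the function names, the representation of each variant as a dictionary of annotations, the use of None to mark a missing value, and the order of the two steps (variants first, then features) are assumptions that do not appear in the record.

```python
from typing import Iterable, Optional

# Hypothetical representation: one dict per variant, mapping an
# annotation (feature) name to its value, with None marking a missing value.
Row = dict[str, Optional[float]]


def missing_fraction(values: Iterable[Optional[float]]) -> float:
    """Return the fraction of entries that are missing (None)."""
    items = list(values)
    if not items:
        return 0.0
    return sum(v is None for v in items) / len(items)


def drop_sparse(rows: list[Row], threshold: float = 0.40) -> list[Row]:
    """Remove variants, then features, whose missing fraction is >= threshold.

    Assumption: variants are filtered first and feature sparsity is then
    measured over the remaining variants; the claim does not fix this order.
    """
    # Step 1: drop variants with too many missing annotations.
    kept = [r for r in rows if missing_fraction(r.values()) < threshold]
    if not kept:
        return []
    # Step 2: drop features that are missing for too many remaining variants.
    dense = [
        f for f in kept[0]
        if missing_fraction(r[f] for r in kept) < threshold
    ]
    return [{f: r[f] for f in dense} for r in kept]


example = [
    {"a": 0.1, "b": None, "c": 0.3, "d": 0.4, "e": 0.5},   # 20% missing
    {"a": None, "b": None, "c": 0.3, "d": 0.4, "e": 0.5},  # 40% missing
    {"a": 0.2, "b": None, "c": 0.1, "d": 0.2, "e": 0.3},   # 20% missing
    {"a": 0.3, "b": 0.9, "c": 0.1, "d": 0.2, "e": 0.3},    # 0% missing
]
filtered = drop_sparse(example)
```

In this sketch the second variant is removed because exactly 40% of its annotations are missing, and feature "b" is then removed because it is missing for two of the three remaining variants. Any kNN imputation would be applied only to the output of this step.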
Claim 14 recites features for performing variant deduplication on the at least one annotated variant dataset to generate at least one new annotated variant dataset; extracting features from the at least one annotated variant dataset, the features comprising protein functional domain data, sequence ontology data, at least one prediction score, at least one variant identifier, and at least one sequence identifier; generating a LoF indicator feature using the sequence ontology data, the LoF indicator feature representing a loss-of-function variant; and generating an Interpro indicator feature using the Interpro domain data, the Interpro indicator feature representing an effect on an Interpro domain.
Lefkofsky does not disclose all of the features of claims 5, 12, 13, and 14.
Ward et al. (U.S. Pub. No. 2021/0115513 A1) teaches techniques for using genetic markers associated with endometriosis, for example via a computer-implemented program to predict risk of developing endometriosis, and methods of preventing or treating endometriosis or a symptom thereof. These techniques can utilize detection of endometriosis associated biomarkers such as single nucleotide polymorphisms (SNPs), insertion deletion polymorphisms (indels), damaging mutation variants, loss of function variants, synonymous mutation variants, nonsynonymous mutation variants, nonsense mutations, recessive markers, splicing/splice-site variants, frameshift mutations, insertions, deletions, genomic rearrangements, stop-gain, stop-loss, and Rare Variants (RVs), some of which are identified in its Tables 1-4 (or diagnostically and predictively functionally comparable biomarkers). In some instances, the method can comprise using a statistical assessment method such as Multi Dimensional Scaling analysis (MDS), logistic regression, or Bayesian analysis. However, Ward does not teach all of the features of claims 5, 12, 13, and 14.
Tymoshenko et al. (U.S. Pub. No. 2021/0256394 A1) teaches methods and systems for identifying variants of a given target protein or target gene that perform the same function and/or improve the phenotypic performance of a host cell transformed with such a variant. To enhance the diversity of identified candidate sequences, the methods may implement the use of a metagenomic database and/or machine learning methods. The methods and systems may be implemented in optimizing a biosynthetic pathway, e.g., to improve the production of a target molecule of interest. However, Tymoshenko does not teach all of the features of claims 5, 12, 13, and 14.
Albertsen et al. (U.S. Pub. No. 2021/0292841 A1) teaches techniques of sequencing one or more genes selected to identify one or more protein damaging or loss of function variants in a human subject suspected of having or developing endometriosis; and administering an endometriosis therapy to the human subject. In these techniques the one or more protein damaging or loss of function variants comprise a stop-gain mutation, a splice-site mutation, a frameshift mutation, a missense mutation, or any combination thereof. However, Albertsen does not teach all of the features of claims 5, 12, 13, and 14.
Hahm et al. (U.S. Pub. No. 2018/0121601 A1) teaches that there are many processing stages for data from DNA (or RNA) sequencing to mapping and aligning to sorting and/or de-duplicating to variant calling, which can vary depending on the primary and/or secondary and/or tertiary processing technologies employed and their applications. However, Hahm does not teach all of the features of claims 5, 12, 13, and 14.
The closest art of record, including the prior art discussed above, fails to teach, suggest, or render obvious each and every element of the claims as presently arranged. Further, based on the evidence of record, it appears that one of ordinary skill in the art at the time of invention would not look to combine these references, or the closest art of record, to arrive at the present claims without using impermissible hindsight.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. The following references have been cited to further show the state of the art with respect to personalized medicine in pharmacogenetics.
Hatchwell et al. (U.S. Pub. No. 2019/0071726 A1);
Xu et al. (U.S. Pub. No. 2021/0125689 A1);
Lopes et al. (“Targeted genotyping in clinical pharmacogenomics: what is missing?” The Journal of Molecular Diagnostics 24.3 (2022): 253-261).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Christopher Tokarczyk, whose telephone number is 571-272-9594. The examiner can normally be reached Monday-Thursday between 6:00 AM and 4:00 PM Eastern.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mamon Obeid, can be reached at 571-270-1813. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHRISTOPHER B TOKARCZYK/
Primary Examiner, Art Unit 3687