Office Action Analysis: 17808902 — MACHINE-LEARNING MODEL FOR GENERATING CONFIDENCE CLASSIFICATIONS FOR GENOMIC COORDINATES

Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Status
Claims 1-20 are currently pending and under examination herein.
Claims 1-20 are rejected.
Claim 15 is objected to.

Priority
	The instant application claims the benefit of provisional application No. 63/216,382, filed on 29 June 2021. Domestic benefit is acknowledged. At this point in examination, the effective filing date of claims 1-20 is 29 June 2021.

Information Disclosure Statement
	The information disclosure statements (IDS) submitted on 21 July 2022, 10 November 2022, and 6 April 2023 comply with 37 CFR 1.98. Accordingly, all references listed have been considered by the examiner.

Drawings
The drawings filed on 24 June 2022 are objected to because Figures 2-3 and 5-6B depict sequences with 10 or more nucleotides without the necessary sequence listings and sequence identifiers (see nucleotide sequence disclosures section below).  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Nucleotide Sequence Disclosures
REQUIREMENTS FOR PATENT APPLICATIONS CONTAINING NUCLEOTIDE AND/OR AMINO ACID SEQUENCE DISCLOSURES

Items 1) and 2) provide general guidance related to requirements for sequence disclosures.
1) 37 CFR 1.821(c) requires that patent applications which contain disclosures of nucleotide and/or amino acid sequences that fall within the definitions of 37 CFR 1.821(a) must contain a "Sequence Listing," as a separate part of the disclosure, which presents the nucleotide and/or amino acid sequences and associated information using the symbols and format in accordance with the requirements of 37 CFR 1.821 - 1.825. This "Sequence Listing" part of the disclosure may be submitted:
	a) In accordance with 37 CFR 1.821(c)(1) via the USPTO patent electronic filing system (see Section I.1 of the Legal Framework for Patent Electronic System (https://www.uspto.gov/PatentLegalFramework), hereinafter "Legal Framework") as an ASCII text file, together with an incorporation-by-reference of the material in the ASCII text file in a separate paragraph of the specification as required by 37 CFR 1.823(b)(1) identifying:
		i) the name of the ASCII text file;
		ii) the date of creation; and
		iii) the size of the ASCII text file in bytes;
	b) In accordance with 37 CFR 1.821(c)(1) on read-only optical disc(s) as permitted by 37 CFR 1.52(e)(1)(ii), labeled according to 37 CFR 1.52(e)(5), with an incorporation-by-reference of the material in the ASCII text file according to 37 CFR 1.52(e)(8) and 37 CFR 1.823(b)(1) in a separate paragraph of the specification identifying:
		i) the name of the ASCII text file;
		ii) the date of creation; and
		iii) the size of the ASCII text file in bytes;
	c) In accordance with 37 CFR 1.821(c)(2) via the USPTO patent electronic filing system as a PDF file (not recommended); or
	d) In accordance with 37 CFR 1.821(c)(3) on physical sheets of paper (not recommended).
2) When a “Sequence Listing” has been submitted as a PDF file as in item 1) c) above (37 CFR 1.821(c)(2)) or on physical sheets of paper as in item 1) d) above (37 CFR 1.821(c)(3)), 37 CFR 1.821(e)(1) requires a computer readable form (CRF) of the “Sequence Listing” in accordance with the requirements of 37 CFR 1.824.
	a) If the "Sequence Listing" required by 37 CFR 1.821(c) is filed via the USPTO patent electronic filing system as a PDF, then 37 CFR 1.821(e)(1)(ii) or 1.821(e)(2)(ii) requires submission of a statement that the "Sequence Listing" content of the PDF copy and the CRF copy (the ASCII text file copy) are identical.
	b) If the "Sequence Listing" required by 37 CFR 1.821(c) is filed on paper or read-only optical disc, then 37 CFR 1.821(e)(1)(ii) or 1.821(e)(2)(ii) requires submission of a statement that the "Sequence Listing" content of the paper or read-only optical disc copy and the CRF are identical.
Specific deficiencies and the required response to this Office Action are as follows:
	Figures 2-3 and 5-6B depict sequences containing 10 or more nucleotides. 

Specific deficiency - This application fails to comply with the requirements of 37 CFR 1.821 - 1.825 because it does not contain a "Sequence Listing" as a separate part of the disclosure or a CRF of the “Sequence Listing.”
Required response - Applicant must provide:
A "Sequence Listing" part of the disclosure; together with 
An amendment specifically directing its entry into the application in accordance with 37 CFR 1.825(a)(2);
A statement that the "Sequence Listing" includes no new matter as required by 37 CFR 1.825(a)(4); and
A statement that indicates support for the amendment in the application, as filed, as required by 37 CFR 1.825(a)(3).
If the "Sequence Listing" part of the disclosure is submitted according to item 1) a) or b) above, Applicant must also provide:
A substitute specification in compliance with 37 CFR 1.52, 1.121(b)(3) and 1.125 inserting the required incorporation-by-reference paragraph, consisting of:
A copy of the previously-submitted specification, with deletions shown with strikethrough or brackets and insertions shown with underlining (marked-up version);
A copy of the amended specification without markings (clean version); and
A statement that the substitute specification contains no new matter.
If the "Sequence Listing" part of the disclosure is submitted according to item 1) c) or d) above, applicant must also provide:
A CRF in accordance with 37 CFR 1.821(e)(1) or 1.821(e)(2) as required by 1.825(a)(5); and
A statement according to item 2) a) or b) above.

Specific deficiency – Nucleotide and/or amino acid sequences appearing in the drawings are not identified by sequence identifiers in accordance with 37 CFR 1.821(d). Sequence identifiers for nucleotide and/or amino acid sequences must appear either in the drawings or in the Brief Description of the Drawings.
Required response – Applicant must provide:
Replacement and annotated drawings in accordance with 37 CFR 1.121(d) inserting the required sequence identifiers;
AND/OR
A substitute specification in compliance with 37 CFR 1.52, 1.121(b)(3) and 1.125 inserting the required sequence identifiers into the Brief Description of the Drawings, consisting of: 
A copy of the previously-submitted specification, with deletions shown with strikethrough or brackets and insertions shown with underlining (marked-up version);
A copy of the amended specification without markings (clean version); and 
A statement that the substitute specification contains no new matter.

Claim Objections
Claim 15 is objected to because it recites “an example nucleic-acid sequence” twice, which could be interpreted as requiring two distinct example nucleic-acid sequences. To overcome this objection, one of the recitations of “an example nucleic-acid sequence” should be removed, or the second recitation should be amended to “the example nucleic-acid sequence.”

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (abstract ideas) without significantly more. Under MPEP § 2106, a claim is patent eligible when it falls within one of the four statutory categories of invention [Step 1] and either is not directed to a judicial exception [Step 2A] or, as a whole, includes additional elements amounting to significantly more than the exception [Step 2B].

Step 1
Claims 1-20 are each directed to one of the statutory categories. In Step 1, a claim must fall within one of the four enumerated categories of statutory subject matter (process, machine, manufacture, or composition of matter); a claim falling outside these categories is ineligible without further analysis [MPEP § 2106.03]. Claims 1-10 are directed to a system, which falls into the machine category [Step 1: Yes]. Claims 11-14 are directed to a non-transitory computer readable storage medium storing instructions, which falls into the manufacture category [Step 1: Yes]. Claims 15-20 are directed to a method, which falls into the process category [Step 1: Yes].

Step 2A 
Under Step 2A, a claim is directed to a judicial exception if, under the broadest reasonable interpretation, it recites an abstract idea, law of nature, or natural phenomenon [Prong One] without the claim as a whole integrating the exception into a practical application [Prong Two]. Abstract ideas include mathematical concepts, mental processes, and certain methods of organizing human activity. Mathematical concepts encompass mathematical relationships, formulas, equations, and mathematical calculations [MPEP § 2106.04(a)(2)(I)]. Mental processes involve concepts that can be performed in the human mind or by a human with the aid of pen and paper, such as observations, evaluations, judgments, or opinions [MPEP § 2106.04(a)(2)(III)]. Certain methods of organizing human activity include fundamental economic principles, commercial or legal interactions, and managing personal behavior or relationships [MPEP § 2106.04(a)(2)(II)]. Laws of nature and natural phenomena include naturally occurring principles and relations, and nature-based products that are naturally occurring or that do not have markedly different characteristics compared to what occurs in nature [MPEP § 2106.04(b)-(c)].

Prong One 
A claim recites a judicial exception when it sets forth or describes a law of nature, natural phenomenon, or abstract idea. Claims 1-20 recite abstract ideas that fall into the groupings of mathematical concepts and mental processes. 
Independent Claims 
Claim 1 recites the following limitations, which describe abstract ideas within the mathematical concepts and/or mental processes groupings: 
determine sequencing metrics for comparing sample nucleic-acid sequences with genomic coordinates of an example nucleic-acid sequence; 
train a genome-location-classification model to determine confidence classifications for the genomic coordinates based on the sequencing metrics and ground-truth classifications for particular genomic coordinates; 
determine, utilizing the genome-location-classification model, a set of confidence classifications for a set of genomic coordinates based on a set of sequencing metrics for one or more sample nucleic-acid sequences.
The limitation of determining sequencing metrics is an abstract idea within the mathematical concepts and/or mental processes groupings because determining metrics through sequence comparison involves mathematical calculations or mental evaluations of alignment and quality patterns that could be performed manually for small datasets. The limitation of training a genome-location-classification model is an abstract idea within the mathematical concepts grouping because training a model using metrics and ground-truth classifications involves mathematical optimization, such as adjusting parameters based on correlations in the data. The limitation of using the model to determine a set of confidence classifications is an abstract idea within the mathematical concepts and/or mental processes groupings because utilizing the model to determine classifications applies mathematical algorithms to input metrics, an evaluation step that a human could perform mentally or with basic computation to assess confidence.
Claim 11 recites the following limitations, which describe abstract ideas within the mathematical concepts and/or mental processes groupings: 
detect a variant-nucleobase call at a genomic coordinate within a sample nucleic-acid sequence; 
identify, from a digital file, a confidence classification for the genomic coordinate according to a genome-location-classification model.
The limitation of detecting a variant-nucleobase call is an abstract idea within the mathematical concepts and/or mental processes groupings because detecting variants involves mental observation or mathematical comparison of sequences. The limitation of identifying a confidence classification is an abstract idea within the mental processes grouping because identifying classifications from a file is a mental process of retrieving and evaluating stored data. 
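To make the two claim 11 limitations concrete, the following minimal Python sketch flags a variant-nucleobase call as a mismatch between a sample base and a reference base, and then looks up a confidence classification for that coordinate from a tab-separated text file. The file layout (chromosome, position, classification), the function names, and the example data are hypothetical illustrations only; they are not the applicant's disclosed format or implementation.

```python
import io


def load_classifications(handle):
    """Read hypothetical 'chrom<TAB>pos<TAB>label' lines into a lookup table."""
    table = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header/comment line
        chrom, pos, label = line.split("\t")
        table[(chrom, int(pos))] = label
    return table


def is_variant_call(reference_base, sample_base):
    """Treat any mismatch between sample and reference base as a variant call."""
    return sample_base.upper() != reference_base.upper()


def classify_call(table, chrom, pos, reference_base, sample_base):
    """Return the stored classification for a variant call, or None if no variant."""
    if not is_variant_call(reference_base, sample_base):
        return None
    return table.get((chrom, pos), "unclassified")


# Toy in-memory "digital file" standing in for a file on disk.
demo_file = io.StringIO("#chrom\tpos\tlabel\nchr1\t1000\thigh\nchr1\t2000\tlow\n")
table = load_classifications(demo_file)
print(classify_call(table, "chr1", 2000, "A", "G"))  # low
print(classify_call(table, "chr1", 1000, "C", "C"))  # None
```

The retrieval step is a plain dictionary lookup keyed by coordinate, which mirrors the characterization of the limitation as retrieving and evaluating stored data.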
Claim 15 recites the following limitations, which describe abstract ideas within the mathematical concepts and/or mental processes groupings: 
determining, from an example nucleic-acid sequence, a contextual nucleic-acid subsequence surrounding a variant-nucleobase call in a sample nucleic-acid sequence at a genomic coordinate from genomic coordinates of an example nucleic-acid sequence; 
training a genome-location-classification model to determine confidence classifications for the genomic coordinate based on the contextual nucleic-acid subsequence and a ground-truth classification for the genomic coordinate; 
determining, utilizing the genome-location-classification model, a confidence classification for the genomic coordinate based on the contextual nucleic-acid subsequence. 
The limitation of determining a contextual subsequence is an abstract idea within the mental processes grouping because determining a subsequence amounts to mentally extracting the portion of the sequence surrounding the variant call. The limitation of training a model is an abstract idea within the mathematical concepts grouping because training on the subsequence and the ground-truth classification involves mathematical adjustment of the model. The limitation of utilizing the model to determine a confidence classification is an abstract idea within the mathematical concepts and/or mental processes groupings because utilizing the model to determine a classification applies mathematical algorithms to the contextual subsequence, an evaluation step that a human could perform mentally or with basic computation to assess confidence.
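A contextual nucleic-acid subsequence can be pictured as a fixed window of the example sequence centered on the variant call's coordinate. The sketch below assumes 0-based coordinates and a hypothetical flank of 3 bases on each side, clipped at the sequence ends; the claims do not specify a window size or a coordinate convention, and the function name is illustrative.

```python
def contextual_subsequence(example_sequence, coordinate, flank=3):
    """Return the bases within `flank` positions on either side of `coordinate`.

    Coordinates are 0-based; the window is clipped at the sequence boundaries.
    """
    if not 0 <= coordinate < len(example_sequence):
        raise ValueError("coordinate lies outside the example sequence")
    start = max(0, coordinate - flank)
    end = min(len(example_sequence), coordinate + flank + 1)
    return example_sequence[start:end]


reference = "ACGTTGCAA"
print(contextual_subsequence(reference, 5))  # GTTGCAA
print(contextual_subsequence(reference, 1))  # ACGTT (clipped at the start)
```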

Dependent Claims 
	Claim 2 recites wherein the confidence classifications indicate a degree to which nucleobases can be accurately determined at the particular genomic coordinates. This limitation narrows the second abstract idea of claim 1 (training the model to determine confidence classifications) by specifying the classifications as probabilistic degrees, which is a mathematical representation of accuracy that could be mentally assessed or calculated.
	Claim 4 recites the limitation of determining a confidence classification from the set of confidence classifications by determining the confidence classification for a genomic coordinate comprising a genetic modification or an epigenetic modification. Claim 16 recites that determining the confidence classification comprises determining the confidence classification for a single nucleotide variant, a nucleobase insertion, a nucleobase deletion, a part of a structural variation, or a part of a copy number variation at a genomic coordinate. These limitations are abstract ideas within the mathematical concepts and/or mental processes groupings because they involve mathematical or mental determination of confidence for coordinates representing specific modifications or variants, extending the abstract evaluation to particular genomic features.
	Claim 5 recites determining the sequencing metrics by determining one or more of: alignment metrics …; depth metrics …; or call-data-quality metrics. This narrows the first abstract idea of claim 1 (determining sequencing metrics) because it specifies particular quantifications that serve as the sequencing metrics, which can be calculated mentally or with basic mathematics. Claim 6 recites determining the alignment metrics by …; determining the depth metrics by …; or determining the call-data-quality metrics by …. This further limits the abstract ideas of claim 5 because it details the mathematical metrics, which are precise calculations of data patterns that could be mentally derived for small datasets.
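The three metric families named in claim 5 can be illustrated with simple per-coordinate summaries over the reads covering a position. In this hypothetical sketch, each read observation carries a Phred-scaled mapping quality and base quality, depth is the number of covering observations, and the alignment and call-quality metrics are taken as mean mapping quality and mean base quality. These are assumed stand-ins; the specific metrics recited in claim 6 are elided in this analysis and are not reproduced here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ReadObservation:
    """One read's observation at a genomic coordinate (hypothetical structure)."""
    mapping_quality: int  # Phred-scaled alignment confidence
    base_quality: int     # Phred-scaled base-call confidence


def sequencing_metrics(observations):
    """Summarize depth, alignment, and call-quality metrics for one coordinate."""
    depth = len(observations)
    if depth == 0:
        return {"depth": 0, "mean_mapping_quality": 0.0, "mean_base_quality": 0.0}
    return {
        "depth": depth,
        "mean_mapping_quality": float(mean(o.mapping_quality for o in observations)),
        "mean_base_quality": float(mean(o.base_quality for o in observations)),
    }


pileup = [ReadObservation(60, 30), ReadObservation(40, 20), ReadObservation(20, 10)]
print(sequencing_metrics(pileup))
# {'depth': 3, 'mean_mapping_quality': 40.0, 'mean_base_quality': 20.0}
```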
	Claims 7 and 14 recite determining/identifying a confidence classification from the set of confidence classifications by determining/identifying at least one of a high-confidence classification, an intermediate-confidence classification, or a low-confidence classification for a genomic coordinate. These limitations are abstract ideas within the mental processes grouping because they involve mental categorization of confidence into levels based on evaluations. 
	Claim 8 recites determining a confidence classification from the set of confidence classifications by determining a confidence score within a range of confidence scores indicating a degree to which nucleobases can be accurately determined at a genomic coordinate. Claim 12 recites identifying, from the digital file, the confidence classification for the genomic coordinate by identifying the confidence classification indicating a degree to which nucleobases can be accurately determined at the genomic coordinate. Claim 17 recites determining a confidence score within a range of confidence scores indicating a degree to which nucleobases can be accurately determined at a genomic coordinate. These limitations are abstract ideas within the mathematical concepts grouping because they calculate scores mathematically to indicate the degree of accuracy. 
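Claims 7 and 8 describe, respectively, categorical confidence levels and a confidence score within a range. One way to relate the two is to bin a score in [0, 1] into three tiers. The cutoffs of 0.8 and 0.5 below, and the function name, are hypothetical and are not drawn from the claims or the specification.

```python
def confidence_tier(score, high_cutoff=0.8, low_cutoff=0.5):
    """Map a confidence score in [0, 1] to a high/intermediate/low classification."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie within [0, 1]")
    if score >= high_cutoff:
        return "high"
    if score >= low_cutoff:
        return "intermediate"
    return "low"


for s in (0.95, 0.6, 0.1):
    print(s, confidence_tier(s))
# 0.95 high
# 0.6 intermediate
# 0.1 low
```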
	Claim 9 recites training the genome-location-classification model to determine the confidence classifications by training a statistical machine-learning model (MLM) or a neural network to determine the confidence classifications. Similarly, claim 18 recites training a logistic regression model, a random forest classifier, or a convolutional neural network to determine the confidence classifications. These limitations narrow the model-training abstract idea of the claims upon which they depend by specifying the type of model to be trained, and training any of these models involves mathematical processes based on optimization, calculus, and linear algebra.
	Claim 10 recites determining, from the example nucleic-acid sequence, a contextual nucleic-acid subsequence surrounding a variant-nucleobase call; and training the genome-location-classification model to determine a confidence classification for a genomic coordinate of the variant-nucleobase call. The limitation of determining a contextual subsequence is an abstract idea within the mental processes grouping because determining a subsequence is a mental observation of the surrounding sequence. The limitation of training a model is an abstract idea within the mathematical concepts grouping because training on the subsequence and ground-truth classifications uses mathematical correlations for localized optimization.
	Claim 13 recites identifying, from the digital file, the confidence classification by identifying the confidence classification from an annotation or a score for the genomic coordinate within the digital file. This limitation is an abstract idea within the mental processes grouping because reading a stored annotation or score for a coordinate is a retrieval and evaluation step that could be performed mentally by a human.
	Claim 19 recites comparing, for the genomic coordinate, a projected confidence classification to a ground-truth classification reflecting a Mendelian-inheritance pattern or a replicate concordance of nucleobase calls at the genomic coordinate; determining a loss from the comparison of the projected confidence classification to the ground-truth classification; and adjusting a parameter of the genome-location-classification model based on the determined loss. The limitation of comparing a projected confidence classification is an abstract idea within the mathematical concepts and/or mental processes groupings because comparing projections to truth is a mathematical or mental evaluation step. The limitation of determining a loss is an abstract idea within the mathematical concepts grouping because determining a loss calculates differences mathematically. The limitation of adjusting a parameter is an abstract idea within the mathematical concepts grouping because it involves mathematical optimization of the model based on the calculated loss, similar to training a model using a loss function.
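The three claim 19 steps (compare a projected classification to a ground-truth classification, determine a loss, adjust a parameter) correspond to one iteration of gradient-based training. The sketch below assumes, purely for illustration, a single-feature logistic model, a binary ground-truth label (1 for a concordant, high-confidence coordinate and 0 otherwise), binary cross-entropy loss, and plain gradient descent with a hypothetical learning rate. The claim does not specify any of these choices, and the function names are illustrative.

```python
import math


def predict(weight, bias, feature):
    """Projected confidence: logistic (sigmoid) of a linear score."""
    return 1.0 / (1.0 + math.exp(-(weight * feature + bias)))


def bce_loss(projected, truth):
    """Binary cross-entropy between a projected probability and a 0/1 label."""
    eps = 1e-12  # guard against log(0)
    projected = min(max(projected, eps), 1.0 - eps)
    return -(truth * math.log(projected) + (1 - truth) * math.log(1.0 - projected))


def training_step(weight, bias, feature, truth, learning_rate=0.1):
    """Compare, compute the loss, and adjust parameters by one gradient step."""
    projected = predict(weight, bias, feature)   # projected classification
    loss = bce_loss(projected, truth)            # loss from the comparison
    # For sigmoid + BCE, d(loss)/d(score) simplifies to (projected - truth).
    gradient = projected - truth
    weight -= learning_rate * gradient * feature  # parameter adjustment
    bias -= learning_rate * gradient
    return weight, bias, loss


w, b = 0.0, 0.0
for _ in range(3):
    w, b, loss = training_step(w, b, feature=2.0, truth=1)
    print(round(loss, 4))  # loss shrinks as the parameters are adjusted
```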

Prong Two 
Claims 1-20 as a whole do not integrate the recited judicial exception into a practical application. A claim that recites a judicial exception [Prong One] is deemed to be directed to a judicial exception [Step 2A] unless the claim as a whole contains additional elements that integrate the exception into a practical application [Prong Two]. A claim that integrates a judicial exception into a practical application will apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception [MPEP § 2106.04(d) and MPEP § 2106.05(e)]. A claim does not integrate a judicial exception into a practical application by reciting insignificant extra-solution activity, generally linking the exception to a particular technological environment or field of use, merely reciting to apply the exception, merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea [MPEP § 2106.04(d)(I)]. Insignificant extra-solution activities are nominal or tangential additions to a claim that are incidental to the primary process or product, including both pre-solution and post-solution activity (e.g., pre-solution data gathering for use in a process). If the exception is integrated into a practical application, the claim is eligible; otherwise, the claim is directed to the judicial exception, necessitating further analysis at Step 2B.
Claims 1-20 recite the following limitations, which are additional elements: 
Claims 1 and 11 recite at least one processor and a non-transitory computer readable medium comprising/storing instructions. 
Claims 1 and 15 recite generating at least one digital file comprising the confidence classifications for the genomic coordinates. 
Claim 3 recites wherein the sample nucleic-acid sequences are determined using a single sequencing pipeline comprising a nucleic-acid-sequence-extraction method, a sequencing device, and a sequence-analysis software. 
Claim 11 recites generate, for display within a graphical user interface, an indicator of the confidence classification for the genomic coordinate of the variant-nucleobase call. 
Claim 20 recites wherein the example nucleic-acid sequence comprises a reference genome or a nucleic-acid sequence of an ancestral haplotype. 
The limitations of at least one processor and a non-transitory computer readable medium comprising/storing instructions are generic computer components that amount to nothing more than mere instructions to apply the abstract ideas, which do not integrate the abstract ideas into a practical application. See MPEP § 2106.05(b) and (f). The limitations of generating a digital file and generating an indicator of the confidence classification are standard output steps that constitute insignificant post-solution activity because they are incidental to the primary process and merely a nominal or tangential addition to the claim, which do not integrate the abstract ideas into a practical application. See MPEP § 2106.05(g). The additional element of claim 3 describes standard pre-solution activity that is incidental to the primary process and merely a nominal or tangential addition to the claim, which does not integrate the abstract ideas into a practical application. See MPEP § 2106.05(g). The additional element of claim 20 specifies particular data types for the example sequence, which does not integrate the abstract ideas into a practical application but merely limits the data field without any improvement. See MPEP § 2106.05(h). 
Claims 2, 4-10, 12-14, and 16-19 do not recite any elements in addition to the recited abstract ideas.
Overall, the claims as a whole merely recite insignificant extra-solution activities and abstract ideas implemented on generic computer components, without meaningful limitations that tie them to a specific technological improvement. Therefore, claims 1-20 do not contain additional elements that integrate the recited abstract ideas into a practical application [Step 2A, Prong Two: No].

Step 2B 
Claims 1-20 do not include additional elements, whether considered individually or in combination, that are sufficient to amount to significantly more than the judicial exception itself. Under Step 2B, the claim is analyzed to determine whether there are any additional elements that, individually or in combination, constitute an “inventive concept” sufficient to ensure that the claim, as a whole, amounts to significantly more than the judicial exception itself [MPEP § 2106.05; Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 573 U.S. 208, 217-18, 110 USPQ2d 1976, 1981 (2014)]. 
Claims 1-20 recite the following limitations, which are additional elements: 
Claims 1 and 11 recite at least one processor and a non-transitory computer readable medium comprising/storing instructions. 
Claims 1 and 15 recite generating at least one digital file comprising the confidence classifications for the genomic coordinates. 
Claim 3 recites wherein the sample nucleic-acid sequences are determined using a single sequencing pipeline comprising a nucleic-acid-sequence-extraction method, a sequencing device, and a sequence-analysis software. 
Claim 11 recites generate, for display within a graphical user interface (GUI), an indicator of the confidence classification for the genomic coordinate of the variant-nucleobase call. 
Claim 20 recites wherein the example nucleic-acid sequence comprises a reference genome or a nucleic-acid sequence of an ancestral haplotype. 
The limitations of at least one processor and a non-transitory computer readable medium comprising/storing instructions are generic computer components that amount to nothing more than mere instructions to apply the abstract ideas, which do not amount to significantly more than the abstract ideas. See MPEP § 2106.05(b) and (f). The additional element of claim 20 specifies particular data types for the example sequence, which merely limits the data field without any improvement and likewise does not amount to significantly more. See MPEP § 2106.05(h).
The limitations of generating a digital file and generating an indicator of the confidence classification are post-solution output steps that are well-understood, routine, and conventional in genomic data processing. See National Cancer Institute, Genomic Data Processing, National Institutes of Health, paras. 2-3 (20 April 2021) (generating files and displaying data are standard in common genomic pipelines); and In re Board of Trustees of the Leland Stanford Junior University, 989 F.3d 1367, 1371, 1373 (Fed. Cir. 2021) (affirming the Board’s conclusion that the steps of receiving data, performing calculations using that data, storing the results, and providing the results upon request using a computer did not go beyond the well-understood, routine, and conventional). Considered in combination with the other additional elements, these limitations add only nominal output to the abstract ideas; the claims as a whole recite generic automation without an inventive concept sufficient to amount to significantly more.
The sequencing pipeline recited in claim 3 is well-understood, routine, and conventional as single pipelines with extraction, sequencing devices, and software are the foundation of next-generation sequencing, which became widely adopted throughout the 2010s. See Katja Lohmann and Christine Klein, Next Generation Sequencing and the Future of Genetic Diagnosis, 11 Neurotherapeutics 699 col.2 para.2, 700 col.1 para.1, 701 fig.1 (23 July 2014). In combination with the additional elements inherited from claim 1, the limitation narrows data input without inventive integration, as the pipeline merely feeds the abstract ideas and the ordered combination is predictable for genomic analysis. 
Overall, claims 1-20 amount to no more than generic instructions to implement the abstract ideas on conventional computers and insignificant extra-solution activities that do not go beyond the well-known, routine, and conventional. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception itself because the claims recite additional elements that equate to insignificant extra-solution activity and mere instructions to apply the recited abstract ideas in a generic way or in a generic computing environment. Therefore, claims 1-20 are rejected for failing to set forth patent eligible subject matter under 35 U.S.C. 101 because the claimed invention recites abstract ideas [Step 2A, Prong One: Yes] and the additional elements do not integrate the judicial exception into a practical application [Step 2A, Prong Two: No] and do not amount to claiming significantly more than the recited exception [Step 2B: No].

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-10 and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shen (20(2) Genome Res. 273 (February 2010)), Atlas2Team ((@atlasmember), Atlas-SNP (last modified 12 March 2013)), Evani ((@uevani), Atlas2 Suite (last modified 19 June 2013)), and Wang (9 Ann. Data Sci. 187 (12 April 2020)), as evidenced by Patrick (80(4) Yale J Biol Med. 191 (April 2008)), Ling Shen (96(14) Proc Natl Acad Sci. 7871 (6 July 1999)), Brockman (18 Genome Res. 763 (22 January 2008)), Ansari (Medium, (14 June 2023)), and National Cancer Institute (National Institutes of Health, (10 January 2021)). 
Regarding claim 1, Shen discloses a method that estimates the probability of variant alleles for each identified single nucleotide polymorphism (SNP). At 273 col.2 para.3. Shen discloses determining alignment metrics, depth-coverage metrics, and call-quality metrics for comparing candidate SNP sites in a sample sequence to a reference sequence. At 274 col.2 paras.1-3 (determine sequencing metrics for comparing sample nucleic-acid sequences with genomic coordinates of an example nucleic-acid sequence; instant spec para. [0047] “‘genomic coordinate’ refers to a particular location or position of a nucleobase within a genome.” Candidate SNP sites identify a particular locus of a nucleobase, at 276 col.2 para.3). Shen trains a logistic regression model to determine the base-call error probability for a given sample sequence based on numerous variables, including call-quality metrics and neighboring sequence context. At 279 col.1 para.2 (train a genome-location-classification model to determine confidence classifications for the genomic coordinates based on the sequencing metrics). Shen uses the logistic regression model to determine a classification value for each SNP in a set of candidate SNP sites based on call-quality metrics and neighboring sequence context for the sample sequence. At 274 col.2 para.4; 275 col.1 para.4 (determine, utilizing the genome-location-classification model, a set of confidence classifications for a set of genomic coordinates based on a set of sequencing metrics for one or more sample nucleic-acid sequences). Shen establishes a list of candidate SNP sites, at 274 col.2 para.2, but Shen does not explicitly generate at least one digital file comprising the set of confidence classifications for the set of genomic coordinates. However, Shen discloses using sets of SNP probabilities among all identified candidate loci from the genome to determine the SNP probability for a particular locus using Atlas-SNP2 software. At 275 col.1 para.4.
Additionally, Atlas2Team teaches that the Atlas-SNP2 software takes a Binary sequence Alignment/Mapping (BAM) file and a FASTA reference genome as input. § For 454/Illumina data; § For SOLiD data; see also Evani, para.2. A person having ordinary skill in the art would therefore understand that a BAM file containing the classification values determined by the logistic regression model for each candidate locus should be generated. One of ordinary skill in the art would reasonably expect success because the Atlas-SNP2 software requires a BAM file as input. Where some teaching, suggestion, or motivation in the prior art would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention, the claimed invention is likely to be obvious. See KSR International Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007) (see MPEP § 2143(I)(G)).
Additionally, Shen does not explicitly train the logistic regression model based on ground-truth classifications for particular genomic coordinates. However, Wang discloses training classification models by measuring an offset between the candidate classification and the ground-truth classification, and modifying parameters of the model. At 203 para.4. Wang teaches that this optimization process of minimizing the loss function helps remedy issues related to object localization in regression classification models. At 203 paras.3-4. A person having ordinary skill in the art would be motivated to combine the teachings by training the logistic regression classification model of Shen with the ground-truth training technique taught by Wang. One of ordinary skill in the art would reasonably expect success because using ground-truth classifications to ultimately minimize the loss function helps remedy the object localization problem encountered by classification models. Where some teaching, suggestion, or motivation in the prior art would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention, the claimed invention is likely to be obvious. See KSR International Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007) (see MPEP § 2143(I)(G)).
Moreover, Shen does not explicitly recite at least one processor or a non-transitory computer-readable medium comprising instructions. However, Shen’s method is implemented via a software package, meaning the method is implemented via a computer system. At 274 col.1 para.1. Inherent in a computer system are the components essential to the functionality of a computer, which include at least one processor to receive information and execute instructions and a non-transitory computer-readable medium to store those instructions. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to explicitly incorporate at least one processor and a non-transitory computer-readable medium.
Regarding claim 2, Shen teaches that the classification values estimate the prediction accuracy of a given substitution for each candidate locus. At 274 col.2 para.4; 279 col.1 para.2 (the system of claim 1, wherein the confidence classifications indicate a degree to which nucleobases can be accurately determined at the particular genomic coordinates).
Regarding claim 3, Shen discloses generating sample sequences using the 454 platform. At 274 col.2 para.4. While Shen does not explicitly teach using a single sequencing pipeline comprising a nucleic-acid-sequence-extraction method, a sequencing device, and a sequence-analysis software, Patrick teaches that the 454 sequencing technology comprises a method to extract nucleic-acid sequences, a sequencing instrument, and a computer to analyze the sequence. At 192 col.1 para.2. 
Regarding claim 4, Shen teaches using a Bayesian method to determine a SNP probability from the set of classification values for the candidate SNPs, at 274 col.2 para.2, by determining the SNP probability for a particular SNP substitution site, at 275 col.1 paras.4-5 (the system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to determine a confidence classification from the set of confidence classifications by determining the confidence classification for a genomic coordinate). SNPs are the most common type of genetic modification, as evidenced by Ling Shen. § Abstract (for a genomic coordinate comprising a genetic modification or an epigenetic modification). 
Regarding claim 5, Shen discloses determining nucleobase call-quality metrics with the 454 base-caller, at 278 col.1 para.4, which, as evidenced by Brockman, quantify the probability that the nucleobase call for the sample sequence at the candidate SNP site is an overcall, at 764 col.1 para.5 (the system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to determine the sequencing metrics by determining one or more of: call-data-quality metrics for quantifying quality of the nucleobase calls for the sample nucleic-acid sequences at the genomic coordinates of the example nucleic-acid sequence).
Regarding claim 6, Shen discloses determining nucleobase call-quality metrics with the 454 base-caller. At 278 col.1 para.4 (the system of claim 5, further comprising instructions that, when executed by the at least one processor, cause the system to: determine the call-data-quality metrics by determining one or more of nucleobase-call- quality metrics, callability metrics, or somatic-quality metrics for the sample nucleic-acid sequences). 
Regarding claim 7, Shen teaches using a Bayesian method to determine a SNP probability from the set of classification values for the candidate SNPs, at 274 col.2 para.2, by determining whether a particular locus has high confidence, intermediate confidence, or low confidence, at 277 col.2 para.2 (the system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to determine a confidence classification from the set of confidence classifications by determining at least one of a high-confidence classification, an intermediate-confidence classification, or a low-confidence classification for a genomic coordinate).
Regarding claim 8, Shen teaches using a Bayesian method to determine a SNP probability from the set of classification values for the candidate SNPs, at 274 col.2 para.2, by determining the SNP probability score within a range of 0 to 1 for a particular SNP substitution site, at 276 col.2 para.7; Table 3 (the system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to determine a confidence classification from the set of confidence classifications by determining a confidence score within a range of confidence scores). Shen teaches that the SNP probability estimates the accuracy of a specific allele being at a particular SNP locus. At 274 col.2 para.4; 279 col.2 paras.2-3 (determining a confidence score within a range of confidence scores indicating a degree to which nucleobases can be accurately determined at a genomic coordinate).
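For illustration only, the following Python sketch shows how the output of a logistic model necessarily falls within a bounded range of scores (claims 8 and 17) and how such a score could be binned into high-, intermediate-, and low-confidence classifications (claims 7 and 14). The weights, the feature vectors, and the cutoffs `HIGH_CUTOFF` and `LOW_CUTOFF` are hypothetical assumptions; they are not taken from Shen, Atlas-SNP2, or the application.

```python
import math

# Hypothetical tier cutoffs chosen for illustration only; they are not values
# disclosed by Shen or recited in the application.
HIGH_CUTOFF = 0.9
LOW_CUTOFF = 0.5


def confidence_score(weights, bias, features):
    """Logistic (sigmoid) score: maps any real-valued input into [0, 1]."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def confidence_tier(score):
    """Bin a score in [0, 1] into a three-level confidence classification."""
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= LOW_CUTOFF:
        return "intermediate"
    return "low"
```

The design point is simply that a sigmoid output is bounded, so thresholding it yields a discrete tier while the raw value remains usable as a score.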
Regarding claim 9, Shen discloses training a logistic regression model to determine classification values. At 274 col.2 para.4 (the system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to train the genome-location-classification model to determine the confidence classifications). A logistic regression model is a statistical machine-learning model, as evidenced by Ansari. At paras.1-2 (by training a statistical machine-learning model or a neural network to determine the confidence classifications).
Regarding claim 10, Shen discloses analyzing variables that potentially affect the probability of a substitution being a sequencing error, including a neighboring quality standard (NQS) threshold test. At 275 col.1 para.2 no.3. NQS considers the quality score of a specific SNP as well as the quality of the 5-base flanking sequence on either side of the SNP. Id. While Shen does not explicitly determine a contextual nucleic-acid subsequence surrounding a variant-nucleobase call from the reference sequence, NQS inherently requires determination of a contextual subsequence from the reference sequence because the flanking nucleotides in the sample sequence must be compared to the flanking nucleotides in the reference sequence to analyze whether the bases match. See Brockman, at 765 col.2 para.5 – 766 col.1 para.2. Shen discloses training the logistic regression model to determine the base-call error probability for a potential SNP in a sample sequence based on variables including call-quality score (sequencing metrics), NQS, NQS threshold, the immediate flanking nucleotides, and true classifications for the specific SNP. At 279 col.1 paras.2-3 (train the genome-location-classification model to determine a confidence classification for a genomic coordinate of the variant-nucleobase call based on: the contextual nucleic-acid subsequence; a subset of sequencing metrics for a subset of genomic coordinates corresponding to the contextual nucleic-acid subsequence). While Shen does not explicitly train the logistic regression model based on a subset of ground-truth classifications for the subset of genomic coordinates corresponding to the contextual nucleic-acid subsequence, Wang discloses training classification models by measuring an offset between the candidate classification and the ground-truth classification. At 203 para.4.
As discussed above with respect to claim 1, a person having ordinary skill in the art would be motivated to train the logistic regression model of Shen using the ground-truth training technique of Wang to help remedy the object localization problem encountered by classification models.
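The NQS-style reasoning relied on for claims 10 and 15 (checking the call's own quality plus a 5-base flank on each side, and comparing the flanking bases against the reference) can be sketched as follows. This is a hypothetical illustration, not code from Shen or Brockman: it assumes a read aligned to the reference without gaps (read index equals reference index), and the thresholds `CENTRAL_MIN_QUALITY` and `FLANK_MIN_QUALITY` are illustrative assumptions. Only the 5-base flank width follows the description of NQS above.

```python
# Hypothetical quality thresholds for illustration; not values taken from Shen
# or Brockman. FLANK_WIDTH follows the 5-base flank described for NQS.
CENTRAL_MIN_QUALITY = 20
FLANK_MIN_QUALITY = 15
FLANK_WIDTH = 5


def contextual_subsequence(reference, position, width=FLANK_WIDTH):
    """Return the reference bases within `width` positions on either side of `position`."""
    start = max(0, position - width)
    return reference[start:position + width + 1]


def passes_nqs_like_check(read_bases, read_quals, reference, position, width=FLANK_WIDTH):
    """Assumes a gapless alignment, so read index i corresponds to reference index i.

    Returns True only if the central base quality, every flanking base quality,
    and every flanking base identity (compared with the reference) pass.
    """
    if read_quals[position] < CENTRAL_MIN_QUALITY:
        return False
    lo, hi = position - width, position + width
    if lo < 0 or hi >= len(read_bases) or hi >= len(reference):
        return False  # not enough flanking context on one side
    for i in range(lo, hi + 1):
        if i == position:
            continue  # the candidate variant base itself is allowed to differ
        if read_quals[i] < FLANK_MIN_QUALITY or read_bases[i] != reference[i]:
            return False
    return True
```

For example, with reference `ACGTACGTACGTA` and a read carrying a single substitution at index 6, the check passes when all qualities are 30, and fails if any flanking base mismatches the reference or falls below the flank threshold.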
Regarding claim 15, Shen discloses using NQS for a candidate SNP site, at 275 col.1 para.2 no.3, which inherently requires determination of a contextual subsequence from the reference sequence because the flanking nucleotides in the sample sequence must be compared to the flanking nucleotides in the reference sequence to analyze whether the bases match. See Brockman, at 765 col.2 para.5 – 766 col.1 para.2 (determining, from an example nucleic-acid sequence, a contextual nucleic-acid subsequence surrounding a variant-nucleobase call in a sample nucleic-acid sequence at a genomic coordinate from genomic coordinates of an example nucleic-acid sequence). Shen discloses training the logistic regression model to determine the base-call error probability for a potential SNP in a sample sequence based on numerous variables, including NQS, NQS threshold, and the immediate flanking nucleotides. At 279 col.1 paras.2-3 (training a genome-location-classification model to determine confidence classifications for the genomic coordinate based on the contextual nucleic-acid subsequence). Shen uses the logistic regression model to determine a classification value for each SNP in a set of candidate SNP sites based on call-quality metrics and the neighboring sequence context. At 274 col.2 para.4; 275 col.1 para.4 (determining, utilizing the genome-location-classification model, a confidence classification for the genomic coordinate based on the contextual nucleic-acid subsequence). Shen establishes a list of candidate SNP sites, at 274 col.2 para.2, and uses the set of SNP probabilities to determine the SNP probability for a particular locus using Atlas-SNP2 software, at 275 col.1 para.4, which requires the generation of a BAM file containing the set of SNP probabilities, see Atlas2Team, § For 454/Illumina data; § For SOLiD data; and Evani, para.2 (generating at least one digital file comprising the confidence classification for the genomic coordinate of the variant-nucleobase call).
While Shen does not explicitly disclose training the logistic regression model based on a ground truth classification for the genomic coordinates, Wang discloses training classification models by measuring an offset between the candidate classification and the ground truth classification. At 203 para.4. As discussed above with respect to claim 1, a person having ordinary skill in the art would be motivated to train the logistic regression model of Shen using the ground truth training technique of Wang to help remedy the object localization problem encountered by classification models. 
Regarding claim 16, Shen discloses determining the classification value for a SNP, at 274 col.2 para.4, which is a single nucleotide variant, as evidenced by National Cancer Institute, para.1 (the method of claim 15, wherein determining the confidence classification comprises determining the confidence classification for a single nucleotide variant, a nucleobase insertion, a nucleobase deletion, a part of a structural variation, or a part of a copy number variation at a genomic coordinate). 
Regarding claim 17, Shen teaches using the Atlas-SNP2 software to determine a SNP probability from the set of classification values for the candidate SNPs, at 274 col.2 para.2, by determining the SNP probability score within a range of 0 to 1 for a particular SNP substitution site, at 276 col.2 para.7; Table 3 (the method of claim 15, wherein determining the confidence classification comprises determining a confidence score within a range of confidence scores). Shen teaches that the SNP probability estimates the accuracy of a specific allele being at a particular SNP locus. At 274 col.2 para.4; 279 col.2 paras.2-3 (indicating a degree to which nucleobases can be accurately determined at a genomic coordinate).
Regarding claim 18, Shen discloses training a logistic regression model to determine classification values. At 274 col.2 para.4 (the method of claim 15, wherein training the genome-location-classification model to determine the confidence classifications comprises training a logistic regression model, a random forest classifier, or a convolutional neural network to determine the confidence classifications). 
Regarding claim 19, Shen discloses training a logistic regression model to determine classification values for candidate SNP sites. At 274 col.2 para.4. Shen teaches that candidate SNP sites are confirmed as true by analyzing the coverage of variant reads. At 275 col.1 para.4; Figure 2 caption (a replicate concordance of nucleobase calls at the genomic coordinate). Shen discloses training the model based on the classification values of the SNP sites, at 279 col.1 para.2, but Shen fails to explicitly disclose training the model by: comparing, for the genomic coordinate, a projected confidence classification to a ground-truth classification; determining a loss from the comparison; and adjusting a parameter of the genome-location-classification model based on the determined loss.
However, Wang discloses training classification models by measuring an offset between the candidate classification and the ground-truth classification, and modifying parameters of the model based on the offset. At 203 para.4. Wang teaches that this optimization process of minimizing the loss function helps remedy issues related to object localization in regression classification models. At 203 paras.3-4. A person having ordinary skill in the art would be motivated to combine the teachings by training the logistic regression classification model of Shen with the loss function technique taught by Wang. One of ordinary skill in the art would reasonably expect success because using ground-truth classifications to ultimately minimize the loss function helps remedy the object localization problem encountered by classification models. Where some teaching, suggestion, or motivation in the prior art would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention, the claimed invention is likely to be obvious. See KSR International Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007) (see MPEP § 2143(I)(G)).
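A minimal sketch of the compare, loss, and adjust steps recited in claim 19, assuming binary cross-entropy as the loss and batch gradient descent as the parameter update; neither choice is attributed to Shen or Wang. Each feature vector stands in for the sequencing metrics at one genomic coordinate, and each label for its ground-truth classification. The function names `sigmoid` and `train_logistic` and the hyperparameter values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, learning_rate=0.1, epochs=500):
    """Batch gradient descent on binary cross-entropy.

    features: list of feature vectors, one per genomic coordinate (assumed).
    labels: ground-truth classifications, 1 = confident call, 0 = not (assumed).
    Returns (weights, bias, per-epoch mean loss history).
    """
    n_features = len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    eps = 1e-12  # guards log(0)
    history = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * n_features, 0.0, 0.0
        for x, y in zip(features, labels):
            # Projected confidence classification for this coordinate.
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            # Compare against the ground truth via cross-entropy loss.
            loss -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
            # The gradient of this loss with respect to the logit is (p - y).
            err = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        m = len(features)
        # Adjust the model parameters based on the loss gradient.
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
        history.append(loss / m)
    return weights, bias, history
```

On a small separable toy set the recorded loss decreases across epochs, which is the behavior the "determining a loss" and "adjusting a parameter" limitations describe in general terms.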
Regarding claim 20, Shen teaches using sequences from a divided reference genome to determine SNPs in a sample sequence. At 274 col.1 para.2 (the method of claim 15, wherein the example nucleic-acid sequence comprises a reference genome or a nucleic-acid sequence of an ancestral haplotype).

Claims 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Shen (20(2) Genome Res. 273 (February 2010)) in view of Atlas2Team ((@atlasmember), Atlas-SNP (last modified 12 March 2013)), and Evani ((@uevani), Atlas2 Suite (last modified 19 June 2013)).
Regarding claim 11, Shen discloses identifying candidate SNP sites in a sample sequence. At 278 col.1 para.4 (detect a variant-nucleobase call at a genomic coordinate within a sample nucleic-acid sequence). Shen uses a logistic regression model to determine the classification values for the potential SNPs in a given sample sequence. At 274 col.2 para.4. Shen determines a SNP confidence score from the set of classification values for the candidate SNPs produced by the logistic regression model. At 274 col.2 para.2 (identify a confidence classification for the genomic coordinate according to a genome-location-classification model). While Shen does not explicitly determine a SNP probability from a digital file, Atlas2Team teaches that the Atlas-SNP2 software being used by Shen requires a BAM file as input. § For 454/Illumina data; § For SOLiD data; see also Evani, para.2. Shen discloses that the confidence score is output for users to tune the parameters of the method. At 274 col.1 para.1 (generate, for display within a graphical user interface, an indicator of the confidence classification for the genomic coordinate of the variant-nucleobase call). 
Regarding claim 12, Shen teaches using the Atlas-SNP2 software to determine a SNP probability from the set of classification values for the candidate SNPs. At 274 col.2 para.2 (the non-transitory computer-readable medium of claim 11, further storing instructions that, when executed by the at least one processor, cause the computing device to identify, from the digital file, the confidence classification for the genomic coordinate). Shen teaches that the SNP probability estimates the accuracy of a specific allele being at a particular SNP locus. At 274 col.2 para.4; 279 col.2 paras.2-3 (by identifying the confidence classification indicating a degree to which nucleobases can be accurately determined at the genomic coordinate).
Regarding claim 13, Shen teaches using the Atlas-SNP2 software to determine a SNP probability from the set of classification values for the candidate SNP sites, at 274 col.2 para.2, by determining a SNP probability for a substitution site from the classification value for that site, at 276 col.2 para.7 (the non-transitory computer-readable medium of claim 11, further storing instructions that, when executed by the at least one processor, cause the computing device to identify, from the digital file, the confidence classification by identifying the confidence classification from an annotation or a score for the genomic coordinate within the digital file).
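For illustration, a digital file carrying a per-coordinate score and annotation (claims 11-13) might be written and queried as below. The tab-delimited column layout (`chrom`, `pos`, `score`, `annotation`) and the function names are hypothetical assumptions; this sketch is not the BAM format that Atlas-SNP2 takes as input and is not a format defined by the application.

```python
import csv
import io

# Hypothetical column layout for illustration only; not BAM, VCF, or any
# format defined by Atlas-SNP2 or the application.
FIELDS = ["chrom", "pos", "score", "annotation"]


def write_confidence_file(handle, records):
    """Write one tab-delimited row per genomic coordinate, with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)


def lookup_confidence(handle, chrom, pos):
    """Return (score, annotation) for a coordinate, or None if it is absent."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["chrom"] == chrom and int(row["pos"]) == pos:
            return float(row["score"]), row["annotation"]
    return None
```

For example, after writing records to an `io.StringIO` buffer and calling `buf.seek(0)`, `lookup_confidence(buf, "chr1", 12345)` returns the stored score and annotation for that coordinate, illustrating identification "from an annotation or a score for the genomic coordinate within the digital file."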
Regarding claim 14, Shen teaches using the Atlas-SNP2 software to determine a SNP probability from the set of classification values for the candidate SNPs, at 274 col.2 para.2, by determining whether a particular locus has high confidence, intermediate confidence, or low confidence, at 277 col.2 para.2 (the non-transitory computer-readable medium of claim 11, further storing instructions that, when executed by the at least one processor, cause the computing device to identify, from the digital file, the confidence classification by identifying at least one of a high-confidence classification, an intermediate-confidence classification, or a low-confidence classification for the genomic coordinate).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Emily A Darrigrand whose telephone number is (571) 272-1098. The examiner can normally be reached Monday-Thursday 7:30AM-4:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Larry Riggs, can be reached at (571) 270-3062. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/E.A.D./Examiner, Art Unit 1686                

/OLIVIA M. WISE/Supervisory Patent Examiner, Art Unit 1685