DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-6 are pending and under examination.
Applicant’s claim to priority to a US provisional application is acknowledged. Accordingly, the effective filing date for the claims is 2/9/2021.
This application has published as US PG Pub US 2022/0254450 A1.
Two IDS statements have been entered and considered.
The Declaration of Dr. Chuang, filed 12/14/2021 under 37 CFR 1.132 to establish the inventorship and authorship of “High-performance deep learning pipeline predicts individuals in mixtures of DNA using Sequencing data,” has been entered and considered.
The drawings are objected to because Figs. 3, 6, and 7 have type, numbers, wording, legends, shading, or fonts that are fuzzy and difficult, or impossible, to read. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
The disclosure is objected to because of the following informalities:
Portions of Table 4 are indiscernible due to the small typeface, line breaks, or copy/paste artifacts; even enlarging the table does not resolve the information within.
Table 6 is indiscernible in its entirety due to the choice of font, the faintness of the font, or copy/paste artifacts; even enlarging the table does not resolve the information within.
See MPEP 608:
“The specification (including the abstract and claims), and any amendments for applications, except as provided for in 37 CFR 1.821 through 1.825, must have text written plainly and legibly either by a typewriter or machine printer in a nonscript type font (e.g., Arial, Times Roman, or Courier, preferably a font size of 12) lettering style having capital letters which should be at least 0.3175 cm. (0.125 inch) high, but may be no smaller than 0.21 cm. (0.08 inch) high (e.g., a font size of 6) in portrait orientation and presented in a form having sufficient clarity and contrast between the paper and the writing thereon to permit the direct reproduction of readily legible copies in any number by use of photographic, electrostatic, photo-offset, and microfilming processes and electronic capture by use of digital imaging and optical character recognition; and only a single column of text. See 37 CFR 1.52(a) and (b).”
Appropriate correction is required.
Claim Objections
Claims 1-6 are objected to because of the following informalities:
In all claims, the claim should begin with a single capital letter, and end with a single period. MPEP 608.01(m). The words at the beginning of each limitation should not be capitalized. For example, claim 1, step “(1) Providing next-generation…” should read “(1) providing next-generation…”
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The claims are generally narrative and indefinite, failing to conform with current U.S. practice. They appear to be a literal translation into English from a foreign-language document and contain multiple grammatical and idiomatic errors. See MPEP 608. Applicant is requested to proofread the claims to ensure proper use of verb tense, punctuation, and pluralization. A claim begins with a single capital letter and ends with a single period. Limitations should be separated by semicolons, not merely commas. For example, the term “a sparse matrix” is singular; “a plurality of sparse matrices” is the appropriate plural.
The metes and bounds of claim 1 are unclear with respect to the necessary and sufficient steps required to carry out the method and achieve the desired result. In claim 1, the preamble states that the method is one of classifying individuals in mixtures of DNA; however, no classification clearly takes place in claim 1, no individuals are identified, and there is no determination of the total number of individuals within the mixture. The final step is “inputting” data into a deep learning model for the intended use of classifying individuals; however, no positive active method steps require the model to make the identification and provide the output.
In claim 1 the metes and bounds of “from mixtures of DNA” are unclear. It is unclear if one mixture of multiple individuals is being processed, or whether multiple, separate, mixtures of DNA are being processed in a batch format, or whether multiple mixed samples of DNA have been pooled. Each requires separate information and possible separate procedures in the pre-processing, and in the data analysis and demultiplexing. While breadth is not the same as indefiniteness, one of skill in the art would not be apprised as to the particular processes required for analyzing one mixture, versus demultiplexing a pool of multiple mixtures, versus analyzing several individual mixtures in parallel. No other information for the raw sequence reads is clearly provided in claim 1.
Further in claim 1, the claim fails to particularly point out and distinctly claim how the “data processing procedure” processes the raw sequence reads from the mixtures of DNA. The “data processing procedure” has no positive active method steps, nor does the limitation address what the processing is required to do to the raw sequence data. The claim fails to particularly point out and distinctly claim the necessary and sufficient pre-processing steps that appropriately format the data for the steps that follow. The claim further fails to particularly point out and distinctly claim how the data is transformed into the plurality of sparse matrices, and on what basis the transformation occurs. The claim fails to point out what procedures are required, and fails to point out how many matrices are generated. One of skill would not be apprised of the particular sequence processing and data transformation steps required to achieve data structures able to be analyzed in following steps. While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
Further in claim 1, the claim fails to particularly point out and distinctly claim the particular type of deep learning model, and fails to point out how the deep learning model was trained, and fails to point out how the deep learning model acts upon the matrices to determine and classify individuals. Reciting a branch of machine learning (“trained deep learning model”) with no other characteristics fails to particularly point out and distinctly claim what Applicant regards as their invention. As pointed out above, there is no clear link between the data obtained, the data processed, the deep learning, and any individual information. Claim 1 fails to actually provide any classification steps, or final identification or classification of individuals. While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
The metes and bounds of claim 2 are unclear.
Claim 2, step 1 does not read in proper grammatical English. “(1) [R]removing a content comprises adapters…” does not make sense, and similar grammatical errors persist throughout the claim. The Examiner suggests the following types of changes for claim 2, based only on correcting grammatical and punctuation errors. Indefiniteness still exists, as addressed below. These are not required amendments, merely suggestions.
“2. The method [according to] of claim 1, wherein the data processing procedure comprises [following steps:];
(1) removing [a content comprises] adapters from each of the raw sequence reads, [to] generat[e]ing first sequence reads;
(2) [P]performing [a] sliding window trimming on each of the first sequence reads, [to] generat[e]ing trimmed sequence reads, with lengths ranging from 70 to 200 base pairs, [and at least 25 bases are as sliding sizes for each trimming] wherein the sliding window size is at least 25 base pairs;
(3) [P]performing quality score analysis of each of the trimmed sequence reads, wherein qualified trimmed reads have a phred33 score of more than or equal to 28, or a length of at least 100bp; [[an examination by using phred33 score to check quality of the trimmed sequence reads: and qualified trimmed sequence reads are determined when phred33 score of the trimmed sequence reads is equal to or more than 28, or all of the trimmed sequence reads having length of 100bp are determined to be qualified trimmed sequence reads]];
(4) [M]mapping the qualified trimmed sequence reads [on]to human reference genome GRCh38, [to] obtaining mapped sequence reads;
(5) [S]sorting and indexing the mapped sequence reads to construct BAM files;
(6) [Q]querying the mapped sequence reads from the BAM files [by] using the Pysam package;
(7) [P]performing reverse complementation to increase the number of [the] mapped sequence reads stored in the BAM files;
(X) [and then generate] combin[ed]ing paired forward and reverse sequence reads, generating combined forward and reverse sequence reads each with lengths ranging from 100 to 200bp;
(8) [E]encoding the combined forward and reverse sequence reads [with length ranging from 100 to 200bp] into integers by using an integer encoder; and
(9) [T]transforming the integers to a plurality of sparse matrices [by] using one-hot encoding [function], wherein [the] each sparse matrix is constructed from the encoded combined forward and reverse sequence reads [with length ranging from 100 to 200bp].”
The metes and bounds of claim 2 are unclear, with respect to step 2, as step 2 fails to point out and distinctly claim what parts of each sequence read should be trimmed, based on the sliding window. Merely sliding a 25bp window across a raw sequence read fails to particularly indicate what bases are to be trimmed, how many bases should be trimmed, or why any base should be trimmed. One of skill in the art would not be apprised of the particular reasons any particular base should be trimmed in any given sequence read, for the purpose of determining individuals in the mixture. While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
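For context, one conventional reading of “sliding window trimming” is the quality-based scheme used by tools such as Trimmomatic (SLIDINGWINDOW), in which a window advances along the read and the read is cut at the first window whose average quality falls below a threshold. The Python sketch below illustrates only that conventional reading; the function name, window size, and threshold are illustrative assumptions, not limitations drawn from the claims.

```python
def sliding_window_trim(seq, quals, window=25, min_avg_q=28):
    """Trim a read at the first window whose average quality drops below
    min_avg_q (one conventional, Trimmomatic-style reading of "sliding
    window trimming"; the parameters here are illustrative only)."""
    for start in range(0, len(seq) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_avg_q:
            # Cut the read where quality degrades; keep the 5' portion.
            return seq[:start], quals[:start]
    return seq, quals

# A read whose quality collapses near the 3' end is trimmed there.
quals = [40] * 80 + [10] * 20
seq = "A" * 100
trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals)
```

Under this reading, bases are removed from the point where windowed quality degrades, which is one answer to “what bases are to be trimmed” that the claim itself does not supply.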
The metes and bounds of claim 2 are unclear with respect to step 3. Step 3 is unclear as to whether the quality score analysis requires active steps of performing a phred33 analysis on each sequence read, or whether phred33 scores are provided with the raw sequence read data. Step 3 is also unclear with respect to how the phred33 scores for each trimmed read are generated, or why the threshold is “28” dimensionless units. The positive active method steps required to carry out step 3 are not set forth such that one of skill would be apprised as to which reads are to be included or qualified. Naming a software package such as Phred merely indicates the source of the program and is not a positive active recitation of the actual analysis or statistical processes performed within the package. Any software package has a multiplicity of features, actions, and outputs from which one must select those appropriate to their own goals. While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
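For reference, phred33 quality scores are conventionally carried within the FASTQ file itself as ASCII characters offset by 33, so no separate scoring step is strictly required to read them. The Python sketch below shows only that conventional decoding and one possible thresholding; the qualification test (mean score) is an assumption, since the claim does not state whether the test is per-base, mean, or minimum.

```python
def decode_phred33(quality_string):
    # FASTQ quality characters are ASCII codes offset by 33 (Phred+33).
    return [ord(ch) - 33 for ch in quality_string]

def read_qualifies(quality_string, min_score=28):
    """Treat a read as "qualified" when its mean Phred score meets the
    threshold (one possible reading; the claim does not say whether the
    test is per-base, mean, or minimum)."""
    scores = decode_phred33(quality_string)
    return sum(scores) / len(scores) >= min_score

scores = decode_phred33("III!!")  # 'I' decodes to 40, '!' decodes to 0
```

The dimensionless “28” is, under this reading, simply a Phred score threshold (roughly a 1-in-631 base-call error probability), though the claim does not say so.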
The metes and bounds of claim 2, step 5 are unclear. It is unclear whether the “sorting and indexing” is intended to reference prior-art-known steps of demultiplexing sequence read data, or whether some other sort and index process occurs in this stage. The basis of the sort is not provided by the claim, nor is the basis of the index. How the data at hand (qualified, trimmed sequence reads) is acted upon to “construct BAM files” is not set forth by the claim, nor is it clear how Applicant intends this to occur.
The metes and bounds of claim 2, step 6 are unclear with respect to how the “Pysam package” acts upon the data to “query” the mapped reads. The claim fails to particularly point out and distinctly claim what procedures are performed by the Pysam package, and on what part of the mapped reads. The claim fails to particularly point out and distinctly claim what the query is intended to ask, and then answer. No particular basis for interrogation of the mapped reads is provided. It would appear that this is related to sequencing depth, but this is not an explicit limitation. Naming a software package such as Pysam merely indicates the source of the program and is not a positive active recitation of the actual analysis or statistical processes performed within the package. Any software package has a multiplicity of features, actions, and outputs from which one must select those appropriate to their own goals. One of skill would not be apprised as to how to apply the Pysam package to achieve the “query” results required. While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
The metes and bounds of claim 2, step 7 are unclear with respect to how the reverse complementation (RC) is performed and combined. It is unclear whether this takes each sequence read and generates the reverse complement, followed by some analysis of overlapping reads to stitch together longer sequence reads of up to about 200bp, or whether some other process is intended. The type of sequence reads acquired in claim 1 greatly affects this step. If the sequence reads were generated by paired-end sequencing, the paired-up sequence reads are identified by an overlap, then the RC of each is filled in for the pair, resulting in one longer read. If other types of sequencing data are generated (ChIP-seq, methylation-seq, etc.), other processes are required. One of skill would not be apprised as to how the RC should be performed to generate the required long reads. While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
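For reference, under the paired-end reading discussed above, one mate would conventionally be reverse-complemented and the pair joined at their sequence overlap. The Python sketch below illustrates only that conventional interpretation; the overlap-detection logic, minimum overlap, and function names are illustrative assumptions and do not reflect any positively recited step of the claims.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    # Reverse the read and swap each base for its Watson-Crick partner.
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(forward, reverse, min_overlap=10):
    """Join a forward read and the reverse complement of its mate at the
    longest exact 3' overlap (a naive, illustrative merger)."""
    rc = reverse_complement(reverse)
    for olen in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-olen:] == rc[:olen]:
            return forward + rc[olen:]
    return None  # no sufficient overlap found

fwd = "ACGTACGTAAACCCGGGTTT"
# Simulate the mate as sequenced from the opposite strand of the insert.
rev = reverse_complement("AAACCCGGGTTTGGGCCC")
merged = merge_pair(fwd, rev)
```

Real merging tools additionally tolerate mismatches and weigh base qualities in the overlap; none of that is recited in the claim.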
The metes and bounds of claim 3 are unclear for similar reasons as those set forth for claim 2. Claim 3 fails to particularly point out and distinctly claim where in the method of claim 1, the quality check should be performed, whether the phred33 score is generated within the bounds of the claim, or is received with the raw sequence data, and fails to set forth the basis on which any particular base should be trimmed. One of skill in the art would not be apprised of the particular reasons any particular base should be trimmed in any given sequence read, for the purpose of determining individuals in the mixture. While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
The metes and bounds of claim 4 are entirely unclear. Claim 4 is a run-on paragraph with no particular structure outlining the pieces of the trained deep learning model, and is a confusing mishmash of layers, connections and outputs without any link to the data provided in claim 1. No clear flow from beginning to end of the deep learning model is provided. The claim fails to particularly point out and distinctly claim how the structure acts on each of the generated sparse matrices to make the identification of any individual, and fails to set forth the classification of any particular individual in the mix. The examiner is unable to suggest any particular amendment.
The metes and bounds of claim 5 are unclear as the claim fails to particularly point out and distinctly claim how the model is to be validated. The claim appears to suggest the trained deep learning model should be validated prior to applying the information from the DNA mixture, but the claim does not explicitly recite this. The claim as written could also include validating any information generated about the test sample, however, the claim fails to provide the basis for validation, including any reference information, training data, or other validation requirements. While claims are read in light of the specification, limitations from the specification cannot be read into the claims.
The metes and bounds of claim 6 are unclear. It is entirely unclear how to determine whether a mixture is a forensic dataset or from whole exome sequencing. It is unclear whether this is intended to limit the sequence read data acquired in claim 1 step 1, or whether some other process is intended that is required for a “forensic” dataset, that is not required for a WES dataset, or any other dataset. It is entirely unclear how to carry out this limitation with the data at hand in claim 1.
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claim 6 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. Claim 6 recites an intended use of the method, without setting forth any steps or elements which would change the method of claim 1; therefore, the claim is not further limiting of claim 1. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Claim Interpretation
The claims in this application are given their broadest reasonable interpretation (BRI) using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-6 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea of mental steps, mathematic concepts, or organizing human activity, or to a natural law, without significantly more.
Applicant is directed to MPEP 2106 and the Federal Register notice (89 FR 58128-58138, No. 137, July 17, 2024) for the most current and complete guidelines in the analysis of patent-eligible subject matter. The current MPEP is the primary source for the USPTO’s patent eligibility guidance.
With respect to step (1): YES. The claims are drawn to statutory categories: processes.
With respect to step (2A)(1): YES. The claims recite an abstract idea, law of nature, and/or natural phenomenon. The claims recite an abstract idea of analyzing sequence read data following a pipeline of instructions, followed by a statistical learning analysis using a “deep learning model” to identify individuals in a sample with DNA from a mixture of individuals (see MPEP 2106.07(a)). The claims also embrace the natural law describing the naturally occurring correlations between the naturally occurring sequence data of an individual and characteristics of that individual: a genotype/phenotype relationship (MPEP 2106.04). The claims explicitly recite elements that, individually and in combination, constitute one or more judicial exceptions (JE).
Mathematic concepts, Mental Processes or Elements in Addition (EIA) in the claim(s) include:
1. A method for classifying individuals in mixtures of DNA, comprising:
(1) Providing next-generation sequencing (NGS) data which comprises raw sequence reads originated from mixtures of DNA;
(EIA- Data Gathering step, of receiving raw sequence read data files.)
(2) Performing a data processing procedure to generate a plurality of sparse matrix; and
(Mental process and mathematic concept. The mental processes are the unspecified “data processing” steps, which are steps of observing the data, making a judgment as to what must be modified in the data, and mental processes of annotating, deleting, or modifying data; “generating sparse matrices” is a mathematic transformation.)
(3) Inputting the plurality of sparse matrix into a trained deep learning model installed on computers to classify individuals in the mixtures of DNA.
(Mathematic concept of applying data to a mathematic model, with an intended use of classification.)
2. The method according to claim 1, wherein the data processing procedure comprises following steps:
(1) removing a content comprises adapters from the raw sequence reads to generate first sequence reads;
(Mental Process of observing adapter sequences in the sequence read, and a mental process of data editing by removal, masking, or otherwise eliminating the adapters from the read. Spec [0040] uses Trimmomatic, TruSeq2, LEADING and TRAILING, and Phred33 algorithmic processes. [0042] invokes CROP and HEADCROP and possibly multiple instances of trimming.)
(2) Performing a sliding window trimming on the first sequence reads to generate trimmed sequence reads with length ranging from 70 to 200bp, and at least 25 bases are as sliding sizes for each trimming;
(Mathematic process of a sliding window analysis, combined with a mental process of observing bases that should be trimmed, deleted, masked or otherwise annotated, and trimming them. Spec [0040] uses Trimmomatic, TruSeq2, LEADING and TRAILING, and Phred33 algorithmic processes. [0042] invokes CROP and HEADCROP and possibly multiple instances of trimming.)
(3) Performing an examination by using phred33 score to check quality of the trimmed sequence reads: and qualified trimmed sequence reads are determined when phred33 score of the trimmed sequence reads is equal to or more than 28, or all of the trimmed sequence reads having length of 100bp are determined to be qualified trimmed sequence reads;
(Mental process of observing quality scores for each trimmed sequence read, and a mathematic concept of the desired score being equal or greater than a listed value. [0040-0042])
(4) Mapping the qualified trimmed sequence reads onto human reference genome GRCh38 to obtain mapped sequence reads;
(Mental process, in a computing environment, of matching sequence read data to a reference file. Alternatively, a mathematic concept: Specification [0044] uses Kart version 2.5.6 algorithmic processes.)
(5) Sorting and indexing the mapped sequence reads to construct BAM files;
(Mental process of sorting and annotating each read in unspecified ways for creating a type of sequence file; Alternatively, a mathematic concept: using the algorithmic package SamTools 1.3.1.)
(6) Querying the mapped sequence reads from the BAM files by using Pysam package,
(Mental process of observing the mapped reads, and applying an undefined software package for an unknown purpose. Specification: [0044] “The mapped, sorted, and indexed BAM files are queried across all somatic chromosomal regions using the Alignment File and Fetch functions in Pysam and itertools.”)
(7) Performing reverse complementation to increase number of the mapped sequence reads stored in the BAM files and then generate combined forward and reverse sequence reads with length ranging from 100 to 200bp;
(Mental step of observing the sequence reads, then mentally determining the reverse complement of the read, and then generating short forward and reverse pairs of a length. Specification [0044] states this is a mathematic concept carried out by the algorithms of Bio.seq tools.)
(8) Encoding the combined forward and reverse sequence reads with length ranging from 100 to 200bp into integers by using an integer encoder; and
(Mathematic concept of integer encoding. [0046] dimension reduction, mapping the data to a lower dimensional space, using SciKit-learn library algorithmic functions.)
(9) Transforming the integers to a plurality of sparse matrix by using one-hot encoding function, wherein the sparse matrix is constructed from the combined forward and reverse sequence reads with length ranging from 100 to 200bp.
(Mathematic concept of mathematic transformation of data files into matrices, using one-hot encoding. [0046] one-hot encoding algorithms provided by SciKit-learn library)
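For context, integer encoding followed by one-hot encoding conventionally maps each base to an index and then to an indicator row containing a single 1, which is why the resulting matrices are sparse. The plain-Python sketch below is illustrative only; the specification instead invokes SciKit-learn library functions, and nothing here is read into the claims.

```python
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}

def integer_encode(seq):
    # Map each base to a small integer (the "integer encoder" step).
    return [BASE_TO_INT[base] for base in seq]

def one_hot_encode(integers, alphabet_size=4):
    """Expand each integer into an indicator row; each row holds a single
    1 and zeros elsewhere, so the resulting matrix is sparse."""
    matrix = []
    for value in integers:
        row = [0] * alphabet_size
        row[value] = 1
        matrix.append(row)
    return matrix

codes = integer_encode("ACGT")
matrix = one_hot_encode(codes)
```

A read of length L thus becomes an L-by-4 matrix in which only L of the 4L entries are nonzero, which is the sparsity the claims invoke without reciting the construction.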
3. The method of claim 1, further comprises a step for checking quality of the raw sequence reads, and phred33 score is used for measure of the quality of the raw sequence reads, and the raw sequence reads are trimmed if the phred33 score is below 15.
(Mental process of observing quality scores, and annotating the reads by trimming if a score is below a given value. Spec [0040] uses Trimmomatic, TruSeq2, LEADING and TRAILING, and Phred33 algorithmic processes (i.e. mathematic concepts). [0042] invokes CROP and HEADCROP and possibly multiple instances of trimming.)
4. The method of claim 1, wherein the trained deep learning model is a one-dimensional deep convolutional neural network constructed from a first convolution layer, a first batch normalization layer, a second convolution layer, a second batch normalization layer, a first max pooling layer, a first concatenate layer, a second max pooling layer, a first flatten layer, a second concatenate layer, a third batch normalization layer, a first hidden layer, a fourth batch normalization layer and a second hidden layer, wherein the first convolution layer connects to the first batch normalization layer, the first batch normalization layer connects to the second convolution layer, the second convolution layer connects to the second batch normalization layer, the second batch normalization layer connects to the first max pooling layer, the first max pooling layer connects to the first concatenate layer, the first concatenate layer connects to the second max pooling layer, the second max pooling layer connects to the first flatten layer, the first flatten layer connects to the second concatenate layer, the second concatenate layer connects to the third batch normalization layer, the third batch normalization layer connects to the first hidden layer, the first hidden layer connects to the fourth batch normalization layer, the fourth batch normalization layer connects to the second hidden layer, and wherein the second hidden layer outputs classification of individuals in the mixtures of DNA.
(Mathematic concept describing the structure of the deep learning model layers and output. The actual steps performed are not set forth. [0048-0050] describes the CONV1D structure and Fig 3.)
5. The method of claim 1, further comprises a step for validating the trained deep learning model, and the trained deep learning model has accuracy equal to or more than about 90%.
(Mathematic concept of data validation [0052-0059].)
6. The method of claim 1, being to classify individuals in mixture of the DNAs from forensic dataset or whole exome sequencing dataset.
(Mental process, an intended use of the method for a particular dataset or data type.)
Natural law embraced by claim(s) 1-6:
The claims recite the naturally occurring correlations between naturally occurring variances in sequence read data among individuals and characteristics of those individuals: a genotype/phenotype relationship.
With respect to step (2A)(2): NO. The claims were examined further to determine whether they integrate any JE into a practical application (MPEP 2106.04(d)). The claimed additional elements are analyzed alone, or in combination, to determine if the JE is integrated into a practical application (MPEP 2106.05(a-c, e, f and h)).
Claim(s) 1 recite(s) the additional non-abstract element(s) of data gathering, or a description of the data gathered.
Data gathering steps are not an abstract idea; they are extra-solution activity, as they collect the data needed to carry out the JE. The data gathering does not impose any meaningful limitation on the JE, or on how the JE is performed. The additional limitation (data gathering) must have more than a nominal or insignificant relationship to the identified judicial exception. (MPEP 2106.04/.05, citing Intellectual Ventures LLC v. Symantec Corp., McRO, TLI Communications, OIP Techs., Inc. v. Amazon.com, Inc., Electric Power Group, LLC v. Alstom S.A.).
Claim(s) 1-6 imply the additional non-abstract element (EIA) of a general-purpose computer system or parts thereof.
The EIA do not provide any details of how specific structures of the computer elements are used to implement the JE. The claims require nothing more than a general-purpose computer to perform the functions that constitute the judicial exceptions. The computer elements of the claims do not provide improvements to the functioning of the computer itself (as in DDR Holdings, LLC v. Hotels.com LP); they do not provide improvements to any other technology or technical field (as in Diamond v. Diehr); nor do they utilize a particular machine (as in Eibel Process Co. v. Minn. & Ont. Paper Co.). Hence, these are mere instructions to apply the JE using a computer, and therefore the claim does not integrate the JE into a practical application.
Dependent claim(s) 2-6 recite(s) abstract limitations to the JE, reciting additional mathematic concepts or mental processes. Additional abstract limitations cannot provide a practical application of the JE, as they are a part of that JE.
In combination, the limitations of data gathering, for the purpose of carrying out the JE, using a general-purpose computer merely provide extra-solution activity, and fail to integrate the JE into a practical application.
With respect to step 2B: NO. The claims recite a JE and do not integrate that JE into a practical application, and thus are examined for a specific inventive concept. The judicial exception alone cannot provide that inventive concept or practical application (MPEP 2106.05). The additional elements were considered individually and in combination to determine whether they provide significantly more than the judicial exception. (MPEP 2106.05.A i-vi).
With respect to claim(s) 1: The limitation(s) identified above as non-abstract elements (EIA) related to data gathering do not rise to the level of significantly more than the judicial exception.
Claim 1 requires receiving raw sequence read data from "mixtures of DNA":
Curti (2020) receives raw sequence read data meeting the requirements in sections 1.4, 1.5 and 1.6. Curti obtains the data for the purpose of using it to train and use deep learning models.
Jabeen (2018) receives raw sequence read data meeting the requirements, where the data is RNA-seq data, and the individuals in the mixture are microbes.
Wen (2021) provides raw sequence read data meeting the requirements, where the individuals are microbes within a sample, and addresses the use of sparse matrices and machine learning.
These elements meet the BRI of the identified data gathering limitations. As such, the prior art recognizes that this data gathering element is routine, well understood and conventional in the art (as in Alice Corp., CyberSource v. Retail Decisions, Parker v. Flook).
The specification at [0003] discloses that the steps identified as data gathering were routine, well understood and conventional; NGS is noted as being widely performed on many types of samples.
Activities such as data gathering do not improve the functioning of a computer, nor do they comprise an improvement to any other technical field. The limitations do not require or set forth a particular machine, they do not effect a transformation of matter, nor do they provide an unconventional step (citing McRO and Trading Technologies Int'l v. IBG). Data gathering steps constitute a general link to a technological environment. Simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception is insufficient to provide significantly more (as discussed in Alice Corp.).
With respect to claim(s) 1-6: the limitations implied above as non-abstract elements (EIA) related to general-purpose computer systems do not rise to the level of significantly more than the judicial exception.
Each of Curti, Jabeen and Wen disclose computer systems or computing elements which meet the BRI of the claimed computer system or computer system elements, comprising input, output/ display, a processor, and memory.
As such, the prior art recognizes that these computing elements are routine, well understood and conventional in the art.
These elements do not improve the functioning of the computer itself, nor do they comprise an improvement to any other technical field (Trading Technologies Int'l v. IBG, TLI Communications). They do not require or set forth a particular machine (Ultramercial v. Hulu, LLC, Alice Corp. Pty. Ltd. v. CLS Bank Int'l), they do not effect a transformation of matter, nor do they provide an unconventional step. Simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception is insufficient to provide significantly more (as discussed in Alice Corp., CyberSource v. Retail Decisions, Parker v. Flook, Versata Development Group v. SAP America).
Dependent claim(s) 2-6 each recite a limitation requiring additional mathematical concepts or mental processes. Additional abstract limitations cannot provide significantly more than the JE as they are a part of that JE (MPEP 2106.05).
In combination, the data gathering steps providing the information required to be acted upon by the JE, performed on a generic computer or in a generic computing environment, fail to rise to the level of significantly more than that JE. The data gathering steps provide the data for the JE, which is carried out by the general-purpose computers. No non-routine step or element has clearly been identified.
The claims have all been examined to identify the presence of one or more judicial exceptions. Each additional limitation in the claims has been addressed, alone and in combination, to determine whether the additional limitations integrate the judicial exception into a practical application. Each additional limitation in the claims has been addressed, alone and in combination, to determine whether those additional limitations provide an inventive concept which provides significantly more than those exceptions. For these reasons, the claims, when the limitations are considered individually and as a whole, are rejected under 35 USC § 101 as being directed to non-statutory subject matter.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1, 3, 5 and 6 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Curti (2020).
Curti, N. (2020) Implementation and optimization of algorithms in Biomedical Big Data Analytics. Thesis, Università di Bologna. 171 pages. NOTE: page numbers refer to the overall pagination (i.e., 1/171) and not the internal numbering of the paper.
Curti is directed to the generation of an algorithm to process, analyze and identify elements within sequence read datasets, such as those generated by NGS.
The sequence read data obtained by Curti in one example is from TCGA (section 1.4.1), including sequence read data of mRNA, miRNA, microarray data, protein levels, etc. The "synapse dataset" has four individual tumor types (individuals), which meet the requirements of claim 1, step 1.
The data of step 1 is then pre-processed by Curti in a pipeline, including adding random noise, removing null values, splitting datasets for training/validation, selecting thresholds, etc. The processed sequence reads are then transformed into matrices of genes, indexed by tumor type (multidimensional signatures, p28-29, and section 1.4.2), meeting claim 1, step 2.
The matrices of Curti are then used as input into trained deep learning algorithms in section 2. These models accept the matrices; the weights of the model parameters are mathematically adjusted during training with known data and ground truth labels, so that applying the model to test data can identify the type of cancer, the microbe, or the individual, depending on the experimental design (section 2). Curti provides the structures, layers, connections, and functions utilized in section 2. Section 3 further discusses the combination of types of biomedical data and the use of deep learning, including gene data, disease symptom data, SNP data, metabolites, drugs, etc. Fig 3.6 is a matrix representation of the CHIMeRA network. See also Appendix E, and particularly Appendix F for the bioinformatic pipeline, beginning at p157.
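For illustration only, the general pipeline paraphrased above (pre-processing per-sample gene values, removing records with null values, and arranging the results into a matrix of genes indexed by tumor type) can be sketched as follows. This sketch is not taken from Curti; all names and data are hypothetical.

```python
# Illustrative sketch only: a generic pre-processing step of the kind
# paraphrased above. All sample labels, gene names, and values are hypothetical.

def build_gene_matrix(records):
    """Arrange per-sample gene values into a matrix (list of rows),
    dropping any record containing a null value, indexed by sample label."""
    genes = sorted({g for _, values in records for g in values})
    labels, matrix = [], []
    for label, values in records:
        if any(v is None for v in values.values()):
            continue  # remove null values, as in the pipeline described
        labels.append(label)
        matrix.append([values.get(g, 0.0) for g in genes])
    return labels, genes, matrix

records = [
    ("tumor_A", {"BRCA1": 2.0, "TP53": 5.0}),
    ("tumor_B", {"BRCA1": None, "TP53": 1.0}),  # dropped: null value
    ("tumor_A", {"BRCA1": 3.0, "TP53": 4.0}),
]
labels, genes, matrix = build_gene_matrix(records)
```

Each row of the resulting matrix is a gene-value vector for one sample, and the parallel list of labels supplies the ground truth used to train a model of the kind Curti describes.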
As such, claim 1 is anticipated.
With respect to claim 3, Appendix F of Curti provides analysis of quality scores.
With respect to claim 5, Curti provides validation steps, throughout.
With respect to claim 6, Appendix F specifies WES datasets.
Claim(s) 1 and 3-6 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Jabeen, A. et al. (2018).
Jabeen, A. et al. (2018) Machine learning-based state-of-the-art methods for the classification of RNA-seq data. In: Dey et al. (eds.) Classification in BioApps, Lecture Notes in Computational Vision and Biomechanics 26, p133-174.
Jabeen is directed to the generation and analysis of RNA-seq data, which comprises sequence reads from the mRNA present in a sample. "Recent advances in the field of sequencing—that is, next generation sequencing (NGS) via ribonucleic acid sequencing (RNA-Seq)—has enabled biologists and biotechnologists to measure the expression levels of several transcripts simultaneously." (p2) To identify individual characteristics from this data, whether the identity of a microbe, a disease, or a human, Jabeen uses machine learning, including deep learning methods. "Most recently, deep learning methods have been implemented for the classification of biological big data such as RNA-Seq data… The chapter also addresses the difficulties and challenges faced by currently available machine learning-based classification methods which can successfully lay the basis for developing classification method further in research work. The various machine learning approaches for RNA-Seq data classification are discussed, with their pros and cons." (p3).
Jabeen details the generation of sequence read data from test samples beginning at page 3, section 2:
“RNA-Seq is found to be a powerful transcriptome profiling NGS technique which can provide a detailed view of RNA transcripts in test samples. RNA-Seq measures expression levels of several transcripts simultaneously [57]. The general methodology for RNA-Seq analysis concerning a disease involves expression analysis from raw sequence data of the disease after normalization, then constructing individual network modules for gene identification followed by examining their aggregate network properties [5]. This system biology approach can successfully identify multiple genes related to a disease. These identified genes can then serve as the target for further drug discovery processes. Figure 1 represents the systematic steps for the RNA-Seq analysis method [73].” (P3)
Fig 1 of Jabeen discloses the receipt of the raw data, at the top as "Image raw data" in the diamond, which is the image data of the sequence reads from the RNA-seq process. Looking down the figure, the section outlined in a blue rectangle is data pre-processing, followed by a deep learning process at the bottom right of the figure. Figure 2 is a cartoon representing the pipeline, or process, for each dataset received.
The pre-processing steps are set forth in section 3; trimming, quality analysis, read length, and read depth are all assessed (Table 1). Pre-processing is necessary to remove bases of low quality from the sequencing procedure, and to address noise, imbalanced samples, and bias. Pre-processing reduces the effort and cost of ML processing downstream (p6). Selecting the appropriate features to train the model is essential to the creation of an accurate model. Features can include expression levels of certain genes, differential expression between states, expression variability, and differential distributions of gene expression; each offers a different way to analyze the data, depending on the question at hand. Fig 5 illustrates some of the differences between these.
Once the features are selected, they are transformed into matrices, including sparse matrices, which are input into the machine learning algorithm, such as a deep learning model, as per Fig 6 and Section 4 of Jabeen (p10). See also Table 2.
Section 7 of Jabeen is directly concerned with deep learning, and provides the structure and theory behind the creation of deep learning architectures. Jabeen discloses generative ML, discriminative ML, and hybrid architectures such as DNNs (p23). Fig 13 of Jabeen illustrates the various layers, convolutions, hidden layers, connections, inputs and outputs for four types of architectures. Jabeen notes that DNNs are particularly suited to sparse, noisy data, like that provided in transcriptomics, at page 25:
“The most popular approach for dealing with high-dimensional data such as RNA-Seq data involves DNNs. DNNs are able to handle large datasets with high-dimensionality, sparse, noisy data with nonlinear relationships such as in the case of transcriptomic and other -omics data in biology. DNNs have high generalization advantages. Once trained on a dataset, it can be applied to other large datasets. Therefore, it can interpret heterogeneous multiplatform data efficiently such as gene expression datasets, reducing issues such as dimension reduction and selectivity/invariance [49].”
Section 7 discusses the training and validation of these models, as well as information on data preparation, and feature selection. “A training dataset with more informative features usually results in a better performance. Therefore, effort should be spent on collecting, labeling, cleaning and normalizing data. Machine learning models need to be trained, selected and tested on independent data sets to avoid overfitting and to ensure that the model will generalize to unseen data.” (p26). Section 7.3 discusses the deep stacking network which meets the broadest reasonable interpretation of a “trained deep learning model” according to the claims.
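For illustration only, the train/test separation Jabeen describes (training, selecting, and testing on independent data sets to avoid overfitting) can be sketched as below, with a trivial nearest-centroid classifier standing in for a trained deep learning model. This sketch is not taken from Jabeen; all names and data are hypothetical.

```python
# Illustrative sketch only: independent train/test sets, a trivial
# nearest-centroid "model" standing in for a DNN, and held-out evaluation.
import random

def split(data, train_frac=0.75, seed=0):
    """Shuffle and split labeled data into independent train/test sets."""
    data = data[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

def train_centroids(train):
    """'Train' by averaging feature vectors per class label."""
    sums = {}
    for features, label in train:
        total, n = sums.setdefault(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], n + 1)
    return {lbl: [t / n for t in total] for lbl, (total, n) in sums.items()}

def predict(centroids, features):
    """Classify by nearest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda lbl: sum(
        (f - c) ** 2 for f, c in zip(features, centroids[lbl])))

# Hypothetical, well-separated two-class data (e.g. two expression profiles).
data = ([([10.0 + 0.1 * i, 10.0], "up") for i in range(4)]
        + [([0.1 * i, 0.0], "down") for i in range(4)])
train, test = split(data)
centroids = train_centroids(train)
accuracy = sum(predict(centroids, f) == lbl for f, lbl in test) / len(test)
```

The held-out test set plays the role Jabeen assigns to the independent data set: performance is measured only on samples the model never saw during training.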
Section 8 of Jabeen provides a set of known ML tools, algorithms, and classifiers. Section 9 sets forth performance evaluation, confusion matrices, and accuracy analysis. (p31)
As such, claim 1 is anticipated.
With respect to claim 3, Jabeen discloses quality score analysis.
With respect to claim 4, the structures of the deep learning algorithms of Jabeen meet the BRI of this claim.
With respect to claim 5, validation is discussed throughout.
With respect to claim 6, RNA-seq is a type of WES dataset.
Claim(s) 1 and 5-6 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wen (2021).
Wen, Z. et al. (2021) A survey on predicting microbe-disease associations: biological data and computational methods. Briefings in Bioinformatics. Vol 22(3), p1-20.
Wen is directed to identifying the presence of individual microbes in samples with mixtures of microbes, using sequence read data and machine learning. Figure 1 of Wen shows the 16S rRNA-seq data, as well as other types of data, at the top. The heatmap in the middle identifies the presence or absence of differing types of bacteria. The bottom of Fig 1 incorporates other data, such as metabolomic data of the bacteria or human disease information.
The sequence read data (Table 2) is obtained and encoded into sparse matrices as shown in Fig 2. The yellow boxes at the top level are gene information, and the orange boxes are microbe information. The generated matrices can be used as input into deep learning models, as set forth in the "sparse learning method" at page 9, col 2. Other potential deep learning structures are discussed in depth in this section. Further applications in deep learning are discussed at page 14, col 1-2. With the processing of the matrices by the deep learning model, an individual or a type of bacteria can be classified. As such, claim 1 is anticipated.
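For illustration only, a sparse (coordinate-style) encoding of the kind of presence/absence matrix described above, where most entries are zero, can be sketched as follows. This sketch is not taken from Wen; all data are hypothetical.

```python
# Illustrative sketch only: a coordinate-style (COO) sparse encoding of a
# hypothetical microbe-by-sample presence/absence matrix.

def to_sparse(dense):
    """Encode a dense 0/1 matrix as {(row, col): value} for nonzero entries."""
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v}

def sparse_lookup(sparse, i, j):
    """Return the stored value, or 0 if absent (microbe not observed)."""
    return sparse.get((i, j), 0)

# Rows: microbes; columns: samples. Most entries are zero, so only the
# nonzero coordinates need to be stored.
dense = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
sparse = to_sparse(dense)
```

Storing only the nonzero coordinates is what makes such matrices tractable as model input when the number of possible microbe/gene associations is large but the observed associations are few.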
With respect to claim 5, validation is discussed throughout.
With respect to claim 6, RNA-seq and 16S rRNA data fall within the required data.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 2 is/are rejected under 35 U.S.C. 103 as being unpatentable over Jabeen (2018) as applied to claims 1, 3-6 above, in view of Robinson et al. (eds) (2018) and Zhou (2018).
Jabeen, A. et al. (2018) Machine learning-based state-of-the-art methods for the classification of RNA-seq data. In: Dey et al. (eds.) Classification in BioApps, Lecture Notes in Computational Vision and Biomechanics 26, p133-174.
Robinson, P. et al. (2018) Computational exome and genome analysis. CRC Press, Taylor & Francis Group, Florida, USA. P1-156.
Zhou, Y et al. (2018) Bridging the gap between deep learning and spars