Prosecution Insights
Last updated: April 19, 2026
Application No. 17/088,734

APPARATUSES, SYSTEMS, AND METHODS FOR EXTRACTING MEANING FROM DNA SEQUENCE DATA USING NATURAL LANGUAGE PROCESSING (NLP)

Non-Final OA: §101, §103
Filed: Nov 04, 2020
Examiner: DUONG, HIEN LUONGVAN
Art Unit: 2147
Tech Center: 2100 — Computer Architecture & Software
Assignee: BASF Corporation
OA Round: 5 (Non-Final)
Grant Probability: 75% (Favorable)
OA Rounds: 5-6
To Grant: 3y 1m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 75% (480 granted of 643 resolved; +19.7% vs TC avg, above average)
Interview Lift: +22.8% for resolved cases with an interview (strong)
Typical Timeline: 3y 1m avg prosecution; 42 applications currently pending
Career History: 685 total applications across all art units

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 51.5% (+11.5% vs TC avg)
§102: 18.5% (-21.5% vs TC avg)
§112: 6.6% (-33.4% vs TC avg)
Comparisons are against the Tech Center average (estimate); based on career data from 643 resolved cases.

Office Action

§101 §103
DETAILED ACTION

Remarks

This Office action is issued in response to the communication filed on 12/16/2025. Claims 1-2, 4-10, 12-16, and 18-23 are pending in this Office action.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed on 12/16/2025 with respect to the rejection of claims under 35 U.S.C. 103 have been considered but are moot in view of the new ground of rejection. Applicant's amendment to claim 1 to recite a hardware processor fails to overcome the objection because the apparatus of claim 1 does not recite any "one or more hardware processors" as part of the comprising language. The examiner suggests amending claim 1 to recite the following language to overcome the objection:

"1. (Currently Amended) An apparatus for identifying genetic elements, the apparatus comprising: one or more hardware processors; a deoxyribonucleic acid (DNA) sequence data receiving module stored on a memory that, when executed by the one or more hardware processors of the apparatus, causes the one or more hardware processors to receive DNA sequence data; a first machine learning model module stored on the memory that, when executed by the one or more hardware processors, causes the one or more hardware processors to generate first machine learning model output data…"

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-2 and 4-7 are rejected under 35 U.S.C. 101 because the claimed invention is directed toward non-statutory subject matter. Claim 1 recites an apparatus comprising a plurality of modules but fails to recite a hardware element. During examination, the claims must be interpreted as broadly as their terms reasonably allow. The broadest reasonable interpretation of a claim drawn to an apparatus that fails to recite a required hardware element covers software per se. Software is not a "process," a "machine," a "manufacture," or a "composition of matter" as defined in 35 U.S.C. 101. Accordingly, the recited "system" is not a "process," a "machine," a "manufacture," or a "composition of matter" as defined in 35 U.S.C. 101, and claim 1 fails to recite statutory subject matter as defined in 35 U.S.C. 101. Claims 2 and 4-7 are also rejected under 35 U.S.C. 101 for failing to recite a hardware element. The examiner notes that even though claim 1 recites "when executed by one or more hardware processors of the apparatus," the "one or more processors" are not part of claim 1 because one or more processors have not been previously recited in claim 1. The examiner suggests amending claim 1 to positively recite one or more processors (i.e., the apparatus comprising: one or more processors) to overcome the objection. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3. Claims 1-2, 7-10, 13-16, 20, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Devesa (US Patent Application Publication 2020/0357482 A1, hereinafter "Devesa") and further in view of Patrick Ng, "dna2vec: Consistent vector representations of variable-length k-mers," arXiv:1701.06279v1 [q-bio.QM], 23 Jan 2017, hereinafter "Ng."

Regarding independent claims 1, 8, and 14, Devesa teaches an apparatus, method, and non-transitory computer-readable medium for identifying genetic elements, comprising: a deoxyribonucleic acid (DNA) sequence data receiving module stored on a memory that, when executed by one or more hardware processors of the apparatus, causes the one or more hardware processors to receive DNA sequence data (Devesa par. [0036] teaches extracting text data from the medical data received from the corpus of medical databases); a first machine learning model module stored on the memory that, when executed by the one or more hardware processors, causes the one or more hardware processors to generate first machine learning model output data based on the DNA sequence data (Devesa par. [0036] teaches the extraction of the text data is performed using one or more machine learning algorithms), wherein the first machine learning model output data includes word embeddings defining vector representations of k-mers, and wherein the first machine learning model module includes a first machine learning model selected from: a natural language processing (NLP) model, a Bayesian mixture model, a hidden Markov model, a dynamic Bayesian network model, a deep multilayer perceptron (MLP) model, a convolutional neural network (CNN) model, a recursive neural network model, a recurrent neural network model, a long short-term memory (LSTM) model, a sequence-to-sequence model, or a shallow neural network model (Devesa par. [0037] teaches word embeddings of words created using one or more methods including RNN, CNN, a word embedding layer, and word2vec); a second machine learning model module stored on the memory that, when executed by the one or more hardware processors, causes the one or more hardware processors to generate second machine learning model output data based on the DNA sequence data, wherein the second machine learning model module includes a second machine learning model selected from: a natural language processing (NLP) model, a Bayesian mixture model, a hidden Markov model, a dynamic Bayesian network model, a deep multilayer perceptron (MLP) model, a convolutional neural network (CNN) model, a recursive neural network model, a recurrent neural network model, a long short-term memory (LSTM) model, a sequence-to-sequence model, or a shallow neural network model (Devesa par. [0036] teaches the extraction of the text data is performed using one or more machine learning algorithms including prediction algorithms, deep learning algorithms, natural language processing, and the like); and an optimization model module stored on the memory that, when executed by the one or more processors, causes the one or more hardware processors to identify at least one genetic element by inputting the first machine learning model output data and the second machine learning model output data into a natural language processing (NLP) model (Devesa par. [0039] teaches the training dataset is applied for training the machine to identify genetic data related to DNA variances and DNA sequences from the text data, with training applied using one or more algorithms; Devesa par. [0036] teaches the machine learning algorithms include prediction algorithms, deep learning algorithms, natural language processing algorithms, and the like).

Devesa fails to expressly teach wherein the first machine learning model output data includes word embeddings defining vector representations of k-mers. However, Ng teaches wherein the first machine learning model output data includes word embeddings defining vector representations of k-mers (Ng section 1.2: "we present a novel method to compute distributed representations of variable-length k-mers"). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add to the teachings of Devesa the use of word embeddings defining vector representations of k-mers as taught by Ng to achieve the claimed invention. One would have been motivated to make such a combination because it is simple to understand and compute (Ng section 1).

Regarding claims 2, 9, and 15, Devesa and Ng teach the apparatus, method, and non-transitory computer-readable medium as in claims 1, 8, and 14, wherein the first machine learning model module includes a first DNA sequence data preprocessing module, wherein the second machine learning model includes a second DNA sequence data preprocessing module, and wherein the second DNA sequence data preprocessing module is different than the first DNA sequence data preprocessing module (Devesa par. [0037] teaches word embeddings of words created using one or more methods including RNN, CNN, a word embedding layer, and word2vec).

Regarding claims 7, 13, and 20, Devesa and Ng teach the apparatus, method, and non-transitory computer-readable medium as in claims 1, 8, and 14, wherein the second machine learning model module includes a feed-forward neural network model with word embeddings (Devesa par. [0037] teaches word embeddings of words created using one or more methods including RNN, CNN, a word embedding layer, and word2vec).

Regarding claims 10 and 16, Devesa and Ng teach the method and non-transitory computer-readable medium as in claims 9 and 15, wherein the first DNA sequence data preprocessing module generates at least one of: word embeddings, feature-based representations, or contextual word embeddings (Devesa par. [0037] teaches the data curation system creates word embeddings of words present in the text data in a low-dimensional vector space).

Regarding claim 23, Devesa and Ng teach the non-transitory computer-readable medium as in claim 14, wherein the second machine learning model output data includes word embeddings defining vector representations of k-mers (Ng section 1.2: "we present a novel method to compute distributed representations of variable-length k-mers").
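For context on the k-mer embedding concept the rejection cites from Ng (dna2vec), the following is a minimal, illustrative sketch assuming the gensim library is available: overlapping k-mers are treated as "words" and word2vec-style vectors are learned over them. It uses fixed-length k-mers for brevity (Ng's dna2vec handles variable lengths) and is neither Ng's published code nor the applicant's claimed implementation.

```python
# Minimal sketch of k-mer word embeddings (the dna2vec idea cited from Ng).
# Assumes gensim is installed; illustrative only, not the claimed apparatus.
from gensim.models import Word2Vec

def kmerize(seq: str, k: int = 4) -> list[str]:
    """Split a DNA sequence into overlapping k-mers, e.g. ACGTAC -> ACGT, CGTA, GTAC."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical toy sequences; real training would use genomic-scale corpora.
sequences = ["ACGTACGTTAGC", "TTGACGTACGGA", "GCTAGCACGTAA"]
corpus = [kmerize(s) for s in sequences]

# Skip-gram word2vec over k-mer "sentences" yields one vector per k-mer.
model = Word2Vec(corpus, vector_size=16, window=3, min_count=1, sg=1, epochs=50)
print(model.wv["ACGT"][:4])                   # first few dimensions of one k-mer vector
print(model.wv.most_similar("ACGT", topn=2))  # nearest k-mers in the embedding space
```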
4. Claims 4 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Mishra et al. (US Patent Application 20200302011).

Regarding claims 4 and 18, Devesa and Ng teach the apparatus and non-transitory computer-readable medium as in claims 1 and 14, but they do not expressly teach wherein at least one of the first machine learning model module or the second machine learning model module includes a natural language processing module that computes attention weights. However, Mishra et al. teaches wherein at least one of the first machine learning model module or the second machine learning model module includes a natural language processing module that computes attention weights (i.e., using pre-trained recurrent neural network (RNN) classifiers or other techniques for natural language processing and text analysis, particular regions (e.g., words, phrases, and sentences) that influence the score can be identified using attention weights from the classifier (Mishra par. 26)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add the teachings of Mishra et al. to use a natural language processing module that computes attention weights, because doing so would provide a method to identify portions of the sequence that include these regions, which can be considered influential portions (Mishra par. 26).
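As a rough illustration of the attention-weight idea the rejection cites from Mishra (attention weights flagging which tokens most influence a classifier's score), here is a minimal sketch using plain dot-product attention over stand-in token vectors. The token names, dimensions, and random data are hypothetical; this is not Mishra's classifier.

```python
# Illustrative softmax attention weights over stand-in token embeddings;
# a larger weight means more influence on the score. Hypothetical data only.
import numpy as np

def attention_weights(token_vectors: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Return a softmax distribution over tokens from dot-product scores."""
    scores = token_vectors @ query          # one relevance score per token
    scores = scores - scores.max()          # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

rng = np.random.default_rng(0)
tokens = ["ATGC", "GCCA", "TTAG", "CGGT"]    # hypothetical k-mer tokens
vectors = rng.normal(size=(len(tokens), 8))  # stand-in embeddings
query = rng.normal(size=8)                   # stand-in classifier query vector

for token, weight in zip(tokens, attention_weights(vectors, query)):
    print(f"{token}: {weight:.3f}")          # tokens with larger weights are "influential"
```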
4. Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Yelensky et al. (US Patent Application 20190065675).

Regarding claim 5, Devesa and Ng teach the apparatus as in claim 1, but they do not expressly teach wherein at least one of the first machine learning model module or the second machine learning model module includes gradient-based methods to analyze an importance of whole k-mers. However, Yelensky et al. teaches wherein at least one of the first machine learning model module or the second machine learning model module includes gradient-based methods to analyze an importance of whole k-mers (i.e., a presentation model can comprise a statistical regression or a machine learning (e.g., deep learning) model trained on a set of reference data (par. 94); the training data set may include data associated with genomes associated with the samples, and the training peptide sequences may be of lengths within a range of k-mers where k is between 8-15 (Yelensky pars. 125-126); various parameters are determined through gradient-based numerical optimization algorithms, such as batch gradient algorithms, stochastic gradient algorithms, and the like (Yelensky par. 397)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Devesa and Ng with the teachings of Yelensky to achieve the claimed invention. One would have been motivated to make such a combination to provide an optimized approach for identifying and selecting neoantigens for personalized cancer vaccines (Yelensky par. 10).

5. Claims 6, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Hannum (US Patent Application Publication 2015/0100244 A1, hereinafter "Hannum").

Regarding claims 6, 12, and 19, Devesa and Ng teach the apparatus, method, and non-transitory computer-readable medium as in claims 1, 8, and 14, but they do not expressly teach wherein at least one of the first machine learning model module or the second machine learning model module includes a logistic regression model. However, Hannum teaches wherein at least one of the first machine learning model module or the second machine learning model module includes a logistic regression model (i.e., data sets, relationships, and/or profiles can be analyzed and/or compared by utilizing multiple statistical methods, e.g., logistic regression (Hannum par. 189); a data set can be analyzed by utilizing multiple statistical algorithms, e.g., logistic regression (par. 231)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Devesa and Ng with the teachings of Hannum to achieve the claimed invention. One would have been motivated to make such a combination to reduce the complexity and/or dimensionality of the data set (Hannum par. 231).

6. Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Rees (US Patent Application Publication 2003/0154078 A1, hereinafter "Rees").

Regarding claim 21, Devesa and Ng teach the non-transitory computer-readable medium as in claim 14 but fail to teach wherein the identification is based on a score of the first machine learning model output data and a score of the second machine learning model output data. However, Rees teaches wherein the identification is based on a score of the first machine learning model output data and a score of the second machine learning model output data (Rees par. [0027] teaches the best match is identified and a confidence score is then calculated, which is then utilized to determine whether the matching of received speech input to words is sufficiently accurate to act on the received input, ask for confirmation, ignore the received input, or request reentry of the data). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Devesa and Ng with the teachings of Rees to achieve the claimed invention. One would have been motivated to make such a combination to improve the accuracy of the identification.

7. Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Lupien et al. (US Patent Application Publication 2022/0284983 A1, hereinafter "Lupien").

Regarding claim 22, Devesa and Ng teach the non-transitory computer-readable medium as in claim 14 but fail to teach wherein the at least one genetic element comprises a cis-regulatory element. However, Lupien teaches wherein the at least one genetic element comprises a cis-regulatory element (Lupien's abstract teaches a method for identifying cis-regulatory elements). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Devesa and Ng with the teachings of Lupien to achieve the claimed invention. One would have been motivated to make such a combination to systematically identify clusters of genomic regions (Lupien par. [0006]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HIEN DUONG, whose telephone number is (571) 270-7335. The examiner can normally be reached Monday-Friday, 8:00 AM-5:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Viker Lamardo, can be reached at 571-270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HIEN L DUONG/
Primary Examiner, Art Unit 2147

Prosecution Timeline

Nov 04, 2020
Application Filed
Feb 03, 2024
Non-Final Rejection — §101, §103
May 07, 2024
Response Filed
Jul 26, 2024
Final Rejection — §101, §103
Oct 30, 2024
Request for Continued Examination
Nov 04, 2024
Response after Non-Final Action
Feb 17, 2025
Non-Final Rejection — §101, §103
Jun 11, 2025
Response Filed
Sep 11, 2025
Final Rejection — §101, §103
Dec 16, 2025
Request for Continued Examination
Dec 31, 2025
Response after Non-Final Action
Feb 20, 2026
Non-Final Rejection — §101, §103
Mar 07, 2026
Interview Requested
Mar 13, 2026
Examiner Interview Summary
Mar 13, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597925: SUPERCONDUCTING CURRENT CONTROL SYSTEM (granted Apr 07, 2026; 2y 5m to grant)
Patent 12566940: METHOD AND APPARATUS FOR QUANTIZING PARAMETERS OF NEURAL NETWORK (granted Mar 03, 2026; 2y 5m to grant)
Patent 12566815: METHOD, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM FOR PERFORMING IDENTIFICATION BASED ON MULTI-MODAL DATA (granted Mar 03, 2026; 2y 5m to grant)
Patent 12554798: FINDING OUTLIERS IN SIMILAR TIME SERIES SAMPLES (granted Feb 17, 2026; 2y 5m to grant)
Patent 12547430: MODEL-BASED ELEMENT CONFIGURATION IN A USER INTERFACE (granted Feb 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 75% (98% with interview, a +22.8% lift)
Median Time to Grant: 3y 1m
PTA Risk: High
Based on 643 resolved cases by this examiner. Grant probability derived from career allow rate.
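As a reading aid, here is a minimal sketch of how the headline figures above appear to combine. Treating the interview lift as a simple additive bump on the career allow rate is an assumption about this dashboard's arithmetic, not a documented methodology.

```python
# Assumed, simplified reconstruction of the dashboard's headline numbers.
granted, resolved = 480, 643
career_allow_rate = granted / resolved        # ~0.747, displayed as 75%
interview_lift = 0.228                        # +22.8 percentage points (assumed additive)
with_interview = career_allow_rate + interview_lift

print(f"career allow rate: {career_allow_rate:.1%}")  # 74.7%
print(f"with interview:    {with_interview:.1%}")     # ~97.5%, displayed as 98%
```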
