Prosecution Insights
Last updated: April 19, 2026
Application No. 17/088,734

APPARATUSES, SYSTEMS, AND METHODS FOR EXTRACTING MEANING FROM DNA SEQUENCE DATA USING NATURAL LANGUAGE PROCESSING (NLP)

Non-Final OA: §101, §103
Filed: Nov 04, 2020
Examiner: DUONG, HIEN LUONGVAN
Art Unit: 2147
Tech Center: 2100 — Computer Architecture & Software
Assignee: BASF Corporation
OA Round: 5 (Non-Final)
Grant Probability: 75% (Favorable)
OA Rounds: 5-6
To Grant: 3y 1m
With Interview: 98%

Examiner Intelligence

Career Allow Rate: 75% (480 granted of 643 resolved; +19.7% vs TC avg, above average)
Interview Lift: +22.8% for resolved cases with an interview (strong)
Typical Timeline: 3y 1m avg prosecution; 42 applications currently pending
Career History: 685 total applications across all art units

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 51.5% (+11.5% vs TC avg)
§102: 18.5% (-21.5% vs TC avg)
§112: 6.6% (-33.4% vs TC avg)
Comparisons are against the Tech Center average (estimate); based on career data from 643 resolved cases.

Office Action

§101 §103
DETAILED ACTION

Remarks

This Office action is issued in response to the communication filed on 12/16/2025. Claims 1-2, 4-10, 12-16, and 18-23 are pending in this Office action.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed on 12/16/2025 with respect to the rejection of claims under 35 U.S.C. 103 have been considered but are moot in view of the new ground of rejection. Applicant's amendment to claim 1 to recite a hardware processor fails to overcome the objection because the apparatus of claim 1 does not recite any "one or more hardware processors" as part of the comprising language. The examiner suggests amending claim 1 to recite the following language to overcome the objection:

"1. (Currently Amended) An apparatus for identifying genetic elements, the apparatus comprising: one or more hardware processors; a deoxyribonucleic acid (DNA) sequence data receiving module stored on a memory that, when executed by the one or more hardware processors of the apparatus, causes the one or more hardware processors to receive DNA sequence data; a first machine learning model module stored on the memory that, when executed by the one or more hardware processors, causes the one or more hardware processors to generate first machine learning model output data…"

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-2 and 4-7 are rejected under 35 U.S.C. 101 because the claimed invention is directed toward non-statutory subject matter. Claim 1 recites an apparatus comprising a plurality of modules but fails to recite a hardware element. During examination, the claims must be interpreted as broadly as their terms reasonably allow. The broadest reasonable interpretation of a claim drawn to an apparatus that fails to recite a required hardware element covers software per se. Software is not a "process," a "machine," a "manufacture," or a "composition of matter" as defined in 35 U.S.C. 101. Accordingly, the recited "system" is not a "process," a "machine," a "manufacture," or a "composition of matter" as defined in 35 U.S.C. 101, and claim 1 fails to recite statutory subject matter as defined in 35 U.S.C. 101. Claims 2 and 4-7 are also rejected under 35 U.S.C. 101 for failing to recite a hardware element. The examiner notes that even though claim 1 recites "when executed by one or more hardware processors of the apparatus," the "one or more processors" are not part of claim 1 because one or more processors have not been previously recited in claim 1. The examiner suggests amending claim 1 to positively recite one or more processors (i.e., the apparatus comprising: one or more processors) to overcome the objection. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3. Claims 1-2, 7-10, 13-16, 20, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Devesa (US Patent Application Publication 2020/0357482 A1, hereinafter "Devesa") and further in view of Patrick Ng, "dna2vec: Consistent vector representations of variable-length k-mers," arXiv:1701.06279v1 [q-bio.QM], 23 Jan 2017, hereinafter "Ng."

Regarding independent claims 1, 8, and 14, Devesa teaches an apparatus, method, and non-transitory computer-readable medium for identifying genetic elements, comprising: a deoxyribonucleic acid (DNA) sequence data receiving module stored on a memory that, when executed by one or more hardware processors of the apparatus, causes the one or more hardware processors to receive DNA sequence data (Devesa par. [0036] teaches extracting text data from the medical data received from the corpus of medical databases); a first machine learning model module stored on the memory that, when executed by the one or more hardware processors, causes the one or more hardware processors to generate first machine learning model output data based on the DNA sequence data (Devesa par. [0036] teaches the extraction of the text data is performed using one or more machine learning algorithms), wherein the first machine learning model output data includes word embeddings defining vector representations of k-mers, and wherein the first machine learning model module includes a first machine learning model selected from: a natural language processing (NLP) model, a Bayesian mixture model, a hidden Markov model, a dynamic Bayesian network model, a deep multilayer perceptron (MLP) model, a convolutional neural network (CNN) model, a recursive neural network model, a recurrent neural network model, a long short-term memory (LSTM) model, a sequence-to-sequence model, or a shallow neural network model (Devesa par. [0037] teaches word embeddings of words created using one or more methods including RNN, CNN, a word embedding layer, and word2vec); a second machine learning model module stored on the memory that, when executed by the one or more hardware processors, causes the one or more hardware processors to generate second machine learning model output data based on the DNA sequence data, wherein the second machine learning model module includes a second machine learning model selected from: a natural language processing (NLP) model, a Bayesian mixture model, a hidden Markov model, a dynamic Bayesian network model, a deep multilayer perceptron (MLP) model, a convolutional neural network (CNN) model, a recursive neural network model, a recurrent neural network model, a long short-term memory (LSTM) model, a sequence-to-sequence model, or a shallow neural network model (Devesa par. [0036] teaches the extraction of the text data is performed using one or more machine learning algorithms including prediction algorithms, deep learning algorithms, natural language processing, and the like); and an optimization model module stored on the memory that, when executed by the one or more processors, causes the one or more hardware processors to identify at least one genetic element by inputting the first machine learning model output data and the second machine learning model output data into a natural language processing (NLP) model (Devesa par. [0039] teaches the training dataset is applied for training the machine to identify genetic data related to DNA variances and DNA sequences from the text data, with training applied using one or more algorithms; Devesa par. [0036] teaches the machine learning algorithms include prediction algorithms, deep learning algorithms, natural language processing algorithms, and the like).

Devesa fails to expressly teach wherein the first machine learning model output data includes word embeddings defining vector representations of k-mers. However, Ng teaches wherein the first machine learning model output data includes word embeddings defining vector representations of k-mers (Ng section 1.2: "we present a novel method to compute distributed representations of variable-length k-mers"). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add to the teachings of Devesa the use of word embeddings defining vector representations of k-mers as taught by Ng to achieve the claimed invention. One would have been motivated to make such a combination because it is simple to understand and compute (Ng section 1).

Regarding claims 2, 9, and 15, Devesa and Ng teach the apparatus, method, and non-transitory computer-readable medium as in claims 1, 8, and 14, wherein the first machine learning model module includes a first DNA sequence data preprocessing module, wherein the second machine learning model includes a second DNA sequence data preprocessing module, and wherein the second DNA sequence data preprocessing module is different than the first DNA sequence data preprocessing module (Devesa par. [0037] teaches word embeddings of words created using one or more methods including RNN, CNN, a word embedding layer, and word2vec).

Regarding claims 7, 13, and 20, Devesa and Ng teach the apparatus, method, and non-transitory computer-readable medium as in claims 1, 8, and 14, wherein the second machine learning model module includes a feed-forward neural network model with word embeddings (Devesa par. [0037] teaches word embeddings of words created using one or more methods including RNN, CNN, a word embedding layer, and word2vec).

Regarding claims 10 and 16, Devesa and Ng teach the method and non-transitory computer-readable medium as in claims 9 and 15, wherein the first DNA sequence data preprocessing module generates at least one of: word embeddings, feature-based representations, or contextual word embeddings (Devesa par. [0037] teaches the data curation system creates word embeddings of words present in the text data in a low-dimensional vector space).

Regarding claim 23, Devesa and Ng teach the non-transitory computer-readable medium as in claim 14, wherein the second machine learning model output data includes word embeddings defining vector representations of k-mers (Ng section 1.2: "we present a novel method to compute distributed representations of variable-length k-mers").
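For context on the k-mer embedding concept the rejection cites from Ng (dna2vec), the following is a minimal, illustrative sketch assuming the gensim library is available: overlapping k-mers are treated as "words" and word2vec-style vectors are learned over them. It uses fixed-length k-mers for brevity (Ng's dna2vec handles variable lengths) and is neither Ng's published code nor the applicant's claimed implementation.

```python
# Minimal sketch of k-mer word embeddings (the dna2vec idea cited from Ng).
# Assumes gensim is installed; illustrative only, not the claimed apparatus.
from gensim.models import Word2Vec

def kmerize(seq: str, k: int = 4) -> list[str]:
    """Split a DNA sequence into overlapping k-mers, e.g. ACGTAC -> ACGT, CGTA, GTAC."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Hypothetical toy sequences; real training would use genomic-scale corpora.
sequences = ["ACGTACGTTAGC", "TTGACGTACGGA", "GCTAGCACGTAA"]
corpus = [kmerize(s) for s in sequences]

# Skip-gram word2vec over k-mer "sentences" yields one vector per k-mer.
model = Word2Vec(corpus, vector_size=16, window=3, min_count=1, sg=1, epochs=50)
print(model.wv["ACGT"][:4])                   # first few dimensions of one k-mer vector
print(model.wv.most_similar("ACGT", topn=2))  # nearest k-mers in the embedding space
```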
4. Claims 4 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Mishra et al. (US Patent Application 20200302011).

Regarding claims 4 and 18, Devesa and Ng teach the apparatus and non-transitory computer-readable medium as in claims 1 and 14, but they do not expressly teach wherein at least one of the first machine learning model module or the second machine learning model module includes a natural language processing module that computes attention weights. However, Mishra et al. teaches wherein at least one of the first machine learning model module or the second machine learning model module includes a natural language processing module that computes attention weights (i.e., using pre-trained recurrent neural network (RNN) classifiers or other techniques for natural language processing and text analysis, particular regions (e.g., words, phrases, and sentences) that influence the score can be identified using attention weights from the classifier (Mishra par. 26)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add the teachings of Mishra et al. to use a natural language processing module that computes attention weights, because doing so would provide a method to identify portions of the sequence that include these regions, which can be considered influential portions (Mishra par. 26).
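As a rough illustration of the attention-weight idea the rejection cites from Mishra (attention weights flagging which tokens most influence a classifier's score), here is a minimal sketch using plain dot-product attention over stand-in token vectors. The token names, dimensions, and random data are hypothetical; this is not Mishra's classifier.

```python
# Illustrative softmax attention weights over stand-in token embeddings;
# a larger weight means more influence on the score. Hypothetical data only.
import numpy as np

def attention_weights(token_vectors: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Return a softmax distribution over tokens from dot-product scores."""
    scores = token_vectors @ query          # one relevance score per token
    scores = scores - scores.max()          # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

rng = np.random.default_rng(0)
tokens = ["ATGC", "GCCA", "TTAG", "CGGT"]    # hypothetical k-mer tokens
vectors = rng.normal(size=(len(tokens), 8))  # stand-in embeddings
query = rng.normal(size=8)                   # stand-in classifier query vector

for token, weight in zip(tokens, attention_weights(vectors, query)):
    print(f"{token}: {weight:.3f}")          # tokens with larger weights are "influential"
```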
4. Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Yelensky et al. (US Patent Application 20190065675).

Regarding claim 5, Devesa and Ng teach the apparatus as in claim 1, but they do not expressly teach wherein at least one of the first machine learning model module or the second machine learning model module includes gradient-based methods to analyze an importance of whole k-mers. However, Yelensky et al. teaches wherein at least one of the first machine learning model module or the second machine learning model module includes gradient-based methods to analyze an importance of whole k-mers (i.e., a presentation model can comprise a statistical regression or a machine learning (e.g., deep learning) model trained on a set of reference data (par. 94); the training data set may include data associated with genomes associated with the samples, and the training peptide sequences may be of lengths within a range of k-mers where k is between 8-15 (Yelensky pars. 125-126); various parameters are determined through gradient-based numerical optimization algorithms, such as batch gradient algorithms, stochastic gradient algorithms, and the like (Yelensky par. 397)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Devesa and Ng with the teachings of Yelensky to achieve the claimed invention. One would have been motivated to make such a combination to provide an optimized approach for identifying and selecting neoantigens for personalized cancer vaccines (Yelensky par. 10).

5. Claims 6, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Hannum (US Patent Application Publication 2015/0100244 A1, hereinafter "Hannum").

Regarding claims 6, 12, and 19, Devesa and Ng teach the apparatus, method, and non-transitory computer-readable medium as in claims 1, 8, and 14, but they do not expressly teach wherein at least one of the first machine learning model module or the second machine learning model module includes a logistic regression model. However, Hannum teaches wherein at least one of the first machine learning model module or the second machine learning model module includes a logistic regression model (i.e., data sets, relationships, and/or profiles can be analyzed and/or compared by utilizing multiple statistical methods, e.g., logistic regression (Hannum par. 189); a data set can be analyzed by utilizing multiple statistical algorithms, e.g., logistic regression (par. 231)). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Devesa and Ng with the teachings of Hannum to achieve the claimed invention. One would have been motivated to make such a combination to reduce the complexity and/or dimensionality of the data set (Hannum par. 231).

6. Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Rees (US Patent Application Publication 2003/0154078 A1, hereinafter "Rees").

Regarding claim 21, Devesa and Ng teach the non-transitory computer-readable medium as in claim 14 but fail to teach wherein the identification is based on a score of the first machine learning model output data and a score of the second machine learning model output data. However, Rees teaches wherein the identification is based on a score of the first machine learning model output data and a score of the second machine learning model output data (Rees par. [0027] teaches the best match is identified and a confidence score is then calculated, which is then utilized to determine whether the matching of received speech input to words is sufficiently accurate to act on the received input, ask for confirmation, ignore the received input, or request reentry of the data). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Devesa and Ng with the teachings of Rees to achieve the claimed invention. One would have been motivated to make such a combination to improve the accuracy of the identification.

7. Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Devesa and Ng, and further in view of Lupien et al. (US Patent Application Publication 2022/0284983 A1, hereinafter "Lupien").

Regarding claim 22, Devesa and Ng teach the non-transitory computer-readable medium as in claim 14 but fail to teach wherein the at least one genetic element comprises a cis-regulatory element. However, Lupien teaches wherein the at least one genetic element comprises a cis-regulatory element (Lupien's abstract teaches a method for identifying cis-regulatory elements). Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Devesa and Ng with the teachings of Lupien to achieve the claimed invention. One would have been motivated to make such a combination to systematically identify clusters of genomic regions (Lupien par. [0006]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HIEN DUONG, whose telephone number is (571) 270-7335. The examiner can normally be reached Monday-Friday, 8:00 AM-5:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Viker Lamardo, can be reached at 571-270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HIEN L DUONG/
Primary Examiner, Art Unit 2147

Prosecution Timeline

Nov 04, 2020
Application Filed
Feb 03, 2024
Non-Final Rejection — §101, §103
May 07, 2024
Response Filed
Jul 26, 2024
Final Rejection — §101, §103
Oct 30, 2024
Request for Continued Examination
Nov 04, 2024
Response after Non-Final Action
Feb 17, 2025
Non-Final Rejection — §101, §103
Jun 11, 2025
Response Filed
Sep 11, 2025
Final Rejection — §101, §103
Dec 16, 2025
Request for Continued Examination
Dec 31, 2025
Response after Non-Final Action
Feb 20, 2026
Non-Final Rejection — §101, §103
Mar 07, 2026
Interview Requested
Mar 13, 2026
Examiner Interview Summary
Mar 13, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597925: SUPERCONDUCTING CURRENT CONTROL SYSTEM (granted Apr 07, 2026; 2y 5m to grant)
Patent 12566940: METHOD AND APPARATUS FOR QUANTIZING PARAMETERS OF NEURAL NETWORK (granted Mar 03, 2026; 2y 5m to grant)
Patent 12566815: METHOD, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM FOR PERFORMING IDENTIFICATION BASED ON MULTI-MODAL DATA (granted Mar 03, 2026; 2y 5m to grant)
Patent 12554798: FINDING OUTLIERS IN SIMILAR TIME SERIES SAMPLES (granted Feb 17, 2026; 2y 5m to grant)
Patent 12547430: MODEL-BASED ELEMENT CONFIGURATION IN A USER INTERFACE (granted Feb 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 75% (98% with interview, a +22.8% lift)
Median Time to Grant: 3y 1m
PTA Risk: High
Based on 643 resolved cases by this examiner. Grant probability derived from career allow rate.
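As a reading aid, here is a minimal sketch of how the headline figures above appear to combine. Treating the interview lift as a simple additive bump on the career allow rate is an assumption about this dashboard's arithmetic, not a documented methodology.

```python
# Assumed, simplified reconstruction of the dashboard's headline numbers.
granted, resolved = 480, 643
career_allow_rate = granted / resolved        # ~0.747, displayed as 75%
interview_lift = 0.228                        # +22.8 percentage points (assumed additive)
with_interview = career_allow_rate + interview_lift

print(f"career allow rate: {career_allow_rate:.1%}")  # 74.7%
print(f"with interview:    {with_interview:.1%}")     # ~97.5%, displayed as 98%
```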
