Prosecution Insights
Last updated: April 19, 2026
Application No. 18/655,208

EFFICIENT CLUSTERING OF NOISY POLYNUCLEOTIDE SEQUENCE READS

Non-Final OA: §101, §103, Double Patenting
Filed: May 3, 2024
Examiner: LIU, GUOZHEN
Art Unit: 1686
Tech Center: 1600 — Biotechnology & Organic Chemistry
Assignee: Microsoft Technology Licensing, LLC
OA Round: 1 (Non-Final)

Grant Probability: 50% (Moderate)
Predicted OA Rounds: 1-2
Predicted Time to Grant: 4y 8m
Grant Probability With Interview: 75%

Examiner Intelligence

Career Allow Rate: 50% (47 granted / 95 resolved; -10.5% vs TC avg); grants 50% of resolved cases
Interview Lift: +25.4% on resolved cases with interview (strong)
Typical Timeline: 4y 8m average prosecution
Career History: 134 total applications across all art units; 39 currently pending

Statute-Specific Performance

§101: 37.1% (-2.9% vs TC avg)
§103: 25.2% (-14.8% vs TC avg)
§102: 7.3% (-32.7% vs TC avg)
§112: 19.8% (-20.2% vs TC avg)

Deltas are measured against the Tech Center average estimate. Based on career data from 95 resolved cases.

Office Action

Rejections: §101, §103, Double Patenting
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims

Claims 1-20 are pending and are examined on the merits.

Priority

Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365(c) is acknowledged. Priority to US application 62/402,873, filed 09/30/2016, is acknowledged.

Claim Objections

Claim 14 is objected to because of the following informality: claim 14 recites “wherein a hash for a one of the plurality of DNA reads”, which should read “wherein a hash for [a] one of the plurality of DNA reads”. Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.

Step 1: Process, Machine, Manufacture or Composition

Claims 1-10 recite a series of functional steps, and are thus directed to a process. Claims 11-20 likewise recite a series of functional steps, and are thus directed to another process.

Step 2A Prong One: Identification of Abstract Ideas

The claims recite:

1. Computing a signature for a first read from the plurality of reads, the signature being a bit string generated in part by a set of k-grams within the first read (claim 1). This step recites generating a signature (based on the k-grams) from the first read and converting the signature into a bit string. The whole process can be performed in the human mind, perhaps with the aid of pen and paper. The step therefore equates to an abstract idea in the category of mental processes.

2. Generating a hash for the first read, the hash based at least in part on a sequence of the first read (claim 1).
Under a broadest reasonable interpretation (BRI), this step is interpreted as using the first read as a hash key and assigning a value to that key. The whole process can be performed in the human mind, perhaps with the aid of pen and paper. The step therefore equates to an abstract idea in the category of mental processes.

3. Grouping the first read with a second read having a same hash into a same bucket (claim 1). Under a BRI, this step is interpreted as moving the two reads into a same folder (here, the bucket) or labeling the two reads with the same bucket label. The whole process can be performed in the human mind, perhaps with the aid of pen and paper. The step therefore equates to an abstract idea in the category of mental processes.

4. Computing an edit distance between the first read and the second read (claim 1). This step recites computing an edit distance, which explicitly recites a mathematical calculation. The step is therefore directed to an abstract idea in the category of mathematical concepts.

5. Determining that the edit distance is below a threshold value (claim 1). This step recites comparing a value to a threshold, which can be performed in the human mind. The step is therefore directed to an abstract idea in the category of mental processes.

6. Merging a first cluster containing the first read with a second cluster containing the second read into a third cluster (claim 1). This step recites merging two clusters into a third cluster, which is interpreted as data manipulation that can be performed in the human mind. The step is therefore directed to an abstract idea in the category of mental processes.

7. Separating a plurality of DNA reads into a plurality of buckets (claim 11). Under a BRI, this step is interpreted as moving the reads into folders (here, the buckets) or labeling the reads with different bucket labels. The whole process can be performed in the human mind. The step therefore equates to an abstract idea in the category of mental processes.

8. Clustering DNA reads in one of the plurality of buckets into clusters based at least in part on edit distance between respective pairs of the DNA reads (claim 11). Under a BRI, this step is interpreted as moving reads in a same folder (here, the bucket), or reads sharing a bucket label, into different clusters after comparing the edit distance between respective pairs of the DNA reads. The whole process can be performed in the human mind, perhaps with the aid of pen and paper. The step therefore equates to an abstract idea in the category of mental processes.

Step 2A Prong Two: Consideration of Practical Application

The claims result in a process of assigning reads into clusters, which is directed to data analysis and manipulation. The claims do not recite any additional elements that integrate the abstract idea/judicial exception into a practical application. The judicial exception is not integrated into a practical application because the claims meet none of the following criteria: an additional element reflects an improvement in the functioning of a computer, or an improvement to other technology or a technical field; an additional element applies or uses a judicial exception to effect a particular treatment or prophylaxis for a disease or medical condition; an additional element implements a judicial exception with, or uses a judicial exception in conjunction with, a particular machine or manufacture that is integral to the claim; an additional element effects a transformation or reduction of a particular article to a different state or thing; or an additional element applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception.
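For technical orientation only, the claim 1 pipeline that the analysis above walks through step by step (a bit-string signature from k-grams, a sequence-based hash, hash buckets, an edit-distance comparison against a threshold, and cluster merging) can be sketched as a toy Python illustration. All names, the simplified bucket hash, the 16-bit signature width, and the threshold are hypothetical choices for illustration, not the applicant's actual implementation:

```python
from itertools import combinations

def kgrams(read, k=3):
    """All k-grams (k-mers) of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}

def signature(read, k=3, bits=16):
    """Toy bit-string signature: set bit (hash(g) % bits) for each k-gram (illustrative only)."""
    sig = 0
    for g in kgrams(read, k):
        sig |= 1 << (hash(g) % bits)
    return sig

def bucket_hash(read, k=3):
    """Toy sequence-based hash: the lexicographically minimal k-gram (a stand-in choice)."""
    return min(kgrams(read, k))

def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cluster(reads, k=3, threshold=2):
    """Bucket reads by hash, then merge clusters whose reads fall within the edit-distance threshold."""
    buckets = {}
    for r in reads:
        buckets.setdefault(bucket_hash(r, k), []).append(r)
    clusters = {r: {r} for r in reads}  # start with singleton clusters
    for members in buckets.values():
        for a, b in combinations(members, 2):
            if edit_distance(a, b) < threshold and clusters[a] is not clusters[b]:
                merged = clusters[a] | clusters[b]  # merge the two clusters into a third
                for r in merged:
                    clusters[r] = merged
    return {frozenset(c) for c in clusters.values()}
```

On a toy input such as `["ACGTACGT", "ACGTACGA", "TTTTGGGG"]`, the first two reads share a bucket and merge into one cluster, while the third lands in a different bucket and remains alone.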
Step 2B: Consideration of Additional Elements and Significantly More

The claimed method also recites “additional elements” that are not limitations drawn to an abstract idea. The recited additional elements are: “receiving a plurality of reads from a polynucleotide sequencer” (claim 1); and “generating a single consensus output sequence from each of the cluster” (claim 20). The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, because the additional elements recited are directed to the insignificant extra-solution activity of inputting/outputting data. Nor do the additional elements amount to significantly more than the judicial exception, because it is routine and conventional in the pertinent industry to acquire a plurality of reads from a polynucleotide sequencer and to output a consensus sequence from each cluster. Viewed as a whole, these additional claim elements do not provide meaningful limitations that would transform the abstract idea recited in the presented claims into a patent-eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3-11, 13-15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Berlin (“Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.” Nat Biotechnol 33, 623–630, 2015; newly cited), further in view of Kim et al.
("Hobbes3: Dynamic generation of variable-length signatures for efficient approximate subsequence mappings." 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 2016; newly cited) and Rasheed et al. ("MC-MinH: Metagenome clustering using minwise based hashing." Proceedings of the 2013 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2013; newly cited).

Claim 1 is interpreted as a method for clustering sequence reads.

Regarding claim 1, Berlin provides (page 625, col 1, penultimate para) “To demonstrate the efficient and complete assembly of large genomes, we integrated MHAP into the Celera Assembler34 PBcR10,13 (PacBio corrected reads) hierarchical assembly pipeline and assembled previously released whole-genome SMRT data from Escherichia coli K12, S. cerevisiae W303, D. melanogaster ISO1, Arabidopsis thaliana Ler-035 and the complete hydatidiform mole CHM117 (85×, 117×, 90×, 144× and 54× coverage, respectively; Supplementary Table 4)”, which suggests receiving a plurality of reads from a polynucleotide sequencer.

Berlin provides (2nd page of “Online Methods,” col 2, 4th para) “Since MHAP was designed for complex genomes with large N and many repeats, k1 is set to 16 to decrease the number of false positives in each hash table bucket. k = 16 is the largest value that can be effectively hashed into a 32-bit fingerprint while providing good sensitivity (Supplementary Table 2). Much larger k-mer values would significantly degrade sensitivity, while slightly larger values would double memory usage (requiring a 64-bit fingerprint), without providing significant improvements. Two sketch sizes were chosen (fast = 512 or sensitive = 1,256) based on Supplementary Table 2, to achieve sensitivities required for assembly”, which teaches computing a bit-string signature from the k-grams of the reads, because the “fingerprints” read on signatures and the k-mers read on k-grams.

Berlin provides (page 624, Fig.
1 legend) “(a) To create a MinHash sketch of a DNA sequence S, we first decomposed the sequence into its constituent k-mers. In the example shown, k = 3, resulting in 12 k-mers each for S1 and S2. (b) All k-mers are then converted to integer fingerprints by multiple hash functions. The number of hash functions determines the resulting sketch size H. Here, where H = 4, four independent hash sets are generated for each sequence (Γ1…H)”, which teaches generating a hash for the first read, the hash being based on the first read’s sequence.

Berlin provides (1st page of “Online Methods,” col 1, 3rd para) “In the case of hash table indexing, all k-mers are first hashed into an integer fingerprint for easier indexing and faster comparison. We define this set of K = L − k + 1 integers as Γ(S), where |Γ(S)| = K is the set’s cardinality. Here we assume that the integer size is large enough that a chance of a random collision is negligible. The integers from all strings are hashed into a table such that each hash table bucket will contain a list of all string indices that contain that specific integer fingerprint”, which teaches grouping reads having a same hash value into a same bucket.

Berlin does not teach edit distance (let alone a threshold on edit distance) or merging two clusters.

Kim provides (page 170, col 1 last para through col 2, 1st para) “Edit distance: given two sequences of s1 and s2 whose lengths are not necessarily the same, the edit distance between them is the minimum number of edit operations to transform s1 to s2, where an edit operation is substitution, insertion, or deletion of a single character”, which teaches edit distance between two read sequences.

Kim provides (page 176, col 2, 3rd para) “We mapped reads against the reference sequence using edit distance constraints. For the down-stream applications using all-mappers, the edit distance threshold is usually set to 4% or 5% of the read length [14].
We also set edit distance thresholds to around 5% errors of the read length in our experiments”, which teaches a threshold on edit distance for sequence similarity search.

Kim does not teach merging two clusters.

Rasheed teaches acquiring short reads from a sequencer (page 678, Fig. 1). Rasheed provides (page 680, col 1, penultimate para) “For metagenomic sequence clustering we are interested in finding those clusters which contain sequences that have similarity greater than some pre-defined threshold theta. Therefore, in MC-MinH algorithm, we formulate the probability of collision such that if Pr[minHash (h(Is1 )) = minHash (h(Is2 ))] > theta, then s1 and s2 belong to the same cluster, otherwise a new cluster is created”, which teaches merging, into a third cluster, two clusters based on the similarity of two sequence reads in those clusters.

Regarding claim 3, Berlin provides (2nd page of “Online Methods,” col 1, last para through col 2, 2nd para) “The first filter generates H fingerprints for all k1-mers of a read using the MurmurHash3 hash function implementation in the Guava library (for larger k values a rolling hash can be used). Next, we use the MurmurHash3 fingerprint as a seed into a computationally efficient XORShift random number generator to generate H random fingerprints. Highly repetitive k-mers, comprising over 0.001% of the total bases, can be computed beforehand and ignored. The fingerprint needed for the second filter is computed using the same MurmurHash3, and the fingerprints are sorted and stored in a data structure associated with each S, along with the H min-mers”, and “The hth min-mer for each read is hashed into the hth hash table, which is shared between all reads. The hash tables are maintained in memory, unless there are too many reads. In that case, the computation is treated as a parallel all-to-all computation, where only a subset of reads are hashed and compared to all reads streamed directly from the drive.
This subset-to-all comparison is repeated for all nonintersecting subsets, either in parallel or serially, depending on the computation resources available,” which teaches dividing the first read into two or more sub-reads and finding all k-grams for each of the sub-reads.

Berlin provides (2nd page of “Online Methods,” col 2, 3rd-4th paras) “For each read we find all the reads that are similar by looking up the H min-mers in the hash tables. The hth min-mer is looked for in the hth hash table, and the counts are aggregated into one list. Any read that has at least x min-mers in common with the lookup read is propagated to the second stage filter. The second-stage filter is implemented as described above, but to account for the high Indel rate exhibited by SMRT sequencing, the overlap region is extended by 30% on either side before counting all k-mer matches in the overlap. After counting, the bounds of the overlap are computed using the UMVU estimators, as before”, and “Since MHAP was designed for complex genomes with large N and many repeats, k1 is set to 16 to decrease the number of false positives in each hash table bucket. k = 16 is the largest value that can be effectively hashed into a 32-bit fingerprint while providing good sensitivity (Supplementary Table 2). Much larger k-mer values would significantly degrade sensitivity, while slightly larger values would double memory usage (requiring a 64-bit fingerprint), without providing significant improvements. Two sketch sizes were chosen (fast = 512 or sensitive = 1,256) based on Supplementary Table 2, to achieve sensitivities required for assembly”, which teaches encoding the k-grams as bit strings and computing a bit-string signature from the k-grams of the reads, because the “fingerprints” read on signatures and the k-mers read on k-grams.

Regarding claim 4, Berlin provides Fig. 1 (page 624), which teaches: generating a string of random numbers (Fig.
1b: all k-mers are converted to integer fingerprints by multiple hash functions); assigning a different random number from the string of random numbers to individual bits in at least a portion of the signature (Fig. 1b: the number of hash functions determines the resulting sketch size H; here, where H = 4, four independent hash sets are generated for each sequence (Γ1…H); in MHAP, after the initial hash (Γ1), subsequent fingerprints are generated using an XORShift pseudo-random number generator (Γ2…H); the k-mer generating the minimum value for each hash is referred to as the min-mer for that hash); and setting the hash as a subset of the string of random numbers that are assigned to the individual bits (Fig. 1c: the sketch of a sequence is composed of the ordered set of its H min-mer fingerprints, which is much smaller than the set of all k-mers).

Regarding claim 5, Berlin provides (2nd page of “Online Methods,” col 2, 4th para) “Since MHAP was designed for complex genomes with large N and many repeats, k1 is set to 16 to decrease the number of false positives in each hash table bucket. k = 16 is the largest value that can be effectively hashed into a 32-bit fingerprint while providing good sensitivity (Supplementary Table 2). Much larger k-mer values would significantly degrade sensitivity, while slightly larger values would double memory usage (requiring a 64-bit fingerprint), without providing significant improvements. Two sketch sizes were chosen (fast = 512 or sensitive = 1,256) based on Supplementary Table 2, to achieve sensitivities required for assembly”, which teaches selecting the individual bits based on bit value.

Regarding claim 6, Berlin provides Fig. 1 (page 624), which teaches generating the hash comprises: generating a string of random nucleotides (Fig. 1a and 1b); identifying an occurrence (here, by four hash functions) of the string of random nucleotides in the first read (Fig.
1b); and setting the hash as a sequence of nucleotides adjacent to the occurrence of the string of random nucleotides in the first read (Fig. 1c: “The sketch of a sequence is composed of the ordered set of its H min-mer fingerprints, which is much smaller than the set of all k-mers. In this example, the sketches of S1 and S2 share the same minimum fingerprints (underlined) for Γ1 and Γ2”).

Regarding claim 7, neither Berlin nor Rasheed teaches edit distance. Kim provides (page 170, col 1 last para through col 2, 1st para) “Edit distance: given two sequences of s1 and s2 whose lengths are not necessarily the same, the edit distance between them is the minimum number of edit operations to transform s1 to s2, where an edit operation is substitution, insertion, or deletion of a single character”, which teaches that the edit distance comprises counting a minimum number of insertions, deletions, and substitutions that change the first read into the second read.

Regarding claim 8, neither Berlin nor Rasheed teaches edit distance. Kim provides (page 175, col 2, 3rd para) “Lemma 2: Given a read r and a candidate position l of a reference sequence Γ, suppose we use a full-verification window of Γ[l − k .. l + |r| + k − 1] for an edit distance threshold k. Consider two positions l1 and l2 in Γ such that each of them has only one matched signature. If |l1 −l2| ≤ k, it does not lose any true mapping position to make either l1 or l2 as a candidate position”, which teaches that determining that the edit distance is below a threshold value (here, k) comprises determining that a signature distance between the signature for the first read and a signature for the second read (here, the reference sequence Γ) is below a signature distance threshold.

Regarding claim 9, Berlin provides (page 624, col 1, 1st para) “for each hash function, only the minimum valued fingerprint, or min-mer, is retained. The collection of min-mers for a sequence makes the sketch (Fig. 1 and Online Methods).
This locality-sensitive hashing allows the Jaccard similarity of two k-mer sets to be estimated by simply computing the Hamming distance between their sketches. The resulting estimate is strongly correlated with the number of shared k-mers between two sequences (Supplementary Fig. 1)”, which teaches that the signature distance is calculated as the Hamming distance.

Regarding claim 10, neither Berlin nor Kim teaches merging two clusters. Rasheed provides (page 680, col 1, penultimate para) “For metagenomic sequence clustering we are interested in finding those clusters which contain sequences that have similarity greater than some pre-defined threshold theta. Therefore, in MC-MinH algorithm, we formulate the probability of collision such that if Pr[minHash (h(Is1 )) = minHash (h(Is2 ))] > theta, then s1 and s2 belong to the same cluster, otherwise a new cluster is created”, which teaches merging, into a same cluster, two clusters based on the similarity of two sequence reads in the two clusters. Under a BRI, s1 and s2 can belong to any clusters at the outset, as long as the two clusters are different but in the same bucket.

Regarding claim 11, neither Berlin nor Kim teaches bucketing and clustering sequence reads. Rasheed provides (page 677, col 2, 3rd para) “Unsupervised grouping of sequences that belong to the same species is referred to as the “binning" problem. Successful binning (or grouping) of sequence reads assists in the metagenome assembly problem (See Figure 1), and also allows computation of different species diversity metrics. These approaches also reduce the complexity within computational work-flows by determining cluster representatives that can be analyzed, instead of each individual sequence within a sample”, which teaches separating sequence reads into buckets, because “binning” (or “grouping”) is dividing into buckets.
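The MinHash idea quoted from Berlin (a sketch of per-hash-function minimum fingerprints, whose Hamming distance estimates the Jaccard similarity of two k-mer sets) can be illustrated with a minimal Python sketch. The seeded tuple-hash fingerprints below stand in for MurmurHash3/XORShift and are an assumption for illustration only:

```python
def kmers(seq, k):
    """The set of all k-mers (k-grams) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def minhash_sketch(seq, k, num_hashes):
    """For each of H hash functions, keep only the minimum fingerprint (the 'min-mer')."""
    grams = kmers(seq, k)
    # Seeded 32-bit tuple hashes stand in for MurmurHash3/XORShift fingerprints.
    return [min(hash((seed, g)) & 0xFFFFFFFF for g in grams)
            for seed in range(num_hashes)]

def sketch_hamming(s1, s2):
    """Hamming distance between sketches: positions whose min-mers differ."""
    return sum(a != b for a, b in zip(s1, s2))

def jaccard_estimate(s1, s2):
    """Fraction of agreeing sketch positions estimates Jaccard similarity of the k-mer sets."""
    return 1 - sketch_hamming(s1, s2) / len(s1)
```

Identical sequences agree at every sketch position (estimate 1.0), while sequences sharing no k-mers almost never agree at any position, which is the locality-sensitive property the claim 9 analysis relies on.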
Rasheed provides (page 677, col 2, 4th para) “Our approach is inspired from our previous work that used locality-sensitive hashing (LSH) to cluster 16S metagenomic sequences [26, 27]. MC-MinH uses min-wise hashing [2] along with a greedy clustering algorithm to group 16S as well as whole metagenomic sequences. This is achieved by representing unequal length sequences using contiguous subsequences and then approximating the computation of pairwise similarity (Jaccard similarity) using independent min-wise hashing”, which teaches clustering DNA sequence reads in a bucket (the metagenomes having been divided into buckets in the previous step). However, Rasheed does not teach edit distance. Kim provides (page 170, col 1 last para through col 2, 1st para) “Edit distance: given two sequences of s1 and s2 whose lengths are not necessarily the same, the edit distance between them is the minimum number of edit operations to transform s1 to s2, where an edit operation is substitution, insertion, or deletion of a single character”, which teaches edit distance between two read sequences.

Regarding claim 13, Berlin provides (1st page of “Online Methods,” col 1, 3rd para) “In the case of hash table indexing, all k-mers are first hashed into an integer fingerprint for easier indexing and faster comparison. We define this set of K = L − k + 1 integers as Γ(S), where |Γ(S)| = K is the set’s cardinality. Here we assume that the integer size is large enough that a chance of a random collision is negligible. The integers from all strings are hashed into a table such that each hash table bucket will contain a list of all string indices that contain that specific integer fingerprint”, which teaches grouping reads into buckets by hash values.

Regarding claim 14, Berlin provides Fig. 1 (page 624), which teaches: generating a string of random numbers (Fig.
1b: all k-mers are converted to integer fingerprints by multiple hash functions); assigning a different random number from the string of random numbers to individual bits in at least a portion of the signature (Fig. 1b: the number of hash functions determines the resulting sketch size H; here, where H = 4, four independent hash sets are generated for each sequence (Γ1…H); in MHAP, after the initial hash (Γ1), subsequent fingerprints are generated using an XORShift pseudo-random number generator (Γ2…H); the k-mer generating the minimum value for each hash is referred to as the min-mer for that hash); and setting the hash as a subset of the string of random numbers that are assigned to the individual bits (Fig. 1c: the sketch of a sequence is composed of the ordered set of its H min-mer fingerprints, which is much smaller than the set of all k-mers; “a subset of the string of random numbers” is a binary signature of the DNA read).

Regarding claim 15, neither Berlin nor Rasheed teaches edit distance. Kim provides (page 170, col 1 last para through col 2, 1st para) “Edit distance: given two sequences of s1 and s2 whose lengths are not necessarily the same, the edit distance between them is the minimum number of edit operations to transform s1 to s2, where an edit operation is substitution, insertion, or deletion of a single character”, which teaches that the edit distance comprises counting a minimum number of insertions, deletions, and substitutions that change the first read into the second read.

Regarding claim 20, Berlin provides (last page of “Online Methods,” col 1, 1st para) “pipeline supports two consensus modules: a slower but more accurate method called PBDAGCon and a faster but less robust method called FalconSense. PBDAGCon, described previously, uses a directed acyclic graph (DAG) to construct a partial order alignment66. Finding an optimized path through the DAG generates a robust consensus.
Alternatively, the FalconSense algorithm accelerates consensus generation by aligning reads to a template sequence that is the target of correction. Individual matches and indels are then tagged and sorted to determine a consensus sequence with high support”, which suggests generating consensus output sequences.

It would have been prima facie obvious to combine Berlin’s pipeline, which generates k-grams, bit-string signatures (Berlin: 2nd page of “Online Methods,” col 2, 4th para), and hashes (Berlin: page 624, Fig. 1 legend) and groups reads having a same hash value into a same bucket (Berlin: 1st page of “Online Methods,” col 1, 3rd para), with Kim’s edit-distance measurement (Kim: page 170, col 1 last para through col 2, 1st para) and threshold (page 176, col 2, 3rd para) for sequence similarity mapping, because Kim’s edit distance and threshold allow quantifying sequence similarities. One would reasonably expect success, as Berlin’s pipeline and Kim’s method both concern sequence mapping and comparison and do not interfere with each other. This would be a classic example of combining prior art elements according to known methods to yield predictable results (MPEP §2141.III.(A)).

It would have been prima facie obvious to combine Rasheed’s method of merging, into a third cluster, two clusters based on the similarity of two sequence reads in those clusters (Rasheed: page 680, col 1, penultimate para), with the combined Berlin and Kim pipeline that generates k-grams, bit-string signatures, and hashes and groups reads having a same hash value into a same bucket, because Rasheed’s method allows growing longer/bigger clusters, which is important for sequence assembly and consensus building. One would reasonably expect success, as the combined Berlin and Kim pipeline and Rasheed’s method both concern sequence mapping and assembly and do not interfere with each other.
This would be a classic example of combining prior art elements according to known methods to yield predictable results (MPEP §2141.III.(A)).

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Berlin, Kim and Rasheed, as applied to claims 1, 3-11, 13-15 and 20 above, and further in view of White et al. ("Moleculo long-read sequencing facilitates assembly and genomic binning from complex soil metagenomes." mSystems 1.3 (2016): 10-1128; newly cited).

Regarding claim 2, none of Berlin, Kim and Rasheed teaches a data set of over one billion reads. White provides (page 2, 2nd para) “as a result of the JGI initiative, the two largest soil metagenomes published to date are from Iowa native prairie soil, containing 3.3 billion reads, or 257 Gbp of raw data, and from the adjacent cultivated soil (“continuous corn”), containing 1.8 billion reads or 141 Gbp (4). However, only nine sequences of > 10 kbp could be assembled from the continuous corn, and none was >10 kbp from the native Iowa prairie soil assembly. Approximately 80% of the sequencing data could not be assembled, and less than 11% of the reads mapped to either the assembly from the continuous corn or from the native prairie soil”, which teaches receiving more than a billion reads representing more than a million different DNA strands, because either 3.3 billion × 80% reads or 1.8 billion × 80% reads yields over one million non-overlapping reads, and the non-overlapping reads read on the different DNA strands in the claim limitation.

It would have been prima facie obvious to supply White’s large set of sequence reads to the combined Berlin, Kim and Rasheed pipeline, which clusters sequence reads using k-grams, bit-string signatures and hashes, groups reads having a same hash value into a same bucket, and merges clusters automatically, because the combined Berlin, Kim and Rasheed pipeline needs a large sequence-read data set to prove its robustness.
One would reasonably expect success, as the combined Berlin, Kim and Rasheed pipeline has demonstrated the capability of handling sequence-read data (Berlin: page 625, col 1, penultimate para; Rasheed: page 678, Fig. 1). White’s data may be larger than the data previously handled by the combined Berlin, Kim and Rasheed pipeline, but White’s data would work with the existing combined pipeline because next-generation sequencing technologies generate reads in similar formats. One would reasonably expect success, as this would be a classic example of simple substitution of one known element for another to obtain predictable results (MPEP §2141.III.(B)).

Claims 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Berlin, Kim and Rasheed, as applied to claims 1, 3-11, 13-15 and 20 above, and further in view of Bar-Yossef et al. (“System And Method For Detecting Matches Of Small Edit Distance”, US 20070085716 A1, published 2007-04-19; newly cited).

Regarding claims 16-18, none of Berlin, Kim and Rasheed teaches approximating edit distance using Hamming distance. Bar-Yossef provides (emphasis added):

[0029] The method may further comprise creating substrings from each of the character strings; identifying anchors in a particular character string; identifying a start position of the substrings of the particular character string according to the anchors; identifying a set of substrings according to the start position; encoding the set of substrings to produce the representative sketch; and using a Hamming distance between encodings of the two selected character strings to approximate the edit distance between the two selected character strings.

[0032] The embodiments of the invention provide a framework design for efficient methodologies for the k vs. l gap version of the edit distance problem: given two n-bit input strings with the promise that the edit distance is either at most k or more than l, decide which of the two cases holds.
Such methodologies immediately yield approximation methodologies that are as efficient, with the approximation factor directly correlated with the gap between k and l.

[0033] A sketching methodology for edit distance comprises two compression procedures and a reconstruction procedure, which operate in concert as follows. The compression procedures produce a fingerprint (sketch) from each of the input strings, and the reconstruction procedure uses solely the sketches to approximate the edit distance between the two strings. The key feature is that the sketch of each string is constructed without knowledge of the other string. The sketches are supposed to retain the minimum amount of information about the strings that is required to subsequently approximate the edit distance. The procedures are allowed to share random coins (e.g., they have access to a string of bits that are chosen at random in advance), and the main measure of complexity is the size of the sketches produced. In actual applications it is desirable that the procedures be efficient.

Bar-Yossef's above disclosure teaches (claim 16) that the edit distance is approximated by a Hamming distance between binary signatures of the two selected character strings; the character strings are encoded into binary signatures. Bar-Yossef's above disclosure suggests (claim 17) that when the Hamming distance is less than a first threshold (here "at most k"), the first DNA read and the second DNA read are placed in the same cluster, as taught by Rasheed (page 680, col 1, penultimate para: "For metagenomic sequence clustering we are interested in finding those clusters which contain sequences that have similarity greater than some pre-defined threshold theta. Therefore, in MC-MinH algorithm, we formulate the probability of collision such that if Pr[minHash (h(Is1 )) = minHash (h(Is2 ))] > theta, then s1 and s2 belong to the same cluster, otherwise a new cluster is created").
Bar-Yossef's above disclosure suggests (claim 18) that when the Hamming distance is greater than a second threshold (here "more than l"), the first DNA read and the second DNA read are placed in different clusters, as taught by Rasheed (page 680, col 1, penultimate para: "For metagenomic sequence clustering we are interested in finding those clusters which contain sequences that have similarity greater than some pre-defined threshold theta. Therefore, in MC-MinH algorithm, we formulate the probability of collision such that if Pr[minHash (h(Is1 )) = minHash (h(Is2 ))] > theta, then s1 and s2 belong to the same cluster, otherwise a new cluster is created").

It would have been prima facie obvious to replace the edit distance calculation used by the combined Berlin, Kim and Rasheed pipeline (which uses k-grams, bit-string signatures, and hashes, groups reads having the same hash value into the same bucket, and merges clusters automatically) with Bar-Yossef's method of approximating edit distance through Hamming distance, because for bit strings of identical length the Hamming distance costs less computing power and computing time than the edit distance does (Bar-Yossef: paragraphs [0007-0008]). One would reasonably expect success, as the combined Berlin, Kim and Rasheed pipeline and Bar-Yossef have demonstrated the capability of measuring sequence similarity (a DNA sequence is a special case of a character string), and this would be a classic example of simple substitution of one known element for another to obtain predictable results (MPEP §2141.III.(B)).

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees.
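The k vs. l gap decision described above, paired with the two-threshold cluster assignment, can be sketched as follows. This is an illustrative sketch only, under the assumption of equal-length binary signatures; the function names, example bit strings, and threshold values are hypothetical and not drawn from the cited references:

```python
def hamming(sig_a: str, sig_b: str) -> int:
    """Hamming distance between two equal-length binary signatures."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have equal length")
    return sum(a != b for a, b in zip(sig_a, sig_b))

def cluster_decision(sig_a: str, sig_b: str, k: int, l: int) -> str:
    """k vs. l gap decision: distance at most k -> same cluster;
    distance more than l -> different clusters; otherwise undecided."""
    d = hamming(sig_a, sig_b)
    if d <= k:
        return "same cluster"
    if d > l:
        return "different clusters"
    return "undecided"  # the distance falls in the gap between k and l

# One bit flipped: within the first threshold, so the reads co-cluster.
print(cluster_decision("10110010", "10110011", k=1, l=3))  # same cluster
# Many bits flipped: beyond the second threshold, so they separate.
print(cluster_decision("10110010", "01001101", k=1, l=3))  # different clusters
```

Counting bit mismatches is a single linear pass (and, on packed words, an XOR plus popcount), which is why substituting Hamming distance for the quadratic dynamic-programming edit distance saves computing time at this scale.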
A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c).
A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1, 3, 7, 11, 15-16 and 20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-2 and 4 of U.S. Patent No. US12009062B2. Although the claims at issue are not identical, they are not patentably distinct from each other: the instant claims are simplified, and hence broader, versions of the reference claims, and are therefore anticipated by the reference claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GUOZHEN LIU, whose telephone number is (571) 272-0224. The examiner can normally be reached Monday-Friday, 8-5.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Larry D Riggs, can be reached at (571) 270-3062. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/GL/
Patent Examiner, Art Unit 1686

/Anna Skibinsky/
Primary Examiner, AU 1635

Prosecution Timeline

May 03, 2024
Application Filed
Mar 05, 2026
Non-Final Rejection — §101, §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12590326
METHODS FOR FRAGMENTOME PROFILING OF CELL-FREE NUCLEIC ACIDS
2y 5m to grant Granted Mar 31, 2026
Patent 12535399
CELL SORTING DEVICE AND METHOD
2y 5m to grant Granted Jan 27, 2026
Patent 12499971
SYSTEMATIC SCREENING AND MAPPING OF REGULATORY ELEMENTS IN NON-CODING GENOMIC REGIONS, METHODS, COMPOSITIONS, AND APPLICATIONS THEREOF
2y 5m to grant Granted Dec 16, 2025
Patent 12393660
DNA Access Control Systems
2y 5m to grant Granted Aug 19, 2025
Patent 12340873
METHODS FOR MULTI-RESOLUTION ANALYSIS OF CELL-FREE NUCLEIC ACIDS
2y 5m to grant Granted Jun 24, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
50%
Grant Probability
75%
With Interview (+25.4%)
4y 8m
Median Time to Grant
Low
PTA Risk
Based on 95 resolved cases by this examiner. Grant probability derived from career allow rate.
