Office Action Analysis: 17850739 — ACCELERATING NUCLEIC ACID SEQUENCING DATA WORKFLOWS USING A RAPID COMPUTATION OF HAMMING DISTANCE

Office Action

§101 §103 §112
DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim status
	2. Claims 1-20 are currently pending and under examination herein.
	Claims 1-20 are rejected.

Priority
3. The instant application also claims benefit to provisional application No 63/216,464 filed on 6/29/2021. Domestic benefit is acknowledged. Thus the effective filing date of claims 1-20 in the instant application will be considered to be 6/29/2021.

Information Disclosure Statement
4. There was no information disclosure statement filed.

Drawings
5. The drawings are accepted.

Specification
6. The use of the terms Oxford Nanopore MinION, Pacific Biosciences SMRT, Java, Python, and Perl, which are trade names or a mark used in commerce, has been noted in this application. The terms should be accompanied by the generic terminology; furthermore the terms should be capitalized wherever they appear or, where appropriate, include proper symbols indicating use in commerce such as ™, SM , or ® following the terms.
Although the use of trade names and marks used in commerce (i.e., trademarks, service marks, certification marks, and collective marks) are permissible in patent applications, the proprietary nature of the marks should be respected and every effort made to prevent their use in any manner which might adversely affect their validity as commercial marks.

Claim Interpretation
7. Claims 3 and 13 contain the phrase “a [nucleotide] sequence of a predicted protein that is not associated with a position in the reference genome”. For the purposes of examination, this will be interpreted to be equivalent to “a sequence encoding a predicted protein that is not associated with a position in the reference genome”.

	Claims 5 and 15 recite that “the value representing the Hamming distance between the first string and the second string is a value that is four times the Hamming distance.” Hamming distance between two character strings of equal length is defined in the specification as the number of positions at which the corresponding characters are different (para. 0004). Therefore, this limitation is interpreted to mean that the value representing the Hamming distance resulting from the methods in claims 1 and 11, which is a bit-wise Hamming distance, is four times the Hamming distance, which includes but is not limited to a symbol-wise Hamming distance, which is derived from nucleic acid symbols. 

Claims 10 and 20 recite “clustering of the first set of nucleotide sequences and the second set of nucleotide sequences”. Using the broadest reasonable interpretation, for the purpose of examination, this will be considered to be clustering of either the first set or the second set of nucleotide sequences alone or clustering of the first and second sets of nucleotide sequence together. 

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


8. Claims 3, 4, 13 and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.	
	Claims 3 and 13 recite “partial or complete sequences of fusion genes as a result of translocation, inversion, and deletion”. This is ambiguous because it is unclear whether each fusion gene must result from all three events or whether each fusion gene could result from one or more of the events. Claims 4 and 14 are rejected by virtue of dependence on claims 3 and 13. Two possible solutions include replacing “and” with “or” or replacing “as a result of” with “arising from mechanisms including.” Using the broadest reasonable interpretation, for purposes of examination, this will be interpreted to mean partial or complete sequences of fusion genes that are formed as a result of one or more of the events happening, rather than restricting to fusion genes arising from all of the events happening together. Therefore, claims 3, 4, 13 and 14 are rejected under 35 U.S.C. 112(b).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


9. Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
In accordance with MPEP § 2106, claims found to recite statutory subject matter ( Step 1 : YES) are then analyzed to determine if the claims recite any concepts that equate to an abstract idea, law of nature or natural phenomenon (Step 2A, Prong 1). In the instant application, the claims recite the following limitations that equate to an abstract idea:
Independent claims 1 and 11 recite:
Converting value characters in two strings to one hot encoding and converting unknown characters in the strings to zero (i.e. converting a string of text values (A, G, C, T and N) to numerical values (0s and 1s) using a code (e.g. A = 0001, G = 0010, C = 0100, T = 1000, N = 0000)) (lines 10 and 12)
Comparing two strings of numerical values (including 0s and 1s) to identify differences at each position and recording the results (i.e., if the two values are the same, record a 0 and if they are different, record a 1) (lines 14 and 16)
Counting the number of 1s in the above result and multiplying by two (lines 16 and 18)
Adjusting the resulting number from above based on the number of Ns in the original text strings to obtain a result (lines 18 and 20) (e.g. adding the number of Ns in the text string)
Dependent claims 4 and 14 recite:
Aligning multiple strings of text values by matching like values (i.e. lining up multiple matching text strings one on top of the other in a position based on their similarity)
Dependent claims 6 and 16 recite:
Counting the number of Ns in a string of letters that includes A, G, C, T and N
adding the number of Ns (as determined above) to another number 
Dependent claims 7 and 17 recite:
Counting the number of Ns in each of two strings of letters that include A, G, C, T and N
Comparing two strings and at each position, counting how many times both strings have an N (i.e. matched Ns)
Adding the number of Ns and the number of matched Ns counted above to another number obtained previously
Dependent claims 8 and 18 recite:
Converting two text strings into numerical strings (by assigning a 0 to letters A, G, C or T (masking) and assigning a 1 to the letter N)
Comparing the two strings to find matching values at each position and recording the results (at any given position, if it is true that both strings have a 1, then a value of 1 is recorded; if it is false, a 0 is recorded)
Counting the number of 1s in the above result
Dependent claims 10 and 20 recite:
Grouping two or more strings of characters (each comprising the letters A, G, C, T and N) based on a metric (e.g. similarity)
The limitations listed below further limit the abstract idea but do not change the nature of the abstract idea.
Dependent claims 5 and 15 recite: 
The value representing the Hamming distance is four times the Hamming distance  
Under the broadest reasonable interpretation, these limitations include no more than observations, evaluations, judgments and opinions that are considered mental processes because they are concepts that can be practically performed in the human mind, or by a human using a pen and paper (see MPEP § 2106.04(a)(2), subsection III). Therefore, these limitations fall under the “Mental processes” grouping of abstract ideas. While claims 1 and 11 recite performing some aspects of the analysis with either a “computer” or a “processor of a computing device”, there are no additional limitations that indicate that this computer or processor requires anything other than carrying out the recited mental processes in a generic computer environment. Merely reciting that a mental process is being performed in a generic computer environment does not preclude the steps from being performed practically in the human mind or with pen and paper as claimed. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental processes” grouping of abstract ideas (see MPEP § 2106.04(a)(2), subsection III.C).
Limitations directed to counting, adding or multiplying including counting the number of 1s and adjusting a resulting number based on the number of Ns in claims 1 and 11, counting and adding the number of Ns in claims 6 and 16, counting the number of Ns and matched Ns and adding them together in claims 7 and 17, and counting the number of 1s in claims 8 and 18, also fall into the category of mathematical concepts because they are mathematical calculations (see MPEP § 2106.04(a)(2), subsection I).
As such, claims 1, 4, 6-8, 10, 11, 14, 16-18 and 20 recite an abstract idea that falls into the mental process and mathematical concepts groupings ( Step 2A, Prong 1 : YES).
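The computation summarized in the limitations above can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: the function names are hypothetical, the 4-bit code table is taken from the example encoding given in this action, and the adjustment terms follow the summaries of claims 6-8 and 16-18 above (add the number of Ns in each string and the number of positions where both strings have an N).

```python
# Illustrative sketch; names and code table are assumptions, not the claimed implementation.
ONE_HOT = {"A": 0b0001, "G": 0b0010, "C": 0b0100, "T": 0b1000, "N": 0b0000}


def encode(seq):
    """Pack a nucleotide string into one integer, 4 bits per symbol (N becomes 0000)."""
    value = 0
    for ch in seq:
        value = (value << 4) | ONE_HOT[ch]
    return value


def n_mask(seq):
    """Build a mask with one bit per symbol: 1 where the symbol is N, 0 otherwise."""
    value = 0
    for ch in seq:
        value = (value << 1) | (1 if ch == "N" else 0)
    return value


def popcount(x):
    """Count the 1 bits in a non-negative integer."""
    return bin(x).count("1")


def hamming_times_four(first, second):
    """Return a value equal to four times the (expected) symbol-wise Hamming distance."""
    if len(first) != len(second):
        raise ValueError("strings must have equal length")
    # XOR the encoded strings, count the 1 bits, and multiply by two.
    bitcount = 2 * popcount(encode(first) ^ encode(second))
    # Adjust for unknown characters: Ns in each string plus positions where both are N.
    matched_ns = popcount(n_mask(first) & n_mask(second))
    return bitcount + first.count("N") + second.count("N") + matched_ns
```

Under these assumptions, a mismatch between two known bases contributes 4 and a known base opposite an N contributes 3, which agrees with the 4 XY + 3 XN expression discussed later in this action for claims 6 and 16.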

Claims found to recite a judicial exception under Step 2A, Prong 1 are then further analyzed to determine if the claims as a whole integrate the recited judicial exception into a practical application or not (Step 2A, Prong 2). This judicial exception is not integrated into a practical application because the claims do not recite an additional element that reflects an improvement in the functioning of a computer or an improvement to another technology (see MPEP §§ 2106.04(d)(1) and 2106.05(a)). Rather, the instant claims recite additional elements that amount to mere instructions to implement the abstract idea in a generic computing environment (MPEP 2106.05(f)) and insignificant extra-solution activity directed to necessary data gathering and outputting the results (MPEP 2106.05(g)).
Specifically, the claims recite the following additional elements:
Independent claims 1 and 11 recite:
Receiving nucleotide sequences (lines 4 and 6) (ESA)
Providing a result based on the calculated values (lines 21 and 23)
Independent claim 1 recites:
The method is implemented on a computer
Dependent claims 2, 3, 6, 7, 9, 12, 13, 16, 17 and 19 recite: 
Receiving or having nucleotide sequences that are either from a sequencing device, or that have one of various properties
Independent claim 11 recites:
Storing computer-executable instructions on non-transitory computer-readable medium
Performing actions claimed using one or more processors of a computing device.
There are no additional elements that indicate that the claimed computer, processor, non-transitory computer-readable medium or formats of the provided data require anything other than generic computing systems, and no additional elements qualify as improving an existing technology. As such, these limitations equate to either 1) mere instructions to implement the abstract idea on a generic computer, which the courts have stated does not render an abstract idea eligible in Alice Corp., 573 U.S. at 223, 110 USPQ2d at 1983 (see also 573 U.S. at 224, 110 USPQ2d at 1984 and MPEP 2106.05(f)), or 2) gathering data and outputting the results, which the courts have stated do not render an abstract idea eligible in Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015) (see MPEP 2106.05(g)). These limitations add insignificant extra-solution activities that do not add meaningful limitations to the process of computing Hamming distance. Therefore, claims 1-20 are directed to an abstract idea (Step 2A, Prong 2 : NO).

Claims found to be directed to a judicial exception are then further evaluated to determine if the claims recite an inventive concept that provides significantly more than the judicial exception itself (Step 2B). The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claims recite additional elements that equate to mere instructions to apply the recited exception in a generic way or in a generic computing environment (see MPEP § 2106.05). 
The instant claims recite the following additional elements:
Independent claims 1 and 11 recite:
Receiving nucleotide sequences (lines 4 and 6) (ESA)
Providing a result based on the calculated values (lines 21 and 23)
Independent claim 1 recites:
The method is implemented on a computer
Dependent claims 2, 3, 6, 7, 9, 12, 13, 16, 17 and 19 recite: 
Receiving or having nucleotide sequences that are either from a sequencing device, or that have one of various properties
Independent claim 11 recites:
Storing computer-executable instructions on non-transitory computer-readable medium for executing the disclosed method
Performing actions claimed using one or more processors of a computing device
	As discussed above, there are no additional limitations to indicate that the claimed computer or processor or non-transitory computer-readable medium requires anything other than generic computer components in order to carry out the recited abstract idea in the claims. Claims that amount to nothing more than an instruction to apply the abstract idea using a generic computer do not render an abstract idea eligible (see Alice Corp., 573 U.S. at 223, 110 USPQ2d at 1983. See also 573 U.S. at 224, 110 USPQ2d at 1984). MPEP 2106.05(f) discloses that mere instructions to apply the judicial exception cannot provide an inventive concept to the claims. 
The courts have found that certain computer functions are well-understood, routine, and conventional when they are claimed in a merely generic manner, including: 1) receiving or transmitting data over a network, as set forth in buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network); and 2) storing and retrieving information from memory, as set forth in Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93.
Therefore, the additional elements do not comprise an inventive concept when considered individually or as an ordered combination that transforms the claimed judicial exception into a patent-eligible application of the judicial exception. Therefore, the claims do not amount to significantly more than the judicial exception itself (Step 2B : No). As such, claims 1-20 are not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

10. Claims 1-5 and 11-15 are rejected under 35 USC 103(a) as being unpatentable over Chang et al. (US 2020/0013483 A1), in view of Carbunar et al. (Secure Data Management (SDM), 2010, vol. 6358, pp. 70-86) and Carper et al. (PeerJ, 2020 vol. 8, No. 8534, pp. 1-18). The italicized text corresponds to the instant claim limitations.
With respect to claims 1 and 11, Chang et al. discloses a non-transitory computer-readable medium that stores one or more programs, including instructions which, when executed by an electronic device including a processor, cause the device to perform any of the methods described herein (para. 0023, a non-transitory computer-readable medium having computer-executable instructions stored thereon that performs the actions claimed). Chang et al. discloses acquiring sequence read data representing nucleic acid sequences, a storage system to store sequence data, and a computer system to perform analyses (para. 0063, 0071 and 0146, receiving by a computing device, a set of first strings representing nucleic acid sequences). Chang et al. discloses a method of comparing sequence reads from an individual to a reference genome that includes one-hot encoding of known nucleotides, whereby the nucleotide bases can be encoded using a full byte (8 bits) and can be expressed in binary values as follows: adenine=‘00000001’, cytosine=‘00000010’, guanine=‘00000100’, thymine=‘00001000’. Chang et al. further discloses that an unknown base can be represented by ‘00000000’ (para. 0004, 0078, converting any value characters (known nucleotides) in two sets of strings to numerical values (specifically one hot encoding) and converting unknown characters in the strings to zero values).
Chang et al. discloses that in comparing numerically encoded nucleic acid sequences, a bitwise exclusive-OR (XOR) can be used to compare the reference genome file to the sequence read file, resulting in a non-zero value being returned for a mismatch between any nucleotide bases, and a zero value for matching nucleotide bases. Chang et al. further teaches that DNA strings are analyzed using an active window, which is a fixed-length sliding window that restricts the regions being compared to the same length (para. 0091 and 0110, comparing the numerical strings using a bitwise XOR operation).
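The byte-level comparison attributed to Chang et al. above can be illustrated with a short sketch. It assumes the byte values quoted in this action; the function names and the treatment of the active window as a simple fixed-length slice are hypothetical choices made for illustration and are not drawn from the reference.

```python
# Illustrative sketch; byte values follow the encoding quoted above, names are hypothetical.
BYTE_CODE = {
    "A": 0b00000001,
    "C": 0b00000010,
    "G": 0b00000100,
    "T": 0b00001000,
    "N": 0b00000000,  # unknown base
}


def to_bytes(seq):
    """Encode a nucleotide string as one byte per base."""
    return bytes(BYTE_CODE[ch] for ch in seq)


def xor_window(reference, read, start, length):
    """XOR equal-length windows of the reference and the read, byte by byte."""
    ref_win = to_bytes(reference[start:start + length])
    read_win = to_bytes(read[start:start + length])
    if len(ref_win) != len(read_win):
        raise ValueError("windows must cover the same number of bases")
    return [r ^ s for r, s in zip(ref_win, read_win)]


def mismatch_positions(reference, read, start, length):
    """Return positions where the XOR result is non-zero, i.e. the bases differ."""
    result = xor_window(reference, read, start, length)
    return [start + i for i, value in enumerate(result) if value != 0]
```

For example, under these assumptions `mismatch_positions("ACGTAC", "ACCTAN", 0, 6)` reports positions 2 and 5, because both a G/C mismatch and a known base opposite an unknown base yield a non-zero XOR value.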
	Chang et al. is silent as to counting a number of bits in the bitwise XOR result and multiplying by two to obtain a bitcount result in claim 1. However, these limitations were known in the art before the effective filing date of the instant application, as taught by Carbunar et al.
	Regarding claim 1, Carbunar et al. discloses a method of calculating a bitwise Hamming distance (dH) by binary encoding DNA sequences, calculating bitwise XOR and counting the bits in the results. Similar to the instant application, Carbunar et al. teaches encoding the 4 nucleotide symbols as follows: A=0001, C=0010, G=0100, T=1000 prior to the XOR operation. By this method, at any given position being compared, dH has a count of 1 if the numerical values are mismatched and 0 if matched, which is equivalent to the bitwise XOR description in the instant application (p. 79, para. 4, counting the number of bits in the bitwise XOR result).
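A bit-wise Hamming distance of the kind described above can be sketched as an XOR followed by a count of set bits. The code table is the one quoted in this action; the function names are hypothetical and the sketch omits the privacy-preserving protocol of the reference entirely.

```python
# Illustrative sketch; code table as quoted above, function names are hypothetical.
CODE = {"A": "0001", "C": "0010", "G": "0100", "T": "1000"}


def to_bits(seq):
    """Concatenate the 4-bit codes of a DNA string and read them as one integer."""
    return int("".join(CODE[ch] for ch in seq), 2)


def bitwise_hamming(a, b):
    """Count the 1 bits in the XOR of two equal-length encoded DNA strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return bin(to_bits(a) ^ to_bits(b)).count("1")


def symbol_hamming(a, b):
    """Count positions at which the nucleotide symbols differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

With a one-hot code, two different known bases differ in exactly two bit positions, so `bitwise_hamming` returns twice the value of `symbol_hamming` for strings over A, C, G and T.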
An invention would have been prima facie obvious to a person having ordinary skill in the art before the effective filing date of the invention if some motivation in the prior art would have led that person to combine the prior art teachings to arrive at the claimed invention. Carbunar et al. taught that counting the number of bits in the bitwise XOR result to obtain a bitwise Hamming distance enabled evaluation of differences between bit-encoded DNA strings on remote data with privacy in un-trusted environments for forensic science and other applications requiring identity protection (Carbunar et al., p. 77, para. 5). Therefore, a person having ordinary skill in the art would have been motivated to apply the distance calculation taught by Carbunar et al. to the method of sequence alignment and analysis taught by Chang et al. to measure differences between DNA sequences in an encoded environment to protect subject identity. Furthermore, a person having ordinary skill in the art would predict that the Hamming distance calculation taught by Carbunar et al. could be readily added to the method of Chang et al. with a reasonable expectation of success because both methods pertain to DNA sequence comparisons and the calculation of Hamming distance is a simple step of addition with a predictable outcome. The invention is therefore prima facie obvious (see MPEP 2143(I)(G)).
Carbunar et al. does not teach multiplying the bitwise XOR result by 2, but this is a simple step that would be obvious to try in normalizing bitwise distance. Carbunar et al. points to two differences between bit-wise and symbol-wise Hamming distances that would motivate trying this step in order to normalize the bit-wise distance to the standard symbol-wise distance per string length. First, Carbunar et al. teaches that due to the encoding with 4 bits per nucleotide, pairs of elements with a symbol-wise distance of ‘d’ will have a ‘2d’ bit-wise Hamming distance (i.e. dnucleotide = dbit/2). Second, since there are 4 symbols (A, G, C and T), the number of bits will be 4 times the number of symbols (p. 10, para. 4, multiplying the bitwise XOR result by two to obtain a bitcount result). Therefore, the simplest approach to normalize the bit-wise Hamming distance to the symbol-wise distance is to multiply the XOR result (i.e. the bit-wise distance) by two. In doing so, one mismatched nucleotide would have a bit-wise Hamming distance of 4 (per 4 bits), which is equivalent to a symbol-wise Hamming distance of 1 (per 1 nucleotide) because 4/4 = 1/1.
Therefore, it would have been prima facie obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to try multiplying the number of bits in the bitwise XOR result by two to normalize the bit-wise distance to the genomic distance per string length. This is one of a finite number of predictable solutions for effective normalization, which also includes dividing the symbol-wise Hamming distance by two. This application has a reasonable expectation of success because it is simple multiplication and it would effectively overcome the differences pointed out by Carbunar et al. to normalize bitwise distance to genomic distance per string length in the method taught by Chang et al. The invention is therefore prima facie obvious (see MPEP 2143(I)(E)).
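The normalization arithmetic relied on above can be written out explicitly. The symbols are introduced here only for illustration: n is the number of aligned nucleotide positions, d is the symbol-wise Hamming distance, and d_bit is the bit-wise Hamming distance under a one-hot code with 4 bits per nucleotide and no unknown bases.

```latex
\begin{align*}
d_{\mathrm{bit}} &= 2d
  && \text{(each mismatched position differs in two bits)} \\
2\,d_{\mathrm{bit}} &= 4d
  && \text{(the doubled bitcount is four times the symbol-wise distance)} \\
\frac{2\,d_{\mathrm{bit}}}{4n} &= \frac{4d}{4n} = \frac{d}{n}
  && \text{(per-bit and per-symbol rates agree)}
\end{align*}
```

For a single mismatch in a single position (d = 1, n = 1), this reduces to the 4/4 = 1/1 equivalence stated above.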
Chang et al. and Carbunar et al. are silent on adjusting the bitcount result based on unknown characters in at least one of the string sets to calculate the Hamming distance in claim 1. However, this limitation was known in the art at the time of the effective filing date of the instant application, as taught by Carper et al.
In regard to claim 1, Carper et al. teaches calculating a symbol-wise Hamming distance based on comparison of two strings of nucleic acids and then adjusting the distance results based on unknown bases in at least one of the two sequences (using nucleic acid symbols including A, G, C, T and N). Carper et al. created a custom Hamming distance (termed nucleotide Hamming distance), which accommodates IUPAC nucleotide ambiguities by adjusting the distance value. For example, in a comparison of two strings of nucleotides, there is a total of one mismatch (Y-C) resulting in a symbol-wise Hamming distance of 1. However, Y is the ambiguity code representing C or T, so a Y-C pair represents a possible C-C match. Therefore, the Hamming distance is adjusted from 1 to 0 to account for the ambiguous base Y, lowering the probability of a mismatch to 0 (p. 4, para. 2, Fig. 1, adjusting the bitcount result based on unknown characters in at least one of the string sets to calculate the Hamming distance).
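The Y-C example above can be illustrated with a short sketch of an ambiguity-aware distance. The IUPAC table is the standard nucleotide code; the scoring rule used here (a position counts as a mismatch only when the two codes share no possible base) is an assumption inferred from the single example described in this action, and the actual scoring in Carper et al. may differ. The function names are hypothetical.

```python
# Illustrative sketch; the scoring rule is an assumption inferred from the Y-C example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def symbol_hamming(a, b):
    """Plain symbol-wise Hamming distance: count positions whose characters differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ambiguity_aware_hamming(a, b):
    """Count only positions whose IUPAC codes have no base in common."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(1 for x, y in zip(a, b) if not set(IUPAC[x]) & set(IUPAC[y]))
```

Under this assumed rule, comparing "AYGT" with "ACGT" gives a plain distance of 1 and an adjusted distance of 0, mirroring the adjustment from 1 to 0 described above.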
An invention would have been prima facie obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention if some motivation in the prior art would have led that person to combine the prior art teachings to arrive at the claimed invention. Carper et al. taught that adjusting the bitcount result (Hamming distance) to account for unknown nucleotides in sequence comparisons avoids artificially inflating the number of differences between sequences and thus better distinguishes related organisms than unadjusted methods (p. 4, para. 2). Therefore, a person having ordinary skill in the art would have been motivated to apply the method of adjusting for unknown nucleotides in sequence alignments taught by Carper et al. to the method of calculating Hamming distance taught by Chang et al. and Carbunar et al. to avoid artificially inflating the Hamming distance due to ambiguous nucleotides in the sequences. Furthermore, a person having ordinary skill in the art would predict that the correction method taught by Carper et al. could be readily added to the distance calculation method of Chang et al. and Carbunar et al. with a reasonable expectation of success because they both pertain to DNA sequence comparisons and adjusting distances based on nucleotide ambiguity can be a simple operation, such as subtraction, that can easily be applied in a rule-based manner to effectively account for unknown bases. The invention is therefore prima facie obvious (see MPEP 2143(I)(G)).
Regarding claim 1, Carper et al. teaches providing a result based on the determined plurality of values. Carper et al. teaches that aligned sequence reads are compared to corresponding reference genome positions to determine ‘support characteristics’, which include all output data (such as matched and mismatched nucleotides, the number of supporting sequence reads, alignment positions and a distribution of estimated fragment lengths for each). Carper et al. also teaches that any of the support characteristics or other values can be output from one computer component to another and can be provided to a user (para. 0004, 0018, 0147, providing a result based on the determined plurality of values).

Regarding claims 2 and 12, Chang et al. teaches analyzing sets of nucleotide sequences from a sequencer and that any suitable sequencing-by-synthesis platform can be used to identify mutations, including the Genome Sequencers from Roche/454 Life Sciences, the GENOME ANALYZER from Illumina/SOLEXA, the SOLID system from Applied BioSystems, and the HELISCOPE system from Helicos Biosciences (para. 0066, the first set of nucleotide sequences are from a sequencing device).

Regarding claims 3 and 13, Chang et al. discloses that aligned sequence reads from a sample from an individual are compared to corresponding reference genome positions (para. 0004, the second set of nucleotide sequences includes at least one sequence from a reference genome, a viral insertion, a gene encoding a predicted protein, and a sequence of a fusion gene).

	In reference to claims 4 and 14, Chang et al. discloses that the comparison includes performing a bitwise operation to compare N bytes of the converted reference genome file to a corresponding N bytes of the converted content of the sequence read file and that the results include an alignment in which the sequence reads are aligned to a reference genome (para. 0070, 0091 and Fig. 1 step 130, providing an alignment of the first set of nucleotide sequences to the reference genome, the viral insertion, the predicted protein, the predicted gene, or the fusion genes).

Regarding claims 5 and 15, Chang et al., Carbunar et al. and Carper et al. do not explicitly teach that the value representing Hamming distance is 4 times the Hamming distance. However, this is the result of multiplying the bitwise XOR result by 2, which is a simple step that would be obvious to try to normalize the bit-wise distance to the symbol-wise distance. Carbunar et al. points to two differences between bit-wise and symbol-wise Hamming distances that would motivate multiplying the bit-wise Hamming distance by two: 1) Carbunar et al. teaches that due to the coding (i.e. encoding with 4 bits per nucleotide), pairs of elements of symbol-wise distance d will have a 2d bit-wise Hamming distance (i.e. dnucleotide = dbit/2), and 2) since there are 4 symbols (A, G, C and T), the number of bits will be 4 times the number of symbols (p. 10, para. 4, wherein the value representing the Hamming distance is 4x the Hamming distance). These differences motivate normalizing the bit-wise distance to the symbol-wise distance by multiplying the XOR result (i.e. the bit-wise distance) by two. After this transformation, a mismatched nucleotide would have a modified bit-wise Hamming distance of 4 (per 4 bits), which is equivalent to the symbol-wise Hamming distance of 1 (per 1 nucleotide) because 4/4 = 1/1.
	Therefore, it would have been prima facie obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to try multiplying the number of bits in the bitwise XOR result by two, so that the bit-wise Hamming distance is 4 times the symbol-wise Hamming distance, in order to normalize the bit-wise distance to the genomic distance per string length. There are a finite number of predictable solutions for effective normalization. This application has a reasonable expectation of success because it is simple multiplication and it would effectively overcome the differences pointed out by Carbunar et al. to normalize bitwise distance to genomic distance per string length in the method taught by Chang et al. The invention is therefore prima facie obvious (see MPEP 2143(I)(E)).

11. Claims 6 and 16 are rejected under 35 USC 103(a) as being unpatentable over Chang et al. (US 2020/0013483 A1), in view of Carbunar et al. (Secure Data Management (SDM), 2010, vol. 6358, pp. 70-86) and Carper et al. (PeerJ, 2020, vol. 8, No. 8534, pp. 1-18), as applied to claims 1-5 and 11-15 above, and further in view of Carver (EMBOSS DistMat manual, 2001) and Bleasby et al. (EMBOSS ajbase source code, 2011). The italicized text corresponds to the instant claim limitations.
The limitations of claims 1-5 and 11-15 have been taught by Chang et al., Carbunar et al. and Carper et al. 
Pertaining to claims 6 and 16, Chang et al. teaches comparing strings from sequence read files with those of a reference genome, in which only the sequence read file and not the reference genome has an unknown character (Fig. 3D, determining the value representing the Hamming distance wherein the first but not the second string includes unknown character(s)).
Chang et al., Carbunar et al. and Carper et al. are silent to adjusting for unknown characters in the first string to get a Hamming distance by counting the number of unknown characters and adding the number to the bitcount result in claims 6 and 16. However, this limitation was known in the art at the time of the effective filing date of the invention, as taught by Carver and Bleasby et al.
Pertaining to claims 6 and 16, Carver teaches a method for adjusting a symbol-wise Hamming distance based on unknown characters, which yields the same result as the claimed limitation of adjusting for unknown characters in the bit-wise Hamming distance. Following the method in claims 6 and 16 of the instant application, the resulting bit-wise Hamming4 distance is equal to 4 XY + 3 XN, and the symbol-wise Hamming distance is equal to 1 XY + ¾ XN (para. 0060-0062 of the specification), which is identical to the equation taught by Carver described below. 
Carver teaches that distances for ambiguous nucleotides are expected distances calculated based on the probability of a mismatch, assuming the ambiguous nucleotide N has equal likelihood of being A, G, C or T. In generating a distance matrix from a multiple sequence alignment, distances for ambiguous nucleotides are calculated using fractional scoring as shown below: 
D = distance = p-distance = 1 – S
S = similarity = m/(number of positions + gaps*gap_penalty)
m = score of matches (1 for an exact match (X-X), a fraction for partial matches (including X-N, N-N), and 0 for a mismatch (X-Y))
The definition of m for partial matches is given by Bleasby et al. in the ajBaseAlphacharCompare function of the code implementing the method described by Carver: m = (1/len1) * (1/len2), where len1 and len2 are the lengths of the respective character lists for each nucleotide (p. 5 lines 28-31). For each known nucleotide, the character list is A, G, C or T; therefore, the length is 1. For unknown nucleotide ‘N’ the character list is AGCT; therefore, the length is 4.
Note that at each position, distance (D) is 1-m; therefore, 
distance for a match of known nucleotides (X-X) is 0
distance for a mismatch of known nucleotides (X-Y) is 1
distances for partial matches (including pairs with N or N-N) are fractions
In analysis of nucleotides, if the flag "-ambiguous" is used then partial matches are included in the score. For example, for a pairing of A with N, m = (1/1)*(¼) = 0.25. For any given pair of nucleotides, S = m and D = 1 – S; therefore, distance ‘D’ for any X-N mismatch = 0.75. Therefore, Carver teaches the Hamming distance = 1 XY + ¾ XN, which is the same as the Hamming distance equation resulting from the methods of claims 6 and 16 (section “uncorrected distances”, p. 1, equations 1 and 2, para. 2, and Table 1, determining the value representing the Hamming distance wherein the bitcount result is adjusted for unknown characters by the method of counting the number of unknown characters in the first string and adding the number to the bitcount result).
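The fractional scoring described above can be sketched as follows. This is an illustration of the scoring as characterized in this Office Action, not a reproduction of the EMBOSS source; the names `CHAR_LISTS`, `match_score` and `p_distance` are hypothetical, gaps are assumed absent, and only X-X, X-Y and X-N pairs are exercised.

```python
from fractions import Fraction

# Character lists per symbol: known bases have length 1, N has length 4.
CHAR_LISTS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

def match_score(x, y):
    """Score m for one aligned pair (illustrative only)."""
    list_x, list_y = CHAR_LISTS[x], CHAR_LISTS[y]
    if not set(list_x) & set(list_y):
        return Fraction(0)  # X-Y mismatch of known nucleotides
    # Exact or partial match: m = (1/len1) * (1/len2)
    return Fraction(1, len(list_x)) * Fraction(1, len(list_y))

def p_distance(s1, s2):
    """D = 1 - S, with S = (sum of m) / (number of positions), no gaps assumed."""
    if len(s1) != len(s2):
        raise ValueError("aligned strings must have equal length")
    total_m = sum(match_score(a, b) for a, b in zip(s1, s2))
    return 1 - total_m / len(s1)
```

For a single A-N pair, `p_distance("A", "N")` evaluates to 3/4, matching the value of 0.75 stated above.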
Therefore, the limitations in claims 6 and 16 of adjusting for unknown characters by calculating Hamming distance by counting the number of unknown characters and adding the numbers to the bitcount results equate to implementing the fractional scoring of ambiguous nucleotides taught by Carver (i.e., Hamming distance = 1 XY + ¾ XN). However, in Carver, symbol-wise encoding is substituted for bit-wise encoding; therefore, the specific steps of counting ambiguous nucleotides and adding the number to bitcount results are not necessary. Both methods for Hamming distance calculation, with symbol-wise encoding (taught by Carver and Bleasby et al.) and with bit-wise encoding (taught by Chang et al., Carbunar et al. and Carper et al.), were known in the art before the filing date of the instant application. 
A person having ordinary skill in the art before the effective filing date of the instant application could have simply substituted letter symbol encoding with bitwise encoding in calculating adjusted Hamming distance (Hamming = 1 XY + ¾ XN) because it involves simple algebra, counting and adding steps. Counting the number of unknown characters in the first string (1 XN) and adding the number to the bitcount result (4 XY + 2 XN) effectively applies the equation for adjusted Hamming distance (Hamming4 = 4 XY + 3 XN) taught by Carver and Bleasby et al. The result of the substitution would have been predictable because the expected mismatch values based on probability are independent of the specific representation of sequence symbols and are invariant to whether sequences are encoded symbolically or numerically, and both methods pertain to comparing DNA sequences. The invention is therefore prima facie obvious (see MPEP 2143(I)(B)).
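The counting and adding steps referred to above can be sketched as follows. The sketch assumes a hypothetical 4-bit code in which known nucleotides are one-hot and the unknown character N is all zeros, which is one encoding under which the doubled bitcount equals 4 XY + 2 XN; the encoding and the function names are assumptions made for illustration only.

```python
# Hypothetical 4-bit code: one-hot for known bases, 0000 for N (an assumption).
CODE = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000, "N": 0b0000}

def encode(seq):
    """Pack a nucleotide string into one integer, 4 bits per symbol."""
    value = 0
    for base in seq:
        value = (value << 4) | CODE[base]
    return value

def hamming4_first_unknown(read, reference):
    """Hamming4 = 4*XY + 3*XN when only `read` may contain N."""
    if len(read) != len(reference):
        raise ValueError("strings must have equal length")
    if "N" in reference:
        raise ValueError("this sketch assumes the second string has no N")
    xor = encode(read) ^ encode(reference)
    bitcount_result = 2 * bin(xor).count("1")  # equals 4*XY + 2*XN
    return bitcount_result + read.count("N")   # add 1 per unknown character
```

For example, `hamming4_first_unknown("ANGT", "ACGA")` has one X-Y pair and one X-N pair and evaluates to 7, i.e. 4(1) + 3(1); dividing by 4 gives the symbol-wise value 1 + ¾.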

12. Claims 7-10 and 17-20 are rejected under 35 USC 103(a) as being unpatentable over Chang et al. (US 2020/0013483 A1), in view of Carbunar et al. (Secure Data Management (SDM), 2010, vol. 6358, pp. 70-86), Carper et al. (PeerJ, 2020 vol. 8, No. 8534, pp. 1-18), as applied to claims 1-5 and 11-15 above, and further in view of Ross (A first course in probability, 2014, 9th edition, pg. 75). The italicized text corresponds to the instant claim limitations.
   The limitations of claims 1-6 and 11-16 have been taught by Chang et al., Carbunar et al., and Carper et al. 
Pertaining to claims 7 and 17, Chang et al. teaches that the sequence read file and the reference genome can have unknown characters (para. 0078, Fig. 3D, wherein the first string and the second string include unknown character(s)).
Chang et al., Carbunar et al., and Carper et al. are silent to the method of adjusting for unknown characters in both strings being compared by counting the number of unknown characters in both strings, counting the number of unknown character pairs in the alignment and adding both numbers to the bitcount result to get the Hamming distance in claims 7 and 17. However, these limitations were known in the art at the time of the effective filing date of the invention, as taught by Ross.
Pertaining to claims 7 and 17, Ross teaches a well-established probability principle that, when applied to the calculation of Hamming distance to account for unknown nucleotides in both the first and second strings, yields a Hamming distance equation that is equivalent to the equation resulting from claims 7 and 17 of the instant application, thus making the claimed approach obvious to try. Ross teaches a principle for independent events (P(EF) = P(E)*P(F)), whereby the probability of two independent events ‘E’ and ‘F’ is the product of their probabilities. This equation can be used to estimate distances for ambiguous nucleotide pairs (N-X and N-N) based on the probability of a match assuming equal likelihood of an ambiguous nucleotide N being A, G, C or T.
To summarize, estimated fractional distances for ambiguous nucleotides using the probability principle for independent events (P(EF) = P(E)*P(F)) are shown below:  
p(match) is probability of a matched known pair, and distance = 1 - p(match)
The distance for matches of known nucleotides (X-X) is 0
The distance for mismatches of known nucleotides (X-Y) is 1
The distance for an ambiguous nucleotide paired with a known nucleotide (X-N) is equal to the expected Hamming distance between a known nucleotide and a uniformly ambiguous nucleotide (N). For example, given a pair N-A, P(match) = P(N=A) * P(A=A) = ¼*1 = ¼. Therefore, the distance = 1 – P(match) = 1-¼= ¾. 
The distance for an ambiguous nucleotide paired with an ambiguous nucleotide (N-N) is equal to the expected distance between two uniformly ambiguous nucleotides. P(match) = P(N1=A)*P(N2=A) + P(N1=G)*P(N2=G) + P(N1=C)*P(N2=C) + P(N1=T)*P(N2=T) = 0.25*0.25 + 0.25*0.25 + 0.25*0.25 + 0.25*0.25 = 0.25. Therefore, distance = 1 – P(match) = 1 – ¼ = ¾. 
Therefore, the Hamming distance equation derived using the probability principle is: distance = 1 XY + ¾ XN + ¾ NN, which is identical to the distance equation resulting from the methods described in claims 7 and 17 and disclosed in the specification of the instant application (para. 0060) (i.e., bitwise Hamming4 = 4 XY + 3 XN + 3 NN and symbol-wise Hamming = 1 XY + ¾ XN + ¾ NN) (Ross, p. 75, equation 4.1, A method of adjusting the Hamming distance calculation to account for unknown characters in both the first and second strings by: 1) counting the number of unknown characters in the first string and the second string, 2) determining the number of unknown characters pairs in the alignment and 3) adding the number of unknown characters and the number of pairs of unknown characters to the bitcount result).
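The expected distances derived above from the independence principle can be checked with a short sketch. It assumes, as stated above, that N is equally likely to be A, G, C or T; the function names are illustrative.

```python
from fractions import Fraction

BASES = "ACGT"

def distribution(symbol):
    """Probability of each base for a symbol; N is assumed uniform over A, C, G, T."""
    if symbol == "N":
        return {b: Fraction(1, 4) for b in BASES}
    return {b: Fraction(int(b == symbol)) for b in BASES}

def expected_distance(x, y):
    """1 - P(match), with P(match) = sum over bases of P(x = b) * P(y = b)."""
    px, py = distribution(x), distribution(y)
    p_match = sum(px[b] * py[b] for b in BASES)  # independence: P(EF) = P(E)*P(F)
    return 1 - p_match
```

Under these assumptions `expected_distance("A", "A")` is 0, `expected_distance("A", "G")` is 1, and both `expected_distance("N", "A")` and `expected_distance("N", "N")` are 3/4.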
At the time of the invention, there had been a recognized need in the art to adjust the Hamming distance (bitcount result) to account for unknown nucleotides in sequence comparisons, in order to avoid artificially inflating the number of differences between sequences and thus better distinguish related organisms (Carper et al., p. 4, para. 2). There are a finite number of identified, predictable potential solutions to the need to calculate distances for ambiguous nucleotides, which include either reducing the mismatch value for pairs with ambiguous nucleotides to 0 (Carper et al., p. 4, para. 2, Fig. 1) or deriving a fractional value of expected distances based on well-grounded probability theory to calculate the pairwise probability of independent events disclosed by Ross.
A person having ordinary skill in the art before the effective filing date of the instant application could have applied the probability principle for independent events taught by Ross to adjust the Hamming distance for comparisons with unknown nucleotides in both sequences taught by Chang et al., Carbunar et al., and Carper et al. A person having ordinary skill in the art could have pursued the known potential solutions with a reasonable expectation of success because it involves simple algebra, counting and adding steps: counting the number of unknown characters in both strings (1 XN + 2 NN), counting the number of unknown character pairs in the alignment (1 NN) and adding both numbers to the bitcount result (4 XY + 2 XN) to get the Hamming distance (Hamming4 = 4 XY + 3 XN + 3 NN; Hamming = XY + ¾ XN + ¾ NN). Additionally, the steps effectively apply the well-established probability principle for independent events, which is broadly relevant to many biological applications. The invention is therefore prima facie obvious (see MPEP 2143(I)(E)).
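The three counting and adding steps recited above can be sketched as follows. As an assumption made for illustration, known nucleotides use a hypothetical 4-bit one-hot code and N is encoded as 0000, so that an N-N pair contributes nothing to the XOR result; the function names are likewise illustrative.

```python
# Hypothetical 4-bit code: one-hot for known bases, 0000 for N (an assumption).
CODE = {"A": 0b0001, "C": 0b0010, "G": 0b0100, "T": 0b1000, "N": 0b0000}

def encode(seq):
    """Pack a nucleotide string into one integer, 4 bits per symbol."""
    value = 0
    for base in seq:
        value = (value << 4) | CODE[base]
    return value

def hamming4_both_unknown(s1, s2):
    """Hamming4 = 4*XY + 3*XN + 3*NN when either string may contain N."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    bitcount_result = 2 * bin(encode(s1) ^ encode(s2)).count("1")  # 4*XY + 2*XN
    unknown_chars = s1.count("N") + s2.count("N")                  # XN + 2*NN
    unknown_pairs = sum(a == "N" and b == "N" for a, b in zip(s1, s2))  # NN
    return bitcount_result + unknown_chars + unknown_pairs
```

For example, `hamming4_both_unknown("ANNT", "ACNA")` has one X-Y pair, one X-N pair and one N-N pair and evaluates to 10, i.e. 4 + 3 + 3.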

Pertaining to claims 8 and 18, Chang et al., Carbunar et al., Carper et al., and Ross do not explicitly teach determining the number of matched unknown characters in an alignment (N-N) by converting value characters to 0 and unknown characters to 1, applying bitwise AND to the results and counting bitwise AND results to obtain the number of matched unknown characters.
However, Chang et al. teach a method to determine the number of matched unknown characters in bit-encoded data that would provide the same results. Chang et al. teach that in comparing strings of the same length, various bitwise operations can be performed and specific examples are given using bitwise XOR or AND-NOT operations. Chang et al. also teach that unknown nucleotides can be encoded either as 00000000 or 00001111.  Therefore, the method of Chang et al. comprises encoding unknown nucleotides with 00001111 and using bitwise AND operation, which can distinguish N-N matches from all other types of matches or mismatches (e.g. X-N, X-Y or X-X) so that the number of N-N can be counted.  See the example below using the encoding taught by Chang et al. 
	N-N	bitwise AND of 00001111 versus 00001111 = 00001111 = 4
	A-A	bitwise AND of 00000001 versus 00000001 = 00000001 = 1
	N-A	bitwise AND of 00001111 versus 00000001 = 00000001 = 1
	A-G	bitwise AND of 00000001 versus 00000100 = 00000000 = 0
Chang et al. further teach that the size of the nucleotide base (e.g., 8 bits in the example above) can also be integrated in analyses to determine the location of each type of mismatch or match in the string. Therefore, using the encoding of Chang et al. and applying bitwise AND would enable distinguishing N-N pairs from all other types of pairs and thus would enable counting them (para. 0078, 0091, 0092, determining the number of matched unknown characters in an alignment (i.e., N-N) by converting value characters to 0 and unknown characters to 1, applying bitwise AND to the results and counting bitwise AND results to obtain the number of matched unknown characters).
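As a minimal sketch, the snippet below evaluates a strict bitwise AND on the four pairs listed above, using only the 8-bit patterns given in that example for A, G and N. The dictionary and function names are illustrative.

```python
# 8-bit patterns taken from the example rows above.
ENC = {"A": 0b00000001, "G": 0b00000100, "N": 0b00001111}

def and_bits(x, y):
    """Bitwise AND of the two encoded nucleotides, as an 8-character bit string."""
    return format(ENC[x] & ENC[y], "08b")

def and_popcount(x, y):
    """Number of set bits in the bitwise AND of the two encoded nucleotides."""
    return and_bits(x, y).count("1")

for pair in [("N", "N"), ("A", "A"), ("N", "A"), ("A", "G")]:
    print("-".join(pair), and_bits(*pair), and_popcount(*pair))
```

Under these encodings a strict AND yields a bit count of 4 for N-N, 1 for A-A and N-A, and 0 for A-G, so only the N-N pair reaches a count of 4.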
	Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the instant application to substitute the ‘full byte encoding’ method to count the number of matched unknown characters (taught by the method of Chang et al.) for the encoding disclosed in the instant application (i.e., converting value characters to 0 and converting unknown characters to 1). The result of the substitution would have been predictable because the methods for binary bitwise encoding of DNA and the bitwise operations (like AND) to compare the encoded sequences were well known before the effective filing date of the instant application and they are simple operations involving simple Boolean logic that are easily computed in the human mind. The invention is therefore prima facie obvious (see MPEP 2143(I)(B)).
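The encoding recited in claims 8 and 18 (known characters converted to 0, unknown characters converted to 1, followed by bitwise AND and a bit count) can be sketched as follows. The function names are illustrative and the one-bit-per-position packing is an assumption about layout.

```python
def unknown_mask(seq):
    """One bit per position: 1 where the character is N, 0 otherwise."""
    mask = 0
    for ch in seq:
        mask = (mask << 1) | (1 if ch == "N" else 0)
    return mask

def count_nn_pairs(s1, s2):
    """Count aligned positions where both strings have N."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    both_unknown = unknown_mask(s1) & unknown_mask(s2)  # bit set only for N-N
    return bin(both_unknown).count("1")
```

For example, `count_nn_pairs("ANNT", "ACNA")` evaluates to 1, since only the third position is N in both strings.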

Regarding claims 9 and 19, Chang et al., Carbunar et al., Carper et al. and Ross teach that the two sequences can be read sequences.
For the purposes of examination, a read sequence will be interpreted to be simply a sequence fragment produced by a machine. Chang et al. teaches a method by which aligned sequence reads obtained from a subject’s sample are compared to a reference genome to determine support characteristics. It is further disclosed that the reference genome file can be sequencing data from a human subject. Therefore, Chang et al., Carbunar et al., Carper et al. and Ross teach that the two sequences can be read sequences (para. 004 and 0073, the two sequences can be read sequences). 


Regarding claims 10 and 20, Chang et al., Carbunar et al., Carper et al. and Ross teach that the provided results include clustering of the first set of nucleotide sequences and the second set of nucleotide sequences.
Chang et al. discloses that using the same numerical encoding for the reference genome and a plurality of sequence reads enables rapid comparison between groups of sequence-read and reference-genome nucleobases. Making and comparing different groups of sequence reads is a form of clustering the sequence reads (the first set of nucleotide sequences) and aligning each group with the reference genome (the second set of nucleotide sequences) is a form of clustering between the first and second set of nucleotide sequences. Chang et al. further discloses that nucleotide strings can be clustered or grouped based on their position in the reference genome or their position of alignment to the reference genome. Chang et al. further discloses partitioning the reference genome into a plurality of partitions, each partition comprising contiguous positions of the reference genome (para. 0005, 0017, Fig. 6A,  provided results include clustering of the first set of nucleotide sequences and the second set of nucleotide sequences). 

Therefore, claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chang et al. and further in view of Carbunar et al., Carper et al., Carver, Bleasby et al. and Ross.

Conclusion
In conclusion, claims 1-20 are rejected and no claims are allowed.

References not cited
13. References considered but not cited include Demeler et al. (Nucleic Acids Research, Vol. 19, p1593-1599, 1991), Salamat et al. (IEEE Biomedical circuits and Systems Conference, ISSN 2766-4465, 2021), Prochazka et al. (Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2019), pages 66-77, 2019) and Ceze et al. (US 2021/0035657 A1). 
Demeler et al. teaches one-hot encoding of nucleotide sequences. Prochazka et al. teaches one-hot encoding of DNA sequences and encoding ambiguous characters differently. Salamat et al. teaches one-hot encoding of nucleotides, differently encoding ambiguous characters, and calculating pair-wise distances. Ceze et al. teaches bit encoding DNA sequences, performing XOR to get Hamming distance, performing DNA alignments and clustering DNA strings based on Hamming distance.

14. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JENNIFER J SMITH whose telephone number is (571)272-7801. The examiner can normally be reached Monday-Friday 7:30 AM - 3:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Olivia Wise, can be reached at (571) 272-2249. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/J.J.S./               Examiner, Art Unit 1685          

/OLIVIA M. WISE/               Supervisory Patent Examiner, Art Unit 1685