Abstract

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches chosen for their rareness, rather than matches of a fixed length. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
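
The idea of choosing seeds by rareness can be illustrated with a small sketch. It is not the LAST implementation: it assumes that "rare" means "occurs at most `max_hits` times in the target", and it counts occurrences by naive string search where a real tool would use an index. The names `count_occurrences`, `adaptive_seeds`, and `max_hits` are hypothetical and chosen only for this illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def adaptive_seeds(query, target, max_hits=2):
    """For each query position, find the shortest match starting there
    that occurs at least once and at most max_hits times in target.

    Returns a list of (query_position, matched_word, hit_count) tuples.
    The rareness threshold max_hits is an assumption of this sketch.
    """
    seeds = []
    for i in range(len(query)):
        # Lengthen the candidate word until it becomes rare enough.
        for length in range(1, len(query) - i + 1):
            word = query[i:i + length]
            hits = count_occurrences(target, word)
            if hits == 0:
                break  # a longer word cannot match either
            if hits <= max_hits:
                seeds.append((i, word, hits))
                break
    return seeds


# A repetitive target: "AC" is common, so seeds starting in it grow longer.
seeds = adaptive_seeds("ACGT", "ACACACGT", max_hits=1)
print(seeds)
```

In this toy example the seed starting at the repeated `AC` has to grow to `ACG` before it is rare, while `G` and `T` are already rare at length 1. Under the assumed criterion, each query position contributes at most `max_hits` matches, which is one way to see how the total number of matches can stay linear in sequence length.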

Keywords

Biology, Sequence (biology), Quadratic growth, Composition (language), Computational biology, DNA sequencing, Algorithm, Genetics, DNA, Computer science

Publication Info

Year: 2011
Type: article
Volume: 21
Issue: 3
Pages: 487-493
Citations: 1396
Access: Closed

Citation Metrics

1396
OpenAlex

Cite This

Szymon M. Kiełbasa, Raymond Wan, Kengo Sato et al. (2011). Adaptive seeds tame genomic sequence comparison. Genome Research, 21(3), 487-493. https://doi.org/10.1101/gr.113985.110

Identifiers

DOI
10.1101/gr.113985.110