Kalign – an accurate and fast multiple sequence alignment algorithm

Timo Lassmann; Erik L. L. Sonnhammer

doi:10.1186/1471-2105-6-298

Abstract

Background: The alignment of multiple protein sequences is a fundamental step in the analysis of biological data. It has traditionally been applied to analyzing protein families for conserved motifs, phylogeny, structural properties, and to improve sensitivity in homology searching. The availability of complete genome sequences has increased the demands on multiple sequence alignment (MSA) programs. Current MSA methods suffer from being either too inaccurate or too computationally expensive to be applied effectively in large-scale comparative genomics.

Results: We developed Kalign, a method employing the Wu-Manber string-matching algorithm, to improve both the accuracy and speed of multiple sequence alignment. We compared the speed and accuracy of Kalign to other popular methods using Balibase, Prefab, and a new large test set. Kalign was as accurate as the best other methods on small alignments, but significantly more accurate when aligning large and distantly related sets of sequences. In our comparisons, Kalign was about 10 times faster than ClustalW and, depending on the alignment size, up to 50 times faster than popular iterative methods.

Conclusion: Kalign is a fast and robust alignment method. It is especially well suited for the increasingly important task of aligning large numbers of sequences.
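The abstract names the Wu-Manber string-matching algorithm without describing it. As a rough illustration only, the sketch below shows the general bit-parallel approximate matching idea usually attributed to Wu and Manber: finding where a short pattern occurs in a text with at most k edit errors. It is not Kalign's implementation, and the abstract does not say how Kalign applies the algorithm; the function name `wu_manber_search` and its parameters are hypothetical.

```python
def wu_manber_search(pattern, text, k):
    """Return end indices in `text` where `pattern` matches with <= k edit errors.

    Illustrative bit-parallel (shift-and) sketch; the name and signature are
    assumptions made for this example, not part of Kalign.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # masks[c] has bit i set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)

    full = (1 << m) - 1      # keep only the m low bits
    accept = 1 << (m - 1)    # bit that signals a match of the whole pattern

    # state[d] has bit i set when pattern[:i+1] matches a suffix of the text
    # read so far with at most d errors. Before reading anything, a prefix of
    # length d can be "matched" by deleting its d characters.
    state = [(1 << d) - 1 for d in range(k + 1)]

    hits = []
    for j, c in enumerate(text):
        cm = masks.get(c, 0)
        prev_old = state[0]                       # state[d-1] before this character
        state[0] = ((state[0] << 1) | 1) & cm     # exact-match transition
        prev_new = state[0]                       # state[d-1] after this character
        for d in range(1, k + 1):
            old = state[d]
            state[d] = (
                (((old << 1) | 1) & cm)   # current character matches
                | prev_old                # extra character in the text
                | (prev_old << 1)         # mismatched character
                | (prev_new << 1)         # pattern character skipped
                | 1                       # first pattern character absorbed by one error
            ) & full
            prev_old = old
            prev_new = state[d]
        if state[k] & accept:
            hits.append(j)
    return hits
```

With k = 0 the function reduces to exact matching: searching for "abc" in "xxabcxx" reports the single end position 4. Raising k to 1 lets the same pattern match "xxabdxx" at end positions 3 and 4, one error each.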

Keywords

Alignment-free sequence analysis; Multiple sequence alignment; Sequence alignment; Computer science; Algorithm; Smith–Waterman algorithm; String searching algorithm; Sequence (biology); Set (abstract data type); Structural alignment; Matching (statistics); String (physics); DNA microarray; Computational biology; Pattern matching; Artificial intelligence; Biology; Mathematics; Genetics; Peptide sequence

Affiliated Institutions

Karolinska Institutet SE

Related Publications

MUSCLE: multiple sequence alignment with high accuracy and high throughput

R. C. Edgar

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting,...

2004 Nucleic Acids Research 44728 citations

Protein multiple sequence alignment benchmarking through secondary structure prediction

Quan Le, Fabian Sievers, Desmond G. Higgins

Abstract Motivation Multiple sequence alignment (MSA) is commonly used to analyze sets of homologous protein or DNA sequences. This has led to the development of many methods a...

2017 Bioinformatics 40 citations

MMseqs software suite for fast and deep clustering and searching of large protein sequence sets

Maria Hauser, Martin Steinegger, Johannes Söding

Abstract Motivation: Sequence databases are growing fast, challenging existing analysis pipelines. Reducing the redundancy of sequence databases by similarity clustering improve...

2016 Bioinformatics 276 citations

MUSCLE: a multiple sequence alignment method with reduced time and space complexity

R. C. Edgar

Abstract Background In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and s...

2004 BMC Bioinformatics 9005 citations

Multiple sequence alignment using partial order graphs

Christopher Lee, Catherine S. Grasso, Mark Sharlow

Abstract Motivation: Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of ...

2002 Bioinformatics 805 citations

Publication Info

Year: 2005
Type: article
Volume: 6
Issue: 1
Pages: 298-298
Citations: 713
Access: Closed

Cite This

APA Style

Timo Lassmann, Erik L. L. Sonnhammer (2005). Kalign – an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics, 6(1), 298-298. https://doi.org/10.1186/1471-2105-6-298
