Abstract

Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein family databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the use of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. Importantly, its implementation is highly optimized and parallelized to make the most of modern computer platforms. As a result, the quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT, for datasets exceeding a few thousand sequences. This quality does not come at the expense of time or memory requirements, which are an order of magnitude lower than those of existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa.
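
To make the pairwise similarity measure mentioned in the abstract concrete, the following minimal sketch computes the length of the longest common subsequence (LCS) of two protein sequences with the textbook dynamic-programming recurrence. It is an illustration only, not FAMSA's optimized implementation. The function names `lcs_length` and `lcs_similarity` are hypothetical, and the normalization by the shorter sequence length is an assumption made for this example, not a formula taken from the article.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, classic O(len(a) * len(b)) DP."""
    # Keep the shorter sequence as the row so only O(min(len)) memory is used.
    if len(a) < len(b):
        a, b = b, a
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching residues extend the LCS of both prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one residue from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Illustrative similarity in [0, 1]; the normalization is an assumption."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))
```

In a progressive aligner, a matrix of such pairwise values would typically feed the construction of a guide tree; a quadratic-time routine like this one would be too slow for families of hundreds of thousands of sequences, which is why an optimized and parallelized implementation matters.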

Keywords

Computer science, Multiple sequence alignment, Pairwise comparison, Subsequence, Sequence (biology), Data mining, Sequence alignment, Longest common subsequence problem, Algorithm, Artificial intelligence, Mathematics, Biology, Peptide sequence

Publication Info

Year: 2016
Type: article
Volume: 6
Issue: 1
Pages: 33964-33964
Citations: 186
Access: Closed

Cite This

Sebastian Deorowicz, Agnieszka Debudaj-Grabysz, Adam Gudyś (2016). FAMSA: Fast and accurate multiple sequence alignment of huge protein families. Scientific Reports, 6(1), 33964-33964. https://doi.org/10.1038/srep33964

Identifiers

DOI: 10.1038/srep33964