FAMSA: Fast and accurate multiple sequence alignment of huge protein families

Sebastian Deorowicz; Agnieszka Debudaj-Grabysz; Adam Gudyś

doi:10.1038/srep33964

Abstract

Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein family databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the use of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. Importantly, its implementation is highly optimized and parallelized to make the most of modern computer platforms. As a result, the quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT, for datasets exceeding a few thousand sequences. This quality does not come at the expense of time or memory requirements, which are an order of magnitude lower than those of the existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa .
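To make the longest common subsequence (LCS) measure mentioned in the abstract concrete, the sketch below computes the LCS length of two sequences with the textbook dynamic-programming recurrence. It is an illustration only: the function name `lcs_length` is hypothetical, and the code is not FAMSA's optimized, parallelized implementation, nor does it show how FAMSA turns the LCS length into a pairwise similarity.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b.

    Textbook O(len(a) * len(b)) dynamic programming that keeps only
    two rows of the table. Illustrative sketch, not FAMSA's code.
    """
    if len(b) > len(a):
        a, b = b, a  # iterate rows over the longer string, keep rows short
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ch_a in a:
        curr = [0]  # LCS with an empty prefix of b is always 0
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching symbols extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one symbol from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))  # 4, e.g. the subsequence "GTAB"
```

A longer LCS relative to the sequence lengths indicates a more similar pair, which is why such a measure can serve as a cheap proxy for pairwise similarity when building the guide tree of a progressive aligner.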

Keywords

Computer science; Multiple sequence alignment; Pairwise comparison; Subsequence; Sequence (biology); Data mining; Sequence alignment; Longest common subsequence problem; Algorithm; Artificial intelligence; Mathematics; Biology; Peptide sequence

Affiliated Institutions

Silesian University of Technology (PL)

Publication Info

Year: 2016
Type: article
Volume: 6
Issue: 1
Pages: 33964-33964
Citations: 186
Access: Closed

Cite This

APA Style

                            
Sebastian Deorowicz, Agnieszka Debudaj-Grabysz, Adam Gudyś (2016). FAMSA: Fast and accurate multiple sequence alignment of huge protein families. Scientific Reports, 6(1), 33964-33964. https://doi.org/10.1038/srep33964
