F<scp>AST</scp>SP: linear time calculation of alignment accuracy

Siavash Mirarab; Tandy Warnow

doi:10.1093/bioinformatics/btr553

Abstract

Abstract Motivation: Multiple sequence alignment is a basic part of much biological research, including phylogeny estimation and protein structure and function prediction. Different alignments on the same set of unaligned sequences are often compared, sometimes in order to assess the accuracy of alignment methods or to infer a consensus alignment from a set of estimated alignments. Three of the standard techniques for comparing alignments, Developer, Modeler and Total Column (TC) scores can be derived through calculations of the set of homologies that the alignments share. However, the brute-force technique for calculating this set is quadratic in the input size. The remaining standard technique, Cline Shift Score, inherently requires quadratic time. Results: In this article, we prove that each of these scores can be computed in linear time, and we present FastSP, a linear-time algorithm for calculating these scores. Even on the largest alignments we explored (one with 50 000 sequences), FastSP completed &lt;2 min and used at most 2 GB of the main memory. The best alternative is qscore, a method whose empirical running time is approximately the same as FastSP when given sufficient memory (at least 8 GB), but whose asymptotic running time has never been theoretically established. In addition, for comparisons of large alignments under lower memory conditions (at most 4 GB of main memory), qscore uses substantial memory (up to 10 GB for the datasets we studied), took more time and failed to analyze the largest datasets. Availability: The open-source software and executables are available online at http://www.cs.utexas.edu/~phylo/software/fastsp/. Contact: tandy@cs.utexas.edu

Keywords

ExecutableSet (abstract data type)Computer scienceMultiple sequence alignmentAlgorithmFunction (biology)SoftwareSequence (biology)Time complexityQuadratic equationSequence alignmentMathematicsBiology

Affiliated Institutions

The University of Texas at Austin US

Related Publications

Aligning two sequences within a specified diagonal band

Kun‐Mao Chao , William R. Pearson , Webb Miller

We describe an algorithm for aligning two sequences within a diagonal band that requires only O(NW) computation time and O(N) space, where N is the length of the shorter of the ...

1992 Computer applications in the biosciences 165 citations

FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix

Morgan N. Price , Paramvir Dehal , Adam P. Arkin

Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constr...

2009 Molecular Biology and Evolution 5466 citations

FAMSA: Fast and accurate multiple sequence alignment of huge protein families

Sebastian Deorowicz , Agnieszka Debudaj-Grabysz , Adam Gudyś

Abstract Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein families databases. The abundance of sets containing hundreds of...

2016 Scientific Reports 186 citations

MMseqs software suite for fast and deep clustering and searching of large protein sequence sets

Maria Hauser , Martin Steinegger , Johannes Söding

Abstract Motivation: Sequence databases are growing fast, challenging existing analysis pipelines. Reducing the redundancy of sequence databases by similarity clustering improve...

2016 Bioinformatics 276 citations

DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment.

Burkhard Morgenstern

Abstract MOTIVATION: The performance and time complexity of an improved version of the segment-to-segment approach to multiple sequence alignment is discussed. In this approach,...

1999 Bioinformatics 686 citations

Publication Info

Year: 2011
Type: article
Volume: 27
Issue: 23
Pages: 3250-3258
Citations: 63
Access: Closed

External Links

View on DOI.org

Social Impact

Altmetric

F<scp>AST</scp>SP: linear time calculation of alignment accuracy

PlumX Metrics

Social media, news, blog, policy document mentions

Citation Metrics

OpenAlex

Cite This

APA Style

                            
                                    Siavash Mirarab, 
                                
                                    Tandy Warnow
                                
                            (2011). 
                            F<scp>AST</scp>SP: linear time calculation of alignment accuracy. 
                            Bioinformatics
                            , 27
                            (23)
                            , 3250-3258.
                            https://doi.org/10.1093/bioinformatics/btr553

Identifiers

DOI: 10.1093/bioinformatics/btr553