FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix

Abstract

Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N(2)) space and O(N(2)L) time, but FastTree requires just O(NLa + N ) memory and O(N log (N)La) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 h and 2.4 GB of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 h and 50 GB of memory. In simulations, FastTree was slightly more accurate than Neighbor-Joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

Keywords

Distance matrixBootstrapping (finance)Tree (set theory)SpeedupDistance matrices in phylogenyJoinsMatrix (chemical analysis)Pairwise comparisonBiologyComputer sciencePhylogenetic treeHeuristicsAlgorithmCombinatoricsArtificial intelligenceBioinformaticsMathematicsGeneticsParallel computingGene

Affiliated Institutions

Related Publications

BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data

Olivier Gascuel

We propose an improved version of the neighbor-joining (NJ) algorithm of Saitou and Nei. This new algorithm, BIONJ, follows the same agglomerative scheme as NJ, which consists o...

1997 Molecular Biology and Evolution 1784 citations

TCS: a computer program to estimate gene genealogies

Mark Clement , David Posada , Keith A. Crandall

Phylogenies are extremely useful tools, not only for establishing genealogical relationships among a group of organisms or their parts (e.g. genes), but also for a variety of re...

2000 Molecular Ecology 9775 citations

Inferring species phylogenies from multiple genes: Concatenated sequence tree versus consensus gene tree

Sudhindra R. Gadagkar , Michael S. Rosenberg , Sudhir Kumar

Abstract Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super‐gene alignment, which i...

2004 Journal of Experimental Zoology Part ... 457 citations

RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models

Alexandros Stamatakis

Abstract Summary: RAxML-VI-HPC (randomized axelerated maximum likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies ...

2006 Bioinformatics 15659 citations

New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0

Stéphane Guindon , Jean-François Dufayard , Vincent Lefort +3 more

PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasona...

2010 Systematic Biology 18106 citations

Publication Info

Year: 2009
Type: article
Volume: 26
Issue: 7
Pages: 1641-1650
Citations: 5466
Access: Closed

External Links

View on DOI.org

Social Impact

Altmetric

FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix

PlumX Metrics

Social media, news, blog, policy document mentions

Citation Metrics

5466

OpenAlex

Cite This

APA Style

                            
                                    Morgan N. Price, 
                                
                                    Paramvir Dehal, 
                                
                                    Adam P. Arkin
                                
                            (2009). 
                            FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix. 
                            Molecular Biology and Evolution
                            , 26
                            (7)
                            , 1641-1650.
                            https://doi.org/10.1093/molbev/msp077

Identifiers

DOI: 10.1093/molbev/msp077