Increasing the Efficiency of Searches for the Maximum Likelihood Tree in a Phylogenetic Analysis of up to 150 Nucleotide Sequences

Abstract

Even when the maximum likelihood (ML) tree is a better estimate of the true phylogenetic tree than those produced by other methods, the result of a poor ML search may be no better than that of a more thorough search under some faster criterion. The ability to find the globally optimal ML tree is therefore important. Here, I compare a range of heuristic search strategies (and their associated computer programs) in terms of their success at locating the ML tree for 20 empirical data sets with 14 to 158 sequences and 411 to 120,762 aligned nucleotides. Three distinct topics are discussed: the success of the search strategies in relation to certain features of the data, the generation of starting trees for the search, and the exploration of multiple islands of trees. As a starting tree, there was little difference among the neighbor-joining tree based on absolute differences (including the BioNJ tree), the stepwise-addition parsimony tree (with or without nearest-neighbor-interchange (NNI) branch swapping), and the stepwise-addition ML tree. The latter produced the best ML score on average but was orders of magnitude slower than the alternatives. The BioNJ tree was second best on average. As search strategies, star decomposition and quartet puzzling were the slowest and produced the worst ML scores. The DPRml, IQPNNI, MultiPhyl, PhyML, PhyNav, and TreeFinder programs with default options produced qualitatively similar results, each locating a single tree that tended to be in an NNI suboptimum (rather than the global optimum) when the data set had low phylogenetic information. For such data sets, there were multiple tree islands with very similar ML scores. The likelihood surface only became relatively simple for data sets that contained approximately 500 aligned nucleotides for 50 sequences and 3,000 nucleotides for 100 sequences. The RAxML and GARLI programs allowed multiple islands to be explored easily, but both programs also tended to find NNI suboptima. A newly developed version of the likelihood ratchet using PAUP* successfully found the peaks of multiple islands, but its speed needs to be improved.
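
As a rough illustration of why the searches compared above must be heuristic, and why an NNI-based search can stall in a local optimum, the following sketch contrasts the size of tree space with the size of a single NNI neighborhood. It is not taken from the paper: the function names are hypothetical, and it relies only on two standard combinatorial results, namely that n taxa admit (2n-5)!! unrooted binary topologies and that each such topology has 2(n-3) NNI neighbors.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted binary tree topologies on n_taxa labeled tips: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def num_nni_neighbors(n_taxa: int) -> int:
    """Count trees one NNI move away: 2 alternatives per internal edge, n-3 edges."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return 2 * (n_taxa - 3)


if __name__ == "__main__":
    # 14 and 158 are the smallest and largest data sets mentioned in the abstract.
    for n in (14, 50, 100, 158):
        digits = len(str(num_unrooted_trees(n)))
        print(f"{n} taxa: about 10^{digits - 1} topologies, "
              f"{num_nni_neighbors(n)} NNI neighbors per tree")
```

Because each NNI step examines only a few hundred neighbors out of an astronomically large space, a search that accepts only improving NNI moves can stop on a tree that is merely the best in its neighborhood, which is consistent with the NNI suboptima reported in the abstract.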

Keywords

Phylogenetic tree; Tree (set theory); Tree rearrangement; Set (abstract data type); Range tree; Heuristic; Biology; Statistics; Mathematics; Search tree; Combinatorics; Computer science; Interval tree; Algorithm; Search algorithm; Mathematical optimization; Genetics; Gene

Publication Info

Year: 2007
Type: article
Volume: 56
Issue: 6
Pages: 988-1010
Citations: 66
Access: Closed

Cite This

APA Style

David A. Morrison (2007). Increasing the Efficiency of Searches for the Maximum Likelihood Tree in a Phylogenetic Analysis of up to 150 Nucleotide Sequences. Systematic Biology, 56(6), 988-1010. https://doi.org/10.1080/10635150701779808

Identifiers

DOI: 10.1080/10635150701779808