Selecting the Best-Fit Model of Nucleotide Substitution

Abstract

Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best fits the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting best-fit models of nucleotide substitution. We specifically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution and recording how often this true model is recovered by the different methods. Our results suggest that model selection is reasonably accurate and indicate that some likelihood ratio test methods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not influence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, influence model selection in some cases. Model fitting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented in standard computer programs for phylogenetic estimation. We show here that a best-fit model can be readily identified. Consequently, given the relevance of models, model fitting should be routine in any phylogenetic analysis that uses models of evolution.
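The three families of criteria compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The log-likelihood scores, the alignment length, and the helper names (`chi2_sf`, `aic`, `bic`, `lrt`) are hypothetical, chosen only for illustration. The parameter counts assume that only free substitution-model parameters are counted (branch lengths excluded), and the likelihood ratio test assumes the usual chi-square approximation for nested models, which may not hold for every comparison.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum.
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


def aic(lnl, k):
    """Akaike information criterion: smaller is better."""
    return -2.0 * lnl + 2.0 * k


def bic(lnl, k, n):
    """Bayesian information criterion for a sample size of n."""
    return -2.0 * lnl + k * math.log(n)


def lrt(lnl_null, lnl_alt, df):
    """Likelihood ratio statistic and chi-square p-value for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical (log-likelihood, free parameters) pairs, NOT real results.
candidates = {"JC69": (-3200.0, 0), "HKY85": (-3100.0, 4), "GTR": (-3097.0, 8)}
n_sites = 1000  # assumed alignment length, used as the BIC sample size

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))

# One step of a hierarchical test: is GTR significantly better than HKY85?
stat, p_value = lrt(candidates["HKY85"][0], candidates["GTR"][0], df=8 - 4)

print(best_aic, best_bic, round(stat, 2), round(p_value, 4))
```

With these made-up scores, the extra parameters of the richest model do not improve the fit enough to be preferred by any of the three criteria. A hierarchical scheme would chain several such pairwise tests, which is why the order of the tests and the starting model can matter.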

Keywords

Akaike information criterion; Model selection; Bayesian information criterion; Substitution (logic); Likelihood-ratio test; Selection (genetic algorithm); Bayesian probability; Tree (set theory); Statistical model; Information Criteria; Phylogenetic tree; Statistics; Computer science; Biology; Artificial intelligence; Mathematics; Genetics

Affiliated Institutions

Brigham Young University (US)

Publication Info

Year: 2001
Type: article
Volume: 50
Issue: 4
Pages: 580-601
Citations: 854
Access: Closed


Cite This

APA Style

David Posada, Keith A. Crandall (2001). Selecting the Best-Fit Model of Nucleotide Substitution. Systematic Biology, 50(4), 580-601. https://doi.org/10.1080/10635150118469

Identifiers

DOI: 10.1080/10635150118469