Abstract
Probabilistic inference of a phylogenetic tree from molecular sequence data is predicated on a substitution model describing the relative rates of change between character states along the tree for each site in the multiple sequence alignment. Commonly, one assumes that the substitution model is homogeneous across sites within large partitions of the alignment, assigns these partitions a priori, and then fixes their underlying substitution model to the best-fitting model from a hierarchy of named models. Here, we introduce an automatic model selection and model averaging approach within a Bayesian framework that simultaneously estimates the number of partitions, the assignment of sites to partitions, the substitution model for each partition, and the uncertainty in these selections. This new approach is implemented as an add-on to the BEAST 2 software platform. We find that this approach dramatically improves the fit of the nucleotide substitution model compared with existing approaches, and we show, using a number of example data sets, that as many as nine partitions are required to explain the heterogeneity in nucleotide substitution process across sites in a single gene analysis. In some instances, this improved modeling of the substitution process can have a measurable effect on downstream inference, including the estimated phylogeny, relative divergence times, and effective population size histories.
Keywords
Affiliated Institutions
Related Publications
The Effects of Nucleotide Substitution Model Assumptions on Estimates of Nonparametric Bootstrap Support
The use of parameter-rich substitution models in molecular phylogenetics has been criticized on the basis that these models can cause a reduction both in accuracy and in the abi...
Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution
We describe a model of neutral DNA evolution that allows substitution rates at a site to depend on the two flanking nucleotides (“context”), the branch of the phylogenetic tree,...
Model Averaging and Bayes Factor Calculation of Relaxed Molecular Clocks in Bayesian Phylogenetics
We describe a procedure for model averaging of relaxed molecular clock models in Bayesian phylogenetics. Our approach allows us to model the distribution of rates of substitutio...
Inferring species phylogenies from multiple genes: Concatenated sequence tree versus consensus gene tree
Abstract Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super‐gene alignment, which i...
Hierarchical Phylogenetic Models for Analyzing Multipartite Sequence Data
Debate exists over how to incorporate information from multipartite sequence data in phylogenetic analyses. Strict combined-data approaches argue for concatenation of all partit...
Publication Info
- Year
- 2012
- Type
- article
- Volume
- 30
- Issue
- 3
- Pages
- 669-688
- Citations
- 54
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1093/molbev/mss258