Abstract
Alignments of nucleotide or amino acid sequences may contain several different signals, one of which is the historical signal that phylogenetic analysis aims to recover. Other signals, such as those arising from compositional heterogeneity, among-lineage and among-site rate heterogeneity, invariant sites, and covariotides, may interfere with the recovery of the historical signal. The effect of the interaction of these signals on phylogenetic inference is not well understood and may, in many cases, be underappreciated. In this study, we investigate this matter and present results based on Monte Carlo simulations. We explored the success of four phylogenetic methods in recovering the true tree from data that had evolved under conditions where the equilibrium base frequencies and substitution rates were allowed to vary among lineages. Seven scenarios with increasingly complex conditions were investigated. All of the methods tested, with the exception of neighbor-joining using LogDet distances, were sensitive to compositional convergence in nonsister lineages. Maximum parsimony was also susceptible to attraction between long edges. In many cases, however, phylogenetic inference methods can still recover the true tree when misleading signals are present, in some instances even when the historical signal is no longer dominant. These results highlight the growing need for simple methods to detect violations of phylogenetic assumptions.
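The abstract singles out neighbor-joining with LogDet distances as the one method that was not misled by compositional convergence. As a rough illustration of what such a distance measures, the sketch below computes a commonly cited normalized form of the LogDet (paralinear) distance between two aligned nucleotide sequences, d = -(1/4) ln(det F / sqrt(prod f_x * prod f_y)), where F is the 4 x 4 joint frequency matrix and f_x, f_y are its row and column sums. This is an assumption about the general technique, not the implementation used in the study; the function names `divergence_matrix`, `determinant`, and `logdet_distance` are hypothetical.

```python
import math

BASES = "ACGT"

def divergence_matrix(seq_x, seq_y):
    """Joint frequency matrix F: F[i][j] is the fraction of compared sites
    with base i in seq_x and base j in seq_y."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    counts = [[0.0] * 4 for _ in range(4)]
    n = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in BASES and b in BASES:  # skip gaps and ambiguity codes
            counts[BASES.index(a)][BASES.index(b)] += 1
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return [[c / n for c in row] for row in counts]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def logdet_distance(seq_x, seq_y):
    """Normalized LogDet distance; returns inf when it is undefined
    (for example, when a base is absent or det F is not positive)."""
    f = divergence_matrix(seq_x, seq_y)
    det_f = determinant(f)
    row_sums = [sum(f[i]) for i in range(4)]
    col_sums = [sum(f[i][j] for i in range(4)) for j in range(4)]
    norm = math.sqrt(math.prod(row_sums) * math.prod(col_sums))
    if det_f <= 0 or norm == 0:
        return float("inf")
    return -0.25 * math.log(det_f / norm)
```

Because the distance depends on the whole joint frequency matrix and not on an assumed stationary base composition, it is less affected when two nonsister lineages converge on similar base frequencies. A pairwise matrix of such distances would then be passed to a neighbor-joining routine, which is not shown here.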
Publication Info
- Year: 2004
- Type: article
- Volume: 53
- Issue: 4
- Pages: 623-637
- Citations: 156
- Access: Closed
Identifiers
- DOI: 10.1080/10635150490503035