Abstract
Motivation: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein’s function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation (‘phylogenomics’) is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree.
Results: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of \(O(n^{2})\) which is inferior to two previous algorithms that are \({\sim}O(n)\) for a gene tree of \(n\) sequences. However, our algorithm is extremely simple, and its asymptotic worst case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees.
Availability: http://www.genetics.wustl.edu/eddy/forester
Contact: zmasek@genetics.wustl.edu; eddy@genetics.wustl.edu
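To make the idea of comparing a gene tree with a trusted species tree concrete, here is a minimal sketch of one generic way to do it: map every gene-tree node to the lowest common ancestor of its children's species, then call the node a duplication when it maps to the same species-tree node as one of its children. This is an illustration under stated assumptions, not the authors' implementation or the forester code. The names `Node`, `index_species_tree`, `lca` and `infer_events` are hypothetical, both trees are assumed to be rooted and binary, and every gene-tree leaf is assumed to carry a species name that occurs in the species tree.

```python
class Node:
    """Rooted binary tree node (hypothetical helper); leaves carry a species name."""

    def __init__(self, name=None, left=None, right=None):
        self.name = name
        self.left = left
        self.right = right
        self.parent = None
        self.depth = 0
        for child in (left, right):
            if child is not None:
                child.parent = self

    def is_leaf(self):
        return self.left is None and self.right is None


def index_species_tree(root):
    """Assign depths top-down and return a dict: species name -> species-tree leaf."""
    leaves = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.parent is not None:
            node.depth = node.parent.depth + 1
        if node.is_leaf():
            leaves[node.name] = node
        else:
            stack.extend((node.left, node.right))
    return leaves


def lca(a, b):
    """Lowest common ancestor by naive climbing along parent pointers."""
    while a is not b:
        if a.depth >= b.depth:
            a = a.parent
        else:
            b = b.parent
    return a


def infer_events(gene_root, species_leaves):
    """Postorder pass labelling each internal gene-tree node as an event."""
    mapping = {}  # gene-tree node -> species-tree node
    events = {}   # internal gene-tree node -> "duplication" or "speciation"

    def visit(g):
        if g.is_leaf():
            mapping[g] = species_leaves[g.name]
            return
        visit(g.left)
        visit(g.right)
        mapping[g] = lca(mapping[g.left], mapping[g.right])
        # A node that maps to the same species node as a child is a duplication.
        same_as_child = (mapping[g] is mapping[g.left]
                         or mapping[g] is mapping[g.right])
        events[g] = "duplication" if same_as_child else "speciation"

    visit(gene_root)
    return events


# Toy species tree ((human, mouse), yeast); all data here is made up.
species_root = Node(left=Node(left=Node("human"), right=Node("mouse")),
                    right=Node("yeast"))
species_leaves = index_species_tree(species_root)

# Toy gene tree ((human, mouse), (human, yeast)).
gene_left = Node(left=Node("human"), right=Node("mouse"))
gene_right = Node(left=Node("human"), right=Node("yeast"))
gene_root = Node(left=gene_left, right=gene_right)

events = infer_events(gene_root, species_leaves)
print(events[gene_root], events[gene_left], events[gene_right])
```

In this sketch each `lca` call climbs at most the height of the species tree, so the total cost grows quadratically only when the species tree is highly unbalanced; on balanced trees the climbs stay short.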
Publication Info
- Year: 2001
- Type: article
- Volume: 17
- Issue: 9
- Pages: 821-828
- Citations: 209
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/17.9.821