Abstract
Motivation: When gene duplication occurs, one of the copies may become free of selective pressure and evolve at an accelerated pace. This has important consequences for the prediction of orthology relationships, since two orthologous genes separated by divergence after duplication may differ in both sequence and function. In this work, we distinguish between primary orthologs, which have not been affected by accelerated mutation rates on their evolutionary path, and secondary orthologs, which have. Similarity-based prediction methods tend to miss secondary orthologs, whereas phylogeny-based methods cannot separate primary from secondary orthologs. However, both types of orthology have applications in important areas such as gene function prediction and phylogenetic reconstruction, motivating the need for methods that can distinguish the two types.

Results: We formalize the notion of divergence after duplication and provide a theoretical basis for the inference of primary and secondary orthologs. We then put these ideas into practice with the Hybrid Prediction of Paralogs and Orthologs (HyPPO) framework, which combines ideas from both similarity and phylogeny approaches. We apply our method to simulated and empirical datasets and show that we achieve superior accuracy in predicting primary orthologs, secondary orthologs and paralogs.

Availability and implementation: HyPPO is a modular framework with a core developed in Python and is provided with a variety of C++ modules. The source code is available at https://github.com/manuellafond/HyPPO.

Supplementary information: Supplementary data are available at Bioinformatics online.
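The three relations named in the abstract can be illustrated on a toy gene tree. The sketch below is not HyPPO's algorithm and does not reproduce the paper's formal definitions, which the abstract does not give. It assumes a fully labeled gene tree in which each internal node is marked as a speciation or a duplication and each edge carries a hypothetical flag saying whether accelerated divergence occurred along it. All node names and data are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy gene tree: node -> (parent, edge_to_parent_is_accelerated).
# "D" is a duplication in one lineage; the copy leading to "a2" diverged quickly.
TREE = {
    "root": (None, False),
    "D": ("root", False),
    "b": ("root", False),
    "a1": ("D", False),
    "a2": ("D", True),
}
# Assumed event labels on internal nodes.
EVENT = {"root": "speciation", "D": "duplication"}
LEAVES = ["a1", "a2", "b"]


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while TREE[path[-1]][0] is not None:
        path.append(TREE[path[-1]][0])
    return path


def classify(x, y):
    """Classify a pair of genes under the assumptions stated above."""
    px, py = path_to_root(x), path_to_root(y)
    ancestors_of_y = set(py)
    # Lowest common ancestor: first node on x's path that is also above y.
    lca = next(n for n in px if n in ancestors_of_y)
    if EVENT[lca] == "duplication":
        return "paralogs"
    # Nodes strictly below the LCA; each one stands for its edge to its parent.
    below = px[:px.index(lca)] + py[:py.index(lca)]
    if any(TREE[n][1] for n in below):
        return "secondary orthologs"
    return "primary orthologs"


relations = {pair: classify(*pair) for pair in combinations(LEAVES, 2)}
for pair, relation in relations.items():
    print(pair, relation)
```

In this toy tree, `a1` and `b` are separated only by a speciation with no accelerated edge between them, `a2` and `b` are separated by the same speciation but the path crosses the accelerated edge, and `a1` and `a2` descend from the duplication itself.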
Publication Info
- Year: 2018
- Type: article
- Volume: 34
- Issue: 13
- Pages: i366-i375
- Citations: 27
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/bty242