Abstract

Opinions on the hypothesis that ancient genome duplications contributed to the vertebrate genome range from strong skepticism to strong credence. Previous studies concentrated on small numbers of gene families or chromosomal regions that might not have been representative of the whole genome, or used subjective methods to identify paralogous genes and regions. Here we report a systematic and objective analysis of the draft human genome sequence to identify paralogous chromosomal regions (paralogons) formed during chordate evolution and to estimate the ages of duplicate genes. We found that the human genome contains many more paralogons than would be expected by chance. Molecular clock analysis of all protein families in humans that have orthologs in the fly and nematode indicated that a burst of gene duplication activity took place in the period 350–650 Myr ago and that many of the duplicate genes formed at this time are located within paralogons. Our results support the contention that many of the gene families in vertebrates were formed or expanded by large-scale DNA duplications in an early chordate. Considering the incompleteness of the sequence data and the antiquity of the event, the results are compatible with at least one round of polyploidy.

Keywords

Biology; Chordate; Gene duplication; Genome; Gene; Human genome; Genetics; Evolutionary biology; Segmental duplication; Genome evolution; Gene family

MeSH Terms

Animals; Caenorhabditis elegans; Chordata, Nonvertebrate; Drosophila melanogaster; Evolution, Molecular; Gene Duplication; Genome; Genome, Human; Humans; Polyploidy

Publication Info

Year: 2002
Type: article
Volume: 31
Issue: 2
Pages: 200-204
Citations: 521
Access: Closed

Citation Metrics

OpenAlex: 521
Influential: 27
CrossRef: 419

Cite This

Aoife McLysaght, Karsten Hokamp, Kenneth H. Wolfe (2002). Extensive genomic duplication during early chordate evolution. Nature Genetics, 31(2), 200-204. https://doi.org/10.1038/ng884

Identifiers

DOI
10.1038/ng884
PMID
12032567
