Abstract
Abstract Background Analyses of genetic data at the level of haplotypes provide increased accuracy and power to infer genotype-phenotype correlations and evolutionary history of a locus. However, empirical determination of haplotypes is expensive and laborious. Therefore, several methods of inferring haplotypes from unphased genotypic data have been proposed, but it is unclear how accurate each of the methods is or which methods are superior. The accuracy of some of the leading methods of computational haplotype inference (PL-EM, Phase, SNPHAP, Haplotyper) are compared using a large set of 308 empirically determined haplotypes based on 15 SNPs, among which 36 haplotypes were observed to occur. This study presents several advantages over many previous comparisons of haplotype inference methods: a large number of subjects are included, the number of known haplotypes is much smaller than the number of chromosomes surveyed, a range in values of linkage disequilibrium, presence of rare SNP alleles, and considerable dispersion in the frequencies of haplotypes. Results In contrast to some previous comparisons of haplotype inference methods, there was very little difference in the accuracy of the various methods in terms of either assignment of haplotypes to individuals or estimation of haplotype frequencies. Although none of the methods inferred all of the known haplotypes, the assignment of haplotypes to subjects was about 90% correct for individuals heterozygous for up to three SNPs and was about 80% correct for up to five heterozygous sites. All of the methods identified every haplotype with a frequency above 1%, and none assigned a frequency above 1% to an incorrect haplotype. Conclusions All of the methods of haplotype inference have high accuracy and one can have confidence in inferences made by any one of the methods. The ability to identify even rare (≥ 1%) haplotypes is reassuring for efforts to identify haplotypes that contribute to disease in a significant proportion of a population. Assignment of haplotypes is relatively accurate among subjects heterozygous for up to 5 sites, and this might be the largest number of SNPs for which one should define haplotype blocks or have confidence in haplotype assignments.
Keywords
MeSH Terms
Affiliated Institutions
Related Publications
LAMARC 2.0: maximum likelihood and Bayesian estimation of population parameters
Abstract Summary: We present a Markov chain Monte Carlo coalescent genealogy sampler, LAMARC 2.0, which estimates population genetic parameters from genetic data. LAMARC can co-...
On the inference of ancestries in admixed populations
Inference of ancestral information in recently admixed populations, in which every individual is composed of a mixed ancestry (e.g., African Americans in the United States), is ...
Empirical tests of some predictions from coalescent theory with applications to intraspecific phylogeny reconstruction.
Abstract Empirical data sets of intraspecific restriction site polymorphism in Drosophila have been gathered in order to test hypotheses derived from coalescent theory. Three ma...
High-Throughput Gene Mapping in <i>Caenorhabditis elegans</i>
Positional cloning of mutations in model genetic systems is a powerful method for the identification of targets of medical and agricultural importance. To facilitate the high-th...
Separating population structure from population history: a cladistic analysis of the geographical distribution of mitochondrial DNA haplotypes in the tiger salamander, Ambystoma tigrinum.
Abstract Nonrandom associations of alleles or haplotypes with geographical location can arise from restricted gene flow, historical events (fragmentation, range expansion, colon...
Publication Info
- Year
- 2004
- Type
- article
- Volume
- 5
- Issue
- 1
- Pages
- 22-22
- Citations
- 69
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1186/1471-2156-5-22
- PMID
- 15291970
- PMCID
- PMC512280