Abstract
We generalize an approach suggested by Hill (Heredity, 33, 229-239, 1974) for testing for significant association among alleles at two loci when only genotype frequencies, and not haplotype frequencies, are available. The principle is to use the Expectation-Maximization (EM) algorithm to resolve double heterozygotes into haplotypes and then apply a likelihood ratio test to determine whether the resolutions of haplotypes are significantly nonrandom, which is equivalent to testing whether there is statistically significant linkage disequilibrium between loci. The EM algorithm in this case relies on the assumption that genotype frequencies at each locus are in Hardy-Weinberg proportions. This method can accommodate X-linked loci and samples from haplodiploid species. We use three methods for testing the significance of the likelihood ratio: the empirical distribution in a large number of randomized data sets, the χ² approximation for the distribution of likelihood ratios, and the Z² test. The performance of each method is evaluated by applying it to simulated data sets and comparing its tail probability with the tail probability from Fisher's exact test applied to the actual haplotype data. For realistic sample sizes (50-150 individuals), all three methods perform well with two or three alleles per locus, but only the empirical distribution is adequate when there are five to eight alleles per locus, as is typical of hypervariable loci such as microsatellites. The method is applied to a data set of 32 microsatellite loci in a Finnish population, and the results confirm the theoretical predictions. We conclude that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.
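The procedure summarized above can be illustrated with a minimal Python sketch for two autosomal loci in diploids (the X-linked and haplodiploid cases are not handled). It is not the authors' program, and several choices are assumptions made only for illustration: genotypes are encoded as pairs of allele tuples, the EM starts from uniform haplotype frequencies, the randomized data sets are produced by permuting locus-2 genotypes among individuals, and all function names are hypothetical. Under the null hypothesis of linkage equilibrium the haplotype frequencies are products of single-locus allele frequencies, so the likelihood ratio compares the EM maximum (which assumes Hardy-Weinberg proportions) with that product.

```python
import math
import random
from collections import defaultdict


def phase_options(g1, g2):
    """Unordered haplotype pairs consistent with one two-locus genotype.

    g1, g2: (allele, allele) tuples at locus 1 and locus 2. Only a double
    heterozygote yields two distinct resolutions; other genotypes yield one.
    """
    (a1, a2), (b1, b2) = g1, g2
    pairs = {tuple(sorted([(a1, b1), (a2, b2)])),
             tuple(sorted([(a1, b2), (a2, b1)]))}
    return sorted(pairs)


def pair_prob(freq, h1, h2):
    """Probability of an unordered haplotype pair under random union (HWE)."""
    return freq.get(h1, 0.0) * freq.get(h2, 0.0) * (1 if h1 == h2 else 2)


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes."""
    options = [phase_options(g1, g2) for g1, g2 in genotypes]
    haps = sorted({h for opts in options for pair in opts for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform start (an assumption)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for opts in options:
            # E-step: split each individual over its possible phases
            weights = [pair_prob(freq, h1, h2) for h1, h2 in opts]
            total = sum(weights)
            for (h1, h2), w in zip(opts, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2n
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq


def null_freqs(genotypes):
    """Haplotype frequencies under linkage equilibrium: p_a * q_b."""
    two_n = 2 * len(genotypes)
    p, q = defaultdict(float), defaultdict(float)
    for (a1, a2), (b1, b2) in genotypes:
        p[a1] += 1 / two_n
        p[a2] += 1 / two_n
        q[b1] += 1 / two_n
        q[b2] += 1 / two_n
    return {(a, b): p[a] * q[b] for a in p for b in q}


def log_likelihood(genotypes, freq):
    """Genotype log-likelihood given haplotype frequencies, assuming HWE."""
    return sum(math.log(sum(pair_prob(freq, h1, h2) for h1, h2 in phase_options(g1, g2)))
               for g1, g2 in genotypes)


def lr_statistic(genotypes):
    """2 * (log L at the EM estimate - log L under linkage equilibrium)."""
    ll_alt = log_likelihood(genotypes, em_haplotype_freqs(genotypes))
    ll_null = log_likelihood(genotypes, null_freqs(genotypes))
    return 2 * (ll_alt - ll_null)


def chi2_sf_df1(x):
    """Upper tail of a chi-square with 1 df (the two-biallelic-locus case)."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2))


def permutation_pvalue(genotypes, n_perm=1000, seed=1):
    """Empirical tail probability of the LR statistic over randomized data sets."""
    rng = random.Random(seed)
    observed = lr_statistic(genotypes)
    locus1 = [g1 for g1, _ in genotypes]
    locus2 = [g2 for _, g2 in genotypes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(locus2)  # keeps each locus's genotypes, breaks their pairing
        if lr_statistic(list(zip(locus1, locus2))) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: AB and ab haplotypes only, 10 double heterozygotes
    data = ([(("A", "A"), ("B", "B"))] * 20 + [(("a", "a"), ("b", "b"))] * 20
            + [(("A", "a"), ("B", "b"))] * 10)
    stat, p_emp = permutation_pvalue(data, n_perm=199)
    print(f"LR = {stat:.2f}, chi2(1) p = {chi2_sf_df1(stat):.3g}, empirical p = {p_emp:.3f}")
```

Permuting one locus's genotypes keeps the single-locus genotype frequencies of both loci fixed while destroying any association between them, which is one natural way to build the randomized data sets. For k and m alleles the χ² approximation would usually use (k − 1)(m − 1) degrees of freedom; only the 1-df tail is coded here, and the abstract's results indicate that the empirical tail probability is the one to trust when loci carry five or more alleles.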
Publication Info
- Year: 1996
- Type: article
- Volume: 76
- Issue: 4
- Pages: 377-383
Identifiers
- DOI: 10.1038/hdy.1996.55
- PMID: 8626222