Abstract
Reductions in the cost of sequencing have enabled whole-genome sequencing to identify sequence variants segregating in a population. An efficient approach is to sequence many samples at low coverage, then to combine data across samples to detect shared variants. Here, we present methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information. For each population, we first collect SNP candidates based on independent sequence calls per site. We then use MARGARITA with genotype or phased haplotype data from the same samples to collect 20 ancestral recombination graphs (ARGs). We refine the posterior probability of SNP candidates by considering possible mutations at internal branches of the 40 marginal ancestral trees inferred from the 20 ARGs at the left and right flanking genotype sites. Using a population genetic prior distribution on tree-branch length and Bayesian inference, we determine a posterior probability of the SNP being real and also the most probable phased genotype call for each individual. We present experiments on both simulated data and real data from the 1000 Genomes Project to demonstrate the applicability of the methods. We also explore the relative tradeoff between sequencing depth and the number of sequenced samples.
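As a rough illustration of the first step only (collecting SNP candidates by pooling independent per-site read evidence across low-coverage samples), the sketch below computes a posterior probability that a site is variant. It is not the authors' implementation: the binomial read model, the Hardy-Weinberg genotype prior, the uniform grid over allele frequency, and the names `genotype_likelihoods`, `snp_posterior`, `err`, `prior`, and `grid` are all assumptions made for this example. The ARG-based refinement with MARGARITA is not represented.

```python
from math import comb

def genotype_likelihoods(ref, alt, err=0.01):
    """Return P(reads | g) for g = 0, 1, 2 copies of the alternative allele.

    Assumes each read independently shows the alternative allele with
    probability err, 0.5, or 1 - err depending on the genotype.
    """
    n = ref + alt
    alt_read_prob = (err, 0.5, 1.0 - err)
    return [comb(n, alt) * p**alt * (1.0 - p)**ref for p in alt_read_prob]

def snp_posterior(counts, prior=1e-3, err=0.01, grid=50):
    """Posterior probability that a site is a SNP, given (ref, alt) read
    counts for each sample. `prior` is an assumed prior SNP probability."""
    liks = [genotype_likelihoods(r, a, err) for r, a in counts]

    # Null model: every sample is homozygous for the reference allele.
    null = 1.0
    for lik in liks:
        null *= lik[0]

    # Variant model: average the site likelihood over a grid of allele
    # frequencies, with Hardy-Weinberg genotype proportions at each one.
    variant = 0.0
    for k in range(1, grid + 1):
        f = k / (grid + 1)
        hwe = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        site = 1.0
        for lik in liks:
            site *= sum(p * x for p, x in zip(hwe, lik))
        variant += site / grid

    numerator = prior * variant
    return numerator / (numerator + (1.0 - prior) * null)
```

Called with alternative-allele reads spread over several samples, for example `snp_posterior([(2, 2), (4, 0), (1, 3), (3, 0), (0, 4)])`, the function returns a value close to 1, while a site with only reference reads scores below the prior. No single sample here has enough depth for a confident call on its own, which is the motivation for combining data across samples.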
Publication Info
- Year: 2010
- Type: article
- Volume: 21
- Issue: 6
- Pages: 952-960
- Citations: 156
- Access: Closed
Identifiers
- DOI: 10.1101/gr.113084.110