Abstract

Most common hereditary diseases in humans are complex and multifactorial. Large‐scale genome‐wide association studies based on SNP genotyping have identified only a small fraction of the heritable variation of these diseases. One explanation may be that many rare variants (minor allele frequency, MAF < 5%), which are not included in the common genotyping platforms, contribute substantially to the genetic variation of these diseases. Next‐generation sequencing, which allows the analysis of rare variants, is now becoming so cheap that it provides a viable alternative to SNP genotyping. In this paper, we present cost‐effective protocols for using next‐generation sequencing in association mapping studies based on pooled and un‐pooled samples, and identify optimal designs with respect to the total number of individuals, the number of individuals per pool, and the sequencing coverage. We perform a small empirical study to evaluate the pooling variance in a realistic setting where pooling is combined with exon‐capturing. To test for associations, we develop a likelihood ratio statistic that accounts for the high error rate of next‐generation sequencing data. We also perform extensive simulations to determine the power and accuracy of this method. Overall, our findings suggest that, at a fixed cost, sequencing many individuals at shallower depth with larger pool sizes achieves higher power than sequencing a small number of individuals at higher depth with smaller pool sizes, even in the presence of high error rates. Our results provide guidelines for researchers who are developing association mapping studies based on next‐generation sequencing. Genet. Epidemiol. 34: 479–491, 2010. © 2010 Wiley‐Liss, Inc.
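
To make the idea of an error-aware likelihood ratio test concrete, the following is a minimal sketch and not the statistic developed in the paper. It assumes one case pool and one control pool, a known symmetric per-base error rate, and binomially distributed read counts; it ignores the pooling variance that the abstract mentions. All function names and parameters are hypothetical.

```python
import math


def read_minor_prob(p, err):
    """Chance that one read shows the minor allele, given pool allele
    frequency p and an assumed symmetric per-base error rate err."""
    return p * (1 - err) + (1 - p) * err


def binom_loglik(x, d, q):
    """Binomial log-likelihood of x minor-allele reads out of d, up to a
    constant (the binomial coefficient cancels in the ratio)."""
    q = min(max(q, 1e-12), 1 - 1e-12)  # guard against log(0)
    return x * math.log(q) + (d - x) * math.log(1 - q)


def mle_q(x, d, err):
    """MLE of the read-level minor-allele probability. Because the true
    frequency p lies in [0, 1], q is constrained to [err, 1 - err]."""
    return min(max(x / d, err), 1 - err)


def lrt_statistic(x_case, d_case, x_ctrl, d_ctrl, err):
    """Likelihood ratio statistic for equal allele frequency in both pools."""
    q_case = mle_q(x_case, d_case, err)
    q_ctrl = mle_q(x_ctrl, d_ctrl, err)
    q_null = mle_q(x_case + x_ctrl, d_case + d_ctrl, err)
    ll_alt = (binom_loglik(x_case, d_case, q_case)
              + binom_loglik(x_ctrl, d_ctrl, q_ctrl))
    ll_null = binom_loglik(x_case + x_ctrl, d_case + d_ctrl, q_null)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2 * (ll_alt - ll_null))


if __name__ == "__main__":
    # Hypothetical counts: minor-allele reads and depth per pool.
    print(lrt_statistic(50, 1000, 10, 1000, err=0.01))
    # Counts at or below the error floor carry no evidence of a difference.
    print(lrt_statistic(5, 1000, 8, 1000, err=0.01))
```

In this sketch, minor-allele read fractions that do not exceed the assumed error rate are clipped to it, so differences that sequencing error alone could explain do not inflate the statistic.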

Keywords

Pooling, Biology, Minor allele frequency, Genotyping, Genetic association, Genome-wide association study, Type I and type II errors, DNA sequencing, Deep sequencing, Genetics, Computational biology, Statistical power, Statistics, Allele frequency, Single-nucleotide polymorphism, Computer science, Allele, Genotype, Genome, Mathematics, Artificial intelligence, Gene

Publication Info

Year: 2010
Type: article
Volume: 34
Issue: 5
Pages: 479-491
Citations: 84
Access: Closed

Citation Metrics

84 citations (OpenAlex)

Cite This

Su Yeon Kim, Yingrui Li, Yiran Guo et al. (2010). Design of association studies with pooled or un‐pooled next‐generation sequencing data. Genetic Epidemiology, 34(5), 479-491. https://doi.org/10.1002/gepi.20501

Identifiers

DOI: 10.1002/gepi.20501