Abstract

Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command-line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family-based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility, and our goal is to provide researchers with a standard framework for medical genomics.
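The idea of composing one query over genotypes, an inheritance pattern, and annotations can be illustrated with a toy database. The sketch below is not GEMINI's schema or interface: the table names, column names, sample names, and gene labels are all hypothetical, and Python's built-in sqlite3 module stands in for the unified database described above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical variant and genotype tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE variants (
            variant_id INTEGER PRIMARY KEY,
            chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,
            gene TEXT,          -- hypothetical gene annotation
            in_dbsnp INTEGER,   -- 1 if the variant is already catalogued
            impact TEXT         -- hypothetical predicted consequence
        );
        CREATE TABLE genotypes (
            variant_id INTEGER, sample TEXT, gt TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "chr1", 1000, "A", "G", "GENE_A", 1, "synonymous"),
            (2, "chr2", 2000, "C", "T", "GENE_B", 0, "missense"),
            (3, "chr3", 3000, "G", "A", "GENE_C", 0, "missense"),
        ],
    )
    conn.executemany(
        "INSERT INTO genotypes VALUES (?, ?, ?)",
        [
            (1, "mother", "het"), (1, "father", "het"), (1, "child", "hom_alt"),
            (2, "mother", "het"), (2, "father", "het"), (2, "child", "hom_alt"),
            (3, "mother", "hom_ref"), (3, "father", "het"), (3, "child", "het"),
        ],
    )
    return conn


def recessive_candidates(conn):
    """Return genes with a novel missense variant fitting a recessive trio pattern."""
    rows = conn.execute(
        """
        SELECT v.gene
        FROM variants AS v
        JOIN genotypes AS c ON c.variant_id = v.variant_id AND c.sample = 'child'
        JOIN genotypes AS m ON m.variant_id = v.variant_id AND m.sample = 'mother'
        JOIN genotypes AS f ON f.variant_id = v.variant_id AND f.sample = 'father'
        WHERE v.in_dbsnp = 0            -- annotation filter
          AND v.impact = 'missense'     -- annotation filter
          AND c.gt = 'hom_alt'          -- affected child is homozygous
          AND m.gt = 'het'              -- both parents are carriers
          AND f.gt = 'het'
        ORDER BY v.variant_id
        """
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(recessive_candidates(build_demo_db()))
```

Keeping annotations and genotypes in the same database lets a single declarative query combine both kinds of filter, rather than chaining separate filtering tools.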

Keywords

Computer science, Personal genomics, Genomics, Genome, Biology, Genetics

MeSH Terms

Data Mining; Databases, Genetic; Genetic Variation; Genome, Human; Genomics; Genotype; Humans; Software

Publication Info

Year: 2013
Type: article
Volume: 9
Issue: 7
Pages: e1003153-e1003153
Citations: 456
Access: Closed

Citation Metrics

OpenAlex: 456
Influential: 32
CrossRef: 378

Cite This

Umadevi Paila, Brad Chapman, Rory Kirchner et al. (2013). GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations. PLoS Computational Biology, 9(7), e1003153-e1003153. https://doi.org/10.1371/journal.pcbi.1003153

Identifiers

DOI: 10.1371/journal.pcbi.1003153
PMID: 23874191
PMCID: PMC3715403
arXiv: 1304.4860
