Abstract

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.

Keywords

Indel, Exome sequencing, Genome, Genomics, DNA sequencing, Computational biology, Exome, Biology, Deep sequencing, Deep learning, Convolutional neural network, Whole genome sequencing, INDEL Mutation, Human genome, Computer science, Artificial intelligence, Genetics, Mutation, Single-nucleotide polymorphism, Genotype, Gene

MeSH Terms

Animals; DNA Mutational Analysis; Genome, Human; Genomics; Genotype; High-Throughput Nucleotide Sequencing; Humans; INDEL Mutation; Mammals; Neural Networks, Computer; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Software

Publication Info

Year: 2018
Type: article
Volume: 36
Issue: 10
Pages: 983-987
Citations: 1649
Access: Closed

Citation Metrics

OpenAlex citations: 1649
Influential citations: 157

Cite This

Ryan Poplin, Pi-Chuan Chang, David H. Alexander et al. (2018). A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology, 36(10), 983-987. https://doi.org/10.1038/nbt.4235

Identifiers

DOI: 10.1038/nbt.4235
PMID: 30247488
