Abstract

Genome assemblies that are accurate, complete and contiguous are essential for identifying important structural and functional elements of genomes and for identifying genetic variation. Nevertheless, most recent genome assemblies remain incomplete and fragmented. While long molecule sequencing promises to deliver more complete genome assemblies with fewer gaps, concerns about error rates, low yields, stringent DNA requirements and uncertainty about best practices may discourage many investigators from adopting this technology. Here, in conjunction with the platinum standard Drosophila melanogaster reference genome, we analyze recently published long molecule sequencing data to identify what governs completeness and contiguity of genome assemblies. We also present a hybrid meta-assembly approach that achieves remarkable assembly contiguity for both Drosophila and human assemblies with only modest long molecule sequencing coverage. Our results motivate a set of preliminary best practices for obtaining accurate and contiguous assemblies, a 'missing manual' that guides key decisions in building high quality de novo genome assemblies, from DNA isolation to polishing the assembly.

Keywords

Biology; Genome; Sequence assembly; Computational biology; Contiguity; Hybrid genome assembly; Drosophila melanogaster; Structural variation; DNA sequencing; Reference genome; Genetics; Evolutionary biology; DNA; Gene

MeSH Terms

Animals; Cell Line; Computational Biology; Drosophila melanogaster; Genome; Genomics; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA

Publication Info

Year: 2016
Type: article
Volume: 44
Issue: 19
Pages: gkw654-gkw654
Citations: 472
Access: Closed

Citation Metrics

OpenAlex: 472
CrossRef: 217

Cite This

Mahul Chakraborty, James G. Baldwin-Brown, Anthony D. Long et al. (2016). Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Research, 44(19), gkw654-gkw654. https://doi.org/10.1093/nar/gkw654

Identifiers

DOI: 10.1093/nar/gkw654
PMID: 27458204
PMCID: PMC5100563
