Abstract

Despite advances in DNA sequencing technology, assembly of complex genomes remains a major challenge, particularly for genomes sequenced using short reads, which yield highly fragmented assemblies. Here we show that genome-wide in vivo chromatin interaction frequency data, which are measurable with chromosome conformation capture–based experiments, can be used as genomic distance proxies to accurately position individual contigs without requiring any sequence overlap. We also use these data to construct approximate genome scaffolds de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods in 26/31 cases attempted in common. Our approach can theoretically bridge any gap size and should be applicable to any species for which global chromatin interaction data can be generated.
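The idea of using interaction frequency as a distance proxy can be illustrated with a toy calculation. The sketch below is not the authors' algorithm. It assumes, purely for illustration, that contact frequency decays as a power law of genomic distance (exponent `alpha`, a hypothetical parameter) and that a scaffold is divided into equal-sized bins. The function names `expected_profile` and `place_contig` are made up for this example. Given the contact counts between an unplaced contig and each bin, it picks the bin whose expected contact profile best explains the counts under a multinomial likelihood.

```python
import math


def expected_profile(position, n_bins, alpha=1.0):
    """Relative expected contact frequency between a contig located at bin
    `position` and every bin of the scaffold, assuming (as a simplification)
    a power-law decay with distance: weight = (distance + 1) ** -alpha."""
    weights = [(abs(b - position) + 1) ** -alpha for b in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def place_contig(counts, alpha=1.0):
    """Return the bin index whose expected profile gives the highest
    multinomial log-likelihood for the observed contact `counts`."""
    n_bins = len(counts)
    best_bin, best_ll = None, -math.inf
    for candidate in range(n_bins):
        profile = expected_profile(candidate, n_bins, alpha)
        ll = sum(c * math.log(p) for c, p in zip(counts, profile))
        if ll > best_ll:
            best_bin, best_ll = candidate, ll
    return best_bin


# Toy data: noise-free counts generated from the same decay model,
# for a contig whose true location is bin 7 of a 20-bin scaffold.
true_position = 7
n_bins = 20
counts = [round(1000 * p) for p in expected_profile(true_position, n_bins)]

estimated = place_contig(counts)
print("estimated bin:", estimated)
```

No sequence overlap enters the calculation: only the shape of the contact profile along the scaffold is used, which is why such an approach is not limited by gap size in principle. Real data would need normalization and a decay model fitted to the observed contacts.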

Keywords

Genome; Contig; Chromosome conformation capture; Computational biology; Chromatin; Biology; DNA sequencing; Chromosome; Genomics; Genetics; Human genome; DNA; Gene

MeSH Terms

Algorithms; Contig Mapping; DNA; Data Interpretation, Statistical; Gene Frequency; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA

Publication Info

Year: 2013
Type: article
Volume: 31
Issue: 12
Pages: 1143-1147
Citations: 199
Access: Closed

Citation Metrics

OpenAlex: 199
Influential: 1
CrossRef: 182

Cite This

N. Kaplan, Job Dekker (2013). High-throughput genome scaffolding from in vivo DNA interaction frequency. Nature Biotechnology, 31(12), 1143-1147. https://doi.org/10.1038/nbt.2768

Identifiers

DOI: 10.1038/nbt.2768
PMID: 24270850
PMCID: PMC3880131