Abstract
Despite advances in DNA sequencing technology, assembly of complex genomes remains a major challenge, particularly for genomes sequenced using short reads, which yield highly fragmented assemblies. Here we show that genome-wide in vivo chromatin interaction frequency data, which are measurable with chromosome conformation capture–based experiments, can be used as genomic distance proxies to accurately position individual contigs without requiring any sequence overlap. We also use these data to construct approximate genome scaffolds de novo. Applying our approach to incomplete regions of the human genome, we predict the positions of 65 previously unplaced contigs, in agreement with alternative methods in 26/31 cases attempted in common. Our approach can theoretically bridge any gap size and should be applicable to any species for which global chromatin interaction data can be generated.
Keywords
MeSH Terms
Affiliated Institutions
Related Publications
SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing
The latest revolution in the DNA sequencing field has been brought about by the development of automated sequencers that are capable of generating giga base pair data sets quick...
Genome-Wide Mapping of in Vivo Protein-DNA Interactions
In vivo protein-DNA interactions connect each transcription factor with its direct targets to form a gene network scaffold. To map these protein-DNA interactions comprehensively...
IDBA-UD: a <i>de novo</i> assembler for single-cell and metagenomic sequencing data with highly uneven depth
Abstract Motivation: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. How...
The Sequence of the Human Genome
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp ...
The Phusion Assembler
The Phusion assembler has assembled the mouse genome from the whole-genome shotgun (WGS) dataset collected by the Mouse Genome Sequencing Consortium, at ∼7.5× sequence coverage,...
Publication Info
- Year
- 2013
- Type
- article
- Volume
- 31
- Issue
- 12
- Pages
- 1143-1147
- Citations
- 199
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1038/nbt.2768
- PMID
- 24270850
- PMCID
- PMC3880131