Abstract
Abstract Motivation : Today, the base code of DNA is mostly determined through sequencing by synthesis as provided by the Illumina sequencers. Although highly accurate, resulting reads are short, making their analyses challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads of several thousand bases. But, their broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations have great demands on hardware, work only in well-defined computing infrastructures and reject a substantial amount of reads. This limits their usability considerably, especially in the case of large sequencing projects. Results : Here we present proovread , a hybrid correction pipeline for SMRT reads, which can be flexibly adapted on existing hardware and infrastructure from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli , Arabidopsis thaliana and human, proovread achieved accuracies up to 99.9% and outperformed the existing hybrid correction programs. Furthermore, proovread -corrected sequences were longer and the throughput was higher. Thus, proovread combines the most accurate correction results with an excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing. Availability and implementation: proovread is available at the following URL: http://proovread.bioapps.biozentrum.uni-wuerzburg.de Contact : frank.foerster@biozentrum.uni-wuerzburg.de Supplementary information: Supplementary data are available at Bioinformatics online.
Keywords
MeSH Terms
Affiliated Institutions
Related Publications
Error filtering, pair assembly and error correction for next-generation sequencing reads
Abstract Motivation: Next-generation sequencing produces vast amounts of data with errors that are difficult to distinguish from true biological variation when coverage is low. ...
IDBA-UD: a <i>de novo</i> assembler for single-cell and metagenomic sequencing data with highly uneven depth
Abstract Motivation: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. How...
Fragment assembly with short reads
Abstract Motivation: Current DNA sequencing technology produces reads of about 500–750 bp, with typical coverage under 10×. New sequencing technologies are emerging that produce...
Minimap2: pairwise alignment for nucleotide sequences
Abstract Motivation Recent advances in sequencing technologies promise ultra-long reads of ∼100 kb in average, full-length mRNA or cDNA reads in high throughput and genomic cont...
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitation...
Publication Info
- Year
- 2014
- Type
- article
- Volume
- 30
- Issue
- 21
- Pages
- 3004-3011
- Citations
- 549
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1093/bioinformatics/btu392
- PMID
- 25015988
- PMCID
- PMC4609002