Abstract

Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research, including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality public datasets of SMRT sequences can spur the development of analytic tools that accommodate the unique characteristics of SMRT data: long read lengths, lack of GC or amplification bias, and a random error profile that yields high consensus accuracy. In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated on the PacBio RS II instrument using two sequencing chemistries (P4C2 and P5C3). The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely studied model systems in biological research.
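A simple binomial model can illustrate why a random error profile leads to high consensus accuracy. This sketch is only an illustration and is not the consensus method used in this work or in PacBio software. It rests on several assumptions. Each read is assumed to report the correct base at a given position independently of the other reads. The per-read accuracy of 85% is a hypothetical value. The consensus counts as correct only when a strict majority of reads are correct, so ties and split votes count as errors, which is conservative. The function name `majority_consensus_error` is likewise hypothetical.

```python
from math import comb


def majority_consensus_error(per_read_accuracy: float, coverage: int) -> float:
    """Probability that a strict-majority vote over `coverage` reads is wrong.

    Hypothetical toy model: each read is independently correct at a given
    position with probability `per_read_accuracy`; the consensus counts as
    correct only if more than half of the reads are correct (ties = errors).
    """
    p = per_read_accuracy
    need = coverage // 2 + 1  # correct reads required for a strict majority
    # Sum the binomial probabilities of having fewer than `need` correct reads.
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(need)
    )


if __name__ == "__main__":
    # 0.85 is an illustrative per-read accuracy, not a value from the paper.
    for depth in (1, 5, 11, 21, 31):
        err = majority_consensus_error(0.85, depth)
        print(f"{depth:>2}x coverage: consensus error ~ {err:.2e}")
```

Under these assumptions, the consensus error shrinks rapidly as coverage grows. The argument fails for systematic errors. If reads tend to make the same mistake at the same position, extra coverage does not remove it, which is why the randomness of the error profile matters.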

Keywords

Computational biology, Biology, Genome, Neurospora crassa, Shotgun sequencing, Sequence assembly, Genetics, Hybrid genome assembly, Whole genome sequencing, Gene, Transcriptome

MeSH Terms

Animals; Arabidopsis; Drosophila melanogaster; Escherichia coli; Genome, Bacterial; Genome, Fungal; Genome, Insect; Genome, Plant; Models, Animal; Neurospora crassa; Saccharomyces cerevisiae; Sequence Analysis, DNA

Publication Info

Year: 2014
Type: article
Volume: 1
Issue: 1
Pages: 140045
Citations: 161
Access: Closed

Citation Metrics

OpenAlex: 161
Influential: 12

Cite This

Kristi E. Kim, Paul Peluso, Primo Babayan et al. (2014). Long-read, whole-genome shotgun sequence data for five model organisms. Scientific Data, 1(1), 140045. https://doi.org/10.1038/sdata.2014.45

Identifiers

DOI: 10.1038/sdata.2014.45
PMID: 25977796
PMCID: PMC4365909