Abstract

Motivation: The emergence of high-throughput sequencing technologies revolutionized genomics in the early 2000s. The next revolution came with the era of long-read sequencing. These technological advances, along with novel computational approaches, became the next step towards automatic pipelines capable of assembling nearly complete mammalian-size genomes.

Results: In this manuscript, we demonstrate the performance of state-of-the-art genome assembly software on six eukaryotic datasets sequenced using different technologies. To evaluate the results, we developed QUAST-LG, a tool that compares large genomic de novo assemblies against reference sequences and computes relevant quality metrics. Since genomes generally cannot be reconstructed completely due to complex repeat patterns and low-coverage regions, we introduce the concept of an upper bound assembly for a given genome and set of reads, and compute theoretical limits on assembly correctness and completeness. Using QUAST-LG, we show how close the assemblies are to the theoretical optimum, and how far this optimum is from the finished reference.

Availability and implementation: http://cab.spbu.ru/software/quast-lg

Supplementary information: Supplementary data are available at Bioinformatics online.

Keywords

Sequence assembly; Correctness; Genome; Software; Computer science; Genomics; Reference genome; Set (abstract data type); Computational biology; Data mining; Theoretical computer science; Biology; Algorithm; Genetics; Gene; Programming language

MeSH Terms

Animals; Genomics; High-Throughput Nucleotide Sequencing; Humans; Saccharomyces cerevisiae; Sequence Analysis, DNA; Software

Publication Info

Year: 2018
Type: article
Volume: 34
Issue: 13
Pages: i142-i150
Citations: 1437
Access: Closed

Citation Metrics

OpenAlex: 1437
Influential: 271
CrossRef: 1263

Cite This

Alla Mikheenko, Andrey D. Prjibelski, Vladislav Saveliev et al. (2018). Versatile genome assembly evaluation with QUAST-LG. Bioinformatics, 34(13), i142-i150. https://doi.org/10.1093/bioinformatics/bty266

Identifiers

DOI: 10.1093/bioinformatics/bty266
PMID: 29949969
PMCID: PMC6022658
