Abstract

Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data.

Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs containing many thousands of sequences in each test case and which is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins.

Availability and implementation: Benchmark data and scripts are available for download at http://www.bioinf.ucd.ie/download/ContTest.tar.gz.

Contact: des.higgins@ucd.ie

Supplementary information: Supplementary data are available at Bioinformatics online.

Keywords

Benchmark (surveying), Computer science, Multiple sequence alignment, Sequence (biology), Data mining, Scripting language, Sequence alignment, Rank (graph theory), Computational biology, Biology, Peptide sequence, Programming language, Mathematics

Publication Info

Year: 2015
Type: article
Volume: 32
Issue: 6
Pages: 814-820
Citations: 21
Access: Closed

Citation Metrics

21 (OpenAlex)

Cite This

Gearóid Fox, Fabian Sievers, Desmond G. Higgins (2015). Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments. Bioinformatics, 32(6), 814-820. https://doi.org/10.1093/bioinformatics/btv592

Identifiers

DOI
10.1093/bioinformatics/btv592