Using <i>de novo</i> protein structure predictions to measure the quality of very large multiple sequence alignments

Abstract

Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data.

Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs that contains many thousands of sequences in each test case and is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins.

Availability and implementation: Benchmark data and scripts are available for download at http://www.bioinf.ucd.ie/download/ContTest.tar.gz.

Contact: des.higgins@ucd.ie

Supplementary information: Supplementary data are available at Bioinformatics online.
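The abstract gives no detail on how ContTest scores an alignment. As a rough sketch of the general idea only (an alignment is judged by how well residue contacts predicted from its columns agree with known contacts), the Python below uses plain mutual information between columns as a stand-in for a real contact predictor. The function names, the `true_contacts` set of column-index pairs, and the top-k precision score are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (bits) between columns i and j, skipping gapped rows."""
    pairs = [(s[i], s[j]) for s in msa if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def contact_precision(msa, true_contacts, top_k, min_separation=1):
    """Fraction of the top_k highest-scoring column pairs found in true_contacts.

    Hypothetical scoring scheme: true_contacts is a set of (i, j) column
    index pairs with i < j, assumed to come from a known structure.
    """
    length = len(msa[0])
    scored = [
        (column_mutual_information(msa, i, j), (i, j))
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    # Stable sort: ties keep their original (i, j) enumeration order.
    scored.sort(key=lambda t: t[0], reverse=True)
    top = [pair for _, pair in scored[:top_k]]
    if not top:
        return 0.0
    return sum(pair in true_contacts for pair in top) / len(top)


# Toy alignments: in the first, columns 0 and 3 covary; in the second they do not.
covarying = ["ACDE", "ACDE", "GCDK", "GCDK"]
independent = ["ACDE", "ACDK", "GCDE", "GCDK"]
known = {(0, 3)}
score_covarying = contact_precision(covarying, known, top_k=1)
score_independent = contact_precision(independent, known, top_k=1)
```

Under this toy scheme, an alignment that places covarying residues in the same columns scores higher than one that does not. A real benchmark would use a dedicated contact prediction method and experimentally determined structures in place of these stand-ins.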

Keywords

Benchmark (surveying), Computer science, Multiple sequence alignment, Sequence (biology), Data mining, Scripting language, Sequence alignment, Rank (graph theory), Computational biology, Biology, Peptide sequence, Programming language, Mathematics

Affiliated Institutions

University College Dublin IE

Related Publications

Analysis and Comparison of Benchmarks for Multiple Sequence Alignment

Gordon Blackshields, Iain M. Wallace, Mark Larkin +1 more

The most popular way of comparing the performance of multiple sequence alignment programs is to use empirical testing on sets of test sequences. Several such test sets now exist...

2006 In Silico Biology 67 citations

MUSCLE: multiple sequence alignment with high accuracy and high throughput

R. C. Edgar

We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting,...

2004 Nucleic Acids Research 44728 citations

Recent Evolutions of Multiple Sequence Alignment Algorithms

Cédric Notredame

An ever-increasing number of biological modeling methods depend on the assembly of an accurate multiple sequence alignment (MSA). These include phylogenetic trees, profiles, and...

2007 PLoS Computational Biology 219 citations

Generating consensus sequences from partial order multiple sequence alignment graphs

Christopher J. Lee

Motivation: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. Howe...

2003 Bioinformatics 99 citations

The Jalview Java alignment editor

Michèle Clamp, James Cuff, Stephen M. J. Searle +1 more

Summary: Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is kno...

2004 Bioinformatics 1538 citations

Publication Info

Year: 2015
Type: article
Volume: 32
Issue: 6
Pages: 814-820
Citations: 21
Access: Closed

Cite This

APA Style

                            
Gearóid Fox, Fabian Sievers, Desmond G. Higgins (2015). Using <i>de novo</i> protein structure predictions to measure the quality of very large multiple sequence alignments. Bioinformatics, 32(6), 814-820. https://doi.org/10.1093/bioinformatics/btv592

Identifiers

DOI: 10.1093/bioinformatics/btv592