Abstract
In recent years improvements to existing programs and the introduction of new iterative algorithms have changed the state-of-the-art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10-20% residue identity, the best programs were capable of correctly aligning on average 47% of the residues. We show that iterative algorithms often offer improved alignment accuracy though at the expense of computation time. A notable exception was the effect of introducing a single divergent sequence into a set of closely related sequences, causing the iteration to diverge away from the best alignment. Global alignment programs generally performed better than local methods, except in the presence of large N/C-terminal extensions and internal insertions. In these cases, a local algorithm was more successful in identifying the most conserved motifs. This study enables us to propose appropriate alignment strategies, depending on the nature of a particular set of sequences. The employment of more than one program based on different alignment techniques should significantly improve the quality of automatic protein sequence alignment methods. The results also indicate guidelines for improvement of alignment algorithms.
Keywords
Affiliated Institutions
Related Publications
MUSCLE: multiple sequence alignment with high accuracy and high throughput
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting,...
Analysis and Comparison of Benchmarks for Multiple Sequence Alignment
The most popular way of comparing the performance of multiple sequence alignment programs is to use empirical testing on sets of test sequences. Several such test sets now exist...
Generating consensus sequences from partialorder multiple sequence alignment graphs
Abstract Motivation: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. Howe...
MAFFT version 5: improvement in accuracy of multiple sequence alignment
The accuracy of multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i...
Aligning Multiple Genomic Sequences With the Threaded Blockset Aligner
We define a “threaded blockset,” which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for “threaded blockset aligner...
Publication Info
- Year
- 1999
- Type
- article
- Volume
- 27
- Issue
- 13
- Pages
- 2682-2690
- Citations
- 742
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1093/nar/27.13.2682