Abstract
Significance
To make sense of protein sequences, we need to compare them with each other. A common approach is to build a multiple sequence alignment, in which gaps are inserted so that homologous residues line up in columns. Automatic methods such as Clustal, Muscle, or Mafft have been widely used since the 1980s but have difficulty making alignments of much more than a few thousand sequences. This is mainly due to the time required to calculate the guide tree, a clustering of the sequences that is used to guide the multiple alignment. We have discovered that simple chained guide trees can increase the accuracy of alignments and, in principle, allow alignments of any size.
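As an illustration of what a chained guide tree looks like, the sketch below builds one as a Newick string: each sequence is joined in turn to the growing tree, so no pairwise distances or clustering are needed. This is a minimal sketch, not the authors' implementation; the function name `chained_guide_tree`, the sequence labels, and the choice of Newick output are assumptions made for the example, and the order in which sequences are chained is simply the input order.

```python
def chained_guide_tree(names):
    """Return a Newick string for a fully imbalanced (chained) tree.

    Hypothetical helper for illustration: sequences are added one at a
    time, in input order, so the tree is built without any distance
    calculation or clustering step.
    """
    if not names:
        raise ValueError("need at least one sequence name")
    tree = names[0]
    for name in names[1:]:
        # Join the next sequence to everything chained so far.
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["seq1", "seq2", "seq3", "seq4"]))
# prints (((seq1,seq2),seq3),seq4);
```

Because the loop touches each name once, the tree itself is cheap to construct even for very large sequence sets; the cost of the alignment then lies in the profile alignment steps rather than in computing the guide tree.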
Publication Info
- Year: 2014
- Type: article
- Volume: 111
- Issue: 29
- Pages: 10556-10561
- Citations: 42
- Access: Closed
Identifiers
- DOI: 10.1073/pnas.1405628111