Abstract
Motivation: When constructing a multiple sequence alignment (MSA) of a large number (>∼10 000) of sequences, the most time-consuming step is the calculation of a guide tree, which has a time complexity of O(N²) to O(N³), where N is the number of sequences.
Results: To overcome this limitation, we have developed an approximate algorithm, PartTree, that constructs a guide tree with an average time complexity of O(N log N). The new MSA method with the PartTree algorithm can align ∼60 000 sequences in several minutes on a standard desktop computer. The loss of accuracy in MSA caused by this approximation was estimated to be several percent in benchmark tests using Pfam.
Availability: The present algorithm has been implemented in the MAFFT sequence alignment package.
Contact: katoh@bioreg.kyushu-u.ac.jp
Supplementary information: Supplementary information is available at Bioinformatics online.
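To give a rough sense of the gap between the complexities quoted in the abstract, the following minimal sketch compares the number of all-against-all sequence pairs (quadratic growth) with N log2 N at a few values of N. It is not the PartTree algorithm and it ignores constant factors; the function names `pairwise_comparisons` and `n_log_n` are hypothetical helpers introduced only for this illustration.

```python
import math


def pairwise_comparisons(n: int) -> int:
    """Number of distinct sequence pairs, n*(n-1)/2 (quadratic growth)."""
    return n * (n - 1) // 2


def n_log_n(n: int) -> float:
    """N * log2(N), the average-case growth rate quoted for PartTree.

    Constant factors are ignored; this is an order-of-magnitude guide only.
    """
    return n * math.log2(n)


if __name__ == "__main__":
    for n in (1_000, 10_000, 60_000):
        full = pairwise_comparisons(n)
        approx = n_log_n(n)
        print(
            f"N={n:>6}: all pairs={full:.2e}, "
            f"N log2 N={approx:.2e}, ratio={full / approx:.0f}"
        )
```

Under these assumptions, the ratio between the two counts grows with N, which is consistent with the abstract's point that guide-tree construction dominates the running time for very large inputs.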
Publication Info
- Year: 2006
- Type: article
- Volume: 23
- Issue: 3
- Pages: 372-374
- Citations: 108
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/btl592