Abstract

Motivation: To construct a multiple sequence alignment (MSA) of a large number (>∼10 000) of sequences, the calculation of a guide tree with a complexity of O(N^2) to O(N^3), where N is the number of sequences, is the most time-consuming process.

Results: To overcome this limitation, we have developed an approximate algorithm, PartTree, to construct a guide tree with an average time complexity of O(N log N). The new MSA method with the PartTree algorithm can align ∼60 000 sequences in several minutes on a standard desktop computer. The loss of accuracy in MSA caused by this approximation was estimated to be several percent in benchmark tests using Pfam.

Availability: The present algorithm has been implemented in the MAFFT sequence alignment package ().

Contact: katoh@bioreg.kyushu-u.ac.jp

Supplementary information: Supplementary information is available at Bioinformatics online.
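To give a rough sense of the gap between the quadratic and O(N log N) regimes mentioned in the abstract, the following minimal sketch compares the number of pairwise comparisons in a full distance matrix, N(N-1)/2, with the growth term N log2(N). The function names and the choice of base-2 logarithm are assumptions made for illustration; big-O notation hides constant factors, so these figures indicate growth rates only and are not runtime predictions for PartTree or MAFFT.

```python
import math


def all_pairs(n: int) -> int:
    """Number of distinct sequence pairs in a full N x N distance matrix."""
    return n * (n - 1) // 2


def n_log_n(n: int) -> float:
    """Growth term N * log2(N); the base-2 logarithm is an illustrative choice."""
    return n * math.log2(n)


if __name__ == "__main__":
    # 60 000 is the alignment size quoted in the abstract; the others are for scale.
    for n in (1_000, 10_000, 60_000):
        pairs = all_pairs(n)
        approx = n_log_n(n)
        print(
            f"N={n:>6}: all pairs={pairs:>13,}  "
            f"N*log2(N)={approx:>10,.0f}  ratio={pairs / approx:,.0f}x"
        )
```

For N = 60 000 the full matrix has about 1.8 billion pairs, while N log2(N) is just under one million, which is consistent with the abstract's point that the quadratic-or-worse guide tree step dominates at this scale.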

Keywords

Benchmark (surveying), Computer science, Multiple sequence alignment, Algorithm, Tree (set theory), Sequence (biology), Construct (python library), Software, Process (computing), Data mining, R package, Sequence alignment, Software package, Theoretical computer science, Mathematics, Combinatorics, Biology, Computational science, Peptide sequence

Publication Info

Year: 2006
Type: article
Volume: 23
Issue: 3
Pages: 372-374
Citations: 108
Access: Closed

Citation Metrics

108 (OpenAlex)

Cite This

Kazutaka Katoh, Hiroyuki Toh (2006). PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics, 23(3), 372-374. https://doi.org/10.1093/bioinformatics/btl592

Identifiers

DOI
10.1093/bioinformatics/btl592