MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform

2002 Nucleic Acids Research 16,606 citations

Abstract

A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.
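
The idea in technique (i) can be illustrated with a minimal sketch, not the MAFFT implementation itself. It assumes a single numeric property per residue (the abstract describes two, volume and polarity), and the values in `TOY_PROPERTY` are hypothetical placeholders rather than the values used in the paper. The helper names `fft` and `correlation_by_lag` are likewise introduced here for illustration. The sketch computes the correlation between two sequences at every relative offset (lag) through the FFT; a pronounced peak at lag k suggests that a region of the second sequence, shifted by k positions, may be homologous to a region of the first.

```python
import cmath

# Hypothetical per-residue values for illustration only; these are NOT the
# volume or polarity values used by MAFFT.
TOY_PROPERTY = {"A": -1.0, "C": 0.5, "D": 1.5, "E": 2.0,
                "G": -2.0, "K": 1.0, "L": -0.5, "W": 3.0}


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def correlation_by_lag(seq1, seq2, table=TOY_PROPERTY):
    """Return {k: c(k)} with c(k) = sum over n of v1[n] * v2[n + k]."""
    v1 = [table[r] for r in seq1]
    v2 = [table[r] for r in seq2]
    # Zero-pad to a power of two large enough to avoid wrap-around.
    size = 1
    while size < len(v1) + len(v2):
        size *= 2
    a = fft([complex(v) for v in v1] + [0j] * (size - len(v1)))
    b = fft([complex(v) for v in v2] + [0j] * (size - len(v2)))
    # Cross-correlation theorem: c = IFFT(conj(A) * B).
    prod = [x.conjugate() * y for x, y in zip(a, b)]
    c = [z.real / size for z in fft(prod, inverse=True)]
    # Negative lags are stored at the end of the circular result.
    return {k: c[k % size] for k in range(-(len(v1) - 1), len(v2))}


if __name__ == "__main__":
    lags = correlation_by_lag("ACDE", "GGACDE")
    best = max(lags, key=lags.get)
    print("best lag:", best, "correlation:", round(lags[best], 6))
```

Computing all lags this way costs O(N log N) rather than the O(N^2) of a direct sum, which is the source of the speed-up the abstract attributes to the FFT step. A correlation peak only indicates the offset between candidate regions; locating the regions themselves would require a further step that this sketch omits.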

Keywords

Fast Fourier transform, Computer science, Multiple sequence alignment, Split-radix FFT algorithm, Sequence (biology), Parallel computing, Benchmark (surveying), Algorithm, Heuristics, Sequence alignment, Fourier transform, Computational science, Biology, Mathematics, Peptide sequence, Fourier analysis, Short-time Fourier transform

Publication Info

Year: 2002
Type: article
Volume: 30
Issue: 14
Pages: 3059-3066
Citations: 16,606
Access: Closed

Citation Metrics

16,606 (OpenAlex)

Cite This

Kazutaka Katoh (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059-3066. https://doi.org/10.1093/nar/gkf436

Identifiers

DOI
10.1093/nar/gkf436