Abstract

Significance

To make sense of protein sequences, researchers need to compare them with each other. A common approach is to build a multiple sequence alignment, in which gaps are inserted so that homologous residues line up in columns. Automatic methods such as Clustal, Muscle, or Mafft have been widely used since the 1980s but have difficulty making alignments of much more than a few thousand sequences. This is mainly due to the time required to calculate the guide tree, a clustering of the sequences that is used to guide the multiple alignment. We have discovered that using simple chained guide trees can increase the accuracy of alignments and, in principle, makes alignments of any size possible.
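
To make the idea of a chained guide tree concrete, the following is a minimal sketch that builds one in Newick notation: each sequence is joined, one at a time, to the cluster of all sequences before it. The function name `chained_guide_tree` and the sequence labels are hypothetical, and the order of the labels is simply whatever the caller supplies; this is an illustration of the tree shape only, not the authors' implementation.

```python
def chained_guide_tree(names):
    """Return a Newick string for a fully chained (caterpillar) tree.

    The first two names form the innermost pair; every later name is
    attached as a sibling of the whole cluster built so far.
    The ordering of `names` is taken as given (an assumption of this sketch).
    """
    if len(names) < 2:
        raise ValueError("a guide tree needs at least two sequences")

    tree = f"({names[0]},{names[1]})"
    for name in names[2:]:
        # Wrap the existing cluster and add the next sequence to it.
        tree = f"({tree},{name})"
    return tree + ";"


if __name__ == "__main__":
    # Hypothetical sequence labels, for illustration only.
    print(chained_guide_tree(["seqA", "seqB", "seqC", "seqD"]))
    # prints (((seqA,seqB),seqC),seqD);
```

Building such a tree takes a single pass over the labels and needs no pairwise distance calculations, which is why this shape avoids the guide-tree bottleneck described above.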

Keywords

Simple (philosophy), Sequence (biology), Multiple sequence alignment, Heuristic, Tree (set theory), Computer science, Limiting, Construct (python library), Algorithm, Sequence alignment, Order (exchange), Computational biology, Data mining, Mathematics, Biology, Artificial intelligence, Combinatorics, Peptide sequence, Genetics, Engineering

Publication Info

Year: 2014
Type: article
Volume: 111
Issue: 29
Pages: 10556-10561
Citations: 42
Access: Closed

Cite This

Kieran Boyce, Fabian Sievers, Desmond G. Higgins (2014). Simple chained guide trees give high-quality protein multiple sequence alignments. Proceedings of the National Academy of Sciences, 111(29), 10556-10561. https://doi.org/10.1073/pnas.1405628111

Identifiers

DOI: 10.1073/pnas.1405628111