Multiple sequence alignment with hierarchical clustering

F. Corpet
1988 Nucleic Acids Research 5,327 citations

Abstract

An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is performed using the matrix of the pairwise alignment scores. The closest sequences are aligned creating groups of aligned sequences. Then close groups are aligned until all sequences are aligned in one group. The pairwise alignments included in the multiple alignment form a new matrix that is used to produce a hierarchical clustering. If it is different from the first one, iteration of the process can be performed. The method is illustrated by an example: a global alignment of 39 sequences of cytochrome c.
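The clustering step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. The scoring scheme (match +1, mismatch -1, linear gap -1), the average-linkage rule, and the names `nw_score` and `guide_order` are all assumptions made for this example, since the abstract does not specify them. The sketch computes pairwise global alignment scores by dynamic programming, then repeatedly joins the two groups with the highest average score. It returns only the order in which groups would be aligned; the group-to-group alignment and the iteration step are omitted.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score, keeping one DP row at a time.

    The match/mismatch/gap values are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def guide_order(seqs):
    """Return the merge order as a list of (group_a, group_b) index tuples.

    Groups with the highest average pairwise score are joined first
    (average linkage is an assumption of this sketch).
    """
    score = {
        frozenset((i, j)): nw_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def avg(g, h):
        total = sum(score[frozenset((x, y))] for x in g for y in h)
        return total / (len(g) * len(h))

    groups = [(i,) for i in range(len(seqs))]
    merges = []
    while len(groups) > 1:
        g, h = max(combinations(groups, 2), key=lambda pair: avg(*pair))
        merges.append((g, h))
        groups = [k for k in groups if k not in (g, h)] + [g + h]
    return merges


if __name__ == "__main__":
    # Toy nucleotide sequences; the two similar ones are joined first.
    print(guide_order(["ACGT", "ACGA", "TTTT"]))
```

In a full progressive alignment, each merge in the returned list would be replaced by an actual alignment of the two groups, and the clustering would be recomputed from the pairwise alignments implied by the result, as the abstract describes.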

Keywords

Pairwise comparison; Alignment-free sequence analysis; Biology; Multiple sequence alignment; Hierarchical clustering; Structural alignment; Sequence alignment; Cluster analysis; Sequence (biology); Smith–Waterman algorithm; Computational biology; Dynamic programming; Pattern recognition (psychology); Genetics; Computer science; Algorithm; Artificial intelligence; Peptide sequence; Gene

MeSH Terms

Algorithms; Amino Acid Sequence; Bacteria; Base Sequence; Cytochrome c Group; Models, Genetic; Multigene Family

Publication Info

Year: 1988
Type: article
Volume: 16
Issue: 22
Pages: 10881-10890
Citations: 5327
Access: Closed

Citation Metrics

OpenAlex: 5327
Influential: 323
CrossRef: 4303

Cite This

F. Corpet (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Research, 16(22), 10881-10890. https://doi.org/10.1093/nar/16.22.10881

Identifiers

DOI: 10.1093/nar/16.22.10881
PMID: 2849754
PMCID: PMC338945

Data Quality

Data completeness: 86%