Abstract

A method for identifying clusters of points in a multidimensional Euclidean space is described and its application to taxonomy considered. It reconciles, in a sense, two different approaches to the investigation of the spatial relationships between the points, viz., the agglomerative and the divisive methods. A graph, the shortest dendrite of Florek et al. (1951a), is constructed on a nearest neighbour basis and then divided into clusters by applying the criterion of minimum within-cluster sum of squares. This procedure ensures an effective reduction of the number of possible splits. The method may be applied to a dichotomous division, but is perfectly suitable also for a global division into any number of clusters. An informal indicator of the "best number" of clusters is suggested. It is a "variance ratio criterion" giving some insight into the structure of the points. The method is illustrated by three examples, one of which is original. The results obtained by the dendrite method are compared with those obtained by using the agglomerative method of Ward (1963) and the divisive method of Edwards and Cavalli-Sforza (1965).

Keywords: numerical taxonomy; cluster analysis; minimum variance (WGSS) criterion for optimal grouping; approximate grouping procedure; shortest dendrite = minimum spanning tree; variance ratio criterion for best number of groups
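The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the tree is built with Prim's algorithm, and the split is found by exhaustively trying every way of removing k - 1 tree edges, which is only practical for small inputs. The variance ratio is computed in the form commonly given for this criterion, [BGSS / (k - 1)] / [WGSS / (n - k)], where BGSS is the between-group sum of squares; that form is an assumption here, since the abstract does not state the formula.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def minimum_spanning_tree(points):
    """Prim's algorithm; returns the tree (shortest dendrite) as index pairs."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge joining a tree point to a point not yet in the tree.
        _, i, j = min(
            (sq_dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def components(n, edges):
    """Label the connected components of a forest on n vertices."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def wgss(points, labels):
    """Within-group sum of squares about each group's centroid."""
    total = 0.0
    for g in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == g]
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum(sq_dist(p, centroid) for p in members)
    return total


def best_split(points, k):
    """Remove k - 1 tree edges so that WGSS is minimal (exhaustive search)."""
    tree = minimum_spanning_tree(points)
    best = None
    for removed in combinations(range(len(tree)), k - 1):
        kept = [e for idx, e in enumerate(tree) if idx not in removed]
        labels = components(len(points), kept)
        score = wgss(points, labels)
        if best is None or score < best[0]:
            best = (score, labels)
    return best


def variance_ratio(points, labels):
    """[BGSS / (k - 1)] / [WGSS / (n - k)]; undefined if WGSS is zero."""
    n, k = len(points), len(set(labels))
    w = wgss(points, labels)
    b = wgss(points, [0] * n) - w  # total SS minus within-group SS
    return (b / (k - 1)) / (w / (n - k))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
score, labels = best_split(points, 2)
print(labels, score, variance_ratio(points, labels))
```

Restricting candidate partitions to those obtained by cutting tree edges is what keeps the number of splits manageable: a tree on n points has only n - 1 edges, so a dichotomous division needs n - 1 evaluations rather than a search over all two-group partitions. Comparing the variance ratio across several values of k is one informal way to pick the number of clusters.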

Keywords

Mathematics, Division (mathematics), Hierarchical clustering, Cluster (spacecraft), Euclidean distance, Combinatorics, Algorithm, Cluster analysis, Computer science, Statistics, Geometry, Arithmetic

Publication Info

Year: 1974
Type: article
Volume: 3
Issue: 1
Pages: 1-27
Citations: 6351
Access: Closed

Citation Metrics

OpenAlex: 6351
Influential: 330
CrossRef: 3711

Cite This

T. Calinski, J. Harabasz (1974). A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods, 3(1), 1-27. https://doi.org/10.1080/03610927408827101

Identifiers

DOI: 10.1080/03610927408827101
