Abstract
An evaluation of four clustering methods and four external criterion measures was conducted with respect to the effect of the number of clusters, dimensionality, and relative cluster sizes on the recovery of true cluster structure. The four methods were the single link, complete link, group average (UPGMA), and Ward's minimum variance algorithms. The results indicated that the four criterion measures were generally consistent with each other, and two highly similar pairs were identified among them. The first pair consisted of the Rand and corrected Rand statistics, and the second pair was the Jaccard and the Fowlkes and Mallows indexes. With respect to the methods, recovery was found to improve as the number of clusters increased and as the number of dimensions increased. The relative cluster size factor produced differential performance effects, with Ward's procedure providing the best recovery when the clusters were of equal size. The group average method gave equivalent or better recovery when the clusters were of unequal size.
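As an illustration of the four external criteria named in the abstract, the following minimal sketch computes them from pair counts between a true partition and a recovered one. The function names `pair_counts` and `external_criteria` are hypothetical. The chance-corrected value uses the Hubert and Arabie form of the adjusted Rand index, which is an assumption here: the abstract does not say which correction the authors applied, and theirs may differ.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Classify every pair of objects by co-membership in each partition.

    Returns (a, b, c, d):
      a = same cluster in both partitions
      b = same cluster in truth only
      c = same cluster in pred only
      d = different clusters in both
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_t = truth[i] == truth[j]
        same_p = pred[i] == pred[j]
        if same_t and same_p:
            a += 1
        elif same_t:
            b += 1
        elif same_p:
            c += 1
        else:
            d += 1
    return a, b, c, d


def external_criteria(truth, pred):
    """Rand, adjusted Rand (Hubert-Arabie form, assumed), Jaccard, Fowlkes-Mallows."""
    a, b, c, d = pair_counts(truth, pred)
    n_pairs = a + b + c + d

    rand = (a + d) / n_pairs
    jaccard = a / (a + b + c) if (a + b + c) else 1.0
    fm = a / sqrt((a + b) * (a + c)) if (a + b) and (a + c) else 0.0

    # Chance correction: (index - expected) / (maximum - expected).
    expected = (a + b) * (a + c) / n_pairs
    maximum = ((a + b) + (a + c)) / 2
    adjusted = (a - expected) / (maximum - expected) if maximum != expected else 1.0

    return {
        "rand": rand,
        "adjusted_rand": adjusted,
        "jaccard": jaccard,
        "fowlkes_mallows": fm,
    }


if __name__ == "__main__":
    true_labels = [0, 0, 0, 1, 1, 1]
    found_labels = [0, 0, 1, 1, 1, 1]
    for name, value in external_criteria(true_labels, found_labels).items():
        print(f"{name}: {value:.3f}")
```

The Jaccard and Fowlkes and Mallows indexes both ignore the count `d` of pairs separated in both partitions, whereas the Rand statistic and its corrected form both use it. That difference is one plausible reading of why the measures group into the two pairs described above.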
Related Publications
An Examination of Procedures for Determining the Number of Clusters in a Data Set
A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlappi...
Comparing three classification strategies for use in ecology
Abstract. We compare three common types of clustering algorithms for use with community data. TWINSPAN is divisive hierarchical, flexible‐UPGMA is agglomerative and hierarchical...
Strong Consistency of $K$-Means Clustering
A random sample is divided into the $k$ clusters that minimise the within cluster sum of squares. Conditions are found that ensure the almost sure convergence, as the sample siz...
Model-Based Gaussian and Non-Gaussian Clustering
Abstract: The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares cr...
Combining Mixture Components for Clustering
Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used....
Publication Info
- Year: 1983
- Type: article
- Volume: PAMI-5
- Issue: 1
- Pages: 40-47
- Citations: 170
- Access: Closed
Identifiers
- DOI: 10.1109/tpami.1983.4767342