Abstract
Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled. We review a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology and discuss recent developments in model-based clustering for non-Gaussian data, high-dimensional datasets, large datasets, and Bayesian estimation.
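As a minimal sketch of how a statistical model can address the "how many clusters" question raised above, the following fits one-dimensional Gaussian mixtures by EM and compares them with the Bayesian information criterion (BIC). The abstract does not name a fitting algorithm or selection criterion, so EM and BIC are assumptions here, as are the function names `fit_gmm_1d`, `bic`, and `choose_k`, the variance floor, and the synthetic two-group data.

```python
import math
from statistics import NormalDist


def fit_gmm_1d(xs, k, iters=100, var_floor=1e-2):
    """Run EM for a 1-D Gaussian mixture with k components.

    Returns the log-likelihood evaluated in the final E-step.
    The variance floor is an illustrative guard against degenerate
    components; a sensible value depends on the scale of the data.
    """
    n = len(xs)
    s = sorted(xs)
    # Initialise means at evenly spaced sample quantiles.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [max(var_all / k, var_floor)] * k
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(iters):
        # E-step: responsibilities of each component for each point.
        resp = []
        loglik = 0.0
        for x in xs:
            dens = [
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # avoid division by zero
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    return loglik


def bic(xs, k):
    """BIC in the 'lower is better' convention: -2 log L + p log n."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -2.0 * fit_gmm_1d(xs, k) + n_params * math.log(len(xs))


def choose_k(xs, max_k=3):
    """Return the number of components with the lowest BIC, plus all BIC values."""
    bics = {k: bic(xs, k) for k in range(1, max_k + 1)}
    return min(bics, key=bics.get), bics


# Synthetic data: two well-separated groups built from normal quantiles.
base = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
data = base + [x + 10.0 for x in base]
best_k, bics = choose_k(data)
print(best_k, bics)
```

The point of the sketch is only that the number of clusters becomes a model comparison rather than a heuristic choice; real applications are multivariate and would also compare covariance structures.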
Related Publications
Detecting Features in Spatial Point Processes with Clutter via Model-Based Clustering
We consider the problem of detecting features, such as minefields or seismic faults, in spatial point processes when there is substantial clutter. We use model-based cl...
Nonlinear Dimensionality Reduction by Locally Linear Embedding
Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensional...
Pattern classification and scene analysis
Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theo...
Cluster-Sample Methods in Applied Econometrics
Inference methods that recognize the clustering of individual observations have been available for more than 25 years. Brent Moulton (1990) caught the attention of economists wh...
Introduction to Multivariate Analysis
Part One. Multivariate distributions. Preliminary data analysis. Part Two: Finding new underlying variables. Principal component analysis. Factor analysis. Part Three: Procedure...
Publication Info
- Year: 2002
- Type: article
- Volume: 97
- Issue: 458
- Pages: 611-631
- Citations: 4130
- Access: Closed
Identifiers
- DOI: 10.1198/016214502760047131