Abstract
The two main topics of this paper are the introduction of the "optimally tuned improper maximum likelihood estimator" (OTRIMLE) for robust clustering based on the multivariate Gaussian model for clusters, and a comprehensive simulation study comparing the OTRIMLE to Maximum Likelihood in Gaussian mixtures with and without noise component, mixtures of t-distributions, and the TCLUST approach for trimmed clustering. The OTRIMLE uses an improper constant density for modelling outliers and noise. This can be chosen optimally so that the non-noise part of the data looks as close to a Gaussian mixture as possible. Some deviation from Gaussianity can be traded in for lowering the estimated noise proportion. Covariance matrix constraints and computation of the OTRIMLE are also treated. In the simulation study, all methods are confronted with setups in which their model assumptions are not exactly fulfilled, and in order to evaluate the experiments in a standardized way by misclassification rates, a new model-based definition of "true clusters" is introduced that deviates from the usual identification of mixture components with clusters. In the study, every method turns out to be superior for one or more setups, but the OTRIMLE achieves the most satisfactory overall performance. The methods are also applied to two real datasets, one without and one with known "true" clusters.
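To make the idea of an improper constant density for noise more concrete, the following is a minimal sketch, not the authors' implementation. It only evaluates a pseudo-log-likelihood in which a constant value `delta` stands in for the noise "density" alongside Gaussian components, and assigns each point to the term with the largest posterior weight. The function names, the parameter names (`delta`, `noise_prop`, `props`) and all numerical values are illustrative assumptions; the optimal tuning of the constant, the covariance matrix constraints and the fitting algorithm mentioned in the abstract are not shown.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at each row of x."""
    d = x.shape[1]
    diff = x - mean
    chol = np.linalg.cholesky(cov)           # cov = chol @ chol.T
    z = np.linalg.solve(chol, diff.T)        # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)            # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)


def improper_loglik(x, delta, noise_prop, props, means, covs):
    """Pseudo-log-likelihood with a constant (improper) noise density.

    Returns the total pseudo-log-likelihood and the log posterior
    weights; row 0 is the noise term, rows 1..k are the components.
    All argument names are hypothetical, chosen for this sketch.
    """
    n = x.shape[0]
    noise = np.full((1, n), np.log(noise_prop) + np.log(delta))
    comps = np.stack([
        np.log(p) + gaussian_logpdf(x, m, c)
        for p, m, c in zip(props, means, covs)
    ])
    terms = np.vstack([noise, comps])        # shape (k + 1, n)
    # Numerically stable log-sum-exp over the k + 1 terms per point.
    top = terms.max(axis=0)
    per_point = top + np.log(np.exp(terms - top).sum(axis=0))
    return per_point.sum(), terms - per_point


def assign(log_post):
    """Label each point by its largest posterior weight (0 = noise)."""
    return np.argmax(log_post, axis=0)


# Toy data: two tight groups and one far-away point (assumed values).
x = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9], [50.0, 50.0]])
means = [np.zeros(2), np.full(2, 5.0)]
covs = [np.eye(2), np.eye(2)]
loglik, log_post = improper_loglik(
    x, delta=1e-4, noise_prop=0.1, props=[0.45, 0.45], means=means, covs=covs
)
labels = assign(log_post)
```

In this toy setup the far-away point receives essentially zero Gaussian density from both components, so the constant noise term dominates and the point is labelled as noise. A larger `delta` would pull more borderline points into the noise class, which is why the choice of the constant matters.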
Related Publications
Model-Based Gaussian and Non-Gaussian Clustering
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares cr...
Estimating the components of a mixture of normal distributions
The problem of estimating the components of a mixture of two normal distributions, multivariate or otherwise, with common but unknown covariance matrices is examined. The maximu...
Combining Mixture Components for Clustering
Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used....
A unifying maximum-likelihood view of cumulant and polyspectral measures for non-Gaussian signal classification and estimation
Classification and estimation of non-Gaussian signals observed in additive Gaussian noise of unknown covariance are addressed using cumulants or polyspectra. By integrating idea...
On the degrees of freedom in shape-restricted regression
For the problem of estimating a regression function, $\mu$ say, subject to shape constraints, like monotonicity or convexity, it is argued that the divergence of the maximum ...
Publication Info
- Year: 2016
- Type: article
- Volume: 111
- Issue: 516
- Pages: 1648-1659
- Citations: 69
- Access: Closed
Identifiers
- DOI: 10.1080/01621459.2015.1100996