Abstract

The two main topics of this paper are the introduction of the "optimally tuned improper maximum likelihood estimator" (OTRIMLE) for robust clustering based on the multivariate Gaussian model for clusters, and a comprehensive simulation study comparing the OTRIMLE to Maximum Likelihood in Gaussian mixtures with and without noise component, mixtures of t-distributions, and the TCLUST approach for trimmed clustering. The OTRIMLE uses an improper constant density for modelling outliers and noise. This can be chosen optimally so that the non-noise part of the data looks as close to a Gaussian mixture as possible. Some deviation from Gaussianity can be traded in for lowering the estimated noise proportion. Covariance matrix constraints and computation of the OTRIMLE are also treated. In the simulation study, all methods are confronted with setups in which their model assumptions are not exactly fulfilled, and in order to evaluate the experiments in a standardized way by misclassification rates, a new model-based definition of "true clusters" is introduced that deviates from the usual identification of mixture components with clusters. In the study, every method turns out to be superior for one or more setups, but the OTRIMLE achieves the most satisfactory overall performance. The methods are also applied to two real datasets, one without and one with known "true" clusters.
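To make the role of the improper constant density concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional simplification (the paper works with multivariate Gaussians), a fixed, hand-picked constant `delta` (the optimal tuning described in the abstract is not shown), and hypothetical function and parameter names. It evaluates a pseudo-log-likelihood in which a noise component contributes the constant `noise_prop * delta` to every observation, and it computes the resulting posterior-like weight of the noise component for each point.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gaussian_mix(x, props, means, variances):
    """Weighted sum of the Gaussian cluster densities at x (noise excluded)."""
    return sum(p * gaussian_pdf(x, m, v) for p, m, v in zip(props, means, variances))


def improper_loglik(data, delta, noise_prop, props, means, variances):
    """Pseudo-log-likelihood with an improper constant noise density delta.

    Each point contributes log(noise_prop * delta + Gaussian mixture density).
    """
    # The noise proportion and the cluster proportions must sum to one.
    assert abs(noise_prop + sum(props) - 1.0) < 1e-9
    return sum(
        math.log(noise_prop * delta + gaussian_mix(x, props, means, variances))
        for x in data
    )


def noise_weights(data, delta, noise_prop, props, means, variances):
    """Posterior-like weight of the noise component for each observation."""
    noise = noise_prop * delta
    return [noise / (noise + gaussian_mix(x, props, means, variances)) for x in data]


# Illustrative toy data (made up): two clusters near 0 and 5, one gross outlier.
data = [-0.2, 0.1, 0.3, 4.9, 5.2, 50.0]
params = dict(delta=0.01, noise_prop=0.1, props=[0.45, 0.45],
              means=[0.0, 5.0], variances=[1.0, 1.0])

ll = improper_loglik(data, **params)
weights = noise_weights(data, **params)
```

In this toy setting the point at 50.0 receives a noise weight close to one, while points near the cluster centres receive a weight close to zero. A larger `delta` makes the noise component more competitive and so raises these weights, which is the trade-off between Gaussianity of the non-noise part and the estimated noise proportion that the abstract refers to.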

Keywords

Outlier, Cluster analysis, Mixture model, Computation, Estimator, Gaussian, Covariance matrix, Multivariate normal distribution, Covariance, Computer science, Noise (video), Robust statistics, Algorithm, Mathematics, Multivariate statistics, Statistics, Artificial intelligence, Physics

Publication Info

Year: 2016
Type: article
Volume: 111
Issue: 516
Pages: 1648-1659
Citations: 69
Access: Closed

Citation Metrics

69 citations (source: OpenAlex)

Cite This

Pietro Coretto, Christian Hennig (2016). Robust Improper Maximum Likelihood: Tuning, Computation, and a Comparison With Other Methods for Robust Gaussian Clustering. Journal of the American Statistical Association, 111(516), 1648-1659. https://doi.org/10.1080/01621459.2015.1100996

Identifiers

DOI: 10.1080/01621459.2015.1100996