Cluster analysis of heterogeneous rank data

Ludwig Busse; Peter Orbanz; Joachim M. Buhmann

doi:10.1145/1273496.1273511

Abstract

Cluster analysis of ranking data, which occurs in consumer questionnaires, voting forms or other inquiries of preferences, attempts to identify typical groups of rank choices. Empirically measured rankings are often incomplete, i.e. different numbers of filled rank positions cause heterogeneity in the data. We propose a mixture approach for clustering of heterogeneous rank data. Rankings of different lengths can be described and compared by means of a single probabilistic model. A maximum entropy approach avoids hidden assumptions about missing rank positions. Parameter estimators and an efficient EM algorithm for unsupervised inference are derived for the ranking mixture model. Experiments on both synthetic data and real-world data demonstrate significantly improved parameter estimates on heterogeneous data when the incomplete rankings are included in the inference process.
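The idea of describing rankings of different lengths with a single probabilistic model can be illustrated with a minimal sketch. It assumes a Mallows-type model with Kendall distance and treats a top-t ranking as the set of all full rankings consistent with it. The abstract does not state these choices, so they are assumptions, and all function names are hypothetical. The sketch enumerates permutations by brute force, so it only suits a handful of items.

```python
from itertools import permutations
from math import exp


def kendall_distance(a, b):
    """Count item pairs that full rankings a and b order differently."""
    pos_b = {item: i for i, item in enumerate(b)}
    d = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if pos_b[a[i]] > pos_b[a[j]]:
                d += 1
    return d


def completions(partial, items):
    """All full rankings whose leading positions equal `partial`."""
    rest = [x for x in items if x not in partial]
    return [tuple(partial) + tail for tail in permutations(rest)]


def mallows_prob(partial, center, theta, items):
    """Probability of a (possibly partial) ranking under the assumed model.

    The unfilled positions are summed out over every consistent completion,
    so no particular order is imposed on the missing ranks.
    """
    z = sum(exp(-theta * kendall_distance(p, center))
            for p in permutations(items))
    num = sum(exp(-theta * kendall_distance(c, center))
              for c in completions(partial, items))
    return num / z


def responsibilities(partial, centers, thetas, weights, items):
    """E-step quantity: posterior cluster probabilities for one ranking."""
    joint = [w * mallows_prob(partial, c, t, items)
             for c, t, w in zip(centers, thetas, weights)]
    total = sum(joint)
    return [j / total for j in joint]
```

For example, `responsibilities((1, 2), [(1, 2, 3, 4), (4, 3, 2, 1)], [1.0, 1.0], [0.5, 0.5], (1, 2, 3, 4))` scores a ranking with only two filled positions against two clusters defined over four items, using the same model as for complete rankings.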

Keywords

Rank (graph theory); Cluster analysis; Inference; Ranking (information retrieval); Computer science; Estimator; Data mining; Missing data; Voting; Cluster (spacecraft); Data modeling; Entropy (arrow of time); Statistics; Artificial intelligence; Mathematics; Machine learning

Affiliated Institutions

ETH Zurich CH

Publication Info

Year: 2007
Type: article
Pages: 113-120
Citations: 102
Access: Closed

Cite This

APA Style

Ludwig Busse, Peter Orbanz, Joachim M. Buhmann (2007). Cluster analysis of heterogeneous rank data. pp. 113-120. https://doi.org/10.1145/1273496.1273511

Identifiers

DOI: 10.1145/1273496.1273511