Abstract

Cluster analysis of ranking data, which occurs in consumer questionnaires, voting forms or other inquiries of preferences, attempts to identify typical groups of rank choices. Empirically measured rankings are often incomplete, i.e. different numbers of filled rank positions cause heterogeneity in the data. We propose a mixture approach for clustering of heterogeneous rank data. Rankings of different lengths can be described and compared by means of a single probabilistic model. A maximum entropy approach avoids hidden assumptions about missing rank positions. Parameter estimators and an efficient EM algorithm for unsupervised inference are derived for the ranking mixture model. Experiments on both synthetic data and real-world data demonstrate significantly improved parameter estimates on heterogeneous data when the incomplete rankings are included in the inference process.

Keywords

Rank (graph theory), Cluster analysis, Inference, Ranking (information retrieval), Computer science, Estimator, Data mining, Missing data, Voting, Cluster (spacecraft), Data modeling, Entropy (arrow of time), Statistics, Artificial intelligence, Mathematics, Machine learning

Related Publications

Applied Missing Data Analysis

Part 1. An Introduction to Missing Data. 1.1 Introduction. 1.2 Chapter Overview. 1.3 Missing Data Patterns. 1.4 A Conceptual Overview of Missing Data Theory. 1.5 A More Formal De...

2010, 6888 citations

Publication Info

Year: 2007
Type: article
Pages: 113-120
Citations: 102
Access: Closed

Cite This

Ludwig Busse, Peter Orbanz, Joachim M. Buhmann (2007). Cluster analysis of heterogeneous rank data. pp. 113-120. https://doi.org/10.1145/1273496.1273511

Identifiers

DOI: 10.1145/1273496.1273511