Abstract

Both document clustering and word clustering are well studied problems. Most existing algorithms cluster documents and words separately but not simultaneously. In this paper we present the novel idea of modeling the document collection as a bipartite graph between documents and words, using which the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The spectral algorithm enjoys some optimality properties; it can be shown that the singular vectors solve a real relaxation to the NP-complete graph bipartitioning problem. We present experimental results to verify that the resulting co-clustering algorithm works well in practice.

Keywords

Bipartite graphComputer scienceCluster analysisSpectral clusteringGraphGraph partitionArtificial intelligenceTheoretical computer science

Affiliated Institutions

Related Publications

On clusterings

We motivate and develop a natural bicriteria measure for assessing the quality of a clustering that avoids the drawbacks of existing measures. A simple recursive heuristic is sh...

2004 Journal of the ACM 842 citations

Publication Info

Year
2001
Type
article
Pages
269-274
Citations
1693
Access
Closed

External Links

Social Impact

Social media, news, blog, policy document mentions

Citation Metrics

1693
OpenAlex

Cite This

Inderjit S. Dhillon (2001). Co-clustering documents and words using bipartite spectral graph partitioning. , 269-274. https://doi.org/10.1145/502512.502550

Identifiers

DOI
10.1145/502512.502550