Co-clustering documents and words using bipartite spectral graph partitioning

Abstract

Both document clustering and word clustering are well-studied problems. Most existing algorithms cluster documents and words separately rather than simultaneously. In this paper, we present the novel idea of modeling the document collection as a bipartite graph between documents and words, so that the simultaneous clustering problem can be posed as a bipartite graph partitioning problem. To solve the partitioning problem, we use a new spectral co-clustering algorithm that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings. The spectral algorithm enjoys some optimality properties; it can be shown that the singular vectors solve a real relaxation of the NP-complete graph bipartitioning problem. We present experimental results to verify that the resulting co-clustering algorithm works well in practice.
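
The bipartitioning step described in the abstract can be illustrated with a minimal sketch in Python with NumPy. It is not taken from the paper. It assumes that the "appropriately scaled" matrix is the degree-normalized matrix D1^(-1/2) A D2^(-1/2), where D1 and D2 hold the row (word) and column (document) sums of A, and that the combined one-dimensional embedding is split with a simple 2-means step. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def two_means_1d(z, n_iter=100):
    """Split 1-D values into two groups with a basic 2-means loop (assumed step)."""
    if z.min() == z.max():
        return np.zeros(z.shape[0], dtype=int)
    centers = np.array([z.min(), z.max()])
    for _ in range(n_iter):
        # Assign each value to the nearer of the two centers.
        labels = (np.abs(z - centers[1]) < np.abs(z - centers[0])).astype(int)
        new_centers = np.array([z[labels == k].mean() for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cocluster_bipartition(A):
    """Bipartition words (rows) and documents (columns) of A simultaneously."""
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # word degrees
    d2 = A.sum(axis=0)  # document degrees
    if np.any(d1 == 0) or np.any(d2 == 0):
        raise ValueError("every word and document needs at least one nonzero entry")
    d1_inv_sqrt = 1.0 / np.sqrt(d1)
    d2_inv_sqrt = 1.0 / np.sqrt(d2)
    # Assumed scaling: A_n = D1^(-1/2) A D2^(-1/2).
    A_n = d1_inv_sqrt[:, None] * A * d2_inv_sqrt[None, :]
    U, _, Vt = np.linalg.svd(A_n, full_matrices=False)
    u2, v2 = U[:, 1], Vt[1, :]  # second left and right singular vectors
    # Stack the rescaled vectors into one embedding of words and documents.
    z = np.concatenate([d1_inv_sqrt * u2, d2_inv_sqrt * v2])
    labels = two_means_1d(z)
    n_words = A.shape[0]
    return labels[:n_words], labels[n_words:]


# Hypothetical toy word-document matrix: two topics with weak cross-links.
A = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
    [0, 0, 5, 5],
])
word_labels, doc_labels = spectral_cocluster_bipartition(A)
print("word clusters:", word_labels)
print("document clusters:", doc_labels)
```

Because the sign of a singular vector is arbitrary, the cluster numbering (0 or 1) may swap between runs or platforms; only the grouping is meaningful. The toy matrix keeps a few cross-topic entries so that the graph is connected and the second singular vector is well defined.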

Keywords

Bipartite graph, Computer science, Cluster analysis, Spectral clustering, Graph, Graph partition, Artificial intelligence, Theoretical computer science

Affiliated Institutions

The University of Texas at Austin, US

Publication Info

Year: 2001
Type: article
Pages: 269-274
Citations: 1693
Access: Closed

Cite This

APA Style

Inderjit S. Dhillon (2001). Co-clustering documents and words using bipartite spectral graph partitioning, 269-274. https://doi.org/10.1145/502512.502550

Identifiers

DOI: 10.1145/502512.502550