Abstract

Data mining applications place special requirements on clustering algorithms, including the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for the data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high-dimensional datasets.
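
The abstract does not spell out how dense regions in subspaces are located, so the following is only a minimal sketch of the grid-based, bottom-up idea commonly associated with CLIQUE, not the authors' implementation. The function name `dense_units` and the parameters `xi` (intervals per dimension) and `tau` (density threshold as a fraction of all points) are illustrative assumptions. Pruning here is done per subspace rather than per unit, which is coarser than a full implementation would likely use, and the later steps (joining adjacent dense units into clusters, producing minimized DNF descriptions) are omitted.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, xi=4, tau=0.2):
    """Map each subspace (tuple of dimension indices) to its set of dense units.

    A unit is a tuple of interval indices, one per dimension of the subspace.
    xi and tau are hypothetical parameter names chosen for this sketch.
    """
    n, d = len(points), len(points[0])
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]

    def cell(p, j):
        # Index of the equal-width interval that contains p[j] on dimension j.
        width = (hi[j] - lo[j]) / xi or 1.0  # guard against a constant column
        return min(int((p[j] - lo[j]) / width), xi - 1)

    grid = [[cell(p, j) for j in range(d)] for p in points]

    def dense_in(subspace):
        # Count points per unit; counts do not depend on the order of records.
        counts = Counter(tuple(g[j] for j in subspace) for g in grid)
        return {u for u, c in counts.items() if c / n > tau}

    result = {}
    level = {(j,): dense_in((j,)) for j in range(d)}
    level = {s: units for s, units in level.items() if units}
    while level:
        result.update(level)
        k = len(next(iter(level))) + 1
        dims = sorted({j for s in level for j in s})
        nxt = {}
        for s in combinations(dims, k):
            # Monotonicity: a k-dim subspace can hold a dense unit only if
            # every (k-1)-dim projection of it already holds one.
            if all(sub in level for sub in combinations(s, k - 1)):
                units = dense_in(s)
                if units:
                    nxt[s] = units
        level = nxt
    return result
```

Calling `dense_units(points)` on a list of equal-length numeric tuples returns a dictionary keyed by subspace. Because density is decided purely from per-unit counts, shuffling `points` leaves the result unchanged, which mirrors the order-insensitivity property claimed in the abstract.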

Keywords

Computer science, Cluster analysis, Linear subspace, Clique, Data mining, Scalability, Clustering high-dimensional data, Theoretical computer science, Cluster (spacecraft), Artificial intelligence, Database, Mathematics, Programming language

Publication Info

Year: 1998
Type: article
Volume: 27
Issue: 2
Pages: 94-105
Citations: 656
Access: Closed

Citation Metrics

656 (OpenAlex)

Cite This

Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos et al. (1998). Automatic subspace clustering of high dimensional data for data mining applications. ACM SIGMOD Record, 27(2), 94-105. https://doi.org/10.1145/276305.276314

Identifiers

DOI: 10.1145/276305.276314