A Simple Linear Time (1+ε)-Approximation Algorithm for k-Means Clustering in Any Dimensions

Abstract

We present the first linear time (1+ε)-approximation algorithm for the k-means problem for fixed k and ε. Our algorithm runs in O(nd) time, which is linear in the size of the input. Another feature of our algorithm is its simplicity: the only technique involved is random sampling.
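
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea that random sampling can stand in for exact computation in k-means, not the authors' algorithm. It assumes the standard k-means objective (sum of squared Euclidean distances to the nearest center) and considers the k = 1 case, where the optimal center is the centroid; it compares that optimum with the centroid of a small uniform random sample. The function names, the sample size of 50, and the synthetic Gaussian data are all illustrative assumptions.

```python
import random


def kmeans_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ctr)) for ctr in centers)
        for pt in points
    )


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sampled_centroid(points, sample_size, rng):
    """Centroid of a uniform random sample drawn with replacement (hypothetical helper)."""
    sample = [rng.choice(points) for _ in range(sample_size)]
    return centroid(sample)


# Synthetic data (an assumption for illustration): 2000 points in 3 dimensions.
rng = random.Random(0)
points = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(2000)]

# For k = 1 the exact centroid minimizes the cost.
optimal = kmeans_cost(points, [centroid(points)])

# Cost when the center is estimated from only 50 sampled points.
approx = kmeans_cost(points, [sampled_centroid(points, 50, rng)])

ratio = approx / optimal  # at least 1; close to 1 when the sample is representative
```

Computing the sampled centroid touches only the sampled points, while evaluating the cost takes one pass over all n points in d dimensions, which matches the O(nd) flavor of the bound quoted in the abstract for fixed k and ε.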

Keywords

Simplicity, Algorithm, Simple (philosophy), Cluster analysis, Approximation algorithm, Time complexity, Computer science, Simple random sample, Sampling (signal processing), SIMPLE algorithm, Mathematics, Feature (linguistics), Artificial intelligence, Physics

Publication Info

Year: 2004
Type: article
Pages: 454-462
Citations: 241
Access: Closed

Cite This

APA Style

Amit Kumar, Yogish Sabharwal, Subhankar Sen (2004). A Simple Linear Time (1+ε)-Approximation Algorithm for k-Means Clustering in Any Dimensions. 454-462. https://doi.org/10.1109/focs.2004.7

Identifiers

DOI: 10.1109/focs.2004.7