Abstract
Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with or even much larger than n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if p(n)/n → 0. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if p(n) ≫ n.
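The variance-thresholding idea described in the abstract can be sketched as follows: compute the sample variance of each coordinate, keep the coordinates with the largest variances, and run ordinary PCA only on that subset. This is a minimal illustration of the selection step, not the paper's full procedure; the function name `subset_pca` and the parameter `k` (number of retained coordinates) are labels chosen here for exposition.

```python
import numpy as np

def subset_pca(X, k):
    """Sketch: select the k coordinates of X with largest sample variance,
    then compute the leading principal component on that subset only.

    X : (n, p) data matrix; k : number of coordinates to retain (illustrative).
    Returns the leading eigenvector embedded back in p dimensions, plus the
    indices of the selected coordinates.
    """
    variances = X.var(axis=0, ddof=1)            # per-coordinate sample variances
    idx = np.argsort(variances)[-k:]             # indices of the k largest variances
    Xs = X[:, idx] - X[:, idx].mean(axis=0)      # center the selected columns
    cov = Xs.T @ Xs / (Xs.shape[0] - 1)          # k-by-k sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    v = eigvecs[:, -1]                           # leading eigenvector of the subset
    v_full = np.zeros(X.shape[1])                # embed into the full p-dim space,
    v_full[idx] = v                              # zero outside the selected subset
    return v_full, idx
```

Because the eigendecomposition is performed on a k-by-k matrix rather than a p-by-p one, the selection step makes the PCA computation feasible even when p greatly exceeds n.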
Publication Info
- Year: 2009
- Type: Article
- Volume: 104
- Issue: 486
- Pages: 682-693
Identifiers
- DOI: 10.1198/jasa.2009.0121