Abstract
We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of $L_1$-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an $L_1$-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
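The abstract does not spell out the algorithm, so the following is only a minimal sketch of how an $L_1$-penalized rank-$K$ decomposition of this kind could be computed. It assumes the penalties take the form of $L_1$-norm bounds (`c_u`, `c_v`) on unit-length vectors, that each factor is fit by alternating soft-thresholding updates with the threshold found by bisection, and that later factors are obtained by deflating the residual. The function names (`soft_threshold`, `l1_constrained_unit_vector`, `pmd_rank1`, `pmd`) and all parameters are hypothetical and are not taken from the paper or from any package.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_constrained_unit_vector(a, c, n_iter=50):
    """Unit-L2 vector proportional to soft_threshold(a, delta), with delta >= 0
    chosen by bisection so that the L1 norm is at most c (assumes c >= 1)."""
    norm = np.linalg.norm(a)
    if norm == 0:
        return a
    w = a / norm
    if np.abs(w).sum() <= c:
        return w  # constraint inactive, no thresholding needed
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        s_norm = np.linalg.norm(s)
        if s_norm == 0 or np.abs(s).sum() / s_norm <= c:
            hi = mid  # feasible: try a smaller threshold
        else:
            lo = mid  # infeasible: threshold must grow
    s = soft_threshold(a, hi)
    s_norm = np.linalg.norm(s)
    if s_norm == 0:
        # Ties at the maximum: fall back to a single coordinate.
        s = np.zeros_like(a, dtype=float)
        i = np.argmax(np.abs(a))
        s[i] = np.sign(a[i])
        return s
    return s / s_norm


def pmd_rank1(X, c_u, c_v, n_iter=100, tol=1e-8):
    """One penalized factor (d, u, v) via alternating updates of u and v."""
    # Start from the leading right singular vector of X.
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = l1_constrained_unit_vector(X @ v, c_u)
        v_new = l1_constrained_unit_vector(X.T @ u, c_v)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    d = u @ X @ v
    return d, u, v


def pmd(X, K, c_u, c_v):
    """Rank-K sketch: fit one factor, subtract it, repeat on the residual."""
    R = np.array(X, dtype=float)
    factors = []
    for _ in range(K):
        d, u, v = pmd_rank1(R, c_u, c_v)
        factors.append((d, u, v))
        R = R - d * np.outer(u, v)
    return factors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 20))
    # Setting c_u = sqrt(n) leaves u effectively unpenalized, which mirrors
    # the sparse principal components setting described in the abstract.
    for d, u, v in pmd(X, K=2, c_u=np.sqrt(30), c_v=2.0):
        print(round(float(d), 3), int(np.count_nonzero(v)))
```

In this sketch, setting both bounds to the square root of the corresponding dimension makes the constraints inactive, so the first factor reduces to the leading singular triplet of $X$; smaller values of `c_v` produce sparser `v`.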
Related Publications
Cross-Validatory Choice of the Number of Components From a Principal Component Analysis
A method is described for choosing the number of components to retain in a principal component analysis when the aim is dimensionality reduction. The correspondence between prin...
Generalized Collinearity Diagnostics
Working in the context of the linear model y = Xβ + ε, we generalize the concept of variance inflation as a measure of collinearity to a subset of parameters in β (deno...
Sparsity and incoherence in compressive sampling
We consider the problem of reconstructing a sparse signal $x^0 \in \mathbb{R}^n$ from a limited number of linear measurements. Given m randomly selected samples of $Ux^0$, where U is an...
Principal component analysis
Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter‐correlated quantitative d...
Tensor Decompositions and Applications
This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or N-way array. Decompositions...
Publication Info
- Year: 2009
- Type: article
- Volume: 10
- Issue: 3
- Pages: 515-534
- Citations: 1563
- Access: Closed
Identifiers
- DOI: 10.1093/biostatistics/kxp008