Abstract

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared Frobenius norm of $X - \hat{X}$, subject to penalties on $u_k$ and $v_k$. This results in a regularized version of the singular value decomposition. Of particular interest is the use of $L_1$-penalties on $u_k$ and $v_k$, which yields a decomposition of $X$ using sparse vectors. We show that when the PMD is applied using an $L_1$-penalty on $v_k$ but not on $u_k$, a method for sparse principal components results. In fact, this yields an efficient algorithm for the "SCoTLASS" proposal (Jolliffe and others 2003) for obtaining sparse principal components. This method is demonstrated on a publicly available gene expression data set. We also establish connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006). In addition, we show that when the PMD is applied to a cross-products matrix, it results in a method for penalized canonical correlation analysis (CCA). We apply this penalized CCA method to simulated data and to a genomic data set consisting of gene expression and DNA copy number measurements on the same set of samples.
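To make the decomposition above concrete, here is a minimal NumPy sketch of one way a rank-1 PMD with $L_1$ bounds could be computed. The abstract does not spell out the algorithm, so everything here is an assumption for illustration: the alternating soft-thresholded updates, the bisection used to meet the $L_1$ bound, the bounds `c1` and `c2` (each assumed to be at least 1), and all function names (`soft_threshold`, `l1_bounded_unit_vector`, `pmd_rank1`, `pmd`) are hypothetical, not the authors' implementation.

```python
import numpy as np


def soft_threshold(a, delta):
    """Elementwise soft-thresholding: sign(a) * max(|a| - delta, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def l1_bounded_unit_vector(a, c, n_bisect=50):
    """Unit L2-norm vector proportional to soft_threshold(a, delta), with
    delta >= 0 found by bisection so that the L1 norm is at most c (c >= 1)."""
    norm = np.linalg.norm(a)
    if norm == 0:
        return a
    w = a / norm
    if np.abs(w).sum() <= c:
        return w  # bound already satisfied, no thresholding needed
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        s_norm = np.linalg.norm(s)
        if s_norm == 0 or np.abs(s).sum() / s_norm <= c:
            hi = mid  # feasible: try a smaller threshold
        else:
            lo = mid  # infeasible: threshold harder
    s = soft_threshold(a, hi)
    s_norm = np.linalg.norm(s)
    if s_norm == 0:
        # Degenerate case (ties at the maximum): keep one largest entry.
        s = np.zeros_like(a)
        j = np.argmax(np.abs(a))
        s[j] = np.sign(a[j])
        return s
    return s / s_norm


def pmd_rank1(X, c1, c2, n_iter=100, tol=1e-8):
    """Assumed alternating scheme: update u with v fixed, then v with u fixed."""
    v = np.linalg.svd(X, full_matrices=False)[2][0]  # leading right singular vector
    for _ in range(n_iter):
        u = l1_bounded_unit_vector(X @ v, c1)
        v_new = l1_bounded_unit_vector(X.T @ u, c2)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    d = u @ X @ v
    return d, u, v


def pmd(X, K, c1, c2):
    """Rank-K sketch: extract factors one at a time, deflating the residual."""
    R = np.array(X, dtype=float)
    factors = []
    for _ in range(K):
        d, u, v = pmd_rank1(R, c1, c2)
        factors.append((d, u, v))
        R = R - d * np.outer(u, v)
    return factors


# Toy data (made up for illustration): a rank-1 signal with a sparse v, plus noise.
rng = np.random.default_rng(0)
u0 = np.arange(1.0, 7.0)
v0 = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
X = np.outer(u0, v0) + 0.01 * rng.standard_normal((6, 8))

# c1 = sqrt(n) never binds for a unit vector of length n, so only v is penalized.
d, u, v = pmd_rank1(X, c1=np.sqrt(6), c2=1.3)
```

Choosing `c1` equal to the square root of the number of rows leaves `u` effectively unpenalized, which corresponds to the case the abstract describes as leading to sparse principal components. Under the same assumptions, passing a cross-products matrix such as `A.T @ B` for two data matrices `A` and `B` measured on the same samples would give vectors in the spirit of the penalized CCA mentioned above.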

Keywords

Singular value decomposition; Principal component analysis; Sparse PCA; Mathematics; Matrix norm; Canonical correlation; Robust principal component analysis; Matrix (chemical analysis); Low-rank approximation; Sparse matrix; Sparse approximation; Set (abstract data type); Algorithm; Matrix decomposition; Combinatorics; Applied mathematics; Computer science; Statistics; Physics; Eigenvalues and eigenvectors; Chemistry; Mathematical analysis; Hankel matrix; Computational chemistry

Publication Info

Year: 2009
Type: article
Volume: 10
Issue: 3
Pages: 515-534
Citations: 1563
Access: Closed

Cite This

Daniela Witten, Robert Tibshirani, Trevor Hastie (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515-534. https://doi.org/10.1093/biostatistics/kxp008

Identifiers

DOI: 10.1093/biostatistics/kxp008