Abstract

In many multivariate statistical techniques, a set of linear functions of the original p variables is produced. One of the more difficult aspects of these techniques is interpreting the linear functions, because these functions usually have nonzero coefficients on all p variables. A common approach is to effectively ignore (treat as zero) any coefficients whose absolute values fall below some threshold, so that the function becomes simpler and easier for users to interpret. Such a procedure can be misleading. There are alternatives to principal component analysis that restrict the coefficients to a smaller number of possible values in the derivation of the linear functions, or that replace the principal components by "principal variables." This article introduces a new technique, borrowing an idea proposed by Tibshirani in the context of multiple regression, where similar problems arise in interpreting regression equations. That idea is the so-called LASSO, the "least absolute shrinkage and selection operator," in which a bound is placed on the sum of the absolute values of the coefficients and, as a consequence, some coefficients become exactly zero. We explore some of the properties of the new technique, both theoretically and through simulation studies, and apply it to an example.
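As a rough illustration of the idea, the first linear function can be read as: maximize the variance a'Sa of a linear function with coefficient vector a, subject to a'a = 1 and |a_1| + ... + |a_p| <= t, where S is the covariance (or correlation) matrix. Since any unit vector satisfies 1 <= |a_1| + ... + |a_p| <= sqrt(p), choosing t between 1 and sqrt(p) moves between a single nonzero coefficient and ordinary principal component analysis. The Python sketch below finds a local solution for the first component only; later components, which carry extra constraints, are not handled. It is not the algorithm from the article: it swaps in a simple soft-thresholding iteration of the kind used by Witten, Tibshirani and Hastie (2009) for sparse principal components, and the function names `soft_threshold`, `l1_constrained_direction` and `sparse_first_component` are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink every entry of x toward zero by lam; entries with |x_j| <= lam become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def l1_constrained_direction(g, t, n_bisect=50):
    """Maximize g @ w over unit vectors w with sum(|w_j|) <= t (assumes t >= 1).

    The maximizer is a normalized soft-thresholded copy of g; the threshold
    lam is found by bisection, since the L1 norm of the normalized vector
    shrinks as lam grows.
    """
    w = g / np.linalg.norm(g)
    if np.abs(w).sum() <= t:  # L1 bound inactive: no shrinkage needed
        return w
    lo, hi = 0.0, np.abs(g).max()
    best = None
    for _ in range(n_bisect):
        lam = 0.5 * (lo + hi)
        s = soft_threshold(g, lam)  # lam < max|g|, so s is never all zeros
        w = s / np.linalg.norm(s)
        if np.abs(w).sum() > t:
            lo = lam  # still too dense: shrink harder
        else:
            hi, best = lam, w  # feasible: try a smaller threshold
    return best


def sparse_first_component(S, t, n_iter=500, tol=1e-10):
    """Local solution of: maximize a'Sa subject to a'a = 1 and sum(|a_j|) <= t.

    S must be symmetric positive semidefinite (e.g. a covariance matrix).
    Each step maximizes the linearization of a'Sa over the constraint set,
    so the objective never decreases once the iterate is feasible.
    """
    p = S.shape[0]
    if not 1.0 <= t <= np.sqrt(p):
        raise ValueError("t must lie in [1, sqrt(p)]")
    # Start from the ordinary first principal component (largest eigenvalue).
    _, eigvecs = np.linalg.eigh(S)
    a = l1_constrained_direction(S @ eigvecs[:, -1], t)
    for _ in range(n_iter):
        a_new = l1_constrained_direction(S @ a, t)
        if np.linalg.norm(a_new - a) < tol:
            return a_new
        a = a_new
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    X[:, 1] += X[:, 0]  # make the first two variables correlated
    S = np.cov(X, rowvar=False)
    for t in (1.0, 1.5, np.sqrt(6)):
        a = sparse_first_component(S, t)
        print(f"t={t:.2f}  loadings={np.round(a, 3)}  variance={a @ S @ a:.3f}")
```

Lowering t toward 1 pushes more coefficients to exactly zero, trading explained variance for simpler interpretation. Because the constrained problem is nonconvex, the iteration can stop at a local solution that depends on the starting vector.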

Keywords

Lasso, Principal component analysis, Mathematics, Applied mathematics, Functional principal component analysis, Linear regression, Principal component regression, Interpretation, Multivariate statistics, Mathematical optimization, Statistics, Computer science

Publication Info

Year: 2003
Type: article
Volume: 12
Issue: 3
Pages: 531-547
Citations: 796
Access: Closed

Cite This

Ian T. Jolliffe, Nickolay T. Trendafilov, Mudassir Uddin (2003). A Modified Principal Component Technique Based on the LASSO. Journal of Computational and Graphical Statistics, 12(3), 531-547. https://doi.org/10.1198/1061860032148

Identifiers

DOI
10.1198/1061860032148