Linear Model Selection by Cross-Validation

Jun Shao

doi:10.2307/2290328

Abstract

We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the Akaike information criterion (AIC), the C_p, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations n → ∞. We show that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-n_v-out cross-validation with n_v, the number of observations reserved for validation, satisfying n_v/n → 1 as n → ∞. This is a somewhat shocking discovery, because n_v/n → 1 is totally opposite to the popular leave-one-out recipe in cross-validation. Motivations, justifications, and discussions of some practical aspects of the use of the leave-n_v-out cross-validation method are provided, and results from a simulation study are presented.
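The leave-n_v-out idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's own code: it assumes a Monte Carlo variant that averages over random splits instead of enumerating all of them, an exhaustive search over column subsets, and hypothetical function names (`holdout_error`, `select_model`). The size of the validation set, n_v, is left to the caller.

```python
import itertools
import numpy as np

def holdout_error(X, y, cols, splits, n_v):
    """Mean squared prediction error of the least-squares fit on columns
    `cols`, averaged over splits that each hold out n_v observations."""
    total = 0.0
    for perm in splits:
        val, fit = perm[:n_v], perm[n_v:]  # first n_v indices are held out
        beta, *_ = np.linalg.lstsq(X[np.ix_(fit, cols)], y[fit], rcond=None)
        resid = y[val] - X[np.ix_(val, cols)] @ beta
        total += np.mean(resid ** 2)
    return total / len(splits)

def select_model(X, y, n_v, n_splits=200, seed=0):
    """Return the column subset with the smallest estimated prediction
    error, together with the scores of all candidate subsets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Draw the random splits once so every candidate sees the same ones.
    splits = [rng.permutation(n) for _ in range(n_splits)]
    scores = {}
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            scores[cols] = holdout_error(X, y, list(cols), splits, n_v)
    best = min(scores, key=scores.get)
    return best, scores
```

As one illustrative choice (an assumption, not taken from the abstract), fitting on roughly n^(3/4) observations and validating on the rest gives n_v/n → 1: for n = 200 that means `n_v = 200 - int(200 ** 0.75)`, passed as `select_model(X, y, n_v)`. Setting `n_v = 1` with all n splits would correspond to the leave-one-out recipe that the abstract describes as inconsistent.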

Keywords

Selection (genetic algorithm), Cross-validation, Model selection, Linear model, Statistics, Computer science, Mathematics, Econometrics, Artificial intelligence

Affiliated Institutions

University of Ottawa CA

Related Publications

A test of significance for partial least squares regression

Ian Wakeling, Jeff Morris

Partial least squares (PLS) regression is a commonly used statistical technique for performing multivariate calibration, especially in situations where there are more v...

1993 Journal of Chemometrics 124 citations

Selection bias in gene extraction on the basis of microarray gene-expression data

Christophe Ambroise, Geoffrey J. McLachlan

In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue s...

2002 Proceedings of the National Academy o... 1436 citations

Heuristics of instability and stabilization in model selection

Leo Breiman

In model selection, usually a "best" predictor is chosen from a collection $\{\hat{\mu}(\cdot, s)\}$ of predictors where $\hat{\mu}(\cdot, s)$ is the minimum least-squares p...

1996 The Annals of Statistics 1141 citations

brms: An R Package for Bayesian Multilevel Models Using Stan

Paul‐Christian Bürkner

The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan. A wide range of distributions and link functions are supported, al...

2017 Journal of Statistical Software 8224 citations

Model Determination using Predictive Distributions with Implementation via Sampling-Based Methods

Alan E. Gelfand, Dipak K. Dey, Haibin Chang

Model determination is divided into the issues of model adequacy and model selection. Predictive distributions are used to address both issues. This seems natural since...

1992 660 citations

Publication Info

Year: 1993
Type: article
Volume: 88
Issue: 422
Pages: 486-494
Citations: 324
Access: Closed

Cite This

APA Style

Shao, J. (1993). Linear Model Selection by Cross-Validation. Journal of the American Statistical Association, 88(422), 486-494. https://doi.org/10.2307/2290328

Identifiers

DOI: 10.2307/2290328