Abstract
We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the Akaike information criterion (AIC), the $C_p$, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations $n \to \infty$. We show that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-$n_v$-out cross-validation with $n_v$, the number of observations reserved for validation, satisfying $n_v/n \to 1$ as $n \to \infty$. This is a somewhat shocking discovery, because $n_v/n \to 1$ is totally opposite to the popular leave-one-out recipe in cross-validation. Motivations, justifications, and discussions of some practical aspects of the use of the leave-$n_v$-out cross-validation method are provided, and results from a simulation study are presented.
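To make the leave-$n_v$-out idea concrete, the following is a minimal sketch, not the paper's own procedure. It assumes a Monte Carlo variant in which random splits are drawn instead of enumerating all subsets, and it assumes a construction-set size of roughly $n^{3/4}$ so that $n_v/n \to 1$; the function names (`solve_least_squares`, `mccv_select`, `simulate_data`) and the simulated data are hypothetical and introduced only for illustration.

```python
import random

def solve_least_squares(X, y):
    """Solve the normal equations (X'X) beta = X'y by Gaussian elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = s / A[i][i]
    return beta

def mccv_select(X, y, candidates, n_v, n_splits=50, seed=0):
    """Pick the candidate (a tuple of column indices) with the smallest
    average squared prediction error over random leave-n_v-out splits."""
    rng = random.Random(seed)
    n = len(y)
    scores = {cols: 0.0 for cols in candidates}
    for _ in range(n_splits):
        val = set(rng.sample(range(n), n_v))           # validation indices
        train = [i for i in range(n) if i not in val]  # construction indices
        for cols in candidates:
            X_train = [[X[i][j] for j in cols] for i in train]
            beta = solve_least_squares(X_train, [y[i] for i in train])
            err = sum((y[i] - sum(b * X[i][j] for b, j in zip(beta, cols))) ** 2
                      for i in val)
            scores[cols] += err / (n_v * n_splits)
    return min(scores, key=scores.get), scores

def simulate_data(n, seed=0):
    """Hypothetical data: intercept plus three Gaussian predictors,
    of which only the first one enters the true model."""
    rng = random.Random(seed)
    X = [[1.0] + [rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1.0 + 2.0 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y

if __name__ == "__main__":
    n = 200
    X, y = simulate_data(n)
    n_v = n - round(n ** 0.75)  # assumed rule: construction size about n^(3/4)
    candidates = [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3)]
    best, scores = mccv_select(X, y, candidates, n_v)
    print("selected columns:", best)
```

Setting `n_v = 1` and enumerating every observation in turn would recover leave-one-out cross-validation; the sketch instead reserves most of the data for validation, which is the regime the abstract argues is needed for consistent selection.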
Related Publications
A test of significance for partial least squares regression
Partial least squares (PLS) regression is a commonly used statistical technique for performing multivariate calibration, especially in situations where there are more v...
Selection bias in gene extraction on the basis of microarray gene-expression data
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue s...
Heuristics of instability and stabilization in model selection
In model selection, usually a "best" predictor is chosen from a collection $\{\hat{\mu}(\cdot, s)\}$ of predictors where $\hat{\mu}(\cdot, s)$ is the minimum least-squares p...
brms: An R Package for Bayesian Multilevel Models Using Stan
The brms package implements Bayesian multilevel models in R using the probabilistic programming language Stan. A wide range of distributions and link functions are supported, al...
Model Determination using Predictive Distributions with Implementation via Sampling-Based Methods
Model determination is divided into the issues of model adequacy and model selection. Predictive distributions are used to address both issues. This seems natural since...
Publication Info
- Year: 1993
- Type: article
- Volume: 88
- Issue: 422
- Pages: 486-486
- Citations: 324
- Access: Closed
Identifiers
- DOI: 10.2307/2290328