Abstract
A regression model is fitted to an observed set of data. How accurate is the model for predicting future observations? The apparent error rate tends to underestimate the true error rate because the data have been used twice, both to fit the model and to check its accuracy. We provide simple estimates for the downward bias of the apparent error rate. The theory applies to general exponential family linear models and general measures of prediction error. Special attention is given to the case of logistic regression on binary data, with error rates measured by the proportion of misclassified cases. Several connected ideas are compared: Mallows's Cp, cross-validation, generalized cross-validation, the bootstrap, and Akaike's information criterion.
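The downward bias described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's estimator: it assumes the simplest special case, a Gaussian linear model with a fixed design and squared-error loss, where the expected gap between true and apparent error is known to be 2pσ²/n (the correction underlying Mallows's Cp). The function name `optimism_simulation` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def optimism_simulation(n=30, p=5, sigma=1.0, n_reps=2000, seed=0):
    """Compare apparent and true in-sample prediction error for OLS.

    Assumed setting (illustrative only): fixed design X, Gaussian noise,
    squared-error loss. Returns the average apparent error, the average
    true error at the same design points, and the theoretical gap
    2 * p * sigma**2 / n.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))   # design matrix, held fixed across replications
    beta = rng.normal(size=p)     # hypothetical true coefficients
    mu = X @ beta                 # true mean response

    apparent = np.empty(n_reps)
    true_err = np.empty(n_reps)
    for r in range(n_reps):
        # Training responses: used both to fit and to check the model.
        y = mu + sigma * rng.normal(size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        fitted = X @ beta_hat
        apparent[r] = np.mean((y - fitted) ** 2)

        # Fresh responses at the same design points: never seen by the fit.
        y_new = mu + sigma * rng.normal(size=n)
        true_err[r] = np.mean((y_new - fitted) ** 2)

    return apparent.mean(), true_err.mean(), 2 * p * sigma**2 / n


if __name__ == "__main__":
    app, true, gap = optimism_simulation()
    print(f"apparent error: {app:.3f}")
    print(f"true error:     {true:.3f}")
    print(f"observed gap:   {true - app:.3f}  (theory: {gap:.3f})")
```

With the default settings the theoretical gap is 2 · 5 · 1 / 30 ≈ 0.33, and the simulated gap should land close to it. For logistic regression with misclassification loss, the case the abstract emphasizes, no such closed form applies in general, which is why the paper turns to estimates of the bias.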
Related Publications
Model Selection and Akaike's Information Criterion (AIC): The General Theory and its Analytical Extensions
During the last fifteen years, Akaike's entropy-based Information Criterion (AIC) has had a fundamental impact in statistical model evaluation problems. This paper studies the g...
Partial least squares regression and projection on latent structure regression (PLS Regression)
Partial least squares (PLS) regression (a.k.a. projection on latent structures) is a recent technique that combines features from and generalizes principal component a...
Further analysis of the data by Akaike's information criterion and the finite corrections
Using Akaike's information criterion, three examples of statistical data are reanalyzed and show reasonably definite conclusions. One is concerned with the multiple comparison p...
Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
Acoustic models used in hidden Markov model/neural-network (HMM/NN) speech recognition systems are usually trained with a frame-based cross-entropy error criterion. In contrast,...
Cross-Validatory Choice and Assessment of Statistical Predictions
A generalized form of the cross-validation criterion is applied to the choice and assessment of prediction using the data-analytic concept of a prescription. The example...
Publication Info
- Year: 1986
- Type: article
- Volume: 81
- Issue: 394
- Pages: 461-461
- Citations: 419
- Access: Closed
Identifiers
- DOI: 10.2307/2289236