How Biased is the Apparent Error Rate of a Prediction Rule?

1986, Journal of the American Statistical Association, 419 citations

Abstract

A regression model is fitted to an observed set of data. How accurate is the model for predicting future observations? The apparent error rate tends to underestimate the true error rate because the data have been used twice, both to fit the model and to check its accuracy. We provide simple estimates for the downward bias of the apparent error rate. The theory applies to general exponential family linear models and general measures of prediction error. Special attention is given to the case of logistic regression on binary data, with error rates measured by the proportion of misclassified cases. Several connected ideas are compared: Mallows's Cp, cross-validation, generalized cross-validation, the bootstrap, and Akaike's information criterion.
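
The downward bias described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the simplest setting, ordinary least squares with squared-error loss and a fixed design, rather than the logistic regression case the paper emphasizes. In that setting the expected gap between true and apparent error is 2pσ²/n, the correction that underlies Mallows's Cp. The function name `simulate_optimism` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np


def simulate_optimism(n=50, p=5, sigma=1.0, n_rep=2000, seed=0):
    """Compare the simulated bias of the apparent error with 2*p*sigma^2/n.

    Assumed setting (not from the paper): OLS, squared error, fixed design,
    Gaussian noise. Returns (simulated mean gap, theoretical gap).
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))   # design matrix, held fixed across replications
    beta = rng.normal(size=p)     # hypothetical true coefficients
    mu = X @ beta                 # true mean response at each design point

    gaps = np.empty(n_rep)
    for r in range(n_rep):
        # Training responses: used both to fit and to compute the apparent error.
        y = mu + sigma * rng.normal(size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        fitted = X @ beta_hat
        apparent = np.mean((y - fitted) ** 2)

        # Fresh responses at the same design points: estimate of the true error.
        y_new = mu + sigma * rng.normal(size=n)
        true_err = np.mean((y_new - fitted) ** 2)

        gaps[r] = true_err - apparent

    return gaps.mean(), 2 * p * sigma**2 / n


if __name__ == "__main__":
    simulated, theoretical = simulate_optimism()
    print(f"simulated bias of apparent error: {simulated:.3f}")
    print(f"2 * p * sigma^2 / n:              {theoretical:.3f}")
```

The averaged gap is positive because the fitted values have already adapted to the training noise, which is the "data used twice" effect the abstract refers to. Cross-validation and the bootstrap estimate the same gap without requiring a known σ².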

Keywords

Akaike information criterion, Statistics, Mathematics, Logistic regression, Information Criteria, Mean squared prediction error, Regression, Word error rate, Data set, Set (abstract data type), Observational error, Econometrics, Model selection, Computer science, Artificial intelligence

Publication Info

Year: 1986
Type: article
Volume: 81
Issue: 394
Pages: 461-461
Citations: 419
Access: Closed

Citation Metrics

419 (OpenAlex)

Cite This

Bradley Efron (1986). How Biased is the Apparent Error Rate of a Prediction Rule? Journal of the American Statistical Association, 81(394), 461-461. https://doi.org/10.2307/2289236

Identifiers

DOI: 10.2307/2289236