Abstract

We performed a Monte Carlo study to evaluate the effect of the number of events per variable (EPV) analyzed in logistic regression analysis. The simulations were based on data from a cardiac trial of 673 patients in which 252 deaths occurred and seven variables were cogent predictors of mortality; the number of events per predictive variable was (252/7 =) 36 for the full sample. For the simulations, at values of EPV = 2, 5, 10, 15, 20, and 25, we randomly generated 500 samples of the 673 patients, chosen with replacement, according to a logistic model derived from the full sample. Simulation results for the regression coefficients for each variable in each group of 500 samples were compared for bias, precision, and significance testing against the results of the model fitted to the original sample. For EPV values of 10 or greater, no major problems occurred. For EPV values less than 10, however, the regression coefficients were biased in both positive and negative directions; the large sample variance estimates from the logistic model both overestimated and underestimated the sample variance of the regression coefficients; the 90% confidence limits about the estimated values did not have proper coverage; the Wald statistic was conservative under the null hypothesis; and paradoxical associations (significance in the wrong direction) were increased. Although other factors (such as the total number of events, or sample size) may influence the validity of the logistic model, our findings indicate that low EPV can lead to major problems.
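
The EPV arithmetic and the resample-then-simulate step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the covariate values, the intercept, and the coefficients are hypothetical placeholders, and the helper names `events_per_variable` and `simulate_outcomes` are assumptions made for the example. Only the figures 673, 252, and 7 and the list of EPV levels come from the abstract. The abstract does not say how each target EPV was reached, so the sketch does not attempt to reproduce that step.

```python
import math
import random


def events_per_variable(n_events, n_variables):
    """EPV = number of outcome events / number of predictor variables."""
    return n_events / n_variables


def simulate_outcomes(rows, intercept, coefs, rng):
    """Draw one 0/1 outcome per row from a logistic model."""
    outcomes = []
    for x in rows:
        linear = intercept + sum(b * v for b, v in zip(coefs, x))
        p = 1.0 / (1.0 + math.exp(-linear))  # inverse logit
        outcomes.append(1 if rng.random() < p else 0)
    return outcomes


# Full-sample figures quoted in the abstract: 252 deaths, 7 predictors.
full_epv = events_per_variable(252, 7)

# Number of events that corresponds to each studied EPV with 7 predictors.
epv_levels = [2, 5, 10, 15, 20, 25]
events_needed = {epv: epv * 7 for epv in epv_levels}

# Hypothetical data: 673 patients with 7 standard-normal covariates.
rng = random.Random(0)
patients = [[rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(673)]

# One simulated sample: resample patients with replacement, then generate
# outcomes from an assumed logistic model (placeholder coefficients).
resample = [rng.choice(patients) for _ in range(673)]
outcomes = simulate_outcomes(resample, -0.5, [0.3] * 7, rng)
sample_epv = events_per_variable(sum(outcomes), 7)

print(full_epv, events_needed, sample_epv)
```

In a full study this step would be repeated 500 times per EPV level, with a logistic model refitted to each simulated sample so that the coefficient estimates can be compared with the full-sample fit.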

Keywords

Logistic regression, Statistics, Mathematics, Statistic, Regression analysis, Sample size determination, Variance (accounting), Variables, Sample (material), Confidence interval, Wald test, Monte Carlo method, Variable (mathematics), Econometrics, Statistical hypothesis testing

Publication Info

Year: 1996
Type: article
Volume: 49
Issue: 12
Pages: 1373-1379
Citations: 8241
Access: Closed

Cite This

Peter Peduzzi, John Concato, Elizabeth Kemper et al. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 49(12), 1373-1379. https://doi.org/10.1016/s0895-4356(96)00236-3

Identifiers

DOI: 10.1016/s0895-4356(96)00236-3