Abstract

The objective of this study was to compare performance of logistic regression (LR) with machine learning (ML) for clinical prediction modeling in the literature. We conducted a Medline literature search (1/2016 to 8/2017) and extracted comparisons between LR and ML models for binary outcomes. We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), with 19 predictors considered (range 5-563) and eight events per predictor (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20-0.47) higher for ML. We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.
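The abstract reports differences on the logit(AUC) scale, which are harder to read than raw AUC values. The sketch below is illustrative only: it shows how a logit(AUC) difference between two models could be computed and what a given difference implies on the AUC scale. The function names and the example AUC of 0.75 are assumptions for illustration, not taken from the review, and the code does not reproduce how the review pooled its comparisons.

```python
import math


def logit(p: float) -> float:
    """Log-odds transform; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps a log-odds value back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_auc_difference(auc_ml: float, auc_lr: float) -> float:
    """Difference in logit(AUC), ML minus LR (hypothetical helper)."""
    return logit(auc_ml) - logit(auc_lr)


def shifted_auc(auc_lr: float, delta: float) -> float:
    """AUC implied by adding `delta` on the logit scale to a baseline AUC."""
    return inv_logit(logit(auc_lr) + delta)


if __name__ == "__main__":
    # Assumed baseline AUC of 0.75 for LR; 0.34 is the high-risk-of-bias
    # difference quoted in the abstract.
    print(round(shifted_auc(0.75, 0.34), 3))
```

Under the assumed baseline of 0.75, a logit(AUC) difference of 0.34 corresponds to an AUC of roughly 0.81, while a difference of 0.00 leaves the AUC unchanged.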

Keywords

Logistic regression; Machine learning; Regression; Statistics; Artificial intelligence; Computer science; Regression analysis; Logistic model tree; Medicine; Mathematics

MeSH Terms

Algorithms; Area Under Curve; Humans; Logistic Models; Models, Theoretical; Outcome Assessment, Health Care; Predictive Value of Tests; Sensitivity and Specificity; Supervised Machine Learning

Publication Info

Year: 2019
Type: review
Volume: 110
Pages: 12-22
Citations: 1674
Access: Closed

Citation Metrics

OpenAlex: 1674
Influential: 39
CrossRef: 1455

Cite This

Evangelia Christodoulou, Jie Ma, Gary S. Collins et al. (2019). A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Journal of Clinical Epidemiology, 110, 12-22. https://doi.org/10.1016/j.jclinepi.2019.02.004

Identifiers

DOI: 10.1016/j.jclinepi.2019.02.004
PMID: 30763612
