Abstract

Missing data frequently complicates data analysis for scientific investigations. The development of statistical methods to address missing data has been an active area of research in recent decades. Multiple imputation, originally proposed by Rubin in a public use dataset setting, is a general purpose method for analyzing datasets with missing data that is broadly applicable to a variety of missing data settings. We review multiple imputation as an analytic strategy for missing data. We describe and evaluate a number of software packages that implement this procedure, and contrast the interface, features, and results. We compare the packages, and detail shortcomings and useful features. The comparisons are illustrated using examples from an artificial dataset and a study of child psychopathology. We suggest additional features as well as discuss limitations and cautions to consider when using multiple imputation as an analytic strategy for incomplete data settings.
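
As a rough illustration of the combining step that multiple imputation relies on, the sketch below pools a point estimate and its variance across several imputed datasets using Rubin's combining rules. It is not taken from the paper or from any of the packages it reviews; the function name `pool_rubin` and the input numbers are hypothetical, and the per-dataset estimates are assumed to have been computed already.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool results from m imputed datasets with Rubin's combining rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Hypothetical estimates and variances from three imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The total variance combines the average within-imputation variance with the between-imputation variance inflated by (1 + 1/m), so uncertainty due to the missing values is carried into the pooled standard error.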

Keywords

Missing data; Imputation (statistics); Computer science; Data mining; Data science; Machine learning

Publication Info

Year: 2001
Type: article
Volume: 55
Issue: 3
Pages: 244-254
Citations: 597
Access: Closed

Cite This

Nicholas J. Horton, Stuart R. Lipsitz (2001). Multiple Imputation in Practice. The American Statistician, 55(3), 244-254. https://doi.org/10.1198/000313001317098266

Identifiers

DOI: 10.1198/000313001317098266