Multiple imputation of discrete and continuous data by fully conditional specification

Abstract

The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated the missing data. Two approaches for imputing multivariate data exist: joint modeling (JM) and fully conditional specification (FCS). JM is based on parametric statistical theory, and leads to imputation procedures whose statistical properties are known. JM is theoretically sound, but the joint model may lack flexibility needed to represent typical data features, potentially leading to bias. FCS is a semi-parametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable. FCS provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish. Simulation work shows that FCS behaves very well in the cases studied. The present paper reviews and compares the approaches. JM and FCS were applied to pubertal development data of 3801 Dutch girls that had missing data on menarche (two categories), breast development (five categories) and pubic hair development (six stages). Imputations for these data were created under two models: a multivariate normal model with rounding and a conditionally specified discrete model. The JM approach introduced biases in the reference curves, whereas FCS did not. The paper concludes that FCS is a useful and easily applied flexible alternative to JM when no convenient and realistic joint distribution can be specified.
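The FCS idea described above, one conditional model per incomplete variable, cycled over the variables in turn, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a linear regression for each continuous column and a logistic regression for each binary column, and it draws imputations from the fitted conditional model without also drawing the model parameters, so it understates the uncertainty that a proper imputation method would carry. The names `fcs_impute`, `fit_linear` and `fit_logistic` are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit; returns coefficients and residual standard deviation."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = max(len(y) - X.shape[1], 1)
    return beta, np.sqrt(resid @ resid / dof)


def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp().
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson, with a tiny ridge for stability."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        gradient = X.T @ (y - p) - ridge * beta
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


def fcs_impute(data, kinds, n_iter=10, rng=None):
    """Return one completed copy of `data` (np.nan marks missing values).

    kinds[j] is "continuous" or "binary" and selects the conditional
    model used for column j (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()

    # Starting values: random draws from the observed values of each column.
    for j in range(data.shape[1]):
        observed = data[~missing[:, j], j]
        filled[missing[:, j], j] = rng.choice(observed, size=missing[:, j].sum())

    # Cycle over the incomplete variables, imputing each one conditional
    # on the current values of all the others.
    for _ in range(n_iter):
        for j in range(data.shape[1]):
            miss = missing[:, j]
            if not miss.any():
                continue
            others = np.delete(filled, j, axis=1)
            X = np.column_stack([np.ones(len(others)), others])
            if kinds[j] == "continuous":
                beta, sigma = fit_linear(X[~miss], filled[~miss, j])
                noise = rng.normal(0.0, sigma, miss.sum())
                filled[miss, j] = X[miss] @ beta + noise
            else:
                beta = fit_logistic(X[~miss], filled[~miss, j])
                filled[miss, j] = rng.binomial(1, sigmoid(X[miss] @ beta))
    return filled
```

Calling `fcs_impute` m times with independent random generators would give m completed data sets, which are then analysed separately and pooled. Because the binary column is imputed from a discrete model, its imputed values are always valid categories, with no rounding step of the kind required under the multivariate normal model.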

Keywords

Missing data; Imputation (statistics); Computer science; Multivariate statistics; Data mining; Parametric statistics; Multivariate normal distribution; Joint probability distribution; Conditional probability distribution; Econometrics; Statistics; Mathematics; Machine learning

Affiliated Institutions

Utrecht University NL

Publication Info

Year: 2007
Type: article
Volume: 16
Issue: 3
Pages: 219-242
Citations: 2681
Access: Closed

Cite This

APA Style

Stef van Buuren (2007). Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16(3), 219-242. https://doi.org/10.1177/0962280206074463

Identifiers

DOI: 10.1177/0962280206074463
