Imputation Methods for Missing Categorical Questionnaire Data: A Comparison of Approaches

Finch, W. Holmes

doi:10.6339/JDS.2010.08(3).612

Journal of Data Science

Similar articles

Predictive Mean Matching Imputation Procedure Based on Machine Learning Models for Complex Survey Data

Sixia Chen Chao Xu

https://doi.org/10.6339/24-JDS1135

Pub. online: 10 Jul 2024 Type: Statistical Data Science

Open Access

Journal: Journal of Data Science Volume 22, Issue 3 (2024): Special issue: The Government Advances in Statistical Programming (GASP) 2023 conference, pp. 456–468

Abstract

Edition and Imputation of Multiple Time Series Data Generated by Repetitive Surveys

Victor M. Guerrero Blanca I. Gaspar

https://doi.org/10.6339/JDS.2010.08(4).623

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 8, Issue 4 (2010), pp. 555–577

Abstract

Parametric Fractional Imputation for Longitudinal Data with Intermittent Missing Values

Ahmed M. Gad Hanan E. G. Ahmed

https://doi.org/10.6339/JDS.201904_17(2).0005

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 17, Issue 2 (2019), pp. 331–348

Abstract

Missing Information as a Diagnostic Tool for Latent Class Analysis

Ofer Harel Diana Miglioretti

https://doi.org/10.6339/JDS.2007.05(2).333

Pub. online: 4 Aug 2022 Type: Research Article

Open Access

Journal: Journal of Data Science Volume 5, Issue 2 (2007), pp. 269–288

Abstract

Comparison of Methods for Imputing Social Network Data

Ziqian Xu Jiarui Hai Yutong Yang All authors (4)

https://doi.org/10.6339/22-JDS1045

Pub. online: 20 Apr 2022 Type: Data Science Reviews

Open Access

Journal: Journal of Data Science Volume 21, Issue 3 (2023): Special Issue: Advances in Network Data Science, pp. 599–618

Abstract

Social network data often contain missing values because of the sensitive nature of the information collected and the dependency among the network actors. As a response, network imputation methods including simple ones constructed from network structural characteristics and more complicated model-based ones have been developed. Although past studies have explored the influence of missing data on social networks and the effectiveness of imputation procedures in many missing data conditions, the current study aims to evaluate a more extensive set of eight network imputation techniques (i.e., null-tie, Reconstruction, Preferential Attachment, Constrained Random Dot Product Graph, Multiple Imputation by Bayesian Exponential Random Graph Models or BERGMs, k-Nearest Neighbors, Random Forest, and Multiple Imputation by Chained Equations) under more practical conditions through comprehensive simulation. A factorial design for missing data conditions is adopted with factors including missing data types, missing data mechanisms, and missing data proportions, which are applied to generated social networks with varying numbers of actors based on 4 different sets of coefficients in ERGMs. Results show that the effectiveness of imputation methods differs by missing data types, missing data mechanisms, the evaluation criteria used, and the complexity of the social networks. More complex methods such as the BERGMs have consistently good performances in recovering missing edges that should have been present. While simpler methods like Reconstruction work better in recovering network statistics when the missing proportion of present edges is low, the BERGMs work better when more present edges are missing. The BERGMs also work well in recovering ERGM coefficients when the networks are complex and the missing data type is actor non-response. In conclusion, researchers analyzing social networks with incomplete data should identify the network structures of interest and the potential missing data types before selecting appropriate imputation methods.

RSS

Export citation

Copy and paste formatted citation

Download citation in file

Authors