An extensive simulation study compares three nonparametric single imputation methods in the presence of multiple data patterns. The ultimate goal is to provide useful hints for users who need to quickly pick the most effective imputation method among the following: (1) Forward Imputation (ForImp), considered in two variants, namely ForImp with principal component analysis (PCA), which alternates PCA and the Nearest-Neighbour Imputation (NNI) method in a forward, sequential procedure, and ForImp with the Mahalanobis distance, which uses the Mahalanobis distance when performing NNI; (2) the iterative PCA technique, which imputes missing values simultaneously via PCA; and (3) the missForest method, which is based on random forests and was developed for mixed-type data. The performance of these methods is compared under several data patterns characterized by different levels of kurtosis or skewness and by different correlation structures.
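To make the idea of imputing missing values simultaneously via PCA more concrete, the following is a minimal sketch of one common form of iterative PCA imputation. It is not the authors' implementation: the function name `iterative_pca_impute`, its parameters, the mean-based starting values, and the stopping rule are all assumptions chosen for illustration.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=200, tol=1e-6):
    """Illustrative iterative PCA imputation (hypothetical helper, not from the paper).

    Missing entries (NaN) are initialised with column means, then repeatedly
    replaced by their values in a rank-`n_components` PCA reconstruction.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Assumed starting point: column means computed on the observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # PCA via SVD of the column-centred, currently completed matrix.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)

        # Low-rank reconstruction from the leading components.
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean

        # Only the missing cells are updated; observed cells are kept as-is.
        updated = np.where(missing, recon, X)
        change = np.max(np.abs(updated - filled)) if missing.any() else 0.0
        filled = updated
        if change < tol:
            break

    return filled
```

A call such as `iterative_pca_impute(X, n_components=1)` returns a completed copy of `X` in which all missing cells have been filled at once from the same low-rank fit, in contrast to a forward, sequential scheme that fills them in stages.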

Solaro, N., Barbiero, A., Manzi, G., Ferrari, P. (2018). A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 88(18), 3588-3619 [10.1080/00949655.2018.1530773].

A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns

Solaro, N. (first author); 2018

Journal article - Scientific article
Keywords: 62-04; 62-07; 62H25; 62H99; Forward imputation; iterative principal component analysis; Mahalanobis distance; missForest; missing data; Monte Carlo simulation; multivariate exponential power distribution; multivariate skew-normal distribution; nearest-neighbour imputation
Language: English
Year: 2018
Volume: 88
Issue: 18
First page: 3588
Last page: 3619
Access: reserved
Files in this record:
257305.pdf (Publisher's Version (Version of Record, VoR); Adobe PDF, 3.24 MB; archive managers only)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/207845