Solaro, N., Barbiero, A., Manzi, G., Ferrari, P. (2018). A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns. Journal of Statistical Computation and Simulation, 88(18), 3588-3619. doi:10.1080/00949655.2018.1530773
A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns
Solaro, N. (first author); 2018
Abstract
An extensive simulation study is carried out to compare three nonparametric, single imputation methods in the presence of multiple data patterns. The ultimate goal is to offer practical guidance to users who need to quickly choose the most effective imputation method among the following: (i) Forward Imputation (ForImp), considered in two variants, namely ForImp with principal component analysis (PCA), which alternates PCA and Nearest-Neighbour Imputation (NNI) in a forward, sequential procedure, and ForImp with the Mahalanobis distance, which uses the Mahalanobis distance when performing NNI; (ii) the iterative PCA technique, which imputes all missing values simultaneously via PCA; and (iii) the missForest method, which is based on random forests and was developed for mixed-type data. The performance of these methods is compared under several data patterns characterized by different levels of kurtosis or skewness and by different correlation structures.

File | Attachment type | Size | Format | Access
---|---|---|---|---
257305.pdf | Publisher's Version (Version of Record, VoR) | 3.24 MB | Adobe PDF | Archive managers only (copy on request)
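The abstract describes the iterative PCA technique only as imputing missing values simultaneously via PCA. Below is a minimal Python/NumPy sketch of the basic, unregularized form of that idea. It assumes a user-chosen number of components (`n_components`) and a simple convergence tolerance. The function name and all of its parameters are hypothetical, and the variant actually evaluated in the paper may differ (for instance, a regularized version). Missing cells start at the observed column means. The loop then repeatedly fits a rank-k PCA by SVD and overwrites only the missing cells with their reconstructions, stopping once the imputed values no longer change.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """Hypothetical helper: basic (unregularized) iterative PCA imputation.

    A sketch for illustration, not the code used in the paper.
    NaN entries of the 2-D array X are treated as missing.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 0: initialise missing cells with the observed column means.
    Z = np.where(missing, np.nanmean(X, axis=0), X)
    k = n_components
    for _ in range(max_iter):
        mu = Z.mean(axis=0)
        # Step 1: PCA of the current completed data via SVD of the centred matrix.
        U, s, Vt = np.linalg.svd(Z - mu, full_matrices=False)
        # Step 2: rank-k reconstruction of the whole data matrix.
        recon = mu + (U[:, :k] * s[:k]) @ Vt[:k]
        # Step 3: overwrite only the missing cells; observed values stay fixed.
        Z_new = np.where(missing, recon, X)
        change = np.sum((Z_new - Z) ** 2)
        Z = Z_new
        if change < tol:
            break
    return Z


# Usage on invented synthetic data: rank-2 structure plus small noise,
# with about 10% of the cells deleted completely at random.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 5))
data = latent @ loadings + 0.05 * rng.normal(size=(100, 5))
mask = rng.random(data.shape) < 0.1
incomplete = data.copy()
incomplete[mask] = np.nan
completed = iterative_pca_impute(incomplete, n_components=2)
```

Every missing cell is updated at once from the same low-rank fit. ForImp, by contrast, works forward through the data one unit at a time, as the abstract describes.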
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.