Bicocca Open Archive

Impute-then-exclude versus exclude-then-impute: Lessons when imputing a variable used both in cohort creation and as an independent variable in the analysis model

Austin, PC; Giardiello, D; van Buuren, S

2023

Abstract

We examined the setting in which a variable that is subject to missingness is used both as an inclusion/exclusion criterion for creating the analytic sample and subsequently as the primary exposure in the analysis model of scientific interest. An example is cancer stage: patients with stage IV cancer are often excluded from the analytic sample, and cancer stage (I to III) is then an exposure variable in the analysis model. We considered two analytic strategies. The first, referred to as “exclude-then-impute,” excludes subjects whose observed value of the target variable equals the value that defines exclusion and then uses multiple imputation to complete the data in the resulting sample. The second, referred to as “impute-then-exclude,” first uses multiple imputation to complete the data and then excludes subjects based on the observed or imputed values in each completed dataset. Monte Carlo simulations were used to compare five methods (one based on “exclude-then-impute” and four based on “impute-then-exclude”), as well as complete case analysis. We considered both missing completely at random (MCAR) and missing at random (MAR) mechanisms. Across 72 different scenarios, an impute-then-exclude strategy using substantive model compatible fully conditional specification (SMC-FCS) tended to have superior performance. We illustrated the application of these methods using empirical data on patients hospitalized with heart failure, in which heart failure subtype was used for cohort creation (excluding subjects with heart failure with preserved ejection fraction) and was also an exposure in the analysis model.
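The difference between the two strategies can be sketched in Python. This is a toy illustration under strong assumptions, not the authors' method or simulation design: stage is coded 1 to 4 with 4 (stage IV) as the exclusion value, and the imputation step is a hypothetical naive function, `impute_from_observed`, that draws each missing value at random from the observed values. It stands in for the multiple imputation models compared in the study (including SMC-FCS) and is only defensible under MCAR. All function names and the simulated cohort are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Assumption (illustrative only): stage is coded 1-4 and stage IV (4) defines exclusion.
EXCLUDED_STAGE = 4

Completed = List[Tuple[int, int]]  # (subject index, observed or imputed stage)


def impute_from_observed(values: List[Optional[int]], rng: random.Random) -> List[int]:
    """Hypothetical, naive imputation: replace each missing value with a random
    draw from the observed values. It ignores covariates and the outcome, so it
    stands in for a real imputation model only under MCAR."""
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]


def exclude_then_impute(stage: List[Optional[int]], m: int, rng: random.Random) -> List[Completed]:
    # Exclusion can only act on observed values, so subjects with a missing
    # stage are retained; the imputation step then sees only stages I-III.
    kept = [i for i, s in enumerate(stage) if s != EXCLUDED_STAGE]
    subset = [stage[i] for i in kept]
    return [list(zip(kept, impute_from_observed(subset, rng))) for _ in range(m)]


def impute_then_exclude(stage: List[Optional[int]], m: int, rng: random.Random) -> List[Completed]:
    # Impute in the full sample (stage IV included), then apply the exclusion
    # to the observed or imputed values of each completed dataset.
    datasets = []
    for _ in range(m):
        filled = impute_from_observed(stage, rng)
        datasets.append([(i, s) for i, s in enumerate(filled) if s != EXCLUDED_STAGE])
    return datasets


if __name__ == "__main__":
    rng = random.Random(2023)
    # Hypothetical cohort: true stages uniform on 1-4, each value missing (MCAR) with probability 0.3.
    true_stage = [rng.choice([1, 2, 3, 4]) for _ in range(1000)]
    stage = [s if rng.random() >= 0.3 else None for s in true_stage]
    for name, strategy in [("exclude-then-impute", exclude_then_impute),
                           ("impute-then-exclude", impute_then_exclude)]:
        datasets = strategy(stage, 5, rng)
        sizes = [len(d) for d in datasets]
        # Share of each analytic sample whose true stage is IV (subjects that should have been excluded).
        leaked = [sum(true_stage[i] == EXCLUDED_STAGE for i, _ in d) / len(d) for d in datasets]
        print(name, sizes, [round(x, 3) for x in leaked])
```

In this sketch, exclude-then-impute imputes only among subjects whose observed stage is I to III, so a subject whose unobserved true stage is IV can never be imputed as stage IV and stays in every analytic sample, which then has the same size in all completed datasets. Impute-then-exclude can impute stage IV and then drop that subject, so the analytic sample size may differ across completed datasets.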

Subtype: Journal article - Scientific article
Keywords: missing data; Monte Carlo simulations; multiple imputation
Content language: English
Ahead-of-print or first online publication date: 19-Feb-2023
Publication date: 2023
Journal: STATISTICS IN MEDICINE
Volume: 42
Issue: 10
First page: 1525
Last page: 1541
Article DOI: https://dx.doi.org/10.1002/sim.9685
Alternative URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9685
Full text: open
Citation: Austin, P., Giardiello, D., van Buuren, S. (2023). Impute-then-exclude versus exclude-then-impute: Lessons when imputing a variable used both in cohort creation and as an independent variable in the analysis model. STATISTICS IN MEDICINE, 42(10), 1525-1541 [10.1002/sim.9685].
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
Austin-2023-Statistics in Medicine-VoR.pdf (open access; description: CC BY-NC 4.0, an open access article under the terms of the Creative Commons Attribution-NonCommercial License; attachment type: Publisher's Version (Version of Record, VoR); license: Creative Commons)	3.38 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/520644
