Comparison of propensity score based methods for estimating marginal hazard ratios with composite unweighted and weighted endpoints: simulation study and application to hepatocellular carcinoma

Pacifico, C

Introduction
My research uses data from the HERCOLES study, a retrospective study of hepatocellular carcinoma, as an application example for comparing statistical methods that estimate the marginal effect of a treatment on standard (unweighted) survival endpoints and on weighted composite endpoints. The weighted approach, which has not been explored to date, is motivated by the need to account for the different clinical relevance of cause-specific events: death is considered the worst event, and local recurrence is also given greater relevance than non-local recurrence. Two simulation protocols were developed to evaluate the statistical performance of these methods.

Methods
To quantify a marginal effect, the effect of confounders (subject characteristics and other baseline factors that cause systematic differences between treatment groups) must be removed or reduced. This requires appropriate statistical methods based on the propensity score (PS), i.e. the probability that a subject is assigned to treatment conditional on the covariates measured at baseline. In my thesis I considered the following PS-based methods from the literature (Austin 2013):
- PS as a covariate, with a spline transformation
- PS as a categorical covariate, stratified by quantiles
- PS matching
- Inverse probability weighting (IPW)
For the unweighted composite endpoint, the marginal effect is measured as a marginal hazard ratio (HR) estimated with a Cox model. For the weighted composite endpoint, the treatment effect is estimated with the non-parametric estimator of the ratio of cumulative hazards proposed by Ozga and Rauch (2019).

Simulation protocol
In both simulation protocols, the data-generating mechanism is similar to the one used by Austin (2013).
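A data-generating mechanism of this kind, followed by PS estimation, can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the code used in the thesis: confounders are independent standard normals, treatment follows a logistic model in the confounders, event times are exponential with a conditional treatment log-hazard `beta_treat`, and censoring is independent and exponential. All parameter values (`alpha0`, the 0.3 coefficients, `lam0`, `cens_rate`) are hypothetical. Note that `beta_treat` is a conditional effect; Austin (2013) calibrates the conditional coefficient so that the marginal HR equals the target value, a step omitted here. The PS is then estimated by a main-effects logistic regression fitted with Newton-Raphson.

```python
import math
import random


def simulate_dataset(n, n_conf, beta_treat, seed=None):
    """Simulate one dataset: confounders x, treatment z, observed time and event flag.

    All coefficients are hypothetical placeholders, not the thesis values.
    beta_treat is a *conditional* log-hazard ratio, not the marginal HR.
    """
    rng = random.Random(seed)
    alpha0, alpha = -0.5, [0.3] * n_conf   # treatment-assignment model (assumed)
    gamma = [0.3] * n_conf                 # confounder effects on the hazard (assumed)
    lam0, cens_rate = 0.1, 0.05            # baseline and censoring hazards (assumed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_conf)]
        # True PS from a logistic model in the confounders
        ps_true = 1.0 / (1.0 + math.exp(-(alpha0 + sum(a * v for a, v in zip(alpha, x)))))
        z = 1 if rng.random() < ps_true else 0
        # Exponential event time under conditional proportional hazards
        rate = lam0 * math.exp(beta_treat * z + sum(g * v for g, v in zip(gamma, x)))
        t_event = rng.expovariate(rate)
        t_cens = rng.expovariate(cens_rate)
        data.append({"x": x, "z": z, "time": min(t_event, t_cens),
                     "event": int(t_event <= t_cens)})
    return data


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][m] for i in range(m)]


def fit_propensity(data, max_iter=50, tol=1e-10):
    """Estimated PS = P(Z = 1 | X) from a main-effects logistic regression (Newton-Raphson)."""
    rows = [[1.0] + d["x"] for d in data]   # intercept + confounders
    z = [d["z"] for d in data]
    k = len(rows[0])
    coef = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]   # Fisher information X'WX
        for r, zi in zip(rows, z):
            p = 1.0 / (1.0 + math.exp(-sum(c * v for c, v in zip(coef, r))))
            for i in range(k):
                grad[i] += (zi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1.0 - p) * r[i] * r[j]
        step = _solve(info, grad)
        coef = [c + s for c, s in zip(coef, step)]
        if max(abs(s) for s in step) < tol:
            break
    return [1.0 / (1.0 + math.exp(-sum(c * v for c, v in zip(coef, r)))) for r in rows]
```

One replicate of the unweighted-endpoint design would correspond to `simulate_dataset(1000, 12, beta_treat)`. The pure-Python loops favour readability; a vectorised implementation would be preferable for 10,000 replicates.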
Specifically, for the unweighted endpoint (disease-free survival, DFS), I simulated three scenarios with marginal HR = 1 (scenario a), HR = 1.5 (scenario b) and HR = 2 (scenario c). In each scenario I simulated 10,000 datasets of 1,000 subjects each, with 12 confounders generated for PS estimation. The simulation study for the weighted endpoint uses the same scenarios (a, b, c), combined with three sets of weights for the two component endpoints: (w1, w2) = (1, 1), (1, 0.5) and (1, 0.8). In each scenario I simulated 1,000 datasets of 1,000 subjects each, with 3 confounders generated for PS estimation. In addition, I considered only the two methods regarded in the literature as the most robust: IPW and PS matching (Austin 2016).

Results
The results for the unweighted composite endpoint confirm what is already known in the literature: IPW is the most robust PS-based method, followed by PS matching. The novel contribution of my thesis is a simulation study of how PS-based methods perform when estimating the marginal effect of a treatment on a weighted composite survival endpoint; here too, IPW proves to be the most accurate and precise method.
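Because IPW performed best for both endpoints, its use for the unweighted endpoint can be sketched as follows: each subject receives an inverse-probability weight, and a Cox model with treatment as its only covariate is fitted by maximising the weighted partial likelihood. This is a minimal pure-Python sketch, not the thesis code. It assumes ATE weights (1/PS for treated and 1/(1 - PS) for control subjects; ATT or stabilised weights are alternatives), Breslow handling of tied times, and an already estimated PS. It returns only the point estimate; in practice the fit would be done with an established survival package together with a robust or bootstrap variance estimator.

```python
import math


def ipw_ate_weights(z, ps):
    """ATE inverse-probability weights: 1/PS if treated, 1/(1 - PS) if control."""
    return [1.0 / p if zi == 1 else 1.0 / (1.0 - p) for zi, p in zip(z, ps)]


def weighted_cox_log_hr(time, event, z, w, max_iter=50, tol=1e-10):
    """Log marginal HR from a weighted Cox model with a single binary covariate z.

    Maximises sum_i event_i * w_i * [beta*z_i - log(sum_{j: t_j >= t_i} w_j * exp(beta*z_j))]
    by Newton-Raphson (Breslow convention for ties).
    """
    order = sorted(range(len(time)), key=lambda i: time[i], reverse=True)
    beta = 0.0
    for _ in range(max_iter):
        e = math.exp(beta)
        s0 = s1 = 0.0          # risk-set sums of w*exp(beta*z) and w*z*exp(beta*z)
        score = info = 0.0
        k = 0
        while k < len(order):
            t = time[order[k]]
            tied = []
            # Add everyone with time == t to the risk set before scoring their events
            while k < len(order) and time[order[k]] == t:
                i = order[k]
                if z[i] == 1:
                    s0 += w[i] * e
                    s1 += w[i] * e
                else:
                    s0 += w[i]
                tied.append(i)
                k += 1
            zbar = s1 / s0     # weighted mean of z in the risk set at time t
            for i in tied:
                if event[i]:
                    score += w[i] * (z[i] - zbar)
                    info += w[i] * zbar * (1.0 - zbar)   # variance of binary z
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

The marginal HR is then `math.exp(weighted_cox_log_hr(time, event, z, ipw_ate_weights(z, ps)))`. Rescaling all weights by a constant leaves the estimate unchanged, so stabilised weights give the same point estimate.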
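PS matching, the other method retained for the weighted endpoint, can be sketched as greedy 1:1 nearest-neighbour matching without replacement on the logit of the PS, within a caliper of 0.2 standard deviations of the logit PS (a commonly used width). The greedy algorithm, the random processing order of treated subjects and the `caliper_sd` default are assumptions for illustration; the thesis may use a different matching algorithm. The marginal HR would then be estimated with a Cox model fitted to the matched sample.

```python
import math
import random
import statistics


def greedy_caliper_match(z, ps, caliper_sd=0.2, seed=None):
    """Greedy 1:1 nearest-neighbour matching on logit(PS), without replacement.

    Returns a list of (treated_index, control_index) pairs; treated subjects
    with no control within the caliper are left unmatched.
    """
    logit = [math.log(p / (1.0 - p)) for p in ps]
    caliper = caliper_sd * statistics.stdev(logit)
    treated = [i for i, zi in enumerate(z) if zi == 1]
    available = {i for i, zi in enumerate(z) if zi == 0}
    random.Random(seed).shuffle(treated)   # process treated subjects in random order
    pairs = []
    for i in treated:
        best, best_d = None, caliper
        for j in available:
            d = abs(logit[i] - logit[j])
            if d <= best_d:                # closest control within the caliper
                best, best_d = j, d
        if best is not None:
            pairs.append((i, best))
            available.remove(best)         # matching without replacement
    return pairs
```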
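For the weighted composite endpoint, the Ozga and Rauch (2019) estimator can be sketched, assuming it is evaluated at a single time point τ, as a ratio of weighted cumulative hazards: within each arm, the cause-specific Nelson-Aalen increments are multiplied by the event weights (w1, w2) and summed, and the treated-arm total is divided by the control-arm total. This is a minimal sketch, not the thesis implementation; the choice of `tau`, the cause coding (0 = censored, 1 and 2 = the two component events) and the optional subject weights `obs_weights` (for example IPW weights, to target a marginal effect) are assumptions made for illustration. With (w1, w2) = (1, 1) it reduces to a ratio of ordinary all-cause Nelson-Aalen estimates.

```python
def weighted_cumhaz(time, cause, tau, cause_weights, obs_weights=None):
    """Weighted all-cause Nelson-Aalen estimate at tau.

    Lambda_w(tau) = sum over event times t <= tau of sum_k w_k * d_k(t) / Y(t),
    with d_k(t) the (subject-weighted) number of type-k events at t and Y(t)
    the (subject-weighted) number at risk just before t.
    cause: 0 = censored, k >= 1 = event of type k; cause_weights: {k: w_k}.
    """
    n = len(time)
    sw = obs_weights if obs_weights is not None else [1.0] * n
    order = sorted(range(n), key=lambda i: time[i])
    at_risk = sum(sw)
    total = 0.0
    k = 0
    while k < n and time[order[k]] <= tau:
        t = time[order[k]]
        weighted_events = leaving = 0.0
        while k < n and time[order[k]] == t:
            i = order[k]
            if cause[i] != 0:
                weighted_events += sw[i] * cause_weights[cause[i]]
            leaving += sw[i]
            k += 1
        total += weighted_events / at_risk
        at_risk -= leaving   # subjects with time t leave the risk set afterwards
    return total


def weighted_cumhaz_ratio(time, cause, z, tau, cause_weights, obs_weights=None):
    """Treated-to-control ratio of weighted cumulative hazards at tau."""
    sw = obs_weights if obs_weights is not None else [1.0] * len(time)

    def arm(g):
        idx = [i for i, zi in enumerate(z) if zi == g]
        return weighted_cumhaz([time[i] for i in idx], [cause[i] for i in idx], tau,
                               cause_weights, [sw[i] for i in idx])

    return arm(1) / arm(0)
```

For example, with times (1, 2, 3) in both arms, causes (1, 1, censored) in the treated arm and (1, 2, censored) in the control arm, and (w1, w2) = (1, 0.5), the weighted cumulative hazards at τ = 3 are 1/3 + 1/2 = 5/6 and 1/3 + 0.5/2 = 7/12, giving a ratio of 10/7.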

Pacifico, C. (2021). Comparison of propensity score based methods for estimating marginal hazard ratios with composite unweighted and weighted endpoints: simulation study and application to hepatocellular carcinoma. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021).