
3 Examples of Target Trial Emulation

Published: 2024-09-01

  1. Initial thoughts
  2. Surgical operation volume and patient mortality
  3. Corticosteroids and COVID-19 mortality
  4. Screening colonoscopy and CRC
  5. Coming soon... (maybe)

Initial thoughts

I tried to outline the target trial emulation process in previous posts. This process has been designed to guide you towards asking well-defined causal questions and estimating answers to those questions in a way that can prevent common biases. In this post I want to go through some published applications of this process to understand it better.

The number of articles mentioning 'target trial emulation' might be growing exponentially right now. Hype-sense starts tickling: is it just a new buzzword? If you ask me (no one does but this is my notepad), this will be a fantastic general framework for research (and teaching) at large. At some point we can stop using the word and just use the methods as usual, since the framework is really about clearer, more explicit causal inference.

You can already stumble upon some confusion. TTE studies on completely observational data are still completely observational. 'Trial' refers to the research question, and the 'synthetic patients' are part of a statistical calculation, which doesn't make the result any harder to interpret. On the contrary, TTE produces far easier-to-interpret results.

For observational and especially non-research data, the yield won't be as good as it has been in some top-expert RCT replication studies, but it'll still be much better than before. Hopefully robustness assessments will also become more common, since the researcher is forced to lay out their assumptions more explicitly.

Zuo (2023) already reviewed TTE papers published up to 2022 and found that some core elements are often missing.

For randomized trials, we'll see many more bias-adjusted results alongside the naive intention-to-treat results – good. The key to better results will really be the causal knowledge that is put to use. Hopefully that knowledge will be drawn out explicitly, so we could see much more criticism of really specific substantive assumptions.

Surgical operation volume and patient mortality

Madenci (2021) demonstrated TTE by studying interventions on the number of operations that surgeons perform and estimating their effects on patient mortality. The basic structure of their research question as a target trial is the following (you might like to review my post on the rough TTE process, as I use the same structure here):

Target trial

Target measures are intention-to-treat (\(z = 1\)) and per-protocol (\(z=1, \bar{a} = x\)), both with full follow-up (\(\bar{c}=\bar{0}\)) – the difference in average mortality between the comparison groups. TODO actual measure here.

Data analysis plan is based on estimating a logistic regression model for the binomial outcome \(Y\) given the continuous intervention \(X\), that is, \(E[Y|X] = \text{logistic}(\alpha_0 + \alpha f(X))\), where \(f\) is a restricted cubic spline expansion to allow non-linearity. Biases are handled as follows:

The authors share the code but it's a bit hard to read. I won't present it here but it's an important part of the plan (prespecified).
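To make the model concrete, here is a minimal sketch (Python with NumPy, my own illustration rather than the authors' code) of the kind of outcome model the plan describes: a logistic regression of mortality on a restricted cubic spline expansion of a continuous exposure, using Harrell's spline parameterization and a plain Newton-Raphson fit. The simulated data and all variable names are made-up assumptions.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis in Harrell's parameterization:
    linear beyond the boundary knots; k knots give k-1 columns [x, s_1..s_{k-2}]."""
    t = np.asarray(knots, dtype=float)
    K = len(t)
    norm = (t[-1] - t[0]) ** 2  # scale spline terms to be comparable to x

    def cube(u):
        return np.where(u > 0, u, 0.0) ** 3

    cols = [x]
    for j in range(K - 2):
        s = (cube(x - t[j])
             - cube(x - t[K - 2]) * (t[K - 1] - t[j]) / (t[K - 1] - t[K - 2])
             + cube(x - t[K - 1]) * (t[K - 2] - t[j]) / (t[K - 1] - t[K - 2]))
        cols.append(s / norm)
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson fit; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

# Made-up data: mortality mildly increasing in annual operation volume.
rng = np.random.default_rng(0)
volume = rng.uniform(0.0, 100.0, 2000)
died = rng.binomial(1, 1.0 / (1.0 + np.exp(2.0 - 0.01 * volume)))

knots = np.percentile(volume, [5, 35, 65, 95])  # 4 knots -> 3 basis columns
design = np.column_stack([np.ones_like(volume), rcs_basis(volume, knots)])
beta_hat = fit_logistic(design, died)
p_hat = 1.0 / (1.0 + np.exp(-(design @ beta_hat)))  # E[Y | X] on the risk scale
```

The key point is that \(f(X)\) stays linear in the coefficients, so the spline changes nothing about how the model is fitted, only how flexibly \(E[Y|X]\) can bend.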

Emulation

Target measure is the effect of initiating the treatment strategy and remaining uncensored \(a_0 = x, \bar{c}=\bar{0}\) (observational analog of per-protocol) – specifically, the risk difference of each intervention arm compared with a baseline-maintaining intervention is estimated.

The authors provide simplified DAGs to illustrate the assumed confounding structure in the single and sustained target trials. I tried to fill them in a bit and drew the corresponding SWIGs below. Note that the missing-data mechanisms and conditioning on colliders are not included – use your imagination to fill those in too (sorry). \(L\) and \(U\) are vectors of variables, so the graph can represent many different graphs where \(L\) is enough to achieve exchangeability.

[Figures: SWIG for the single trial; SWIG for the sustained trial]

Data analysis plan is the same as that of the target trial except for

Again, the code for the extra emulation parts is shared publicly too, but I won't present it here (it's also missing a license that would allow me to do that).

Pre- and post-inference checks

The authors don't report prespecifying pre- and post-inference checks. Still, their reported checks include checks for nonrandom violations of positivity and for the amount of nonadherence to the sustained strategies. The authors also report a sensitivity analysis using a different way of handling zero operations during the outcome follow-up (restriction to one or more operations, instead of coding 0 mortality). The effect of adjusting for time-varying and all confounders was also explored.

Results

The authors report the usual table 1 of baseline variable summaries, as well as table 2 and figure 3 of marginal means and table 3 of mean differences (compared to 0, that is, the baseline-maintaining treatment arm). The estimates are very flat, but the sustained-intervention estimates have fairly high sampling uncertainty (frequentist bootstrap intervals of the series of point-predictive regressions).
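The marginal means and bootstrap intervals can be sketched roughly like this (Python/NumPy, my own illustration, not the authors' code): fit the outcome model, standardize the predicted risks over the observed confounder distribution with treatment set to each value (the g-formula), and take percentile intervals over bootstrap resamples. The data-generating model here is entirely made up.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (y - p))
    return beta

def marginal_risks(a, l, y):
    """G-formula standardization: fit the outcome model, then average the
    predicted risks with everyone's treatment set to 1 and then to 0."""
    X = np.column_stack([np.ones_like(a, dtype=float), a, l])
    beta = fit_logistic(X, y)
    X1, X0 = X.copy(), X.copy()
    X1[:, 1], X0[:, 1] = 1.0, 0.0
    risk1 = np.mean(1.0 / (1.0 + np.exp(-X1 @ beta)))
    risk0 = np.mean(1.0 / (1.0 + np.exp(-X0 @ beta)))
    return risk1, risk0

# Made-up data: binary treatment A confounded by L, binary outcome Y.
rng = np.random.default_rng(1)
n = 3000
l = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-l)))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(1.0 + 0.5 * a - l)))  # A lowers risk

# Percentile bootstrap of the marginal risk difference.
diffs = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    r1, r0 = marginal_risks(a[idx], l[idx], y[idx])
    diffs.append(r1 - r0)
lo, hi = np.percentile(diffs, [2.5, 97.5])
```

The "point-predictive" flavor in the paper comes from repeating this kind of standardization over a series of intervention values; the bootstrap then wraps the whole pipeline, refitting the model in every resample.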

Corticosteroids and COVID-19 mortality

Hoffman (2022) studied the effect of corticosteroid treatment on COVID-19 mortality to show the difference between target trial emulation and traditional outcome regression. What's cool is that their TTE result agrees well with the RCT meta-analysis result that was already available at that point in history.

Target trial

Emulation

Pre- and post-inference checks

Results

The absolute difference between the treatment strategies was estimated to be about 5.7–7.5% (95% confidence interval). This is close to the RCT meta-analysis (dominated by the RECOVERY trial), which yielded an odds ratio of 0.53–0.82 (95% CI).
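To see how compatible the two scales are, one can convert the odds ratio to an absolute risk difference at an assumed baseline risk. A small sketch (the 0.25 baseline risk is my illustrative assumption, not a number from either study):

```python
def or_to_risk_difference(odds_ratio, baseline_risk):
    """Absolute risk reduction implied by an odds ratio
    at a given control-arm (baseline) risk."""
    odds0 = baseline_risk / (1.0 - baseline_risk)
    odds1 = odds_ratio * odds0
    risk1 = odds1 / (1.0 + odds1)
    return baseline_risk - risk1

# At an assumed baseline risk of 0.25, OR 0.82 and 0.53 map to absolute
# reductions of roughly 3.5 and 10.0 percentage points, an interval that
# brackets the 5.7-7.5% TTE estimate.
rd_small = or_to_risk_difference(0.82, 0.25)
rd_large = or_to_risk_difference(0.53, 0.25)
```

The comparison is only rough, since the implied risk difference moves with the baseline risk, which varies across trial populations.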

Screening colonoscopy and CRC

García-Albéniz (2017) studied the effect of one-time screening colonoscopy on colorectal cancer risk in older people using insurance claims data.

Target trial

Emulation

Pre- and post-inference checks

The authors report that adjustment for confounders didn't materially change the result. They also checked how dropping health-consciousness variables (like PSA screening) affects the result. Sensitivity to a binary unmeasured confounder was explored using a simple function of the estimated risk difference and the confounder's prevalence and association with the outcome. Sensitivity to calendar time was also checked.
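That style of simple external adjustment for a single unmeasured binary confounder can be sketched as follows (Python; all numbers are hypothetical, not taken from the paper):

```python
def externally_adjusted_rd(observed_rd, conf_risk_diff,
                           prev_exposed, prev_unexposed):
    """Simple external adjustment for one unmeasured binary confounder:
    subtract the bias term, i.e. the confounder-outcome risk difference
    times the difference in confounder prevalence between the arms."""
    bias = conf_risk_diff * (prev_exposed - prev_unexposed)
    return observed_rd - bias

# Hypothetical inputs: an observed risk difference of 0.6 percentage
# points (0.006), a confounder that raises 8-year risk by 2 points and
# is 10 points more prevalent in one arm.
adjusted = externally_adjusted_rd(0.006, 0.02, 0.30, 0.20)
```

Plugging in plausible ranges for the prevalence and association then shows how strong an unmeasured confounder would have to be to explain away the observed difference.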

Results

The 8-year risk was 0.4–0.8 percentage points (95% CI) higher in the no-screening arm. The cumulative risk functions show a 1% risk at baseline in the screening arm and then slow growth until the no-screening arm surpasses the screening arm in cumulative risk after 4–5 years.

Coming soon... (maybe)

CC BY-SA 4.0 Eero Teppo. Last modified: March 23, 2025. Website built with Franklin.jl and the Julia programming language.