
3 Examples of Target Trial Emulation

Published: 2024-09-01

  1. Initial thoughts
  2. Surgical operation volume and patient mortality
  3. Corticosteroids and COVID-19 mortality
  4. Screening colonoscopy and CRC
  5. Coming soon... (maybe)

Initial thoughts

I tried to outline the target trial emulation process in previous posts. This process has been designed to guide you towards asking well-defined causal questions and estimating answers to those questions in a way that can prevent common biases. In this post I want to go through some published applications of this process to understand it better.

The number of articles mentioning 'target trial emulation' might be growing exponentially right now. Hype-sense starts tickling: is it just a new buzzword? If you ask me (no one does but this is my notepad), this will be a fantastic general framework for research (and teaching) at large. At some point we can stop using the word and just use the methods as usual, since the framework is really about clearer, more explicit causal inference.

You can already stumble upon some confusion. TTE studies on completely observational data are still completely observational. 'Trial' refers to the research question, and the 'synthetic patients' are part of a statistical calculation, which doesn't make the result any harder to interpret. On the contrary, TTE produces far easier-to-interpret results.

For observational and especially non-research data, the yield won't be as good as it has been in some top-expert RCT replication studies, but it'll still be much better than before. Hopefully robustness assessments will also become more common, since the researcher is forced to lay out their assumptions more explicitly.

Zuo (2023) already reviewed TTE papers published up to 2022 and found that some core elements are often missing.

For randomized trials, we'll see many more bias-adjusted results alongside the naive intention-to-treat results – good. The key to better results will really be the causal knowledge that is put to use. Hopefully that knowledge will be drawn out explicitly, so we could see much more criticism of really specific substantive assumptions.

Surgical operation volume and patient mortality

Madenci (2021) demonstrated TTE by studying interventions on the number of operations that surgeons perform and estimating their effects on patient mortality. The basic structure of their research question as a target trial is the following (you might like to review my post on the rough TTE process, as I use the same structure here):

Target trial

Target measures are intention-to-treat (\(z = 1\)) and per-protocol (\(z=1, \bar{a} = x\)), both with full follow-up (\(\bar{c}=\bar{0}\)) – the difference in average mortality between the comparison groups. TODO actual measure here.

Data analysis plan is based on estimating a logistic regression model for the binomial outcome \(Y\) given the continuous intervention \(X\), that is, \(E[Y|X] = \text{logistic}(\alpha_0 + \alpha f(X))\), where \(f\) is a restricted cubic spline expansion to allow non-linearity. Biases are handled as follows:

The authors share the code but it's a bit hard to read. I won't present it here but it's an important part of the plan (prespecified).
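To make the model concrete, here is a minimal sketch (Python with NumPy, my own illustration rather than the authors' code) of the kind of outcome model the plan describes: a logistic regression of mortality on a restricted cubic spline expansion of a continuous exposure, using Harrell's spline parameterization and a plain Newton-Raphson fit. The simulated data and all variable names are made-up assumptions.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis in Harrell's parameterization:
    linear beyond the boundary knots; k knots give k-1 columns [x, s_1..s_{k-2}]."""
    t = np.asarray(knots, dtype=float)
    K = len(t)
    norm = (t[-1] - t[0]) ** 2  # scale spline terms to be comparable to x

    def cube(u):
        return np.where(u > 0, u, 0.0) ** 3

    cols = [x]
    for j in range(K - 2):
        s = (cube(x - t[j])
             - cube(x - t[K - 2]) * (t[K - 1] - t[j]) / (t[K - 1] - t[K - 2])
             + cube(x - t[K - 1]) * (t[K - 2] - t[j]) / (t[K - 1] - t[K - 2]))
        cols.append(s / norm)
    return np.column_stack(cols)

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson fit; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

# Made-up data: mortality mildly increasing in annual operation volume.
rng = np.random.default_rng(0)
volume = rng.uniform(0.0, 100.0, 2000)
died = rng.binomial(1, 1.0 / (1.0 + np.exp(2.0 - 0.01 * volume)))

knots = np.percentile(volume, [5, 35, 65, 95])  # 4 knots -> 3 basis columns
design = np.column_stack([np.ones_like(volume), rcs_basis(volume, knots)])
beta_hat = fit_logistic(design, died)
p_hat = 1.0 / (1.0 + np.exp(-(design @ beta_hat)))  # E[Y | X] on the risk scale
```

The key point is that \(f(X)\) stays linear in the coefficients, so the spline changes nothing about how the model is fitted, only how flexibly \(E[Y|X]\) can bend.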

Emulation

Target measure is the effect of initiating the treatment strategy and remaining uncensored \(a_0 = x, \bar{c}=\bar{0}\) (observational analog of per-protocol) – specifically, the risk difference of each intervention arm compared with a baseline-maintaining intervention is estimated.

The authors provide simplified DAGs to illustrate the assumed confounding structure in the single and sustained target trials. I tried to fill them in a bit and drew the corresponding SWIGs below. Note that the missing-data mechanisms and conditioning on colliders are not included – use your imagination to fill those in too (sorry). \(L\) and \(U\) are vectors of variables, so the graph can represent many different graphs where \(L\) is enough to achieve exchangeability.

[Figures: SWIG for the single trial; SWIG for the sustained trial]

Data analysis plan is the same as that of the target trial except for

Again, the code for the extra emulation parts is shared publicly too, but I won't present it here (it's also missing a license that would allow me to do that).

Pre- and post-inference checks

The authors don't report prespecifying pre- and post-inference checks. Still, their reported checks include checks for nonrandom violations of positivity and for the amount of nonadherence to the sustained strategies. The authors also report a sensitivity analysis using a different way of handling zero operations during the outcome follow-up (restriction to one or more operations, instead of coding 0 mortality). The effect of adjusting for time-varying and all confounders was also explored.

Results

The authors report the usual table 1 of baseline variable summaries, as well as table 2 and figure 3 of marginal means and table 3 of mean differences (compared to 0, that is, the baseline-maintaining treatment arm). The estimates are very flat, but the sustained-intervention estimates have fairly high sampling uncertainty (frequentist bootstrap intervals of the series of point-predictive regressions).
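The marginal means and bootstrap intervals can be sketched roughly like this (Python/NumPy, my own illustration, not the authors' code): fit the outcome model, standardize the predicted risks over the observed confounder distribution with treatment set to each value (the g-formula), and take percentile intervals over bootstrap resamples. The data-generating model here is entirely made up.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += np.linalg.solve(X.T @ ((p * (1 - p))[:, None] * X), X.T @ (y - p))
    return beta

def marginal_risks(a, l, y):
    """G-formula standardization: fit the outcome model, then average the
    predicted risks with everyone's treatment set to 1 and then to 0."""
    X = np.column_stack([np.ones_like(a, dtype=float), a, l])
    beta = fit_logistic(X, y)
    X1, X0 = X.copy(), X.copy()
    X1[:, 1], X0[:, 1] = 1.0, 0.0
    risk1 = np.mean(1.0 / (1.0 + np.exp(-X1 @ beta)))
    risk0 = np.mean(1.0 / (1.0 + np.exp(-X0 @ beta)))
    return risk1, risk0

# Made-up data: binary treatment A confounded by L, binary outcome Y.
rng = np.random.default_rng(1)
n = 3000
l = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-l)))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(1.0 + 0.5 * a - l)))  # A lowers risk

# Percentile bootstrap of the marginal risk difference.
diffs = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    r1, r0 = marginal_risks(a[idx], l[idx], y[idx])
    diffs.append(r1 - r0)
lo, hi = np.percentile(diffs, [2.5, 97.5])
```

The "point-predictive" flavor in the paper comes from repeating this kind of standardization over a series of intervention values; the bootstrap then wraps the whole pipeline, refitting the model in every resample.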

Corticosteroids and COVID-19 mortality

Hoffman (2022) studied the effect of corticosteroid treatment on COVID-19 mortality to show the difference between target trial emulation and traditional outcome regression. What's cool is that their TTE result agrees well with the RCT meta-analysis result that was already available at that point in history.

Target trial

Emulation

Pre- and post-inference checks

Results

The absolute difference between the treatment strategies was estimated to be about 5.7–7.5% (95% confidence interval). This is close to the RCT meta-analysis (dominated by the RECOVERY trial), which yielded an odds ratio of 0.53–0.82 (95% CI).
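To see how compatible the two scales are, one can convert the odds ratio to an absolute risk difference at an assumed baseline risk. A small sketch (the 0.25 baseline risk is my illustrative assumption, not a number from either study):

```python
def or_to_risk_difference(odds_ratio, baseline_risk):
    """Absolute risk reduction implied by an odds ratio
    at a given control-arm (baseline) risk."""
    odds0 = baseline_risk / (1.0 - baseline_risk)
    odds1 = odds_ratio * odds0
    risk1 = odds1 / (1.0 + odds1)
    return baseline_risk - risk1

# At an assumed baseline risk of 0.25, OR 0.82 and 0.53 map to absolute
# reductions of roughly 3.5 and 10.0 percentage points, an interval that
# brackets the 5.7-7.5% TTE estimate.
rd_small = or_to_risk_difference(0.82, 0.25)
rd_large = or_to_risk_difference(0.53, 0.25)
```

The comparison is only rough, since the implied risk difference moves with the baseline risk, which varies across trial populations.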

Screening colonoscopy and CRC

García-Albéniz (2017) studied the effect of one-time screening colonoscopy on colorectal cancer risk in older people using insurance claims data.

Target trial

Emulation

Pre- and post-inference checks

The authors report that adjustment for confounders didn't materially change the result. They also checked how dropping health-consciousness variables (like PSA screening) affects the result. Sensitivity to a binary unmeasured confounder was explored using a simple function of the estimated risk difference and the confounder's prevalence and association with the outcome. Sensitivity to calendar time was also checked.
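That style of simple external adjustment for a single unmeasured binary confounder can be sketched as follows (Python; all numbers are hypothetical, not taken from the paper):

```python
def externally_adjusted_rd(observed_rd, conf_risk_diff,
                           prev_exposed, prev_unexposed):
    """Simple external adjustment for one unmeasured binary confounder:
    subtract the bias term, i.e. the confounder-outcome risk difference
    times the difference in confounder prevalence between the arms."""
    bias = conf_risk_diff * (prev_exposed - prev_unexposed)
    return observed_rd - bias

# Hypothetical inputs: an observed risk difference of 0.6 percentage
# points (0.006), a confounder that raises 8-year risk by 2 points and
# is 10 points more prevalent in one arm.
adjusted = externally_adjusted_rd(0.006, 0.02, 0.30, 0.20)
```

Plugging in plausible ranges for the prevalence and association then shows how strong an unmeasured confounder would have to be to explain away the observed difference.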

Results

The 8-year risk was 0.4–0.8 percentage points (95% CI) higher in the no-screening arm. The cumulative risk functions show a 1% risk at baseline in the screening arm and then slow growth until the no-screening arm surpasses the screening arm in cumulative risk after 4–5 years.

Coming soon... (maybe)

CC BY-SA 4.0 Eero Teppo. Last modified: March 23, 2025. Website built with Franklin.jl and the Julia programming language.