
Target Trial Emulation Summarized

Published: 2023-08-12
Updated: 2023-08-12

Sidenote on the graphs: I haven’t found very convenient software for drawing directed graphs (nodes, edges, arrows) so I drew them by hand (Wacom tablet) in Inkscape (svg).

Causal inference from observational data can be thought of as emulating a hypothetical perfect experiment (target trial) that would directly answer the research question (target measure). Similarly, causal inference from experimental data can be thought of as emulating a target trial. All data collection designs have sources of error — deviations from the target trial — and the same methods apply if an attempt is made to model and minimize them.

But why are experiments so much better for estimating causal effects (in terms of potential outcomes and their averages)? Simply put, experiments allow more control over the data generation so that the data better reflect the target trial, which would have no relevant sources of error left.

In this post, I want to summarize the most relevant points of control that set apart target trials, average RCTs, and observational studies. In a later post, I want to specify how to do this in practice using a general method (simulation which approximates the g-formula).

  1. Eligibility at time zero
  2. Well-defined interventions
  3. Random assignment at time zero
  4. Adherence to assignment
  5. Mediators of interest
  6. Follow-up
  7. Time zero and end of follow-up (timing)

1. Eligibility criteria at time zero

Eligibility criteria on a SWIG

Experiments often have clear eligibility criteria. They restrict the population in which the effect will be estimated so that the estimate (average) makes practical, scientific, and/or statistical sense. On the other hand, observational data may consist of samples of some convenient or general populations without reference to specific interventions. The data may or may not contain relevant variables for a target trial of interest.

While clarity always reduces error, the criteria themselves can make the estimate more or less informative for a given purpose. When emulating a target trial, whether with experimental or observational data, one should always choose the most relevant eligibility criteria for the causal question. Emulating the target trial is possible if the relevant criteria have been measured precisely enough and are met by a large enough number of units.
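To make this concrete, here is a minimal sketch of applying eligibility criteria at time zero to long-format observational data. The data frame and the criteria (age range, confirmed diagnosis, no prior use) are hypothetical; the point is only that every criterion is evaluated using information available at time zero and nothing measured later.

```python
import pandas as pd

# Hypothetical long-format observational data: one row per person-month.
data = pd.DataFrame({
    "id":        [1, 1, 2, 2, 3, 3],
    "month":     [0, 1, 0, 1, 0, 1],
    "age":       [52, 52, 67, 67, 45, 45],
    "diagnosed": [True, True, True, True, False, True],
    "prior_use": [False, False, True, True, False, False],
})

# Eligibility at time zero: evaluate every criterion with information
# available at month 0 only -- never with later measurements.
baseline = data[data["month"] == 0]
eligible = baseline[
    baseline["age"].between(50, 75)   # age range of the target trial
    & baseline["diagnosed"]           # diagnosis confirmed by time zero
    & ~baseline["prior_use"]          # new users only (no prior treatment)
]
print(eligible["id"].tolist())  # [1]: persons entering the emulated trial
```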

2. Well-defined interventions

Well-defined interventions on a SWIG

Experiments have somewhat clear interventions by design while observational studies may study variables that don’t have clear interventional equivalents at all. (Note that effects of interventions are ultimately defined by a contrast of two alternative interventions, both of which need to be well-defined.)

This is surprisingly important. In fact, it is difficult to talk about causality at all without sufficiently well-defined interventions, because well-defined interventions are required for consistency (the condition linking each observed outcome to the counterfactual outcome under the intervention actually received). Ultimately, estimates for ill-defined interventions are much less interpretable, reliable, and actionable.

3. Random assignment at time zero

Random assignment on a SWIG

Random assignment of interventions, when done properly, makes the assignment groups exchangeable at the moment of randomization. In other words, a comparison of the randomized groups is guaranteed to have no confounding (specifically, bias becomes noise). The assignment doesn’t have to be simple random 1:1 — what’s relevant is that the probability of being assigned to each intervention is known for each unit.

But randomization is not foolproof by any means. Randomization essentially exchanges the sources of bias in nonrandomized studies for more manageable ones. When enough control is exerted over the relevant variables in an experiment — or the bias-causing variables are measured precisely and taken into account in an appropriate analysis — a valid and efficient result is likely. The same is possible using nonrandomized observations, albeit for different kinds of sources of bias.
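As a toy illustration of "bias becomes noise", here is a small simulation (my own invention, not from any article) comparing a confounded assignment to a randomized assignment with a known probability. The true effect is 1.0 in both worlds; only the randomized contrast recovers it without adjustment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One confounder L (say, baseline severity) in an invented world.
L = rng.normal(size=n)
# Observational assignment: sicker people are more likely to be treated.
a_obs = rng.binomial(1, 1 / (1 + np.exp(-L)))
# Randomized assignment: the probability (here 0.5) is known for every
# unit and independent of L -- this is what buys exchangeability.
a_rct = rng.binomial(1, 0.5, size=n)

def outcome(a):
    # True causal effect of A on Y is exactly 1.0 in both worlds.
    return 1.0 * a + 2.0 * L + rng.normal(size=n)

y_obs, y_rct = outcome(a_obs), outcome(a_rct)

# Naive group contrast: badly biased under confounded assignment,
# unbiased up to noise under randomization.
print(y_obs[a_obs == 1].mean() - y_obs[a_obs == 0].mean())  # well above 1.0
print(y_rct[a_rct == 1].mean() - y_rct[a_rct == 0].mean())  # close to 1.0
```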

4. Adherence to assignment

Adherence to assignment on a SWIG

As we can see, assignment and intervention are two different variables. When the assignment is not adhered to, the intervention probability is no longer known and we are back to a risk of bias. In experiments, it's possible to anticipate the problem and spend more resources on ensuring the best possible adherence to the interventions of interest. Unfortunately, it is common practice to ignore this error in randomized studies and instead estimate the effect of the assignment on the outcome (intention-to-treat). Another choice has been to estimate the effect of initiating the intervention (ignoring whether the intervention continued per protocol after that).

Emulating target trials where everyone followed the assigned intervention up to the end of follow-up (per-protocol) is possible if enough variables in the causal graph are known and measured precisely. This is the same for randomized and nonrandomized data except for the kinds of variables that need to be taken into account.
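As a hedged sketch of what this can look like in the simplest case, here is a toy single-time-point trial in which non-adherence depends on a covariate that also affects the outcome. Restricting to adherers alone is biased; weighting adherers by the inverse of an estimated adherence probability recovers the per-protocol effect. The data-generating process and the models are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical trial: Z = random assignment, L = a covariate that drives
# both adherence and the outcome.
z = rng.binomial(1, 0.5, size=n)
l = rng.normal(size=n)
# Treated-and-sick people tend to stop treatment (non-adherence).
adhere = rng.binomial(1, 1 / (1 + np.exp(-(1.0 - 1.5 * l * z))))
y = 1.0 * (z * adhere) + 2.0 * l + rng.normal(size=n)  # true effect = 1.0
df = pd.DataFrame({"z": z, "l": l, "adhere": adhere, "y": y})

# Naive per-protocol: restrict to adherers and compare arms. This is
# selection-biased, because conditioning on adherence distorts the
# distribution of L in the treated arm only.
pp = df[df["adhere"] == 1]
print(pp.loc[pp.z == 1, "y"].mean() - pp.loc[pp.z == 0, "y"].mean())  # < 1.0

# IP-weighted per-protocol: model P(adherence | L, Z) and weight adherers
# by its inverse, creating a pseudo-population with no selection on L.
fit = smf.logit("adhere ~ l * z", data=df).fit(disp=False)
w = 1 / fit.predict(df)[df["adhere"] == 1]
wls = smf.wls("y ~ z", data=pp, weights=w).fit()
print(wls.params["z"])  # close to 1.0, the per-protocol effect
```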

5. Mediators of interest

Unintended mediators on a SWIG

Even if the assignment is followed perfectly, the assignment may affect the outcome through paths other than the intervention, in which case the effect of assignment is again not the same as the effect of intervention. Moreover, the intervention itself can affect the outcome through other paths than intended, so the effect of the intervention mediated through the intended paths won't be the same as the effect of assignment or the total effect of intervention. In experiments, it is sometimes possible to use blinding, placebos, and other tailor-made control interventions (or perhaps joint interventions) to block unwanted mediators.

Emulating target trials where the effects of assignment and intervention are mediated in the desired way is possible if the relevant control interventions and mediators have been measured precisely. The attempt starts from finding an appropriate control group. Remember: when we say “the effect of the intervention”, we actually mean “the effect of the contrast between intervention and control”. In observational studies, placebos are not available as such, so a similar inert intervention may be used to mimic one (for example, some other drug or dosage may be assumed not to affect the outcome). In general, other active treatments can offer a good contrast to the intervention of interest so that the influence of many unwanted mediators is minimized. Often the relevant question itself is not about a placebo contrast but about the contrast with some alternative treatment.

Here’s another graphical example of SWIGs from an article by Ocampo and Bather (2022). It represents a randomized controlled trial in which events happening during the follow-up can mediate the effect of the intervention. It’s possible to estimate the joint effect of the intervention and an intervention on the mediators, as sketched after the graph below.

Controlling for unintended mediators on a SWIG
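To sketch what estimating such a joint effect might look like, here is a toy parametric g-formula for the joint intervention do(A = a, M = m), with an invented post-treatment covariate L that confounds the mediator-outcome relation. Everything here (models, coefficients) is illustrative, not the method of the cited article.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical trial: A randomized; L is a post-treatment variable that
# confounds the mediator-outcome (M -> Y) relation; M mediates part of A.
a = rng.binomial(1, 0.5, size=n)
l = 0.8 * a + rng.normal(size=n)
m = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * a + 1.0 * l))))
y = 1.0 * a + 1.5 * m + 2.0 * l + rng.normal(size=n)
df = pd.DataFrame({"a": a, "l": l, "m": m, "y": y})

# Parametric g-formula for the joint intervention do(A = a, M = m):
#   E[Y^{a,m}] = E_{L | A=a} [ E[Y | A=a, M=m, L] ]
y_model = smf.ols("y ~ a + m + l", data=df).fit()

def joint_mean(a_val, m_val):
    # Draw L from its distribution under A = a_val (here: the observed L
    # values among units with A = a_val, valid because A is randomized),
    # then predict Y with M forced to m_val.
    l_draw = df.loc[df["a"] == a_val, "l"]
    pred = y_model.predict(pd.DataFrame({"a": a_val, "m": m_val, "l": l_draw}))
    return pred.mean()

# do(A=1, M=1) versus do(A=0, M=1): the mediator is held fixed, so only
# the direct A -> Y and the A -> L -> Y paths remain.
print(joint_mean(1, 1) - joint_mean(0, 1))  # close to 1.0 + 2.0*0.8 = 2.6
```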

A final note: sometimes people are actually interested in the effect of assignment to the intervention rather than the intervention itself, for example to assess how difficult the intervention is to adhere to or how well the treatment has been implemented in a specific area. This is simple to estimate using randomized data (in the trial setting) but may not be as easy using observational data.

In medicine, assignment may be emulated using prescriptions or similar records. All in all, it may be best to think of interventions on their own — even prescribing is just an intervention which may or may not cause drug intake as instructed, and which could be improved until its effect matches that of the ideal drug treatment.

6. Follow-up

Missing outcome data on a SWIG

Missing outcome data (censoring) is a common source of error for both randomized experimental and observational data. Using only the fully observed subset in the analysis opens the door to selection bias. Emulating target trials where everyone has a fully observed follow-up is possible by estimating the joint effect of the intervention and the elimination of censoring. This requires knowledge of the common causes of censoring and the outcome (or other variables) to achieve exchangeability; consistency and positivity must hold as well. The causes of censoring are numerous, and it takes judgement to decide whether it’s even possible to imagine interventions that would eliminate a particular cause of censoring completely.

(I’m personally still confused about how to think of censoring as an intervention. The answer I have for now is that we should realize that the following graphs are the same. Don’t ask me how, yet.)

Intervening on censoring to control for selection bias

Selection bias can be caused by many kinds of ‘conditioning on a common effect’ structures. Below I drew another example from an article by Breskin and others (2018) showing a situation where the treatment causes harms which lead to drop-out, and common causes of harm and outcome exist. Selection bias is caused by conditioning on having data, which opens the backdoor path through $W^a$. In this case it is possible to adjust for $W^a$ (in practice, use $P(W=w \mid A=a)$ as weights in the g-formula) to eliminate the selection bias, because no backdoor paths to $A$ exist.

Selection bias has many forms
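Here is a toy version of that adjustment, with an invented data-generating process in the spirit of the structure above: the outcome model is fit among the uncensored and then standardized over the distribution of $W$ within each treatment arm, assuming $W$ is measured for everyone, including those who later drop out.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 200_000

# Invented structure: A randomized; W a (possibly treatment-affected)
# cause of both harms S and the outcome Y; harms S drive drop-out C.
# Only rows with C == 0 have an observed outcome.
a = rng.binomial(1, 0.5, size=n)
w = 0.3 * a + rng.normal(size=n)
s = rng.binomial(1, 1 / (1 + np.exp(-(1.0 * a + 1.5 * w - 1.0))))
c = rng.binomial(1, np.where(s == 1, 0.6, 0.1))        # drop-out
y = 1.0 * a + 2.0 * w + rng.normal(size=n)             # true effect: 1.6
df = pd.DataFrame({"a": a, "w": w, "c": c, "y": y})
obs = df[df["c"] == 0]                                  # complete cases

# Naive complete-case contrast: conditioning on C opens A -> S <- W -> Y.
print(obs.loc[obs.a == 1, "y"].mean() - obs.loc[obs.a == 0, "y"].mean())

# g-formula: E[Y^a] = sum_w E[Y | A=a, C=0, W=w] * P(W=w | A=a), i.e.
# model the outcome among the uncensored and standardize over the
# distribution of W within each arm (W assumed measured for everyone).
y_model = smf.ols("y ~ a + w", data=obs).fit()

def g_mean(a_val):
    w_draw = df.loc[df["a"] == a_val, "w"]              # P(W | A = a)
    return y_model.predict(pd.DataFrame({"a": a_val, "w": w_draw})).mean()

print(g_mean(1) - g_mean(0))  # close to 1.6, free of the selection bias
```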

7. Time zero and end of follow-up

Experiments often have clear timing, which can prevent a surprising amount of bias. Time zero is the point where eligibility criteria are confirmed, interventions are assigned, and measurement of outcomes starts. The interventions are then started within some time from assignment (the grace period). End of follow-up is the point where all measurement ends. Information must not flow backwards in time.

When emulating a target trial with observational data, one might even need to emulate multiple trials, starting one at every time the eligibility criteria are known to be met. The grace period between time zero and intervention also creates a problem in observational data without clear assignments, because during the grace period an observation is compatible with multiple interventions. This undefined zone can be handled with the cloning-censoring technique: each observation is cloned into every compatible assignment and then censored once the observed intervention becomes incompatible with that assignment.
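Here is a minimal sketch of the cloning step, assuming monthly data, two strategies (“initiate within the grace period” versus “never initiate”), and a three-month grace period; all of it is invented for illustration.

```python
import numpy as np
import pandas as pd

GRACE = 3  # months allowed between time zero and treatment initiation

# Hypothetical data: month of treatment initiation (NaN = never initiated)
# and month when follow-up ends, for each person eligible at time zero.
people = pd.DataFrame({
    "id":         [1, 2, 3],
    "init_month": [2.0, np.nan, 5.0],
    "end_month":  [12, 12, 12],
})

clones = []
for _, p in people.iterrows():
    initiated_in_grace = not pd.isna(p.init_month) and p.init_month <= GRACE

    # Clone A follows "initiate within the grace period": censored at the
    # end of the grace period if the person had not initiated by then.
    clones.append({
        "id": p.id, "arm": "initiate",
        "censor_month": None if initiated_in_grace else GRACE,
    })
    # Clone B follows "never initiate": censored at initiation, if any.
    clones.append({
        "id": p.id, "arm": "never",
        "censor_month": None if pd.isna(p.init_month) else p.init_month,
    })

print(pd.DataFrame(clones))
# During the grace period every person contributes person-time to both
# arms; the artificial censoring is then handled like any other censoring
# (for example with inverse-probability-of-censoring weights).
```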

The causal structures are quite difficult here, so let’s look at two examples from an article by Yang & Burgess (2024). Pay attention to the backdoors of the effect of $A_1$ on $Y_2$.

First, eligibility must not be based on information measured after assignment. In a target trial, eligibility and assignment happen at the same time. The graph below shows an example causal structure where eligibility is based on the outcome at time one. The result is selection bias.

Immortal time bias example 1

Second, assignment to interventions must not be based on information measured after eligibility. In a target trial, eligibility and assignment happen at the same time. The graph below shows an example causal structure where the intervention depends on the previous outcome. The result is confounding.

Immortal time bias example 2

We can see why it’s called immortal time bias if we consider the outcome to be death. All of the units that receive the intervention $A_1$ are immortal between times 0 and 1, because otherwise they couldn’t have received the intervention. The controls, on the other hand, can die at any time during the follow-up — so the early deaths are missing from the data in the $A_1$ group.
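A small simulation (my own, not from the cited article) makes the mechanism concrete: even when treatment does nothing, counting the pre-initiation person-time as treated makes the treated look protected, while aligning time zero with initiation removes the bias.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Invented cohort: death times are exponential; treatment has NO effect.
death = rng.exponential(scale=10.0, size=n)        # years
# Treatment is 'initiated' at year 1 -- but only by people still alive,
# so everyone in the treated group has survived the first year for free.
init_time = 1.0
treated = (death > init_time) & (rng.random(n) < 0.5)

# Naive analysis: compare 5-year mortality from time zero by treatment.
naive_treated = (death[treated] < 5).mean()
naive_control = (death[~treated] < 5).mean()
print(naive_treated, naive_control)  # treated look protected: immortal time

# Aligned analysis: start the clock at the initiation time for everyone
# who survived to it (eligibility and 'assignment' share one time zero).
alive = death > init_time
aligned_treated = (death[treated] < init_time + 5).mean()
aligned_control = (death[alive & ~treated] < init_time + 5).mean()
print(aligned_treated, aligned_control)  # nearly identical: no effect
```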