
Single-World Intervention Graphs in 10 Examples

Published: 2024-08-24
Updated: 2024-08-24

Sorry, this is a work in progress. I need to learn the underlying math better to understand what's going on in many places. My goal here is to show how it's possible to use the simplest possible derivation rules to go from an arbitrary SWIG to a valid formula for the counterfactual quantities of interest. It's not nice to rely on situation-specific results when there seems to be a fairly straightforward general framework that works in almost any situation.

In science, precise theory is necessary to make progress efficiently. Natural language and statistics alone create a mess extremely easily, even with top-notch data collection like randomized controlled experiments; these 'languages' must be just too vague to talk about the right thing (causality, and scientific knowledge in general).

I think causal diagrams should be thought of as the first mathematically rigorous language for aspiring scientific fields. Yes, you can't derive mechanistic differential equations like in physics, but mathematical rigour is much closer than that. Even if it were difficult to draw plausible graphs because knowledge is so thin, I think one should give it maximum effort; this effort alone is a way to highlight ignorance and take the knowledge forward. As the knowledge improves, old data can be analyzed in light of the new knowledge. This is just science.

A causal graph like a SWIG is a simplified visual interface to a mathematical theory. Although it's possible to use the causal theory without the corresponding graphs, the knowledge and reasoning become less clear. So in this post I try to collect examples where we have a graph (knowledge) and then derive a quantity of interest which is identifiable with observable data.

1. Basic ideas

Basic ideas SWIG

The graph represents a set of SWIGs, one for each specific value of $a$. Such a graph is actually called a single-world intervention template (SWIT), but they are usually called just SWIGs.

  • $A$ represents the observed 'natural' value of the (random) variable, and $a$ represents an intervention on the variable, setting it to a certain fixed value $a$. The arrow from $a$ reminds us that we assume $A$ to affect $Y$, even though $a$ is actually set to a fixed constant and cannot really be thought of as a cause in that particular single world anymore (it cannot vary). Other similar weird arrows can exist in SWIGs, unfortunately; I'll show some of them later.
  • $Y^a$ is the outcome (random variable) in this hypothetical world.
  • $U$ and $W$ are some independent causes of $A$ and $Y^a$ (these letters are used for unobserved variables by convention). As the number of variables grows, these would clutter the graph, so they are not normally drawn; they exist implicitly.

Okay, say that we want to know the effect of $A$ on $Y$, like $E[Y^{a=0}] - E[Y^{a=1}]$, which is the difference of the mean outcome between two hypothetical worlds, $a=0$ for all and $a=1$ for all. (We assume that these values are possible for everyone in our population [positivity].) To estimate this effect, we'd need to know $E[Y^a]$, but it is not observed. So what does the graph tell us that could bridge this to what we can observe?

Using the rules of d-separation, the graph tells us that $Y^a$ is independent of $A$, so we know that $E[Y^{a^{*}}] = E[Y^{a^{*}}|A=a]$: the mean outcome in the whole population under some intervention is the same as the mean outcome under that intervention in subsets given by observed values of $A$, that is, conditional on $A$.

What to do with $E[Y^{a^{*}}|A=a]$? Another powerful assumption is consistency. This means that $a^{*}$ is a sufficiently well-defined intervention on $A$ (not compatible with wildly different versions with different effects) and that the observed values of $A$ correspond to the intervened values. When this is the case, we know that the observed $Y$ is the same as $Y^{a^{*}}$ when $a^{*}$ equals the observed $A=a$. So, for example, $E[Y^{a=1}|A=1]$ is the same as $E[Y|A=1]$. In general, $E[Y^a|A=a] = E[Y|A=a]$.

So, we can estimate $E[Y^a]$ by estimating $E[Y|A=a]$, for example by modelling $E[Y|A]$ and then predicting $E[Y|A=1]$ and $E[Y|A=0]$.
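As a concrete sketch of that modelling step (with simulated data and hypothetical variable names, not any real example from this post):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration: a randomized binary treatment A.
rng = np.random.default_rng(0)
n = 10_000
A = rng.binomial(1, 0.5, n)
Y = 1.0 + 0.5 * A + rng.normal(0, 1, n)
data = pd.DataFrame({"A": A, "Y": Y})

# Model E[Y|A], then predict E[Y|A=1] and E[Y|A=0] for everyone.
model = smf.ols("Y ~ A", data=data).fit()
ey1 = model.predict(data.assign(A=1)).mean()
ey0 = model.predict(data.assign(A=0)).mean()

# Under exchangeability, positivity, and consistency these estimate
# E[Y^{a=1}] and E[Y^{a=0}]; the true difference here is 0.5.
print(ey1 - ey0)
```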

2. Common causes

Common causes SWIG

$L$ and $U$ are common causes of $A$ and $Y$. $U$ is unobserved, but some effects of $U$ are observed: $P_1$ and $P_2$. Causal common causes are called causal confounders, and variables that can be used as proxies of causal confounders are called surrogate or proxy confounders.

We are again interested in $E[Y^a]$.

Using the rules of d-separation, the graph tells us that $Y^a$ is independent of $A$ conditional on $L$ and $U$, or conditional on $L$, $W$, $P_1$, and $P_2$. This is because common causes open backdoor paths in the graph, and these can be closed by conditioning on them. We assume that every value of $A$ actually occurs in each conditioning subset (positivity). Unfortunately we can't condition on the unobserved variables, so we might decide to settle for $L$, $P_1$, and $P_2$. Conditioning can be shown on the graph with a box around the variable.

We can start here from the fact that $E[Y^{a^{*}}]$ can be recovered from the conditional expectation $E[Y^{a^{*}}|L=l, P_1=p_1, P_2=p_2]$ by averaging over all possible values of $(l, p_1, p_2)$:

$$\sum_{l, p_1, p_2} E[Y^{a^{*}}|L=l, P_1=p_1, P_2=p_2] \times P(L=l, P_1=p_1, P_2=p_2)$$

Next, we hope that $A$ is approximately independent of $Y^a$ given only $(L, P_1, P_2)$, so that the conditional expectation above should be close to the one in subsets of $A$ too, like $E[Y^{a^{*}}|A=a, L=l, P_1=p_1, P_2=p_2]$.

And assuming consistency, we can again arrive at $E[Y|A=a, L=l, P_1=p_1, P_2=p_2]$ for the conditional expectation:

$$E[Y^{a^{*}}] \approx \sum_{l, p_1, p_2} E[Y|A=a, L=l, P_1=p_1, P_2=p_2] \times P(L=l, P_1=p_1, P_2=p_2)$$

This method is a form of the g-formula, which can be represented in multiple ways (a numeric sketch follows the list):

  • Standardization (above): $\sum_l E[Y|A=a, L=l] \times P(L=l)$
  • Iterated conditional expectations: $E[E[Y|A=a, L]]$ (in general using simulation from fitted models)
  • Inverse probability weighting: $E\Big[\frac{I(A=a)\,Y}{f[A|L]}\Big]$ (in general using weighted model fitting or similar)
  • Doubly robust methods combining some of the above
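A minimal sketch of the first and third representations, with simulated data and hypothetical names (a single binary confounder $L$ standing in for $L$, $P_1$, $P_2$):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: one binary confounder L of treatment A and outcome Y.
rng = np.random.default_rng(1)
n = 50_000
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.3 + 0.4 * L, n)
Y = 1.0 + 0.5 * A + 1.0 * L + rng.normal(0, 1, n)
data = pd.DataFrame({"L": L, "A": A, "Y": Y})

# Standardization: model E[Y|A, L], then average predictions over P(L).
outcome = smf.ols("Y ~ A + L", data=data).fit()
std1 = outcome.predict(data.assign(A=1)).mean()
std0 = outcome.predict(data.assign(A=0)).mean()

# Inverse probability weighting: model f[A|L], then weight I(A=a) * Y.
propensity = smf.logit("A ~ L", data=data).fit(disp=0)
p1 = propensity.predict(data)  # P(A=1 | L)
ipw1 = np.mean((data.A == 1) * data.Y / p1)
ipw0 = np.mean((data.A == 0) * data.Y / (1 - p1))

# Both pairs estimate E[Y^{a=1}] and E[Y^{a=0}]; the true difference is 0.5.
print(std1 - std0, ipw1 - ipw0)
```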

3. Conditioning on common effects

SWIG missing…

The variables $S_1$, $S_2$, and $C^a$ are common effects of $A$ and $Y$, or of variables associated with them. When our observations are conditional (selected) on them or on their descendants ($S_3^a$), a backdoor path opens in the graph and the result is selection bias in the association of $A$ and $Y$.

We are again interested in $E[Y^{a^{*}}]$.

Using the rules of d-separation, a backdoor path is open through $S_2$, and it can be blocked by conditioning on $L$ or $K$. Similarly, the backdoor path via $S_1^a$ can be blocked by conditioning on $J$. However, we are out of luck with $S_3^a$; the path via $C^a$ will remain open whether or not we condition on $C^a$.

We check the same for the selected variables. $Y^a$ is independent of $S_2$ conditional on $K$, and independent of $S_1^a$ conditional on $J$ (the $a$ half of the split node $A \mid a$ is a constant, not a confounder here). $Y^a$ is independent of $S_3^a$ conditional on $C^a$.

We should get better data, but what's the closest we can get? We could ignore the selection bias caused by conditioning on $S_3^a$.

Start from the weighted average again: $\sum E[Y^{a^{*}}|K=k, L=l, J=j] \times P(K=k, L=l, J=j)$. Given the conditional independencies from the graph, we can again subset the conditioning further, since the expectation should be the same in these subsets:

$$\sum E[Y^{a^{*}}|A=a, S_1^a=1, S_2=1, K=k, L=l, J=j] \times P(K=k, L=l, J=j)$$

Assuming consistency and using the iterated-expectations representation, we could then write $E[E[Y|A=a, S_1=1, S_2=1, K, L, J]]$. And perhaps hope that $E[Y^{a^{*}}] \approx E[E[Y|A=a, S_1=1, S_2=1, S_3=1, K, L, J]]$.

4. Censoring

Censoring means missing outcome data, which again forces us to condition on being uncensored. This can cause selection bias as before.

Censoring SWIG

We are again interested in $E[Y^{a^{*}}]$.

Using the rules of d-separation, the graph tells us that $Y^a$ is independent of $A$ conditional on $L$, and independent of $C^a$ conditional on $L$.

Total probability, exchangeability (and positivity), and consistency propel us forward as before:

$$E[Y^{a^{*}}] = E[E[Y^{a^{*}}|L]] = E[E[Y^{a^{*}}|A=a, C^a=0, L]] = E[E[Y|A=a, C=0, L]]$$
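A minimal sketch of the last expression (iterated expectation fitted among the uncensored), again with simulated data and hypothetical names:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: L affects treatment A, censoring C, and outcome Y;
# Y is only observed when C == 0.
rng = np.random.default_rng(2)
n = 50_000
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.3 + 0.4 * L, n)
C = rng.binomial(1, 0.1 + 0.3 * L, n)
Y = np.where(C == 0, 1.0 + 0.5 * A + L + rng.normal(0, 1, n), np.nan)
data = pd.DataFrame({"L": L, "A": A, "C": C, "Y": Y})

# E[E[Y | A=a, C=0, L]]: fit E[Y | A, L] among the uncensored only,
# then average the predictions over everyone's L (the outer expectation).
fit = smf.ols("Y ~ A + L", data=data[data.C == 0]).fit()
ey1 = fit.predict(data.assign(A=1)).mean()
ey0 = fit.predict(data.assign(A=0)).mean()
print(ey1 - ey0)  # true difference here is 0.5
```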

5. Missing data

Previously we thought of missing data as conditioning on a variable. We can also think of missing data as measurement error (for each variable separately), where a hypothetical fully observed variable causes the observed, partly missing variable through some missing-data mechanism, often labelled $R \in \{0, 1\}$.

Missing data SWIG

We are again interested in $E[Y^{a^{*}}]$.

Using d-separation, the graph tells us that $Y^a$ is independent of $A$ given $L$, and independent of $R_A$ given $L$. The only new thing we need to remember is that, if we subset to the non-missing data, we can use the partly missing variable $\bar{A}$ in place of its hypothetical fully observed source $A$, as these have the same values in the non-missing subset.

So the chain of inference could go like this:

$$E[Y^{a^{*}}] = E[E[Y^{a^{*}}|L]] = E[E[Y^{a^{*}}|A=a, R_A=0, L]] = E[E[Y^{a^{*}}|\bar{A}^a=a, R_A=0, L]] = E[E[Y|\bar{A}=a, R_A=0, L]]$$
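In code this looks just like the censoring sketch, except that we fit among the rows where $A$ is observed and use the partly missing $\bar{A}$ (simulated data, hypothetical names):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: A is missing (R_A == 1) with probability depending on L.
rng = np.random.default_rng(3)
n = 50_000
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.3 + 0.4 * L, n)
Y = 1.0 + 0.5 * A + L + rng.normal(0, 1, n)
R_A = rng.binomial(1, 0.1 + 0.3 * L, n)
A_bar = np.where(R_A == 0, A, np.nan)  # the partly missing version of A
data = pd.DataFrame({"L": L, "A_bar": A_bar, "R_A": R_A, "Y": Y})

# E[E[Y | Abar=a, R_A=0, L]]: fit among rows where A is observed,
# then average predictions over everyone's L.
fit = smf.ols("Y ~ A_bar + L", data=data[data.R_A == 0]).fit()
ey1 = fit.predict(data.assign(A_bar=1)).mean()
ey0 = fit.predict(data.assign(A_bar=0)).mean()
print(ey1 - ey0)  # true difference here is 0.5
```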

6. Measurement bias

Measurement bias SWIG

If we have measurement error instead of just missing values, we have no convenient subsets to aim for. Dealing with measurement bias at the analysis stage may be possible by modelling the error, including correlation between errors (caused by some unobserved $W$) and measured sources of measurement error (here, the treatment $A$).

Measurement error in confounders also results in measurement bias, since the mismeasured confounder is not enough to control for confounding by the true confounder.

7. Immortal time bias

In progress…

8. Treatment-confounder feedback

Treatment-confounder feedback SWIG

$L$ is a time-varying confounder of the effect of the time-varying treatment $A$ on the end-of-follow-up outcome $Y$.

We are interested in $E[Y^{a_0, a_1}]$.

Start again from the law of total expectation in terms of the confounders: $\sum E[Y^{a_0, a_1}|L_0=l_0, L_1^{a_0}=l_1] \times P(L_0=l_0, L_1^{a_0}=l_1)$. Then we can factorize the joint probability as $P(L_0=l_0) \times P(L_1^{a_0}=l_1|L_0=l_0)$.

Now we use the assumed independencies. $L_1^{a_0}$ is independent of $A_0$ conditional on $L_0$, so we can add $A_0$ to the conditioning: $P(L_1^{a_0}=l_1|L_0=l_0, A_0=a_0)$. Assuming consistency makes this equal to the observable $P(L_1=l_1|L_0=l_0, A_0=a_0)$.

Both $A_0$ and $A_1^{a_0}$ are independent of $Y^{a_0, a_1}$ conditional on the confounders, so we can add them to the conditioning of the expectation: $E[Y^{a_0, a_1}|L_0=l_0, L_1^{a_0}=l_1, A_0=a_0, A_1^{a_0}=a_1]$. Finally, assuming consistency, we have the observable $E[Y|L_0=l_0, L_1=l_1, A_0=a_0, A_1=a_1]$. In total, we have the g-formula…

$$\sum_{l_0, l_1} E[Y|L_0=l_0, L_1=l_1, A_0=a_0, A_1=a_1] \times P(L_1=l_1|L_0=l_0, A_0=a_0) \times P(L_0=l_0)$$

(Something may be wrong with this…)
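One way to check the formula is a small simulation. A sketch of the two-time-point g-formula with simulated data and hypothetical names, where the true value of the joint effect is known by construction:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with treatment-confounder feedback:
# L0 -> A0 -> L1 -> A1 -> Y (plus direct treatment effects on Y).
rng = np.random.default_rng(4)
n = 200_000
L0 = rng.binomial(1, 0.5, n)
A0 = rng.binomial(1, 0.3 + 0.3 * L0, n)
L1 = rng.binomial(1, 0.2 + 0.3 * A0 + 0.3 * L0, n)
A1 = rng.binomial(1, 0.2 + 0.3 * L1, n)
Y = 0.4 * A0 + 0.4 * A1 + 0.5 * L0 + 0.5 * L1 + rng.normal(0, 1, n)
data = pd.DataFrame({"L0": L0, "A0": A0, "L1": L1, "A1": A1, "Y": Y})

# The two model pieces of the g-formula, fitted on the observed data.
l1_model = smf.logit("L1 ~ L0 + A0", data=data).fit(disp=0)
y_model = smf.ols("Y ~ A0 + A1 + L0 + L1", data=data).fit()

def g_formula(a0, a1):
    # Sum over l0, l1 of E[Y|...] * P(L1=l1 | L0=l0, A0=a0) * P(L0=l0).
    total = 0.0
    for l0 in (0, 1):
        p_l0 = (data.L0 == l0).mean()
        for l1 in (0, 1):
            row = pd.DataFrame({"L0": [l0], "A0": [a0], "L1": [l1], "A1": [a1]})
            p_l1_given = l1_model.predict(row).iloc[0]
            p_l1 = p_l1_given if l1 == 1 else 1.0 - p_l1_given
            total += y_model.predict(row).iloc[0] * p_l1 * p_l0
    return total

# True E[Y^{1,1}] - E[Y^{0,0}] here is 0.4 + 0.4 + 0.5 * 0.3 = 0.95.
print(g_formula(1, 1) - g_formula(0, 0))
```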

9. Mediator-confounder feedback

In progress…

10. Dynamic treatment strategies

In progress…