Single-World Intervention Graphs in 10 Examples
Sorry, this is a work in progress. I need to learn the underlying math better to understand what's going on in many places. My goal here is to show how it's possible to use the simplest possible derivation rules to go from an arbitrary SWIG to a valid formula for the counterfactual quantities of interest. It's unsatisfying to rely on situation-specific results when there seems to be a fairly straightforward general framework that works in almost any situation.
In science, precise theory is necessary to make progress efficiently. Natural language and statistics alone create a mess extremely easily, even with top-notch data collection like randomized controlled experiments; these ‘languages’ are simply too vague to talk about the right thing (causality, and scientific knowledge in general).
I think causal diagrams should be thought of as the first mathematically rigorous language for aspiring scientific fields. True, you cannot derive mechanistic differential equations with them like in physics, but mathematical rigour is within reach well before that. Even if it is difficult to draw plausible graphs because knowledge is thin, I think one should give it maximum effort; the effort alone highlights ignorance and takes the knowledge forward. As the knowledge improves, old data can be analyzed in light of the new knowledge. This is just science.
A causal graph like a SWIG is a simplified visual interface to a mathematical theory. Although it's possible to use the causal theory without the corresponding graphs, the knowledge and reasoning become less clear. So in this post I try to collect examples where we have a graph (knowledge) and then derive a quantity of interest that is identifiable from observable data.
1. Basic ideas
The graph represents a set of SWIGs, one for each specific value of $a$. Strictly speaking it is called a single-world intervention template (SWIT), but these are usually just called SWIGs.
- $A$ represents the observed ‘natural’ value of the (random) variable, and $a$ represents an intervention on the variable, setting it to a certain fixed value. The arrow out of $a$ reminds us that we assume the treatment to affect $Y^a$, even though $a$ is actually set to a fixed constant and cannot really be thought of as a cause in that particular single world anymore (it cannot vary). Other similar weird arrows can exist in SWIGs, unfortunately; I'll show some of them later.
- $Y^a$ is the outcome (random variable) in this hypothetical world.
- $U_A$ and $U_Y$ are some independent causes of $A$ and $Y^a$ (the letter $U$ is used for unobserved variables by convention). As the number of variables grows, these would clutter the graph, so they are not normally drawn; they exist implicitly.
Okay, say that we want to know the effect of $A$ on $Y$, like $E[Y^{a=1}] - E[Y^{a=0}]$, which is the difference of the mean outcome in two hypothetical worlds: $a=1$ for all and $a=0$ for all. (We assume that these values are possible for everyone in our population [positivity].) To estimate this effect, we'd need to know $E[Y^a]$, but it is not observed. So what does the graph tell us that could bridge this to what we can observe?
Using the rules of d-separation, the graph tells us that $Y^a$ is independent of $A$ (this is exchangeability), so we know that $E[Y^a] = E[Y^a \mid A = a']$ for any observed value $a'$: the mean outcome in the whole population under some intervention is the same as the mean outcome under that intervention in subsets given by observed values of the treatment, that is, conditional on $A$.
What to do with the counterfactual $Y^a$? Another powerful assumption is consistency. This means that $a$ is a sufficiently well-defined intervention on $A$ (not compatible with wildly different things with different effects) and the observed values of $A$ correspond to the intervened values. When this is the case, we know that the observed $Y$ is the same as $Y^a$ whenever $a$ is the same as the observed $A$. So, for example, $E[Y^{a=1} \mid A=1]$ is the same as $E[Y \mid A=1]$. In general, $E[Y^a \mid A=a] = E[Y \mid A=a]$.
So, we can estimate $E[Y^{a=1}] - E[Y^{a=0}]$ by estimating $E[Y \mid A=1] - E[Y \mid A=0]$, in general by modelling $E[Y \mid A]$ and then predicting with $A$ set to $1$ and to $0$.
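As a concrete sketch (the names and numbers below are my own, made up for illustration), this is what the estimate looks like in Python when exchangeability holds by design because the treatment is randomized:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# A is randomized, so Y^a is independent of A (exchangeability by design).
A = rng.binomial(1, 0.5, n)
Y = 1.0 + 2.0 * A + rng.normal(0, 1, n)  # true E[Y^{a=1}] - E[Y^{a=0}] = 2
d = pd.DataFrame({"A": A, "Y": Y})

# Exchangeability + consistency: E[Y | A = a] estimates E[Y^a].
means = d.groupby("A")["Y"].mean()
print(means[1] - means[0])  # approximately 2
```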
2. Common causes
$U$ and $L_3$ are common causes of $A$ and $Y$. $U$ is unobserved, but some effects of $U$ are observed, $L_1$ and $L_2$. Causal common causes are called causal confounders, and variables that can be used as proxies of causal confounders are called surrogate or proxy confounders.
We are again interested in $E[Y^{a=1}] - E[Y^{a=0}]$.
Using the rules of d-separation, the graph tells us that $Y^a$ is independent of $A$ conditional on $U$ and $L_3$, or conditional on $L_1$, $L_2$, $U$, and $L_3$. This is because common causes open backdoor paths in the graph, and these can be closed by conditioning on them. We assume that all values of $A$ are indeed possible for everyone in the conditioning subsets (positivity). Unfortunately we can't condition on the unobserved variables, so we might decide to settle for $L_1$, $L_2$, and $L_3$. Conditioning can be shown on the graph with a box around the variable.
We can start here from the fact that $E[Y^a]$ can be recovered from a conditional expectation by averaging over all possible values of $L = (L_1, L_2, L_3)$:

$$E[Y^a] = \sum_l E[Y^a \mid L = l] \Pr[L = l]$$
Next, we hope that $Y^a$ is approximately independent of $A$ given only $L$, so that the conditional expectation above should be close to the one in subsets of $A$ too, like $E[Y^a \mid A = a, L = l]$.
And assuming consistency, we can again arrive at an observable quantity for the conditional expectation:

$$E[Y^a] \approx \sum_l E[Y \mid A = a, L = l] \Pr[L = l]$$
This method is a form of the g-formula, which can be represented in multiple ways:
- Standardization (above): $E[Y^a] = \sum_l E[Y \mid A=a, L=l] \Pr[L=l]$
- Iterated conditional expectations: $E[Y^a] = E[\,E[Y \mid A=a, L]\,]$ (in general estimated by simulating from fitted models)
- Inverse probability weighting: $E[Y^a] = E\!\left[\frac{I(A=a)\,Y}{\Pr[A=a \mid L]}\right]$ (in general with a treatment model fitted by iteratively reweighted least squares or similar)
- Doubly robust methods combining some of the above
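A minimal sketch of the standardization and IPW representations, where a single binary confounder L stands in for the whole adjustment set (my own simulated data with made-up effect sizes):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 100_000

# One observed binary confounder L stands in for the adjustment set.
L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.2 + 0.5 * L)                 # treatment depends on L
Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(0, 1, n)  # true effect 2
d = pd.DataFrame({"L": L, "A": A, "Y": Y})

# Standardization: average E[Y | A = a, L] over the distribution of L.
outcome = smf.ols("Y ~ A + L", data=d).fit()
std = (outcome.predict(d.assign(A=1)) - outcome.predict(d.assign(A=0))).mean()

# Inverse probability weighting: weight by 1 / Pr[A = a | L].
propensity = smf.logit("A ~ L", data=d).fit(disp=0)
p = propensity.predict(d)  # Pr[A = 1 | L]
ipw = np.mean(d.A * d.Y / p) - np.mean((1 - d.A) * d.Y / (1 - p))

print(std, ipw)  # both approximately 2
```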
3. Conditioning on common effects
SWIG missing…
The variables $L_1$, $L_2$, and $L_3$ are common causes of $A$ and $Y$, or variables associated with such common causes. When our observations are conditional (selected) on them or on their descendants ($S_1$, $S_2$, $S_3$), a backdoor path opens in the graph, and the result is selection bias in the association of $A$ and $Y$.
We are again interested in $E[Y^{a=1}] - E[Y^{a=0}]$.
Using the rules of d-separation, a backdoor path is open through $L_1$, and it could be blocked by conditioning on $L_1$ or $S_1$. Similarly, the backdoor path via $L_2$ can be blocked by conditioning on $L_2$. However, we are out of luck with $S_3$: the path via $L_3$ will remain open whether or not we condition on $L_3$.
We check the same for the selected variables. $Y^a$ is independent of $S_1$ conditional on $L_1$, and independent of $S_2$ conditional on $L_2$ ($a$ is a constant, not a confounder here). But $Y^a$ is not independent of $S_3$, even conditional on $L_3$.
We should get better data, but what's the closest we can get? We could ignore the selection bias caused by conditioning on $S_3$.
Start from the weighted average again: $E[Y^a] = \sum_l E[Y^a \mid L = l] \Pr[L = l]$ with $L = (L_1, L_2, L_3)$. Given the conditional independencies from the graph, we can again subset the conditioning further, since the expectation should be the same in these subsets:

$$E[Y^a \mid L = l] = E[Y^a \mid A = a, S_1 = 1, S_2 = 1, L = l]$$

Assuming consistency and using the iterated-expectations representation, we could then write $E[Y^a] = E[\,E[Y \mid A=a, S_1=1, S_2=1, L]\,]$. And perhaps hope for $E[Y^a] \approx E[\,E[Y \mid A=a, S=1, L]\,]$ with the full selection $S = (S_1, S_2, S_3)$.
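Since the SWIG itself is missing above, here is only a toy simulation of the general mechanism (my own, hypothetical numbers): a treatment with no effect at all becomes associated with the outcome once we select on a common effect $S$:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 200_000

# A has no effect on Y, but both affect the selection indicator S.
A = rng.binomial(1, 0.5, n)
Y = rng.normal(0, 1, n)
S = rng.binomial(1, 1 / (1 + np.exp(-(A + Y))))  # S is a common effect
d = pd.DataFrame({"A": A, "Y": Y, "S": S})

full = d.groupby("A")["Y"].mean()
sel = d[d.S == 1].groupby("A")["Y"].mean()
print(full[1] - full[0])  # approximately 0: no effect, no bias
print(sel[1] - sel[0])    # clearly nonzero: selection bias
```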
4. Censoring
Censoring means missing outcome data, which forces us to again condition on being uncensored ($C = 0$). This can cause selection bias as before.
We are again interested in $E[Y^{a=1,c=0}] - E[Y^{a=0,c=0}]$, the effect of treatment in a world where censoring is also prevented.
Using the rules of d-separation, the graph tells us that $Y^{a,c=0}$ is independent of $A$ conditional on $L$, and independent of $C$ conditional on $A$ and $L$.
Total probability, exchangeability (and positivity), and consistency propel us forward as before:

$$E[Y^{a,c=0}] = \sum_l E[Y \mid A = a, C = 0, L = l] \Pr[L = l]$$
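A sketch of that formula as an estimator (made-up data again): fit the outcome model among the uncensored, then standardize over everyone's $L$:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 100_000

L = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.2 + 0.5 * L)
Y_full = 1.0 + 2.0 * A + 1.5 * L + rng.normal(0, 1, n)  # true effect 2
C = rng.binomial(1, 0.1 + 0.3 * L + 0.2 * A)  # censoring depends on A and L
d = pd.DataFrame({"L": L, "A": A, "C": C,
                  "Y": np.where(C == 0, Y_full, np.nan)})

# E[Y | A = a, C = 0, L] fitted among the uncensored...
fit = smf.ols("Y ~ A + L", data=d[d.C == 0]).fit()
# ...then averaged over the distribution of L in the full sample.
effect = (fit.predict(d.assign(A=1)) - fit.predict(d.assign(A=0))).mean()
print(effect)  # approximately 2
```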
5. Missing data
Previously we thought of missing data as conditioning on a variable. We can also think of missing data as measurement error (for each variable separately), where a hypothetical fully observed variable (say $Y$) causes the observed, partly missing variable ($Y^*$) through some missing-data mechanism, often labelled $R_Y$.
We are again interested in $E[Y^{a=1}] - E[Y^{a=0}]$.
Using d-separation, the graph tells us that $Y^a$ is independent of $A$ given $L$, and independent of $R_Y$ given $A$ and $L$. The only new thing we need to remember is that, if we subset to the non-missing data, we can use the partly missing variable $Y^*$ in place of its hypothetical fully observed source $Y$, as these have the same values in the non-missing subset.
So the chain of inference could go like this:

$$E[Y^a] = \sum_l E[Y^a \mid A=a, R_Y=1, L=l] \Pr[L=l] = \sum_l E[Y \mid A=a, R_Y=1, L=l] \Pr[L=l] = \sum_l E[Y^* \mid A=a, R_Y=1, L=l] \Pr[L=l]$$
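The substitution step is worth seeing once in code (a tiny made-up example): in the non-missing subset, $Y^*$ and $Y$ coincide by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000

Y = rng.normal(0, 1, n)               # hypothetical fully observed variable
R = rng.binomial(1, 0.7, n)           # missingness indicator (1 = observed)
Y_star = np.where(R == 1, Y, np.nan)  # the variable we actually have

# In the R = 1 subset, Y* and Y coincide, so E[Y | ..., R = 1] can be
# estimated from Y*.
print(np.nanmean(Y_star), Y[R == 1].mean())  # identical
```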
6. Measurement bias
If we have measurement error instead of just missing values, we have no convenient subsets to aim for. Dealing with measurement bias at the analysis stage may be possible by modelling the error, including correlation between the errors (caused by some unobserved $U$) and measured sources of measurement bias (here, an effect of the treatment $A$ on the measurement).
Measurement error in confounders also results in measurement bias, since the mismeasured confounder is not enough to control for confounding by the true confounder.
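A small simulation of that last point (hypothetical numbers of my own): adjusting for a noisily measured confounder removes only part of the confounding:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 200_000

L = rng.normal(0, 1, n)                      # true confounder
L_star = L + rng.normal(0, 1, n)             # mismeasured confounder
A = rng.binomial(1, 1 / (1 + np.exp(-L)))    # treatment depends on true L
Y = 2.0 * A + 1.5 * L + rng.normal(0, 1, n)  # true effect 2
d = pd.DataFrame({"L": L, "L_star": L_star, "A": A, "Y": Y})

print(smf.ols("Y ~ A + L", data=d).fit().params["A"])       # approximately 2
print(smf.ols("Y ~ A + L_star", data=d).fit().params["A"])  # residual confounding
```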
7. Immortal time bias
In progress…
8. Treatment-confounder feedback
$L_1$ is a time-varying confounder of the effect of the time-varying treatment ($A_0$, $A_1$) on the end-of-follow-up outcome $Y$.
We are interested in $E[Y^{a_0=1,a_1=1}] - E[Y^{a_0=0,a_1=0}]$.
Start again from the law of total expectation in terms of the confounders $L_0$ and $L_1$:

$$E[Y^{a_0,a_1}] = \sum_{l_0,l_1} E[Y^{a_0,a_1} \mid L_0=l_0, L_1^{a_0}=l_1] \Pr[L_0=l_0, L_1^{a_0}=l_1]$$

Then we can factorize the joint probability as $\Pr[L_0=l_0, L_1^{a_0}=l_1] = \Pr[L_1^{a_0}=l_1 \mid L_0=l_0] \Pr[L_0=l_0]$.
Now we use the assumed independencies. $L_1^{a_0}$ is independent of $A_0$ conditional on $L_0$, so we can add $A_0=a_0$ to the conditioning: $\Pr[L_1^{a_0}=l_1 \mid A_0=a_0, L_0=l_0]$. Assuming consistency makes this equal to the observable $\Pr[L_1=l_1 \mid A_0=a_0, L_0=l_0]$.
Both $A_0$ and $A_1^{a_0}$ are (sequentially) independent of $Y^{a_0,a_1}$ conditional on the confounders, so we can add them to the conditioning of the expectation: $E[Y^{a_0,a_1} \mid A_0=a_0, L_0=l_0, L_1^{a_0}=l_1, A_1^{a_0}=a_1]$. Finally, assuming consistency, we have the observable $E[Y \mid A_0=a_0, L_0=l_0, L_1=l_1, A_1=a_1]$. In total, we have the g-formula:

$$E[Y^{a_0,a_1}] = \sum_{l_0,l_1} E[Y \mid A_0=a_0, L_0=l_0, L_1=l_1, A_1=a_1] \Pr[L_1=l_1 \mid A_0=a_0, L_0=l_0] \Pr[L_0=l_0]$$
(Something may be wrong with this…)
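In case it helps to check the result, here is the final formula as a plug-in estimator on simulated data (my own simplified setup: a single binary time-varying confounder and no baseline $L_0$):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
n = 200_000

# Treatment-confounder feedback: A0 affects L1, L1 affects A1,
# and an unobserved U confounds L1 and Y. (L0 dropped for brevity.)
U  = rng.binomial(1, 0.5, n)
A0 = rng.binomial(1, 0.5, n)
L1 = rng.binomial(1, 0.2 + 0.3 * A0 + 0.4 * U)
A1 = rng.binomial(1, 0.2 + 0.5 * L1)
Y  = 1.0 + 1.0 * A0 + 1.0 * A1 + 2.0 * U + rng.normal(0, 1, n)
d = pd.DataFrame({"A0": A0, "L1": L1, "A1": A1, "Y": Y})

def g_formula(a0, a1):
    """Plug-in estimate of sum_l1 E[Y | a0, l1, a1] Pr[l1 | a0]."""
    total = 0.0
    for l1 in (0, 1):
        ey = d.Y[(d.A0 == a0) & (d.L1 == l1) & (d.A1 == a1)].mean()
        pl = ((d.A0 == a0) & (d.L1 == l1)).mean() / (d.A0 == a0).mean()
        total += ey * pl
    return total

print(g_formula(1, 1) - g_formula(0, 0))  # approximately the true 2.0

# The naive comparison is biased, because A1 is associated with U through L1:
naive = (d.Y[(d.A0 == 1) & (d.A1 == 1)].mean()
         - d.Y[(d.A0 == 0) & (d.A1 == 0)].mean())
print(naive)  # noticeably larger than 2.0
```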
9. Mediator-confounder feedback
In progress…
10. Dynamic treatment strategies
In progress…