Causal Inference With Time-Varying Variables
This is my summary of Part III of the book Causal Inference: What If by Hernán and Robins.
Let’s say we have three variables A, L, and Y, and the causal structure is as follows. (I use dagitty.net to draw the graphs.)
But what if all of them are time-varying? It doesn’t even matter if we have measured these variables over time. If you want to get valid causal estimates, you need to have valid causal assumptions – and this includes the assumptions through time. Ignoring past values of causes can easily create bias.
With time-varying variables, the number of alternative assumptions explodes, and it becomes harder and harder to visualize.
Importantly, it is very likely that there are some unmeasured confounders U too. Causes and confounders start affecting each other – there is treatment-confounder feedback.
In fact, these kinds of structures completely break traditional methods. If you only stratify by L, you will introduce bias because the L variables are colliders. g-methods are needed even in simple cases to compare different time-varying causes A0-A1-A2. The goal is the same as before, exchangeability, but now at every time point A0-A1-A2. In effect, g-methods remove arrows until the data look as if they had been generated from a system like this.
The causes A are sequentially randomized with some known distributions – only optionally depending on the previous value of the cause. As always, unmeasured confounders are manageable if their paths can be closed by adjusting for measured variables. Unfortunately, finding sufficient adjustment sets is a bit more difficult with time-varying variables.
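To make the earlier point about stratifying on L concrete, here is a minimal sketch in plain Python. The aggregated table is hypothetical, and I assume it was generated from a structure where A0 affects L1, an unmeasured U affects both L1 and Y, and neither A0 nor A1 has any effect on Y. The names `ROWS` and `mean_y` are made up for this illustration.

```python
# Hypothetical aggregated data: (n, a0, l1, a1, mean_y) per stratum.
# Assumed structure: A0 -> L1 <- U -> Y, with no effect of A0 or A1 on Y.
ROWS = [
    (2400, 0, 0, 0, 84.0),
    (1600, 0, 0, 1, 84.0),
    (2400, 0, 1, 0, 52.0),
    (9600, 0, 1, 1, 52.0),
    (4800, 1, 0, 0, 76.0),
    (3200, 1, 0, 1, 76.0),
    (1600, 1, 1, 0, 44.0),
    (6400, 1, 1, 1, 44.0),
]


def mean_y(a0, l1):
    """E[Y | A0 = a0, L1 = l1], averaging over A1 with the stratum counts."""
    keep = [(n, y) for n, ra0, rl1, _, y in ROWS if (ra0, rl1) == (a0, l1)]
    return sum(n * y for n, y in keep) / sum(n for n, _ in keep)


# Compare A0 = 1 against A0 = 0 within each level of L1.
for l1 in (0, 1):
    diff = mean_y(1, l1) - mean_y(0, l1)
    print(f"L1={l1}: E[Y|A0=1,L1] - E[Y|A0=0,L1] = {diff:.1f}")
```

Within both levels of L1 the early treatment looks harmful (a difference of -8), even though under the assumed structure it does nothing. Conditioning on the collider L1 opens the path from A0 through U to Y.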
First, it is better to use single-world intervention graphs instead of directed acyclic graphs. Let’s take this graph for an example.
Notation is quite confusing and requires some time to get used to. Capital names are observed values and lowercase names are interventions. Parentheses refer to counterfactual situations. Pipes | split the node into two so that incoming paths point to the left symbol and outgoing paths originate from the right symbol. So, A0|a0 means the variable A was counterfactually set to the value a at time 0. L1(a0) means the observed value of the variable L at time 1 when variable A was set to a at time 0.
Above, we can see A is exchangeable at time 0 without adjustment. Now, below, we check time 1 and see that exchangeability is fine when L1(a0) and A0|a0 are adjusted. If a0 is set to a constant value for everyone, it is automatically adjusted.
Another complication comes when we let the interventions be dynamic and depend on those other variables L. Remember the pipe | splits the node, so in the above graph, L doesn’t affect the intervention a1.
First note that we split the composite nodes with the | pipes. Then we change the notation from a0 and a1 as the intervention values to some function g that takes g0 as its first value. This can be called a dynamic treatment strategy. Then we allow the intervention g at time 1 to depend on the value of L so there is a new path between them. And again, the cause A is exchangeable at time 0 without any adjustments.
Below, we again check exchangeability for the cause A at time 1. Adjusting for L1 (and automatically for the constant g0) is again enough. Note the cause is the intervention g1, not the observed A1. This is truly confusing but becomes easier over time and repetition.
Depending on the causal structure, it may be possible to get exchangeability for all strategies, for static strategies only, or for no strategy. It is absolutely crucial to think carefully about plausible causal structures and then measure as many confounders (or their proxies) as possible. This study design is at the heart of epidemiology and any other field that wishes to learn scientific causal models about the world.
To make the graphs more realistic, we should add measurement error and censoring to them. Censoring is especially interesting because you don’t need any new ideas to deal with it: remaining uncensored can be treated as just another time-varying cause, intervened on alongside the intervention of interest.
So how do you adjust properly to achieve sequential exchangeability? Often you should use the g-formula (standardization), inverse-probability weighting, and g-estimation, and see if they agree. Or you can use a robust method that combines them and is resistant to some model misspecifications.
The basic idea of the plug-in g-formula is to predict the mean counterfactual outcome under a joint intervention of interest as a weighted average of the predicted observed outcomes conditional on all observed causes and confounders. The weights are the probabilities of those confounders at each time point conditional on their histories. History just means the variables that need to be adjusted to get conditional exchangeability for the intervention at that time point. If the intervention of interest is dynamic, the probabilities of interventions are added as further weights to the whole weighted average.
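The weighted average described above can be sketched in a few lines of plain Python. This is a nonparametric toy version, not a general implementation: the aggregated table is hypothetical, there are only two time points, and the only confounder is a binary L1. The names `ROWS`, `p_l1`, `e_y` and `g_formula` are made up for this illustration. A deterministic dynamic strategy is passed as a function of L1, so its intervention probabilities are simply 0 or 1.

```python
# Hypothetical aggregated data: (n, a0, l1, a1, mean_y) per stratum.
ROWS = [
    (2400, 0, 0, 0, 84.0),
    (1600, 0, 0, 1, 84.0),
    (2400, 0, 1, 0, 52.0),
    (9600, 0, 1, 1, 52.0),
    (4800, 1, 0, 0, 76.0),
    (3200, 1, 0, 1, 76.0),
    (1600, 1, 1, 0, 44.0),
    (6400, 1, 1, 1, 44.0),
]


def p_l1(l1, a0):
    """P(L1 = l1 | A0 = a0), estimated from the stratum counts."""
    n_match = sum(n for n, ra0, rl1, _, _ in ROWS if (ra0, rl1) == (a0, l1))
    n_total = sum(n for n, ra0, _, _, _ in ROWS if ra0 == a0)
    return n_match / n_total


def e_y(a0, l1, a1):
    """E[Y | A0 = a0, L1 = l1, A1 = a1], read directly from the table."""
    for _, ra0, rl1, ra1, y in ROWS:
        if (ra0, rl1, ra1) == (a0, l1, a1):
            return y
    raise KeyError((a0, l1, a1))


def g_formula(a0, g1):
    """Mean counterfactual outcome under 'set A0 = a0, then A1 = g1(L1)'."""
    return sum(e_y(a0, l1, g1(l1)) * p_l1(l1, a0) for l1 in (0, 1))


always = g_formula(1, lambda l1: 1)    # static: always treat
never = g_formula(0, lambda l1: 0)     # static: never treat
dynamic = g_formula(1, lambda l1: l1)  # dynamic: treat at time 1 only if L1 = 1
print(always, never, dynamic)
```

In this toy table all three strategies give a mean outcome of 60, so the g-formula finds no effect of any strategy. With real data the table lookups in `e_y` and `p_l1` would be replaced by fitted models, which is where the "plug-in" part comes from.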
The basic idea of inverse-probability weighting is to predict the mean counterfactual outcome under a joint intervention of interest as the predicted observed outcomes conditional on the observed causes in a pseudopopulation represented with weights in the model fitting process. The weights are the inverse of the probabilities of the observed causes at each time point conditional on their histories. History again just means the variables that need to be adjusted to get conditional exchangeability for the cause at that time point.
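Here is a minimal sketch of that weighting in plain Python, again on a hypothetical aggregated table with two time points. I assume A0 is unconditionally exchangeable and A1 is exchangeable given A0 and L1, so the weight is 1 / (P(A0) · P(A1 | A0, L1)). The "outcome model" is just a weighted mean among the people who followed the strategy, which stands in for a saturated model fitted in the pseudopopulation. The names `ROWS`, `p_a0`, `p_a1` and `ipw_mean` are made up for this illustration.

```python
# Hypothetical aggregated data: (n, a0, l1, a1, mean_y) per stratum.
ROWS = [
    (2400, 0, 0, 0, 84.0),
    (1600, 0, 0, 1, 84.0),
    (2400, 0, 1, 0, 52.0),
    (9600, 0, 1, 1, 52.0),
    (4800, 1, 0, 0, 76.0),
    (3200, 1, 0, 1, 76.0),
    (1600, 1, 1, 0, 44.0),
    (6400, 1, 1, 1, 44.0),
]


def p_a0(a0):
    """P(A0 = a0); no history is needed at time 0 by assumption."""
    total = sum(r[0] for r in ROWS)
    return sum(r[0] for r in ROWS if r[1] == a0) / total


def p_a1(a1, a0, l1):
    """P(A1 = a1 | A0 = a0, L1 = l1), the history assumed sufficient at time 1."""
    stratum = [r for r in ROWS if (r[1], r[2]) == (a0, l1)]
    return sum(r[0] for r in stratum if r[3] == a1) / sum(r[0] for r in stratum)


def ipw_mean(a0, a1):
    """Weighted mean of Y among those who actually followed (a0, a1)."""
    num = den = 0.0
    for n, ra0, rl1, ra1, y in ROWS:
        if (ra0, ra1) != (a0, a1):
            continue
        weight = 1.0 / (p_a0(ra0) * p_a1(ra1, ra0, rl1))
        num += n * weight * y
        den += n * weight
    return num / den


print(ipw_mean(1, 1), ipw_mean(0, 0))
```

Both static strategies come out at (approximately, up to floating-point error) 60 in this toy table, in agreement with what standardization gives on the same numbers.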
It is possible to combine these two methods by effectively just using the inverse-probability weights as another conditioning variable in the plug-in g-formula (standardization) method. This method (targeted maximum likelihood estimator, TMLE) gives valid results even when one of the two models, the outcome model (standardization) or the cause model (inverse-probability), is misspecified, as long as the other one is correct. More precisely, you need to fit many models for every time point, and there are methods that are robust to misspecifications in many of these models.
The basic idea of g-estimation is to predict, at each time point, the counterfactual outcome under one intervention as a function of the observed outcome and the observed cause multiplied by an effect parameter, and then to search for the value of the effect that satisfies exchangeability in a model that predicts the observed cause conditional on the confounders and the predicted counterfactual outcome at that time point.
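A heavily simplified sketch of this idea, in plain Python on a hypothetical aggregated table: I assume a linear structural model with one constant effect per time point (psi0 for A0, psi1 for A1) and no effect modification. With that assumption the search has a closed form. Instead of trying many candidate values, we solve the estimating equation that makes the predicted counterfactual outcome uncorrelated with the cause given its history, starting from the last time point. The names `ROWS`, `p_a1`, `p_a0` and `g_estimate` are made up for this illustration.

```python
# Hypothetical aggregated data: (n, a0, l1, a1, mean_y) per stratum.
ROWS = [
    (2400, 0, 0, 0, 84.0),
    (1600, 0, 0, 1, 84.0),
    (2400, 0, 1, 0, 52.0),
    (9600, 0, 1, 1, 52.0),
    (4800, 1, 0, 0, 76.0),
    (3200, 1, 0, 1, 76.0),
    (1600, 1, 1, 0, 44.0),
    (6400, 1, 1, 1, 44.0),
]


def p_a1(a0, l1):
    """P(A1 = 1 | A0 = a0, L1 = l1), the cause model at time 1."""
    stratum = [r for r in ROWS if (r[1], r[2]) == (a0, l1)]
    return sum(r[0] for r in stratum if r[3] == 1) / sum(r[0] for r in stratum)


def p_a0():
    """P(A0 = 1), the cause model at time 0 (no history assumed necessary)."""
    return sum(r[0] for r in ROWS if r[1] == 1) / sum(r[0] for r in ROWS)


def g_estimate():
    """Return (psi0, psi1) for the assumed linear model, last time point first."""
    # Time 1: H1 = Y - psi1 * A1 must be uncorrelated with A1 given (A0, L1),
    # i.e. sum n * (y - psi1 * a1) * (a1 - P(A1 = 1 | a0, l1)) = 0.
    num = sum(n * y * (a1 - p_a1(a0, l1)) for n, a0, l1, a1, y in ROWS)
    den = sum(n * a1 * (a1 - p_a1(a0, l1)) for n, a0, l1, a1, y in ROWS)
    psi1 = num / den

    # Time 0: H0 = Y - psi1 * A1 - psi0 * A0 must be uncorrelated with A0.
    p0 = p_a0()
    num = sum(n * (y - psi1 * a1) * (a0 - p0) for n, a0, l1, a1, y in ROWS)
    den = sum(n * a0 * (a0 - p0) for n, a0, l1, a1, y in ROWS)
    psi0 = num / den
    return psi0, psi1


psi0, psi1 = g_estimate()
print(f"psi0 = {psi0:.6f}, psi1 = {psi1:.6f}")
```

Because the estimating equation is linear in Y, stratum means are enough here and no individual-level data is needed. In this toy table both effect parameters come out at zero up to floating-point error.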