Suppose you have defined a statistical model for the potential outcomes $Y_i(0)$ and $Y_i(1)$ for each experimental unit $i=1,\cdots, N$, as in Rubin's (1978) model-based inference for causal effects [see also Imbens and Rubin (2015)]. You fit the model to the data and use the fitted model to impute the potential outcomes you don't observe, which gives the unit-level treatment effect $Y_i(1) - Y_i(0)$. Within a Bayesian framework, you will typically repeat the imputation many times, each time drawing the model parameters from the posterior distribution, to get the posterior distribution of the unit-level treatment effects (and hence of the average treatment effect, or ATE).

If you are interested in the Quantile Treatment Effect (QTE) you can proceed in two ways.

  • You can compute the desired quantile for the $N$ simulated $Y_i(0)$ and $Y_i(1)$ and then take the difference $Q(Y_i(1)) - Q(Y_i(0))$.

  • Alternatively, you can compute the quantile of the unit-level treatment effect: $Q(Y_i(1) - Y_i(0))$.
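To make the two options concrete, here is a minimal sketch in Python. The arrays `y0_draws` and `y1_draws` are hypothetical stand-ins for the imputed potential outcomes (one row per posterior draw, one column per unit); they are simulated here only so the snippet runs, and the function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for S posterior imputations of N units' potential outcomes.
# In practice these would come from the fitted model, not from rng.normal.
S, N = 200, 1_000
y0_draws = rng.normal(0.0, 1.0, size=(S, N))
y1_draws = y0_draws + rng.normal(1.0, 1.0, size=(S, N))  # heterogeneous unit-level effects


def difference_of_quantiles(y0, y1, q):
    """Option 1: Q(Y(1)) - Q(Y(0)), computed within each posterior draw."""
    return np.quantile(y1, q, axis=1) - np.quantile(y0, q, axis=1)


def quantile_of_effects(y0, y1, q):
    """Option 2: Q(Y(1) - Y(0)), computed within each posterior draw."""
    return np.quantile(y1 - y0, q, axis=1)


q = 0.95
opt1 = difference_of_quantiles(y0_draws, y1_draws, q)  # one value per draw, shape (S,)
opt2 = quantile_of_effects(y0_draws, y1_draws, q)      # one value per draw, shape (S,)

print("posterior mean, difference of quantiles:", opt1.mean())
print("posterior mean, quantile of effects:    ", opt2.mean())
```

Each function returns one value per posterior draw, so either quantity can be summarized with a posterior mean or interval in the usual way.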

I have seen people treat these two quantities as interchangeable, but I believe they have very different interpretations. Any thoughts on how to interpret these quantities? Thanks

  • Thank you for the question. I'm currently writing a paper where I need to explain these differences, and this discussion just sparked some ideas!
    – Fcold
    Jun 17, 2020 at 19:50

1 Answer


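I think they will be the same in a homogeneous treatment effect world ($Y1=Y0 + m$), or under an affine transformation $Y1=k \cdot Y0 + m$ with $k \ge 1$, which preserves rank and makes the effect $(k-1) \cdot Y0 + m$ nondecreasing in $Y0$. For $0<k<1$ ranks are still preserved, but the effect decreases in $Y0$ and the two quantities differ. In general they will not coincide, so your concern is valid.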

The second is definitely the more interesting counterfactual quantity, but people will often calculate the first because they lack the individual-level counterfactual data to calculate the second quantity (or a model to fill it in). This shortcut makes some sense if you are not worried about rank reversals. Note that this problem does not arise with means, because the mean is linear: $E[Y1-Y0]=E[Y1]-E[Y0]$.

To see the difference between the two, suppose $Y0$ is symmetric about 0 (say $N(0,1)$), $Y1=-k \cdot Y0$ with $k>0$, and we care about the 95th percentile. The 95th percentile of the treatment effect is very large, since those are the folks who go from the negative bottom of the $Y0$ distribution to the positive top of $Y1$. But the difference between the two 95th percentiles will be more modest if $k$ is not too large. It could even be negative if there is shrinkage in the support of $Y1$ (say for $k=0.5$ above), leading you to make the wrong inference about the sign of the 95th percentile of the effect (much less its magnitude).
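For the normal case this can be worked out in closed form. The following is a sketch under the stated assumptions ($Y0 \sim N(0,1)$, $Y1=-k \cdot Y0$, $k>0$), writing $z_q=\Phi^{-1}(q)$ for the standard normal quantile and $Q_q(\cdot)$ for the $q$-th quantile; both symbols are introduced here only for illustration.

$$
\begin{aligned}
Y1 &\sim N(0,\,k^2) &&\Rightarrow\; Q_q(Y1) = k\,z_q,\\
Y1-Y0 = -(1+k)\cdot Y0 &\sim N\big(0,\,(1+k)^2\big) &&\Rightarrow\; Q_q(Y1-Y0) = (1+k)\,z_q,\\
Q_q(Y1)-Q_q(Y0) &= (k-1)\,z_q.
\end{aligned}
$$

With $q=0.95$ ($z_q \approx 1.645$) and $k=0.5$, the population values are $Q_q(Y1-Y0) \approx 2.47$ and $Q_q(Y1)-Q_q(Y0) \approx -0.82$: opposite signs, and the difference of quantiles is negative whenever $k<1$.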

If the treatment is a small change, you might be willing to assume away rank reversals or highly non-linear transformations where the link between the two methods does not hold.

Here's a toy simulation of the last case, with $Y0 \sim N(0,1)$ and $Y1=-0.5 \cdot Y0 + 0$. I have plotted the distributions of $Y0$, $Y1$ and $Y1-Y0$, along with the 95th percentile for each. As you can see, the 95th percentile of the effect is $2.5$, whereas the difference between the 95th percentiles is $0.82 - 1.59 = -0.77$. These are sample estimates; the corresponding population values are about $2.47$ and $-0.82$.

[Figure: distributions of $Y0$, $Y1$ and $Y1-Y0$, each with its 95th percentile marked]
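The toy example can be reproduced with a few lines of Python. This is only a sketch, not the code behind the plot: the seed, the sample size, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen for reproducibility
n = 200_000                      # assumed sample size; larger n gives tighter estimates

y0 = rng.normal(0.0, 1.0, size=n)  # Y0 ~ N(0, 1)
y1 = -0.5 * y0                     # Y1 = -0.5 * Y0 + 0 (ranks fully reversed)
effect = y1 - y0                   # unit-level treatment effect

q = 0.95
q_y0 = np.quantile(y0, q)
q_y1 = np.quantile(y1, q)
q_effect = np.quantile(effect, q)     # quantile of the effect
diff_of_quantiles = q_y1 - q_y0       # difference of the two quantiles

print(f"95th percentile of Y0:      {q_y0:.2f}")
print(f"95th percentile of Y1:      {q_y1:.2f}")
print(f"95th percentile of Y1 - Y0: {q_effect:.2f}")
print(f"difference of percentiles:  {diff_of_quantiles:.2f}")
```

With a sample this large, the estimates should land close to the population values of about $2.47$ for the quantile of the effect and about $-0.82$ for the difference of quantiles, so the two quantities disagree in sign as well as magnitude.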

  • Thanks for your reply. Would you be able to work out the math of your example? The quantile distribution of $Y_0\sim N(0,1)$ is $\Phi^{-1}(b)$ and of $Y_1 = m + k Y_0$ is $m + k \Phi^{-1}(b)$. The quantile distribution of $Y_1-Y_0$ is...
    – mrb
    Jun 18, 2020 at 12:19
  • @mrb $\Delta=Y1-Y0=k \cdot Y0+m - Y0=(k-1)\cdot Y0+m.$ If $Y0$ is standard normal, then $\Delta \sim N(\mu=m,\sigma^2=(k-1)^2)$. I don't know if this is what you mean by quantile distribution. To me that phrasing sounds more like the sampling distribution of the $q$th quantile.
    – dimitriy
    Jun 18, 2020 at 17:58
