Say we want to know the value of some parameter $\theta$. The definition of a credible interval is easy to parse. Your 90% credible interval is the interval $[a, b]$, after you observe evidence $E$, just in case:

$$P(\theta \in [a, b] \mid E) = 0.9$$

This just says that the probability that the value of $\theta$ is in $[a, b]$, given the evidence, is 0.9. Simple enough.
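As a concrete illustration (not part of the setting above), here is a minimal Python sketch computing an equal-tailed 90% credible interval from an assumed Beta posterior; the particular posterior and the use of `scipy` are assumptions for illustration only.

```python
# A minimal sketch: an equal-tailed 90% credible interval from an assumed
# Beta(3, 9) posterior over theta (e.g. after 2 heads in 10 flips with a
# flat Beta(1, 1) prior). The posterior here is purely illustrative.
from scipy.stats import beta

posterior = beta(3, 9)
lo, hi = posterior.ppf(0.05), posterior.ppf(0.95)
print(f"90% credible interval: [{lo:.3f}, {hi:.3f}]")
print("posterior mass inside:", posterior.cdf(hi) - posterior.cdf(lo))  # 0.9
```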
The definition of a confidence interval in this setting is more complicated and a bit harder to parse. We have to start by defining a function from values of $E$ to intervals in $\mathbb{R}$. We will call this function $f$. The output of this function, $f(E)$, is what turns out to be our confidence interval, but only if we make sure to construct $f$ according to certain constraints. Recall that $E$ is a random variable, meaning it could take any of several values before we observe it, and so $f(E)$ is also a random variable, and we do not know its value before we observe $E$. $f(E)$ is a random variable that takes intervals in $\mathbb{R}$ as values. Here are the constraints that $f$ must satisfy in order for its output to be a 90% confidence interval:

$$\forall \theta: \quad P(\theta \in f(E) \mid \theta) \geq 0.9$$
If you’re used to thinking in terms of probability theory, this formula seems pretty weird. Let’s try parsing it in English. It says something like: for any value that $\theta$ might take, we need it to be the case that 90% of the time, that value is in the interval $f(E)$. The value of $E$ must depend on the value of $\theta$, since $E$ is supposedly evidence about the value of $\theta$. So what we have to do in order to construct a good $f$ is make sure that, conditioning on the value of $\theta$, 90% of the time $E$ is going to take a value which our $f$ function maps to an interval which includes $\theta$.
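To make that requirement concrete computationally, here is a minimal Python sketch of the coverage check the constraint is asking for. The function names and the dictionary-based likelihood table are my own assumptions for illustration, not anything from the definition itself.

```python
# Sketch: for every value of theta, the probability (given theta) that E lands
# on a value whose set f(E) contains theta must be at least 0.9.
def coverage(f, likelihood, theta):
    """P(theta in f(E) | theta), where likelihood[theta][e] = P(E = e | theta)
    and f maps a value of E to a set/interval of candidate theta values."""
    return sum(p for e, p in likelihood[theta].items() if theta in f(e))

def is_90_percent_confidence_procedure(f, likelihood):
    return all(coverage(f, likelihood, theta) >= 0.9 for theta in likelihood)
```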
It might be best to walk through a simpler example. Suppose that $\theta \in \{1, 2, 3, 4\}$, so that it can only take one of those four values. Suppose that $E \in \{1, 2, 3, 4\}$ as well. Now it turns out that, whatever the value of $\theta$, 80% of the time $E$ takes the same value as $\theta$, and 10% of the time $E$ takes the value one less than $\theta$ (when $\theta = 1$, say that mass just stays on 1, since there is no smaller value). The other 10% of the time, $E$ is chosen uniformly at random from the remaining three values, i.e. the values other than $\theta$. So for example:

$$P(E = 2 \mid \theta = 2) = 0.8, \qquad P(E = 1 \mid \theta = 2) = 0.1 + \tfrac{0.1}{3}, \qquad P(E = 3 \mid \theta = 2) = P(E = 4 \mid \theta = 2) = \tfrac{0.1}{3}$$
And similarly for other values of $\theta$. Now, instead of having a confidence interval in this setting, we are going to have to use a confidence set, but the definition is exactly the same. We have to construct a function $f$ that takes values of $E$ and returns sets of possible values of $\theta$. Furthermore, in order for the output of $f$ to be a 90% confidence set, we need $f(E)$ to be a set which contains the true value of $\theta$ 90% of the time, conditioned on the value of $\theta$. This is easy enough to do in this example. We define:

$$f(x) = \{x, x + 1\}$$
Suppose that the value of $\theta$ is 2. Then 80% of the time the value of $E$ is also 2, and so $f(E) = \{2, 3\}$, which contains 2. Still supposing that $\theta = 2$, at least another 10% of the time the value of $E$ is 1, and so $f(E) = \{1, 2\}$, which also contains 2. This means that if $\theta = 2$, then at least 90% of the time, $\theta$ is in the set $f(E)$. This same argument works for all values of $\theta$, and so we have that:

$$\forall \theta: \quad P(\theta \in f(E) \mid \theta) \geq 0.9$$

which makes $f(E)$ a 90% confidence interval, or rather a confidence set in this case, but the reasoning is exactly the same for intervals.
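Here is a hedged Python sketch that spells this out numerically: it builds the likelihood table described above (keeping the "one less than $\theta$" mass at 1 when $\theta = 1$, an assumption the example leaves implicit), defines $f(x) = \{x, x + 1\}$, and checks the coverage condition for each value of $\theta$.

```python
# Sketch of the four-value example: P(E = theta | theta) = 0.8, an extra 0.1 on
# the value one below theta (kept at theta when theta = 1, an assumption), and
# the remaining 0.1 spread evenly over the three values other than theta.
VALUES = [1, 2, 3, 4]

def likelihood(e, theta):
    p = 0.8 if e == theta else 0.1 / 3   # 80% same value, 10% spread over the rest
    if e == max(theta - 1, 1):
        p += 0.1                          # the dedicated 10% on "one less"
    return p

def f(x):                                 # the confidence-set function
    return {x, x + 1}

for theta in VALUES:
    cover = sum(likelihood(e, theta) for e in VALUES if theta in f(e))
    print(f"P(theta in f(E) | theta = {theta}) = {cover:.3f}")   # all >= 0.9
```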
So you may ask at this point why anyone would ever care about confidence intervals. The reason I wanted to write this post is that I think I have a good explanation, which, if it isn’t novel, is certainly rarely given.
The standard criticism of confidence intervals is that the probability you assign to the value of $\theta$ being in $f(E)$ is not necessarily 90% once you condition on the observed value of $E$! Suppose that we had a flat prior over the values of $\theta$ in the previous setting, and that we observe that the value of $E$ is 3. This gives us a 90% confidence set of $f(3) = \{3, 4\}$. In this case $P(\theta \in \{3, 4\} \mid E = 3)$ is about 0.93, not 0.9. There are also cases where your posterior probability of the confidence set comes out lower than the confidence level.
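Spelling that posterior out with the likelihoods from the example above (the flat prior cancels from numerator and denominator):

$$P(\theta \in \{3, 4\} \mid E = 3) = \frac{P(E = 3 \mid \theta = 3) + P(E = 3 \mid \theta = 4)}{\sum_{\theta} P(E = 3 \mid \theta)} = \frac{0.8 + \left(0.1 + \tfrac{0.1}{3}\right)}{\tfrac{0.1}{3} + \tfrac{0.1}{3} + 0.8 + \left(0.1 + \tfrac{0.1}{3}\right)} = \frac{14/15}{1} \approx 0.93$$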
Here is the thing, though: in order for us to agree about the posterior probability of $\theta$ being in $f(E)$, we had to agree about the prior over $\theta$. We would not have to agree about our prior over $\theta$ to agree about the 90% confidence interval. In fact, we could assign any prior over $\theta$ and we would still end up with the same confidence interval (this might give you a hint as to how to construct cases where the confidence level and the posterior probability diverge significantly).
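A small Python sketch of that point, reusing the example's likelihoods: the confidence set $f(3) = \{3, 4\}$ is fixed before any prior enters the picture, while the posterior probability of that same set moves around with the prior (the specific skewed prior below is an arbitrary choice for illustration).

```python
# The confidence set f(3) = {3, 4} does not depend on the prior at all, but the
# posterior probability of {3, 4} after observing E = 3 does.
VALUES = [1, 2, 3, 4]

def likelihood(e, theta):
    p = 0.8 if e == theta else 0.1 / 3
    if e == max(theta - 1, 1):
        p += 0.1
    return p

def posterior_of_set(prior, e, subset):
    joint = {t: prior[t] * likelihood(e, t) for t in VALUES}
    total = sum(joint.values())
    return sum(joint[t] for t in subset) / total

flat = {t: 0.25 for t in VALUES}
skewed = {1: 0.7, 2: 0.1, 3: 0.1, 4: 0.1}      # an arbitrary skeptical prior

print(posterior_of_set(flat, 3, {3, 4}))        # ~0.93
print(posterior_of_set(skewed, 3, {3, 4}))      # ~0.78, but f(3) is still {3, 4}
```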
Consider what happens if we take the open part of the constraint we defined on $f$:

$$P(\theta \in f(E) \mid \theta) \geq 0.9$$

and take the expectation of that expression with respect to our prior over $\theta$. This is the following average:

$$\sum_{\theta} P(\theta) \, P(\theta \in f(E) \mid \theta)$$

We know from our constraints on $f$ that each term being averaged here, e.g. $P(\theta \in f(E) \mid \theta = 2)$, is greater than or equal to 0.9, and this means that the whole average is also greater than or equal to 0.9. Also, reducing this average using the law of total probability, we end up with the following expression: $P(\theta \in f(E))$. In other words, the whole average gives us our prior probability that the value of $\theta$ is in $f(E)$, whatever $E$ turns out to be. This gives us the following theorem:

$$P(\theta \in f(E)) \geq 0.9$$

Our prior probability that the true value of $\theta$ is in $f(E)$ must be greater than or equal to 0.9, and this holds no matter what our prior is! The prior has been simplified out of the formula by the law of total probability.
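A quick numerical sanity check of that theorem in the example setting: the prior probability $\sum_{\theta} P(\theta)\, P(\theta \in f(E) \mid \theta)$ is a prior-weighted average of the per-$\theta$ coverage probabilities, so it can never dip below 0.9 no matter which prior we use (the randomly drawn priors below are purely illustrative).

```python
# Check: any prior-weighted average of the per-theta coverage probabilities
# from the example (0.9, 14/15, 14/15, 14/15) is itself at least 0.9.
import random

coverage = {1: 0.9, 2: 14 / 15, 3: 14 / 15, 4: 14 / 15}   # P(theta in f(E) | theta)

for _ in range(5):
    weights = [random.random() for _ in coverage]
    prior = {t: w / sum(weights) for t, w in zip(coverage, weights)}
    prior_prob = sum(prior[t] * coverage[t] for t in coverage)
    print(round(prior_prob, 3))                             # always >= 0.9
```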
Now, this does not mean that we agree on very much. At best, it means that we would have agreed, before we saw the value of $E$, about the probability of $\theta$ being in $f(E)$, if we somehow knew what $f(E)$ was without gaining any information about $\theta$. Still, this is an interesting way to think about confidence intervals, and I think it helps us spot cases where using confidence intervals instead of credible intervals makes sense in practice. For one, it makes sense to use confidence intervals if you think that your prior has been chosen adversarially, or if you are sufficiently skeptical of your prior; for example, if you think your prior might favor the conclusion you wanted to be true, even after you try to account for that bias. It also makes sense to use confidence intervals in cases where we do not trust each other’s priors, or where our priors are not relevant to some social epistemic practice, such as science.
An interesting question I would like to work out some day is when in principle it makes sense to use confidence intervals instead of credible intervals. What are the exact formal conditions where betting according to confidence intervals gives you a better expected payout than betting according to your credible intervals? It’s not enough for your prior to have been chosen adversarially, or to be biased in favor of conclusions for no good reason. Your prior has to be messed up enough, perhaps messed up enough that you’re better off ignoring it, since confidence intervals do seem to ignore it overall. There should be some formal condition that tells us exactly what it means for your prior to be “messed up enough”, but until someone finds it, I suggest using confidence intervals whenever you or those you wish to convince might reasonably be skeptical of your prior.