Suppose you have a coin where the probability of heads, p, is drawn from U(0, 1).
Clearly the fair value of p without any additional information would just be 0.5. If you don't know how I arrived at that, then I don't really know how you stumbled upon my SubStack in the first place, but go ask your buddy Claude or something.
Now suppose you flip this coin once and it comes up heads. What's your new fair value of p? Still 0.5? Surely it's not less than 0.5, right? And what if you flip it a second time and it's heads again?
Intuitive Answer
Let's break this down flip by flip. After the first flip there are only two possible outcomes: 0 heads or 1 head. The prior is p ∼ U(0, 1), so if the first flip is heads then our fair value should be the expected value of the higher of two draws from the prior distribution, since there were only two possible outcomes and we landed on the higher one. So what's the expected maximum of two IID variables drawn from U(0, 1)? Well, the distribution is uniform, so we expect uniformity in the 'spacing' of draws from it. In expectation we would have: 0, space, smaller, space, larger, space, 1. There are 3 'spaces', so each one should be 1/3. The expected value of the larger of the two draws is thus 2/3, which is also our fair value of p after the first flip.
The same logic says that if the second flip is heads as well, the fair value is the expected value of the largest of 3 IID variables drawn from U(0, 1). This works out to 3/4.
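Don't trust the napkin? Here's a quick Monte Carlo sanity check in plain Python (no dependencies) that both numbers land where the spacing argument says they should:

```python
import random

def mean_max_of_uniforms(k, trials=200_000):
    """Average the maximum of k IID U(0, 1) draws over many trials."""
    total = 0.0
    for _ in range(trials):
        total += max(random.random() for _ in range(k))
    return total / trials

print(mean_max_of_uniforms(2))  # ~0.667, i.e. 2/3
print(mean_max_of_uniforms(3))  # ~0.750, i.e. 3/4
```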
Generalizing
Anytime we flip the coin N times we will have a total of N+1 different outcomes, since the number of heads can be anything from 0, 1, 2, …, N. Correspondingly there will be N+2 'spaces' among N+1 numbers drawn from U(0, 1), so each space will be 1/(N+2). If we have H heads then we have the (H+1)th smallest of those N+1 numbers, since H is the (H+1)th possibility in 0, 1, 2, …, N. There are H+1 'spaces' leading up to this number in expectation, and therefore

fair value of p = (H + 1)/(N + 2)
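You can check the general claim the same way as before. Here's a sketch (the helper name is my own, nothing standard) that averages the (H+1)th smallest of N+1 uniform draws and compares it against (H+1)/(N+2):

```python
import random

def order_stat_mean(N, H, trials=100_000):
    """Average of the (H+1)th smallest of N+1 IID U(0, 1) draws."""
    total = 0.0
    for _ in range(trials):
        draws = sorted(random.random() for _ in range(N + 1))
        total += draws[H]  # index H picks the (H+1)th smallest
    return total / trials

for N, H in [(1, 1), (2, 2), (10, 7), (20, 10)]:
    print(N, H, round(order_stat_mean(N, H), 4), (H + 1) / (N + 2))
```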
And this checks out intuitively too, because anytime we have the same number of heads as tails our fair value of p becomes 1/2: with H = N/2, we get (N/2 + 1)/(N + 2) = 1/2. Furthermore, initial flips adjust our fair value of p more than subsequent flips, which also makes sense, because the additional information the 100th flip adds is minuscule in comparison to the information the first flip gives.
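To make the diminishing-update point concrete, here's what the fair value does over a run of consecutive heads (so H = N at every step):

```python
# Fair value after N consecutive heads is (N+1)/(N+2); watch the updates shrink.
prev = 0.5
for N in range(1, 8):
    fair = (N + 1) / (N + 2)
    print(f"flip {N}: fair = {fair:.4f}, change = {fair - prev:+.4f}")
    prev = fair
```

The first flip moves the fair value by about 0.17; by the seventh flip it's moving it by barely more than 0.01.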
Beta Distribution
Maybe you have a more formal background in statistics and don't just napkin-math everything out like I do. If so, then you're probably familiar with using Bayes' theorem with a Beta prior, in which case we can clearly see that the hyperparameters of Beta(α, β) would be α = 1 and β = 1, since α represents the prior number of pseudo-observations of heads before seeing any actual flips, and likewise β is the prior number of pseudo-observations of tails (that's the uniform prior). Updating on H heads in N flips then gives a posterior of Beta(1 + H, 1 + N − H), whose mean is (1 + H)/(2 + N): exactly the (H + 1)/(N + 2) we derived above.
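And if you'd rather let a library carry the napkin, here's a quick sketch with SciPy (the N and H values are just examples):

```python
from scipy import stats

N, H = 10, 7  # example: 10 flips, 7 heads observed
posterior = stats.beta(1 + H, 1 + (N - H))  # uniform prior Beta(1, 1), updated
print(posterior.mean())  # 0.6667 = (H + 1)/(N + 2) = 8/12
```

Same answer, fewer napkins.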