We discuss why a naive application of a CLT approximation fails, motivating us to develop Hoeffding's inequality.
20 March 2025
Consider the following example. Given $n$ independent, fair coin flips, estimate the chance that we see at least 75% heads. Rephrasing, we define the random variable $S_n := \#\{\text{heads in } n \text{ flips}\}$, and we seek an upper bound on $P(S_n \geq \tfrac{3}{4}n)$. Observe that we've framed this question as asking for a tail bound. Since $S_n \sim \mathrm{Binom}(n, \tfrac{1}{2})$, its expected value and variance are given by $\mathbb{E} S_n = \tfrac{n}{2}$ and $\mathbb{V} S_n = \tfrac{n}{4}$. As a first pass, we use Chebyshev's inequality to estimate
$$P\Big(S_n \geq \tfrac{3}{4}n\Big) = P\Big(S_n - \tfrac{n}{2} \geq \tfrac{n}{4}\Big) \leq \frac{\mathbb{V} S_n}{(n/4)^2} = \frac{n/4}{n^2/16} = \frac{4}{n}.$$
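As a quick sanity check, here is a minimal numerical sketch (assuming scipy is available; the choice of $n$ values is purely illustrative) comparing the exact binomial tail with the Chebyshev bound $4/n$. The exact tail is vastly smaller, suggesting the polynomial bound is far from tight.

```python
# Sketch: compare the exact tail P(S_n >= 3n/4) with the Chebyshev bound 4/n.
# Assumes scipy is available; the n values below are illustrative.
from scipy.stats import binom

for n in [16, 64, 256, 1024]:
    k = 3 * n // 4                   # threshold 3n/4 (n chosen divisible by 4)
    exact = binom.sf(k - 1, n, 0.5)  # P(S_n >= k), since sf(x) = P(S_n > x)
    chebyshev = 4 / n                # bound derived above
    print(f"n={n:5d}  exact={exact:.3e}  Chebyshev={chebyshev:.3e}")
```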
Can we get exponential bounds on the tail? One line of attack is to use the de Moivre-Laplace CLT, which states that a properly normalized $S_n$ converges in distribution to the standard Gaussian $N(0,1)$. As we'll calculate below, the Gaussian has exponentially decaying tails, so for large $n$ we can try to replace the normalized $S_n$ with the Gaussian to get our desired tail bound. Let $X \sim N(0,1)$; then, using the change of variables $y = x - t$, we compute
$$P(X \geq t) = \frac{1}{\sqrt{2\pi}}\int_t^\infty e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\int_0^\infty e^{-(y+t)^2/2}\,dy \leq e^{-t^2/2}\cdot\frac{1}{\sqrt{2\pi}}\int_0^\infty e^{-y^2/2}\,dy = \tfrac{1}{2}e^{-t^2/2},$$
using $(y+t)^2 \geq y^2 + t^2$ for $y, t \geq 0$. Applying this to our example,
$$P\Big(S_n \geq \tfrac{3}{4}n\Big) = P\left(\frac{S_n - n/2}{\sqrt{n}/2} \geq \frac{\sqrt{n}}{2}\right) \approx P\Big(X \geq \frac{\sqrt{n}}{2}\Big) \leq \tfrac{1}{2}e^{-n/8}.$$
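To make the tail computation concrete, here is a small numerical check (a sketch assuming scipy; the grid of $t$ values is illustrative) that the survival function of $N(0,1)$ indeed sits below $\tfrac{1}{2}e^{-t^2/2}$.

```python
# Sketch: numerically verify the Gaussian tail bound P(X >= t) <= (1/2) exp(-t^2/2).
# Assumes scipy; the t values are illustrative.
import math
from scipy.stats import norm

for t in [0.5, 1.0, 2.0, 4.0]:
    tail = norm.sf(t)                  # P(X >= t) for X ~ N(0, 1)
    bound = 0.5 * math.exp(-t**2 / 2)  # bound derived above
    print(f"t={t:.1f}  P(X>=t)={tail:.3e}  bound={bound:.3e}")
```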
The approximation step is by the de Moivre-Laplace CLT, while the inequality is by the Gaussian tail computation. All good? No. What fails in this argument is that the approximation error is large, on the order of $O(n^{-1/2})$, so in fact the final bound we can rigorously justify is also only of this order. This is worse than our initial Chebyshev bound! The relevant fact here is the Berry-Esseen theorem, a quantitative, pre-asymptotic version of the CLT.
Theorem [Berry-Esseen]. Suppose that $X_1, X_2, \dots$ is a sequence of iid random variables with mean $0$ and variance $1$, and that $X_1$ has a finite third moment. Let $Y \sim N(0,1)$. Then,
$$\left| P\Big( \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \geq t \Big) - P(Y \geq t) \right| \leq \frac{C}{\sqrt{n}},$$
where the constant $C$ depends only on $\mathbb{E}|X_1|^3$.
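To see why this is fatal for our example, here is a rough numerical sketch (assuming scipy; the constant $C = 0.5$ is chosen purely for illustration, not as a claim about the optimal Berry-Esseen constant). Both the exact tail and the Gaussian plug-in are exponentially small, but the only rigorous control the theorem gives is the additive error $C/\sqrt{n}$, which is astronomically larger.

```python
# Sketch: for the fair-coin example, compare the exact tail, the Gaussian plug-in,
# and the Berry-Esseen error scale C/sqrt(n). C = 0.5 is purely illustrative
# (the theorem only guarantees *some* finite constant).
import math
from scipy.stats import binom, norm

C = 0.5  # illustrative constant, not the optimal Berry-Esseen constant

for n in [16, 64, 256, 1024]:
    k = 3 * n // 4
    exact = binom.sf(k - 1, n, 0.5)       # P(S_n >= 3n/4)
    gaussian = norm.sf(math.sqrt(n) / 2)  # P(Y >= sqrt(n)/2), the CLT plug-in
    be_error = C / math.sqrt(n)           # size of the Berry-Esseen error term
    print(f"n={n:5d}  exact={exact:.3e}  Gaussian={gaussian:.3e}  "
          f"C/sqrt(n)={be_error:.3e}")
```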
Note that we've chosen this normalization so that the sum has mean $0$ and variance $1$, which makes it comparable to $N(0,1)$. The asymptotics of Berry-Esseen are in fact sharp, as one can see by considering specific binomial distributions. To circumvent this issue, we develop Hoeffding's inequality, which directly gives an exponential tail bound.
Theorem [Hoeffding]. Suppose $X_1, \dots, X_n$ are independent, bounded random variables such that $X_i \in [a_i, b_i]$. Then,