Failure of the Central Limit Theorem

We discuss why a naive application of a CLT approximation fails, motivating us to develop Hoeffding's inequality.

20 March 2025

Consider the following example. Given $n$ independent, fair coin flips, estimate the chance that we see $75\%$ heads. Rephrasing, we define the random variable $S_n:=\# \text{heads for } n \text{ flips}$, and we seek an upper bound on $\mathbb{P}(S_n\geq \frac{3}{4}n)$. Observe that we've framed this question as asking for a tail bound. Since $S_n\sim \text{Binom}(n,\frac{1}{2})$, its expected value and variance are given by $\mathbb{E}S_n=\frac{n}{2}$ and $\mathbb{V}S_n=\frac{n}{4}$. As a first pass, we use Chebyshev's inequality to estimate

\begin{equation} \begin{split} \mathbb{P}\left(S_n\geq \frac{3}{4}n\right)&=\mathbb{P}\left(S_n - \mathbb{E}S_n\geq \frac{3}{4}n-\frac{1}{2}n\right)\\ &\leq \mathbb{P}\left(|S_n - \mathbb{E}S_n|\geq \frac{n}{4}\right)\\ &\leq \frac{\mathbb{V}S_n}{(\frac{n}{4})^2}\\ &=\frac{4}{n}. \end{split} \end{equation}
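As a numerical sanity check (a sketch, not part of the original notes), we can compare the exact binomial tail against the Chebyshev bound $\frac{4}{n}$; the bound holds but is quite loose.

```python
# Compare the exact tail P(S_n >= 3n/4) for S_n ~ Binom(n, 1/2)
# against the Chebyshev bound 4/n. Illustrative sketch.
from math import comb

def binomial_tail(n: int, k: int) -> float:
    """Exact P(S_n >= k) for S_n ~ Binom(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

for n in (20, 40, 80):
    exact = binomial_tail(n, (3 * n) // 4)
    chebyshev = 4 / n
    assert exact <= chebyshev  # the bound holds, but is far from tight
    print(n, exact, chebyshev)
```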

Can we get exponential bounds on the tail? One line of attack is to use the de Moivre-Laplace CLT, which states that a properly normalized $S_n$ converges to the standard Gaussian $\mathcal{N}(0,1)$. As we'll calculate below, the Gaussian has exponentially decaying tails, so for large $n$ we can replace a normalized $S_n$ with the Gaussian to get our desired tail bound. Let $X\sim \mathcal{N}(0,1)$; then for $t>0$, using the change of variables $y=x-t$ we compute

\begin{equation} \begin{split} \mathbb{P}(X\geq t)&=\int_t^\infty \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx\\ &=\int_0^\infty \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}e^{-ty}\underbrace{e^{-\frac{y^2}{2}}}_{\leq 1}dy\\ &\leq\frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}\underbrace{\int_0^\infty e^{-ty}dy}_{=\frac{1}{t}}\\ &=\frac{1}{t\sqrt{2\pi}}e^{-\frac{t^2}{2}}. \end{split} \end{equation}
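We can verify this tail bound numerically (an illustrative sketch, not part of the original notes) by comparing it against the exact Gaussian tail, $\mathbb{P}(X\geq t)=\frac{1}{2}\operatorname{erfc}(t/\sqrt{2})$.

```python
# Check the bound P(X >= t) <= exp(-t^2/2) / (t sqrt(2 pi)) against the
# exact Gaussian tail computed via the complementary error function.
from math import erfc, exp, pi, sqrt

def gaussian_tail_exact(t: float) -> float:
    """Exact P(X >= t) for X ~ N(0, 1)."""
    return 0.5 * erfc(t / sqrt(2))

def gaussian_tail_bound(t: float) -> float:
    """The upper bound derived above; valid for t > 0."""
    return exp(-t * t / 2) / (t * sqrt(2 * pi))

for t in (0.5, 1.0, 2.0, 4.0):
    # the bound dominates the exact tail, and tightens as t grows
    assert gaussian_tail_exact(t) <= gaussian_tail_bound(t)
```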

We're in good shape, so let's carry out this plan of attack. Define the normalization $Z_n:=\frac{S_n-\mathbb{E}S_n}{\sqrt{\mathbb{V}[S_n]}}$. Then, for large $n$,

\begin{equation} \begin{split} \mathbb{P}\left(S_n\geq \frac{3}{4}n\right)&=\mathbb{P}\left(S_n-\frac{n}{2}\geq \frac{n}{4}\right)\\ &=\mathbb{P}\left(\frac{S_n- \frac{n}{2}}{\sqrt{\frac{n}{4}}}\geq \frac{\frac{n}{4}}{\sqrt{\frac{n}{4}}}\right)\\ &=\mathbb{P}\left(Z_n\geq \sqrt{\frac{n}{4}}\right)\\ &\approx \mathbb{P}\left(X\geq \sqrt{\frac{n}{4}}\right)\\ &\leq \frac{1}{\sqrt{\frac{n}{4}}\sqrt{2\pi}}e^{-\frac{n}{8}}\\ &=\frac{2}{\sqrt{2\pi n}}e^{-\frac{n}{8}}. \end{split} \end{equation}

The approximation is by the de Moivre-Laplace CLT, while the inequality is by the Gaussian tails computation. All good? No. What fails in this argument is that the approximation error is large, on the order of $O(n^{-\frac{1}{2}})$, so in fact the final bound can be no better than this order. This is worse than our initial Chebyshev bound! The relevant fact here is the Berry-Esseen theorem, which is a quantitative, pre-asymptotic version of the CLT.


Theorem [Berry-Esseen]. Suppose that $X_1,X_2,\ldots$ is a sequence of iid random variables with mean $0$ and variance $1$, and that $X_1$ has a finite third moment. Let $Y\sim \mathcal{N}(0,1)$. Then, for every $t$ and every $n$,

\begin{equation} \left|\mathbb{P}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^nX_i\geq t\right)-\mathbb{P}(Y\geq t) \right|\leq \frac{C}{\sqrt{n}}, \end{equation}

where $C$ is a constant depending only on $\mathbb{E}|X_1|^3$.

Note that we've chosen this normalization so that the sum has mean $0$ and variance $1$, which makes it comparable to $\mathcal{N}(0,1)$. The $O(n^{-\frac{1}{2}})$ rate in Berry-Esseen is actually sharp, as one can see by considering specific binomial distributions. To circumvent this issue, we develop Hoeffding's inequality, which will directly give an exponential tail bound.
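The fair-coin binomial itself witnesses this sharpness: at $t=0$, symmetry gives $\mathbb{P}(Z_n\geq 0)=\frac{1}{2}+\frac{1}{2}\mathbb{P}(S_n=\frac{n}{2})$ for even $n$, so the CLT error equals $\frac{1}{2}\binom{n}{n/2}2^{-n}\approx \frac{1}{\sqrt{2\pi n}}$. A short sketch (not part of the original notes) checks this numerically:

```python
# Show the Berry-Esseen rate O(1/sqrt(n)) is sharp for the fair-coin binomial:
# the CLT error at t = 0 is exactly half the central atom P(S_n = n/2),
# and sqrt(n) times that error stays near 1/sqrt(2 pi), never vanishing.
from math import comb, pi, sqrt

def clt_error_at_zero(n: int) -> float:
    """|P(Z_n >= 0) - P(Y >= 0)| for S_n ~ Binom(n, 1/2), n even."""
    # By symmetry P(S_n >= n/2) = 1/2 + P(S_n = n/2)/2, and P(Y >= 0) = 1/2.
    return comb(n, n // 2) / 2 ** (n + 1)

for n in (100, 400, 1600):
    scaled = sqrt(n) * clt_error_at_zero(n)
    assert abs(scaled - 1 / sqrt(2 * pi)) < 0.01  # ~0.3989 for all n
```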


Theorem [Hoeffding]. Suppose $X_1,\ldots,X_n$ are independent bounded random variables such that $X_i\in [a_i,b_i]$. Then, for every $t\geq 0$,

\begin{equation} \mathbb{P}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^n(X_i-\mathbb{E}X_i)\geq t\right)\leq \exp\left(-\frac{2nt^2}{\sum_{i=1}^n (b_i-a_i)^2}\right). \end{equation}

In particular, if all $X_i$ are drawn from the same distribution, have mean $0$, and $X_i\in[a,b]$, then

\begin{equation} \mathbb{P}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^nX_i\geq t\right)\leq \exp\left(-\frac{2t^2}{(b-a)^2}\right). \end{equation}
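Returning to the coin example: taking $X_i = (\text{$i$-th flip}) - \frac{1}{2}\in[-\frac{1}{2},\frac{1}{2}]$, the event $S_n\geq \frac{3}{4}n$ is $\frac{1}{\sqrt{n}}\sum_i X_i\geq \frac{\sqrt{n}}{4}$, so Hoeffding gives $\mathbb{P}(S_n\geq \frac{3}{4}n)\leq e^{-n/8}$, the exponential bound the CLT heuristic promised. A numerical sketch (not part of the original notes) compares the three bounds:

```python
# Hoeffding applied to fair coins: X_i = flip_i - 1/2 lies in [-1/2, 1/2],
# t = sqrt(n)/4, b - a = 1, giving P(S_n >= 3n/4) <= exp(-n/8).
# Compare the exact binomial tail, Hoeffding, and Chebyshev.
from math import comb, exp

def binomial_tail(n: int, k: int) -> float:
    """Exact P(S_n >= k) for S_n ~ Binom(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

for n in (40, 80, 120):
    exact = binomial_tail(n, (3 * n) // 4)
    hoeffding = exp(-n / 8)   # exponential in n
    chebyshev = 4 / n         # only polynomial in n
    assert exact <= hoeffding < chebyshev
```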