
Hoeffding’s Inequality

Hoeffding’s inequality is a crucial result in probability theory, as it provides an upper bound on the probability that the sum of a sample of independent random variables deviates from its expected value.

Though the bound holds, with slight modifications, for general independent random variables (not necessarily identically distributed), here we restrict ourselves to the special case of a sample of n independent and identically distributed (i.i.d.) Bernoulli random variables X_1, X_2, \ldots, X_n, that is, a Bernoulli process. By definition, the outcome of each variable is either 0 (failure) or 1 (success), namely X_i \in \{0,1\}~\forall i \in \{1,\ldots,n\}. Moreover, the probability of success (resp., failure) is described by the same parameter \theta (resp., 1-\theta) for all the variables X_i. More formally, we write X_i \sim \mathcal{B}(\theta)~\forall i \in \{1,\ldots,n\}, where \mathcal{B} denotes the Bernoulli distribution and \theta is the parameter governing the distribution.

Generally speaking, if we want to provide a (single-point) estimate \hat{\theta} of the true population parameter \theta from the sample of n observed Bernoulli random variables, we just need to measure the frequency of successes out of the total number of trials, that is, how many of the n variables output 1. This is often referred to as the Maximum Likelihood Estimate (MLE), as it is the value that maximizes the probability of observing the data:

\hat{\theta} = \frac{1}{n}\sum_{i=1}^n X_i

Note also that \hat{\theta} and \theta can be considered as the sample mean and the population mean, respectively.
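To make this concrete, here is a minimal Python sketch that simulates a Bernoulli process and computes the MLE as the observed frequency of successes; the parameter value \theta = 0.3, the sample size n = 1000, and the function names are arbitrary illustrative choices, not part of the original discussion.

import random

def bernoulli_sample(theta, n, seed=0):
    # Draw n i.i.d. Bernoulli(theta) outcomes: 1 = success, 0 = failure.
    rng = random.Random(seed)
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def mle_estimate(sample):
    # MLE of theta: observed frequency of successes.
    return sum(sample) / len(sample)

sample = bernoulli_sample(theta=0.3, n=1000)  # illustrative values
print(mle_estimate(sample))                   # should be close to the true theta = 0.3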

Hence, the (one-sided) inequality proposed by Hoeffding states that:
P(\hat{\theta} - \theta \geq \epsilon) \leq e^{-2n\epsilon^2}
By symmetry, the above inequality is also valid for the other side of the difference:
P(-\hat{\theta} +\theta \geq \epsilon) \leq e^{-2n\epsilon^2}
Finally, adding up the two one-sided inequalities, we obtain the two-sided inequality:
P(|\hat{\theta} -\theta| \geq \epsilon) \leq 2e^{-2n\epsilon^2}

More generally, Hoeffding’s inequality states that the probability of the estimation error |\hat{\theta} -\theta| being at least a positive constant \epsilon > 0 is bounded by the quantity 2e^{-2n\epsilon^2}.

Intuitively, we want the estimate \hat{\theta} to be close to the true \theta. Therefore, for a fixed and very small \epsilon, we want the above probability to be small as well, namely we would like 2e^{-2n\epsilon^2} to be close to 0!
If we look carefully at the inequality, we notice that the bound does not depend on \theta at all, which is nice. Also notice that it decays exponentially with n, which is also very good: as one might expect, the more observations we collect, the smaller the probability of a bad estimation error becomes.
However, the exponent also contains the squared term \epsilon^2, which goes rapidly to 0 as \epsilon gets smaller, pushing the overall exponent -2n\epsilon^2 toward 0 unless n grows accordingly. That means the “price” we have to pay, in terms of the number of observations needed, for a very restrictive bound on the error can be very high.
In the unlucky edge case where -2n\epsilon^2 \to 0, the bound is in fact practically useless, since it would simply tell us that:
P(|\hat{\theta} - \theta| \geq \epsilon) \leq 2e^{0} \longrightarrow P(|\hat{\theta} - \theta| \geq \epsilon) \leq 2
which is something we already know from the basic axioms of probability! (In fact, those axioms provide an even stricter bound, since the probability of any event always ranges between 0 and 1.)
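As a quick illustration of this trade-off, the following Python sketch evaluates the two-sided bound 2e^{-2n\epsilon^2} for a few sample sizes and tolerances (the specific values of n and \epsilon below are arbitrary, chosen only to show how slowly the bound shrinks when \epsilon is small).

import math

def hoeffding_bound(n, epsilon):
    # Two-sided Hoeffding bound on P(|theta_hat - theta| >= epsilon).
    return 2 * math.exp(-2 * n * epsilon ** 2)

for eps in (0.1, 0.05, 0.01):          # illustrative tolerances
    for n in (100, 1000, 10000):       # illustrative sample sizes
        print(f"eps={eps}, n={n}: bound = {hoeffding_bound(n, eps):.4f}")

For example, with \epsilon = 0.01 even n = 10000 observations only bring the bound down to 2e^{-2} \approx 0.27.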

The probability described by Hoeffding’s inequality can be interpreted as the significance level \alpha (i.e., an upper bound on the probability of making an error) for a confidence interval around \theta of size 2\epsilon.

P(|\hat{\theta} - \theta| \geq \epsilon) = P(\hat{\theta} \notin [\theta -\epsilon, \theta + \epsilon]) \leq 2e^{-2n\epsilon^2}\leq \alpha

Solving the inequality above for n gives us the number of observations (i.e., the number of times the Bernoulli trial should be repeated) needed for \hat{\theta} to lie within \epsilon of \theta with confidence level (1-\alpha).

2e^{-2n \epsilon^2} \leq \alpha

e^{-2n\epsilon^2} \leq \frac{\alpha}{2}

Applying the natural logarithm \textrm{ln} (i.e. \textrm{log}_e) to both sides of the inequality above results in

\textrm{ln}(e^{-2n\epsilon^2}) \leq \textrm{ln}\big(\frac{\alpha}{2}\big)

-2n \epsilon^2 \leq \textrm{ln} \big(\frac{\alpha}{2}\big)

-n \leq \frac{1}{2 \epsilon^2}\textrm{ln} \big(\frac{\alpha}{2}\big)

n \geq -\frac{1}{2 \epsilon^2}\textrm{ln} \big(\frac{\alpha}{2}\big)

Therefore, this gives us a lower bound on the number of observations we need in order to be at least 100(1-\alpha)\% confident that our estimate \hat{\theta} differs by at most \epsilon from the true value \theta.
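This lower bound is straightforward to compute. Below is a minimal Python sketch; the function name hoeffding_sample_size is just an illustrative choice, and rounding up enforces an integer number of trials.

import math

def hoeffding_sample_size(epsilon, alpha):
    # Smallest integer n satisfying 2 * exp(-2 * n * epsilon^2) <= alpha,
    # i.e. n >= -ln(alpha / 2) / (2 * epsilon^2), rounded up.
    return math.ceil(-math.log(alpha / 2) / (2 * epsilon ** 2))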

As an example, consider how large the sample size n would have to be at least if we want to be (at least) 95\% confident (i.e. \alpha = 0.05) that our estimate is correct within an error \epsilon = 0.1. According to the result above, it must hold that

n \geq - \frac{1}{2\cdot 0.1^2}\textrm{ln}\bigg(\frac{0.05}{2}\bigg)

n \geq - \frac{1}{2\cdot 0.01}\textrm{ln}(0.025)

n \geq - \frac{1}{\frac{2}{100}}\textrm{ln}(0.025)

n \geq (-50)\cdot\textrm{ln}(0.025)

n \geq (-50)\cdot (-3.69) \approx 184.4

So, rounding up, with 185 observations we can be at least 95\% confident that our estimation error is bounded by \epsilon = 0.1. Of course, if we want a smaller error bound and/or a smaller significance level (resp., a higher confidence level), n must increase accordingly.
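As a sanity check, the sketch below reuses the hypothetical hoeffding_sample_size function from above to reproduce the n = 185 figure, and then runs a small Monte Carlo simulation with an arbitrarily chosen true \theta = 0.4 (any value would do, since the bound does not depend on \theta) to confirm that the estimation error stays within \epsilon = 0.1 far more than 95\% of the time, as the bound guarantees.

import random

print(hoeffding_sample_size(epsilon=0.1, alpha=0.05))  # 185

# Monte Carlo check with an arbitrary true theta (here 0.4).
rng = random.Random(0)
n, theta, eps, trials = 185, 0.4, 0.1, 20000
hits = 0
for _ in range(trials):
    theta_hat = sum(rng.random() < theta for _ in range(n)) / n
    hits += abs(theta_hat - theta) < eps
print(hits / trials)  # empirically well above 0.95, as expected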

