At the end of my previous post I claimed that proving that $latex \sum_{s=2}^k \frac{{k \choose s} {n-k \choose k-s}}{{n \choose k}} 2^{{s \choose 2}} ,$ tends to $latex 0$ when $latex n$ tends to infinity and $latex k=(2-\epsilon)\log_2(n)$ was unpleasant. It appears that I was wrong, and I would like to show here a simple proof that was suggested to me by my colleague Philippe Rigollet.
The key observation is that the term $latex \frac{{k \choose s} {n-k \choose k-s}}{{n \choose k}}$ is the probability that an hypergeometric random variable $latex H(k,k,n)$ takes value $latex s$. Hypergeometric distributions are nasty, but when $latex n$ is of larger order than $latex k$ (which is the case here), they are well approximated by a simple binomial $latex B(n, k/n)$. In the present case this suggests the upper bound $latex \frac{{k \choose s} {n-k \choose k-s}}{{n \choose k}} \leq {k \choose s} \left(\frac{k}{n}\right)^s .$ The latter inequality can be proved formally using the identity $latex {n-1 \choose k-1} = \frac{k}{n} {n \choose k} .$ Now one can simply use the inequalities $latex 2^{{s \choose 2}} \leq 2^{s \frac{k}{2}},$ and $latex {k \choose s} \leq k^s$ to obtain that the original unpleasant sum is upper bounded by $latex \sum_{s=2}^{k} 2^{s (\frac{k}{2} + 2\log_2(k) – \log_2(n) )}$ which clearly tends to $latex 0$ when $latex n$ tends to infinity and $latex k=(2-\epsilon)\log_2(n)$.
By Philippe Rigollet January 17, 2013 - 3:58 pm
I’ll try myself at posting my two (extra) cent about this proof with latex on WordPress.
The formula can be written as $latex E[2^{{X \choose 2}}I(X\ge 2)]$ where $latex X$ has a hypergeometric distribution with the following parameters: $latex n$ is the population size, $latex k$ is the number of success and $latex k$ is the number of draws. I like to view this distribution as an experiment where one puts $latex k$ pebbles in $latex n$ slots at random (with only one pebble per slot) and then $latex X$ denotes the number of pebbles in the slots numbered $latex 1$ through $latex k$. This distribution enjoys a remarkable property, sometimes referred to as negative association: the fact that a pebble is in one of the first $latex k$ slots decreases the probability that another pebble is in one of these first $latex k$ slots (indeed one slot is already taken). Let’s see how we can use this property:
First, $latex E[2^{{X \choose 2}}I(X\ge 2)]= E[2^{{X \choose 2}}]-P(X \le 1)$. We now show that each term tends to 1.
Let’s begin with the expected value:
$latex
E[2^{{X \choose 2}}]\le E[2^{\frac{X^2}{2}}]\le E[2^{\frac{k}{2}X}]= E[2^{\frac{k}{2}\sum_{i=1}^kY_i}]$ where $latex Y_i$ indicates if pebble $latex i$ is in one of the first $latex k$ slots. Now, negative association becomes useful. It implies that $latex E[2^{\frac{k}{2}\sum_{i=1}^kY_i}]\le \prod_{i=1}^kE[2^{\frac{k}{2}Y_i}]$. But each $latex Y_i$ is a Bernoulli random variable with parameter $latex k/n$ so that $latex E[2^{\frac{k}{2}\sum_{i=1}^kY_i}]\le \big((2^{\frac{k}2}-1)\frac{k}n +1\big)^k$. For $latex k=(2-\varepsilon)\log_2(n)$, we find that this quantity tends to 1 and $latex n \to \infty$.
We now deal with $latex P(X \le 1)$. A sufficient condition for this event to hold is that all the $latex k$ pebbles are in slots numbered at least $latex k+1$. Therefore $latex P(X \le 1) \ge P(X= 0) =1 -P(X\ge 1)$. By a union bound $latex P(X\ge 1) \le k^2/n \to 0$.