Our proof of the lower bound for the clique number of random geometric graphs (see Theorem 1 in the previous post) is a little bit contrived, but it might actually have some value. Our approach can be described as follows. Suppose that our goal is to study some property $P$ of some random variable $X$. (Here $X = \omega(G)$ is the clique number of a random geometric graph, and $P$ is some statement on $X$.) Then you can try to apply the following recipe:
- Set up a statistical hypothesis testing problem where the random variable $X$ arises naturally as a ‘good’ test.
- Study the type I and type II errors of this test, assuming that the property $P$ is true.
- Derive a statistical lower bound on the risk (i.e., the sum of the type I and type II errors) of any test.
- Conclude that if $P$ were true, the test based on $X$ would beat the statistical lower bound, and thus $P$ is false.
There are other ways to phrase the above recipe. I used here the language of statistics, but I could have explained everything in terms of distances between distributions. Let us now see how we can apply the steps described above to our particular problem.
Step 1. Let $X_1, \dots, X_d \in \mathbb{R}^n$ be i.i.d. random variables. The two hypotheses on the distribution of this sequence are the following:

$H_0$: $X_i \sim \mathcal{N}(0, I_n)$, vs. $H_1$: $X_i \sim \mathcal{N}(0, \Sigma_S)$, for some $S \subset \{1, \dots, n\}$ with $|S| = k$,

where $\Sigma_S$ has ones on the diagonal, the entry $\rho \in (0,1)$ in the off-diagonal positions indexed by $S \times S$, and zeros elsewhere. In words, under $H_0$ each coordinate of $X_i$ is an independent $\mathcal{N}(0,1)$, while under $H_1$ there exists a subset $S$ of $k$ coordinates which have a covariance of $\rho$ between each other. Now let $Y_a \in \mathbb{R}^d$ be the vector of $a$-th coordinates of the sequence of observations $X_1, \dots, X_d$, and let $\hat{Y}_a = Y_a / \|Y_a\|$. Under $H_0$ the vectors $\hat{Y}_1, \dots, \hat{Y}_n$ are uniformly distributed on $\mathbb{S}^{d-1}$, while under $H_1$ we would expect a ‘cluster’ of size (of order) $k$ to appear (if $\rho$ is ‘large enough’). Thus a natural test is to build the random geometric graph $G$ corresponding to the vectors $\hat{Y}_1, \dots, \hat{Y}_n$ (with an edge between $a$ and $b$ if and only if $\langle \hat{Y}_a, \hat{Y}_b \rangle \geq t$), compute its clique number $\omega(G)$, and reject the null hypothesis if $\omega(G)$ is ‘too large’.
From now on we shall assume that $\mathbb{P}_{H_0}(\omega(G) \geq k) \leq \delta$ (this is the property $P$ that we will eventually contradict) and that the test rejects $H_0$ if $\omega(G) \geq k$.
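Step 1 can be simulated directly. The sketch below (pure standard library) draws the observations under each hypothesis, builds the geometric graph on the normalized coordinate vectors, and computes its clique number by brute force; the values of $n$, $d$, $k$, $\rho$, $t$ are illustrative choices, not the ones from the theorem, and the shared-factor construction of the $\rho$-covariance cluster is an assumption of mine.

```python
# Simulating the testing problem of Step 1 (illustrative sizes only).
import itertools
import math
import random

random.seed(0)

def sample_points(d, n, k=0, rho=0.0):
    """Normalized coordinate vectors Y_hat_1, ..., Y_hat_n in R^d.

    k == 0 corresponds to H_0 (all n coordinates independent N(0,1));
    k > 0 plants k coordinates with pairwise covariance rho, realized
    through a factor shared within each observation."""
    Y = [[0.0] * d for _ in range(n)]
    for j in range(d):              # the j-th observation X_j in R^n
        z = random.gauss(0.0, 1.0)  # factor shared by the planted cluster
        for a in range(n):
            g = random.gauss(0.0, 1.0)
            Y[a][j] = math.sqrt(rho) * z + math.sqrt(1.0 - rho) * g if a < k else g
    out = []
    for row in Y:
        nrm = math.sqrt(sum(v * v for v in row))
        out.append([v / nrm for v in row])
    return out

def clique_number(adj):
    """Exact clique number by brute force (fine for small n)."""
    n = len(adj)
    for size in range(n, 1, -1):
        for sub in itertools.combinations(range(n), size):
            if all(adj[a][b] for a, b in itertools.combinations(sub, 2)):
                return size
    return 1

def omega(points, t):
    """Clique number of the geometric graph with threshold t."""
    n = len(points)
    adj = [[sum(x * y for x, y in zip(points[a], points[b])) >= t
            for b in range(n)] for a in range(n)]
    return clique_number(adj)

n, d, k, rho, t = 10, 200, 5, 0.8, 0.3
omega_h0 = omega(sample_points(d, n), t)          # null: uniform-ish points
omega_h1 = omega(sample_points(d, n, k, rho), t)  # planted cluster of size k
print(omega_h0, omega_h1)
```

With these parameters the planted coordinates reliably form a clique of size $k$ while the null graph is essentially empty, which is exactly the separation the test exploits.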
Step 2. By definition the type I error of the test is smaller than $\delta$. Let us now consider the type II error:

$$\mathbb{P}_{H_1}\big(\omega(G) < k\big) \leq \mathbb{P}\big(\exists\, a, b \in S, a \neq b : \langle \hat{Y}_a, \hat{Y}_b \rangle < t\big) \leq k^2\, \mathbb{P}\big(\langle \hat{Y}_a, \hat{Y}_b \rangle < t\big),$$

where in the last probability $a$ and $b$ are two arbitrary (distinct) indices in $S$. Now using Chernoff’s bound, and the moment generating function of two correlated one-dimensional Gaussian random variables, it is easy to check (see Lemma 10 in Devroye et al.) that

$$\mathbb{P}\big(\langle \hat{Y}_a, \hat{Y}_b \rangle < t\big) \leq \exp\left(- \frac{d (\rho - t)^2}{C}\right),$$

for some numerical constant $C > 0$. In particular we conclude that if $\rho \geq t + C \sqrt{\log(k/\delta)/d}$ (up to changing the value of $C$) then the risk of our test is smaller than $2\delta$.
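The concentration driving Step 2 is easy to check numerically: the normalized inner product of two $\rho$-correlated Gaussian vectors concentrates around $\rho$, so it falls below a threshold $t < \rho$ only with exponentially small probability in $d$. In the sketch below the values of $d$, $\rho$, $t$ and the shared-factor construction are illustrative assumptions.

```python
# Monte Carlo check of the Step 2 concentration (illustrative sizes).
import math
import random

random.seed(1)

def correlated_pair(d, rho):
    """Two rho-correlated standard Gaussian vectors in R^d, normalized.
    Each coordinate pair shares a common factor z, giving covariance rho."""
    y1, y2 = [], []
    for _ in range(d):
        z = random.gauss(0.0, 1.0)
        y1.append(math.sqrt(rho) * z + math.sqrt(1.0 - rho) * random.gauss(0.0, 1.0))
        y2.append(math.sqrt(rho) * z + math.sqrt(1.0 - rho) * random.gauss(0.0, 1.0))
    n1 = math.sqrt(sum(v * v for v in y1))
    n2 = math.sqrt(sum(v * v for v in y2))
    return [v / n1 for v in y1], [v / n2 for v in y2]

d, rho, t, trials = 500, 0.6, 0.3, 200
inner = []
for _ in range(trials):
    a, b = correlated_pair(d, rho)
    inner.append(sum(x * y for x, y in zip(a, b)))
below = sum(v < t for v in inner)  # how often a correlated pair misses the threshold
print(below, min(inner))
```

Here the inner products cluster tightly around $\rho = 0.6$ and, in a few hundred trials, essentially never drop anywhere near $t = 0.3$, consistent with the exponential bound.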
Step 3. Using (almost) standard techniques (see Theorem 2.1 here, and Theorem 1 here), one can show that if $\rho \leq c \sqrt{\log(n)/d}$ (where $c$ is a positive numerical constant), then there does not exist a test with a risk smaller than $1/2$.
Step 4. If one takes $\rho$ of order $\sqrt{\log(n)/d}$, then Step 3 shows that there is no test with risk smaller than $1/2$ (note that we can assume that $t \leq \frac{c}{2} \sqrt{\log(n)/d}$, otherwise our theorem is already proved). Plugging this value of $\rho$ into the result of Step 2 (with, say, $\delta = 1/8$, so that $2\delta < 1/2$), we see that we must necessarily have $\mathbb{P}(\omega(G) \geq k) > \delta$ for every $k$ smaller than $n^{c'}$ (for some numerical constant $c' > 0$), which gives exactly the lower bound we were trying to prove.
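For concreteness, here is one way the Step 4 arithmetic can go, under assumed forms of the Step 2 and Step 3 bounds (restated in the comments below); the constants $c$, $C$ and the choice $\delta = 1/8$ are illustrative, not the exact ones from the references.

```latex
% Assumed forms of the bounds (a sketch, not the exact statements):
%   Step 2:  rho >= t + C*sqrt(log(k/delta)/d)   implies   risk <= 2*delta,
%   Step 3:  rho <= c*sqrt(log(n)/d)             implies   risk >= 1/2.
Take $\rho = c\sqrt{\log(n)/d}$ and $\delta = 1/8$, and suppose
$t \leq \frac{c}{2}\sqrt{\log(n)/d}$. If $k \leq \frac{1}{8}\, n^{(c/2C)^2}$, then
\[
t + C\sqrt{\frac{\log(k/\delta)}{d}}
\;\leq\; \frac{c}{2}\sqrt{\frac{\log n}{d}} + C\sqrt{\frac{\log(8k)}{d}}
\;\leq\; \frac{c}{2}\sqrt{\frac{\log n}{d}} + \frac{c}{2}\sqrt{\frac{\log n}{d}}
\;=\; \rho ,
\]
so the Step 2 condition holds and the clique test would have risk at most
$2\delta = 1/4 < 1/2$, contradicting Step 3. Hence
$\mathbb{P}_{H_0}\big(\omega(G) \geq k\big) > \delta$ for every such $k$.
```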
At this point we don’t know if the above ‘statistical method’ can be applied successfully to other (difficult) problems, so an interesting direction for further research would be to find such problems!
This post concludes my first sequence of posts on random graphs. The next sequence is going to be much longer, and it will be focused on optimization (more on that in the next post).