Clique number of random geometric graphs in high dimension

I would now like to discuss another class of random graphs, called random geometric graphs. In general, a random geometric graph arises by taking $latex {n}&fg=000000$ i.i.d. random variables in some metric space, and then putting an edge between two of these random variables if the distance between them is smaller than some pre-specified quantity. In this post I’d like to focus on a specific class of random geometric graphs defined as follows. Let $latex {X_1, \hdots, X_n}&fg=000000$ be i.i.d. random variables uniformly distributed on the Euclidean sphere $latex {\mathbb{S}^{d-1} = \{x \in {\mathbb R}^d: \|x\| =1\}}&fg=000000$. We define $latex {G_{n,d} = ([n], E_{n,d})}&fg=000000$ by

$latex \displaystyle (i,j) \in E_{n,d} \; \text{if} \; i \neq j, \; \text{and} \; X_i^{\top} X_j > 0 .&fg=000000$
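To make the definition concrete, here is a minimal NumPy sketch of how one might sample such a graph. The function names `sample_sphere` and `geometric_graph` are mine, and the sketch relies on the standard fact that a normalized Gaussian vector is uniform on the sphere. Note that on the unit sphere the condition $latex {X_i^{\top} X_j > 0}&fg=000000$ is the same as $latex {\|X_i - X_j\| < \sqrt{2}}&fg=000000$, so this is indeed a distance threshold.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n i.i.d. points uniformly on the unit sphere S^{d-1}."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def geometric_graph(points):
    """Boolean adjacency matrix: i ~ j iff i != j and <X_i, X_j> > 0."""
    gram = points @ points.T
    upper = np.triu(gram > 0, k=1)   # keep pairs i < j only
    return upper | upper.T           # symmetrize, diagonal stays False

rng = np.random.default_rng(0)
n, d = 200, 10                       # illustrative sizes, not from the post
X = sample_sphere(n, d, rng)
A = geometric_graph(X)
edge_density = A.sum() / (n * (n - 1))   # fraction of ordered pairs joined
```

The empirical `edge_density` should come out close to one half.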

Similarly to the Erdös-Rényi model, one has $latex {\mathop{\mathbb P}((i,j) \in E_{n,d}) = 1/2}&fg=000000$, but now these events are not mutually independent anymore! Indeed, knowing that $latex {i}&fg=000000$ is connected to both $latex {j}&fg=000000$ and $latex {k}&fg=000000$ gives some information on whether $latex {j}&fg=000000$ and $latex {k}&fg=000000$ are connected. This lack of independence between edges dramatically changes the behavior of the graph. An interesting consequence of this is that random geometric graphs might be much better models for real-world networks than the classical Erdös-Rényi random graph. Thus, quite naturally, random geometric graphs have also been extensively studied, see for example this book. However, most of the literature focuses on the case where $latex {d}&fg=000000$ is fixed and $latex {n}&fg=000000$ tends to infinity. In this post I would like to discuss a high-dimensional effect that appears when both $latex {d}&fg=000000$ and $latex {n}&fg=000000$ grow together to infinity. I’ll focus the discussion on the clique number $latex {\omega(G_{n,d})}&fg=000000$.
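The dependence between edges is easy to see numerically. The sketch below (the helper `closing_probability` and the chosen values of $latex {d}&fg=000000$ are my own, for illustration only) estimates the probability that $latex {j}&fg=000000$ and $latex {k}&fg=000000$ are connected given that both are connected to $latex {i}&fg=000000$. Under independence this would be exactly $latex {1/2}&fg=000000$. For $latex {d=2}&fg=000000$ a short computation on the circle gives $latex {3/4}&fg=000000$ instead.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n i.i.d. points uniformly on the unit sphere S^{d-1}."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def closing_probability(d, trials, rng):
    """Monte Carlo estimate of P(j ~ k | i ~ j and i ~ k).

    Each trial uses a fresh, independent triple (X_i, X_j, X_k).
    """
    xi, xj, xk = (sample_sphere(trials, d, rng) for _ in range(3))
    ij = np.einsum("td,td->t", xi, xj) > 0   # row-wise inner products
    ik = np.einsum("td,td->t", xi, xk) > 0
    jk = np.einsum("td,td->t", xj, xk) > 0
    both = ij & ik
    return (jk & both).sum() / both.sum()

rng = np.random.default_rng(1)
p_low = closing_probability(2, 20000, rng)     # strongly geometric regime
p_high = closing_probability(200, 20000, rng)  # nearly independent edges
```

One would expect `p_low` to sit near 3/4 and `p_high` to be only slightly above 1/2, which is the loss of geometry in high dimension discussed in this post.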

Let us first consider the standard case where $latex {d}&fg=000000$ is fixed and $latex {n}&fg=000000$ tends to infinity. Then one would expect to have on the order of $latex {n/2^d}&fg=000000$ points in the positive orthant $latex {{\mathbb R}^d_+ \cap \mathbb{S}^{d-1}}&fg=000000$ (recall that there are $latex {2^d}&fg=000000$ orthants in $latex {{\mathbb R}^d}&fg=000000$). In particular, since all points in the positive orthant have a positive scalar product with each other, these points will all be connected and thus one would expect $latex {\omega(G_{n,d}) = \Omega(n/2^d)}&fg=000000$. This argument can easily be made formal, and one can show that for $latex {d= o(\log n)}&fg=000000$, one has $latex {\mathop{\mathbb E} \omega(G_{n,d}) = \Omega(n^{1-\epsilon})}&fg=000000$, $latex {\forall \epsilon > 0}&fg=000000$. Thus in small dimension ($latex {d=o(\log n)}&fg=000000$) random geometric graphs behave very differently from Erdös-Rényi random graphs: in the former case one has cliques of almost linear size, while in the latter case they are of logarithmic size (see previous post).
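The orthant argument can be checked directly. The following sketch (with illustrative values of $latex {n}&fg=000000$ and $latex {d}&fg=000000$ that I picked) counts the sample points falling in the positive orthant and verifies that they are pairwise connected, so that their number is a lower bound on $latex {\omega(G_{n,d})}&fg=000000$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4000, 4                        # illustrative sizes; n / 2**d = 250

# Uniform points on S^{d-1} via normalized Gaussians.
g = rng.standard_normal((n, d))
X = g / np.linalg.norm(g, axis=1, keepdims=True)

# Points whose coordinates are all strictly positive.
in_orthant = np.all(X > 0, axis=1)
P = X[in_orthant]
count = len(P)

# Every pair of such points has a strictly positive inner product,
# so these points form a clique in G_{n,d}.
gram = P @ P.T
off_diag = gram[~np.eye(count, dtype=bool)]
is_clique = bool(np.all(off_diag > 0))
```

Here `count` fluctuates around $latex {n/2^d}&fg=000000$, and `is_clique` holds deterministically: a sum of products of positive coordinates is positive.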

Let us now consider the other extreme, where $latex {n}&fg=000000$ is fixed and $latex {d}&fg=000000$ tends to infinity. Then, intuitively, it is clear that the coordinates that will ‘decide’ if two vertices are connected will be almost surely different for each pair of vertices, which means that the geometry of the problem is lost and that we are back to the Erdös-Rényi model. This argument can easily be made formal by resorting to the multidimensional Central Limit Theorem, and one can show that the total variation distance between the Erdös-Rényi graph $latex {G_n}&fg=000000$ (with edge probability $latex {1/2}&fg=000000$) and $latex {G_{n,d}}&fg=000000$ tends to $latex {0}&fg=000000$ as $latex {d}&fg=000000$ tends to infinity. In particular the clique number $latex {\omega(G_{n,d})}&fg=000000$ is of logarithmic size when $latex {n}&fg=000000$ is large, and $latex {d}&fg=000000$ is much larger than $latex {n}&fg=000000$. In fact, by doing a much more subtle analysis, one can show that this last statement is true for much smaller values of $latex {d}&fg=000000$. More precisely, one can show that if $latex {d = \Omega(\log^3 n)}&fg=000000$ then $latex {\omega(G_{n,d})}&fg=000000$ is almost surely equivalent to $latex {2 \log_2(n)}&fg=000000$. In fact the clique number of a random geometric graph is already small at $latex {d = \Omega(\log^2 n)}&fg=000000$, where one can prove that almost surely $latex {\omega(G_{n,d}) = O(\log^3 n)}&fg=000000$. All of these results can be found in the very nice paper by Devroye, György, Lugosi, and Udina: High-dimensional random geometric graphs and their clique number. A natural question left open by this work is where exactly the transition between polynomial size cliques and logarithmic cliques happens. The above results show that it has to be after $latex {d=o(\log(n))}&fg=000000$ and before $latex {d=\Omega(\log^2(n))}&fg=000000$. The following theorem gives the answer:

Theorem 1 (Arias-Castro, Bubeck and Lugosi 2012) There exists some numerical constant $latex {c>0}&fg=000000$ such that with probability at least $latex {1/2}&fg=000000$,

$latex \displaystyle \omega(G_{n,d}) \geq c \exp\left(c \frac{\log^2(n)}{d} \right) .&fg=000000$
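To get a feel for this bound, one can plug in a few growth rates for $latex {d}&fg=000000$. Here $latex {C>0}&fg=000000$ and $latex {\delta \in (0,1)}&fg=000000$ are illustrative parameters that are not part of the theorem, and $latex {c}&fg=000000$ is the unspecified constant of Theorem 1:

$latex \displaystyle d = C \log(n): \quad c \exp\left(c \frac{\log^2(n)}{C \log(n)} \right) = c \, n^{c/C} ,&fg=000000$

$latex \displaystyle d = \log^{2-\delta}(n): \quad c \exp\left(c \frac{\log^2(n)}{\log^{2-\delta}(n)} \right) = c \exp\left(c \log^{\delta}(n) \right) ,&fg=000000$

$latex \displaystyle d = C \log^2(n): \quad c \exp\left(c \frac{\log^2(n)}{C \log^2(n)} \right) = c \, e^{c/C} .&fg=000000$

So the lower bound is polynomial in $latex {n}&fg=000000$ when $latex {d}&fg=000000$ is of order $latex {\log(n)}&fg=000000$, it still grows faster than any power of $latex {\log(n)}&fg=000000$ when $latex {d = \log^{2-\delta}(n)}&fg=000000$, and it degenerates to a constant once $latex {d}&fg=000000$ is of order $latex {\log^2(n)}&fg=000000$.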

Surprisingly, this result shows a phase transition at $latex {d \sim \log^2(n)}&fg=000000$. Indeed for $latex {d=o(\log^2(n))}&fg=000000$ the above inequality implies that the clique number is larger than any poly-logarithmic term, that is $latex {\text{Median}(\omega(G_{n,d})) = \Omega(\log^a(n)), \forall a >0}&fg=000000$. On the other hand Devroye et al. proved that at $latex {d \geq 9 \log^2(n)}&fg=000000$ one already has almost surely $latex {\omega(G_{n,d}) = O(\log^3 n)}&fg=000000$. Thus the clique number of the random geometric graph $latex {G_{n,d}}&fg=000000$ drops abruptly to at most poly-logarithmic size at $latex {d \sim \log^2(n)}&fg=000000$. In the next blog post I will discuss our proof of the above theorem.
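For readers who want to experiment, here is a small sketch that computes the clique number exactly on tiny instances. It uses the Bron-Kerbosch algorithm with pivoting, which is my choice for illustration and is not taken from the papers above. It runs in exponential time, so only very small $latex {n}&fg=000000$ are feasible, far from the asymptotic regime of the theorem. The names `geometric_graph` and `clique_number`, and the values of $latex {n}&fg=000000$ and $latex {d}&fg=000000$, are assumptions of this sketch.

```python
import numpy as np

def geometric_graph(n, d, rng):
    """Adjacency matrix of G_{n,d}: i ~ j iff i != j and <X_i, X_j> > 0."""
    g = rng.standard_normal((n, d))
    x = g / np.linalg.norm(g, axis=1, keepdims=True)
    upper = np.triu(x @ x.T > 0, k=1)
    return upper | upper.T

def clique_number(adj):
    """Exact clique number via Bron-Kerbosch with pivoting."""
    nbrs = [set(np.flatnonzero(row).tolist()) for row in adj]
    best = 0

    def expand(r_size, p, x):
        # r_size: size of the current clique, p: candidates, x: excluded.
        nonlocal best
        if not p and not x:
            best = max(best, r_size)   # current clique is maximal
            return
        if r_size + len(p) <= best:
            return                     # this branch cannot beat the best
        pivot = max(p | x, key=lambda u: len(p & nbrs[u]))
        for v in list(p - nbrs[pivot]):
            expand(r_size + 1, p & nbrs[v], x & nbrs[v])
            p.remove(v)
            x.add(v)

    expand(0, set(range(len(adj))), set())
    return best

rng = np.random.default_rng(3)
n = 60
w_low = clique_number(geometric_graph(n, 2, rng))      # low dimension
w_high = clique_number(geometric_graph(n, 1000, rng))  # high dimension
```

With these sizes one should see a noticeably larger clique in dimension 2 than in dimension 1000, where the graph already looks much like an Erdös-Rényi graph.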
