Guest post by Miklos Racz: The fundamental limits of dimension estimation in random geometric graphs

This post is a continuation of the previous one, where we explored how to detect geometry in a random geometric graph. We now consider the other side of the coin: when is it impossible to detect geometry and what are techniques for proving this? We begin by discussing the G(n,p,d) model introduced in the previous post and then turn to a more general setup, proving a robustness result on the threshold dimension for detection. The proof of the latter result also gives us the opportunity to learn about the fascinating world of entropic central limit theorems.

Barrier to detecting geometry: when Wishart becomes GOE

Recall from the previous post that G(n,p,d) is a random geometric graph where the underlying metric space is the d-dimensional unit sphere \mathbb{S}^{d-1} = \left\{ x \in \mathbb{R}^d : \left\| x \right\|_2 = 1 \right\}, and where the latent labels of the nodes are i.i.d. uniform random vectors in \mathbb{S}^{d-1}. Our goal now is to show the impossibility result of Bubeck, Ding, Eldan, and Racz: if d \gg n^{3}, then it is impossible to distinguish between G(n,p,d) and the Erdos-Renyi random graph G(n,p). More precisely, we have that

(1)   \begin{equation*} \mathrm{TV} \left( G(n,p), G(n,p,d) \right) \to 0 \end{equation*}

when d \gg n^{3} and n \to \infty, where \mathrm{TV} denotes total variation distance.

There are essentially three main ways to bound the total variation of two distributions from above: (i) if the distributions have nice formulas associated with them, then exact computation is possible; (ii) through coupling the distributions; or (iii) by using inequalities between probability metrics to switch the problem to bounding a different notion of distance between the distributions. Here, while the distribution of G(n,p,d) does not have a nice formula associated with it, the main idea is to view this random geometric graph as a function of an n \times n Wishart matrix with d degrees of freedom—i.e., a matrix of inner products of n d-dimensional Gaussian vectors—denoted by W(n,d). It turns out that one can view G(n,p) as (essentially) the same function of an n \times n GOE random matrix—i.e., a symmetric matrix with i.i.d. Gaussian entries on and above the diagonal—denoted by M(n). The upside of this is that both of these random matrix ensembles have explicit densities that allow for explicit computation. We explain this connection here in the special case of p = 1/2 for simplicity; see Bubeck et al. for the case of general p.

Recall that if Y_{1} is a standard normal random variable in \mathbb{R}^d, then Y_1 / \left\| Y_1 \right\| is uniformly distributed on the sphere \mathbb{S}^{d-1}. Consequently we can view G\left( n, 1/2, d \right) as a function of an appropriate Wishart matrix, as follows. Let Y be an n \times d matrix where the entries are i.i.d. standard normal random variables, and let W \equiv W (n,d) = YY^T be the corresponding n \times n Wishart matrix. Note that W_{ii} = \left\langle Y_i, Y_i \right\rangle = \left\| Y_i \right\|^2 and so \left\langle Y_i / \left\| Y_i \right\|, Y_j / \left\| Y_j \right\| \right\rangle = W_{ij} / \sqrt{W_{ii} W_{jj}}. Thus the n \times n matrix A defined as

    \[ A_{i,j} = \begin{cases} 1 & \text{if } W_{ij} \geq 0 \text{ and } i \neq j\\ 0 & \text{otherwise} \end{cases} \]

has the same law as the adjacency matrix of G\left(n,1/2,d\right). Denote the map that takes W to A by H, i.e., A = H \left( W \right).

In a similar way we can view G \left( n, 1/2 \right) as a function of an n \times n matrix drawn from the Gaussian Orthogonal Ensemble (GOE). Let M\left( n \right) be a symmetric n \times n random matrix where the diagonal entries are i.i.d. normal random variables with mean zero and variance 2, and the entries above the diagonal are i.i.d. standard normal random variables, with the entries on and above the diagonal all independent. Then B = H \left( M(n) \right) has the same law as the adjacency matrix of G(n,p). Note that B only depends on the sign of the off-diagonal elements of M\left(n \right), so in the definition of B we can replace M\left( n \right) with M \left( n, d \right) := \sqrt{d} M \left( n \right) + d I_n, where I_n is the n \times n identity matrix.

We can thus conclude that

    \begin{align*} \mathrm{TV} \left( G(n,1/2,d), G(n,1/2) \right) &= \mathrm{TV} \left( H \left( W(n,d) \right), H \left( M(n,d) \right) \right) \\ &\leq \mathrm{TV} \left( W(n,d), M(n,d) \right). \end{align*}

The densities of these two random matrix ensembles are explicit and well known (although we do not state them here), which allow for explicit calculations. The outcome of these calculations is the following result, proven independently and simultaneously by Bubeck et al. and Jiang and Li.

Theorem [Bubeck, Ding, Eldan, and Racz; Jiang and Li]

Define the random matrix ensembles W\left( n, d \right) and M \left( n, d \right) as above. If d / n^3 \to \infty, then

    \begin{equation*}\label{eq:Wishart_GOE} \mathrm{TV} \left( W \left( n, d \right), M \left( n, d \right) \right) \to 0. \end{equation*}

We conclude that it is impossible to detect underlying geometry whenever d \gg n^{3}.

The universality of the threshold dimension

How robust is the result presented above? We have seen that the detection threshold is intimately connected to the threshold of when a Wishart matrix becomes GOE. Understanding the robustness of this result on random matrices is interesting in its own right, and this is what we will pursue in the remainder of this post, which is based on a recent paper by Bubeck and Ganguly.

Let \mathbb{X} be an n \times d random matrix with i.i.d. entries from a distribution \mu that has mean zero and variance 1. The n \times n matrix \mathbb{X} \mathbb{X}^{T} is known as the Wishart matrix with d degrees of freedom. As we have seen above, this arises naturally in geometry, where \mathbb{X} \mathbb{X}^{T} is known as the Gram matrix of inner products of n points in \mathbb{R}^{d}. The Wishart matrix also appears naturally in statistics as the sample covariance matrix, where d is the number of samples and n is the number of parameters. (Note that in statistics the number of samples is usually denoted by n, and the number of parameters is usually denoted by p; here our notation is taken with the geometric perspective in mind.)

We consider the Wishart matrix with the diagonal removed, and scaled appropriately:

    \[ \mathcal{W}_{n,d} = \frac{1}{\sqrt{d}} \left( \mathbb{X} \mathbb{X}^{T} - \mathrm{diag} \left( \mathbb{X} \mathbb{X}^{T} \right) \right). \]

In many applications—such as to random graphs as above—the diagonal of the matrix is not relevant, so removing it does not lose information. Our goal is to understand how large does the dimension d have to be so that \mathcal{W}_{n,d} is approximately like \mathcal{G}_{n}, which is defined as the n \times n Wigner matrix with zeros on the diagonal and i.i.d. standard Gaussians above the diagonal. In other words, \mathcal{G}_{n} is drawn from the Gaussian Orthogonal Ensemble (GOE) with the diagonal replaced with zeros.

A simple application of the multivariate central limit theorem gives that if n is fixed and d \to \infty, then \mathcal{W}_{n,d} converges to \mathcal{G}_{n} in distribution. The main result of Bubeck and Ganguly establishes that this holds as long as d \, \widetilde{\gg} \, n^{3} under rather general conditions on the distribution \mu.

Theorem [Bubeck and Ganguly]

If the distribution \mu is log-concave, i.e., if it has density f(\cdot) = e^{-\varphi(\cdot)} for some convex function \varphi, and if \frac{d}{n^{3} \log^{2} \left( d \right)} \to \infty, then

(2)   \begin{equation*} \mathrm{TV} \left( \mathcal{W}_{n,d}, \mathcal{G}_{n} \right) \to 0. \end{equation*}

On the other hand, if \mu has a finite fourth moment and \frac{d}{n^{3}} \to 0, then

(3)   \begin{equation*} \mathrm{TV} \left( \mathcal{W}_{n,d}, \mathcal{G}_{n} \right) \to 1. \end{equation*}

This result extends Theorem 1 from the previous post and Theorem 1 from above, and establishes n^{3} as the universal critical dimension (up to logarithmic factors) for sufficiently smooth measures \mu: \mathcal{W}_{n,d} is approximately Gaussian if and only if d is much larger than n^{3}. For random graphs, as seen above, this is the dimension barrier to extracting geometric information from a network: if the dimension is much greater than the cube of the number of vertices, then all geometry is lost. In the setting of statistics this means that the Gaussian approximation of a Wishart matrix is valid as long as the sample size is much greater than the cube of the number of parameters. Note that for some statistics of a Wishart matrix the Gaussian approximation is valid for much smaller sample sizes (e.g., the largest eigenvalue behaves as in the limit even when the number of parameters is on the same order as the sample size (Johnstone, 2001)).

To distinguish the random matrix ensembles, we have seen in the previous post that signed triangles work up until the threshold dimension in the case when \mu is standard normal. It turns out that the same statistic works in this more general setting; when the entries of the matrices are centered, this statistic can be written as A \mapsto \mathrm{Tr} \left( A^{3} \right). We leave the details as an exercise for the reader.

We note that for (2) to hold it is necessary to have some smoothness assumption on the distribution \mu. For instance, if \mu is purely atomic, then so is the distribution of \mathcal{W}_{n,d}, and thus its total variation distance to \mathcal{G}_{n} is 1. The log-concave assumption gives this necessary smoothness, and it is an interesting open problem to understand how far this can be relaxed.

We will see the proof (and in particular the connection to entropic CLT!) in the next post.

This entry was posted in Uncategorized. Bookmark the permalink.