This post is a continuation of the previous one, where we explored how to detect geometry in a random geometric graph. We now consider the other side of the coin: when is it impossible to detect geometry, and what techniques can be used to prove such impossibility results? We begin by recalling the model introduced in the previous post and then turn to a more general setup, proving a robustness result on the threshold dimension for detection. The proof of the latter result also gives us the opportunity to learn about the fascinating world of entropic central limit theorems.
Barrier to detecting geometry: when Wishart becomes GOE
Recall from the previous post that $G(n,p,d)$ is a random geometric graph where the underlying metric space is the $(d-1)$-dimensional unit sphere $\mathbb{S}^{d-1} \subseteq \mathbb{R}^d$, and where the latent labels of the nodes are i.i.d. uniform random vectors in $\mathbb{S}^{d-1}$. Our goal now is to show the impossibility result of Bubeck, Ding, Eldan, and Racz: if $d \gg n^3$, then it is impossible to distinguish between $G(n,p,d)$ and the Erdős–Rényi random graph $G(n,p)$. More precisely, we have that

\[
\mathrm{TV}\left( G(n,p,d), \, G(n,p) \right) \to 0 \tag{1}
\]

when $n \to \infty$ and $d / n^3 \to \infty$, where $\mathrm{TV}$ denotes total variation distance.
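For completeness (this definition is not restated in the post itself), recall that for probability measures $P$ and $Q$ on the same space,

\[
\mathrm{TV}\left( P, Q \right) = \sup_{A} \left| P(A) - Q(A) \right|,
\]

where the supremum is over measurable sets. Convergence of this distance to $0$ means that no test, however clever, can distinguish the two models with probability bounded away from $1/2$.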
There are essentially three main ways to bound the total variation of two distributions from above: (i) if the distributions have nice formulas associated with them, then exact computation is possible; (ii) through coupling the distributions; or (iii) by using inequalities between probability metrics to switch the problem to bounding a different notion of distance between the distributions. Here, while the distribution of $G(n,p,d)$ does not have a nice formula associated with it, the main idea is to view this random geometric graph as a function of an $n \times n$ Wishart matrix with $d$ degrees of freedom (i.e., a matrix of inner products of $n$ $d$-dimensional Gaussian vectors), denoted by $W(n,d)$. It turns out that one can view $G(n,p)$ as (essentially) the same function of an $n \times n$ GOE random matrix (i.e., a symmetric matrix with i.i.d. Gaussian entries on and above the diagonal), denoted by $M(n)$. The upside of this is that both of these random matrix ensembles have explicit densities that allow for explicit computation. We explain this connection here in the special case of $p = 1/2$ for simplicity; see Bubeck et al. for the case of general $p$.
Recall that if $Y \sim N(0, I_d)$ is a standard normal random variable in $\mathbb{R}^d$, then $Y / \left\| Y \right\|$ is uniformly distributed on the sphere $\mathbb{S}^{d-1}$. Consequently we can view $G(n,1/2,d)$ as a function of an appropriate Wishart matrix, as follows. Let $Y$ be an $n \times d$ matrix where the entries are i.i.d. standard normal random variables, and let $W = W(n,d) := Y Y^{T}$ be the corresponding $n \times n$ Wishart matrix. Note that $W_{ij} = \left\langle Y_i, Y_j \right\rangle$ and so $W_{ij} \geq 0$ if and only if $\left\langle Y_i / \| Y_i \|, \, Y_j / \| Y_j \| \right\rangle \geq 0$; recall also that for $p = 1/2$ two nodes are connected precisely when the inner product of their labels is nonnegative. Thus the $n \times n$ matrix $A$ defined as

\[
A_{ij} = \begin{cases} 1 & \text{if } i \neq j \text{ and } W_{ij} \geq 0, \\ 0 & \text{otherwise}, \end{cases}
\]

has the same law as the adjacency matrix of $G(n,1/2,d)$. Denote the map that takes $W$ to $A$ by $H$, i.e., $A = H(W)$.
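As a quick sanity check of this construction, here is a short numerical sketch (my own illustration, not from the post): since normalizing the rows of $Y$ only rescales inner products by positive factors, building the graph from the uniform sphere points $Y_i / \|Y_i\|$ gives exactly the same adjacency matrix as thresholding the Wishart matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 5

# n i.i.d. standard normal vectors in R^d, stacked as rows
Y = rng.standard_normal((n, d))

# Wishart matrix with d degrees of freedom
W = Y @ Y.T

def H(W):
    """Threshold map: connect i and j (i != j) whenever W_ij >= 0."""
    A = (W >= 0).astype(int)
    np.fill_diagonal(A, 0)
    return A

# Build G(n, 1/2, d) directly from the uniform sphere points Y_i / ||Y_i||
X = Y / np.linalg.norm(Y, axis=1, keepdims=True)
A_sphere = H(X @ X.T)

# The two constructions agree entry by entry
assert np.array_equal(H(W), A_sphere)
```

The assertion holds because positive rescaling never changes the sign of an inner product, which is all that $H$ looks at.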
In a similar way we can view $G(n,1/2)$ as a function of an $n \times n$ matrix drawn from the Gaussian Orthogonal Ensemble (GOE). Let $M(n)$ be a symmetric $n \times n$ random matrix where the diagonal entries are i.i.d. normal random variables with mean zero and variance $2$, and the entries above the diagonal are i.i.d. standard normal random variables, with the entries on and above the diagonal all independent. Then $H(M(n))$ has the same law as the adjacency matrix of $G(n,1/2)$. Note that $H$ only depends on the sign of the off-diagonal elements of the matrix, so in the definition of $H(M(n))$ we can replace $M(n)$ with $M(n,d) := \sqrt{d} \, M(n) + d \, I_n$, where $I_n$ is the $n \times n$ identity matrix.
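The replacement of $M(n)$ by $\sqrt{d}\,M(n) + d\,I_n$ is easy to verify numerically; a small sketch (again my own, reusing the threshold map $H$ from above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 5

# GOE matrix: N(0, 2) on the diagonal, N(0, 1) above, then symmetrized
M = np.triu(rng.standard_normal((n, n)), k=1)
M = M + M.T + np.diag(np.sqrt(2) * rng.standard_normal(n))

def H(W):
    """Threshold map: connect i and j (i != j) whenever W_ij >= 0."""
    A = (W >= 0).astype(int)
    np.fill_diagonal(A, 0)
    return A

# Scaling by sqrt(d) > 0 and shifting only the diagonal leaves the signs
# of the off-diagonal entries unchanged, hence H is unchanged too
M_shifted = np.sqrt(d) * M + d * np.eye(n)
assert np.array_equal(H(M), H(M_shifted))
```

The diagonal shift is what lines the GOE ensemble up with the Wishart matrix, whose diagonal entries concentrate around $d$.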
We can thus conclude that

\[
\mathrm{TV}\left( G(n,1/2,d), \, G(n,1/2) \right) = \mathrm{TV}\left( H(W(n,d)), \, H(M(n,d)) \right) \leq \mathrm{TV}\left( W(n,d), \, M(n,d) \right).
\]
The densities of these two random matrix ensembles are explicit and well known (although we do not state them here), which allow for explicit calculations. The outcome of these calculations is the following result, proven independently and simultaneously by Bubeck et al. and Jiang and Li.
Theorem [Bubeck, Ding, Eldan, and Racz; Jiang and Li].
Define the random matrix ensembles $W(n,d)$ and $M(n,d)$ as above. If $d / n^3 \to \infty$, then

\[
\mathrm{TV}\left( W(n,d), \, M(n,d) \right) \to 0.
\]
We conclude that it is impossible to detect underlying geometry whenever $d \gg n^3$.
The universality of the threshold dimension
How robust is the result presented above? We have seen that the detection threshold is intimately connected to the threshold at which a Wishart matrix becomes approximately GOE. Understanding the robustness of this result on random matrices is interesting in its own right, and this is what we will pursue in the remainder of this post, which is based on a recent paper by Bubeck and Ganguly.
Let $X$ be an $n \times d$ random matrix with i.i.d. entries from a distribution $\mu$ that has mean zero and variance $1$. The $n \times n$ matrix $W = X X^{T}$ is known as the Wishart matrix with $d$ degrees of freedom. As we have seen above, this arises naturally in geometry, where $W$ is known as the Gram matrix of inner products of $n$ points in $\mathbb{R}^d$. The Wishart matrix also appears naturally in statistics as the sample covariance matrix, where $d$ is the number of samples and $n$ is the number of parameters. (Note that in statistics the number of samples is usually denoted by $n$, and the number of parameters is usually denoted by $p$; here our notation is taken with the geometric perspective in mind.)
We consider the Wishart matrix with the diagonal removed, and scaled appropriately:

\[
\mathcal{W}_{n,d} := \frac{X X^{T} - \mathrm{diag}\left( X X^{T} \right)}{\sqrt{d}}.
\]

In many applications, such as to random graphs as above, the diagonal of the matrix is not relevant, so removing it does not lose information. Our goal is to understand how large the dimension $d$ has to be so that $\mathcal{W}_{n,d}$ is approximately like $\mathcal{G}_n$, which is defined as the $n \times n$ Wigner matrix with zeros on the diagonal and i.i.d. standard Gaussians above the diagonal. In other words, $\mathcal{G}_n$ is drawn from the Gaussian Orthogonal Ensemble (GOE) with the diagonal replaced with zeros.
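In code, the two ensembles can be sketched as follows (a hedged illustration; the function names are my own). For large $d$ the off-diagonal entries of the scaled Wishart matrix have variance close to $1$, matching the Wigner matrix entrywise.

```python
import numpy as np

rng = np.random.default_rng(2)

def wishart_centered(n, d, rng):
    """Scaled Wishart matrix with the diagonal removed:
    (X X^T - diag(X X^T)) / sqrt(d), entries of X i.i.d. mean 0, variance 1."""
    X = rng.standard_normal((n, d))
    W = X @ X.T
    np.fill_diagonal(W, 0)
    return W / np.sqrt(d)

def wigner_zero_diag(n, rng):
    """Wigner matrix with zero diagonal and i.i.d. standard Gaussians
    above the diagonal (GOE with the diagonal replaced by zeros)."""
    G = np.triu(rng.standard_normal((n, n)), k=1)
    return G + G.T

n, d = 10, 200_000
W = wishart_centered(n, d, rng)
G = wigner_zero_diag(n, rng)

# Both matrices are symmetric with zero diagonal; each off-diagonal entry
# of W is an inner product of d i.i.d. terms divided by sqrt(d), so its
# variance is exactly 1, just like the entries of G.
```

The question of the theorem below is how large $d$ must be for the *joint* distributions of these two matrices to be close, which is much more delicate than entrywise comparison.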
A simple application of the multivariate central limit theorem gives that if $n$ is fixed and $d \to \infty$, then $\mathcal{W}_{n,d}$ converges to $\mathcal{G}_n$ in distribution. The main result of Bubeck and Ganguly establishes that this holds as long as $d \gg n^3$ (up to logarithmic factors) under rather general conditions on the distribution $\mu$.
Theorem [Bubeck and Ganguly].
If the distribution $\mu$ is log-concave, i.e., if it has density $e^{-\varphi}$ for some convex function $\varphi$, and if $\frac{d}{n^3 \log^2 (d)} \to \infty$, then

\[
\mathrm{TV}\left( \mathcal{W}_{n,d}, \, \mathcal{G}_n \right) \to 0. \tag{2}
\]

On the other hand, if $\mu$ has a finite fourth moment and $\frac{d}{n^3} \to 0$, then

\[
\mathrm{TV}\left( \mathcal{W}_{n,d}, \, \mathcal{G}_n \right) \to 1. \tag{3}
\]
This result extends Theorem 1 from the previous post and the theorem from above, and establishes $n^3$ as the universal critical dimension (up to logarithmic factors) for sufficiently smooth measures $\mu$: $\mathcal{W}_{n,d}$ is approximately Gaussian if and only if $d$ is much larger than $n^3$. For random graphs, as seen above, this gives the dimension barrier to extracting geometric information from a network: if the dimension is much greater than the cube of the number of vertices, then all geometry is lost. In the setting of statistics this means that the Gaussian approximation of a Wishart matrix is valid as long as the sample size is much greater than the cube of the number of parameters. Note that for some statistics of a Wishart matrix the Gaussian approximation is valid for much smaller sample sizes; for instance, the largest eigenvalue exhibits the same limiting fluctuations as in the Gaussian case even when the number of parameters is on the same order as the sample size (Johnstone, 2001).
To distinguish the random matrix ensembles, we have seen in the previous post that signed triangles work up until the threshold dimension in the case when $\mu$ is standard normal. It turns out that the same statistic works in this more general setting; when the entries of the matrices are centered, this statistic can be written as $A \mapsto \mathrm{Tr}\left( A^3 \right)$. We leave the details as an exercise for the reader.
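To get a feel for why this statistic separates the ensembles, here is a small simulation (my own illustration, not from the papers): a direct calculation gives $\mathbb{E}\,\mathrm{Tr}\left( \mathcal{W}_{n,d}^3 \right) = 6 \binom{n}{3} / \sqrt{d}$, which grows like $n^3 / \sqrt{d}$, while $\mathbb{E}\,\mathrm{Tr}\left( \mathcal{G}_n^3 \right) = 0$.

```python
import numpy as np

rng = np.random.default_rng(3)

def wishart_centered(n, d, rng):
    X = rng.standard_normal((n, d))
    W = X @ X.T
    np.fill_diagonal(W, 0)
    return W / np.sqrt(d)

def wigner_zero_diag(n, rng):
    G = np.triu(rng.standard_normal((n, n)), k=1)
    return G + G.T

def signed_triangles(A):
    # Tr(A^3) sums A_ij * A_jk * A_ki over all triples; with a zero
    # diagonal only triples of distinct indices contribute
    return np.trace(A @ A @ A)

n, d, trials = 10, 20, 300
t_wishart = np.mean([signed_triangles(wishart_centered(n, d, rng))
                     for _ in range(trials)])
t_wigner = np.mean([signed_triangles(wigner_zero_diag(n, rng))
                    for _ in range(trials)])

# Analytically: E[Tr(W^3)] = 6 * C(n,3) / sqrt(d), about 161 for these
# parameters, while the Wigner statistic has mean 0
```

Averaged over trials, the Wishart statistic concentrates around a strictly positive value while the Wigner one hovers around zero, which is exactly the gap the test exploits below the threshold dimension.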
We note that for (2) to hold it is necessary to have some smoothness assumption on the distribution $\mu$. For instance, if $\mu$ is purely atomic, then so is the distribution of $\mathcal{W}_{n,d}$, and thus its total variation distance to $\mathcal{G}_n$ is $1$. The log-concave assumption gives this necessary smoothness, and it is an interesting open problem to understand how far this can be relaxed.
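A quick numerical illustration of this atomicity obstruction (my own, using Rademacher entries): when $\mu$ is uniform on $\{-1, +1\}$, every off-diagonal entry of $X X^{T}$ is a sum of $d$ signs, so it lives on the lattice $\{-d, -d+2, \dots, d\}$ and the law of $\mathcal{W}_{n,d}$ is purely atomic, making its total variation distance to the continuous Gaussian ensemble equal to $1$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 6, 11

# Rademacher entries: mean 0, variance 1, but purely atomic
X = rng.choice([-1, 1], size=(n, d))
S = X @ X.T  # integer Gram matrix

# Each off-diagonal entry is a sum of d terms, each +1 or -1, so it is
# an integer with the same parity as d and absolute value at most d
off_diag = S[~np.eye(n, dtype=bool)]
assert np.all((off_diag - d) % 2 == 0)
assert np.all(np.abs(off_diag) <= d)
```

The scaled matrix $S / \sqrt{d}$ therefore takes only finitely many values, so any event supported on this lattice has probability $1$ under the Wishart law and probability $0$ under the Gaussian law.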
We will see the proof (and in particular the connection to entropic CLT!) in the next post.