Last wednesday, Sourav Chatterjee was speaking in the Mathematics Colloquium here at Princeton. While he is most famous for his contributions to probability, especially using Stein’s method, this talk was rather statistical. The problem was interesting and its solution simple enough that it can fit in this blog post. Sourav’s full paper can be found on ArXiv.
I present a slightly simpler proof of his main result. It yields a bound that depends on a more natural quantity that may be much smaller than the one involving the nuclear norm and that appears in the original paper Chatterjee (2012).
The problem
Let be an symmetric matrix with elements . Let be the undirected random graph on vertices obtained by placing edges independently with probability respectively.
The goal is to produce an estimator of , i.e., a matrix , that is measurable with respect to and such that the Frobenius norm is small with high probability.
The title of this post suggests that we need some structure on the . Sourav’s article points to a variety of examples where structure appears in the spectrum of . As a result, the method and analysis described below are spectral.
Thresholded eigen-decomposition
Let denote the adjacency matrix of and observe that the matrix is symmetric, has zero mean and its elements are independent and lie in the interval almost surely. This is not quite a Wigner matrix (the variances are not identical) but up to multiplicative constant, its operator norm is of the order , just like a Wigner matrix. Formally, the following lemma holds. Its proof is postponed to the end of the post.
Since we can decompose our observed matrix , the eigenvalues of that are smaller than should disappear in the noise. Moreover, those that are larger than should be visible in the spectrum on .
Consider the eigen-decompositions of and :
where the eigenvalues are ordered by decreasing order of magnitude and define the following estimator,
Note that Sourav Chatterjee uses instead of but his proof requires much more complicated arguments for essentially the same result (see the original paper). Also, his result holds in expectation and we allow here a tunable level of confidence.
Proof: Assume that we are on the event . We begin by a simple consequence of the lemma. Let and define the matrix
The lemma together with Weyl’s inequality implies that
It yields so that for all (by definition of ). Therefore, . Similarly, we find that the complement of is such that .
We now return to the main part of the proof. By the triangle inequality, we have
and we treat each term independently.
On the one hand, . On the other hand,
where we used the definition of and the lemma. Together, the above three displays yield
Discussion
This is not exactly the bound from Chatterjee (2012). Indeed, the main Theorem 2.1 in Chatterjee (2012) yields
where is the nuclear norm of .
Note first that our theorem yields a bound on the median:
How do the two bounds compare? Note first that our bound is always at least as good as Chatterjee’s bound:
Moreover, there may be a large gap between the two quantities. For example, if is the identity matrix, then
This can also be the case with low rank matrices. Take the matrix the all-one matrix , where is a vector of ones. In this case is the only nonzero eigenvalue and
Moreover, the ratio between these quantities can be made arbitrary close to 0 by taking arbitrary close to 0. Indeed,
Proof of the lemma
Let be a -net of the unit Euclidean ball of . Using a volume packing argument (see Vershynin (2011)), it is not hard to show that we can always take of cardinality . Moreover, for any vector ,
It implies
Using a union bound, we find for any ,
We conclude a Chernoff bound to get for any ,
Using Hoeffding’s lemma, since , we get
Putting together the above three displays and optimizing over yields
It completes the proof of the lemma.