Lectures 2 and 3. Proof of Kadison-Singer (1)

The goal of the following lectures is to prove the following result about random matrices due to Marcus, Spielman, and Srivastava. This result readily implies a positive resolution of the Kadison-Singer conjecture, as was explained in detail in the previous lecture.

Theorem. Let v_1,\ldots,v_n be independent random vectors in \mathbb{C}^d whose distribution has finite support. Suppose that

    \[\mathbf{E}\sum_{i=1}^nv_iv_i^* = I, \qquad \mathbf{E}\|v_i\|^2 \le \varepsilon\mbox{ for all }i.\]

Then

    \[\left\|\sum_{i=1}^nv_iv_i^*\right\| \le (1+\sqrt{\varepsilon})^2\]

with positive probability.

Define the random matrix A and its characteristic polynomial p_A as

    \[A = \sum_{i=1}^n v_iv_i^*,\qquad p_A(z) = \mathrm{det}(zI-A).\]

As A is positive semidefinite, one representation of the matrix norm is \|A\|=\mathrm{maxroot}(p_A). The proof of the above theorem consists of two main parts:

  1. Show that \mathrm{maxroot}(p_A)\le \mathrm{maxroot}(\mathbf{E}p_A) with positive probability.
  2. Show that \mathrm{maxroot}(\mathbf{E}p_A)\le (1+\sqrt{\varepsilon})^2.

The present post is devoted to the proof of the first statement. The second statement will be the subject of the subsequent lectures.
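
As a quick sanity check of the identity \|A\|=\mathrm{maxroot}(p_A), here is a minimal numerical sketch in Python (the dimension and the random vectors are arbitrary choices for illustration, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 3))        # five arbitrary vectors in R^3
A = sum(np.outer(v, v) for v in V)     # A = sum_i v_i v_i^T is positive semidefinite
p_A = np.poly(A)                       # coefficients of det(zI - A)
print(np.linalg.norm(A, 2))            # spectral norm of A
print(max(np.roots(p_A).real))         # maxroot(p_A): agrees up to rounding
```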

Interlacing roots of polynomials

How can we prove an inequality such as \mathrm{maxroot}(p_A)\le \mathrm{maxroot}(\mathbf{E}p_A)? The key issue we face is that the \mathrm{maxroot} functional is extremely nonlinear, and it is not at all clear how it behaves under expectations. This problem already arises in the simplest possible setting: what can we say about the locations of the roots of a convex combination \lambda p+(1-\lambda)q of two polynomials p and q? In general, there is little hope of saying anything concrete. However, if the polynomials p and q have some special properties, then we are in much better shape.

Lemma. Let p and q be polynomials that are monic (their leading coefficient is one) and of the same degree. Suppose that the polynomial \lambda p+(1-\lambda)q is real-rooted (all its roots are real) for every 0\le\lambda\le 1. Then

    \[\text{maxroot}(p)\wedge\text{maxroot}(q) \le \text{maxroot}(\lambda p+(1-\lambda)q) \le \text{maxroot}(p)\vee\text{maxroot}(q)\]

for every 0\le \lambda\le 1.

The message of this lemma is that under the given assumptions, the maximal root of the convex combination \lambda p+(1-\lambda)q interlaces between the maximal roots of p and q (in fact, a similar property holds for all the other roots as well, but we will not need this fact).

Proof. Let us assume without loss of generality that \text{maxroot}(p)\le\text{maxroot}(q), so we must prove

    \[\text{maxroot}(p) \le \text{maxroot}(\lambda p+(1-\lambda)q) \le \text{maxroot}(q).\]

Proof of upper bound. Because the polynomials are monic, p(z) and q(z) are strictly increasing for real z to the right of their maximal roots. Thus p(z) and q(z) are both strictly positive for z>\text{maxroot}(q). But this means that \lambda p(z)+(1-\lambda)q(z) is also strictly positive for z>\text{maxroot}(q), and thus cannot have a root in this region. This implies that

    \[\text{maxroot}(\lambda p+(1-\lambda)q)\le\text{maxroot}(q).\]

Proof of lower bound. Of course, this is the interesting part. By the same argument as the upper bound, it is clear that \text{maxroot}(\lambda p+(1-\lambda)q)=\text{maxroot}(p) if \text{maxroot}(p)=\text{maxroot}(q). We can therefore assume without loss of generality that \text{maxroot}(p)<\text{maxroot}(q).

We distinguish two cases. Note that p(z)>0 for z>\text{maxroot}(p). Suppose first that q(z)<0 for all \text{maxroot}(p)<z<\text{maxroot}(q). Then p and q must look something like this:

[Figure: graphs of p and q near their largest roots; q is negative on the interval between \text{maxroot}(p) and \text{maxroot}(q).]

This means that for any 0<\lambda<1, we have \lambda p(z)+(1-\lambda)q(z)\le 0 at z=\text{maxroot}(p) and \lambda p(z)+(1-\lambda)q(z)\ge 0 at z=\text{maxroot}(q). Thus \lambda p+(1-\lambda)q must have a root between \text{maxroot}(p) and \text{maxroot}(q) by the intermediate value theorem. In particular,

    \[\text{maxroot}(\lambda p+(1-\lambda)q)\ge\text{maxroot}(p).\]

Now suppose that q(z)\ge 0 for some \text{maxroot}(p)<z<\text{maxroot}(q), and let z^* be the second largest root of q. Then p and q must look something like this:

[Figure: graphs of p and q near their largest roots; the second largest root z^* of q lies to the right of \text{maxroot}(p).]

(Note that it could be that z^*=\text{maxroot}(q), or that z^* is a local maximum; then the picture looks slightly different, but this will not change our argument.) We now argue as follows. The polynomial p has no roots in the interval [z^*,\infty[, but the polynomial q has at least two roots in this interval (counting multiplicities). Thus as we let \lambda vary from 0 to 1, the number of roots of the polynomial \lambda p+(1-\lambda)q must jump at some point \lambda^* from having at least two roots to having zero roots in the interval [z^*,\infty[. The polynomial \lambda^*p+(1-\lambda^*)q must therefore look like this:

[Figure: graph of \lambda^*p+(1-\lambda^*)q, which touches zero at a double root in the interval [z^*,\infty[.]

This means that for any \varepsilon>0 sufficiently small, the polynomial (\lambda^*+\varepsilon)p+(1-\lambda^*-\varepsilon)q loses two real roots in the interval [z^*,\infty[. As long as these roots do not reappear elsewhere on the real line, this implies that this polynomial has fewer real roots than its degree; therefore, it must have at least two roots in the complex plane away from the real line. This contradicts the assumption that all convex combinations of p and q are real-rooted.

Thus it only remains to argue what happens if the two roots that are lost when we go from \varepsilon=0 to \varepsilon>0 reappear elsewhere on the real line. Note that for new roots to appear, a local maximum that was strictly below 0 (or a local minimum that was strictly above 0) when \varepsilon=0 must have moved above (or below) 0 when \varepsilon>0. As the graph of a polynomial is continuous in its coefficients, we can always avoid this by choosing \varepsilon>0 to be sufficiently small. \square
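
To see the lemma in action, here is a minimal numerical sketch with two arbitrarily chosen monic quadratics, all of whose convex combinations turn out to be real-rooted (as the code verifies along a grid of \lambda):

```python
import numpy as np

p = np.poly([1.0, 3.0])                 # p(x) = (x-1)(x-3), maxroot(p) = 3
q = np.poly([2.0, 4.0])                 # q(x) = (x-2)(x-4), maxroot(q) = 4
for lam in np.linspace(0.0, 1.0, 11):
    roots = np.roots(lam * p + (1 - lam) * q)
    assert np.allclose(roots.imag, 0.0) # this convex combination is real-rooted
    print(lam, max(roots.real))         # the maximal root stays in [3, 4]
```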

Once we know how to deal with two polynomials, we can easily generalize as follows.

Corollary. Let p_1,\ldots,p_n be monic polynomials of the same degree. Suppose that all convex combinations of p_1,\ldots,p_n are real-rooted. Then

    \[\min_{i\le n}\text{maxroot}(p_i) \le \text{maxroot}(\lambda_1 p_1+\cdots+\lambda_np_n) \le \max_{i\le n}\text{maxroot}(p_i)\]

for all convex weights \lambda_i\ge 0, \lambda_1+\cdots+\lambda_n=1.

Proof. By induction: assume the result is true for n polynomials; then the result is also true for n+1 polynomials by applying the previous lemma. \square

Mixed characteristic polynomials

In order to apply the interlacing property developed above to the Kadison-Singer problem, we must show that the characteristic polynomial p_A satisfies the necessary assumptions. Recall that the random matrix A has a distribution with finite support; that is, A is a random variable that can take one of N possible values A_1,\ldots,A_N. Our aim is to show that there exists 1\le i\le N such that

    \[\text{maxroot}(p_{A_i}) \le \text{maxroot}(\mathbf{E}p_A),\]

where the expectation \mathbf{E}p_A is clearly a convex combination of the polynomials p_{A_1},\ldots,p_{A_N}. In order to apply the interlacing property, we must show not only that each of these polynomials is real-rooted (this is obvious, as A_i are Hermitian matrices), but also that convex combinations of these polynomials are real-rooted. The latter is far from obvious: we must reason about the properties of convex combinations of determinants, such as \lambda \det(zI-A_1)+(1-\lambda)\det(zI-A_2). Determinants are nonlinear objects, and so it is not at all clear how to address such questions.

The stroke of luck in the Kadison-Singer problem is that the matrix A is a sum of rank-one matrices. Even though the determinant is nonlinear, it is linear with respect to rank-one perturbations. This crucial fact will allow us to express the convex combinations of determinants that appear in our problem in terms of another determinant, for which it will prove to be easier to establish real-rootedness.

Lemma. (Sylvester’s determinant identity) \det(I+uv^*) = 1+v^*u.

Proof. Let U,V be nonsingular square matrices. Then

    \[\det(I+UV^*) = \det(U^{-1})\det(I+UV^*)\det(U) = \det(I+V^*U).\]

By continuity, this remains valid for singular matrices U,V. The result follows by letting U be the matrix whose first column is u and V be the matrix whose first column is v (all other entries are zero). \square
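
Here is a one-line numerical check of the identity, with an arbitrary random pair of complex vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
lhs = np.linalg.det(np.eye(4) + np.outer(u, v.conj()))   # det(I + u v^*)
rhs = 1 + np.vdot(v, u)                                  # 1 + v^* u (vdot conjugates v)
print(np.isclose(lhs, rhs))                              # True
```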

Corollary. t\mapsto\det(B+tvv^*) is linear in t.

Proof. If B is nonsingular, we can write

    \[\det(B+tvv^*) = \det(B)\det(I+tB^{-1}vv^*) = \det(B)(1+tv^*B^{-1}v),\]

which is linear in t. As the limit of linear functions is linear, the result remains valid for singular B. \square
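
A similar numerical check of the corollary: as a polynomial of degree at most one in t, the function t\mapsto\det(B+tvv^*) must take a value at t=1 equal to the average of its values at t=0 and t=2 (the matrix B and vector v below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))          # an arbitrary square matrix
v = rng.standard_normal(4)
f = lambda t: np.linalg.det(B + t * np.outer(v, v))
print(np.isclose(f(1.0), (f(0.0) + f(2.0)) / 2))   # True: f is affine in t
```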

What we would really like is that \mathbf{E}\det(B+vv^*)\overset{?}{=}\det(B+\mathbf{E}vv^*) for a random vector v: that would make it very easy to handle convex combinations of characteristic polynomials. Unfortunately, this is wishful thinking: the matrix \mathbf{E}vv^* is not of rank one, so this does not follow from the linearity of rank-one perturbations. Nonetheless, this is not such a bad idea. We can use the linearity of rank-one perturbations to rewrite the characteristic polynomial of a sum of rank-one matrices in terms of a different quantity, which proves to be linear even beyond the rank-one setting.

To this end, let us define the multivariate polynomial

    \[p(z,t_1,\ldots,t_n) = \det\Bigg(zI+\sum_{i=1}^nt_iv_iv_i^*\Bigg).\]

Then p_A(z)=p(z,-1,\ldots,-1). Moreover, we have just shown that p(z,t_1,\ldots,t_n) is multilinear in t_1,\ldots,t_n for every z. Using that a linear function equals its first-order Taylor expansion, we can write

    \[p(z,t_1,\ldots,t_n) = p(z,t_1,\ldots,t_{n-1},0) + t_n\frac{\partial}{\partial s}p(z,t_1,\ldots,t_{n-1},s)\bigg|_{s=0}.\]

Iterating this device, we obtain

    \[p(z,t_1,\ldots,t_n) = \prod_{i=1}^n\bigg(1+t_i\frac{\partial}{\partial s_i}\bigg)p(z,s_1,\ldots,s_n)\bigg|_{s_1,\ldots,s_n=0}.\]

Setting t_1,\ldots,t_n=-1, we obtain the identity

    \[\det\Bigg(zI-\sum_{i=1}^nv_iv_i^*\Bigg) = \prod_{i=1}^n\bigg(1-\frac{\partial}{\partial t_i}\bigg) \det\Bigg(zI+\sum_{i=1}^nt_i v_iv_i^*\Bigg)\Bigg|_{t_1,\ldots,t_n=0}.\]

The quantity on the right is called a mixed characteristic polynomial.

Definition. The mixed characteristic polynomial of square matrices B_1,\ldots,B_n is defined as

    \[\mu[B_1,\ldots,B_n](z) := \prod_{i=1}^n\bigg(1-\frac{\partial}{\partial t_i}\bigg) \det\Bigg(zI+\sum_{i=1}^nt_i B_i\Bigg)\Bigg|_{t_1,\ldots,t_n=0}.\]

Note that if all B_i are rank one, we have shown that

    \[p_{\sum_{i=1}^nB_i}(z) = \mu[B_1,\ldots,B_n](z).\]

That is, the mixed characteristic polynomial of B_1,\ldots,B_n coincides with the characteristic polynomial of \sum_{i=1}^nB_i when all B_i are rank one. This is no longer true when the matrices B_i are not of rank one! Nonetheless, the mixed characteristic polynomial is an extremely useful device for the following reason.
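
This identity can be verified symbolically; the following sketch (using sympy, with two arbitrarily chosen rank-one matrices in dimension two) applies the operator \prod_i(1-\partial/\partial t_i) directly to the determinant and compares the result with the characteristic polynomial of the sum:

```python
import sympy as sp

z, t1, t2 = sp.symbols('z t1 t2')
B1 = sp.Matrix([1, 2]) * sp.Matrix([[1, 2]])     # v1 v1^T with v1 = (1, 2)
B2 = sp.Matrix([3, -1]) * sp.Matrix([[3, -1]])   # v2 v2^T with v2 = (3, -1)

p = (z * sp.eye(2) + t1 * B1 + t2 * B2).det()
mu = p
for t in (t1, t2):                               # apply (1 - d/dt_i) for each i
    mu = mu - sp.diff(mu, t)
mu = sp.expand(mu.subs({t1: 0, t2: 0}))          # mixed characteristic polynomial

char_poly = sp.expand((z * sp.eye(2) - B1 - B2).det())
print(sp.simplify(mu - char_poly))               # 0: the two polynomials coincide
```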

Lemma. \mu[B_1,\ldots,B_n] is multilinear in B_1,\ldots,B_n.

Proof. The determinant \det(zI+\sum_{i=1}^nt_iB_i) is a polynomial in the entries of the matrices t_iB_i, and each term in the product \prod_{i=1}^n(1-\frac{\partial}{\partial t_i}) is a product of derivatives with respect to distinct variables t_i. In order for such a derivative of one of the terms of the polynomial \det(zI+\sum_{i=1}^nt_iB_i) to be nonzero after we set t_1,\ldots,t_n=0, this term can contain each t_iB_i at most once: otherwise the term must be proportional to t_i^k for some k>1, so its derivative will vanish when we set t_i=0. Thus the only terms that survive in the mixed characteristic polynomial are products of the entries of the matrices t_iB_i in which each i appears at most once. Such terms are clearly multilinear in B_1,\ldots,B_n. \square

A simple consequence of this result is, for example, that

    \[\mathbf{E}p_A = \mathbf{E}\mu[v_1v_1^*,\ldots,v_nv_n^*] = \mu[\mathbf{E}v_1v_1^*,\ldots,\mathbf{E}v_nv_n^*],\]

where we have used multilinearity and the independence of the vectors v_i. Let us emphasize that the right-hand side in this expression cannot itself be expressed as the characteristic polynomial of \mathbf{E}A, for example. Nonetheless, we have achieved something very useful: we have expressed a convex combination of determinants in terms of the derivative of a determinant. It turns out that it is much easier to establish real-rootedness of derivatives of determinants, than it is to establish real-rootedness of convex combinations of determinants. Our next order of business is to understand why this is true.
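
The following sketch checks this identity by brute force in a hypothetical two-vector model, where each v_i is uniform on two values (the vectors themselves are arbitrary choices):

```python
import itertools
import sympy as sp

z, t1, t2 = sp.symbols('z t1 t2')

def mixed_char(B1, B2):
    # mu[B1, B2](z): apply (1 - d/dt1)(1 - d/dt2), then set t1 = t2 = 0
    p = (z * sp.eye(2) + t1 * B1 + t2 * B2).det()
    for t in (t1, t2):
        p = p - sp.diff(p, t)
    return sp.expand(p.subs({t1: 0, t2: 0}))

rank1 = lambda a, b: sp.Matrix([a, b]) * sp.Matrix([[a, b]])
V1 = [rank1(1, 0), rank1(0, 1)]      # two equally likely values of v1 v1^T
V2 = [rank1(1, 1), rank1(2, -1)]     # two equally likely values of v2 v2^T

# E p_A: average det(zI - v1 v1^T - v2 v2^T) over the four outcomes
Ep = sp.expand(sum((z * sp.eye(2) - W1 - W2).det()
                   for W1, W2 in itertools.product(V1, V2)) / 4)

# mixed characteristic polynomial of the expected matrices E v_i v_i^T
mu = mixed_char((V1[0] + V1[1]) / 2, (V2[0] + V2[1]) / 2)
print(sp.simplify(Ep - mu))          # prints 0
```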

Multivariate polynomials and real stability

Given positive semidefinite matrices B_1,\ldots,B_n, let us define the multivariate polynomial

    \[p(z,t_1,\ldots,t_n) = \det\Bigg(zI+\sum_{i=1}^nt_iB_i\Bigg).\]

The mixed characteristic polynomial \mu[B_1,\ldots,B_n](z) is obtained from p by taking derivatives, followed by fixing t_1,\ldots,t_n=0. We are going to prove real-rootedness of \mu[B_1,\ldots,B_n] by showing that the real-rootedness property is preserved by such operations.

Of course, it is not quite clear what we mean by this: p is a multivariate polynomial, so its roots are points in \mathbb{C}^{n+1}. The appropriate extension of real-rootedness to multivariate polynomials is as follows.

Definition. A multivariate polynomial q(z_1,\ldots,z_n) is called stable if it has no roots in the set \mathcal{H}=\{z\in\mathbb{C}^n:\text{Im}z_i>0\text{ for all }i\}, and is called real stable if in addition all its coefficients are real.

We are going to establish the following facts:

  1. The polynomial p is real stable.
  2. Taking derivatives or fixing parameters preserves real stability.
  3. A univariate polynomial is real stable if and only if it is real-rooted.

As the (univariate) mixed characteristic polynomial \mu[B_1,\ldots,B_n] is obtained from p by taking derivatives and fixing parameters, this reasoning will show that \mu[B_1,\ldots,B_n] is real-rooted.

The third assertion is the easiest, and we prove it first.

Lemma. A univariate polynomial is real stable if and only if it is real-rooted.

Proof. Clearly a real-rooted polynomial is real stable. Conversely, suppose a univariate polynomial q is real stable. Then it has no roots z with \text{Im}z>0. However, as q has real coefficients, if q(z)=0, then q(\bar z)=0 also. Therefore, there are also no roots with \text{Im}z<0. It follows that q must be real-rooted. \square

Next, we prove the preservation of real stability under taking derivatives and fixing parameters.

Lemma. Let q(z_1,\ldots,z_n) be a real stable polynomial and let t\in\mathbb{R}. Then

  1. q_t(z_1,\ldots,z_{n-1}):= q(z_1,\ldots,z_{n-1},t) is real stable.
  2. (1+t\frac{\partial}{\partial z_n})q(z_1,\ldots,z_n) is real stable.

Proof. We begin by proving the first assertion. Clearly q_t has real coefficients, so it suffices to show that it is stable. Suppose to the contrary that q_t has a root z^*=(z^*_1,\ldots,z^*_{n-1}) in \mathcal{H}. As the roots of a polynomial vary continuously with its coefficients, the polynomial q_{t+i\varepsilon} must also have a root in \mathcal{H} for \varepsilon>0 sufficiently small. But this contradicts the real stability of q.

Remark. A careful proof of the continuity of roots of complex polynomials can be given using Rouché’s theorem. This is left as an exercise to the interested reader.

We now turn to the second assertion. It is again obvious that the derivative of the polynomial has real coefficients, so it remains to establish that it is stable. Assume to the contrary that

    \[(1+t\frac{\partial}{\partial z_n})q(z_1^*,\ldots,z_n^*)=0\]

for some z^*=(z^*_1,\ldots,z^*_n) in \mathcal{H}.

Define the univariate polynomial \tilde q(z)= q(z^*_1,\ldots,z^*_{n-1},z), and denote by w_1,\ldots,w_d\in\mathbb{C} the roots of \tilde q. As q is real stable, we must have \tilde q(z_n^*)\ne 0 and \text{Im}w_i\le 0 for all i. Moreover, we can express \tilde q as

    \[\tilde q(z) = c\prod_{i=1}^d (z-w_i).\]

By assumption, we have

    \[0 = \tilde q(z_n^*)+t \tilde q'(z_n^*)    = \tilde q(z_n^*) (1+t(\log\tilde q)'(z_n^*))\]

(as \log\tilde q(z) is analytic in a neighborhood of z_n^*). But note that

    \[\log \tilde q(z) = \log c+\sum_{i=1}^d\log(z-w_i),\]

so

    \[0 = 1+\sum_{i=1}^d\frac{t}{z_n^*-w_i}.\]

If t=0 this identity already yields a contradiction (it reads 0=1), so we may assume t\ne 0. Taking the imaginary part of this expression and dividing by -t yields

    \[0 = \sum_{i=1}^d \frac{\text{Im}(z_n^*)-\text{Im}(w_i)}{|z_n^*-w_i|^2}.\]

But \text{Im}(z_n^*)>0 and \text{Im}(w_i)\le 0, so we have achieved a contradiction. \square
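
As a univariate illustration of the second assertion: applying the operator 1-\frac{\partial}{\partial z} to a real-rooted polynomial again yields a real-rooted polynomial. The cubic below is an arbitrary example:

```python
import numpy as np

q = np.poly([1.0, 2.0, 3.0])           # q(z) = (z-1)(z-2)(z-3), real-rooted
r = np.polysub(q, np.polyder(q))       # r = (1 - d/dz) q
print(np.roots(r))                     # all three roots are (numerically) real
```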

Finally, we prove that the polynomial p is in fact real stable.

Lemma. Suppose that B_1,\ldots,B_n are positive semidefinite matrices. Then the polynomial

    \[p(z,t_1,\ldots,t_n) = \det\Bigg(zI+\sum_{i=1}^nt_iB_i\Bigg)\]

is real stable.

Proof. To show that p has real coefficients, note that

    \[\overline{\det\Bigg(zI+\sum_{i=1}^nt_iB_i\Bigg)} = \det\Bigg(\bar zI+\sum_{i=1}^n\bar t_i B_i^*\Bigg).\]

As B_i are Hermitian, this implies that p(z,t_1,\ldots,t_n) is real whenever the parameters z,t_1,\ldots,t_n are all real. Thus the coefficients of p must be real, and it remains to prove stability.

To this end, suppose that p has a root z^*,t_1^*,\ldots,t_n^* in \mathcal{H}. Then the matrix z^*I+\sum_{i=1}^nt_i^*B_i is not invertible, so there exists a vector v\ne 0 such that

    \[z^*\|v\|^2+\sum_{i=1}^nt_i^*v^*B_iv=0.\]

But \|v\|^2>0 and v^*B_iv\ge 0, so the imaginary part of the left-hand side must be strictly positive by the assumption that the root z^*,t_1^*,\ldots,t_n^* is in \mathcal{H}. Thus we have achieved a contradiction. \square
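
A numerical spot check of the lemma, with arbitrary random positive semidefinite matrices: at sampled points whose coordinates all have strictly positive imaginary part, the determinant stays away from zero (the sampling scheme and tolerance below are ad hoc choices):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 2
Gs = [rng.standard_normal((d, d)) for _ in range(n)]
Bs = [G @ G.T for G in Gs]             # positive semidefinite matrices

for _ in range(1000):
    # a random point (z, t_1, ..., t_n) with all imaginary parts >= 0.1
    pts = rng.standard_normal(n + 1) + 1j * (0.1 + rng.exponential(1.0, n + 1))
    M = pts[0] * np.eye(d) + sum(t * B for t, B in zip(pts[1:], Bs))
    assert abs(np.linalg.det(M)) > 1e-6   # the determinant never vanishes here
print("no zeros found in the upper half-space")
```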

Putting together the previous three results, we have shown the following.

Corollary. The mixed characteristic polynomial \mu[B_1,\ldots,B_n] is real-rooted whenever B_1,\ldots,B_n are positive semidefinite matrices.

Putting everything together

We are now ready to put together the various ideas developed in this post.

Proposition. Let v_1,\ldots,v_n be independent random vectors in \mathbb{C}^d whose distribution has finite support, and let A=\sum_{i=1}^nv_iv_i^*. Then

    \[\text{maxroot}(p_A) \le \text{maxroot}(\mathbf{E}p_A)\]

with positive probability.

Proof. We will apply the interlacing property step by step. Denote by \mathcal{V}_i the possible values that the random vector v_i can take. First, note that for every collection of convex weights \{\lambda_w\}_{w\in\mathcal{V}_n}

    \[\sum_{w\in\mathcal{V}_n}\lambda_w\,\mu[v_1v_1^*,\ldots,v_{n-1}v_{n-1}^*,ww^*] = \mu\Bigg[v_1v_1^*,\ldots,v_{n-1}v_{n-1}^*,\sum_{w\in\mathcal{V}_n}\lambda_w ww^*\Bigg]\]

by multilinearity. We have shown that the mixed characteristic polynomial on the right is real-rooted. Moreover, it is easily verified from their definition that the polynomials \mu[v_1v_1^*,\ldots,v_{n-1}v_{n-1}^*,ww^*] are monic and have the same degree. Therefore, by the interlacing property, there exists a vector w_n\in\mathcal{V}_n (which may depend on the outcomes of v_1,\ldots,v_{n-1}) such that

    \[\text{maxroot}(\mu[v_1v_1^*,\ldots,v_{n-1}v_{n-1}^*,w_nw_n^*]) \le \text{maxroot}(\mu[v_1v_1^*,\ldots,v_{n-1}v_{n-1}^*,\mathbf{E}v_nv_n^*]).\]

We now proceed in precisely the same manner with the polynomial on the right-hand side of this inequality to establish the existence of a vector w_{n-1} (depending on v_1,\ldots,v_{n-2}) such that

    \begin{align*} &\text{maxroot}(\mu[v_1v_1^*,\ldots,v_{n-2}v_{n-2}^*,w_{n-1}w_{n-1}^*,\mathbf{E}v_nv_n^*]) \\ &\qquad\le \text{maxroot}(\mu[v_1v_1^*,\ldots,v_{n-2}v_{n-2}^*,\mathbf{E}v_{n-1}v_{n-1}^*,\mathbf{E}v_nv_n^*]). \end{align*}

By iterating this argument, we show that

    \[\text{maxroot}(\mu[w_1w_1^*,\ldots,w_{n}w_{n}^*]) \le \text{maxroot}(\mu[\mathbf{E}v_1v_1^*,\ldots,\mathbf{E}v_nv_n^*]).\]

But note that \mu[w_1w_1^*,\ldots,w_{n}w_{n}^*]=p_{\sum_{i=1}^nw_iw_i^*}, which is a single realization of p_A; whereas \mu[\mathbf{E}v_1v_1^*,\ldots,\mathbf{E}v_nv_n^*]=\mathbf{E}\mu[v_1v_1^*,\ldots,v_nv_n^*]=\mathbf{E}p_A by multilinearity of the mixed characteristic polynomial and as the random vectors v_i are independent. We have therefore completed the proof. \square
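
To make the conclusion concrete, here is a small numerical sketch for a hypothetical finite-support model: two independent vectors in \mathbb{R}^2, each uniform on two arbitrary values. It lists the maximal roots of the characteristic polynomials of all four realizations of A next to \text{maxroot}(\mathbf{E}p_A); at least one realization falls below the latter, as the proposition guarantees:

```python
import itertools
import numpy as np

V1 = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]      # support of v1
V2 = [np.array([1.0, 1.0]), np.array([1.0, -2.0])]     # support of v2
maxroot = lambda c: max(np.roots(c).real)

polys = [np.poly(np.outer(w1, w1) + np.outer(w2, w2))  # p_A for each outcome
         for w1, w2 in itertools.product(V1, V2)]
Ep = np.mean(polys, axis=0)                            # E p_A (coefficientwise)

print([round(maxroot(c), 3) for c in polys])           # maxroots of realizations
print(round(maxroot(Ep), 3))                           # maxroot(E p_A)
# the minimum of the first list is <= the second number
```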

We have now completed the first step in our program. In the following lectures, we will concern ourselves with the second step: the control of \mathrm{maxroot}(\mathbf{E}p_A).

Remark. It is only fair to note that, even though the result we are after is a statement about random matrices, essentially no probability appears in the proof: the proof only finds one single realization of the random variable p_A whose maximal root is controlled as desired, and the statement that this happens with positive probability is largely cosmetic (this is a finite probability space, so any realization occurs with positive probability). This highlights, in some sense, the limitations of the probabilistic method in the present context: as the event we are after has very small probability, the usual methods of probabilistic reasoning are not efficient at finding such an event.

Lectures by Philippe Rigollet | Notes by Junwei Lu and Ramon van Handel

05. October 2014 by Ramon van Handel
