Lecture 4. Proof of Kadison-Singer (2)

Recall that we are in the middle of proving the following result.

Theorem. Let v_1,\ldots,v_n be independent random vectors in \mathbb{C}^d whose distribution has finite support. Suppose that

    \[\mathbf{E}\sum_{i=1}^nv_iv_i^* = I, \qquad \mathbf{E}\|v_i\|^2 \le \varepsilon\mbox{ for all }i.\]

Then

    \[\left\|\sum_{i=1}^nv_iv_i^*\right\| \le (1+\sqrt{\varepsilon})^2\]

with positive probability.

Define the random matrix A and its characteristic polynomial p_A as

    \[A = \sum_{i=1}^n A_i,\qquad A_i= v_iv_i^*,\qquad p_A(z) = \det(zI-A).\]

As A is positive semidefinite, one representation of the matrix norm is \|A\|=\mathrm{maxroot}(p_A). The proof of the above theorem consists of two main parts:

  1. Show that \mathrm{maxroot}(p_A)\le \mathrm{maxroot}(\mathbf{E}p_A) with positive probability.
  2. Show that \mathrm{maxroot}(\mathbf{E}p_A)\le (1+\sqrt{\varepsilon})^2.

The first statement was proved in the previous two lectures. The goal of this lecture and the next is to prove the second statement.

Mixed characteristic polynomials

The key tool used in the previous lectures is the mixed characteristic polynomial of the matrices A_1,\ldots,A_n, which we defined as follows:

    \[\mu[A_1, \ldots, A_n](z) := \prod_{i=1}^n \bigg(1- \frac{\partial}{\partial t_i}\bigg) \det\Bigg( zI + \sum_{i=1}^n t_i A_i\Bigg)\Bigg\vert_{t_1,\ldots,t_n = 0}.\]

We recall the two crucial observations used in the previous lectures:

  1. \mu[A_1,\ldots,A_n] is multilinear in A_1,\ldots,A_n.
  2. If A_1,\ldots,A_n have rank one (this is essential!), then p_{\sum_{i=1}^nA_i}(z)=\mu[A_1,\ldots,A_n](z).
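The second observation can be checked symbolically in a small case. The following sketch (using sympy, with two rank-one matrices in dimension two chosen purely for illustration) verifies that the mixed characteristic polynomial coincides with the characteristic polynomial of the sum:

```python
import sympy as sp

z, t1, t2 = sp.symbols('z t1 t2')

# two rank-one matrices A_i = v_i v_i^* in dimension d = 2
v1 = sp.Matrix([1, 1]); v2 = sp.Matrix([1, -1])
A1 = v1 * v1.T; A2 = v2 * v2.T

# det(zI + t1 A1 + t2 A2)
D = sp.det(z * sp.eye(2) + t1 * A1 + t2 * A2)

# apply (1 - d/dt1)(1 - d/dt2) and set t1 = t2 = 0
mu = D - sp.diff(D, t1) - sp.diff(D, t2) + sp.diff(D, t1, t2)
mu = sp.expand(mu.subs({t1: 0, t2: 0}))

# characteristic polynomial of A1 + A2
charpoly = sp.expand(sp.det(z * sp.eye(2) - (A1 + A2)))

print(mu, charpoly)  # both equal z**2 - 4*z + 4, i.e. (z - 2)**2
```

Here A1 + A2 = 2I, so both polynomials reduce to (z-2)^2; with vectors of different norms the agreement persists, as the identity only requires each A_i to have rank one.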

As in our case A_1,\ldots,A_n are independent random matrices of rank one, this implies

    \[\mathbf{E}p_A = \mu[\mathbf{E}A_1,\ldots,\mathbf{E}A_n] =     \mu[B_1,\ldots,B_n],\]

where we define for simplicity B_i:=\mathbf{E}A_i.

Note that the matrices B_i are no longer of rank one (in particular, it is not true that \mathbf{E}p_A is the characteristic polynomial of \sum_{i=1}^nB_i: that would make the remainder of the proof quite trivial!) However, by assumption, the matrices B_i satisfy

    \[\sum_{i=1}^nB_i=I,\qquad     B_i\succeq 0 \quad\mbox{and}\quad\mathrm{Tr}[B_i]\le\varepsilon \quad\mbox{for all }i.\]

To complete the proof, our aim will be to show that for any matrices B_1,\ldots,B_n with these properties, the maximal root of \mu[B_1,\ldots,B_n] can be at most (1+\sqrt{\varepsilon})^2.

Theorem. Let B_1,\ldots,B_n\succeq 0 be positive semidefinite matrices in \mathbb{C}^{d\times d}. Suppose that

    \[\sum_{i=1}^nB_i=I,\qquad     \mathrm{Tr}[B_i]\le\varepsilon\mbox{ for all }i.\]

Then

    \[\mathrm{maxroot}(\mu[B_1,\ldots,B_n]) \le (1+\sqrt{\varepsilon})^2.\]
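As a numerical sanity check (not part of the proof), one can generate a random decomposition of the identity and verify the claimed bound on the maximal root; the random construction and seed below are purely illustrative:

```python
import numpy as np
import sympy as sp

rng = np.random.default_rng(0)
d, n = 2, 3

# random PSD matrices, normalized so that the B_i sum to the identity
C = [m @ m.T for m in (rng.standard_normal((d, d)) for _ in range(n))]
S = sum(C)
w, V = np.linalg.eigh(S)
Sih = V @ np.diag(w ** -0.5) @ V.T          # S^{-1/2}
B = [Sih @ Ci @ Sih for Ci in C]            # sum(B) = I

eps = max(np.trace(Bi) for Bi in B)

# mixed characteristic polynomial mu[B_1,...,B_n](z)
z = sp.symbols('z')
t = sp.symbols('t0:3')
D = sp.det(z * sp.eye(d)
           + sum((ti * sp.Matrix(Bi) for ti, Bi in zip(t, B)), sp.zeros(d)))
for ti in t:                                # apply (1 - d/dt_i) for each i
    D = D - sp.diff(D, ti)
D = sp.expand(D.subs({ti: 0 for ti in t}))

maxroot = max(complex(r).real for r in sp.nroots(sp.Poly(D, z)))
print(maxroot, (1 + eps ** 0.5) ** 2)       # maxroot stays below the bound
```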

As the matrices B_i sum to the identity \sum_{i=1}^nB_i=I, it will be convenient to rewrite the definition of the mixed characteristic polynomial slightly using this property:

    \begin{align*} \mu[B_1, \ldots, B_n](z) &= \prod_{i=1}^n \bigg(1- \frac{\partial}{\partial t_i}\bigg) \det\Bigg( zI + \sum_{i=1}^n t_i B_i\Bigg)\Bigg\vert_{t_1,\ldots,t_n = 0} \\ &= \prod_{i=1}^n \bigg(1- \frac{\partial}{\partial t_i}\bigg) \det\Bigg(\sum_{i=1}^n (z+t_i) B_i\Bigg)\Bigg\vert_{t_1,\ldots,t_n = 0} \\ &= \prod_{i=1}^n \bigg(1- \frac{\partial}{\partial z_i}\bigg) \det\Bigg(\sum_{i=1}^n z_i B_i\Bigg)\Bigg\vert_{z_1,\ldots,z_n = z} \\ &=: \prod_{i=1}^n \bigg(1- \frac{\partial}{\partial z_i}\bigg) p(z_1,\ldots,z_n)\bigg\vert_{z_1,\ldots,z_n = z}, \end{align*}

where we defined the multivariate polynomial

    \[p(z_1,\ldots,z_n):=\det\Bigg(\sum_{i=1}^n z_i B_i\Bigg).\]

Our aim is to show that \prod_{i=1}^n(1-\frac{\partial}{\partial z_i})p(z_1,\ldots,z_n)\ne 0 whenever z_1=\cdots=z_n=z>(1+\sqrt{\varepsilon})^2.

How should we go about proving such a property? The first observation we should make is that the polynomial p itself has no roots in an even larger region.

Lemma. p(z_1,\ldots,z_n)\ne 0 whenever z_1,\ldots,z_n>0.

Proof. Note that

    \[\sum_{i=1}^n z_iB_i - \big(\min_iz_i\big)I = \sum_{i=1}^n \big(z_i-\min_iz_i\big)B_i\succeq 0,\]

as z_i-\min_iz_i\ge 0 and B_i\succeq 0. Thus \sum_{i=1}^n z_iB_i is nonsingular whenever \min_iz_i>0. \square
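The eigenvalue bound behind this lemma, \lambda_{\min}(\sum_i z_iB_i)\ge\min_iz_i, can be illustrated numerically; the random decomposition of the identity below is only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 5

# random PSD matrices B_i with sum_i B_i = I
C = [m @ m.T for m in (rng.standard_normal((d, d)) for _ in range(n))]
S = sum(C)
w, V = np.linalg.eigh(S)
Sih = V @ np.diag(w ** -0.5) @ V.T          # S^{-1/2}
B = [Sih @ Ci @ Sih for Ci in C]

zs = rng.uniform(0.5, 3.0, size=n)          # arbitrary positive coordinates z_i
A = sum(zi * Bi for zi, Bi in zip(zs, B))

lam_min = np.linalg.eigvalsh(A).min()
print(lam_min, zs.min())  # lam_min >= min_i z_i, so A is nonsingular
```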

One might now hope that we can follow a similar idea to what we did in the previous lectures. In the previous lectures, we wanted to show that the mixed characteristic polynomial is real-rooted. To this end, we first showed that the determinant in the definition of the mixed characteristic polynomial is real stable (the appropriate multivariate generalization of real-rootedness), and then we showed that this property is preserved by operators of the form 1-\frac{\partial}{\partial z_i}.

In an ideal world, we would show precisely the same thing here: if the polynomial p has no roots in the region \{z_1,\ldots,z_n>0\}, then perhaps (1-\frac{\partial}{\partial z_i})p has no roots there either, and so on. If this were true, we would in fact have shown that \mu[B_1,\ldots,B_n](z) has no roots in the region z>0. Unfortunately, this is clearly false: it would imply an upper bound of zero for the matrix norm in our original problem, which is impossible in general. Nonetheless, this general strategy proves to be the right approach. While it is not true that the root-free region is preserved by the operation 1-\frac{\partial}{\partial z_i}, it turns out that we will be able to control by how much this region shrinks under successive applications of 1-\frac{\partial}{\partial z_i}. The control that we will develop will prove to be sufficiently sharp to obtain the desired result.

Evidently, the key problem that we face is to understand what happens to the roots of a polynomial when we apply an operation of the form 1-\frac{\partial}{\partial z_i}. In the remainder of this lecture, we will investigate this problem in a univariate toy setting. This univariate approach will not be sufficiently sharp to yield the desired result, but will give us significant insight into the ingredients that are needed in the proof. In the following lecture, we will complete the proof by developing a multivariate version of these ideas.

Barrier functions: a toy problem

Let us put aside for the moment our original problem, and consider the following simplified setting. Let q(z) be a univariate polynomial with real roots x_1\ge x_2\ge\cdots\ge x_d. What can we say about the locations of the roots of the polynomial (1-\frac{\partial}{\partial z})q(z)? Clearly, the latter polynomial has a root at z if and only if

    \[\bigg(1-\frac{\partial}{\partial z}\bigg)q(z) = 0 \qquad\Longleftrightarrow\qquad q(z) = \frac{\partial q(z)}{\partial z} \qquad\Longleftrightarrow\qquad \frac{\partial}{\partial z}\log q(z)=1.\]

The function \frac{\partial}{\partial z}\log q(z) is very interesting; let us investigate what it looks like. Note that we can always represent a polynomial in terms of its roots as

    \[q(z) = c\prod_{i=1}^d (z-x_i)\]

for some constant c. We can therefore readily compute

    \[\frac{\partial}{\partial z}\log q(z) = \sum_{i=1}^d \frac{1}{z-x_i}.\]

As we assumed that the roots x_i of q are real, the function \frac{\partial}{\partial z}\log q(z) looks something like this:

[Figure: graph of the barrier function \frac{\partial}{\partial z}\log q(z), with vertical asymptotes at the roots x_d\le\cdots\le x_1 of q; the horizontal line at height 1 intersects the graph at the roots of (1-\frac{\partial}{\partial z})q.]

The function \frac{\partial}{\partial z}\log q(z) blows up at the roots x_1\ge x_2\ge\cdots\ge x_d of q; this function is therefore referred to as the barrier function, in analogy with a similar notion in optimization. The values where it is equal to one determine the locations of the roots y_1\ge y_2\ge\cdots\ge y_d of (1-\frac{\partial}{\partial z})q(z). It follows immediately from the shape of the barrier function that the roots of our two polynomials are interlaced, as can be seen by inspecting the above figure. Note, moreover, that y_1>x_1, so that it is unfortunately the case that the maximal root of a polynomial increases under the operation 1-\frac{\partial}{\partial z}. However, we can control the location of the maximal root if we are able to control the barrier function.
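The interlacing picture is easy to reproduce numerically. The sketch below (the cubic q(z)=z(z-1)(z-2) is an arbitrary illustrative choice) computes the roots of q and of (1-\frac{d}{dz})q and exhibits y_1>x_1 together with the interlacing x_i<y_i<x_{i-1}:

```python
import numpy as np

# q(z) = z (z - 1)(z - 2) in coefficient form
q = np.array([1.0, -3.0, 2.0, 0.0])
qm = np.polysub(q, np.polyder(q))      # (1 - d/dz) q = q - q'

x = np.sort(np.roots(q).real)[::-1]    # roots of q:              2 >= 1 >= 0
y = np.sort(np.roots(qm).real)[::-1]   # roots of (1 - d/dz) q

print(x, y)
# y_1 > x_1, and the roots interlace: x_i < y_i < x_{i-1}
```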

Let us now return to our matrix problem and try to see how this idea can be useful. We will consider the first step towards establishing the desired result: what happens to the locations of the roots of the polynomial p(z_1,\ldots,z_n) when we apply a single operation 1-\frac{\partial}{\partial z_1}? (We will ultimately want to iterate such an argument for every coordinate to control the roots of the mixed characteristic polynomial.)

To this end, let us fix t>0 for the time being, and define

    \[q(z) = p(z,t,t,\ldots,t) = \det\Bigg(zB_1+t\sum_{i=2}^n B_i\Bigg).\]

We have already shown that \mathrm{maxroot}(q)\le 0: by the lemma above, q(z)\ne 0 whenever z>0. In order to control the roots of (1-\frac{\partial}{\partial z_1})p(z_1,\ldots,z_n), we need to control the barrier function \frac{\partial}{\partial z}\log q(z). The stroke of luck that we have at this point is that the derivative of the logarithm of a determinant is a remarkably nice object.

Lemma (Jacobi formula). If A+tB is invertible, then \frac{d}{dt}\log\det(A+tB) = \mathrm{Tr}((A+tB)^{-1}B).

Proof. First, we note that

    \begin{align*} \frac{d}{dt}\log\det(A+tB) &= \frac{d}{d\varepsilon}\log\det(A+tB+\varepsilon B)\bigg|_{\varepsilon=0} \\ &= \frac{d}{d\varepsilon}\log(\det(A+tB)\det(I+\varepsilon(A+tB)^{-1} B))\bigg|_{\varepsilon=0} \\ & = \frac{d}{d\varepsilon}\log\det(I+\varepsilon(A+tB)^{-1} B)\bigg|_{\varepsilon=0}. \end{align*}

It therefore suffices to prove that

    \[\frac{d}{d\varepsilon}\log\det(I+\varepsilon M)\bigg|_{\varepsilon=0}=\mathrm{Tr}[M].\]

To this end, we use directly the definition of the determinant:

    \[\frac{d}{d\varepsilon}\det(I+\varepsilon M) = \sum_{\sigma} (-1)^{|\sigma|} \sum_{j=1}^d M_{j\sigma(j)} \prod_{i\ne j}(\delta_{i\sigma(i)}+\varepsilon M_{i\sigma(i)}),\]

where the sum is over permutations \sigma. Setting \varepsilon=0, the only terms that survive are those with \sigma(i)=i for all i\ne j; but a permutation that fixes every i\ne j must also fix j, so only the identity permutation contributes. We therefore obtain (as \det(I)=1)

    \[\frac{d}{d\varepsilon}\log\det(I+\varepsilon M)\bigg|_{\varepsilon=0}= \frac{1}{\det(I+\varepsilon M)}\frac{d}{d\varepsilon} \det(I+\varepsilon M)\bigg|_{\varepsilon=0}= \sum_{j=1}^d M_{jj} = \mathrm{Tr}[M].\]

This completes the proof. \square
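The Jacobi formula is easy to test against a finite-difference approximation; the random positive definite A and symmetric B below are chosen only so that A+tB stays invertible:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
G = rng.standard_normal((d, d))
A = G @ G.T + 10 * np.eye(d)            # positive definite, so A + tB is invertible for moderate t
Bm = rng.standard_normal((d, d)); Bm = Bm + Bm.T
t, h = 0.3, 1e-6

logdet = lambda M: np.linalg.slogdet(M)[1]

# central difference of t -> log det(A + tB) versus Tr((A + tB)^{-1} B)
fd = (logdet(A + (t + h) * Bm) - logdet(A + (t - h) * Bm)) / (2 * h)
jacobi = np.trace(np.linalg.solve(A + t * Bm, Bm))
print(fd, jacobi)                       # the two values agree
```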

Using the Jacobi formula, we immediately find

    \[\frac{\partial}{\partial z}\log q(z)\bigg|_{z=t} = \mathrm{Tr}[({\textstyle t\sum_{i=1}^nB_i})^{-1}B_1] = \mathrm{Tr}[t^{-1}B_1]\le \frac{\varepsilon}{t}.\]

It therefore follows that

    \[\bigg(1-\frac{\partial}{\partial z_1}\bigg)p(z_1,\ldots,z_n)\bigg|_{z_1,\ldots,z_n=t}\ne 0 \quad\mbox{for all}\quad t>\varepsilon.\]

This brings us one step closer to the desired conclusion!
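The computation above can also be confirmed numerically: for any decomposition of the identity, the barrier function of q at z=t equals \mathrm{Tr}[B_1]/t exactly. The random construction and seed below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 4

# random PSD matrices B_i with sum_i B_i = I
C = [m @ m.T for m in (rng.standard_normal((d, d)) for _ in range(n))]
S = sum(C)
w, V = np.linalg.eigh(S)
Sih = V @ np.diag(w ** -0.5) @ V.T      # S^{-1/2}
B = [Sih @ Ci @ Sih for Ci in C]

t, h = 1.5, 1e-6
# q(z) = det(z B_1 + t (B_2 + ... + B_n))
logq = lambda z: np.linalg.slogdet(z * B[0] + t * sum(B[1:]))[1]

barrier = (logq(t + h) - logq(t - h)) / (2 * h)   # d/dz log q(z) at z = t
print(barrier, np.trace(B[0]) / t)                # both equal Tr[B_1] / t
```

In particular, whenever t exceeds \mathrm{Tr}[B_1], the barrier value at z=t is below one, which is exactly the nonvanishing statement above.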

We have seen above that we can control the roots of p(z_1,\ldots,z_n) and (1-\frac{\partial}{\partial z_1})p(z_1,\ldots,z_n). Of course, the next step will be to control the roots of (1-\frac{\partial}{\partial z_2})(1-\frac{\partial}{\partial z_1})p(z_1,\ldots,z_n). Unfortunately, a direct application of the barrier function method does not lend itself well to iteration. The problem is that while we could easily control the barrier function of p(z_1,\ldots,z_n) using the Jacobi formula, it is not so obvious how to control the barrier function of (1-\frac{\partial}{\partial z_1})p(z_1,\ldots,z_n).

Instead, we are going to develop in the following lecture a multivariate version of the barrier argument. In each consecutive application of 1-\frac{\partial}{\partial z_i}, we will control the region of \mathbb{R}^n in which the polynomial has no roots; at the same time, we will also obtain control over the barrier function of the polynomial in the next stage. When implemented properly, this procedure will allow us to iterate the barrier argument without requiring explicit control of the barrier function except at the first stage.

Many thanks to Mark Cerenzia for scribing this lecture!

19. October 2014 by Ramon van Handel