Lecture 5. Proof of Kadison-Singer (3)

This is the last installment of the proof of the Kadison-Singer theorem. After several plot twists, we have finally arrived at the following formulation of the problem. If you do not recall how we got here, this might be a good time to go back and read the previous posts.

Theorem 1. Let B_1,\ldots,B_n\succeq 0 be positive semidefinite matrices in \mathbb{C}^{d\times d} such that

    \[\sum_{i=1}^nB_i=I,\qquad \mathrm{Tr}[B_i]\le\varepsilon\text{ for all }i.\]

Define the multivariate polynomial

    \[p(z_1,\ldots,z_n)=\det\bigg(\sum_{i=1}^nz_iB_i\bigg).\]

Then

    \[\prod_{i=1}^n\bigg(1-\frac{\partial}{\partial z_i}\bigg)p(z_1,\ldots,z_n)\ne 0\quad\text{whenever }z_1,\ldots,z_n>(1+\sqrt{\varepsilon})^2.\]

The entire goal of this lecture is to prove the above theorem.
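Before diving into the proof, it is instructive to see Theorem 1 in action in a tiny example (an illustrative sketch only, assuming sympy is available). We take n=4 rank-one matrices B_i=v_iv_i^T in dimension d=2 with \sum_iB_i=I and \mathrm{Tr}[B_i]=1/2, so \varepsilon=1/2 and the theorem forbids roots above (1+\sqrt{1/2})^2\approx 2.914.

```python
# Sanity check of Theorem 1 for n = 4 rank-one matrices in dimension d = 2.
# Here B_i = v_i v_i^T with sum_i B_i = I and Tr[B_i] = 1/2, so eps = 1/2.
import sympy as sp

z = sp.symbols('z1:5')
s = sp.sqrt(2)
vs = [sp.Matrix(v) for v in ([1/s, 0], [0, 1/s],
                             [sp.Rational(1, 2), sp.Rational(1, 2)],
                             [-sp.Rational(1, 2), sp.Rational(1, 2)])]
assert sum((v * v.T for v in vs), sp.zeros(2, 2)) == sp.eye(2)

# p(z) = det(sum_i z_i B_i), then apply the operators (1 - d/dz_i).
p = sp.det(sum((zi * v * v.T for zi, v in zip(z, vs)), sp.zeros(2, 2)))
q = p
for zi in z:
    q = sp.expand(q - sp.diff(q, zi))

# Above the bound (all z_i = 3 > 2.914...) the polynomial is nonzero:
assert q.subs(dict(zip(z, [3, 3, 3, 3]))) == 4
# ...while at (1,1,1,1), well below the bound, it happens to vanish,
# so some upper bound on the root-free region is genuinely needed:
assert q.subs(dict(zip(z, [1, 1, 1, 1]))) == 0
```
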

Let us recall that we already proved such a result without the derivatives 1-\frac{\partial}{\partial z_i} (this is almost trivial).

Lemma 1. p(z_1,\ldots,z_n)\ne 0 whenever z_1,\ldots,z_n>0.

The key challenge is to understand what happens to the region without roots under operations of the form 1-\frac{\partial}{\partial z_i}. To this end, we introduced in the previous lecture the barrier function

    \[\frac{\partial}{\partial z_i}\log p(z_1,\ldots,z_n).\]

It is easily seen that the roots of (1-\frac{\partial}{\partial z_i})p(z) coincide with the points where the barrier function equals one. As the barrier function is easily bounded in our setting using the Jacobi formula for determinants, we can immediately bound the roots of (1-\frac{\partial}{\partial z_i})p(z).
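The Jacobi formula bound is easy to verify numerically (an illustrative numpy sketch, not part of the argument): \frac{\partial}{\partial z_i}\log\det(\sum_jz_jB_j)=\mathrm{Tr}[(\sum_jz_jB_j)^{-1}B_i], which at z_1=\cdots=z_n=t reduces to \mathrm{Tr}[B_i]/t\le\varepsilon/t.

```python
# Check the Jacobi formula behind the barrier bound: at z = (t,...,t),
# d/dz_i log det(sum_j z_j B_j) = Tr[B_i]/t, since sum_j B_j = I.
import numpy as np

rng = np.random.default_rng(0)
d, n, t = 3, 8, 2.0

# Random rank-one decomposition of the identity: B_i = v_i v_i^T, sum_i B_i = I.
W = rng.standard_normal((n, d))
L = np.linalg.cholesky(W.T @ W)
V = W @ np.linalg.inv(L).T
Bs = [np.outer(v, v) for v in V]
assert np.allclose(sum(Bs), np.eye(d))

def logp(zs):
    """log det(sum_i z_i B_i)."""
    M = sum(zi * Bi for zi, Bi in zip(zs, Bs))
    return np.linalg.slogdet(M)[1]

z0 = np.full(n, t)
h = 1e-5
for i in range(n):
    e = np.zeros(n); e[i] = h
    barrier_i = (logp(z0 + e) - logp(z0 - e)) / (2 * h)  # central difference
    assert abs(barrier_i - np.trace(Bs[i]) / t) < 1e-6
```
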

Unfortunately, this simple argument is only sufficient to apply the operation 1-\frac{\partial}{\partial z_i} once, while we must apply n such operations. The difficulty is that while it is easy to control the barrier function of the polynomial p(z), we do not know how to control directly the barrier function of the polynomial (1-\frac{\partial}{\partial z_i})p(z). To solve this problem, we must develop a more sophisticated version of the barrier argument that provides control not only of the roots of the derivative polynomial, but also of its barrier function: once we accomplish this, we will be able to iterate this argument to complete the proof.

The multivariate barrier argument

In the multivariate barrier argument, we will keep track of an octant in which the multivariate polynomial of interest has no roots. We will use the following terminology.

Definition. A point x=(x_1,\ldots,x_n)\in\mathbb{R}^n is said to bound the roots of the multivariate polynomial q(z_1,\ldots,z_n) if q(z_1,\ldots,z_n)\ne 0 whenever z_1>x_1, z_2>x_2, \ldots, z_n>x_n.

This notion is illustrated in the following figure:

[Figure: the point x and the root-free octant \{z : z_1>x_1,\ldots,z_n>x_n\} above it.]

We can now formulate the multivariate barrier argument that is at the heart of the proof.

Theorem 2. Let q(z_1,\ldots,z_n) be a real stable polynomial. Suppose that x bounds the roots of q, and

    \[\frac{\partial}{\partial x_j}\log q(x_1,\ldots,x_n)\le 1-\frac{1}{\delta}\]

for some \delta>0 and j\in[n]. Then x bounds the roots of (1-\frac{\partial}{\partial z_j})q(z_1,\ldots,z_n), and

    \[\frac{\partial}{\partial x_i}\log\bigg(1-\frac{\partial}{\partial x_j}\bigg)q(x_1,\ldots,x_{j-1},x_j+\delta,x_{j+1},\ldots,x_n) \le \frac{\partial}{\partial x_i}\log q(x_1,\ldots,x_n)\]

for all i\in[n].

Note that the barrier function assumption

    \[\frac{\partial}{\partial x_j}\log q(x_1,\ldots,x_n)<1\]

would already be enough to ensure that x bounds the roots of (1-\frac{\partial}{\partial z_j})q(z_1,\ldots,z_n); this is essentially what we proved in the last lecture (combined with a basic monotonicity property of the barrier function that we will prove below). The key innovation in Theorem 2 is that we not only bound the roots of (1-\frac{\partial}{\partial z_j})q(z_1,\ldots,z_n), but also control its barrier function \frac{\partial}{\partial z_i}\log(1-\frac{\partial}{\partial z_j})q(z_1,\ldots,z_n). This allows us to iterate this theorem over and over again to add more derivatives to the polynomial. To engineer this property, we must build some extra room (a gap of 1/\delta) into our bound on the barrier function of q. Once we understand the proof, we will see that this idea arises very naturally.

Up to the proof of Theorem 2, we now have everything we need to prove Theorem 1 (and therefore the Kadison-Singer theorem). Let us complete this proof first, so that we can concentrate for the remainder of this lecture on proving Theorem 2.

Proof of Theorem 1. In the previous lecture, we used the Jacobi formula to show that

    \[\frac{\partial}{\partial z_i}\log p(z_1,\ldots,z_n)\bigg|_{z_1,\ldots,z_n=t} \le \frac{\varepsilon}{t}\]

for all i. To start the barrier argument, let us therefore choose x_1,\ldots,x_n=t>0 and \delta>0 such that

    \[\frac{\varepsilon}{t} \le 1-\frac{1}{\delta}.\]

Initially, by Lemma 1, we see that x bounds the roots of p and that the barrier function satisfies the assumption of Theorem 2. That is, we start in the following situation:

[Figure: the point x=(t,\ldots,t) bounds the roots of p, with barrier function at most 1-1/\delta.]

Applying Theorem 2, we find that x still bounds the roots of (1-\frac{\partial}{\partial z_1})p(z_1,\ldots,z_n). Moreover, we gain control over the barrier function of this derivative polynomial, albeit at a point that lies above x:

    \[\frac{\partial}{\partial x_i}\log\bigg(1-\frac{\partial}{\partial x_1}\bigg)p(x_1+\delta,x_2,\ldots,x_n) \le \frac{\partial}{\partial x_i}\log p(x_1,\ldots,x_n) \le 1-\frac{1}{\delta}\]

for all i. This is illustrated in the following figure:

[Figure: x still bounds the roots of (1-\frac{\partial}{\partial z_1})p; its barrier function is controlled at the shifted point (x_1+\delta,x_2,\ldots,x_n).]

We now have everything we need to apply Theorem 2 again to the polynomial \big(1-\frac{\partial}{\partial z_1}\big) p(z) (recall that p and its derivative polynomials are all real stable, as we proved early on in these lectures). In the next step, we control the barrier in the z_2-direction to obtain, in a picture:

[Figure: after the second step, the barrier function of (1-\frac{\partial}{\partial z_2})(1-\frac{\partial}{\partial z_1})p is controlled at (x_1+\delta,x_2+\delta,x_3,\ldots,x_n).]

We can now repeat this process in the z_3-direction, etc. After n iterations, we have evidently proved that

    \[\prod_{i=1}^n\bigg(1-\frac{\partial}{\partial z_i}\bigg)p(z_1,\ldots,z_n)\ne 0\quad\text{whenever }z_1,\ldots,z_n>t+\delta.\]

All that remains is to choose t>0 and \delta>0. To optimize the bound, we minimize t+\delta subject to the constraint \frac{\varepsilon}{t} \le 1-\frac{1}{\delta}. This is achieved by t=\sqrt{\varepsilon}+\varepsilon and \delta=1+\sqrt{\varepsilon}, which completes the proof. \square
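The optimization in the last step can be sanity-checked numerically (a sketch: at the optimum the constraint is active, so \delta=t/(t-\varepsilon) and we minimize t+t/(t-\varepsilon) over t>\varepsilon).

```python
# Minimize t + delta subject to eps/t = 1 - 1/delta, i.e. delta = t/(t - eps).
import numpy as np

eps = 0.25
ts = np.linspace(eps + 1e-3, 10.0, 200001)
f = ts + ts / (ts - eps)                     # t + delta along the constraint
assert abs(ts[np.argmin(f)] - (eps + np.sqrt(eps))) < 1e-2  # t* = eps + sqrt(eps)
assert abs(f.min() - (1 + np.sqrt(eps)) ** 2) < 1e-4        # min = (1 + sqrt(eps))^2
```

For \varepsilon=1/4 this gives t^*=3/4, \delta^*=3/2, and t^*+\delta^*=(3/2)^2=9/4, in agreement with the closed form.
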

Of course, it remains to prove Theorem 2. It turns out that this is quite easy, once we develop some (nontrivial!) properties of the barrier functions of real-stable polynomials.

Some properties of barrier functions

Throughout this section, let q(z_1,\ldots,z_n) be a real stable polynomial. As q is real stable, the univariate polynomial z_1\mapsto q(z_1,z_2,\ldots,z_n) is real-rooted, and we denote its roots by

    \[u_1(z_2,\ldots,z_n)\ge u_2(z_2,\ldots,z_n)\ge\cdots\ge u_d(z_2,\ldots,z_n).\]

We can then represent the polynomial q as

    \[q(z_1,\ldots,z_n) = c(z_2,\ldots,z_n)\prod_{i=1}^d(z_1-u_i(z_2,\ldots,z_n)),\]

and therefore the barrier function as

    \[\frac{\partial}{\partial z_1}\log q(z_1,\ldots,z_n) = \sum_{i=1}^d\frac{1}{z_1-u_i(z_2,\ldots,z_n)}.\]

Some simple properties of the barrier function follow easily from this expression.
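The partial-fraction form of the barrier function is itself easy to check numerically (an illustrative numpy sketch for a univariate real-rooted polynomial).

```python
# For real-rooted q(z) = prod_i (z - u_i), the barrier d/dz log q = q'/q
# equals the partial-fraction sum of 1/(z - u_i).
import numpy as np

u = np.array([-2.0, 0.5, 1.0, 3.0])        # real roots u_i
coeffs = np.poly(u)                         # coefficients of q
z0 = 5.0                                    # a point above all roots
barrier = np.polyval(np.polyder(coeffs), z0) / np.polyval(coeffs, z0)
assert abs(barrier - np.sum(1.0 / (z0 - u))) < 1e-10
```
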

Lemma 2. Suppose that z\in\mathbb{R}^n bounds the roots of q. Then the barrier function t\mapsto\frac{\partial}{\partial z_1}\log q(z_1+t,z_2,\ldots,z_n) is positive, decreasing and convex for t\ge 0.

Proof. As z bounds the roots of q, we have z_1>u_i(z_2,\ldots,z_n) for all i. Thus clearly the barrier function is positive. To show that it is decreasing, we note that

    \[\frac{\partial}{\partial z_1}\frac{\partial}{\partial z_1}\log q(z_1,z_2,\ldots,z_n) = -\sum_{i=1}^d\frac{1}{(z_1-u_i(z_2,\ldots,z_n))^2}<0.\]

Likewise, convexity follows as

    \[\frac{\partial^2}{\partial z_1^2}\frac{\partial}{\partial z_1}\log q(z_1,z_2,\ldots,z_n) = \sum_{i=1}^d\frac{2}{(z_1-u_i(z_2,\ldots,z_n))^3}>0\]

when z bounds the roots of q. \square
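Lemma 2 is also easy to see numerically: sampling the barrier t\mapsto\sum_i 1/(z_1+t-u_i) above the roots, the values are positive and decreasing, and the second differences are positive (a numpy sketch with arbitrary sample roots).

```python
# Positivity, monotonicity and convexity of t -> sum_i 1/(z1 + t - u_i).
import numpy as np

u = np.array([-1.0, 0.0, 2.0])             # roots, all below z1
z1 = 2.5                                    # z1 > max(u)
ts = np.linspace(0.0, 5.0, 1000)
phi = np.array([np.sum(1.0 / (z1 + t - u)) for t in ts])
assert np.all(phi > 0)                      # positive
assert np.all(np.diff(phi) < 0)             # decreasing
assert np.all(np.diff(phi, n=2) > 0)        # convex
```
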

The main property that we need in order to prove Theorem 2 is that these monotonicity and convexity properties of the barrier function also hold when we vary other coordinates. This seems innocuous, but is actually much harder to prove (and requires a clever idea!).

Lemma 3. Suppose that z\in\mathbb{R}^n bounds the roots of q. Then the barrier function t\mapsto\frac{\partial}{\partial z_1}\log q(z_1,z_2+t,z_3,\ldots,z_n) is positive, decreasing and convex for t\ge 0.

Remark. There is of course nothing special about the use (for notational simplicity) of the first two coordinates z_1,z_2 in Lemmas 2 and 3. We can identically consider the barrier function in any direction z_i and obtain monotonicity and convexity in any other direction z_j, as will be needed in the proof of Theorem 2. In fact, as the remaining coordinates z_3,\ldots,z_n are frozen, Lemmas 2 and 3 are really just statements about the properties of bivariate real stable polynomials.

The difficulty in proving Lemma 3 is that while \frac{\partial}{\partial z_1}\log q(z) behaves very nicely as a function of z_1, it is much less clear how it behaves as a function of z_2: to understand this, we must understand how the roots u_i(z_2,\ldots,z_n) vary as a function of z_2. In general, this might seem like a hopeless task. Surprisingly, however, the roots of real stable polynomials exhibit some remarkable behavior.

Lemma 4. Let q(x,y) be a bivariate real-stable polynomial, and let x_1(y)\ge x_2(y)\ge\cdots\ge x_d(y) be the roots of x\mapsto q(x,y). Then each y\mapsto x_i(y) is a decreasing function.
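Lemma 4 can be watched in action numerically. A standard family of real stable polynomials is q(x,y)=\det(xI+yB+C) with B positive semidefinite and C symmetric; the roots of x\mapsto q(x,y) are the eigenvalues of -(yB+C), and by Weyl's inequality they are nonincreasing in y (an illustrative numpy sketch).

```python
# Roots of x -> det(xI + yB + C) are the eigenvalues of -(yB + C);
# by Weyl's inequality they are nonincreasing in y when B is PSD.
import numpy as np

rng = np.random.default_rng(1)
d = 4
G = rng.standard_normal((d, d))
B = G @ G.T                                 # positive semidefinite
C = rng.standard_normal((d, d))
C = (C + C.T) / 2                           # symmetric

def roots(y):
    return np.sort(np.linalg.eigvalsh(-(y * B + C)))[::-1]

ys = np.linspace(0.0, 5.0, 50)
xs = np.array([roots(y) for y in ys])       # row k: sorted roots at ys[k]
assert np.all(np.diff(xs, axis=0) <= 1e-10)  # each x_i(y) is nonincreasing
```
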

This is enough to prove Lemma 3.

Proof of Lemma 3. That the barrier function is positive follows precisely as in the proof of Lemma 2 (using the fact that if z bounds the roots of q, then (z_1,z_2+t,z_3,\ldots,z_n) also bounds the roots of q).

It remains to establish monotonicity and convexity. There is no loss of generality in assuming that q is a bivariate polynomial in z_1,z_2 only, and that we can write

    \[q(z_1,z_2) = c(z_1)\prod_{i=1}^d (z_2-v_i(z_1))\]

for roots v_1(z_1)\ge v_2(z_1)\ge\cdots \ge v_d(z_1). We can therefore write

    \[\frac{\partial}{\partial z_2} \frac{\partial}{\partial z_1}\log q(z_1,z_2) = \frac{\partial}{\partial z_1}\sum_{i=1}^d \frac{1}{z_2-v_i(z_1)}, \qquad \frac{\partial^2}{\partial z_2^2} \frac{\partial}{\partial z_1}\log q(z_1,z_2) = -\frac{\partial}{\partial z_1}\sum_{i=1}^d \frac{1}{(z_2-v_i(z_1))^2}.\]

But as z bounds the roots of q, we have z_2>v_i(z_1) for all i. As z_1\mapsto v_i(z_1) is also decreasing, we have

    \[\frac{\partial}{\partial z_2} \frac{\partial}{\partial z_1}\log q(z_1,z_2) <0, \qquad \frac{\partial^2}{\partial z_2^2} \frac{\partial}{\partial z_1}\log q(z_1,z_2) > 0,\]

which is precisely what we set out to show. \square

We must still prove Lemma 4. This seems to be quite a miracle: why should such a property be true? To get some intuition, let us first consider an apparently very special case where q is a polynomial of degree one, that is, q(x,y)=ax+by+c. The root of the polynomial x\mapsto q(x,y) is clearly given by

    \[x_1(y) = -\frac{b}{a}y-\frac{c}{a}.\]

Suppose that y\mapsto x_1(y) is nondecreasing, that is, that b and a have opposite sign. Then q cannot be real stable! Indeed, for any real root (x,y), the point (x+i/a,y-i/b) is also a root, which violates real stability if a and b have opposite sign (as both coordinates have strictly positive or negative imaginary parts). Thus for polynomials of degree one, real stability trivially implies that y\mapsto x_1(y) is decreasing.
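This computation is worth checking explicitly (a small numeric sketch; the values of a, b, c below are arbitrary, chosen so that a and b have opposite sign).

```python
# For q(x, y) = a x + b y + c with a > 0 > b, perturbing a real root by
# (i/a, -i/b) keeps it a root: the imaginary contributions a*(i/a) and
# b*(-i/b) cancel, yet both coordinates acquire positive imaginary parts.
a, b, c = 2.0, -3.0, 1.0
x0 = 1.0
y0 = (-c - a * x0) / b                      # so that (x0, y0) is a real root
q = lambda x, y: a * x + b * y + c
assert abs(q(x0, y0)) < 1e-12
assert abs(q(x0 + 1j / a, y0 - 1j / b)) < 1e-12
assert (1j / a).imag > 0 and (-1j / b).imag > 0
```
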

While this is very intuitive, it also seems at first sight like it does not help much in understanding nontrivial polynomials of higher degree. Nonetheless, this simple observation proves to be the key to understanding the case of general polynomials. The idea is that the property that y\mapsto x_i(y) is decreasing is local: if there is a point at which this property is violated, then we can Taylor expand the polynomial around that point to reduce to the degree one case, and thereby obtain a contradiction.

Proof of Lemma 4. By the implicit function theorem, the maps y\mapsto x_i(y) are continuous and C^1 everywhere except at a finite number of points. Therefore, if the conclusion fails, then there must exist a root i and a (nondegenerate) point y^* such that

    \[\frac{dx_i(y)}{dy}\bigg|_{y=y^*}>0.\]

We will use this to bring ourselves to a contradiction.

Let us write x^*=x_i(y^*), so (x^*,y^*) is a root of q. We Taylor expand q around this point. Note that

    \[q(x,y) = c(y)\prod_{j=1}^d (x-x_j(y)),\]

so that

    \[\frac{\partial}{\partial x}q(x,y)\bigg|_{x^*,y^*} = c(y)\sum_{k=1}^d\prod_{j\ne k} (x-x_j(y))\bigg|_{x^*,y^*} = c(y^*)\prod_{j\ne i} (x^*-x_j(y^*)),\]

where we have used that x^*-x_i(y^*)=0. Similarly, we obtain

    \[\frac{\partial}{\partial y}q(x,y)\bigg|_{x^*,y^*} = -\frac{dx_i(y)}{dy}\bigg|_{y=y^*}c(y^*) \prod_{j\ne i} (x^*-x_j(y^*)).\]

As q(x^*,y^*)=0, we obtain the Taylor expansion

    \[q(x,y) = \alpha\bigg\{(x-x^*)-\beta(y-y^*)\bigg\} +o(|x-x^*|+|y-y^*|)\]

for a suitable constant \alpha\in\mathbb{R}, where \beta=\frac{dx_i(y)}{dy}\big|_{y=y^*}>0 by our choice of the point y^*.

We now conclude by a perturbation argument. Define the univariate polynomials

    \[\tilde q_\delta(x) := \delta^{-1} q(\delta x+x^*,\delta i+y^*).\]

Letting \delta\downarrow 0, we obtain \tilde q_0(x) = \alpha(x-i\beta), which evidently has a root \tilde x=i\beta with strictly positive imaginary part. By the continuity of roots of polynomials, \tilde q_\delta(x) must still have a root with strictly positive imaginary part when \delta>0 is sufficiently small (this follows readily using Rouché’s theorem). But this implies that q(x,y) has a root with \text{Im}(x)>0 and \text{Im}(y)>0, contradicting real stability. \square

Conclusion of the proof

All that remains is to finally prove Theorem 2. The hard work is behind us: with the monotonicity and convexity properties of Lemmas 2 and 3 in hand, the proof of Theorem 2 is straightforward.

Proof of Theorem 2. As the barrier function is coordinatewise decreasing, the assumption

    \[\frac{\partial}{\partial x_j}\log q(x_1,\ldots,x_n)< 1\]

implies that

    \[\frac{\partial}{\partial z_j}\log q(z_1,\ldots,z_n)< 1\quad\text{whenever }z_1>x_1,\ldots,z_n>x_n.\]

It follows immediately that x bounds the roots of \big(1-\frac{\partial}{\partial z_j}\big)q(z_1,\ldots,z_n).

Let us now control the barrier function of \big(1-\frac{\partial}{\partial z_j}\big)q(z_1,\ldots,z_n). As q is positive above the roots of q,

    \[\bigg(1-\frac{\partial}{\partial z_j}\bigg)q(z_1,\ldots,z_n) = q(z_1,\ldots,z_n)\bigg(1-\frac{\partial}{\partial z_j}\log q(z_1,\ldots,z_n)\bigg).\]

We can therefore write the barrier function as

    \[\frac{\partial}{\partial z_i}\log \bigg(1-\frac{\partial}{\partial z_j}\bigg)q(z_1,\ldots,z_n) = \frac{\partial}{\partial z_i}\log q(z_1,\ldots,z_n) + \frac{-\frac{\partial^2}{\partial z_i\partial z_j}\log q(z_1,\ldots,z_n)}{1-\frac{\partial}{\partial z_j}\log q(z_1,\ldots,z_n)}.\]

Note that as the barrier function is decreasing, the numerator in this expression is positive. Moreover, by the assumption of Theorem 2 and monotonicity of the barrier function, we have 1-\frac{\partial}{\partial z_j}\log q(z_1,\ldots,z_n) \ge \frac{1}{\delta} whenever z_k\ge x_k for all k. We can therefore estimate

    \begin{align*} &\frac{\partial}{\partial x_i}\log\bigg(1-\frac{\partial}{\partial x_j}\bigg)q(x_1,\ldots,x_{j-1},x_j+\delta,x_{j+1},\ldots,x_n) \\ &\le \frac{\partial}{\partial x_i}\log q(x_1,\ldots,x_{j-1},x_j+\delta,x_{j+1},\ldots,x_n) - \delta\frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i}\log q(x_1,\ldots,x_{j-1},x_j+\delta,x_{j+1},\ldots,x_n) \\ &\le \frac{\partial}{\partial x_i}\log q(x_1,\ldots,x_n), \end{align*}

where we have used convexity of the barrier function (that is, we used the first-order condition g(t+\delta)-g(t)\le \delta g'(t+\delta) for the convexity of a function g). And Kadison-Singer is now proved. \square
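The algebraic identity for the barrier of (1-\frac{\partial}{\partial z_j})q used in this proof can be sanity-checked symbolically on any concrete polynomial (a sketch assuming sympy; the particular q below is a product of linear forms with positive coefficients, hence real stable).

```python
# Check: d/dz1 log((1 - d/dz2) q)
#      = d/dz1 log q + (-d^2/(dz1 dz2) log q) / (1 - d/dz2 log q).
import sympy as sp

z1, z2 = sp.symbols('z1 z2')
q = (z1 + z2) * (z1 + 2 * z2) * (2 * z1 + z2 + 1)   # real stable
lhs = sp.diff(sp.log(q - sp.diff(q, z2)), z1)
rhs = sp.diff(sp.log(q), z1) \
      - sp.diff(sp.log(q), z1, z2) / (1 - sp.diff(sp.log(q), z2))
assert sp.simplify(lhs - rhs) == 0
```
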

Epilogue: our presentation of the proof of the Kadison-Singer theorem has largely followed the approach from the blog post by T. Tao, which simplifies some of the arguments in the original paper by A. W. Marcus, D. A. Spielman, and N. Srivastava. Both are very much worth reading!

03. November 2014 by Ramon van Handel
Categories: Roots of polynomials