Lecture 1. Introduction

The goal of our seminar this semester is to try to understand the unexpected application of the geometry of roots of polynomials to address probabilistic problems. Such methods have recently led to the resolution of several important questions, and could potentially be very useful in many other problems as well. In the following lectures, we will work in detail through some recent problems in which the geometry of polynomials arises in a crucial manner in the proof.

Kadison-Singer, random matrices, and polynomials

In 1959, Kadison and Singer made a fundamental conjecture about the extension of pure states on C^*-algebras. While the original conjecture was purely qualitative, it has since been reformulated in many different ways. These different formulations prove to have applications in various areas: see the nice survey by Casazza and Tremain. After many decades, the Kadison-Singer conjecture was finally proved in a paper by Marcus, Spielman, and Srivastava. They prove a quantitative formulation of the conjecture, due to Weaver, that can be stated as an elementary (deterministic!) property of vectors in \mathbb{C}^d.

Theorem (formerly Conjecture) There exist universal constants \delta,\varepsilon > 0 such that, if n and d are any positive integers and w_1,\ldots,w_n \in \mathbb{C}^d are vectors such that

    \[\sum_{i=1}^n w_iw_i^* = I\qquad\mbox{and}\qquad \|w_i\|^2\le\delta\mbox{ for all }i,\]

then we can partition \{1,\ldots,n\} into disjoint subsets S and S^c such that

    \begin{align*} \left\|\sum_{i \in S}w_iw_i^*\right\|\le 1-\varepsilon,\qquad \left\|\sum_{i \in S^c}w_iw_i^*\right\|\le 1-\varepsilon, \end{align*}

where \|M\| denotes the spectral norm of the matrix M.

This result appears at first sight to be almost obvious: in the scalar case d=1, the corresponding statement is completely elementary. To see this, let x_1,\ldots,x_n\ge 0 satisfy \sum_ix_i=1 and x_i\le\delta for all i. The only way that we can fail to find a subset S such that \sum_{i\in S}x_i\le 1-\varepsilon and \sum_{i\in S^c}x_i\le 1-\varepsilon is when the sizes of the x_i are very imbalanced (e.g., if x_1=1 and x_2,\ldots,x_n=0), and this is ruled out by the assumption x_i\le\delta. It seems natural to expect that a similar conclusion follows in any dimension d.

Remark. To make this argument precise, let us argue by contradiction. Suppose that for every set S, we have either \sum_{i\in S}x_i> 1-\varepsilon or \sum_{i\in S^c}x_i> 1-\varepsilon. The first possibility occurs when S=\{1,\ldots,n\} and the second possibility occurs when S=\varnothing. Starting from the empty set and adding points one by one, we see that there must exist a set S and a point j\not\in S such that \sum_{i\in S\cup\{j\}}x_i> 1-\varepsilon and \sum_{i\in S^c}x_i> 1-\varepsilon. But this implies that x_j=\sum_{i\in S\cup\{j\}}x_i-1+\sum_{i\in S^c}x_i > 1-2\varepsilon, which contradicts x_j\le\delta whenever \delta\le 1-2\varepsilon.
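The one-by-one argument in the remark is easy to mechanize. The following sketch (ours, not from the lecture; the function name and the choice of \varepsilon are ours) greedily adds points until the next addition would push the running sum past 1-\varepsilon; by the computation in the remark, the resulting split is balanced whenever each x_i\le\delta\le 1-2\varepsilon.

```python
# A minimal sketch (ours, not from the lecture) of the scalar argument:
# greedily add points until the next one would push the sum past 1 - eps.
def balanced_subset(x, eps):
    """Return S with the sums over S and over its complement both <= 1 - eps,
    assuming sum(x) = 1, x_i >= 0, and each x_i <= delta <= 1 - 2*eps."""
    S, total = [], 0.0
    for i, xi in enumerate(x):
        if total + xi > 1 - eps:        # adding i would overshoot: stop here
            break
        S.append(i)
        total += xi
    return S

x = [0.1] * 10                          # sum = 1, each x_i = 0.1 = delta
S = balanced_subset(x, eps=0.25)
in_S = sum(x[i] for i in S)
assert in_S <= 0.75 and 1 - in_S <= 0.75
```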

Despite the intuitive nature of the statement, the general result is far from obvious. In particular, it is important to note that the affirmative solution to the Kadison-Singer conjecture requires that the constants \delta,\varepsilon be independent of d and n. It is not at all obvious a priori that it is possible to achieve this: the scalar case is misleadingly simple, and its proof does not extend to higher dimensions (try it!).

So far no probability has entered the picture: we are only asked to establish the existence of one set S for which the above result holds. However, in many problems in analysis and combinatorics, the easiest way to find the existence of a nontrivial object is by selecting such an object randomly and attempting to show that it satisfies the desired properties with positive probability. This approach is known as the probabilistic method. Using this idea, we will presently show that the above deterministic problem can be reduced to a basic probabilistic problem of estimating the norm of certain random matrices.

Let w_1,\ldots,w_n\in\mathbb{C}^d be given. In order to find a suitable set S, let us try to choose this set uniformly at random: that is, S is a random set such that each point i\in\{1,\ldots,n\} is included in S independently with probability one half. Let us now define the random vectors v_1,\ldots,v_n\in\mathbb{C}^{2d} as follows:

    \[v_i = \begin{cases} \left(\begin{array}{c} \sqrt{2}w_i \\ 0_d \end{array}\right) &\text{ if } i\in S, \vspace{0.5em} \\ \left(\begin{array}{c} 0_d \\ \sqrt{2}w_i \end{array}\right) &\text{ if } i\not\in S, \end{cases}\]

where 0_d denotes the zero vector in \mathbb{C}^d. A quick calculation gives

    \[\mathbf{E}[v_iv_i^*] = \left(\begin{array}{cc}         w_iw_i^* & 0 \\ 0  &w_iw_i^* \end{array}\right).\]

We therefore have

    \[\mathbf{E}\sum_{i=1}^nv_iv_i^* = I,\qquad \|v_i\|^2 = 2\|w_i\|^2 \le 2\delta.\]

Moreover, we can compute

    \[\sum_{i=1}^nv_iv_i^* = 2 \left(\begin{array}{cc}         \sum_{i \in S}w_iw_i^* & 0 \\ 0  &\sum_{i \in S^c}w_iw_i^* \end{array}\right),\]

and therefore

    \[\left\|\sum_{i=1}^nv_iv_i^* \right\| =  2\max\left(\left\|\sum_{i \in S}w_iw_i^*  \right\|, \left\|\sum_{i \in S^c}w_iw_i^*  \right\|\right).\]

It is now immediately clear that in order to prove the deterministic version of the Kadison-Singer conjecture, it suffices to prove the following result about random matrices.
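This chain of identities is easy to verify numerically. The following numpy sketch (ours; the whitening step that normalizes the w_i and the random seed are our choices) draws random vectors with \sum_i w_iw_i^* = I, picks a uniformly random subset S, and checks the norm identity for \sum_i v_iv_i^*.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 12
W = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
# Whiten the rows so that sum_i w_i w_i^* = I.
vals, vecs = np.linalg.eigh(W.conj().T @ W)
W = W @ (vecs @ np.diag(vals ** -0.5) @ vecs.conj().T)

S = rng.random(n) < 0.5                 # each i lands in S with probability 1/2
V = np.zeros((n, 2 * d), dtype=complex)
V[S, :d] = np.sqrt(2) * W[S]            # v_i = sqrt(2)(w_i, 0) for i in S
V[~S, d:] = np.sqrt(2) * W[~S]          # v_i = sqrt(2)(0, w_i) for i not in S

lhs = np.linalg.norm(V.conj().T @ V, 2)
rhs = 2 * max(np.linalg.norm(W[S].conj().T @ W[S], 2),
              np.linalg.norm(W[~S].conj().T @ W[~S], 2))
assert np.allclose(lhs, rhs)            # the block-diagonal norm identity
```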

Theorem. Let v_1,\ldots,v_n be independent random vectors in \mathbb{C}^d whose distribution has finite support. Suppose that

    \[\mathbf{E}\sum_{i=1}^nv_iv_i^* = I, \qquad \mathbf{E}\|v_i\|^2 \le \gamma\mbox{ for all }i.\]


Then

    \begin{align*} \mathbf{P}\left(\left\|\sum_{i=1}^nv_iv_i^*\right\| \le 1 + o(1)\right) > 0 \end{align*}

(where the o(1) term depends only on \gamma and tends to 0 as \gamma\to 0).

How might one prove this result? The easiest approach would be to show that

    \[\mathbf{E}\left\|\sum_{i=1}^nv_iv_i^*\right\| \le 1 + o(1),\]

in which case the result would follow immediately (this is called the first moment method). This seems promising, as the literature on random matrices contains numerous bounds on the norm of a random matrix. Unfortunately, this approach cannot work, as the following example illustrates.

Example. Denote by e_1,\ldots,e_d the standard basis in \mathbb{C}^d, and let g_{ij} be i.i.d. standard normal random variables for i=1,\ldots,k and j=1,\ldots,d. Define the random vectors v_1,\ldots,v_{dk} \in \mathbb{C}^d by

    \[v_{(j-1)k + i} := \frac{1}{\sqrt{k}}g_{ij}e_j.\]

That is,

    \[\begin{array}{cccc} v_1 = \frac{1}{\sqrt{k}}g_{11}e_1, &v_{k+1} = \frac{1}{\sqrt{k}}g_{12}e_2, &\cdots &v_{(d-1)k + 1} = \frac{1}{\sqrt{k}}g_{1d}e_d, \\ \vdots & \vdots &  & \vdots \\ v_k = \frac{1}{\sqrt{k}}g_{k1}e_1, &v_{2k} = \frac{1}{\sqrt{k}}g_{k2}e_2, &\cdots &v_{dk} = \frac{1}{\sqrt{k}}g_{kd}e_d. \end{array}\]

Then \mathbf{E}[v_{(j-1)k + i}v_{(j-1)k + i}^*] = \frac{1}{k}e_je_j^*, so \mathbf{E}\sum_{i=1}^{kd}v_iv_i^* = I. Moreover, \mathbf{E}\|v_i\|^2 = 1/k =: \gamma. Thus the assumptions of the above result are satisfied when k is sufficiently large. On the other hand, the matrix \sum_{i=1}^{kd} v_iv_i^* is diagonal, with diagonal entries \frac{1}{k}\sum_{i=1}^kg_{ij}^2 for j=1,\ldots,d, and we therefore find that

    \[\left\|\sum_{i=1}^{kd} v_iv_i^*\right\| = \max_{j \le d} \frac{1}{k}\sum_{i=1}^kg_{ij}^2 \sim \gamma\log d\]

as d\to\infty (as the maximum of chi-square random variables is of order \log d). Thus the expected matrix norm in this example must depend on the dimension d. We therefore cannot obtain a dimension-free result, as is required to resolve Kadison-Singer, by looking at the expectation of the matrix norm.
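A quick simulation (ours; the choice k = 50 and the dimensions are arbitrary) makes the dimension dependence visible: each diagonal entry averages to 1, but the maximum over the d columns creeps upward as d grows.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 50                                  # so gamma = 1/k = 0.02
norms = []
for d in (10, 1000):
    g = rng.standard_normal((k, d))
    # In this example ||sum_i v_i v_i^*|| = max_j (1/k) sum_i g_ij^2.
    norms.append((g ** 2).mean(axis=0).max())
print(norms)                            # the norm grows with the dimension d
```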

Remark. In this example we used Gaussian variables, for which \mathbf{E}\|v_i\|^2\le\gamma but \|v_i\|^2 is not uniformly bounded. In the reduction of Kadison-Singer to the random matrix problem, we had the stronger property that \|v_i\|^2\le\gamma uniformly. However, by using Bernoulli variables instead of Gaussians, we should be able to construct an analogous example using only bounded variables.

Remark. The lovely method due to Ahlswede and Winter makes it possible to show rather easily that \mathbf{E}\left\|\sum_i v_iv_i^*\right\| \lesssim \gamma\log d in the general random matrix setting that we are considering. Thus the above example exhibits the worst case behavior for this problem. However, as noted above, a dimension-dependent bound of this kind does not suffice for our purposes.

Is the result we are trying to prove hopeless? Not at all: bounding the expectation of the matrix norm is only a very crude way to show that the matrix norm is small with positive probability. The expected norm captures the typical behavior of the matrix norm: in the present case, we have seen above that this typical behavior must in general be dimension-dependent. However, all we want to establish is that it is possible that the matrix norm is bounded by a dimension-independent quantity (that is, that this happens with positive probability). Unfortunately, the above example suggests that the latter cannot be a typical event: the bound that we are interested in can only hold, in general, with very small probability that vanishes as we increase the dimension. This means that it is much more difficult to apply the probabilistic method, which usually works well precisely when the phenomenon that we are interested in is not only possible, but in fact typical. The present situation is much more delicate, and results in random matrix theory are not equipped to address it. To surmount this problem, Marcus, Spielman, and Srivastava developed an entirely new approach to bound the norm of a random matrix.

Let us briefly recall the common approaches in random matrix theory for bounding the norm \|X\| of a random matrix X. The key issue is to reduce the random variable \|X\| to a probabilistic structure that we know how to control. The following two approaches are used in much of the literature:

  1. Recall that we can write \|X\|=\sup_{v,w\in B}\langle v,Xw\rangle, where B denotes the unit ball in \mathbb{C}^d. This reduces estimating the norm to estimating the supremum of a family of random variables \langle v,Xw\rangle whose distribution is easily controlled (this is a linear function of the matrix entries). There are many standard probabilistic methods for estimating the supremum of random variables.
  2. Recall that we can write \|X\| = \lim_{p\to\infty}\mathrm{Tr}[X^{2p}]^{1/2p} (the quantity \mathrm{Tr}[X^{2p}]^{1/2p} is the \ell_p-norm of the eigenvalues of X, and the spectral norm is the \ell_\infty norm of the eigenvalues). We can therefore estimate the matrix norm if we can control the moments \mathbf{E}\mathrm{Tr}[X^{2p}]. The latter is the expectation of a polynomial in the matrix entries, and thus the problem reduces to controlling the moments of the matrix entries and counting (this typically involves some combinatorics).

The beauty of these methods is that they reduce the problem of estimating the matrix norm to well-understood probabilistic problems of estimating the maxima or moments of random variables.
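Approach 2 is simple to see in action. In this numpy sketch (ours; the 5 x 5 test matrix is arbitrary), \mathrm{Tr}[X^{2p}]^{1/2p} squeezes down to \|X\| as p grows, since it always lies between \|X\| and d^{1/2p}\|X\|.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
X = (A + A.T) / 2                       # a symmetric test matrix
norm = np.linalg.norm(X, 2)             # spectral norm = max |eigenvalue|
est = {p: np.trace(np.linalg.matrix_power(X, 2 * p)) ** (1 / (2 * p))
       for p in (1, 5, 20)}
# ||X|| <= est[p] <= 5**(1/(2p)) * ||X||, so est[20] is within ~4% of the norm.
assert norm - 1e-9 <= est[20] <= 5 ** (1 / 40) * norm + 1e-9
```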

There is a third representation of the matrix norm that is taught in every linear algebra course. Suppose that the random matrix X is positive semidefinite (as is the case in our setting). Then the matrix norm \|X\| is the largest eigenvalue of X, and therefore coincides with the maximal root of the characteristic polynomial p_X(z) := \text{det}(zI-X) (the roots of p_X are all real as X is Hermitian, so it makes sense to speak of the “maximal” root). That is, we have the representation

    \[\|X\| = \text{maxroot}(p_X).\]

Note that as X is a random matrix, p_X is a random polynomial. Its expectation \mathbf{E}p_X could potentially be a very useful tool in random matrix theory (just as \mathbf{E}\text{Tr}[X^{2p}] was a very useful quantity). Nonetheless, this representation is typically considered to be useless in random matrix theory. The problem is that p\mapsto\text{maxroot}(p) is an extremely nasty functional, and there is no reason for it to behave well under expectations. In essence, the difficulty is that the geometry of roots of polynomials is extremely nonlinear, and the probabilistic behavior of these objects is not well understood.
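For a concrete check (ours; the 4 x 4 example is arbitrary), one can recover the spectral norm of a positive semidefinite matrix as the maximal root of its characteristic polynomial:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
X = B @ B.T                             # positive semidefinite, so ||X|| = lambda_max
char_poly = np.poly(X)                  # monic coefficients of det(zI - X)
maxroot = np.roots(char_poly).real.max()
assert np.isclose(maxroot, np.linalg.norm(X, 2))
```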

A major insight of Marcus, Spielman, and Srivastava is that this representation can be extremely useful in certain problems. To exploit it, they use (a variant of) the following remarkable observation.

Lemma. Let p be a random polynomial that takes values in the finite set \{p_1,\ldots,p_n\}. Suppose that p_1,\ldots,p_n all have the same degree and are monic (the leading coefficient is 1), and that all convex combinations of p_1,\ldots,p_n have only real roots. Then \mathbf{P}\left[\text{maxroot}(p) \le \text{maxroot}(\mathbf{E}p)\right] > 0.

This statement is not at all obvious! In particular, it is not true without the assumption on the locations of the roots. To use it, we must show that the random polynomials that arise in the random matrix problem satisfy the required assumption, and we must also find a way to control the maximal root of \mathbf{E}p_X. We can now see why roots of polynomials play a key role throughout the proof, both in its statement (the norm is represented as a maximal root) and in its guts (the locations of the roots are essential to obtain results such as the above lemma). The development of these ideas will require us to develop tools that marry the geometry of roots of polynomials with the probabilistic problems that we are trying to address. These methods will be developed in detail in the following lectures.
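A toy instance of the lemma (ours; the two quadratics are our choice): p_1 = z^2 - 1 and p_2 = z^2 - 4 are monic of the same degree, every convex combination tp_1 + (1-t)p_2 = z^2 - (4 - 3t) is real-rooted, and indeed the smaller of the two maxroots does not exceed the maxroot of the average.

```python
import numpy as np

p1 = np.array([1.0, 0.0, -1.0])         # z^2 - 1, roots ±1
p2 = np.array([1.0, 0.0, -4.0])         # z^2 - 4, roots ±2
mean = (p1 + p2) / 2                    # E p for a fair coin flip between p1, p2
# Every convex combination z^2 - c with c in [1, 4] has real roots ±sqrt(c).
maxroots = [np.roots(p).real.max() for p in (p1, p2)]
# The lemma predicts some realization has maxroot <= maxroot(E p):
assert min(maxroots) <= np.roots(mean).real.max()
```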

Determinantal processes

Let us now turn to an entirely different application of roots of polynomials in probability theory. We first recall some basic definitions.

A point process on a space E is a random finite or countable subset of E. For example, E might be \{1,\ldots,n\}, \mathbb{R}, or \mathbb{C}. Some interesting examples of point processes include:

  • The eigenvalues of Gaussian random matrices;
  • The roots of an infinite power series with i.i.d. Gaussian coefficients;
  • Given a (finite) graph (V,E), choosing a spanning tree uniformly at random defines a random subset of the edges E;
  • (Karlin-McGregor) Pick n different starting points in \mathbb{Z}, and run n independent random walks starting from these points, conditioned on the event that their paths never cross. Then the positions of the n random walks at a given time form a point process.

Remarkably, all of these apparently quite unrelated point processes turn out to share a very rich common structure: they are examples of determinantal point processes. For simplicity, let us define this notion only in the discrete case where E=\{1,\ldots,n\}.

Definition. A random subset X of \{1,\ldots,n\} is called a determinantal point process if

    \[\mathbf{P}(S \subseteq X) = \text{det}(A|_S)\]

for every S \subseteq \{1,\ldots,n\}, where A is a given n \times n matrix called the kernel of the point process, and A|_S is the restriction of A to the rows and columns in S.

Determinantal point processes have many interesting properties. For the time being, let us mention one example: the property of negative dependence

    \[\mathbf{P}(i \in X,  j \in X) \le \mathbf{P}(i \in X)\mathbf{P}(j \in X)\]

for all i\ne j, i,j\in\{1,\ldots,n\}. We will (hopefully) encounter more interesting examples later on.
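For a determinantal point process with a symmetric (or Hermitian) kernel, negative dependence is immediate from the definition: \mathbf{P}(i\in X, j\in X) is a 2 x 2 minor, and \text{det}(A|_{\{i,j\}}) = A_{ii}A_{jj} - A_{ij}A_{ji} \le A_{ii}A_{jj}. A tiny check (ours; the kernel entries are an arbitrary choice):

```python
import numpy as np

# Kernel of a two-point determinantal process (our arbitrary choice).
A = np.array([[0.5, 0.3],
              [0.3, 0.4]])
p_i, p_j = A[0, 0], A[1, 1]             # P(i in X), P(j in X)
p_ij = np.linalg.det(A)                 # P(i in X, j in X) = det(A|_{i,j})
# det = A_ii A_jj - A_ij^2, so for a symmetric kernel p_ij <= p_i p_j.
assert p_ij <= p_i * p_j
```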

A basic tool for studying determinantal point processes is the generating function

    \[g(z) = \mathbf{E}\left[\prod_{i \in X}z_i\right] =: \mathbf{E}[z^X],\qquad z\in\mathbb{C}^n.\]

Note that g is a multivariate polynomial in n (complex) variables. As an illustration of the utility of the generating function, let us use it to express the property of negative dependence. It is easy to check that

    \[\mathbf{P}(i \in X) = \frac{\partial g}{\partial z_i}(1)\]


and

    \[\mathbf{P}(i \in X,  j \in X) = \frac{\partial^2 g}{\partial z_i \partial z_j}(1),\]

where 1 denotes the vector of ones. Hence, negative dependence is equivalent to the polynomial inequality

    \[\frac{\partial^2 g}{\partial z_i \partial z_j}(1) \le \frac{\partial g}{\partial z_i}(1)\frac{\partial g}{\partial z_j}(1).\]

Clearly only very special kinds of polynomials can satisfy such a property. Why should the generating functions of determinantal point processes be of this kind? Some insight can be gained from the following result of Brändén.

Theorem. The following are equivalent.

  1. For all z \in \mathbb{R}^n and 1\le i,j\le n, we have

        \begin{align*} g(z)\frac{\partial^2 g}{\partial z_i \partial z_j}(z) \le \frac{\partial g}{\partial z_i}(z)\frac{\partial g}{\partial z_j}(z). \end{align*}

  2. g has no roots in \{z \in \mathbb{C}^n : \text{Im}(z_i) > 0 \text{ for all }1\le i\le n\}.

If either of the equivalent conditions of this theorem holds, we say that g is real stable. Note that g(1) = 1, and thus we recover the negative dependence relationship whenever the generating function g is real stable.

While it was initially far from clear why we would care about the roots of polynomials in this setting, we now see that the locations of the roots of the generating function g are apparently intimately related with the negative dependence properties of point processes. The geometry of the roots of polynomials enters in an essential manner in results of this kind. It is not difficult to see that determinantal point processes must have real stable generating functions under some mild assumptions (very roughly speaking, one can compute g(z)=\text{det}(A)\text{det}(A^{-1}-I+\text{diag}(z)) in terms of the kernel A of the determinantal point process; thus if 0\prec A\preceq I, the matrix A^{-1}-I is positive semidefinite, and a determinant of this form can have no roots z with \text{Im}(z_i)>0 for all i). Such ideas therefore provide a powerful tool to investigate the properties of determinantal point processes.
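The rough computation in parentheses can be sanity-checked on a two-point example. The sketch below (ours; the kernel entries are arbitrary) derives the law of X from \mathbf{P}(S\subseteq X) = \text{det}(A|_S) by inclusion-exclusion, and compares the resulting generating function with the determinant formula g(z) = \text{det}(A)\,\text{det}(A^{-1}-I+\text{diag}(z)).

```python
import numpy as np

A = np.array([[0.5, 0.2],
              [0.2, 0.4]])              # a kernel with 0 < A < I
p12 = np.linalg.det(A)                  # P(X = {1,2}) = P({1,2} ⊆ X)
p1 = A[0, 0] - p12                      # P(X = {1}) by inclusion-exclusion
p2 = A[1, 1] - p12                      # P(X = {2})
p0 = 1 - p1 - p2 - p12                  # P(X = ∅)
for z in (np.array([1.0, 1.0]), np.array([0.3, -2.0])):
    g_enum = p0 + p1 * z[0] + p2 * z[1] + p12 * z[0] * z[1]
    g_det = np.linalg.det(A) * np.linalg.det(
        np.linalg.inv(A) - np.eye(2) + np.diag(z))
    assert np.isclose(g_enum, g_det)    # the two formulas for g agree
```

Note that evaluating either formula at z = 1 returns 1, as a generating function must.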

The simple negative dependence property described above could have been obtained in a more elementary fashion. However, the polynomial method makes it possible to prove more delicate properties, such as stronger negative association properties or concentration of measure properties of determinantal point processes. If time permits, we will try to cover some of these topics at the end of the semester. While these questions are quite distinct from the Kadison-Singer problem, it is remarkable that many of the same tools play a central role in both cases.

Many thanks to Dan Lacker for scribing this lecture!

23. September 2014 by Ramon van Handel