Lecture 1. Introduction
The goal of our seminar this semester is to try to understand the unexpected application of the geometry of roots of polynomials to address probabilistic problems. Such methods have recently led to the resolution of several important questions, and could potentially be very useful in many other problems as well. In the following lectures, we will work in detail through some recent problems in which the geometry of polynomials arises in a crucial manner in the proof.
Kadison-Singer, random matrices, and polynomials
In 1959, Kadison and Singer made a fundamental conjecture about the extension of pure states on algebras. While the original conjecture was purely qualitative, it has since been reformulated in many different ways. These different formulations prove to have applications in various areas: see the nice survey by Casazza and Tremain. After many decades, the Kadison-Singer conjecture was finally proved in a paper by Marcus, Spielman, and Srivastava. They prove a quantitative formulation of the conjecture, due to Weaver, that can be stated as an elementary (deterministic!) property of vectors in $\mathbb{R}^d$.
Theorem (formerly Conjecture). There exist universal constants $\delta, \varepsilon > 0$ such that, if $n$ and $d$ are any positive integers and $x_1, \ldots, x_n \in \mathbb{R}^d$ are vectors such that
$$\sum_{i=1}^n x_i x_i^T = I, \qquad \|x_i\|^2 \le \delta \quad\text{for all } i,$$
then we can partition $\{1, \ldots, n\}$ into disjoint subsets $S_1$ and $S_2$ such that
$$\bigg\|\sum_{i \in S_j} x_i x_i^T\bigg\| \le 1 - \varepsilon \quad\text{for } j = 1, 2,$$
where $\|M\|$ denotes the spectral norm of the matrix $M$.
This result appears at first sight to be almost obvious: in the scalar case $d = 1$, the corresponding statement is completely elementary. To see this, let $x_1, \ldots, x_n \in \mathbb{R}$ satisfy $\sum_i x_i^2 = 1$ and $x_i^2 \le \delta$ for all $i$. The only way that we can fail to find a subset $S \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in S} x_i^2 \le 1 - \varepsilon$ and $\sum_{i \notin S} x_i^2 \le 1 - \varepsilon$ is when the sizes of the $x_i^2$ are very imbalanced (e.g., if $x_1^2 = 1$ and $x_2 = \cdots = x_n = 0$), and this is ruled out by the assumption $x_i^2 \le \delta$. It seems natural to expect that a similar conclusion follows in any dimension $d$.
Remark. To make this argument precise, let us argue by contradiction. Suppose that for every set $S \subseteq \{1, \ldots, n\}$, we have either $\sum_{i \in S} x_i^2 > 1 - \varepsilon$ or $\sum_{i \notin S} x_i^2 > 1 - \varepsilon$. As the two sums add up to one, the first possibility occurs when $\sum_{i \in S} x_i^2 > 1 - \varepsilon$ and the second possibility occurs when $\sum_{i \in S} x_i^2 < \varepsilon$. By adding points one by one, we see that there must exist a set $S$ and an index $i \notin S$ such that $\sum_{j \in S} x_j^2 < \varepsilon$ and $\sum_{j \in S \cup \{i\}} x_j^2 > 1 - \varepsilon$. But this implies that $x_i^2 > 1 - 2\varepsilon$, which contradicts $x_i^2 \le \delta$ when $\varepsilon$ is sufficiently small.
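The scalar argument is easy to check numerically. The following Python sketch implements the "adding points one by one" idea as a greedy prefix construction on randomly generated weights; the weights and the choice $\varepsilon = 1/4$ are illustrative, not from the lecture.

```python
import random

def balanced_prefix(weights):
    """Add points one by one: since each step increases the running sum
    by at most delta = max(weights), some prefix S has sum in
    (1/2 - delta, 1/2], so both S and its complement have sum <= 1/2 + delta."""
    delta = max(weights)
    total, S = 0.0, []
    for i, w in enumerate(weights):
        if total > 0.5 - delta:
            break
        total += w
        S.append(i)
    return S, total

random.seed(0)
raw = [random.uniform(0.0, 1.0) for _ in range(200)]
weights = [r / sum(raw) for r in raw]    # the x_i^2: sum to 1, all <= delta ~ 0.01

S, s = balanced_prefix(weights)
eps = 0.25                                # illustrative choice of epsilon
assert s <= 1 - eps and 1 - s <= 1 - eps  # both halves are bounded by 1 - eps
```

The assertion mirrors the conclusion of the theorem in dimension one: both the subset and its complement carry at most $1 - \varepsilon$ of the total mass.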
Despite the intuitive nature of the statement, the general result is far from obvious. In particular, it is important to note that the affirmative solution to the Kadison-Singer conjecture requires that the constants $\delta$ and $\varepsilon$ must be independent of $n$ and $d$. It is not at all obvious a priori that it is possible to achieve this: the scalar case is misleadingly simple, and its proof does not extend to higher dimension (try it!).
So far no probability has entered the picture: we are only asked to establish the existence of one set for which the above result holds. However, in many problems in analysis and combinatorics, the easiest way to establish the existence of a nontrivial object is by selecting such an object randomly and attempting to show that it satisfies the desired properties with positive probability. This approach is known as the probabilistic method. Using this idea, we will presently show that the above deterministic problem can be reduced to a basic probabilistic problem of estimating the norm of certain random matrices.
Let $x_1, \ldots, x_n \in \mathbb{R}^d$ be given. In order to find a suitable set $S \subseteq \{1, \ldots, n\}$, let us try to choose this set uniformly at random: that is, $S$ is a random set such that each point $i$ is included in $S$ independently with probability one half. Let us now define the random vectors $y_1, \ldots, y_n \in \mathbb{R}^{2d}$ as follows:
$$y_i = \begin{cases} \sqrt{2}\,(x_i, 0) & \text{if } i \in S, \\ \sqrt{2}\,(0, x_i) & \text{if } i \notin S, \end{cases}$$
where $0$ denotes the zero vector in $\mathbb{R}^d$. A quick calculation gives
$$\mathbf{E}[y_i y_i^T] = \begin{pmatrix} x_i x_i^T & 0 \\ 0 & x_i x_i^T \end{pmatrix}.$$
We therefore have
$$\sum_{i=1}^n \mathbf{E}[y_i y_i^T] = I, \qquad \mathbf{E}\|y_i\|^2 = 2\|x_i\|^2 \le 2\delta.$$
Moreover, we can compute
$$\sum_{i=1}^n y_i y_i^T = 2\begin{pmatrix} \sum_{i \in S} x_i x_i^T & 0 \\ 0 & \sum_{i \notin S} x_i x_i^T \end{pmatrix}, \qquad\text{so}\qquad \bigg\|\sum_{i=1}^n y_i y_i^T\bigg\| = 2 \max_{j=1,2} \bigg\|\sum_{i \in S_j} x_i x_i^T\bigg\|,$$
where $S_1 = S$ and $S_2 = \{1, \ldots, n\} \setminus S$.
It is now immediately clear that in order to prove the deterministic version of the Kadison-Singer conjecture, it suffices to prove the following result about random matrices.
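The algebra behind the reduction can be checked numerically. The sketch below assumes the standard lifting used by Marcus, Spielman, and Srivastava, where $y_i$ equals $\sqrt{2}(x_i, 0)$ or $\sqrt{2}(0, x_i)$ in $\mathbb{R}^{2d}$ with probability one half each; the vectors $x_i$ and all helper names are chosen for illustration.

```python
import itertools

def outer(u):
    return [[a * b for b in u] for a in u]

def madd(A, B):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(A, B)]

def mscale(c, A):
    return [[c * x for x in row] for row in A]

def msum(mats, n):
    tot = [[0.0] * n for _ in range(n)]
    for M in mats:
        tot = madd(tot, M)
    return tot

r = 2 ** -0.5
xs = [(r, 0.0), (r, 0.0), (0.0, r), (0.0, r)]  # sum_i x_i x_i^T = I_2

def lift(x, in_S):
    """y_i = sqrt(2)(x_i, 0) if i in S, else sqrt(2)(0, x_i)."""
    v = x + (0.0, 0.0) if in_S else (0.0, 0.0) + x
    return tuple(2 ** 0.5 * c for c in v)

# 1) sum_i E[y_i y_i^T] = I_4: average the two equally likely outcomes
Ey = msum([madd(mscale(0.5, outer(lift(x, True))),
                mscale(0.5, outer(lift(x, False)))) for x in xs], 4)
assert all(abs(Ey[i][j] - float(i == j)) < 1e-12
           for i in range(4) for j in range(4))

# 2) for every realization of S, sum_i y_i y_i^T = 2 * blockdiag(A_S, A_Sc),
#    so its norm is 2 * max over the two blocks
for S in itertools.product([True, False], repeat=4):
    M = msum([outer(lift(x, s)) for x, s in zip(xs, S)], 4)
    A = msum([outer(x) for x, s in zip(xs, S) if s], 2)
    B = msum([outer(x) for x, s in zip(xs, S) if not s], 2)
    for a in range(2):
        for b in range(2):
            assert abs(M[a][b] - 2 * A[a][b]) < 1e-12
            assert abs(M[2 + a][2 + b] - 2 * B[a][b]) < 1e-12
            assert abs(M[a][2 + b]) < 1e-12 and abs(M[2 + a][b]) < 1e-12
```

The two assertions verify exactly the identities used in the reduction: the lifted vectors are isotropic in expectation, and the norm of their sum dominates twice the larger of the two partial sums.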
Theorem. Let $y_1, \ldots, y_n$ be independent random vectors in $\mathbb{R}^d$ whose distribution has finite support. Suppose that
$$\sum_{i=1}^n \mathbf{E}[y_i y_i^T] = I, \qquad \mathbf{E}\|y_i\|^2 \le \delta \quad\text{for all } i.$$
Then
$$\mathbf{P}\bigg[\bigg\|\sum_{i=1}^n y_i y_i^T\bigg\| \le 1 + \varepsilon(\delta)\bigg] > 0$$
(where $\varepsilon(\delta) \to 0$ as $\delta \to 0$ depends only on $\delta$).
How might one prove this result? The easiest approach would be to show that
$$\mathbf{E}\bigg\|\sum_{i=1}^n y_i y_i^T\bigg\| \le 1 + \varepsilon(\delta),$$
in which case the result would follow immediately (this is called the first moment method). This seems promising, as the literature on random matrices contains numerous bounds on the norm of a random matrix. Unfortunately, this approach cannot work, as the following example illustrates.
Example. Denote by $e_1, \ldots, e_d$ the standard basis in $\mathbb{R}^d$, and let $g_{ik}$ be i.i.d. standard normal random variables for $i = 1, \ldots, d$ and $k = 1, \ldots, n$. Define the random vectors $y_{ik}$ by
$$y_{ik} = \frac{g_{ik}}{\sqrt{n}}\, e_i.$$
Then $\mathbf{E}[y_{ik} y_{ik}^T] = \frac{1}{n} e_i e_i^T$, so $\sum_{i,k} \mathbf{E}[y_{ik} y_{ik}^T] = I$. Moreover, $\mathbf{E}\|y_{ik}\|^2 = \frac{1}{n}$. Thus the assumptions of the above result are satisfied when $n$ is sufficiently large. On the other hand, the matrix $\sum_{i,k} y_{ik} y_{ik}^T$ is diagonal with entries given by $\frac{1}{n} \sum_{k=1}^n g_{ik}^2$, and we therefore find that
$$\mathbf{E}\bigg\|\sum_{i,k} y_{ik} y_{ik}^T\bigg\| = \mathbf{E}\bigg[\max_{i \le d} \frac{1}{n} \sum_{k=1}^n g_{ik}^2\bigg] \longrightarrow \infty$$
as $d \to \infty$ (as the maximum of $d$ independent chi-square random variables is of order $\log d$). Thus the expected matrix norm in this example must depend on the dimension $d$. We therefore cannot obtain a dimension-free result, as is required to resolve Kadison-Singer, by looking at the expectation of the matrix norm.
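The dimension dependence in this example is easy to observe in simulation. The following sketch (parameters chosen for illustration, and kept small since it is pure Python) estimates the expected maximal diagonal entry for a small and a large value of $d$:

```python
import random

random.seed(1)

def expected_norm(d, n, trials=20):
    """Monte Carlo estimate of E max_{i<=d} (1/n) sum_k g_ik^2, which is
    the expected norm of the diagonal matrix sum_{i,k} y_ik y_ik^T."""
    est = 0.0
    for _ in range(trials):
        est += max(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
                   for _ in range(d))
    return est / trials

n = 10
small, large = expected_norm(d=4, n=n), expected_norm(d=2000, n=n)
assert small < large   # the expected norm grows (like log d / n) with dimension
```

Even though every summand has expectation $\frac{1}{n} e_i e_i^T$, the maximum over a growing number of independent chi-square entries drifts upward, as the example predicts.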
Remark. In this example we used Gaussian variables, for which $\mathbf{E}\|y_{ik}\|^2 = \frac{1}{n}$ but $\|y_{ik}\|^2$ is not uniformly bounded. In the reduction of Kadison-Singer to the random matrix problem, we had the stronger property that $\|y_i\|^2 \le 2\delta$ uniformly. However, by using Bernoulli variables instead of Gaussians, we should be able to construct an analogous example using only bounded variables.
Remark. The lovely method due to Ahlswede and Winter makes it possible to show rather easily that $\mathbf{E}\big\|\sum_i y_i y_i^T\big\| \lesssim \log d$ in the general random matrix setting that we are considering. Thus the above example exhibits the worst case behavior for this problem. However, as noted above, a dimension-dependent bound of this kind does not suffice for our purposes.
Is the result we are trying to prove hopeless? Not at all: bounding the expectation of the matrix norm is only a very crude way to show that the matrix norm is small with positive probability. The expected norm captures the typical behavior of the matrix norm: in the present case, we have seen above that this typical behavior must in general be dimension-dependent. However, all we want to establish is that it is possible that the matrix norm is bounded by a dimension-independent quantity (that is, that this happens with positive probability). Unfortunately, the above example suggests that the latter cannot be a typical event: the bound that we are interested in can only hold, in general, with very small probability that vanishes as we increase the dimension. This means that it is much more difficult to apply the probabilistic method, which usually works well precisely when the phenomenon that we are interested in is not only possible, but in fact typical. The present situation is much more delicate, and results in random matrix theory are not equipped to address it. To surmount this problem, Marcus, Spielman, and Srivastava developed an entirely new approach to bound the norm of a random matrix.
Let us briefly recall the common approaches in random matrix theory for bounding the norm of a random matrix $X$. The key issue is to reduce the random variable $\|X\|$ to a probabilistic structure that we know how to control. The following two approaches are used in much of the literature:
- Recall that we can write $\|X\| = \sup_{v, w \in B} \langle v, X w \rangle$, where $B$ denotes the unit ball in $\mathbb{R}^d$. This reduces estimating the norm to estimating the supremum of a family of random variables $\langle v, X w \rangle$ whose distribution is easily controlled ($\langle v, X w \rangle$ is a linear function of the matrix entries). There are many standard probabilistic methods for estimating the supremum of random variables.
- Recall that we can write $\|X\| = \lim_{p \to \infty} (\mathrm{Tr}[X^{2p}])^{1/2p}$ for symmetric $X$ (the quantity $(\mathrm{Tr}[X^{2p}])^{1/2p}$ is the $\ell_{2p}$-norm of the eigenvalues of $X$, and the spectral norm is the $\ell_\infty$-norm of the eigenvalues). We can therefore estimate the matrix norm if we can control the moments $\mathbf{E}\,\mathrm{Tr}[X^{2p}]$. The latter is the expectation of a polynomial in the matrix entries, and thus the problem reduces to controlling the moments of the matrix entries and counting (this typically involves some combinatorics).
The beauty of these methods is that they reduce the problem of estimating the matrix norm to well-understood probabilistic problems of estimating the maxima or moments of random variables.
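The moment identity in the second approach can be watched converging on a tiny deterministic example: a fixed symmetric $2 \times 2$ matrix with known eigenvalues (all names here are illustrative).

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

# a symmetric matrix with eigenvalues 3 and 1, so ||X|| = 3
X = [[2.0, 1.0], [1.0, 2.0]]

# Tr X^{2p} = 3^{2p} + 1^{2p}, so (Tr X^{2p})^{1/2p} decreases to ||X|| = 3
power = [[1.0, 0.0], [0.0, 1.0]]
powers = {}
for k in range(1, 21):
    power = matmul(power, X)
    powers[k] = power

est = [trace(powers[2 * p]) ** (1.0 / (2 * p)) for p in range(1, 11)]
assert all(b <= a + 1e-12 for a, b in zip(est, est[1:]))  # decreasing in p
assert abs(est[-1] - 3.0) < 1e-6   # close to the spectral norm ||X|| = 3
```

Here $(\mathrm{Tr}[X^{2p}])^{1/2p}$ starts at $\sqrt{10} \approx 3.16$ for $p = 1$ and is already within $10^{-6}$ of the spectral norm by $p = 10$.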
There is a third representation of the matrix norm that is taught in every linear algebra course. Suppose that the random matrix $X$ is positive semidefinite (as is the case in our setting). Then the matrix norm is the largest eigenvalue of $X$, and therefore coincides with the maximal root of the characteristic polynomial $\chi_X(t) = \det(tI - X)$ (the roots of $\chi_X$ are all real as $X$ is Hermitian, so it makes sense to speak of the “maximal” root). That is, we have the representation
$$\|X\| = \mathrm{maxroot}(\chi_X).$$
Note that as $X$ is a random matrix, $\chi_X$ is a random polynomial. Its expectation $\mathbf{E}\chi_X$ (the polynomial obtained by taking the expectation of each coefficient) could potentially be a very useful tool in random matrix theory (just as $\mathbf{E}\,\mathrm{Tr}[X^{2p}]$ was a very useful quantity). Nonetheless, this representation is typically considered to be useless in random matrix theory. The problem is that $\mathrm{maxroot}$ is an extremely nasty functional, and there is no reason for it to behave well under expectations. In essence, the difficulty is that the geometry of roots of polynomials is extremely nonlinear, and the probabilistic behavior of these objects is not well understood.
A major insight of Marcus, Spielman, and Srivastava is that this representation can be extremely useful in certain problems. To exploit it, they use (a variant of) the following remarkable observation.
Lemma. Let $\chi$ be a random polynomial that takes values in the finite set $\{\chi_1, \ldots, \chi_m\}$. Suppose that all $\chi_i$ have the same degree and are monic (the leading coefficient is $1$), and that all convex combinations of $\chi_1, \ldots, \chi_m$ have only real roots. Then
$$\mathbf{P}[\mathrm{maxroot}(\chi) \le \mathrm{maxroot}(\mathbf{E}\chi)] > 0.$$
This statement is not at all obvious! In particular, it is not true without the assumption on the locations of the roots. To use it, we must show that the random polynomials that arise in the random matrix problem satisfy the required assumption, and we must also find a way to control the maximal root of $\mathbf{E}\chi$. We can now see why roots of polynomials play a key role throughout the proof, both in its statement (the norm is represented as a maximal root) and in its guts (the locations of the roots are essential to obtain results such as the above lemma). The development of these ideas will require tools that marry the geometry of roots of polynomials with the probabilistic problems that we are trying to address. These methods will be developed in detail in the following lectures.
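To see the lemma in action, here is a minimal example with two monic real-rooted quadratics (chosen purely for illustration) whose convex combinations are all real-rooted; the maximal root of the expected polynomial then sits between the two maximal roots, so at least one realization has its maximal root below that of the expectation.

```python
import math

def maxroot(b, c):
    """Larger root of the monic quadratic x^2 + b x + c (assumed real-rooted)."""
    disc = b * b - 4 * c
    assert disc >= 0
    return (-b + math.sqrt(disc)) / 2

# chi_1 = x(x - 2) = x^2 - 2x,  chi_2 = (x - 1)(x - 3) = x^2 - 4x + 3;
# every convex combination x^2 - (2 + 2t)x + 3t has discriminant
# 4(t^2 - t + 1) > 0, so the real-rootedness hypothesis of the lemma holds
for t in [i / 10 for i in range(11)]:
    assert (2 + 2 * t) ** 2 - 4 * 3 * t > 0

m1 = maxroot(-2.0, 0.0)   # maxroot of chi_1, equals 2
m2 = maxroot(-4.0, 3.0)   # maxroot of chi_2, equals 3
mE = maxroot(-3.0, 1.5)   # maxroot of the average (x^2 - 3x + 1.5)
assert min(m1, m2) <= mE <= max(m1, m2)  # some chi_i has maxroot <= maxroot(E chi)
```

The average polynomial has maximal root $(3 + \sqrt{3})/2 \approx 2.37$, which indeed lies between $2$ and $3$; without the real-rootedness assumption on convex combinations, no such conclusion is available.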
Let us now turn to an entirely different application of roots of polynomials in probability theory. We first recall some basic definitions.
A point process on a space $E$ is a random finite or countable subset of $E$. For example, $E$ might be $\mathbb{R}^d$, $\mathbb{C}$, or the finite set $\{1, \ldots, n\}$. Some interesting examples of point processes include:
- The eigenvalues of Gaussian random matrices;
- The roots of an infinite power series with i.i.d. Gaussian coefficients;
- Given a (finite) graph $G = (V, E)$, choosing a spanning tree uniformly at random defines a random subset of the edges $E$;
- (Karlin-McGregor) Pick $n$ distinct starting points in $\mathbb{Z}$, and run $n$ independent random walks starting from these points, conditioned on the event that their paths never cross. Then the positions of the random walks at time $t$ form a point process.
Remarkably, all of these apparently quite unrelated point processes turn out to share a very rich common structure: they are examples of determinantal point processes. For simplicity, let us define this notion only in the discrete case where $E = \{1, \ldots, n\}$.
Definition. A random subset $\Gamma$ of $\{1, \ldots, n\}$ is called a determinantal point process if
$$\mathbf{P}[A \subseteq \Gamma] = \det(K_A)$$
for every $A \subseteq \{1, \ldots, n\}$, where $K$ is a given $n \times n$ matrix called the kernel of the point process, and $K_A$ is the restriction of $K$ to the rows and columns indexed by $A$.
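For concreteness, here is a tiny determinantal process on a two-point space in Python. The kernel below is an arbitrary symmetric choice with eigenvalues in $[0, 1]$ (an illustrative assumption, not from the lecture); the code reads inclusion probabilities off minors of $K$ and checks that they define a genuine probability distribution.

```python
# a 2x2 symmetric kernel with eigenvalues 0.8 and 0.2 (both in [0, 1]),
# so it defines a valid determinantal process on {1, 2}
K = [[0.5, 0.3], [0.3, 0.5]]

def det_minor(K, A):
    """det(K_A) for a subset A of row/column indices, |A| <= 2."""
    if not A:
        return 1.0
    if len(A) == 1:
        return K[A[0]][A[0]]
    i, j = A
    return K[i][i] * K[j][j] - K[i][j] * K[j][i]

p1 = det_minor(K, [0])        # P[1 in Gamma] = 0.5
p2 = det_minor(K, [1])        # P[2 in Gamma] = 0.5
p12 = det_minor(K, [0, 1])    # P[{1,2} subset of Gamma] = 0.25 - 0.09 = 0.16
assert p12 <= p1 * p2         # negative dependence

# exact configuration probabilities by inclusion-exclusion; they are
# nonnegative and sum to one, so K indeed defines a point process
probs = {
    (True, True): p12,
    (True, False): p1 - p12,
    (False, True): p2 - p12,
    (False, False): 1 - p1 - p2 + p12,
}
assert all(p >= 0 for p in probs.values())
assert abs(sum(probs.values()) - 1) < 1e-12
```

Note that the negative dependence assertion is automatic here: for a symmetric kernel, $\det(K_{\{i,j\}}) = K_{ii}K_{jj} - K_{ij}^2 \le K_{ii}K_{jj}$.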
Determinantal point processes have many interesting properties. For the time being, let us mention one example: the property of negative dependence
$$\mathbf{P}[i \in \Gamma,\ j \in \Gamma] \le \mathbf{P}[i \in \Gamma]\,\mathbf{P}[j \in \Gamma]$$
for all $i, j \in \{1, \ldots, n\}$, $i \ne j$. We will (hopefully) encounter more interesting examples later on.
A basic tool for studying determinantal point processes is the generating function
$$p(z_1, \ldots, z_n) = \mathbf{E}\bigg[\prod_{i \in \Gamma} z_i\bigg].$$
Note that $p$ is a multivariate polynomial in $n$ (complex) variables. As an illustration of the utility of the generating function, let us use it to express the property of negative dependence. It is easy to check that
$$\mathbf{P}[i \in \Gamma] = \frac{\partial p}{\partial z_i}(\mathbf{1}), \qquad \mathbf{P}[i \in \Gamma,\ j \in \Gamma] = \frac{\partial^2 p}{\partial z_i \partial z_j}(\mathbf{1}),$$
where $\mathbf{1}$ denotes the vector of ones. Hence, negative dependence is equivalent to the polynomial inequality
$$\frac{\partial^2 p}{\partial z_i \partial z_j}(\mathbf{1}) \le \frac{\partial p}{\partial z_i}(\mathbf{1})\,\frac{\partial p}{\partial z_j}(\mathbf{1}).$$
Clearly only very special kinds of polynomials can satisfy such a property. Why should the generating functions of determinantal point processes be of this kind? Some insight can be gained from the following result of Brändén.
Theorem. The following are equivalent.
- For all $i \ne j$ and $z \in \mathbb{R}^n$, we have
$$\frac{\partial^2 p}{\partial z_i \partial z_j}(z)\, p(z) \le \frac{\partial p}{\partial z_i}(z)\, \frac{\partial p}{\partial z_j}(z).$$
- $p$ has no roots in $\{z \in \mathbb{C}^n : \mathrm{Im}(z_i) > 0 \text{ for all } i\}$.
If either of the equivalent conditions of this theorem holds, we say that $p$ is real stable. Note that $p(\mathbf{1}) = 1$, and thus we recover the negative dependence relationship by setting $z = \mathbf{1}$ if the generating function is real stable.
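These relations are easy to verify on a small example. The multi-affine polynomial below is the generating function of an illustrative determinantal process on two points (the coefficients are hypothetical, chosen so that they are genuine configuration probabilities); Brändén's inequality then holds at every real point, and at $z = \mathbf{1}$ it reduces to negative dependence.

```python
# generating function p(z1, z2) = E prod_{i in Gamma} z_i of a small
# two-point process; the coefficients are its configuration probabilities
c0, c1, c2, c12 = 0.16, 0.34, 0.34, 0.16

def p(z1, z2):
    return c0 + c1 * z1 + c2 * z2 + c12 * z1 * z2

def d1(z1, z2):   # dp/dz1
    return c1 + c12 * z2

def d2(z1, z2):   # dp/dz2
    return c2 + c12 * z1

d12 = c12         # d^2 p / dz1 dz2 is constant since p is multi-affine

# negative dependence read off at z = (1, 1):
# P[1, 2 in Gamma] = d12 and P[i in Gamma] = di(1, 1)
assert d12 <= d1(1, 1) * d2(1, 1)

# Branden's condition holds at every real point on a grid, not just at 1
grid = [x / 2 for x in range(-20, 21)]
assert all(d1(a, b) * d2(a, b) >= p(a, b) * d12 for a in grid for b in grid)
```

For this particular $p$ the difference $\frac{\partial p}{\partial z_1}\frac{\partial p}{\partial z_2} - p \cdot \frac{\partial^2 p}{\partial z_1 \partial z_2}$ works out to the constant $0.09 > 0$, so the real-stability criterion is satisfied with room to spare.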
While it was initially far from clear why we would care about the roots of polynomials in this setting, we now see that the locations of the roots of the generating function are apparently intimately related with the negative dependence properties of point processes. The geometry of the roots of polynomials enters in an essential manner in results of this kind. It is not difficult to see that determinantal point processes must have real stable generating functions under some mild assumptions (very roughly speaking, one can compute the generating function $p$ in terms of the kernel $K$ of the determinantal point process; thus if $0 \preceq K \preceq I$, the generating function is essentially the characteristic polynomial of a positive semidefinite matrix, and thus has no roots in the upper quadrant of $\mathbb{C}^n$). Such ideas therefore provide a powerful tool to investigate the properties of determinantal point processes.
The simple negative dependence property described above could have been obtained in a more elementary fashion. However, the polynomial method makes it possible to prove more delicate properties, such as stronger negative association properties or concentration of measure properties of determinantal point processes. If time permits, we will try to cover some of these topics at the end of the semester. While these questions are quite distinct from the Kadison-Singer problem, it is remarkable that many of the same tools play a central role in both cases.
Many thanks to Dan Lacker for scribing this lecture!