Next lecture: March 24
There will be no lecture next week (March 17) due to spring break. Lectures will resume the following week on Thursday, March 24, at the usual time and place.
Next lecture: March 3
There will be no lecture on February 25 (as I am out of town). The stochastic analysis seminar will resume the following week, on March 3, at the usual time and place.
Spring 2016: Optimal transportation
Welcome back to the stochastic analysis seminar! We resume our informal lectures this semester on the topic of optimal transportation.
Optimal transportation is the problem of coupling probability measures in an optimal manner (in the sense that a certain expected cost functional is minimized). The rich theory that arises from this problem has had far-reaching consequences in probability theory, geometry, PDEs, interacting particle systems, etc. The aim of these informal lectures is to introduce some of the basic ingredients of this theory and to develop some probabilistic applications. Potential topics include: Wasserstein distances and Kantorovich duality; the Brenier map; the Monge-Ampère equation; displacement interpolation; Caffarelli’s theorem; applications to functional inequalities, measure concentration, dissipative PDEs, and/or interacting particle systems.
Time and location: Thursdays, 4:30-6:00 PM, Bendheim Center classroom 103 (note different location than last year). The first lecture will be on February 4.
Schedule changes and any other announcements will be posted on this blog. Unfortunately, we will probably not have the resources to post lecture notes this semester as was done in previous semesters.
References:
- C. Villani, “Topics in Optimal Transportation”, AMS (2003).
- L. Ambrosio and N. Gigli, “A User’s Guide to Optimal Transport”, lecture notes.
- K. Ball, “An Elementary Introduction to Monotone Transportation”, lecture notes.
- D. Cordero-Erausquin, “Some Applications of Mass Transport to Gaussian-Type Inequalities”, Arch. Rational Mech. Anal. 161 (2002) 257–269.
- D. Bakry, I. Gentil, and M. Ledoux, “Analysis and Geometry of Markov Diffusion Operators”, Springer (2014), Chapter 9.
Stochastic Analysis Seminar AY 2015/16
We will not have a stochastic analysis seminar in Fall 2015/16. However, the probability seminar will continue as in the previous semester with a fantastic lineup of speakers. See the probability seminar schedule, and go to this page to sign up for the probability seminar mailing list.
We are currently planning to resume the stochastic analysis seminar in the Spring semester (tentatively on the topic of optimal transportation).
Lecture 5. Proof of Kadison-Singer (3)
This is the last installment of the proof of the Kadison-Singer theorem. After several plot twists, we have finally arrived at the following formulation of the problem. If you do not recall how we got here, this might be a good time to go back and read the previous posts.
Theorem 1. Let
be positive semidefinite matrices in
such that
Define the multivariate polynomial
Then
The entire goal of this lecture is to prove the above theorem.
Let us recall that we already proved such a result without the derivatives (this is almost trivial).
Lemma 1.
whenever
.
The key challenge is to understand what happens to the region without roots under operations of the form . To this end, we introduced in the previous lecture the barrier function
It is easily seen that the roots of coincide with the points where the barrier function equals one. As the barrier function is easily bounded in our setting using the Jacobi formula for determinants, we can immediately bound the roots of
.
Unfortunately, this simple argument is only sufficient to apply the operation once, while we must apply
such operations. The difficulty is that while it is easy to control the barrier function of the polynomial
, we do not know how to control directly the barrier function of the polynomial
. To solve this problem, we must develop a more sophisticated version of the barrier argument that provides control not only of the roots of the derivative polynomial, but also of its barrier function: once we accomplish this, we will be able to iterate this argument to complete the proof.
The multivariate barrier argument
In the multivariate barrier argument, we will keep track of an octant in which the multivariate polynomial of interest has no roots. We will use the following terminology.
Definition. A point
is said to bound the roots of the multivariate polynomial
if
whenever
,
,
,
.
This notion is illustrated in the following figure:
We can now formulate the multivariate barrier argument that is at the heart of the proof.
Theorem 2. Let
be a real stable polynomial. Suppose that
bounds the roots of
, and
for some
and
. Then
bounds the roots of
, and
for all
.
Note that the barrier function assumption
would already be enough to ensure that bounds the roots of
; this is essentially what we proved in the last lecture (combined with a basic monotonicity property of the barrier function that we will prove below). The key innovation in Theorem 2 is that we do not only bound the roots of
, but we also control its barrier function
. This allows us to iterate this theorem over and over again to add more derivatives to the polynomial. To engineer this property, we must build some extra room (a gap of
) into our bound on the barrier function of
. Once we understand the proof, we will see that this idea arises very naturally.
Up to the proof of Theorem 2, we now have everything we need to prove Theorem 1 (and therefore the Kadison-Singer theorem). Let us complete this proof first, so that we can concentrate for the remainder of this lecture on proving Theorem 2.
Proof of Theorem 1. In the previous lecture, we used the Jacobi formula to show that
for all . To start the barrier argument, let us therefore choose
and
such that
Initially, by Lemma 1, we see that bounds the roots of
and that the barrier function satisfies the assumption of Theorem 2. That is, we start in the following situation:
Applying Theorem 2, we find that still bounds the roots of
. Moreover, we have control over the barrier function of this derivative polynomial, albeit at a point that lies above
:
for all . This is illustrated in the following figure:
We now have everything we need to apply Theorem 2 again to the polynomial (recall that
and its derivative polynomials are all real stable, as we proved early on in these lectures). In the next step, we control the barrier in the
-direction to obtain, in a picture:
We can now repeat this process in the -direction, etc. After
iterations, we have evidently proved that
All that remains is to choose and
. To optimize the bound, we minimize
subject to the constraint
. This is achieved by
and
, which completes the proof.
Of course, it remains to prove Theorem 2. It turns out that this is quite easy, once we develop some (nontrivial!) properties of the barrier functions of real-stable polynomials.
Some properties of barrier functions
Throughout this section, let be a real stable polynomial. As
is real stable, the univariate polynomial
is real-rooted, and we denote its roots by
We can then represent the polynomial as
and therefore the barrier function as
Some simple properties of the barrier function follow easily from this expression.
Lemma 2. Suppose that
bounds the roots of
. Then the barrier function
is positive, decreasing and convex for
.
Proof. As bounds the roots of
, we have
for all
. Thus clearly the barrier function is positive. To show that it is decreasing, we note that
Likewise, convexity follows as
when bounds the roots of
.
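As a quick sanity check, here is a minimal numerical sketch (not part of the original notes). It assumes the univariate reduction of the barrier function, phi(t) = sum_j 1/(t - t_j) with t_1, ..., t_m the real roots in the coordinate of interest (the displayed formulas above are not reproduced here), and verifies positivity, monotonicity and convexity above the largest root for a set of hypothetical roots.

```python
import numpy as np

rng = np.random.default_rng(0)
roots = np.sort(rng.normal(size=6))                  # hypothetical real roots t_1 < ... < t_m

def phi(t):
    return np.sum(1.0 / (t - roots))                 # assumed barrier function: sum_j 1/(t - t_j)

ts = np.linspace(roots[-1] + 0.1, roots[-1] + 5.0, 400)   # grid strictly above the largest root
vals = np.array([phi(t) for t in ts])

assert np.all(vals > 0)                              # positive
assert np.all(np.diff(vals) < 0)                     # decreasing
assert np.all(np.diff(vals, n=2) > 0)                # convex (positive second differences)
print("the barrier function is positive, decreasing and convex above the roots")
```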
The main property that we need in order to prove Theorem 2 is that these monotonicity and convexity properties of the barrier function also hold when we vary other coordinates. This seems innocuous, but is actually much harder to prove (and requires a clever idea!)
Lemma 3. Suppose that
bounds the roots of
. Then the barrier function
is positive, decreasing and convex for
.
Remark. There is of course nothing special about the use (for notational simplicity) of the first two coordinates in Lemmas 2 and 3. We can identically consider the barrier function in any direction
and obtain monotonicity and convexity in any other direction
, as will be needed in the proof of Theorem 2. In fact, as the remaining coordinates
are frozen, Lemmas 2 and 3 are really just statements about the properties of bivariate real stable polynomials.
The difficulty in proving Lemma 3 is that while behaves very nicely as a function of
, it is much less clear how it behaves as a function of
: to understand this, we must understand how the roots
vary as a function of
. In general, this might seem like a hopeless task. Surprisingly, however, the roots of real stable polynomials exhibit some remarkable behavior.
Lemma 4. Let
be a bivariate real-stable polynomial, and let
be the roots of
. Then each
is a decreasing function.
This is enough to prove Lemma 3.
Proof of Lemma 3. That the barrier function is positive follows precisely as in the proof of Lemma 2 (using the fact that if bounds the roots of
, then
also bounds the roots of
).
It remains to establish monotonicity and convexity. There is no loss of generality in assuming that is a bivariate polynomial in
only, and that we can write
for roots . We can therefore write
But as bounds the roots of
, we have
for all
. As
is also decreasing, we have
which is precisely what we set out to show.
We must still prove Lemma 4. This seems to be quite a miracle: why should such a property be true? To get some intuition, let us first consider an apparently very special case where is a polynomial of degree one, that is,
. The root of the polynomial
is clearly given by
Suppose that is nondecreasing, that is, that
and
have opposite sign. Then
cannot be real stable! Indeed, for any real root
, the point
is also a root, which violates real stability if
and
have opposite sign (as both coordinates have strictly positive or negative imaginary parts). Thus for polynomials of degree one, real stability trivially implies that
is decreasing.
While this is very intuitive, it also seems at first sight like it does not help much in understanding nontrivial polynomials of higher degree. Nonetheless, this simple observation proves to be the key to understanding the case of general polynomials. The idea is that the property that is decreasing is local. If there is a point at which this property is violated, then we can Taylor expand the polynomial around that point to reduce to the degree one case, and thereby obtain a contradiction.
Proof of Lemma 4. By the implicit function theorem, the maps are continuous and
everywhere except at a finite number of points. Therefore, if the conclusion fails, then there must exist a root
and a (nondegenerate) point
such that
We will use this to bring ourselves to a contradiction.
Let us write , so
is a root of
. We Taylor expand
around this point. Note that
so that
where we have used that . Similarly, we obtain
As , we obtain the Taylor expansion
for a suitable constant and where
.
We now conclude by a perturbation argument. Define the univariate polynomials
Letting , we obtain
, which evidently has a root
with strictly positive real part. By the continuity of roots of polynomials,
must still have a root with strictly positive real part when
is sufficiently small (this follows readily using Rouché’s theorem). But this implies that
has a root with
and
, contradicting real stability.
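To see Lemma 4 in action, here is a small numerical illustration (a sketch, not from the original notes). It takes a bivariate polynomial of the form p(x, y) = det(xA + yB + C) with hypothetical positive semidefinite A, B, C, which is real stable by the results of the earlier lectures (determinants of such pencils are real stable, and freezing one coordinate at a real value preserves real stability), and checks that each sorted root in y decreases as x increases.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

def psd():
    M = rng.normal(size=(d, d))
    return M @ M.T

A, B, C = psd(), psd(), psd()                      # hypothetical positive (semi)definite matrices

# For fixed x, the roots in y of det(x*A + y*B + C) = 0 are minus the eigenvalues
# of B^{-1/2} (x*A + C) B^{-1/2}; we compute them via a Cholesky factor of B.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)

def roots_in_y(x):
    lam = np.linalg.eigvalsh(Linv @ (x * A + C) @ Linv.T)
    return np.sort(-lam)

xs = np.linspace(0.0, 5.0, 200)
paths = np.array([roots_in_y(x) for x in xs])      # row k: the sorted roots at x = xs[k]
assert np.all(np.diff(paths, axis=0) <= 1e-10)     # each root path y_j(x) is nonincreasing
print("every root y_j(x) decreases as x increases, as Lemma 4 predicts")
```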
Conclusion of the proof
All that remains is to finally prove Theorem 2. The hard work is behind us: with the monotonicity and convexity properties of Lemmas 2 and 3 in hand, the proof of Theorem 2 is straightforward.
Proof of Theorem 2. As the barrier function is coordinatewise decreasing, the assumption
implies that
It follows immediately that bounds the roots of
.
Let us now control the barrier function of . As
is positive above the roots of
,
We can therefore write the barrier function as
Note that as the barrier function is decreasing, the numerator in this expression is positive. Moreover, by the assumption of Theorem 2 and monotonicity of the barrier function, we have whenever
. We can therefore estimate
where we have used convexity of the barrier function (that is, we used the first-order condition for the convexity of a function
). And Kadison-Singer is now proved.
Epilogue: our presentation of the proof of the Kadison-Singer theorem has largely followed the approach from the blog post by T. Tao, which simplifies some of the arguments in the original paper by A. W. Marcus, D. A. Spielman, and N. Srivastava. Both are very much worth reading!
Lecture 4. Proof of Kadison-Singer (2)
Recall that we are in the middle of proving the following result.
Theorem. Let
be independent random vectors in
whose distribution has finite support. Suppose that
Then
with positive probability.
Define the random matrix and its characteristic polynomial
as
As is positive definite, one representation of the matrix norm is
. The proof of the above theorem consists of two main parts:
- Show that
with positive probability.
- Show that
.
The first statement was proved in the previous two lectures. The goal of this lecture and the next is to prove the second statement.
Mixed characteristic polynomials
The key tool that was used in the previous lectures is the mixed characteristic polynomial of the matrices , which we defined as follows:
We recall the two crucial observations used in the previous lectures:
-
is multilinear in
.
- If
have rank one (this is essential!), then
.
As in our case are independent random matrices of rank one, this implies
where we define for simplicity .
Note that the matrices are no longer of rank one (in particular, it is not true that
is the characteristic polynomial of
: that would make the remainder of the proof quite trivial!) However, by assumption, the matrices
satisfy
To complete the proof, our aim will be to show that for any matrices with these properties, the maximal root of
can be at most
.
Theorem. Let
be positive semidefinite matrices in
. Suppose that
Then
As the matrices sum to the identity
, it will be convenient to rewrite the definition of the mixed characteristic polynomial slightly using this property:
where we defined the multivariate polynomial
Our aim is to show that whenever
.
How should we go about proving such a property? The first observation we should make is that the polynomial itself has no roots in an even larger region.
Lemma.
whenever
.
Proof. Note that
as and
. Thus
is nonsingular whenever
.
One might now hope that we can follow a similar idea to what we did in the previous lectures. In the previous lectures, we wanted to show that the mixed characteristic polynomial is real-rooted. To this end, we first showed that the determinant in the definition of the mixed characteristic polynomial is real stable (the appropriate multivariate generalization of real-rootedness), and then we showed that this property is preserved when we take derivatives of the form .
In an ideal world, we would show precisely the same thing here: if the polynomial has no roots in the region
, then perhaps
has no roots there as well, etc. If this were true, then we would in fact have shown that
has no roots in the region
. Unfortunately, this is clearly false: that would imply an upper bound of zero for the matrix norm in our original problem (impossible in general!) Nonetheless, this general strategy proves to be the right approach. It is indeed not true that the region with no roots is preserved by the operation
. However, it turns out that we will be able to control by how much this region shrinks in successive applications of
. The control that we will develop will prove to be sufficiently sharp in order to obtain the desired result.
Evidently, the key problem that we face is to understand what happens to the roots of a polynomial when we apply an operation of the form . In the remainder of this lecture, we will investigate this problem in a univariate toy setting. This univariate approach will not be sufficiently sharp to yield the desired result, but will give us significant insight into the ingredients that are needed in the proof. In the following lecture, we will complete the proof by developing a multivariate version of these ideas.
Barrier functions: a toy problem
Let us put aside for the moment our original problem, and consider the following simplified setting. Let be a univariate polynomial with real roots
. What can we say about the locations of the roots of the polynomial
? Clearly, the latter polynomial has a root at
if and only if
The function is very interesting; let us investigate what it looks like. Note that we can always represent a polynomial in terms of its roots as
for some constant . We can therefore readily compute
As we assumed that the roots of
are real, the function
looks something like this:
The function blows up at the roots
of
; this function is therefore referred to as the barrier function, in analogy with a similar notion in optimization. The values where it is equal to one determine the locations of the roots
of
. It follows immediately from the shape of the barrier function that the roots of our two polynomials are interlaced, as can be seen by inspecting the above figure. Note, moreover, that
, so that it is unfortunately the case that the maximal root of a polynomial increases under the operation
. However, we can control the location of the maximal root if we are able to control the barrier function.
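Here is a minimal numerical sketch of this toy problem (not from the original notes). It assumes the standard form of the barrier function, phi(t) = p'(t)/p(t) = sum_j 1/(t - t_j), takes a hypothetical real-rooted polynomial p, and checks that the roots of p - p' are exactly the points where phi equals one, and that the maximal root indeed increases.

```python
import numpy as np

rng = np.random.default_rng(2)
roots_p = np.sort(rng.normal(scale=2.0, size=6))          # hypothetical real roots of p
p = np.poly(roots_p)                                      # coefficients of the monic polynomial p
q = np.polysub(p, np.polyder(p))                          # q = p - p'

roots_q = np.sort(np.roots(q).real)                       # q is again real-rooted
phi = lambda t: np.polyval(np.polyder(p), t) / np.polyval(p, t)   # barrier function p'/p

assert roots_q[-1] > roots_p[-1]                          # the maximal root increases
assert np.allclose([phi(t) for t in roots_q], 1.0)        # roots of q are the solutions of phi(t) = 1
print("max root of p:", roots_p[-1], " max root of q = p - p':", roots_q[-1])
```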
Let us now return to our matrix problem and try to see how this idea can be useful. We will consider the first step towards establishing the desired result: what happens to the locations of the roots of the polynomial when we apply a single operation
? (We will ultimately want to iterate such an argument for every coordinate to control the roots of the mixed characteristic polynomial.)
To this end, let us fix for the time being, and define
We have already shown that . In order to control the roots of
, we need to control the barrier function
. The stroke of luck that we have at this point is that the derivative of the logarithm of a determinant is a remarkably nice object.
Lemma (Jacobi formula). If
is invertible, then
.
Proof. First, we note that
It therefore suffices to prove that
To this end, we use directly the definition of the determinant:
where the sum is over permutations . Setting
, we see that the only term that survives is the one that satisfies
for all
, that is, the identity permutation. We therefore obtain (as
)
This completes the proof.
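Before using it, here is a quick finite-difference check of the Jacobi formula (a sketch, not part of the notes). The displayed statement is not reproduced above, so we assume the form d/dt log det(A + tB) = Tr((A + tB)^{-1} B) and verify it at t = 0 for a hypothetical positive definite A and symmetric B.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)                          # hypothetical positive definite matrix
B = rng.normal(size=(d, d))
B = B + B.T                                          # hypothetical symmetric direction

logdet = lambda t: np.linalg.slogdet(A + t * B)[1]   # log det(A + t B)
h = 1e-5
numeric = (logdet(h) - logdet(-h)) / (2 * h)         # central finite difference at t = 0
exact = np.trace(np.linalg.solve(A, B))              # Tr(A^{-1} B)
assert abs(numeric - exact) < 1e-5
print("finite difference:", numeric, "  trace formula:", exact)
```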
Using the Jacobi formula, we immediately find
It therefore follows that
This brings us one step closer to the desired conclusion!
We have seen above that we can control the roots of and
. Of course, the next step will be to control the roots of
. Unfortunately, a direct application of the barrier function method does not lend itself well to iteration. The problem is that while we could easily control the barrier function of
using the Jacobi formula, it is not so obvious how to control the barrier function of
.
Instead, we are going to develop in the following lecture a multivariate version of the barrier argument. In each consecutive application of , we will control the region of
in which the polynomial has no roots; at the same time, we will also obtain control over the barrier function of the polynomial in the next stage. When implemented properly, this procedure will allow us to iterate the barrier argument without requiring explicit control of the barrier function except at the first stage.
Many thanks to Mark Cerenzia for scribing this lecture!
Next lecture: October 16
There will be no lecture on October 9 as several of us are out of town. The stochastic analysis seminar will resume on October 16 (the next week) at the usual time and place.
Lectures 2 and 3. Proof of Kadison-Singer (1)
The goal of the following lectures is to prove the following result about random matrices due to Marcus, Spielman, and Srivastava. This result readily implies a positive resolution of the Kadison-Singer conjecture, as was explained in detail in the previous lecture.
Theorem. Let
be independent random vectors in
whose distribution has finite support. Suppose that
Then
with positive probability.
Define the random matrix and its characteristic polynomial
as
As is positive definite, one representation of the matrix norm is
. The proof of the above theorem consists of two main parts:
- Show that
with positive probability.
- Show that
.
The present post is devoted to the proof of the first statement. The second statement will be the subject of the subsequent lectures.
Interlacing roots of polynomials
How can we prove an identity such as ? The key issue that we face is that the
functional is extremely nonlinear, and it is not at all clear how it behaves when you take expectations. This problem already arises in the very simplest setting: what can we say about the locations of the roots of a convex combination
of two polynomials
and
? In general, there is little hope to say anything concrete. However, if the polynomials
and
have some special properties, then we are in much better shape.
Lemma. Let
and
be polynomials that are monic (their leading coefficient is one) and of the same degree. Suppose that the polynomial
is real-rooted (all its roots are real) for every
. Then
for every
.
The message of this lemma is that under the given assumptions, the maximal root of the convex combination interlaces between the maximal roots of
and
(in fact, a similar property holds for all other roots as well; but we will not need this fact.)
Proof. Let us assume without loss of generality that , so we must prove
Proof of upper bound. Because the polynomials are monic, and
are strictly increasing for real
to the right of their maximal roots. Thus
and
are both strictly positive for
. But this means that
is also strictly positive for
, and thus cannot have a root in this region. This implies that
Proof of lower bound. Of course, this is the interesting part. By the same argument as the upper bound, it is clear that if
. We can therefore assume without loss of generality that
.
We distinguish two cases. Note that for
. Suppose first that
for all
. Then
and
must look something like this:
This means that for any , we have
at
and
at
. Thus
must have a root between
and
by the intermediate value theorem. In particular,
Now suppose that for some
, and let
be the second largest root of
. Then
and
must look something like this:
(Note that it could be that , or that
is a local maximum; then the picture looks slightly different, but this will not change our argument.) We now argue as follows. The polynomial
has no roots in the interval
, but the polynomial
has at least two roots in this interval (counting multiplicities). Thus as we let
vary from
to
, the number of roots of the polynomial
must jump at some point
from having at least two roots to having zero roots in the interval
. The polynomial
must therefore look like this:
This means that for any sufficiently small, the polynomial
loses two real roots in the interval
. As long as these roots do not reappear elsewhere on the real line, this implies that this polynomial has fewer real roots than its degree, and therefore it must have at least two roots in the complex plane away from the real line. This contradicts the assumption that all convex combinations of
and
are real-rooted.
Thus it only remains to argue what happens if the two roots that are lost when we go from to
reappear elsewhere on the real line. Note that for new roots to appear, a local maximum that was strictly below
(or a local minimum that was strictly above
) when
must have moved above (or below)
when
. As the graph of a polynomial is continuous in its coefficients, we can always avoid this by choosing
to be sufficiently small.
Once we know how to deal with two polynomials, we can easily generalize as follows.
Corollary. Let
be monic polynomials of the same degree. Suppose that all convex combinations of
are real-rooted. Then
for all convex weights
,
.
Proof by induction. Assume the result is true for a given number of polynomials ; then the result is also true for
polynomials by applying the previous lemma.
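Here is a small numerical illustration of the corollary (a sketch, not from the original notes). We take p_i to be the characteristic polynomial of W W^T + v_i v_i^T for a fixed hypothetical matrix W and a few vectors v_i; every convex combination of these is the expected characteristic polynomial of a sum of independent rank-one matrices (only the last summand being random), hence real-rooted by the mixed characteristic polynomial results developed below in this post, and the maximal root of the combination should lie between the smallest and largest of the individual maximal roots.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 5, 4
W = rng.normal(size=(d, d))
A = W @ W.T                                                   # hypothetical PSD matrix (itself a sum of rank ones)
vs = rng.normal(size=(m, d))

polys = [np.poly(np.linalg.eigvalsh(A + np.outer(v, v))) for v in vs]    # monic characteristic polynomials
maxroots = [np.max(np.linalg.eigvalsh(A + np.outer(v, v))) for v in vs]

w = rng.dirichlet(np.ones(m))                                 # random convex weights
mix = sum(wi * p for wi, p in zip(w, polys))
roots = np.roots(mix)

assert np.max(np.abs(roots.imag)) < 1e-5                      # the convex combination is (numerically) real-rooted
maxroot_mix = np.max(roots.real)
assert min(maxroots) - 1e-8 <= maxroot_mix <= max(maxroots) + 1e-8
print(f"max roots of the p_i lie in [{min(maxroots):.3f}, {max(maxroots):.3f}], mixture: {maxroot_mix:.3f}")
```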
Mixed characteristic polynomials
In order to apply the interlacing property developed above to the Kadison-Singer problem, we must show that the characteristic polynomial satisfies the necessary assumptions. Recall that the random matrix
has a distribution with finite support; that is,
is a random variable that can take one of
possible values
. Our aim is to show that there exists
such that
where the expectation is clearly a convex combination of the polynomials
. In order to apply the interlacing property, we must show not only that each of these polynomials is real-rooted (this is obvious, as
are Hermitian matrices), but also that convex combinations of these polynomials are real-rooted. The latter is far from obvious: we must reason about the properties of convex combinations of determinants, such as
. Determinants are nonlinear objects, and so it is not at all clear how to address such questions.
The stroke of luck in the Kadison-Singer problem is that the matrix is a sum of rank-one matrices. Even though the determinant is nonlinear, it is linear with respect to rank-one perturbations. This crucial fact will allow us to express convex combinations of the determinants that appear in our problem in terms of another determinant, for which it will prove to be easier to establish real-rootedness.
Lemma. (Sylvester’s determinant identity)
.
Proof. Let be nonsingular square matrices. Then
By continuity, this remains valid for singular matrices . The result follows by letting
be the matrix whose first column is
and
be the matrix whose first column is
(all other entries are zero).
Corollary.
is linear in
.
Proof. If is nonsingular, we can write
which is linear in . As the limit of linear functions is linear, the result remains valid for singular
.
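A quick numerical check (a sketch, not from the notes): the map t -> det(A + t vv^*) is affine in t, and Sylvester's identity in its rank-one form reads det(A + vv^*) = det(A)(1 + v^* A^{-1} v) for invertible A. Both are verified below on hypothetical random data.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4
M = rng.normal(size=(d, d))
A = M @ M.T + np.eye(d)                                       # hypothetical invertible (positive definite) matrix
v = rng.normal(size=d)

f = lambda t: np.linalg.det(A + t * np.outer(v, v))           # determinant of a rank-one perturbation
assert abs(f(2.0) - 2 * f(1.0) + f(0.0)) < 1e-8 * abs(f(0.0)) # second difference vanishes: f is affine in t

lhs = f(1.0)
rhs = np.linalg.det(A) * (1 + v @ np.linalg.solve(A, v))      # rank-one form of Sylvester's identity
assert np.isclose(lhs, rhs)
print("det(A + t v v^T) is affine in t, and the rank-one determinant identity holds")
```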
What we would really like is that for a random vector
: that would make it very easy to handle convex combinations of characteristic polynomials. Unfortunately, this is wishful thinking: the matrix
is not of rank one, so this does not follow from the linearity of rank-one perturbations. Nonetheless, this is not such a bad idea. We can use the linearity of rank-one perturbations to rewrite the characteristic polynomial of a sum of rank-one matrices in terms of a different quantity, which proves to be linear even beyond the rank-one setting.
To this end, let us define the multivariate polynomial
Then . Moreover, we have just shown that
is multilinear in
for every
. Using that a linear function equals its first-order Taylor expansion, we can write
Iterating this device, we obtain
Setting , we obtain the identity
The quantity on the right is called a mixed characteristic polynomial.
Definition. The mixed characteristic polynomial of square matrices
is defined as
Note that if all are rank one, we have shown that
That is, the mixed characteristic polynomial of coincides with the characteristic polynomial of
when all
are rank one. This is no longer true when the matrices
are not of rank one! Nonetheless, the mixed characteristic polynomial is an extremely useful device for the following reason.
Lemma.
is multilinear in
.
Proof. The determinant is a polynomial in the entries of the matrices
, and each term in the product
is a product of derivatives with respect to distinct variables
. In order for such a derivative of one of the terms of the polynomial
to be nonzero after we set
, this term can contain each
at most once: otherwise this term must be proportional to
for some
, so the derivative of this term will vanish when we set
. Thus the only terms that survive in the mixed characteristic polynomials are products of the entries of the matrices
, where each
appears at most once. Such terms are clearly multilinear in
.
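Here is a symbolic sketch of these two facts (not part of the notes). Since the displayed definition is not reproduced above, the code assumes the standard form mu[A_1, ..., A_n](x) = prod_i (1 - d/dz_i) det(xI + sum_i z_i A_i) evaluated at z_1 = ... = z_n = 0, and checks that it coincides with the characteristic polynomial of A_1 + ... + A_n when the A_i are rank one, and that it is linear in each argument; the matrices used are hypothetical.

```python
import numpy as np
import sympy as sp

x = sp.symbols("x")

def mixed_char_poly(As):
    """Assumed definition: prod_i (1 - d/dz_i) det(x I + sum_i z_i A_i), evaluated at z = 0."""
    n, d = len(As), sp.Matrix(As[0]).shape[0]
    zs = sp.symbols(f"z1:{n + 1}")
    p = (x * sp.eye(d) + sum(z * sp.Matrix(A) for z, A in zip(zs, As))).det()
    for z in zs:
        p = sp.expand(p - sp.diff(p, z))                  # apply (1 - d/dz_i)
    return sp.expand(p.subs({z: 0 for z in zs}))

rng = np.random.default_rng(6)
d = 3
v1, v2, w = (rng.integers(-2, 3, size=d) for _ in range(3))
A1, A2, B1 = np.outer(v1, v1), np.outer(v2, v2), np.outer(w, w)   # hypothetical rank-one matrices

# Rank-one case: the mixed characteristic polynomial is the characteristic polynomial of the sum.
mu = mixed_char_poly([A1, A2])
char = sp.expand((x * sp.eye(d) - sp.Matrix(A1 + A2)).det())
assert sp.expand(mu - char) == 0

# Multilinearity in the first argument.
lhs = mixed_char_poly([(sp.Matrix(A1) + sp.Matrix(B1)) / 2, A2])
rhs = (mixed_char_poly([A1, A2]) + mixed_char_poly([B1, A2])) / 2
assert sp.expand(lhs - rhs) == 0
print("rank-one identity and multilinearity of the mixed characteristic polynomial verified")
```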
A simple consequence of this result is, for example, that
where we have used multilinearity and the independence of the vectors . Let us emphasize that the right-hand side in this expression cannot itself be expressed as the characteristic polynomial of
, for example. Nonetheless, we have achieved something very useful: we have expressed a convex combination of determinants in terms of the derivative of a determinant. It turns out that it is much easier to establish real-rootedness of derivatives of determinants, than it is to establish real-rootedness of convex combinations of determinants. Our next order of business is to understand why this is true.
Multivariate polynomials and real stability
Given positive semidefinite matrices , let us define the multivariate polynomial
The mixed characteristic polynomial is obtained from
by taking derivatives, followed by fixing
. We are going to prove real-rootedness of
by showing that the real-rootedness property is preserved by such operations.
Of course, it is not quite clear what we mean by this: is a multivariate polynomial, so its roots are points in
. The appropriate extension of real-rootedness to multivariate polynomials is as follows.
Definition. A multivariate polynomial
is called stable if it has no roots in the set
, and is called real stable if in addition all its coefficients are real.
We are going to establish the following facts:
- The polynomial
is real stable.
- Taking derivatives or fixing parameters preserves real stability.
- A univariate polynomial is real stable if and only if it is real-rooted.
As the (univariate) mixed characteristic polynomial is obtained from
by taking derivatives and fixing parameters, this reasoning will show that
is real-rooted.
The third assertion is the easiest, and we prove it first.
Lemma. A univariate polynomial is real stable if and only if it is real-rooted.
Proof. Clearly a real-rooted polynomial is real stable. Conversely, suppose a univariate polynomial is real stable. Then it has no roots
with
. However, if
, then
also. Therefore, there are also no roots with
. It follows that
must be real-rooted.
Next, we prove the preservation of real stability under taking derivatives and fixing parameters.
Lemma. Let
be a real stable polynomial and let
. Then
is real stable.
is real stable.
Proof. We begin by proving the first assertion. Clearly has real coefficients, so it suffices to show that it is stable. Suppose to the contrary that
has a root
in
. As the roots of a polynomial vary continuously with its coefficients, the polynomial
must also have a root in
for
sufficiently small. But this contradicts the real stability of
.
Remark. A careful proof of the continuity of roots of complex polynomials can be given using Rouché’s theorem. This is left as an exercise to the interested reader.
We now turn to the second assertion. It is again obvious that the derivative of the polynomial has real coefficients, so it remains to establish that it is stable. Assume to the contrary that
for some in
.
Define the univariate polynomial , and denote by
the roots of
. As
is real stable, we must have
and
for all
. Moreover, we can express
as
By assumption, we have
(as is analytic in a neighborhood of
). But note that
so
Taking the imaginary part of this expression yields
But and
, so we have achieved a contradiction.
Finally, we prove that the polynomial is in fact real stable.
Lemma. Suppose that
are positive semidefinite matrices. Then the polynomial
is real stable.
Proof. To show that is real, note that
As are Hermitian, this implies that
is real whenever the parameters
are all real. Thus the coefficients of
must be real, and it remains to prove stability.
To this end, suppose that has a root
in
. Then the matrix
is not invertible, so there exists a vector
such that
But and
, so the imaginary part of the left-hand side must be strictly positive by the assumption that the root
is in
. Thus we have achieved a contradiction.
Putting together the previous three results, we have shown the following.
Corollary. The mixed characteristic polynomial
is real-rooted whenever
are positive semidefinite matrices.
Putting everything together
We are now ready to put together the various ideas developed in this post.
Proposition. Let
be independent random vectors in
whose distribution has finite support, and let
. Then
with positive probability.
Proof. We will apply the interlacing property step by step. Denote by the possible values that the random vector
can take. First, note that for every collection of convex weights
by multilinearity. We have shown that the mixed characteristic polynomial on the right is real-rooted. Moreover, it is easily verified from their definition that the polynomials are monic and have the same degree. Therefore, by the interlacing property, there exists a vector
(which may depend on the outcomes of
) such that
We now proceed in precisely the same manner with the polynomial on the right-hand side of this inequality to establish the existence of a vector (depending on
) such that
By iterating this argument, we show that
But note that , which is a single realization of
; whereas
by multilinearity of the mixed characteristic polynomial and as the random vectors
are independent. We have therefore completed the proof.
We have now completed the first step in our program. In the following lectures, we will concern ourselves with the second step: the control of .
Remark. It is only fair to note that, even though the result we are after is a statement about random matrices, essentially no probability appears in the proof: the proof only finds one single realization of the random variable whose maximal root is controlled as desired, and the statement that this happens with positive probability is largely cosmetic (this is a finite probability space, so any realization occurs with positive probability). This highlights, in some sense, the limitations of the probabilistic method in the present context: as the event we are after has very small probability, the usual methods of probabilistic reasoning are not efficient at finding such an event.
Lectures by Philippe Rigollet | Notes by Junwei Lu and Ramon van Handel
Lecture 1. Introduction
The goal of our seminar this semester is to try to understand the unexpected application of the geometry of roots of polynomials to address probabilistic problems. Such methods have recently led to the resolution of several important questions, and could potentially be very useful in many other problems as well. In the following lectures, we will work in detail through some recent problems in which the geometry of polynomials arises in a crucial manner in the proof.
Kadison-Singer, random matrices, and polynomials
In 1959, Kadison and Singer made a fundamental conjecture about the extension of pure states on algebras. While the original conjecture was purely qualitative, it has since been reformulated in many different ways. These different formulations prove to have applications in various areas: see the nice survey by Casazza and Tremain. After many decades, the Kadison-Singer conjecture was finally proved in a paper by Marcus, Spielman, and Srivastava. They prove a quantitative formulation of the conjecture, due to Weaver, that can be stated as an elementary (deterministic!) property of vectors in
.
Theorem (formerly Conjecture) There exist universal constants
such that, if
and
are any positive integers and
are vectors such that
then we can partition
into disjoint subsets
and
such that
where
denotes the spectral norm of the matrix
.
This result appears at first sight to be almost obvious: in the scalar case , the corresponding statement is completely elementary. To see this, let
satisfy
and
for all
. The only way that we can fail to find a subset
such that
and
is when the sizes of the
are very imbalanced (e.g., if
and
), and this is ruled out by the assumption
. It seems natural to expect that a similar conclusion follows in any dimension
.
Remark. To make this argument precise, let us argue by contradiction. Suppose that for every set , we have either
or
. The first possibility occurs when
and the second possibility occurs when
. By adding points one by one, we see that there must exist a set
and a point
such that
and
. But this implies that
, which contradicts
when
is sufficiently small.
Despite the intuitive nature of the statement, the general result is far from obvious. In particular, it is important to note that the affirmative solution to the Kadison-Singer conjecture requires that the constants be independent of
and
. It is not at all obvious a priori that it is possible to achieve this: the scalar case is misleadingly simple, and its proof does not extend to higher dimension (try it!)
So far no probability has entered the picture: we are only asked to establish the existence of one set for which the above result holds. However, in many problems in analysis and combinatorics, the easiest way to establish the existence of a nontrivial object is to select such an object randomly and attempt to show that it satisfies the desired properties with positive probability. This approach is known as the probabilistic method. Using this idea, we will presently show that the above deterministic problem can be reduced to a basic probabilistic problem of estimating the norm of certain random matrices.
Let be given. In order to find a suitable set
, let us try to choose this set uniformly at random: that is,
is a random set such that each point
is included in
independently with probability one half. Let us now define the random vectors
as follows:
where denotes the zero vector in
. A quick calculation gives
We therefore have
Moreover, we can compute
and therefore
It is now immediately clear that in order to prove the deterministic version of the Kadison-Singer conjecture, it suffices to prove the following result about random matrices.
Theorem. Let
be independent random vectors in
whose distribution has finite support. Suppose that
Then
(where
as
depends only on
).
How might one prove this result? The easiest approach would be to show that
in which case the result would follow immediately (this is called the first moment method). This seems promising, as the literature on random matrices contains numerous bounds on the norm of a random matrix. Unfortunately, this approach cannot work, as the following example illustrates.
Example. Denote by the standard basis in
, and let
be i.i.d. standard normal random variables for
and
. Define the random vectors
by
That is,
Then , so
. Moreover,
. Thus the assumptions of the above result are satisfied when
is sufficiently large. On the other hand, the matrix
is diagonal with entries
given by
, and we therefore find that
as (as the maximum of chi-square random variables is of order
). Thus the expected matrix norm in this example must depend on the dimension
. We therefore cannot obtain a dimension-free result, as is required to resolve Kadison-Singer, by looking at the expectation of the matrix norm.
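A tiny simulation of the dimension dependence in this example (a sketch, not part of the notes): the maximum of n independent squared standard Gaussians grows like 2 log n, which is why the expected norm of the diagonal matrix above cannot be bounded independently of the dimension (the precise normalization in the example is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(7)
for n in [10, 100, 1000, 10000]:
    g = rng.normal(size=(200, n))                      # 200 Monte Carlo samples of (g_1, ..., g_n)
    est = np.mean(np.max(g**2, axis=1))                # estimate of E[max_i g_i^2]
    print(f"n = {n:6d}   E[max g_i^2] ~ {est:5.2f}   2 log n = {2*np.log(n):5.2f}")
```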
Remark. In this example we used Gaussian variables, for which but
is not uniformly bounded. In the reduction of Kadison-Singer to the random matrix problem, we had the stronger property that
uniformly. However, by using Bernoulli variables instead of Gaussians, we should be able to construct an analogous example using only bounded variables.
Remark. The lovely method due to Ahlswede and Winter makes it possible to show rather easily that in the general random matrix setting that we are considering. Thus the above example exhibits the worst case behavior for this problem. However, as noted above, a dimension-dependent bound of this kind does not suffice for our purposes.
Is the result we are trying to prove hopeless? Not at all: bounding the expectation of the matrix norm is only a very crude way to show that the matrix norm is small with positive probability. The expected norm captures the typical behavior of the matrix norm: in the present case, we have seen above that this typical behavior must in general be dimension-dependent. However, all we want to establish is that it is possible that the matrix norm is bounded by a dimension-independent quantity (that is, that this happens with positive probability). Unfortunately, the above example suggests that the latter cannot be a typical event: the bound that we are interested in can only hold, in general, with very small probability that vanishes as we increase the dimension. This means that it is much more difficult to apply the probabilistic method, which usually works well precisely when the phenomenon that we are interested in is not only possible, but in fact typical. The present situation is much more delicate, and results in random matrix theory are not equipped to address it. To surmount this problem, Marcus, Spielman, and Srivastava developed an entirely new approach to bound the norm of a random matrix.
Let us briefly recall the common approaches in random matrix theory for bounding the norm of a random matrix
. The key issue is to reduce the random variable
to a probabilistic structure that we know how to control. The following two approaches are used in much of the literature:
- Recall that we can write
, where
denotes the unit ball in
. This reduces estimating the norm to estimating the supremum of a family of random variables
whose distribution is easily controlled (this is a linear function of the matrix entries). There are many standard probabilistic methods for estimating the supremum of random variables.
- Recall that we can write
(the quantity
is the
-norm of the eigenvalues of
, and the spectral norm is the
norm of the eigenvalues). We can therefore estimate the matrix norm if we can control the moments
. The latter is the expectation of a polynomial in the matrix entries, and thus the problem reduces to controlling the moments of the matrix entries and counting (this typically involves some combinatorics).
The beauty of these methods is that they reduce the problem of estimating the matrix norm to well-understood probabilistic problems of estimating the maxima or moments of random variables.
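As a quick illustration of the second representation (a sketch, not from the notes): the trace moments (Tr X^{2p})^{1/(2p)} approach the spectral norm as p grows, shown here for a hypothetical symmetric random matrix.

```python
import numpy as np

rng = np.random.default_rng(9)
d = 50
G = rng.normal(size=(d, d))
X = (G + G.T) / np.sqrt(2 * d)                       # hypothetical symmetric random matrix
eig = np.linalg.eigvalsh(X)
norm = np.max(np.abs(eig))                           # spectral norm = largest |eigenvalue|
for p in [1, 2, 5, 10, 50]:
    moment = np.sum(eig ** (2 * p)) ** (1 / (2 * p)) # (Tr X^{2p})^{1/(2p)}
    print(f"p = {p:3d}   trace moment = {moment:.4f}   spectral norm = {norm:.4f}")
```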
There is a third representation of the matrix norm that is taught in every linear algebra course. Suppose that the random matrix is positive definite (as is the case in our setting). Then the matrix norm
is the largest eigenvalue of
, and therefore coincides with the maximal root of the characteristic polynomial
(the roots of
are all real as
is Hermitian, so it makes sense to speak of the “maximal” root). That is, we have the representation
Note that as is a random matrix,
is a random polynomial. Its expectation
could potentially be a very useful tool in random matrix theory (just as
was a very useful quantity). Nonetheless, this representation is typically considered to be useless in random matrix theory. The problem is that
is an extremely nasty functional, and there is no reason for it to behave well under expectations. In essence, the difficulty is that the geometry of roots of polynomials is extremely nonlinear, and the probabilistic behavior of these objects is not well understood.
A major insight of Marcus, Spielman, and Srivastava is that this representation can be extremely useful in certain problems. To exploit it, they use (a variant of) the following remarkable observation.
Lemma. Let
be a random polynomial that takes values in the finite set
. Suppose that
all have the same degree and are monic (the leading coefficient is one), and that all convex combinations of
have only real roots. Then
This statement is not at all obvious! In particular, it is not true without the assumption on the locations of the roots. To use it, we must show that the random polynomials that arise in the random matrix problem satisfy the required assumption, and we must also find a way to control the maximal root of . We can now see why roots of polynomials play a key role throughout the proof, both in its statement (the norm is represented as a maximal root) and in its guts (the locations of the roots are essential to obtain results such as the above lemma). The development of these ideas will require us to develop tools that marry the geometry of roots of polynomials with the probabilistic problems that we are trying to address. These methods will be developed in detail in the following lectures.
Determinantal processes
Let us now turn to an entirely different application of roots of polynomials in probability theory. We first recall some basic definitions.
A point process on a space is a random finite or countable subset of
. For example,
might be
,
, or
. Some interesting examples of point processes include:
- The eigenvalues of Gaussian random matrices;
- The roots of an infinite power series with i.i.d. Gaussian coefficients;
- Given a (finite) graph
, choosing a spanning tree uniformly at random defines a random subset of the edges
;
- (Karlin-McGregor) Pick
different starting points in
, and run
independent random walks starting from these points conditionally on the event that their paths never cross. Then the positions of the
random walks at time
is a point process.
Remarkably, all of these apparently quite unrelated point processes turn out to share a very rich common structure: they are examples of determinantal point processes. For simplicity, let us define this notion only in the discrete case where .
Definition. A random subset
of
is called a determinantal point process if
for every
, where
is a given
matrix called the kernel of the point process, and
is the restriction of
to the rows and columns in
.
Determinantal point processes have many interesting properties. For the time being, let us mention one example: the property of negative dependence
for all ,
. We will (hopefully) encounter more interesting examples later on.
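Here is a minimal sketch (not from the notes) of why this negative dependence holds, assuming the defining formula P(T is contained in S) = det(K_T) for a Hermitian kernel K: for a pair {i, j} we have det(K_{{i,j}}) = K_ii K_jj - |K_ij|^2 <= K_ii K_jj = P(i in S) P(j in S). The code checks this on a hypothetical random kernel with eigenvalues in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
K = Q @ np.diag(rng.uniform(0, 1, size=n)) @ Q.T         # hypothetical symmetric kernel, eigenvalues in [0, 1]

for i in range(n):
    for j in range(i + 1, n):
        p_both = np.linalg.det(K[np.ix_([i, j], [i, j])])  # P({i, j} contained in S)
        assert p_both <= K[i, i] * K[j, j] + 1e-12          # <= P(i in S) * P(j in S)
print("negative dependence holds for every pair of points")
```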
A basic tool for studying determinantal point processes is the generating function
Note that is a multivariate polynomial in
(complex) variables. As an illustration of the utility of the generating function, let us use it to express the property of negative dependence. It is easy to check that
and
where denotes the vector of ones. Hence, negative dependence is equivalent to the polynomial identity
Clearly only very special kinds of polynomials can satisfy such a property. Why should the generating functions of determinantal point processes be of this kind? Some insight can be gained from the following result of Brändén.
Theorem. The following are equivalent.
- For all
and
, we have
has no roots in
.
If either of the equivalent conditions of this theorem holds, we say that is real stable. Note that
, and thus we recover the negative dependence relationship if the generating function
is real stable.
While it was initially far from clear why we would care about the roots of polynomials in this setting, we now see that the locations of the roots of the generating function are apparently intimately related with the negative dependence properties of point processes. The geometry of the roots of polynomials enters in an essential manner in results of this kind. It is not difficult to see that determinantal point processes must have real stable generating functions under some mild assumptions (very roughly speaking, one can compute
in terms of the kernel
of the determinantal point process; thus if
, the generating function is the characteristic polynomial of a positive definite matrix, and thus has no roots in the upper quadrant of
). Such ideas therefore provide a powerful tool to investigate the properties of determinantal point processes.
The simple negative dependence property described above could have been obtained in a more elementary fashion. However, the polynomial method makes it possible to prove more delicate properties, such as stronger negative association properties or concentration of measure properties of determinantal point processes. If time permits, we will try to cover some of these topics at the end of the semester. While these questions are quite distinct from the Kadison-Singer problem, it is remarkable that many of the same tools play a central role in both cases.
Many thanks to Dan Lacker for scribing this lecture!
Fall 2014: Roots of polynomials and probabilistic applications
Welcome back to the stochastic analysis seminar! We resume this semester with an unusual but exciting topic: the geometry of roots of polynomials and their probabilistic applications.
Understanding the roots of polynomials seems far from a probabilistic issue, yet has recently appeared as an important technique in various unexpected problems in probability as well as in theoretical computer science. As an illustration of the power of such methods, these informal lectures will work through two settings where significant recent progress was enabled using this idea. The first is the proof of the Kadison-Singer conjecture by using roots of polynomials to study the norm of certain random matrices. The second is the proof that determinantal processes, which arise widely in probability theory, exhibit concentration of measure properties. No prior knowledge of these topics will be assumed.
Time and location: Thursdays, 4:30-6:00, Sherrerd Hall 101 (note different location than last year).
The first lecture will be on September 18.
Notes from the lectures, schedule changes and any other announcements will be posted on this blog.
References:
- A. W. Marcus, D. A. Spielman, N. Srivastava, “Interlacing families I/II”
- Notes by T. Tao
- Notes by N. K. Vishnoi
- J. Borcea, P. Brändén, T. M. Liggett, “Negative dependence and the geometry of polynomials”
- R. Pemantle and Y. Peres, “Concentration of Lipschitz functionals of determinantal and other strong Rayleigh measures”