Today I would like to talk (somewhat indirectly) about a beautiful COLT 2014 paper by Nick Harvey and Samira Samadi. The problem studied in this paper goes as follows: imagine that you have a bunch of data points in $\mathbb{R}^n$ with a certain mean value $\mu$. Your goal is to generate a sequence out of these data points, such that the partial averages of this new sequence are as close as possible to $\mu$. The "herding" procedure of Max Welling has the property that the average of the first $T$ terms in the new sequence is at distance at most $O(1/T)$ from $\mu$ (this was shown by Francis Bach, Simon Lacoste-Julien and Guillaume Obozinski by viewing the herding procedure as a simple Frank-Wolfe). In their paper Nick and Samira improve the dependency on the dimension $n$ hidden in the $O(\cdot)$ notation by using a new algorithm inspired by the theory of discrepancy. In the rest of this post I would like to discuss the basic notions of this theory. What follows is a transcription of what Ronen Eldan (a postdoc in the theory group at MSR) explained to me (of course all errors are mine). I will use $c$ and $C$ to denote absolute numerical constants that take whatever value I need them to take.
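To make the herding update concrete, here is a minimal Python sketch (my own illustration based on the Frank-Wolfe interpretation mentioned above, not the Harvey-Samadi algorithm; the function name and parameters are mine): maintain a residual vector $w$, greedily pick the data point most aligned with it, and update the residual.

```python
import numpy as np

def herding(points, mu, T):
    """Minimal sketch of Welling-style herding: greedily select points
    so that the partial averages of the selected sequence track mu."""
    w = mu.copy()                       # residual vector
    chosen = []
    for _ in range(T):
        i = int(np.argmax(points @ w))  # point most aligned with residual
        chosen.append(points[i])
        w += mu - points[i]             # herding update
    return np.array(chosen)

# toy usage: the distance of the T-th partial average to mu decays like 1/T
rng = np.random.default_rng(0)
pts = rng.standard_normal((50, 5))
mu = pts.mean(axis=0)
seq = herding(pts, mu, 200)
partial = np.cumsum(seq, axis=0) / np.arange(1, 201)[:, None]
print(np.linalg.norm(partial - mu, axis=1)[[9, 49, 199]])  # shrinking errors
```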
Discrepancy theory
Consider a large finite set of vertices $V$, and a size-$n$ set $\mathcal{S} = \{S_1, \dots, S_n\}$ of subsets of vertices (in other words $(V, \mathcal{S})$ is a hypergraph). For the sake of simplicity we will assume here that the elements of $\mathcal{S}$ are also of size $n$. We are now interested in the following combinatorial optimization problem:

$$\min_{\varepsilon \in \{-1,1\}^{V}} \ \max_{i \in [n]} \ \Big| \sum_{j \in S_i} \varepsilon_j \Big| .$$
In words, one is looking for a partitioning of the vertices, so as to have the hyperedges
as balanced as possible.
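In code, the objective is a one-liner. Here is a small helper (names and representation are mine: the hypergraph is encoded by its 0/1 incidence matrix) that we can reuse below:

```python
import numpy as np

def discrepancy(incidence, signs):
    """Max over hyperedges of |sum of the signs of its vertices|.
    `incidence` is an (edges x vertices) 0/1 matrix, `signs` a +-1 vector."""
    return np.abs(incidence @ signs).max()
```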
Alright, so let's try something simple, just randomly partition the vertices: a simple union bound shows that the obtained objective value in this case is of order $\sqrt{n \log n}$ (each $\sum_{j \in S_i} \varepsilon_j$ is a sum of $n$ independent signs, so by Hoeffding's inequality it exceeds $C\sqrt{n \log n}$ with probability at most $2 n^{-C^2/2}$, and there are only $n$ hyperedges to control). Denote by $D$ (for Discrepancy) the smallest objective value (in particular we just saw that $D \leq C\sqrt{n \log n}$). A beautiful result of Joel Spencer shows that one can always do much better than random partitioning:

Theorem (Spencer's six deviations, 1985): $D \leq 6\sqrt{n}$.
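As a quick empirical check of these two bounds (my own experiment, reusing the `discrepancy` helper from the previous sketch): a uniformly random coloring lands on the order of $\sqrt{n \log n}$, well above the $6\sqrt{n}$ that Spencer's theorem guarantees is achievable.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 2000, 100                       # N vertices, n hyperedges of size n
incidence = np.zeros((n, N))
for i in range(n):
    incidence[i, rng.choice(N, size=n, replace=False)] = 1.0
signs = rng.choice([-1.0, 1.0], size=N)
print(discrepancy(incidence, signs))   # on the order of sqrt(n log n) ~ 21
print(6 * np.sqrt(n))                  # Spencer guarantees a coloring below 60
```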
We will now see a splendid (partial) proof of this result. Let $\gamma$ be the Gaussian measure on $\mathbb{R}^n$ (in the following we will take the number of vertices to be $n$ as well). First let me remind you of a basic fact about this measure, and a related conjecture. A set of the form $\{x \in \mathbb{R}^n : |\langle a, x \rangle| \leq 1\}$, for some $a \in \mathbb{R}^n$, is called a slab.
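A quick remark I am adding for intuition: under $\gamma$ the linear form $\langle a, x \rangle$ is a one-dimensional centered Gaussian with variance $\|a\|_2^2$, so the measure of a slab is an explicit one-dimensional quantity:

$$\gamma\big(\{x \in \mathbb{R}^n : |\langle a, x\rangle| \leq 1\}\big) = \mathbb{P}\big(|g| \leq 1/\|a\|_2\big) = 2\Phi(1/\|a\|_2) - 1, \qquad g \sim \mathcal{N}(0,1).$$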
Theorem (Sidak, 1967): Let $K$ be a symmetric convex set and $S$ be a slab, then

$$\gamma(K \cap S) \geq \gamma(K)\,\gamma(S).$$

Obviously there is equality if $K$ and $S$ are two orthogonal slabs, and the theorem says (in particular) that an angle between the slabs would imply positive correlation.
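Here is a small Monte Carlo illustration of Sidak's inequality (a sanity check of mine, with two non-orthogonal slabs in the plane); the empirical measure of the intersection should dominate the product of the individual measures:

```python
import numpy as np

rng = np.random.default_rng(2)
g = rng.standard_normal((10**6, 2))        # samples from the Gaussian measure
K = np.abs(g[:, 0] + g[:, 1]) <= 1.0       # a slab at 45 degrees
S = np.abs(g[:, 0]) <= 1.0                 # an axis-aligned slab
print((K & S).mean(), K.mean() * S.mean()) # first number >= second, per Sidak
```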
Conjecture (Gaussian correlation): Let $K_1$ and $K_2$ be two symmetric convex sets, then

$$\gamma(K_1 \cap K_2) \geq \gamma(K_1)\,\gamma(K_2).$$
Ok, now back to the proof of Spencer's theorem. First, let's rewrite things suggestively in terms of the slabs $\mathcal{K}_i = \{x \in \mathbb{R}^n : |\sum_{j \in S_i} x_j| \leq C\sqrt{n}\}$, $i \in [n]$. We want to show that there exists $x \in \{-1,1\}^n$ such that $x \in \bigcap_{i=1}^n \mathcal{K}_i$. Using Sidak's theorem one clearly has $\gamma\big(\bigcap_{i=1}^n \mathcal{K}_i\big) \geq \prod_{i=1}^n \gamma(\mathcal{K}_i) \geq c^n$ (with $\gamma(\mathcal{K}_i)$ being roughly like $1 - 2e^{-C^2/2}$, since under $\gamma$ the sum $\sum_{j \in S_i} x_j$ is a centered Gaussian with variance $n$). From there it is now a fairly short route to show that, if $\gamma(K) \geq c^n$ for a constant $c$ close enough to $1$ (say), then the intersection of $K$ with $\{-1,1\}^n$ must be non-empty. Ronen showed me a non-trivial (and algorithmically non-efficient) 5-line argument for this but I think that it is still unpublished work so I won't reproduce it here. Another proof of this, which yields an efficient algorithm for finding a point in the intersection, is given in this very nice paper of Lovett and Meka.
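Spencer's bound can also be checked by brute force on tiny instances (an illustration of mine, with random hyperedges rather than hyperedges of size exactly $n$; exhaustive search over the $2^n$ colorings is only feasible for very small $n$):

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(3)
n = 12
incidence = (rng.random((n, n)) < 0.5).astype(float)  # n random hyperedges
best = min(np.abs(incidence @ np.array(eps)).max()
           for eps in product((-1.0, 1.0), repeat=n))
print(best, 6 * np.sqrt(n))   # best discrepancy vs Spencer's 6*sqrt(n)
```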
Extensions
In what we discussed so far the underlying hypergraph was completely arbitrary. For some specific hypergraphs one can clearly improve the bound on $D$. Indeed if all hyperedges are disjoint, then $D \leq 1$ (simply alternate the signs within each hyperedge). More interestingly, Beck and Fiala showed in 1981 that, if each vertex appears in at most $t$ hyperedges, then $D \leq 2t - 1$. They also conjectured that the correct bound should be $C\sqrt{t}$. In 1998 Banaszczyk showed a bound of $C\sqrt{t \log n}$ using the following theorem (which is a variant of the result mentioned above):
Theorem (Banaszczyk, 1998): Let $K \subset \mathbb{R}^n$ be a convex set such that $\gamma(K) \geq 1/2$, and let $v_1, \dots, v_n \in \mathbb{R}^n$ be such that $\|v_i\|_2 \leq 1/5$ for all $i$. Then there exists $\varepsilon \in \{-1,1\}^n$ such that $\sum_{i=1}^n \varepsilon_i v_i \in K$.
The longstanding Komlós conjecture says that in fact this holds for $K = [-C, C]^n$ (despite the fact that the Gaussian measure of this set is only $c^n$ for some constant $c < 1$, rather than $1/2$).
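To spell out that last parenthetical claim (a computation I am adding): since the coordinates are independent under $\gamma$,

$$\gamma\big([-C,C]^n\big) = \mathbb{P}\big(|g| \leq C\big)^n = \big(2\Phi(C) - 1\big)^n, \qquad g \sim \mathcal{N}(0,1),$$

which is indeed exponentially small in $n$ for any fixed $C$, so the conjecture does not follow from Banaszczyk's theorem.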
Finally let me mention another open problem, which is the matrix version of Spencer's six deviations; see this blog post by Raghu Meka for details.
By Sasho November 16, 2014 - 1:27 pm
This is essentially the Steinitz lemma, right? There is a beautiful proof, using an iterative rounding procedure, that if all points are bounded in some norm, then the $T$-th partial average will be bounded in the same norm by $O(d/T)$, where $d$ is the dimension. Improving this to $O(\sqrt{d}/T)$ is an important open problem. All the discrepancy techniques introduce factors polynomial in $\log n$, where $n$ is the number of points.
The only methods I knew to get these bounds are iterative rounding and the discrepancy techniques. Now I am curious to look into this herding procedure.
By U August 25, 2014 - 2:18 am
There is a claimed proof of the Gaussian correlation conjecture, by Yashar Memarian, a student of Gromov:
http://arxiv.org/abs/1310.8099
By Sebastien Bubeck August 25, 2014 - 11:46 am
I believe that there is a known bug in this proof, though I don't know the details.
By Thomas Royen December 9, 2014 - 12:35 pm
Recently, a simple proof of the Gaussian correlation conjecture was given by Thomas Royen.
For a preprint see arXiv:1408.1028v2 [math.PR],
and for a shorter version:
"A simple proof of the Gaussian correlation conjecture extended to some multivariate gamma distributions", Far East Journal of Theoretical Statistics, Vol. 48, 2014, 139-145.
By Sebastien Bubeck December 30, 2015 - 11:56 pm
Dear Thomas, it looks like your proof has been verified, see this paper posted yesterday: http://arxiv.org/pdf/1512.08776v1.pdf . Congratulations to you for your great achievement!
I am a bit curious as to why you submitted such a great paper to the Far East Journal of Theoretical Statistics (to be honest the name of the journal is the reason why I didn't look at your paper in more detail in the first place…).