Komlos conjecture, Gaussian correlation conjecture, and a bit of machine learning

Today I would like to talk (somewhat indirectly) about a beautiful COLT 2014 paper by Nick Harvey and Samira Samadi. The problem studied in this paper goes as follows: imagine that you have a bunch of data points in \mathbb{R}^n with a certain mean value \mu. Your goal is to generate a sequence out of these data points such that the partial averages of this new sequence are as close as possible to \mu. The “herding” procedure of Max Welling has the property that the average of the first t terms in the new sequence is at distance at most O(1/t) from \mu (this was shown by Francis Bach, Simon Lacoste-Julien and Guillaume Obozinski by viewing the herding procedure as a simple Frank-Wolfe). In their paper Nick and Samira improve the dependency on the dimension n hidden in the O notation by using a new algorithm inspired by the theory of discrepancy. In the rest of this post I would like to discuss the basic notions of this theory. What follows is a transcription of what Ronen Eldan (a postdoc in the theory group at MSR) explained to me (of course all errors are mine). I will use c and C to denote absolute numerical constants that take whatever value I need them to take.
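To make the herding procedure concrete, here is a minimal numerical sketch of the idea in the plain (non-kernelized) case, viewed as the greedy/Frank-Wolfe update mentioned above. This is not the algorithm from Nick and Samira's paper, and all names, sizes and data below are purely illustrative.

```python
import numpy as np

def herd(points, steps):
    """Greedy herding sketch: repeatedly pick the data point that best
    corrects the current deviation of the running sum from the mean."""
    mu = points.mean(axis=0)
    w = mu.copy()                       # herding "state" / residual vector
    sequence = []
    for _ in range(steps):
        i = int(np.argmax(points @ w))  # linear maximization over the data points
        sequence.append(i)
        w += mu - points[i]             # Frank-Wolfe style residual update
    return sequence

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))       # 200 illustrative data points in R^5
idx = herd(X, steps=1000)
partial = np.cumsum(X[idx], axis=0) / np.arange(1, 1001)[:, None]
err = np.linalg.norm(partial - X.mean(axis=0), axis=1)
print(err[[9, 99, 999]])                # distance to mu after 10, 100, 1000 terms
```

In this toy run the distance to \mu decays roughly like 1/t, which is the behaviour the Frank-Wolfe analysis predicts.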

Discrepancy theory

Consider a large finite set of vertices V, and a size-n set \mathcal{P} of subsets of vertices (in other words (V, \mathcal{P}) is a hypergraph). For the sake of simplicity we will assume here that the elements of \mathcal{P} are also of size n. We are now interested in the following combinatorial optimization problem:

    \[\min_{f : V \rightarrow \{-1,1\}} \; \max_{A \in \mathcal{P}} \left| \sum_{v \in A} f(v) \right| .\]

In words, one is looking for a partition f of the vertices into two groups, so as to have every hyperedge A \in \mathcal{P} as balanced as possible.
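To fix ideas, here is a small numerical sketch of this objective (the hypergraph below is random and all sizes are arbitrary choices): we draw n hyperedges of size n over d vertices and evaluate the objective for a uniformly random coloring f.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 400                        # n hyperedges of size n over d vertices

# incidence matrix: row A marks the n vertices belonging to hyperedge A
M = np.zeros((n, d))
for A in range(n):
    M[A, rng.choice(d, size=n, replace=False)] = 1.0

f = rng.choice([-1.0, 1.0], size=d)    # a uniformly random coloring of the vertices
disc = np.abs(M @ f).max()             # the objective value for this particular f
print(disc, np.sqrt(n * np.log(n)))    # random coloring: roughly of order sqrt(n log n)
```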

Alright, so let’s try something simple, just randomly partition the vertices: a simple union bound shows that the objective value obtained in this case is of order \sqrt{n \log n}. Denote by D (for Discrepancy) the smallest objective value (in particular we just saw that D \leq C \sqrt{n \log n}). A beautiful result of Joel Spencer shows that one can always do much better than random partitioning:

Theorem (Spencer’s six deviations, 1985): D \leq 6 \sqrt{n}.

We will now see a splendid (partial) proof of this result. Let \gamma be the Gaussian measure on \mathbb{R}^d (in the following we will take d=|V|). First let me remind you of a basic fact about this measure, and a related conjecture. A set of the form S_{v,t} = \{x : |v^{\top} x| \leq t\}, for some v \in \mathbb{R}^d, t >0, is called a slab.

Theorem (Sidak, 1967): Let A be a symmetric convex set and B be a slab, then

    \[\gamma(A \cap B) \geq \gamma(A) \gamma(B) .\]

Observe that there is equality when A and B are two slabs with orthogonal normal directions (the corresponding events are then independent), and the theorem says (in particular) that tilting the slabs towards one another can only create positive correlation.
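Here is a quick Monte Carlo illustration of Sidak's inequality (just a sanity check in dimension 3, with arbitrary choices for A and B): take A to be a box, which is a symmetric convex set, and B a tilted slab.

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 200_000
x = rng.standard_normal((N, d))         # samples from the Gaussian measure gamma

in_A = (np.abs(x) <= 1.0).all(axis=1)   # A = [-1,1]^3, a symmetric convex set
v = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
in_B = np.abs(x @ v) <= 0.5             # B = a slab with tilted normal direction v

lhs = (in_A & in_B).mean()              # Monte Carlo estimate of gamma of the intersection
rhs = in_A.mean() * in_B.mean()         # Monte Carlo estimate of gamma(A) * gamma(B)
print(lhs, rhs)                         # Sidak: lhs >= rhs, up to sampling error
```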

Conjecture (Gaussian correlation): Let A and B be two symmetric convex sets, then

    \[\gamma(A \cap B) \geq \gamma(A) \gamma(B) .\]

Ok, now back to the proof of Spencer’s theorem. First, let’s rewrite things suggestively in terms of the slabs S_A := S_{\frac{\mathds{1}_A}{\sqrt{n}}, C}, A \in \mathcal{P}. We want to show that there exists f \in \{-1,1\}^d such that f \in \cap_{A \in \mathcal{P}} S_A. Applying Sidak’s theorem repeatedly (one slab at a time, the intersection of slabs being symmetric and convex) one gets \gamma(\cap_{A \in \mathcal{P}} S_A) \geq c^n (with c being roughly like 1-2 \exp(-C^2/2), the Gaussian measure of a single slab). From there it is a fairly short route to show that, if c \geq 0.99 (say), then the intersection of \{-1,1\}^d with \cap_{A \in \mathcal{P}} S_A must be non-empty. Ronen showed me a non-trivial (and algorithmically non-efficient) five-line argument for this, but I believe it is as yet unpublished work so I won’t reproduce it here. Another proof of this, which yields an efficient algorithm for finding a point in the intersection, is given in this very nice paper of Lovett and Meka.
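The product bound is easy to check numerically as well (a sketch with small, arbitrary n and d; the slabs are built from a random hypergraph as above): the Gaussian mass of \cap_{A \in \mathcal{P}} S_A is indeed at least the product of the individual slab masses, i.e. at least c^n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, N, width = 10, 30, 200_000, 2.0   # width plays the role of the constant C

# one unit normal vector 1_A / sqrt(n) per hyperedge A (random hyperedges of size n)
U = np.zeros((n, d))
for A in range(n):
    U[A, rng.choice(d, size=n, replace=False)] = 1.0 / np.sqrt(n)

x = rng.standard_normal((N, d))
hits = np.abs(x @ U.T) <= width         # hits[i, A]: is sample i in slab S_A?
print(hits.all(axis=1).mean(),          # estimate of the mass of the intersection
      hits.mean(axis=0).prod())         # product of individual slab masses, about c^n
```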

Extensions

In what we discussed so far the underlying hypergraph was completely arbitrary. For some specific hypergraphs one can clearly improve the bound on D. Indeed if all hyperedges are disjoint, then D=0 (at least when n is even). More interestingly, Beck and Fiala showed in 1981 that, if each vertex appears in at most t hyperedges, then D \leq 2 t. They also conjectured that the correct bound should be O(\sqrt{t}). In 1998 Banaszczyk showed a bound of O(\sqrt{t \log(n)}) using the following theorem (which is a variant of the result mentioned above):

Theorem (Banaszczyk, 1998): Let A \subset \mathbb{R}^d be a convex set such that \gamma(A) \geq 1/2, and let u_1, \ldots, u_m \in \mathbb{R}^d be such that \|u_i\|_2 \leq c for a small enough absolute constant c. Then there exists f \in \{-1,1\}^m such that \sum_{i=1}^m f_i u_i \in A.

The longstanding Komlós conjecture says that in fact this holds for A=[-1,1]^d (despite the fact that the Gaussian measure of this set is only c^d).
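To get a feel for the Komlós setting, here is a naive greedy signing heuristic (this is not Banaszczyk's argument and comes with no guarantee; the vectors and sizes below are arbitrary): sign the unit vectors one by one so as to keep the sup-norm of the running signed sum as small as possible.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 20, 5_000

u = rng.standard_normal((m, d))                     # m random directions in R^d
u /= np.linalg.norm(u, axis=1, keepdims=True)       # normalize: Euclidean norm exactly 1

s = np.zeros(d)                                     # running signed sum
worst = 0.0
for i in range(m):
    # greedy sign: pick the sign that keeps the sup-norm of the partial sum smaller
    sign = 1.0 if np.abs(s + u[i]).max() <= np.abs(s - u[i]).max() else -1.0
    s += sign * u[i]
    worst = max(worst, np.abs(s).max())

print(worst)   # stays O(1)-ish in this experiment; the conjecture asks for a universal bound
```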

Finally let me mention another open problem, the matrix version of Spencer’s six deviations theorem; see this blog post by Raghu Meka for details.

 
