A week or so ago at our Theory Lunch we had the pleasure of listening to Harishchandra Ramadas (student of Thomas Rothvoss), who told us about their latest discrepancy algorithm. I think the algorithm is quite interesting as it combines ideas from gradient descent and multiplicative weights in a non-trivial (yet very simple) way. Below I reprove Spencer's six standard deviations theorem with their machinery (in the actual paper Levy, Ramadas and Rothvoss do more than this).
First let me remind you of the setting (see also this previous blog post for some motivation on discrepancy and a bit more context; by the way it is funny to read the comments in that post after this): given unit vectors $v_1, \ldots, v_n \in \mathbb{R}^n$ one wants to find $x \in \{-1,1\}^n$ (think of it as a “coloring” of the coordinates) such that

$\max_{i \in [n]} |\langle v_i, x \rangle| \leq C$

for some numerical constant $C$ (when $v_i$ is a normalized vector of $1$'s and $0$'s the quantity $\langle v_i, x \rangle$ represents the unbalancedness of the coloring in the set corresponding to $v_i$). Clearly it suffices to give a method to find $x \in [-1,1]^n$ with at least half of its coordinates equal to $-1$ or $+1$ and such that $\max_{i \in [n]} |\langle v_i, x \rangle| \leq C$ for some numerical constant $C$ (indeed one can then simply recurse on the coordinates not yet set to $-1$ or $+1$; this is the so-called “partial coloring” argument). Note also that one can drop the absolute value by taking both $v_i$ and $-v_i$ (the number of constraints then becomes $2n$ but this is easy to deal with and we ignore it here for sake of simplicity).
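To spell out that last reduction (the doubled family $v_{n+i}$ below is just notation for this remark, it is not used later): setting $v_{n+i} := -v_i$ for $i \in [n]$, one has

$\max_{i \in [n]} |\langle v_i, x \rangle| = \max_{i \in [2n]} \langle v_i, x \rangle ,$

so bounding the one-sided quantity over the family of $2n$ unit vectors is exactly the same as bounding the maximum of the absolute values over the original family.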
The algorithm
Let $x_0 = 0$ and $w_0 = (1, \ldots, 1)$ (i.e., $w_0(i) = 1$ for all $i \in [n]$). We run an iterative algorithm which keeps at every time step $t$ a subspace $U_t$ of valid update directions and then proceeds as follows. First find (using for instance a basis for $U_t$) a unit vector $y_t \in U_t$ such that

(1) $\sum_{i=1}^n w_t(i) \langle v_i, y_t \rangle^2 \leq \frac{1}{\dim(U_t)} \sum_{i=1}^n w_t(i) .$

Then update $x_{t+1} = x_t + \delta_t y_t$ where $\delta_t \in [0,1]$ is maximal so that $x_{t+1}$ remains in $[-1,1]^n$. Finally update the exponential weights by $w_{t+1}(i) = w_t(i) \exp(\delta_t \langle v_i, y_t \rangle)$ (so that $w_t(i) = \exp(\langle v_i, x_t \rangle)$ at every step).
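To see why a unit vector satisfying (1) always exists (this is the point of the parenthetical remark about a basis for $U_t$; the short averaging argument below is just my way of spelling it out): take an orthonormal basis $u_1, \ldots, u_d$ of $U_t$, with $d = \dim(U_t)$, and let $P_{U_t}$ denote the orthogonal projection onto $U_t$. Then

$\sum_{j=1}^d \sum_{i=1}^n w_t(i) \langle v_i, u_j \rangle^2 = \sum_{i=1}^n w_t(i) \|P_{U_t} v_i\|_2^2 \leq \sum_{i=1}^n w_t(i) \|v_i\|_2^2 = \sum_{i=1}^n w_t(i) ,$

so at least one basis vector $u_j$ satisfies (1) and one can simply take $y_t = u_j$.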
It remains to describe the subspace $U_t$. For this we introduce the set $I_t$ containing the $n/8$ largest coordinates of $w_t$ (the “inactive” coordinates) and the set $F_t$ containing the coordinates of $x_t$ equal to $-1$ or $+1$ (the “frozen” coordinates). The subspace $U_t$ is now described as the set of points orthogonal to (i) $x_t$, (ii) $e_j$ for $j \in F_t$, (iii) $v_i$ for $i \in I_t$, (iv) $\sum_{i=1}^n w_t(i) v_i$. The intuition for (i) and (ii) is rather clear: for (i) one simply wants to ensure that the method keeps making progress towards the boundary of the cube (i.e., $\|x_{t+1}\|_2^2 = \|x_t\|_2^2 + \delta_t^2$), while for (ii) one wants to make sure that coordinates which are already “colored” (i.e., set to $-1$ or $+1$) are not updated. In particular (i) and (ii) together ensure that at each step either the norm squared of $x_t$ increases by $1$ (which can happen at most $n$ times since $\|x_t\|_2^2 \leq n$) or one fixes forever one of the coordinates to $-1$ or $+1$. In particular this means that after at most $T = 3n/2$ iterations one will have a partial coloring (i.e., half of the coordinates set to $-1$ or $+1$, which was our objective). Property (iii) is about ensuring that we stop walking in the directions where we are not making good progress (there are many ways to ensure this and this precise form will make sense towards the end of the analysis). Property (iv) is closely related, and while it might be only a technical condition it can also be understood as ensuring that locally one is not increasing the softmax of the constraints; indeed (iv) exactly says that one should move orthogonally to the gradient of $x \mapsto \log \left( \sum_{i=1}^n \exp(\langle v_i, x \rangle) \right)$ at the current point $x_t$.
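For concreteness, here is a rough numpy sketch of one run of the procedure described above. This is my own illustration and not the authors' code: the function name, the tolerances, the $2n$ iteration cap and the choice of $n/8$ inactive coordinates are simply the conventions used in this post.

import numpy as np

def partial_coloring(V, tol=1e-9):
    # V: (n, n) array whose rows v_1, ..., v_n are unit vectors.
    # Returns x in [-1, 1]^n with (roughly) half of its coordinates in {-1, +1}.
    n = V.shape[0]
    x = np.zeros(n)
    w = np.ones(n)                                   # w_t(i) = exp(<v_i, x_t>)
    for _ in range(2 * n):                           # 3n/2 steps suffice; 2n is a safe cap
        frozen = np.abs(x) >= 1 - tol                # coordinates already set to -1 or +1
        if frozen.sum() >= n // 2:                   # partial coloring achieved
            break
        inactive = np.argsort(w)[-max(1, n // 8):]   # indices of the n/8 largest weights
        # Vectors the update direction must be orthogonal to:
        # (i) x_t, (ii) e_j for frozen j, (iii) v_i for inactive i, (iv) sum_i w_t(i) v_i.
        I = np.eye(n)
        A = np.vstack([x[None, :], (w @ V)[None, :], I[frozen], V[inactive]])
        # Orthonormal basis of U_t, i.e. of the null space of A.
        _, s, Vt = np.linalg.svd(A)
        rank = int((s > tol).sum())
        basis = Vt[rank:]
        if basis.shape[0] == 0:
            break
        # Among the basis vectors, pick the one minimizing the weighted quadratic form;
        # by the averaging argument above it satisfies (1).
        scores = ((V @ basis.T) ** 2).T @ w          # sum_i w_t(i) <v_i, u_j>^2 for each u_j
        y = basis[int(np.argmin(scores))]
        # Maximal step size delta in [0, 1] keeping x + delta * y inside [-1, 1]^n.
        delta = 1.0
        for j in np.where(np.abs(y) > 1e-12)[0]:
            delta = min(delta, (np.sign(y[j]) - x[j]) / y[j])
        x = x + delta * y
        w = w * np.exp(delta * (V @ y))              # multiplicative weights update
    return x

One would then recurse on the coordinates of the returned $x$ that are not yet colored, as in the partial coloring argument above.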
The analysis
Let $W_t = \sum_{i=1}^n w_t(i)$. Note that since $y_t$ is on the sphere and $\|v_i\|_2 = 1$ one has that $|\langle v_i, y_t \rangle| \leq 1$. Thus using $\exp(u) \leq 1 + u + u^2$ for $u \in [-1,1]$, as well as property (iv) (i.e., $\sum_{i=1}^n w_t(i) \langle v_i, y_t \rangle = 0$) and $\delta_t \in [0,1]$ one obtains:

$W_{t+1} = \sum_{i=1}^n w_t(i) \exp(\delta_t \langle v_i, y_t \rangle) \leq W_t + \delta_t^2 \sum_{i=1}^n w_t(i) \langle v_i, y_t \rangle^2 \leq W_t + \sum_{i=1}^n w_t(i) \langle v_i, y_t \rangle^2 .$
Observe now that the subspace $U_t$ has dimension at least $n/4$ (say for $n \geq 16$, since before the partial coloring is complete $U_t$ is defined by at most $1 + 1 + n/2 + n/8$ orthogonality constraints) and thus by (1) and the above inequalities one gets:

$W_{t+1} \leq \left(1 + \frac{4}{n}\right) W_t .$
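To spell out where this leads (with the constants chosen above and $W_0 = n$), unrolling the last display over the at most $3n/2$ iterations and using $1 + u \leq e^u$ gives

$W_t \leq n \left(1 + \frac{4}{n}\right)^{3n/2} \leq n \, e^{6} .$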
In particular for any $t \leq T$ one has $W_t \leq C' n$ for some numerical constant $C'$ (here $C' = e^6$). It only remains to observe that this ensures $\langle v_i, x_t \rangle \leq C$ for any $i \in [n]$ and $t \leq T$, for some numerical constant $C$ (this concludes the proof since we already observed that at time $T$ at least half of the coordinates are colored). For this last implication we simply use property (iii). Indeed assume that some coordinate $i$ satisfies at some time $t \leq T$, $w_t(i) > e^{C}$ (equivalently $\langle v_i, x_t \rangle > C$) for some constant $C$ to be chosen below. Since each update increases the weights (multiplicatively) by at most $e$ it means that there is a previous time (say $s$, the last time this weight was actually updated before $t$) where this weight was larger than $e^{C-1}$ and yet it got updated, meaning that it was not in the top $n/8$ weights, and in particular one had $W_s \geq \frac{n}{8} w_s(i)$, which contradicts $W_s \leq e^6 n$ for $C$ large enough (namely $C > 7 + \log 8$).
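In one line, with the constants of this post, the contradiction in this last step reads

$e^6 n \geq W_s \geq \frac{n}{8} \, w_s(i) > \frac{n}{8} \, e^{C-1} ,$

which is impossible as soon as $C > 7 + \log 8 \approx 9.1$; hence for instance $\max_{i \in [n]} \langle v_i, x_t \rangle \leq 10$ throughout the run, and in particular for the final partial coloring.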
By Sasho October 18, 2017 - 1:22 am
Seb, I think you are claiming a little too much in the second paragraph. While the partial coloring lemma works for arbitrary unit vectors, Spencer's theorem doesn't. I.e. it is not true that for any $n$ unit vectors $v_1, \ldots, v_n$ in $\mathbb{R}^n$, there are signs $x_1, \ldots, x_n$ such that $\max_i |\langle v_i, x \rangle| = O(1)$. The best you can guarantee is $O(\sqrt{\log n})$, which is what you get from a random coloring. What breaks in your argument is that when you recurse on the fractionally colored elements, the lengths of the vectors may not decrease. That's why in the Spencer setting you assume that each $v_i$ is in $[-1, 1]^n$, and you ask that $\max_i |\langle v_i, x \rangle| = O(\sqrt{n})$.