The goal of this second lecture is to explain how to do the variance calculation we talked about at the end of Part 1 in the case where the exponential weights distribution is non-Gaussian. Compared to the optimistic calculations from Part 1 we will lose here a factor polynomial in the dimension $n$ in the regret bound, and thus by the end of this lecture we could still hope to prove a regret bound of order $\mathrm{poly}(n) \sqrt{T}$.
Gaussian core
First let us revisit the introduction of the core. Recall that we are considering a kernel $K$ such that $K \delta_x$ is the distribution of $(1-\lambda) Z + \lambda x$ for some random variable $Z$ to be defined. To control the regret we want to have the following inequality for any convex function $f$ and any point $x$:

\langle K \delta_x, f \rangle \le (1-\lambda) \langle K p, f \rangle + \lambda f(x),

which, given the convexity of $f$, means that the distribution of the random variable $Z$ should satisfy (with $X \sim p$ drawn independently of $Z$)

\mathbb{E}\, f(Z) \le \mathbb{E}\, f\big((1-\lambda) Z + \lambda X\big).   (1)
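To spell out why (1) suffices for the regret-control inequality, here is a one-line derivation (our notation: $X \sim p$ independent of $Z$, first step by convexity of $f$, second step by (1)):

```latex
\langle K \delta_x, f \rangle
  = \mathbb{E}\, f\big((1-\lambda) Z + \lambda x\big)
  \le (1-\lambda)\, \mathbb{E}\, f(Z) + \lambda f(x)
  \le (1-\lambda)\, \langle K p, f \rangle + \lambda f(x).
```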
The core is defined so that (1) is in fact an equality: the core of $p$ is a random variable $Z$ whose distribution is the fixed point, i.e. $Z$ has the same distribution as $(1-\lambda) Z + \lambda X$ with $X \sim p$ independent of $Z$. As we discussed in the previous post the core is a deep mathematical construction, but unfortunately we could not find a way to generalize Part 1's variance calculation when the core is non-Gaussian. In what follows we describe a slightly different construction which will allow us to satisfy (1) with $Z$ being Gaussian. A key observation is that if $q$ is convexly dominated by $p$, i.e. for any convex $f$ one has $\mathbb{E}_{X \sim q} f(X) \le \mathbb{E}_{X \sim p} f(X)$, then (1) is satisfied by taking $Z$ to be the core of $q$: denoting $X' \sim q$,

\mathbb{E}\, f(Z) = \mathbb{E}\, f\big((1-\lambda) Z + \lambda X'\big) \le \mathbb{E}_{X \sim p}\, f\big((1-\lambda) Z + \lambda X\big),

where the equality is the fixed point property of the core of $q$ and the inequality follows from convex domination applied to the convex map $x \mapsto \mathbb{E}\, f((1-\lambda) Z + \lambda x)$.
Thus it suffices to show that for any $p$ that we care about one can find a Gaussian $q$ "inside" of $p$ (in the sense that $q$ is convexly dominated by $p$). Then instead of taking the core of $p$ to define the kernel one can take the core of $q$ (which we will also refer to as the Gaussian core of $p$). Note that the core of a Gaussian measure is itself Gaussian, since Gaussianity is preserved by the fixed point map, which is what allows the variance calculation of Part 1 to go through.
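To make the fixed point property of the core concrete, here is a small Monte Carlo sketch (ours, not from the lectures) under the fixed-point characterization: the core of $p$ has the same distribution as $(1-\lambda) Z + \lambda X$ with $X \sim p$. For $p = \mathcal{N}(0,1)$ in one dimension the fixed point is a centered Gaussian whose variance $v$ solves $v = (1-\lambda)^2 v + \lambda^2$, i.e. $v = \lambda/(2-\lambda)$.

```python
import random
import statistics

def core_samples(lam, n_samples=100_000, n_iter=40, seed=0):
    """Approximate samples from the core of p = N(0, 1) by iterating
    Z <- (1 - lam) * Z + lam * X with fresh X ~ p.
    (Illustrative helper for the fixed-point characterization.)"""
    rng = random.Random(seed)
    zs = [0.0] * n_samples
    for _ in range(n_iter):
        zs = [(1 - lam) * z + lam * rng.gauss(0.0, 1.0) for z in zs]
    return zs

lam = 0.5
zs = core_samples(lam)
emp_var = statistics.pvariance(zs)
pred_var = lam / (2 - lam)  # fixed point of v = (1-lam)^2 v + lam^2
print(emp_var, pred_var)    # the two variances should be close
```

The contraction $(1-\lambda)^2 < 1$ makes the iteration converge geometrically, so a few dozen iterations suffice.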
Next we show that we can essentially restrict our attention to the case where $p$ is log-concave, and then we show how to find a Gaussian inside a log-concave measure.
Approximate log-concavity
Recall that in Part 1 we replaced the variance calculation by simply showing that the loss estimates are bounded. Thus we see by induction and using Hoeffding-Azuma that with probability at least $1 - \delta$, at any fixed point $x$, the cumulative loss estimate $\sum_{s < t} \tilde{\ell}_s(x)$ deviates from its conditional mean $\sum_{s < t} K^* \ell_s(x)$ by at most a term of order $\sqrt{T \log(1/\delta)}$ (times the bound on the estimates). Using a union bound over an $\varepsilon$-net together with Lipschitzness of the loss estimates (which we didn't prove but it follows the same lines as boundedness) one has in fact that with high probability (polynomially close to $1$ in $T$) there is a convex function $F$ (recall that each $K^* \ell_s$ is a convex function, hence so is their sum) such that for all $x$,

\Big| \sum_{s < t} \tilde{\ell}_s(x) - F(x) \Big| \le \varepsilon / \eta.

In particular, provided that the learning rate $\eta$ is small enough for this to hold, the above display shows that $p_t$ (whose density is proportional to $\exp(-\eta \sum_{s<t} \tilde{\ell}_s(x))$) is $\varepsilon$-approximately log-concave (recall that a measure is said to be $\varepsilon$-approximately log-concave if its density is sandwiched between $e^{-\varepsilon} g$ and $e^{\varepsilon} g$ for some log-concave function $g$). To simplify the discussion in these lectures we will in fact assume that $p_t$ is exactly log-concave.
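The sandwiching step can be checked mechanically: if the exponent differs from a convex function by at most $\varepsilon$ pointwise, the resulting weights are within a factor $e^{\pm\varepsilon}$ of a log-concave function, even though they need not be log-concave themselves. A toy numerical sketch (the functions below are our illustrative choices, not from the lectures):

```python
import math

eps = 0.1
F = lambda x: x * x                    # convex, so exp(-F) is log-concave
d = lambda x: eps * math.sin(25 * x)   # bounded perturbation, |d| <= eps

xs = [i / 1000 for i in range(-3000, 3001)]

# Sandwich: exp(-(F + d)) lies between e^(-eps) and e^(+eps) times exp(-F).
sandwiched = all(
    math.exp(-eps) <= math.exp(-(F(x) + d(x))) / math.exp(-F(x)) <= math.exp(eps)
    for x in xs
)

# Yet exp(-(F + d)) itself is not log-concave: its logarithm has a
# positive discrete second difference somewhere on the grid.
logp = [-(F(x) + d(x)) for x in xs]
not_logconcave = any(
    logp[i - 1] - 2 * logp[i] + logp[i + 1] > 0 for i in range(1, len(logp) - 1)
)

print(sandwiched, not_logconcave)
```

This is exactly the situation for $p_t$: the estimated cumulative loss is a small perturbation of the convex function $F$, so $p_t$ is approximately, but not exactly, log-concave.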
We note that the above concentration argument makes us lose a factor in the regret. Indeed in Part 1 we tuned $\eta$ against the (optimistic) variance calculation to informally obtain a regret of order $\mathrm{poly}(n)\sqrt{T}$ (or better with even more optimistic calculations). We see that now we are forced to take a learning rate small enough for $\eta$ times the Hoeffding-Azuma deviations to remain of order $\varepsilon$, which is polynomially smaller than before, and this in turn multiplies the regret bound by the same polynomial factor (recall that the exponential weights regret scales with $1/\eta$).
Finding a Gaussian inside a log-concave
Let $p$ be an isotropic log-concave measure on $\mathbb{R}^n$. We will show that a centered Gaussian $G$ with covariance of order $\frac{1}{n} \mathrm{I}_n$ is (approximately) convexly dominated by $p$. We note in passing that by going back to the calculations at the end of Part 1 it is easy to see that this factor $1/n$ in the covariance will in turn force us to take $\lambda$ polynomially (in $n$) smaller than what we hoped at the end of Part 1, leading to yet again an extra polynomial factor in $n$ in the regret bound.
In fact we will show that any centered measure supported on a ball of radius $R$, for some small enough numerical constant $R$, is (exactly) convexly dominated by $p$. This implies the (approximate) Gaussian convex domination we mentioned above since most of the mass of a Gaussian with covariance $\frac{c}{n} \mathrm{I}_n$ (for $c$ a small enough constant) lies in such a ball, a draw from it having norm of order $\sqrt{c}$ with high probability (it is easy to deal with the remaining small mass but we won't worry about it here).
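To see the scaling, note that for $G \sim \mathcal{N}(0, \frac{c}{n} \mathrm{I}_n)$ one has $\mathbb{E}\,|G|^2 = c$, so $|G|$ concentrates around $\sqrt{c}$ regardless of the dimension. A quick Monte Carlo sketch (the values $R$, $c$, $n$ below are our illustrative choices, not constants from the lectures):

```python
import math
import random

rng = random.Random(1)
n = 400            # ambient dimension (illustrative)
R = 0.25           # ball radius (illustrative)
c = R * R / 4      # covariance (c/n) I_n, so E|G|^2 = c = (R/2)^2

def fraction_inside_ball(num_trials=2000):
    """Estimate P(|G| <= R) for G ~ N(0, (c/n) I_n) by Monte Carlo."""
    sigma = math.sqrt(c / n)
    hits = 0
    for _ in range(num_trials):
        sq_norm = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n))
        hits += sq_norm <= R * R
    return hits / num_trials

frac = fraction_inside_ball()
print(frac)  # essentially all of the mass lies in the ball of radius R
```

Since $|G|^2$ is a sum of $n$ independent terms, its fluctuations around $c$ are of relative order $1/\sqrt{n}$, which is why the fraction above is so close to one in high dimension.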
Let us fix a centered measure $\mu$ supported on the ball of radius $R$ and a convex function $f$. We want to show that $\mathbb{E}_\mu f \le \mathbb{E}_p f$. Since both measures are centered we may add any linear function to $f$ without affecting this inequality, so we may legitimately assume that $f(0) = 0$ and $f(x) \ge 0$ for all $x$ (subtract from $f$ the linear function given by a subgradient at $0$). By scaling we may also assume that the maximum of $f$ on the centered ball of radius $R$ is $1$, so that $\mathbb{E}_\mu f \le 1$ and it is enough to show $\mathbb{E}_p f \ge 1$. By convexity $f$ is above the linear function

x \mapsto 1 + \langle \nabla f(x^*), x - x^* \rangle,

where $x^*$ is the maximizer of $f$ on the centered ball of radius $R$. Note also that $|\nabla f(x^*)| \ge 1/R$ (since $f$ goes from $0$ to $1$ between the center and $x^*$) and $\langle \nabla f(x^*), x^* \rangle \le R\, |\nabla f(x^*)|$, and thus it is enough to show (recall that $f \ge 0$):

\mathbb{E}_p \max\big(0, \ 1 + |\nabla f(x^*)| \, (\langle \theta, X \rangle - R)\big) \ge 1,

where $\theta = \nabla f(x^*) / |\nabla f(x^*)|$, which is implied by

\mathbb{E}_p \, (\langle \theta, X \rangle - R)_+ \ge R.

It only remains to use the fact that an isotropic real log-concave random variable $Y$ (here $Y = \langle \theta, X \rangle$, which is centered with unit variance and log-concave since marginals of log-concave measures are log-concave) verifies $\mathbb{E}\, Y_+ \ge c_0$ for some universal constant $c_0 > 0$, so that $\mathbb{E}\, (Y - R)_+ \ge c_0 - R$, and thus the above display holds true for any $R \le c_0 / 2$.
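As a sanity check on this last step, one can compute $\mathbb{E}\,(Y - R)_+$ in closed form for a few isotropic log-concave distributions and verify that it comfortably exceeds a small $R$ (the value $R = 0.1$ below is an illustrative choice, not the optimal constant):

```python
import math

R = 0.1  # illustrative radius

def gaussian_tail_mean(r):
    """E(Y - r)_+ for Y standard normal: phi(r) - r * (1 - Phi(r))."""
    phi = math.exp(-r * r / 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + math.erf(r / math.sqrt(2)))
    return phi - r * (1 - Phi)

def uniform_tail_mean(r):
    """E(Y - r)_+ for Y uniform on [-sqrt(3), sqrt(3)] (unit variance)."""
    s = math.sqrt(3)
    return (s - r) ** 2 / (4 * s)

def laplace_tail_mean(r):
    """E(Y - r)_+ for Y Laplace with scale b = 1/sqrt(2) (unit variance)."""
    b = 1 / math.sqrt(2)
    return (b / 2) * math.exp(-r / b)

values = {
    "gaussian": gaussian_tail_mean(R),
    "uniform": uniform_tail_mean(R),
    "laplace": laplace_tail_mean(R),
}
print(values)  # each value is well above R = 0.1
```

Of course these three examples do not prove the general log-concave fact, but they illustrate that $\mathbb{E}\,(Y - R)_+$ stays bounded below by a constant for small $R$.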