# Kernel-based methods for convex bandits, part 2

The goal of this second lecture is to explain how to do the variance calculation we talked about at the end of Part 1 in the case where the exponential weights distribution is non-Gaussian. We will lose here a $\mathrm{poly}(d)$ factor in the regret bound compared to the optimistic calculations from Part 1, and thus by the end of this lecture we could still hope to prove a regret bound of order $\mathrm{poly}(d) \sqrt{n}$.

## Gaussian core

First let us revisit the introduction of the core. Recall that we are considering a kernel $K$ such that $K(x, \cdot)$ is the distribution of $\lambda x + (1-\lambda) Z$ for some random variable $Z$ to be defined. To control the regret we want to have the following inequality for any convex function $f$ and any point $y$:

$$K f(y) \leq \lambda f(y) + (1-\lambda) \, \mathbb{E}_{x \sim K p} f(x),$$

which (since by convexity of $f$ one has $K f(y) \leq \lambda f(y) + (1-\lambda) \, \mathbb{E} f(Z)$) means that the distribution of the random variable $Z$ should satisfy

$$\mathbb{E} f(Z) \leq \mathbb{E}_{x \sim K p} f(x). \qquad (1)$$
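To see why such an inequality controls the regret, note that together with $\mathbb{E}_{x \sim p} K f(x) = \mathbb{E}_{x \sim K p} f(x)$ it yields, for every $y$,

$$\mathbb{E}_{x \sim K p} f(x) - f(y) \leq \frac{1}{\lambda} \Big( \mathbb{E}_{x \sim p} K f(x) - K f(y) \Big),$$

so that the true regret is at most $\frac{1}{\lambda}$ times the regret measured on the kernelized losses, which is exactly what the exponential weights analysis controls (this is a one-line recap of the argument from Part 1, in the notation reconstructed above).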

The core is defined so that (1) is in fact an equality, i.e. $Z$ has the same distribution as $\lambda X + (1-\lambda) Z$ where $X \sim p$ is independent of $Z$. As we discussed in the previous post the core is a deep mathematical construction, but unfortunately we could not find a way to generalize Part 1's variance calculation when the core is non-Gaussian. In what follows we describe a slightly different construction which will allow us to satisfy (1) with $Z$ being Gaussian. A key observation is that if $q$ is convexly dominated by $p$, i.e. for any convex $h$ one has $\mathbb{E}_{x \sim q} h(x) \leq \mathbb{E}_{x \sim p} h(x)$, then (1) is satisfied by taking $Z$ to be the core of $q$:

$$\mathbb{E} f(Z) = \mathbb{E} f(\lambda X_q + (1-\lambda) Z) \leq \mathbb{E} f(\lambda X_p + (1-\lambda) Z) = \mathbb{E}_{x \sim K p} f(x),$$

where $X_q \sim q$ and $X_p \sim p$ are independent of $Z$ (the equality is the definition of the core of $q$, and the inequality is convex domination applied to the convex function $x \mapsto \mathbb{E} f(\lambda x + (1-\lambda) Z)$).

Thus it suffices to show that for any $p$ that we care about one can find a Gaussian $q$ "inside" of $p$ (in the sense that $q$ is convexly dominated by $p$). Then instead of taking the core of $p$ to define the kernel one can take the core of $q$, which we will also refer to as the Gaussian core of $p$ (note that the core of a Gaussian is itself Gaussian, since the map $\nu \mapsto \mathrm{law}(\lambda X_q + (1-\lambda) Z_{\nu})$ preserves Gaussians and the core is its fixed point).
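As a quick sanity check of this fixed-point view, here is a minimal numerical sketch (assuming the convention $Z \stackrel{d}{=} \lambda X + (1-\lambda) Z$ reconstructed above; the function name is ours): for $q = \mathcal{N}(0, \sigma^2)$ the Gaussian core is $\mathcal{N}\big(0, \tfrac{\lambda}{2-\lambda} \sigma^2\big)$.

```python
import numpy as np

def gaussian_core_variance(sigma2: float, lam: float, iters: int = 200) -> float:
    """Iterate v <- lam^2 * sigma2 + (1 - lam)^2 * v, the variance recursion
    induced by Z =(d) lam * X + (1 - lam) * Z with X ~ N(0, sigma2) independent of Z."""
    v = 0.0
    for _ in range(iters):
        v = lam**2 * sigma2 + (1 - lam)**2 * v
    return v

lam, sigma2 = 0.1, 1.0
v = gaussian_core_variance(sigma2, lam)
# Closed form of the fixed point: lam^2 * sigma2 / (1 - (1 - lam)^2) = lam * sigma2 / (2 - lam)
print(v, lam * sigma2 / (2 - lam))  # the two values agree
```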

Next we show that we can essentially restrict our attention to the case where $p$ is log-concave, and then we show how to find a Gaussian inside a log-concave measure.

## Approximate log-concavity

Recall that in Part 1 we replaced the variance calculation by simply showing that the loss estimates are bounded, say $\|\tilde{\ell}_s\|_{\infty} \leq B$. Thus we see by induction and using Hoeffding-Azuma that for any fixed $x$, with probability at least $1 - \delta$ one has

$$\Big| \sum_{s=1}^{t} \big( \tilde{\ell}_s(x) - K \ell_s(x) \big) \Big| \lesssim B \sqrt{t \log(1/\delta)}.$$

Using a union bound over an $\epsilon$-net together with Lipschitzness of the estimates (which we didn't prove, but it follows along the same lines as boundedness) one has in fact that with high probability (say $1 - 1/n$) there is a convex function $F_t$ (recall that $K \ell_s$ is a convex function, so one can take $F_t = \sum_{s \leq t} K \ell_s$) such that

$$\Big\| \sum_{s=1}^{t} \tilde{\ell}_s - F_t \Big\|_{\infty} \lesssim B \sqrt{t \, d \log(n)}.$$

In particular, provided that $\eta B \sqrt{n \, d \log(n)} = O(1)$, the above display shows that $p_{t+1}$ (whose density is proportional to $\exp(-\eta \sum_{s=1}^{t} \tilde{\ell}_s)$) is $O(1)$-approximately log-concave (recall that a measure with density $h$ is said to be $\epsilon$-approximately log-concave if there is a log-concave function $g$ such that $e^{-\epsilon} g \leq h \leq e^{\epsilon} g$). To simplify the discussion in these lectures we will in fact assume that $p_t$ is exactly log-concave.
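To make the definition concrete, here is a toy numerical illustration (our own, not part of the original argument): an unnormalized density $\exp(-(f + \xi))$ with convex $f$ and a perturbation $\|\xi\|_{\infty} \leq \epsilon$ is $\epsilon$-approximately log-concave, with witness $g = e^{-f}$.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-3.0, 3.0, 1001)

f = xs**2                                   # convex potential, so g = exp(-f) is log-concave
eps = 0.1
xi = eps * (2 * rng.random(xs.size) - 1)    # arbitrary perturbation with |xi| <= eps

h = np.exp(-(f + xi))                       # perturbed (unnormalized) density
g = np.exp(-f)                              # log-concave witness

# Sandwich certifying that h is eps-approximately log-concave:
assert np.all(np.exp(-eps) * g <= h) and np.all(h <= np.exp(eps) * g)
print("h is", eps, "-approximately log-concave with witness g = exp(-x^2)")
```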

We note that the above concentration argument makes us lose an extra factor in the regret. Indeed in Part 1 we (informally) tuned the learning rate $\eta$ against the variance of the estimates to obtain the regret bound claimed there (or a better one with even more optimistic calculations). We see that now we are forced to take a learning rate which is smaller than this, which in turn multiplies the regret bound by the corresponding factor.
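To spell out the mechanism (a generic exponential-weights calculation, in our own notation since the constants from Part 1 are not reproduced here): the regret of exponential weights scales as

$$\mathrm{Regret} \lesssim \frac{\log N}{\eta} + \eta \sum_{t=1}^{n} \mathrm{Var}_t,$$

so if concentration forces us to replace the optimal $\eta^* \simeq \sqrt{\log N / \sum_t \mathrm{Var}_t}$ by $\eta = \eta^* / \kappa$ for some $\kappa > 1$, the first term (and hence the bound) inflates by the factor $\kappa$.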

## Finding a Gaussian inside a log-concave

Let $p$ be an isotropic log-concave measure. We will show that a centered Gaussian with covariance of order $\frac{1}{d} \mathrm{I}_d$ is (approximately) convexly dominated by $p$. We note in passing that, going back to the calculations at the end of Part 1, it is easy to see that this factor $\frac{1}{d}$ in the covariance will in turn force us to take $\lambda$ smaller than what we hoped for at the end of Part 1, leading to yet again an extra $\mathrm{poly}(d)$ factor in the regret bound.

In fact we will show that any centered measure supported on a ball of radius $c$, for some small numerical constant $c > 0$, is (exactly) convexly dominated by $p$. This implies the (approximate) Gaussian convex domination we mentioned above, since a Gaussian with covariance of order $\frac{1}{d} \mathrm{I}_d$ puts most of its mass in such a ball (it is easy to deal with the remaining mass but we won't worry about it here).
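A quick simulation of this mass claim (our own sanity check; the scale $0.01$ is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, scale = 500, 0.01                 # dimension and covariance scale/d
X = rng.normal(scale=np.sqrt(scale / d), size=(100_000, d))
radii = np.linalg.norm(X, axis=1)
# ||X|| concentrates around sqrt(scale) = 0.1, far inside a constant-radius ball
print(radii.mean(), np.quantile(radii, 0.999))
```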

Let us fix some convex function $f$. We want to show that $\mathbb{E}_{X \sim q} f(X) \leq \mathbb{E}_{X \sim p} f(X)$, where $q$ is centered and supported on the ball of radius $c$. Since both measures are centered we may add any affine function to $f$ without affecting this inequality, so (subtracting the tangent at $0$) we may legitimately assume that $f(0) = 0$ and $f(x) \geq 0$ for all $x$. By scaling we may also assume that the maximum of $f$ on the centered ball of radius $c$ is $1$, and thus it is enough to show $\mathbb{E}_{X \sim p} f(X) \geq 1$ (indeed $\mathbb{E}_{X \sim q} f(X) \leq 1$ since $q$ is supported on that ball). By convexity $f$ is above the linear function $x \mapsto 1 + \langle g, x - x_0 \rangle$, where $x_0$ is the maximizer of $f$ on the centered ball of radius $c$ and $g$ is a subgradient of $f$ at $x_0$. Note also that $\langle g, x_0 \rangle \geq f(x_0) - f(0) = 1$, so that $|g| \geq 1/c$, and thus it is enough to show (recall that $f \geq 0$):

$$\mathbb{E}_{X \sim p} \max\big(0, \, 1 + \langle g, X - x_0 \rangle \big) \geq 1,$$

where, denoting $t = \langle g / |g|, X \rangle$ (a centered, unit-variance, log-concave real random variable since $p$ is isotropic) and using $\langle g, x_0 \rangle \leq c |g|$ together with $\max(0, 1 + s(t - c)) \geq s \max(0, t - c)$ for $s = |g| \geq 1/c$, this is implied by

$$\mathbb{E} \max(0, t - c) \geq c.$$

It only remains to use the fact that an isotropic real log-concave random variable verifies $\mathbb{E} \max(0, t) \geq c_0$ for some numerical constant $c_0 > 0$ (for instance $c_0 = \frac{1}{2 \sqrt{2}}$ works, since $\mathbb{E} |t| \geq \|t\|_2 / \sqrt{2}$ for log-concave $t$ and $\mathbb{E} \max(0, t) = \frac{1}{2} \mathbb{E} |t|$ for centered $t$). Since $\mathbb{E} \max(0, t - c) \geq \mathbb{E} \max(0, t) - c$, the above display holds true as soon as $c \leq c_0 / 2$.
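As a numerical sanity check (our own, with the arbitrary choice $c = 0.1$), one can verify the final display for a few standard isotropic log-concave laws:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 1_000_000, 0.1

samples = {
    "gaussian": rng.normal(size=n),                        # N(0, 1)
    "laplace": rng.laplace(scale=1 / np.sqrt(2), size=n),  # variance 1
    "uniform": rng.uniform(-np.sqrt(3), np.sqrt(3), n),    # variance 1
}
for name, t in samples.items():
    lhs = np.maximum(0.0, t - c).mean()
    print(f"{name:9s} E max(0, t - c) = {lhs:.3f} >= c = {c}: {lhs >= c}")
```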
