We start with some of the standard setups for the Mirror Descent algorithm we described in the previous post.
Standard setups for Mirror Descent
‘Ball’ setup. The simplest version of Mirror Descent is obtained by taking on The function is a mirror map strongly convex w.r.t. , and furthermore the associated Bregman Divergence is given by . Thus in that case Mirror Descent is exactly equivalent to Projected Subgradient Descent, and the rate of convergence obtained in the previous post recovers our earlier result on Projected Subgradient Descent.
‘Simplex’ setup. A more interesting choice of a mirror map is given by the negative entropy
on . In that case the gradient update can be written equivalently as
The Bregman divergence of this mirror map is given by (also known as the generalized Kullback-Leibler divergence). It is easy to verify that the projection with respect to this Bregman divergence on the simplex amounts to a simple renormalization . Furthermore it is also easy to verify that is -strongly convex w.r.t. on (this result is known as Pinsker’s inequality). Note also that one has and when
The above observations imply that when minimizing on the simplex a function with subgradients bounded in -norm, Mirror Descent with the negative entropy achieves a rate of convergence of order . On the other the regular Subgradient Descent achieves only a rate of order in this case!
‘Spectrahedron’ setup. We consider here functions defined on matrices, and we are interested in minimizing a function on the spectrahedron defined as:
In this setting we consider the mirror map on given by the negative von Neumann entropy:
where are the eigenvalues of . It can be shown that the gradient update can be written equivalently as
where the matrix exponential and matrix logarithm are defined as usual. Furthermore the projection on is a simple trace renormalization.
With highly non-trivial computation one can show that is -strongly convex with respect to the Schatten -norm defined as
It is easy to see that one has and for . In other words the rate of convergence for optimization on the spectrahedron is the same than on the simplex!
Mirror Descent for Saddle Points
We consider now the following saddle-point problem: Let be compact convex sets, and be a continuous function such that is convex and is concave. We are interested in finding such that
(Note that the above equality uses Sion’s minimax theorem from this lecture.) We measure the quality of a candidate solution through the primal/dual gap
The key observation now is that for any , one has with and ,
In particular, denoting we just proved
for some This inequality suggests that one can solve the saddle-point problem with a Mirror Descent in the space using the ‘vector field’ given by . Precisely this gives the following SP Mirror Descent:
Let be a mirror map on (defined on ) and let be a mirror map on (defined on ). We consider now the mirror map on (defined on ).
Let . Then for , let such that
where with and . Finally let
The following result describes the performances of SP Mirror Descent.
Theorem. Assume that is -strongly convex on with respect to , and let . Assume that is -Lipschitz w.r.t. . Define similarly . Then SP Mirror Descent with satisfies
Proof: First we endow with the norm defined by
It is immediate that is -strongly convex with respect to on . Furthermore one can easily check that
and thus the vector field used in the SP mirror descent satisfies:
It suffices now to use the exact same proof than for the standard Mirror Descent starting with: