In this lecture we describe the wonderful ellipsoid method. Recall that an ellipsoid is a convex set of the form
where , and is a symmetric positive definite matrix. Geometrically is the center of the ellipsoid, and the semi-axes of are given by the eigenvectors of , with lengths given by the square root of the corresponding eigenvalues.
A geometric lemma
We start with a simple geometric lemma, which is at the heart of the ellipsoid method.
Lemma Let be an ellipsoid centered at . For any , there exists an ellipsoid such that
This lemma reads as follows: For any ellipsoid centered at , there exists another ellipsoid which contains the half ellipsoid , and such that the volume of is smaller (by a factor ) than the volume of the original ellipsoid . Furthermore as we shall see from the proof, there is an explicit analytical formula to construct the ellipsoid .
Proof: For the result is obvious, in fact we even have
Now let , and let us first focus on the case where is the Euclidean ball, that is . Without loss of generality we assume that is a unit norm vector. By doing a quick picture, one can see that it makes sense to look for an ellipsoid that would be centered at , with (presumably will be small), and such that one principal direction is (with inverse squared semi-axis ), and the other principal directions are all orthogonal to (with the same inverse squared semi-axes ). In other words we are looking for of the following form:
Now we have to express our constraints on the fact that should contain the half Euclidean ball . Since we are also looking for to be as small as possible, it makes sense to ask for to ‘touch’ the Euclidean ball, both at , and at the equator . The former condition can be written as:
while the latter is expressed as:
As one can see from the above two equations, we are still free to choose any value for (the fact that we need comes from ). Let us simply take the one that minimizes the volume of the resulting ellipsoid! Note that
where . Elementary computations show that the maximum of (on ) is attained at (which corresponds to ), and the value is
where the lower bound follows again from elementary computations. This concludes the proof for the case where is the Euclidean ball. The general case can be analyzed very easily by noting that an arbitrary ellipsoid centered at is the unit ball in some norm derived from a dot product . Then one can mimic all the computations described above using this dot product instead of the canonical inner product of (for instance will be replaced by , ect).
The ellipsoid method
Let be a convex body, and be a continuous and convex function. Let be such that is contained in an Euclidean ball of radius (respectively it contains an Euclidean ball of radius ).
The ellipsoid method (invented in the 1970 by Shor, and Yudin and Nemirovski) produces a sequence of ellipsoids , starting with being the Euclidean ball of radius that contains . Let be the associated sequence of centers of the ellipsoids. We describe now how to construct based on . There are two different cases:
Case (a): . Then by the Separation Theorem one can find a vector such that . In that case is the ellipsoid given by our geometric lemma that contains the half ellipsoid .
Case (b): . Then one can use a subgradient of at , denote it by , such that:
In particular remark that all points in the half space have a function value strictly larger than the value at . Thus in that case we choose to be the ellipsoid given by our geometric lemma that contains the half ellipsoid .
Output: Assume that we stop the method after steps. Then we output
Theorem (Yudin and Nemirovski 1972)
Proof: Let be such that . It is clear from the description of the method that . Furthermore by the geometric lemma we also have
Now for , let
Clearly one has:
In particular for , one has . This means that there must exist a time , and , such that and . The only way for this to happen, since , is that at time we were in Case (b) (see definition of the method), that is and . But clearly by convexity of , one also has . This concludes the proof.
The ellipsoid method is based on three subroutines with the following properties:
- Separation algorithm: given , it outputs either that is in , or if then it outputs a separating hyperplane between and .
- Subgradient algorithm: Given , it outputs a subgradient of at .
- Evaluation algorithm: Given , it outputs the value of at .
To reach an accuracy of , the ellipsoid method must perform iterations. Each iteration results in a call to the separation algorithm, and potentially the subgradient and evaluation algorithms. Thus to reach an accuracy of one makes at most calls to each subroutine. Furthermore the algorithmic implementation of the geometric lemma takes ‘elementary operations’, which in total yields a computational complexity of to reach accuracy (assuming that a call to a subroutine takes at most ). One can interpret this results as saying that to improve the quality of our solution value by one digit, one needs a computational effort of . In the literature an exponential rate of decrease for the optimization error (such as the one we proved here for the ellipsoid method) is sometimes call a linear rate, because the effort to increase the number of digits of accuracy is linear.
It is standard to call the subgradient algorithm a –order oracle (respectively the evaluation algorithm a -order oracle). These oracles are assumed to be given with the problem, and one usually counts the number of calls to these oracles, but their computational complexity is not directly taken into account. As we shall see later in this course, one can develop an almost complete theory for optimization if one focuses on the complexity in terms of number of calls to these oracles. Arguably however it is more natural to try to understand the computational complexity of optimization, but this is a much more difficult task (it will become clear with the incoming lectures).