ORF523: the ellipsoid method

In this lecture we describe the wonderful ellipsoid method. Recall that an ellipsoid is a convex set of the form

\displaystyle \mathcal{E} = \{x \in {\mathbb R}^n : (x - c)^{\top} H^{-1} (x-c) \leq 1 \} ,

where {c \in {\mathbb R}^n}, and {H} is a symmetric positive definite matrix. Geometrically {c} is the center of the ellipsoid, and the semi-axes of {\mathcal{E}} are given by the eigenvectors of {H}, with lengths given by the square root of the corresponding eigenvalues.
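
As a quick numerical illustration of this definition, here is a minimal sketch in Python with NumPy. The helper name `in_ellipsoid` and the particular values of {c} and {H} are made up for the example. It tests membership in {\mathcal{E}} and checks that the endpoints of the semi-axes, {c + \sqrt{\lambda_i} v_i} for each eigenpair {(\lambda_i, v_i)} of {H}, lie on the boundary.

```python
import numpy as np

def in_ellipsoid(x, c, H, tol=1e-12):
    """Return True if (x - c)^T H^{-1} (x - c) <= 1, up to a small tolerance."""
    d = np.asarray(x, dtype=float) - c
    # Solve H z = d rather than forming H^{-1} explicitly.
    return float(d @ np.linalg.solve(H, d)) <= 1.0 + tol

# Illustrative ellipse in R^2 (values chosen for the example only).
c = np.array([1.0, -1.0])
H = np.array([[2.5, 1.5],
              [1.5, 2.5]])             # symmetric positive definite, eigenvalues 1 and 4

eigvals, eigvecs = np.linalg.eigh(H)   # eigen-decomposition for symmetric matrices
semi_axis_lengths = np.sqrt(eigvals)   # square roots of the eigenvalues

# Endpoints of the semi-axes: c + sqrt(lambda_i) * v_i, one per eigenvector v_i.
endpoints = [c + np.sqrt(lam) * v for lam, v in zip(eigvals, eigvecs.T)]
# Value of the quadratic form at each endpoint; each should be (numerically) 1.
boundary_values = [float((p - c) @ np.linalg.solve(H, p - c)) for p in endpoints]
```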

 

A geometric lemma

We start with a simple geometric lemma, which is at the heart of the ellipsoid method.

Lemma Let {\mathcal{E} \subset {\mathbb R}^n} be an ellipsoid centered at {0}. For any {w \in {\mathbb R}^n}, there exists an ellipsoid {\mathcal{E}'} such that

\displaystyle \{x \in \mathcal{E} : x^{\top} w \leq 0\} \subset \mathcal{E}' ,

and

\displaystyle \mathrm{vol}(\mathcal{E}') \leq \exp \left(- \frac{1}{2 n} \right) \mathrm{vol}(\mathcal{E}) .

This lemma reads as follows: For any ellipsoid {\mathcal{E}} centered at {0}, there exists another ellipsoid {\mathcal{E}'} which contains the half ellipsoid {\{x \in \mathcal{E} : x^{\top} w \leq 0\}}, and such that the volume of {\mathcal{E}'} is smaller (by a factor {\exp \left(- \frac{1}{2 n} \right)}) than the volume of the original ellipsoid {\mathcal{E}}. Furthermore as we shall see from the proof, there is an explicit analytical formula to construct the ellipsoid {\mathcal{E}'}.

Proof: For {n=1} the result is obvious; in fact we even have {\mathrm{vol}(\mathcal{E}') \leq \frac12 \mathrm{vol}(\mathcal{E})}.

Now let {n \geq 2}, and let us first focus on the case where {\mathcal{E}} is the Euclidean ball, that is {\mathcal{E}= \{x \in {\mathbb R}^n : \|x\|_2 \leq 1\}}. Without loss of generality we assume that {w} is a unit norm vector. By drawing a quick picture, one can see that it makes sense to look for an ellipsoid {\mathcal{E}'} centered at {c= - t w}, with {t \in [0,1]} (presumably {t} will be small), such that one principal direction is {w} (with inverse squared semi-axis {a>0}), and the other principal directions are all orthogonal to {w} (with the same inverse squared semi-axis {b>0}). In other words we are looking for {\mathcal{E}'} of the following form:

\displaystyle \mathcal{E}' = \{x: (x - c)^{\top} H^{-1} (x-c) \leq 1 \}, \; \text{with} \; c = - t w, \; \text{and} \; H^{-1} = a w w^{\top} + b (I_n - w w^{\top} ) .

Now we have to express the constraint that {\mathcal{E}'} should contain the half Euclidean ball {\{x \in \mathcal{E} : x^{\top} w \leq 0\}}. Since we are also looking for {\mathcal{E}'} to be as small as possible, it makes sense to ask for {\mathcal{E}'} to ‘touch’ the Euclidean ball, both at {x = - w}, and at the equator {\partial \mathcal{E} \cap w^{\perp}}. The former condition can be written as:

\displaystyle (- w - c)^{\top} H^{-1} (- w - c) = 1 \Leftrightarrow (t-1)^2 a = 1 ,

while the latter is expressed as:

\displaystyle \forall y \in \partial \mathcal{E} \cap w^{\perp}, (y - c)^{\top} H^{-1} (y - c) = 1 \Leftrightarrow b + t^2 a = 1 .

As one can see from the above two equations, we are still free to choose any value for {t \in [0,1/2)} (the fact that we need {t<1/2} comes from {b=1 - \left(\frac{t}{t-1}\right)^2>0}). Let us simply take the one that minimizes the volume of the resulting ellipsoid! Note that

\displaystyle \frac{\mathrm{vol}(\mathcal{E}')}{\mathrm{vol}(\mathcal{E})} = \frac{1}{\sqrt{a}} \left(\frac{1}{\sqrt{b}}\right)^{n-1} = (1-t) \left(\frac{1}{\sqrt{1 - \left(\frac{t}{1-t}\right)^2}}\right)^{n-1} = \frac{1}{\sqrt{\frac{1}{(1-t)^2}\left (1 - \left(\frac{t}{1-t}\right)^2\right)^{n-1}}} = \frac{1}{\sqrt{f\left(\frac{1}{1-t}\right)}} ,

where {f(h) = h^2 (2 h - h^2)^{n-1}}. Elementary computations show that the maximum of {f} (on {[1,2]}) is attained at {h = 1+ \frac{1}{n}} (which corresponds to {t=\frac{1}{n+1}}), and the value is

\displaystyle \left(1+\frac{1}{n}\right)^2 \left(1 - \frac{1}{n^2} \right)^{n-1} \geq \exp \left(\frac{1}{n} \right),

where the lower bound follows again from elementary computations. Combined with the volume ratio formula above, this gives {\mathrm{vol}(\mathcal{E}') \leq \exp\left(-\frac{1}{2n}\right) \mathrm{vol}(\mathcal{E})}, which concludes the proof for the case where {\mathcal{E}} is the Euclidean ball. The general case can be analyzed very easily by noting that an arbitrary ellipsoid {\mathcal{E}} centered at {0} is the unit ball for a norm derived from some inner product {\langle \cdot, \cdot \rangle}. One can then mimic all the computations described above using this inner product instead of the canonical inner product of {{\mathbb R}^n} (for instance {ww^{\top}} will be replaced by {w \otimes w}, etc.). \Box
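
The construction in the proof is explicit enough to check numerically. The following is a minimal sketch in Python with NumPy for the unit-ball case only; the function name `half_ball_ellipsoid` and the sampling test are illustrative choices, not part of the lecture. It builds {c} and {H^{-1}} from {t = \frac{1}{n+1}} and the two tangency conditions, then evaluates the quadratic form on random points of the half ball.

```python
import numpy as np

def half_ball_ellipsoid(w):
    """Ellipsoid E' from the proof, covering {x : ||x|| <= 1, x^T w <= 0}.

    Returns the center c, the matrix H^{-1}, and vol(E') / vol(unit ball).
    Assumes n >= 2 and w != 0.
    """
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)          # the proof assumes a unit vector
    n = w.size
    t = 1.0 / (n + 1)                  # volume-minimizing shift from the proof
    a = 1.0 / (1.0 - t) ** 2           # from (t - 1)^2 a = 1
    b = 1.0 - t ** 2 * a               # from b + t^2 a = 1
    P = np.outer(w, w)
    H_inv = a * P + b * (np.eye(n) - P)
    vol_ratio = (1.0 / np.sqrt(a)) * (1.0 / np.sqrt(b)) ** (n - 1)
    return -t * w, H_inv, vol_ratio

rng = np.random.default_rng(0)
n = 5
w = rng.standard_normal(n)
c, H_inv, vol_ratio = half_ball_ellipsoid(w)

# Random points of the half ball {x : ||x|| <= 1, x^T w <= 0}.
x = rng.standard_normal((1000, n))
x = x / np.linalg.norm(x, axis=1, keepdims=True) * rng.uniform(0, 1, (1000, 1))
x[x @ w > 0] *= -1.0                   # reflect points into the half ball

# Quadratic form (x - c)^T H^{-1} (x - c); containment means every value is <= 1.
d = x - c
max_form = np.max(np.einsum("ij,jk,ik->i", d, H_inv, d))
```

In this sketch `max_form` should not exceed 1 (up to rounding), and `vol_ratio` should not exceed `np.exp(-1 / (2 * n))`, in line with the lemma.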

 

The ellipsoid method

Let {\mathcal{X} \subset {\mathbb R}^n} be a convex body, and {f : \mathcal{X} \rightarrow [-B,B]} be a continuous and convex function. Let {r, R>0} be such that {\mathcal{X}} is contained in a Euclidean ball of radius {R} and contains a Euclidean ball of radius {r}.

The ellipsoid method (invented in the 1970s by Shor, and Yudin and Nemirovski) produces a sequence of ellipsoids {\mathcal{E}_0, \mathcal{E}_1, \hdots, \mathcal{E}_t}, starting with {\mathcal{E}_0} being the Euclidean ball of radius {R} that contains {\mathcal{X}}. Let {c_0, c_1, \hdots, c_t} be the associated sequence of centers of the ellipsoids. We now describe how to construct {\mathcal{E}_{t+1}} based on {\mathcal{E}_t}. There are two different cases:

Case (a): {c_t \not\in \mathcal{X}}. Then by the Separation Theorem one can find a vector {w_t \in {\mathbb R}^n} such that {\mathcal{X} \subset \{x \in {\mathbb R}^n : c_t^{\top} w_t < x^{\top} w_t\}}. In that case {\mathcal{E}_{t+1}} is the ellipsoid given by our geometric lemma that contains the half ellipsoid {\mathcal{E}_t \cap \{x \in {\mathbb R}^n : c_t^{\top} w_t \leq x^{\top} w_t\}}.

Case (b): {c_t \in \mathcal{X}}. Then one can take a subgradient {w_t} of {f} at {c_t}, which by definition satisfies, for all {x \in \mathcal{X}}:

\displaystyle f(c_t) - f(x) \leq w_t^{\top} (c_t - x) \Leftrightarrow f(x) \geq f(c_t) + w_t^{\top} (x - c_t) .

In particular, note that all points in the half space {\{x \in {\mathbb R}^n : c_t^{\top} w_t < x^{\top} w_t\}} have a function value strictly larger than the value at {c_t}. Thus in that case we choose {\mathcal{E}_{t+1}} to be the ellipsoid given by our geometric lemma that contains the half ellipsoid {\mathcal{E}_t \cap \{x \in {\mathbb R}^n : c_t^{\top} w_t \geq x^{\top} w_t\}}.

Output: Assume that we stop the method after {t} steps. Then we output

\displaystyle x^*_t = \mathrm{argmin}_{c \in \{c_0, \hdots, c_t\} \cap \mathcal{X}} f(c) .
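
The two cases and the output rule can be put together in a short program. The following is a minimal sketch in Python with NumPy, under several assumptions that are not in the lecture: {\mathcal{E}_0} is taken to be centered at the origin, the oracle names (`separation`, `subgradient`) and the toy instance are made up, and `ellipsoid_step` uses the standard closed-form update obtained by carrying the unit-ball construction of the geometric lemma over to a general ellipsoid, which the text above does not write out.

```python
import numpy as np

def ellipsoid_step(c, H, w):
    """Ellipsoid containing {x in E : w^T (x - c) <= 0}, where
    E = {x : (x - c)^T H^{-1} (x - c) <= 1}. Requires n >= 2 and w != 0."""
    n = c.size
    Hw = H @ w
    wHw = float(w @ Hw)
    c_new = c - Hw / ((n + 1) * np.sqrt(wHw))
    H_new = (n**2 / (n**2 - 1.0)) * (H - (2.0 / (n + 1)) * np.outer(Hw, Hw) / wHw)
    return c_new, H_new

def ellipsoid_method(f, subgradient, separation, n, R, steps):
    """Sketch of the method. `separation(c)` returns None when c is in X, and
    otherwise a vector w with c^T w < x^T w for every x in X (Case (a))."""
    c, H = np.zeros(n), R**2 * np.eye(n)   # E_0: ball of radius R, assumed centered at 0
    best_x, best_val = None, np.inf
    for _ in range(steps):
        w = separation(c)
        if w is not None:
            # Case (a): keep {x : w^T x >= w^T c}, i.e. cut along -w.
            c, H = ellipsoid_step(c, H, -w)
            continue
        # Case (b): c is feasible, so record it as a candidate output.
        val = f(c)
        if val < best_val:
            best_x, best_val = c.copy(), val
        g = subgradient(c)
        if not np.any(g):
            break                          # zero subgradient: c minimizes f
        # Keep {x : g^T x <= g^T c}; the other side has larger values of f.
        c, H = ellipsoid_step(c, H, g)
    return best_x, best_val

# Toy instance (all values are illustrative choices):
# X is the unit ball of R^2 (r = 1), E_0 has radius R = 2, f(x) = ||x - p||^2.
p = np.array([2.0, 1.0])
f = lambda x: float(np.sum((x - p) ** 2))
subgradient = lambda x: 2.0 * (x - p)
separation = lambda x: None if np.linalg.norm(x) <= 1.0 else -x

x_best, val_best = ellipsoid_method(f, subgradient, separation, n=2, R=2.0, steps=100)
f_star = (np.linalg.norm(p) - 1.0) ** 2    # exact minimum, attained at p / ||p||
```

The sketch stores {H} rather than {H^{-1}}, so that each update is a rank-one modification costing {O(n^2)} operations.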

Theorem (Yudin and Nemirovski 1972)

\displaystyle f(x^*_t) - \inf_{x \in \mathcal{X}} f(x) \leq \frac{2 B R}{r} \exp\left( - \frac{t}{2 n^2}\right) .

Proof: Let {x^* \in \mathcal{X}} be such that {f(x^*) = \inf_{x \in \mathcal{X}} f(x)}. It is clear from the description of the method that {x^* \in \mathcal{E}_t, \forall t}. Furthermore by the geometric lemma we also have

\displaystyle \mathrm{vol}(\mathcal{E}_t) \leq \exp\left( - \frac{t}{2 n}\right) \mathrm{vol}(\mathcal{E}_0) .

Now for {\epsilon \in [0,1]}, let

\displaystyle \mathcal{X}_{\epsilon} = \{(1-\epsilon) x^* + \epsilon x, x \in \mathcal{X}\}.

Clearly one has:

\displaystyle \mathrm{vol}(\mathcal{X}_{\epsilon}) = \epsilon^n \mathrm{vol}(\mathcal{X}) \geq \epsilon^n \frac{r^n}{R^n} \mathrm{vol}(\mathcal{E}_0) .

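In particular for {\epsilon > \frac{R}{r} \exp\left( - \frac{t}{2 n^2}\right)}, one has {\mathrm{vol}(\mathcal{X}_{\epsilon}) > \mathrm{vol}(\mathcal{E}_t)}. Since {\mathcal{X}_{\epsilon} \subset \mathcal{X} \subset \mathcal{E}_0}, this means that there must exist a time {s \in \{0,\hdots, t-1\}}, and {x_{\epsilon} \in \mathcal{X}_{\epsilon}}, such that {x_{\epsilon} \in \mathcal{E}_{s}} and {x_{\epsilon} \not\in \mathcal{E}_{s+1}}. The only way for this to happen, since {x_{\epsilon} \in \mathcal{X}}, is that at time {s} we were in Case (b) (see definition of the method), that is {c_s \in \mathcal{X}} and {f(c_s) \leq f(x_{\epsilon})}. On the other hand, writing {x_{\epsilon} = (1-\epsilon) x^* + \epsilon x} with {x \in \mathcal{X}}, convexity of {f} together with {|f| \leq B} gives {f(x_{\epsilon}) \leq (1-\epsilon) f(x^*) + \epsilon f(x) \leq f(x^*) + 2 \epsilon B}. Thus {f(x^*_t) \leq f(c_s) \leq f(x^*) + 2 \epsilon B}, and letting {\epsilon} decrease to {\frac{R}{r} \exp\left( - \frac{t}{2 n^2}\right)} concludes the proof. \Box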

 

Discussion

The ellipsoid method is based on three subroutines with the following properties:

  1. Separation algorithm: given {x \in {\mathbb R}^n}, it outputs either that {x} is in {\mathcal{X}}, or if {x \not\in \mathcal{X}} then it outputs a separating hyperplane between {x} and {\mathcal{X}}.
  2. Subgradient algorithm: Given {x \in \mathcal{X}}, it outputs a subgradient of {f} at {x}.
  3. Evaluation algorithm: Given {x \in \mathcal{X}}, it outputs the value of {f} at {x}.

To reach an accuracy of {\epsilon > 0}, it suffices for the ellipsoid method to perform {T_{\epsilon} = 2 n^2 \log \left(\frac{2 B R}{\epsilon r}\right)} iterations. Each iteration results in a call to the separation algorithm, and potentially to the subgradient and evaluation algorithms. Thus to reach an accuracy of {\epsilon} one makes at most {T_{\epsilon}} calls to each subroutine. Furthermore the algorithmic implementation of the geometric lemma takes {O(n^2)} ‘elementary operations’ per iteration, which in total yields a computational complexity of {O(n^4 \log(1/\epsilon))} to reach accuracy {\epsilon} (assuming that a call to a subroutine takes at most {O(n^2)} operations). One can interpret this result as saying that to improve the quality of our solution value {f(x^*_t)} by one digit, one needs a computational effort of {O(n^4)}. In the literature an exponential rate of decrease for the optimization error (such as the one we proved here for the ellipsoid method) is sometimes called a linear rate, because the effort needed grows linearly with the number of digits of accuracy.
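
To make the ‘one digit costs a fixed number of iterations’ reading concrete, here is a small sketch in Python that evaluates {T_{\epsilon}}. The function name `iterations_needed` and the numerical values of {n, B, R, r} are illustrative assumptions, not taken from the lecture.

```python
import math

def iterations_needed(n, B, R, r, eps):
    """Smallest nonnegative integer t with (2BR/r) * exp(-t / (2 n^2)) <= eps."""
    return max(0, math.ceil(2 * n**2 * math.log(2 * B * R / (eps * r))))

# Illustrative values: n = 10, B = 1, R = 10, r = 1.
t_coarse = iterations_needed(10, 1.0, 10.0, 1.0, 1e-3)
t_fine = iterations_needed(10, 1.0, 10.0, 1.0, 1e-4)

# Cost of one extra digit of accuracy: close to 2 n^2 log(10), whatever eps is.
extra_per_digit = t_fine - t_coarse
```

With these values, going from {\epsilon = 10^{-3}} to {\epsilon = 10^{-4}} adds about {2 n^2 \log(10) \approx 460} iterations.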

It is standard to call the subgradient algorithm a {1^{st}}-order oracle (respectively the evaluation algorithm a {0^{th}}-order oracle). These oracles are assumed to be given with the problem, and one usually counts the number of calls to these oracles, but their computational complexity is not directly taken into account. As we shall see later in this course, one can develop an almost complete theory for optimization if one focuses on the complexity in terms of the number of calls to these oracles. Arguably, however, it is more natural to try to understand the computational complexity of optimization, but this is a much more difficult task (as will become clear in the upcoming lectures).
