Khachiyan’s ellipsoid method was such a breakthrough that it made it to The New York Times in 1979. The next time optimization theory was featured in The New York Times was only a few years later, in 1984, and it was to report a new breakthrough: Karmarkar’s interior point method to solve LPs. As with the ellipsoid method, it was shown that Karmarkar’s algorithm runs in polynomial time for LPs, but this time the method was also shown to have very good empirical performance! We will now describe the idea behind this new class of algorithms, following the presentation of Nesterov and Nemirovski.
Interior point methods
Consider the following standard optimization problem:

$$\min_{x \in \mathcal{X}} c^{\top} x,$$

where $\mathcal{X} \subset \mathbb{R}^n$ is a closed convex set with non-empty interior. Assume that we can construct a barrier for the set $\mathcal{X}$, that is a real-valued function $F$ defined on $\mathrm{int}(\mathcal{X})$ and such that

$$F(x) \xrightarrow[x \to \partial \mathcal{X}]{} +\infty.$$

Let us extend the function $F$ to $\mathbb{R}^n$ with $F(x) = +\infty$ for $x \notin \mathrm{int}(\mathcal{X})$. Then, instead of the original problem, one could consider the following penalized version, for $t \geq 0$:

$$\min_{x \in \mathbb{R}^n} \; t \, c^{\top} x + F(x).$$
Let $x^*(t)$ be a solution to the above problem; we call the curve $(x^*(t))_{t \geq 0}$ the central path. Note that $x^*(0)$ is simply the minimizer of $F$; we call this point the analytical center of $\mathcal{X}$. Remark also that $x^*(t) \xrightarrow[t \to +\infty]{} x^*$, that is the central path converges to the solution $x^*$ of our original problem.
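To make the central path concrete, here is a small numerical illustration (our own example, not part of the original text): take $\mathcal{X} = [0,1]$, $c = 1$, and the barrier $F(x) = -\log x - \log(1-x)$. The optimality condition $t - 1/x + 1/(1-x) = 0$ is a quadratic in $x$, so $x^*(t)$ has a closed form, and one can check that the central path starts at the analytical center $1/2$ and converges to the solution $0$.

```python
import numpy as np

# Central path for: minimize x over [0,1], with barrier F(x) = -log(x) - log(1-x).
# The optimality condition t - 1/x + 1/(1-x) = 0 rearranges to
# t*x^2 - (t+2)*x + 1 = 0; x*(t) is the root lying in (0, 1).
def x_star(t):
    return ((t + 2) - np.sqrt((t + 2)**2 - 4*t)) / (2*t)

for t in [0.001, 0.1, 1.0, 10.0, 1e6]:
    x = x_star(t)
    assert abs(t - 1/x + 1/(1 - x)) < 1e-6 * (1 + t)  # first-order optimality
    print(f"t = {t:>9}: x*(t) = {x:.6f}")
```

For small $t$ the point $x^*(t)$ is close to the analytical center $1/2$, and as $t \to \infty$ it slides along the central path toward the boundary point $0$.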
The whole idea of interior point methods is to ‘boost’ an efficient locally convergent optimization algorithm, such as Newton’s method, with the following scheme: assume that we have found $x^*(t)$ for some $t > 0$. Then let $t' > t$, and use the locally convergent algorithm, initialized at $x^*(t)$, to find $x^*(t')$. In other words the idea is to move along the central path, using some basic algorithm to take small steps along it. The ‘basic algorithm’ used in interior point methods is Newton’s method, because of its very fast local convergence.
To make this (beautiful) idea more formal we need to take care of several things:
- First we need to describe precisely the region of fast convergence for Newton’s method. This will lead us to define self-concordant functions, which are ‘natural’ functions for Newton’s method.
- Then we need to evaluate how much larger $t'$ can be compared to $t$, so that $x^*(t)$ is still in the region of fast convergence of Newton’s method when optimizing the function $x \mapsto t' c^{\top} x + F(x)$ (which is minimized at $x^*(t')$). This will lead us to define $\nu$-self-concordant barriers.
- Finally, can we easily find the analytical center $x^*(0)$ of $\mathcal{X}$?
Traditional analysis of Newton’s method
In this section we denote $\|\cdot\|$ for both the Euclidean norm on $\mathbb{R}^n$ and the operator norm on matrices (in particular $\|A x\| \leq \|A\| \cdot \|x\|$).
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a $C^2$ function. Newton’s method is a simple iterative optimization scheme. Using a Taylor expansion of $f$ around $x$ one obtains

$$f(x + h) = f(x) + h^{\top} \nabla f(x) + \frac{1}{2} h^{\top} \nabla^2 f(x) \, h + o(\|h\|^2).$$

Thus, starting at $x$, in order to minimize $f$ it seems natural to move in the direction $h$ that minimizes

$$h^{\top} \nabla f(x) + \frac{1}{2} h^{\top} \nabla^2 f(x) \, h.$$

If $\nabla^2 f(x)$ is invertible then the solution to this problem is given by $h = - [\nabla^2 f(x)]^{-1} \nabla f(x)$. Thus Newton’s method simply iterates this idea: starting at some point $x_0 \in \mathbb{R}^n$, it computes

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k). \qquad (1)$$
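As an illustration of the iteration (our own toy example, not from the text), here is Newton’s method on the strongly convex function $f(x_1, x_2) = x_1^2 + x_2^2 + e^{x_1 + x_2}$; a handful of iterations drive the gradient to machine precision:

```python
import numpy as np

# f(x) = x1^2 + x2^2 + exp(x1 + x2): strongly convex, Hessian >= 2*I everywhere.
def grad(x):
    s = np.exp(x[0] + x[1])
    return np.array([2*x[0] + s, 2*x[1] + s])

def hess(x):
    s = np.exp(x[0] + x[1])
    return np.array([[2 + s, s], [s, 2 + s]])

x = np.array([1.0, 0.0])
for _ in range(8):
    # Newton step: x <- x - [Hessian]^{-1} gradient
    x = x - np.linalg.solve(hess(x), grad(x))

print(x, np.linalg.norm(grad(x)))
```

The number of correct digits roughly doubles at every iteration once the iterate is close to the minimizer, which is the quadratic convergence analyzed below.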
While this method can behave arbitrarily badly in general, if started close enough to a strict local minimum of $f$ it can have a very fast convergence:
Theorem 1 (Local quadratic convergence of Newton’s method) Assume that $f$ has a Lipschitz Hessian, that is $\|\nabla^2 f(x) - \nabla^2 f(y)\| \leq M \|x - y\|$. Let $x^*$ be a local minimum of $f$ with strictly positive Hessian, that is $\nabla^2 f(x^*) \succeq \mu I_n$, $\mu > 0$. Suppose that the initial starting point $x_0$ of Newton’s method is such that

$$\|x_0 - x^*\| \leq \frac{\mu}{2 M}.$$

Then Newton’s method is well-defined and converges to $x^*$ at a quadratic rate:

$$\|x_{k+1} - x^*\| \leq \frac{M}{\mu} \|x_k - x^*\|^2.$$
Proof: We use the following simple formula: for $x, h \in \mathbb{R}^n$,

$$\nabla f(x + h) - \nabla f(x) = \int_0^1 \nabla^2 f(x + s h) \, h \; ds.$$

Now note that $\nabla f(x^*) = 0$, and thus with the above formula one obtains

$$\nabla f(x_k) = \int_0^1 \nabla^2 f(x^* + s (x_k - x^*)) \, (x_k - x^*) \; ds,$$

which allows us to write:

$$x_{k+1} - x^* = x_k - x^* - [\nabla^2 f(x_k)]^{-1} \int_0^1 \nabla^2 f(x^* + s (x_k - x^*)) \, (x_k - x^*) \; ds = [\nabla^2 f(x_k)]^{-1} \int_0^1 \big[ \nabla^2 f(x_k) - \nabla^2 f(x^* + s (x_k - x^*)) \big] (x_k - x^*) \; ds.$$

In particular one has

$$\|x_{k+1} - x^*\| \leq \big\| [\nabla^2 f(x_k)]^{-1} \big\| \left( \int_0^1 \big\| \nabla^2 f(x_k) - \nabla^2 f(x^* + s (x_k - x^*)) \big\| \, ds \right) \|x_k - x^*\|.$$

Using the Lipschitz property of the Hessian one immediately obtains that

$$\int_0^1 \big\| \nabla^2 f(x_k) - \nabla^2 f(x^* + s (x_k - x^*)) \big\| \, ds \leq \frac{M}{2} \|x_k - x^*\|.$$
Using again the Lipschitz property of the Hessian, the hypothesis on $x_0$, and an induction hypothesis that $\|x_k - x^*\| \leq \frac{\mu}{2M}$, one has

$$\nabla^2 f(x_k) \succeq \nabla^2 f(x^*) - M \|x_k - x^*\| I_n \succeq \frac{\mu}{2} I_n,$$

so that $\big\| [\nabla^2 f(x_k)]^{-1} \big\| \leq \frac{2}{\mu}$, which concludes the proof (note that the induction hypothesis is preserved, since $\frac{M}{\mu} \left( \frac{\mu}{2M} \right)^2 = \frac{\mu}{4M} \leq \frac{\mu}{2M}$).
Let us try to get some insight into the ‘geometry’ of Newton’s method by looking at the iteration that would be performed using the function $\varphi(y) := f(A y)$, where $A$ is an invertible $n \times n$ matrix. In other words, we view $A$ as a mapping from $\mathbb{R}^n$ (the ‘$y$-space’) to $\mathbb{R}^n$ (the ‘$x$-space’), and the function $\varphi$ is defined on the ‘$y$-space’, so that its evaluation at a point $y$ coincides with the evaluation of $f$ at $x = A y$. Assume that we start Newton’s method on $f$ in the ‘$x$-space’ at $x_0 = A y_0$, and on $\varphi$ in the ‘$y$-space’ at $y_0$. The next iterates are then given by the following formulas:

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k), \qquad y_{k+1} = y_k - [\nabla^2 \varphi(y_k)]^{-1} \nabla \varphi(y_k).$$
Using (1), together with the chain rule ($\nabla \varphi(y) = A^{\top} \nabla f(A y)$ and $\nabla^2 \varphi(y) = A^{\top} \nabla^2 f(A y) A$), one can see in a few lines that $y_k = A^{-1} x_k$, that is Newton’s method is affine invariant: it follows the same trajectory in the ‘$x$-space’ and the ‘$y$-space’!
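This invariance is easy to check numerically. A sketch (our own example, with a test function of our choosing): run Newton’s method on $f$ in the $x$-space and on $\varphi(y) = f(Ay)$ in the $y$-space, using the chain rule $\nabla \varphi(y) = A^{\top} \nabla f(Ay)$ and $\nabla^2 \varphi(y) = A^{\top} \nabla^2 f(Ay) A$, and verify that $x_k = A y_k$ along the whole trajectory:

```python
import numpy as np

# Test function f(x) = x1^2 + x2^2 + exp(x1 + x2) (strongly convex).
def grad(x):
    s = np.exp(x[0] + x[1])
    return np.array([2*x[0] + s, 2*x[1] + s])

def hess(x):
    s = np.exp(x[0] + x[1])
    return np.array([[2 + s, s], [s, 2 + s]])

A = np.array([[2.0, 1.0], [0.5, 3.0]])  # any invertible matrix
y = np.array([0.3, -0.2])
x = A @ y                               # matched starting points
for _ in range(5):
    # Newton in the x-space on f
    x = x - np.linalg.solve(hess(x), grad(x))
    # Newton in the y-space on phi(y) = f(Ay), via the chain rule
    g_phi = A.T @ grad(A @ y)
    H_phi = A.T @ hess(A @ y) @ A
    y = y - np.linalg.solve(H_phi, g_phi)
    assert np.allclose(x, A @ y)        # same trajectory in both spaces
```

In exact arithmetic $A y_{k+1} = A y_k - [\nabla^2 f]^{-1} \nabla f = x_{k+1}$, so the assertion holds at every step (up to floating-point error).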
This fact raises some concerns about the assumptions of the previous analysis: indeed the Lipschitz assumption on the Hessian is not affine invariant. The idea of self-concordance is to improve this situation.
Assume from now on that $f$ is $C^3$. Consider the third-order differential operator $\nabla^3 f(x) : \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. Then the Lipschitz assumption on the Hessian that we used can be written as:

$$\big| \nabla^3 f(x) [h, h, h] \big| \leq M \|h\|_2^3.$$

The left-hand side in the above inequality is affine invariant, but the right-hand side is not. A natural idea to make the right-hand side affine invariant as well is to replace the Euclidean metric by the metric given by the function $f$ itself at $x$, that is:

$$\big| \nabla^3 f(x) [h, h, h] \big| \leq M \left( h^{\top} \nabla^2 f(x) \, h \right)^{3/2}.$$
Definition 2 Let $\mathcal{X}$ be a convex set with non-empty interior, and $f$ a $C^3$ convex function defined on $\mathrm{int}(\mathcal{X})$. Then $f$ is self-concordant (with constant $M$) if for all $x \in \mathrm{int}(\mathcal{X})$, $h \in \mathbb{R}^n$,

$$\big| \nabla^3 f(x) [h, h, h] \big| \leq M \left( h^{\top} \nabla^2 f(x) \, h \right)^{3/2}.$$

We say that $f$ is standard self-concordant if $f$ is self-concordant with constant $M = 2$.
An easy consequence of the definition is that a self-concordant function is a barrier for the set $\mathcal{X}$. The main example to keep in mind of a standard self-concordant function is $f(x) = -\log x$ for $x > 0$. The next definition will be key in order to describe the region of quadratic convergence for Newton’s method on self-concordant functions. Recall first that for an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^n$, its dual norm $\|\cdot\|^*$ is defined by

$$\|h\|^* = \sup_{x \,:\, \|x\| \leq 1} h^{\top} x.$$

In particular, by definition, Hölder’s inequality holds: $|h^{\top} x| \leq \|x\| \cdot \|h\|^*$. Recall also that for a positive definite matrix $A$ and the norm $\|h\|_A = \sqrt{h^{\top} A \, h}$, the dual norm is given by $\|h\|_A^* = \|h\|_{A^{-1}} = \sqrt{h^{\top} A^{-1} h}$.
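A quick numerical sanity check of this last fact (our own sketch, with a randomly generated positive definite matrix): the supremum of $h^{\top} x$ over the ellipsoid $\{x : \|x\|_A \leq 1\}$ equals $\sqrt{h^{\top} A^{-1} h}$, and it is attained at a point proportional to $A^{-1} h$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)          # positive definite matrix
h = rng.standard_normal(3)

# Claimed dual norm of ||.||_A evaluated at h: sqrt(h^T A^{-1} h)
dual = np.sqrt(h @ np.linalg.solve(A, h))

# The supremum is attained at x* = A^{-1} h, rescaled so that ||x*||_A = 1
x_star = np.linalg.solve(A, h)
x_star = x_star / np.sqrt(x_star @ A @ x_star)
assert np.isclose(h @ x_star, dual)

# Random feasible points never exceed it (Hölder's inequality)
for _ in range(1000):
    x = rng.standard_normal(3)
    x = x / np.sqrt(x @ A @ x)       # normalize to ||x||_A = 1
    assert h @ x <= dual + 1e-9
```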
Definition 3 Let $f$ be a standard self-concordant function on $\mathcal{X}$. For $x \in \mathrm{int}(\mathcal{X})$, we say that

$$\lambda_f(x) := \|\nabla f(x)\|_{\nabla^2 f(x)}^* = \sqrt{\nabla f(x)^{\top} [\nabla^2 f(x)]^{-1} \nabla f(x)}$$

is the Newton decrement of $f$ at $x$.
We state the next theorem without a proof.
Theorem 4 Let $f$ be a standard self-concordant function on $\mathcal{X}$, and $x \in \mathrm{int}(\mathcal{X})$ such that $\lambda_f(x) \leq 1/4$; then

$$\lambda_f\!\left( x - [\nabla^2 f(x)]^{-1} \nabla f(x) \right) \leq 2 \lambda_f(x)^2.$$

In other words the above theorem states that, if Newton’s method is initialized at a point $x_0$ such that $\lambda_f(x_0) \leq 1/4$, then the iterates satisfy $\lambda_f(x_{k+1}) \leq 2 \lambda_f(x_k)^2$. Thus, Newton’s region of quadratic convergence for self-concordant functions can be described as a ‘Newton decrement ball’ $\{x : \lambda_f(x) \leq 1/4\}$.
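Theorem 4 can be checked numerically on a one-dimensional example (our own choice of function): $f(x) = -\log x + x^2/2$ is standard self-concordant on $(0, \infty)$, since $|f'''(x)| = 2/x^3 = 2 (1/x^2)^{3/2} \leq 2 f''(x)^{3/2}$:

```python
import numpy as np

# f(x) = -log(x) + x^2/2, standard self-concordant on (0, inf)
f1 = lambda x: -1/x + x                         # f'
f2 = lambda x: 1/x**2 + 1                       # f''
decrement = lambda x: abs(f1(x)) / np.sqrt(f2(x))  # Newton decrement in 1D

x = 1.1                        # the minimizer is x = 1
lam = decrement(x)
assert lam <= 0.25             # inside the 'Newton decrement ball'

x_next = x - f1(x) / f2(x)     # one Newton step
assert decrement(x_next) <= 2 * lam**2   # contraction promised by Theorem 4
print(lam, decrement(x_next))
```

Here $\lambda_f(x) \approx 0.14$ and after a single Newton step the decrement drops below $2 \lambda_f(x)^2 \approx 0.04$, consistent with the theorem.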
Let us go back to our central path idea. Assume that we have computed $x^*(t) = \mathrm{argmin}_x \, F_t(x)$, where $F_t(x) := t \, c^{\top} x + F(x)$ and $F$ is a standard self-concordant function on $\mathcal{X}$. We want to use Newton’s method to compute $x^*(t')$ for some $t' > t$, starting at $x^*(t)$. Thus, using the analysis of the previous section, we need to ensure that

$$\lambda_{F_{t'}}(x^*(t)) \leq 1/4.$$
Since the Hessian of $F_{t'}$ is the Hessian of $F$, one has, with $\|h\|_x^* := \sqrt{h^{\top} [\nabla^2 F(x)]^{-1} h}$,

$$\lambda_{F_{t'}}(x^*(t)) = \big\| t' c + \nabla F(x^*(t)) \big\|_{x^*(t)}^*.$$

Now, by the first-order optimality condition, one has $t c + \nabla F(x^*(t)) = 0$, which yields

$$\lambda_{F_{t'}}(x^*(t)) = (t' - t) \, \|c\|_{x^*(t)}^*.$$
Thus we can take

$$t' = t + \frac{1}{4 \, \|c\|_{x^*(t)}^*},$$

and according to the analysis of the previous section Newton’s method will then converge very rapidly to $x^*(t')$. Now, to obtain a reasonable optimization method overall, we need to make sure that we are increasing $t$ fast enough; that is, we need to control $\|c\|_{x^*(t)}^* = \frac{1}{t} \|\nabla F(x^*(t))\|_{x^*(t)}^*$. In particular we could do this if we had a uniform control over $\lambda_F(x) = \|\nabla F(x)\|_x^*$. There is a natural class of functions for which such a control exists. This is the set of functions $F$ such that

$$\nabla^2 F(x) \succeq \frac{1}{\nu} \nabla F(x) \nabla F(x)^{\top}. \qquad (2)$$
Indeed in that case one has:

$$\lambda_F(x) = \|\nabla F(x)\|_x^* \leq \sqrt{\nu}.$$
Thus a safe choice to increase the penalization parameter is $t' = \left( 1 + \frac{1}{4 \sqrt{\nu}} \right) t$. Note that the condition (2) can also be written as the fact that the function $F$ is $(1/\nu)$-exp-concave, that is $x \mapsto \exp(-F(x)/\nu)$ is concave. We arrive at the following definition.
Definition 5 $F$ is a $\nu$-self-concordant barrier if it is a standard self-concordant function, and it is $(1/\nu)$-exp-concave.
Again the canonical example is the logarithmic function, $F(x) = -\log x$, which is a $1$-self-concordant barrier for the set $(0, +\infty)$. A key property of a $\nu$-self-concordant barrier is that the central path satisfies

$$c^{\top} x^*(t) - \min_{x \in \mathcal{X}} c^{\top} x \leq \frac{\nu}{t}. \qquad (3)$$

We state the next (difficult) theorem without a proof.
Theorem 6 Let $\mathcal{X} \subset \mathbb{R}^n$ be a closed convex set with non-empty interior. There exists $F$ which is a $(c \cdot n)$-self-concordant barrier for $\mathcal{X}$ (where $c$ is some universal constant).
Let us recap what we have seen so far. The basic path-following scheme with a $\nu$-self-concordant barrier $F$ for $\mathcal{X}$ goes as follows. Assume that we can find $x_0$ such that $\lambda_{F_{t_0}}(x_0) \leq 1/4$ for some small value $t_0 > 0$. Then for $k \geq 0$, let

$$t_{k+1} = \left( 1 + \frac{1}{4 \sqrt{\nu}} \right) t_k, \qquad x_{k+1} = x_k - [\nabla^2 F_{t_{k+1}}(x_k)]^{-1} \nabla F_{t_{k+1}}(x_k).$$
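Putting the pieces together, here is a minimal sketch of the path-following scheme (our own toy instance, not from the text): minimize $c^{\top} x$ over the box $[0,1]^2$ with the logarithmic barrier $F(x) = -\sum_i \big( \log x_i + \log(1 - x_i) \big)$, which is a $\nu$-self-concordant barrier with $\nu = 4$ (barrier parameters add over the four log terms), starting from the analytical center:

```python
import numpy as np

# Toy path-following IPM: minimize c^T x over the box [0,1]^2 with the
# log barrier F(x) = -sum(log x_i + log(1-x_i)); nu = 4 for this barrier.
c = np.array([1.0, 2.0])
nu = 4.0

gradF = lambda x: -1/x + 1/(1 - x)
hessF = lambda x: np.diag(1/x**2 + 1/(1 - x)**2)

x = np.array([0.5, 0.5])   # analytical center of the box (grad F = 0 there)
t = 0.25                   # small enough that lambda_{F_t}(x) <= 1/4 here
while t < 1e6:
    t *= 1 + 1/(4 * np.sqrt(nu))          # safe increase of the penalization
    g = t*c + gradF(x)                    # gradient of F_t(x) = t c^T x + F(x)
    x = x - np.linalg.solve(hessF(x), g)  # one Newton step tracks x*(t)

print(x, c @ x)   # close to the optimum min over the box, which is 0 at (0,0)
```

In this run a single Newton step per increase of $t$ suffices to stay on top of the central path, and the final suboptimality is of order $\nu / t \approx 4 \cdot 10^{-6}$, in line with the analysis below.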
Using the machinery that we developed it is not hard to analyze the above scheme. Indeed it is reasonable to expect that $x_k$ will be ‘close’ to $x^*(t_k)$, and thus an inequality similar to (3) should hold for $x_k$. This is indeed the case, up to a multiplicative numerical constant. Furthermore, since one has $t_k = \left( 1 + \frac{1}{4 \sqrt{\nu}} \right)^k t_0$, this leads to the following bound:

$$c^{\top} x_k - \min_{x \in \mathcal{X}} c^{\top} x \lesssim \frac{\nu}{t_k} = \frac{\nu}{\left( 1 + \frac{1}{4 \sqrt{\nu}} \right)^k t_0}.$$
In other words, with $k = O\!\left( \sqrt{\nu} \, \log \frac{\nu}{t_0 \varepsilon} \right)$ iterations one can get an $\varepsilon$-optimal point with the path-following scheme.
At this point we still need to explain how one can get close to an initial point $x^*(t_0)$ of the central path. This can be done with a damped Newton method on $F$, which uses the following iterates:

$$x_{k+1} = x_k - \frac{1}{1 + \lambda_F(x_k)} [\nabla^2 F(x_k)]^{-1} \nabla F(x_k).$$
The complexity of this step is roughly of the same order as the rest of the path-following scheme.
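A minimal sketch of this initialization step (our own example, using the box barrier $F(x) = -\sum_i \big( \log x_i + \log(1 - x_i) \big)$ on $[0,1]^2$): the damping factor $1/(1 + \lambda_F(x))$ keeps the iterates strictly inside the domain even when started close to the boundary, and the method converges to the analytical center $(1/2, 1/2)$:

```python
import numpy as np

# Damped Newton method on the box barrier F(x) = -sum(log x_i + log(1-x_i)):
# each step is the Newton step scaled by 1/(1 + lambda_F(x)).
gradF = lambda x: -1/x + 1/(1 - x)
hessF = lambda x: np.diag(1/x**2 + 1/(1 - x)**2)

x = np.array([0.01, 0.95])   # start close to the boundary of [0,1]^2
for _ in range(50):
    g, H = gradF(x), hessF(x)
    lam = np.sqrt(g @ np.linalg.solve(H, g))      # Newton decrement of F at x
    x = x - np.linalg.solve(H, g) / (1 + lam)     # damped Newton step

print(x)   # converges to the analytical center (0.5, 0.5)
```

Once the Newton decrement drops below $1/4$, the damping factor is close to $1$ and the iteration enters the quadratic-convergence regime analyzed earlier.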