Khachiyan’s ellipsoid method was such a breakthrough that it made it to The New York Times in 1979. The next time that optimization theory was featured in The New York Times was only a few years later, in 1984, and it was to report a new breakthrough: Karmarkar’s interior point method to solve LPs. Similarly to the ellipsoid method, Karmarkar’s algorithm was shown to be a polynomial-time algorithm for LPs, but this time it was also shown that the method had very good empirical performance! We will now describe the idea behind this new class of algorithms, following the presentation of Nesterov and Nemirovski.
Interior point methods
Consider the following standard optimization problem:

$\min_{x \in \mathcal{X}} c^{\top} x,$

where $\mathcal{X} \subset \mathbb{R}^n$ is a closed convex set and $c \in \mathbb{R}^n$. Assume that we can construct a barrier for the set $\mathcal{X}$, that is a real-valued function $F$ defined on $\mathrm{int}(\mathcal{X})$ and such that $F(x) \to +\infty$ as $x \to \partial \mathcal{X}$. Let us extend the function to $\mathbb{R}^n$ with $F(x) = +\infty$ for $x \notin \mathrm{int}(\mathcal{X})$. Then, instead of the original problem, one could consider the following penalized version, for $t \geq 0$:

$\min_{x \in \mathbb{R}^n} \; t \, c^{\top} x + F(x).$
Let $x^*(t)$ be a solution to the above problem; we call the curve $(x^*(t))_{t \geq 0}$ the central path. Note that $x^*(0)$ is simply the minimizer of $F$; we call this point the analytical center of $\mathcal{X}$. Remark also that $x^*(t) \to x^* \in \mathrm{argmin}_{x \in \mathcal{X}} c^{\top} x$ as $t \to +\infty$, that is the central path converges to the solution of our original problem.
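For a concrete illustration (not part of the original presentation, but easy to verify by hand), take $\mathcal{X} = [0,1]$, $c = 1$, and the barrier $F(x) = -\log x - \log(1-x)$. Setting the derivative of $t x + F(x)$ to zero gives $t - 1/x + 1/(1-x) = 0$, i.e. $t x^2 - (t+2) x + 1 = 0$, whose solution in $(0,1)$ is $x^*(t) = \frac{(t+2) - \sqrt{t^2+4}}{2t}$. One checks that $x^*(t) \to 1/2$ as $t \to 0$ (the analytical center of $[0,1]$) and $x^*(t) \to 0$ as $t \to +\infty$ (the solution of the original problem).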
The whole idea of interior point methods is to ‘boost’ an efficient locally convergent optimization algorithm, such as Newton’s method, with the following scheme: Assume that we have found $x^*(t)$ for some $t > 0$. Then let $t' > t$, and use the locally convergent algorithm, initialized at $x^*(t)$, to find $x^*(t')$. In other words the idea is to move along the central path, using some basic algorithm to take small steps along it. The ‘basic algorithm’ used in interior point methods is Newton’s method, because of its very fast local convergence.
To make this (beautiful) idea more formal we need to take care of several things:
- First we need to describe precisely the region of fast convergence for Newton’s method. This will lead us to define self-concordant functions, which are ‘natural’ functions for Newton’s method.
- Then we need to evaluate how much larger $t'$ can be compared to $t$, so that $x^*(t)$ is still in the region of fast convergence of Newton’s method when optimizing the function $x \mapsto t' c^{\top} x + F(x)$ (which is minimized at $x^*(t')$). This will lead us to define $\nu$-self-concordant barriers.
- Can we easily find the analytical center $x^*(0)$ of $\mathcal{X}$?
Traditional analysis of Newton’s method
In this section we denote $\|\cdot\|$ for both the Euclidean norm on $\mathbb{R}^n$ and the operator norm on matrices (in particular $\|A x\| \leq \|A\| \cdot \|x\|$).
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a $C^2$ function. Newton’s method is a simple iterative optimization scheme. Using a Taylor expansion of $f$ around $x$ one obtains

$f(x+h) \approx f(x) + h^{\top} \nabla f(x) + \frac{1}{2} h^{\top} \nabla^2 f(x) \, h.$
Thus, starting at $x$, in order to minimize $f$ it seems natural to move in the direction $h$ that minimizes

$h^{\top} \nabla f(x) + \frac{1}{2} h^{\top} \nabla^2 f(x) \, h.$

If $\nabla^2 f(x)$ is invertible then the solution to this problem is given by $h = - [\nabla^2 f(x)]^{-1} \nabla f(x)$. Thus Newton’s method simply iterates this idea: starting at some point $x_0 \in \mathbb{R}^n$, it computes

$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k).$
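To make the iteration concrete, here is a minimal sketch of Newton’s method in Python (the test function and all names are illustrative additions, not from the original post):

```python
import numpy as np

def newton(grad, hess, x0, n_iters=10):
    """Newton's method: x_{k+1} = x_k - [Hess f(x_k)]^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        # Solve the linear system rather than forming the inverse explicitly.
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Example: f(x) = sum_i (exp(x_i) - x_i), minimized at x = 0.
grad = lambda x: np.exp(x) - 1.0
hess = lambda x: np.diag(np.exp(x))
print(newton(grad, hess, x0=[1.0, -0.5]))  # converges very quickly to ~[0, 0]
```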
While this method can have an arbitrarily bad behavior in general, if started close enough to a strict local minimum of $f$, it can have a very fast convergence:
Theorem 1 (Local quadratic convergence of Newton’s method) Assume that $f$ has a Lipschitz Hessian, that is $\|\nabla^2 f(x) - \nabla^2 f(y)\| \leq M \|x - y\|$. Let $x^*$ be a local minimum of $f$ with strictly positive Hessian, that is $\nabla^2 f(x^*) \succeq \mu I_n$, $\mu > 0$. Suppose that the initial starting point $x_0$ of Newton’s method is such that

$\|x_0 - x^*\| \leq \frac{\mu}{2 M}.$

Then Newton’s method is well-defined and converges to $x^*$ at a quadratic rate:

$\|x_{k+1} - x^*\| \leq \frac{M}{\mu} \|x_k - x^*\|^2.$
Proof: We use the following simple formula, for $x, h \in \mathbb{R}^n$,

$\int_0^1 \nabla^2 f(x + s h) \, h \, ds = \nabla f(x+h) - \nabla f(x).$

Now note that $\nabla f(x^*) = 0$, and thus with the above formula one obtains

$\nabla f(x_k) = \int_0^1 \nabla^2 f(x^* + s (x_k - x^*)) \, (x_k - x^*) \, ds,$

which allows us to write:

$x_{k+1} - x^* = x_k - x^* - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k) = [\nabla^2 f(x_k)]^{-1} \int_0^1 \big( \nabla^2 f(x_k) - \nabla^2 f(x^* + s (x_k - x^*)) \big) (x_k - x^*) \, ds.$

In particular one has

$\|x_{k+1} - x^*\| \leq \big\| [\nabla^2 f(x_k)]^{-1} \big\| \left( \int_0^1 \big\| \nabla^2 f(x_k) - \nabla^2 f(x^* + s (x_k - x^*)) \big\| \, ds \right) \|x_k - x^*\|.$

Using the Lipschitz property of the Hessian one immediately obtains that

$\int_0^1 \big\| \nabla^2 f(x_k) - \nabla^2 f(x^* + s (x_k - x^*)) \big\| \, ds \leq \frac{M}{2} \|x_k - x^*\|.$

Using again the Lipschitz property of the Hessian, the hypothesis on $x_0$, and an induction hypothesis that $\|x_k - x^*\| \leq \frac{\mu}{2M}$, one has $\nabla^2 f(x_k) \succeq \nabla^2 f(x^*) - M \|x_k - x^*\| I_n \succeq \frac{\mu}{2} I_n$, and thus

$\big\| [\nabla^2 f(x_k)]^{-1} \big\| \leq \frac{2}{\mu},$

which concludes the proof.
Self-concordant functions
Let $A$ be an invertible $n \times n$ matrix. We will use the following simple formulas:

$\nabla (f(Ax)) = A^{\top} \nabla f(Ax), \quad \text{and} \quad \nabla^2 (f(Ax)) = A^{\top} \nabla^2 f(Ax) \, A. \qquad (1)$
Let us try to get some insight into the ‘geometry’ of Newton’s method by looking at the iteration that would be performed using the function $\varphi(x) = f(Ax)$. In other words, we view $A$ as a mapping from $\mathbb{R}^n$ (the ‘$x$-space’) to $\mathbb{R}^n$ (the ‘$y$-space’), and the function $\varphi$ is defined on the ‘$x$-space’, so that its evaluation at a point $x$ coincides with the evaluation of $f$ at $y = A x$. Assume that we start Newton’s method on $\varphi$ in the ‘$x$-space’ at $x_0$, and on $f$ in the ‘$y$-space’ at $y_0 = A x_0$. The next iterations are then given by the following formulas:

$x_{k+1} = x_k - [\nabla^2 \varphi(x_k)]^{-1} \nabla \varphi(x_k), \qquad y_{k+1} = y_k - [\nabla^2 f(y_k)]^{-1} \nabla f(y_k).$
Using (1) one can see in a few lines that $y_k = A x_k$ for all $k$, that is Newton’s method is affine invariant: it follows the same trajectory in the ‘$x$-space’ and the ‘$y$-space’!
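Indeed, spelling out the few lines: assuming $y_k = A x_k$, the formulas (1) give $x_{k+1} = x_k - [A^{\top} \nabla^2 f(A x_k) A]^{-1} A^{\top} \nabla f(A x_k) = x_k - A^{-1} [\nabla^2 f(A x_k)]^{-1} \nabla f(A x_k)$, and thus $A x_{k+1} = y_k - [\nabla^2 f(y_k)]^{-1} \nabla f(y_k) = y_{k+1}$.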
This fact raises some concerns about the assumptions of the previous analysis: indeed the Lipschitz assumption on the Hessian is not affine invariant. The idea of self-concordance is to improve this situation.
Assume from now on that $f$ is $C^3$. Consider the third-order differential operator $\nabla^3 f(x) : \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. Then the Lipschitz assumption on the Hessian that we used can be written as:

$|\nabla^3 f(x)[h,h,h]| \leq M \|h\|_2^3.$

The left-hand side in the inequality is affine invariant, but not the right-hand side. A natural idea to make the right-hand side affine invariant is to replace the Euclidean metric by the metric given by the function $f$ itself at $x$, that is:

$\|h\|_x = \sqrt{h^{\top} \nabla^2 f(x) \, h}.$
Definition 2 Let $\mathcal{X} \subset \mathbb{R}^n$ be a convex set with non-empty interior, and $f$ a $C^3$ convex function defined on $\mathrm{int}(\mathcal{X})$. Then $f$ is self-concordant (with constant $M$) if for all $x \in \mathrm{int}(\mathcal{X})$, $h \in \mathbb{R}^n$,

$|\nabla^3 f(x)[h,h,h]| \leq M \|h\|_x^3.$

We say that $f$ is standard self-concordant if $f$ is self-concordant with constant $M = 2$.
An easy consequence of the definition is that a self-concordant function is a barrier for the set $\mathcal{X}$. The main example to keep in mind of a standard self-concordant function is $f(x) = -\log x$ for $x > 0$. The next definition will be key in order to describe the region of quadratic convergence for Newton’s method on self-concordant functions. Recall first that for an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^n$, its dual norm $\|\cdot\|_*$ is defined by

$\|g\|_* = \sup_{h : \|h\| \leq 1} g^{\top} h.$

In particular, by definition, Hölder’s inequality holds: $|g^{\top} h| \leq \|g\|_* \, \|h\|$. Recall also that for a norm of the form $\|h\| = \sqrt{h^{\top} A h}$ with $A \succ 0$, the dual norm is given by $\|g\|_* = \sqrt{g^{\top} A^{-1} g}$.
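(To verify the last formula: by Cauchy–Schwarz, $g^{\top} h = (A^{-1/2} g)^{\top} (A^{1/2} h) \leq \sqrt{g^{\top} A^{-1} g} \, \sqrt{h^{\top} A h}$, with equality when $h \propto A^{-1} g$.)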
Definition 3 Let $f$ be a standard self-concordant function on $\mathcal{X}$. For $x \in \mathrm{int}(\mathcal{X})$, we say that

$\lambda_f(x) := \|\nabla f(x)\|_x^* = \sqrt{\nabla f(x)^{\top} [\nabla^2 f(x)]^{-1} \nabla f(x)}$

is the Newton decrement of $f$ at $x$.
We state the next theorem without a proof.
Theorem 4 Let $f$ be a standard self-concordant function on $\mathcal{X}$, and $x \in \mathrm{int}(\mathcal{X})$ such that $\lambda_f(x) \leq 1/4$, then

$\lambda_f\big(x - [\nabla^2 f(x)]^{-1} \nabla f(x)\big) \leq 2 \, \lambda_f(x)^2.$

In other words the above theorem states that, if initialized at a point $x_0$ such that $\lambda_f(x_0) \leq 1/4$, then Newton’s iterates satisfy $\lambda_f(x_{k+1}) \leq 2 \lambda_f(x_k)^2$. Thus, Newton’s region of quadratic convergence for self-concordant functions can be described as a ‘Newton decrement ball’ $\{x : \lambda_f(x) \leq 1/4\}$.
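For concreteness, here is a small sketch (not from the original post) of how one might evaluate the decrement alongside a Newton step; both reuse the same linear solve, and `grad` and `hess` stand for callables returning $\nabla f(x)$ and $\nabla^2 f(x)$:

```python
import numpy as np

def newton_step_with_decrement(grad, hess, x):
    """Return the Newton iterate and the Newton decrement lambda_f(x)."""
    g = grad(x)
    direction = np.linalg.solve(hess(x), g)   # [Hess f(x)]^{-1} grad f(x)
    lam = np.sqrt(g @ direction)              # lambda_f(x) = ||grad f(x)||_x^*
    return x - direction, lam

# Per Theorem 4, once lam <= 1/4 the decrement is squared at each step,
# so checking lam <= 1/4 certifies entry into the quadratic-convergence region.
```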
$\nu$-self-concordant barriers
Let us go back to our central path idea. Assume that we have computed $x^*(t) \in \mathrm{argmin}_x \; t \, c^{\top} x + F(x)$, where $t > 0$ and $F$ is a standard self-concordant function on $\mathcal{X}$. We want to use Newton’s method to compute $x^*(t')$, starting at $x^*(t)$. Thus, using the analysis of the previous section, we need to ensure that

$\lambda_{F_{t'}}(x^*(t)) \leq 1/4, \quad \text{where } F_{t'}(x) = t' c^{\top} x + F(x).$

Since the Hessian of $F_{t'}$ is the Hessian of $F$, one has with $\nabla F_{t'}(x) = t' c + \nabla F(x)$,

$\lambda_{F_{t'}}(x^*(t)) = \sqrt{\big(t' c + \nabla F(x^*(t))\big)^{\top} [\nabla^2 F(x^*(t))]^{-1} \big(t' c + \nabla F(x^*(t))\big)}.$

Now, by the first-order optimality condition, one has $t c + \nabla F(x^*(t)) = 0$, which yields

$\lambda_{F_{t'}}(x^*(t)) = (t' - t) \sqrt{c^{\top} [\nabla^2 F(x^*(t))]^{-1} c}.$

Thus we can take $t' = t + \frac{1}{4 \sqrt{c^{\top} [\nabla^2 F(x^*(t))]^{-1} c}}$, and according to the analysis of the previous section Newton’s method will converge very rapidly to $x^*(t')$. Now to obtain a reasonable optimization method overall, we need to make sure that we are increasing $t$ fast enough. That is, we need to control $t' / t$. In particular we could do this if we had a uniform control over $t \sqrt{c^{\top} [\nabla^2 F(x^*(t))]^{-1} c} = \|\nabla F(x^*(t))\|_{x^*(t)}^*$ (where we used $t c = - \nabla F(x^*(t))$). There is a natural class of functions for which such a control exists. This is the set of functions such that

$\nabla F(x)^{\top} [\nabla^2 F(x)]^{-1} \nabla F(x) \leq \nu, \quad \text{for all } x. \qquad (2)$

Indeed in that case one has:

$(t' - t) \sqrt{c^{\top} [\nabla^2 F(x^*(t))]^{-1} c} = \frac{t' - t}{t} \, \|\nabla F(x^*(t))\|_{x^*(t)}^* \leq \frac{t' - t}{t} \sqrt{\nu}.$

Thus a safe choice to increase the penalization parameter is $t' = t \left(1 + \frac{1}{4 \sqrt{\nu}}\right)$. Note that the condition (2) can also be written as the fact that the function $F$ is $(1/\nu)$-exp-concave, that is the function $x \mapsto \exp(-F(x)/\nu)$ is concave. Indeed, $\nabla^2 \exp(-F/\nu) = \frac{1}{\nu} \exp(-F/\nu) \left( \frac{1}{\nu} \nabla F \nabla F^{\top} - \nabla^2 F \right)$, so concavity amounts to $\nabla F \nabla F^{\top} \preceq \nu \nabla^2 F$, which is equivalent to (2) (take $h = [\nabla^2 F(x)]^{-1} \nabla F(x)$ for one direction, and use Cauchy–Schwarz for the other). We arrive at the following definition.
Definition 5 A function $F$ is a $\nu$-self-concordant barrier if it is a standard self-concordant function, and it is $(1/\nu)$-exp-concave.
Again the canonical example is the logarithmic function, $F(x) = -\log x$, which is a $1$-self-concordant barrier for the set $\mathcal{X} = \mathbb{R}_+$; indeed $F'(x) = -1/x$ and $F''(x) = 1/x^2$, so $F'(x)^2 / F''(x) = 1$ and (2) holds with $\nu = 1$. We state the next (difficult) theorem without a proof.
Theorem 6 Let $\mathcal{X} \subset \mathbb{R}^n$ be a closed convex set with non-empty interior. There exists $F$ which is a $(c \, n)$-self-concordant barrier for $\mathcal{X}$ (where $c$ is some universal constant).
A key property of $\nu$-self-concordant barriers is the following inequality:

$c^{\top} x^*(t) - \min_{x \in \mathcal{X}} c^{\top} x \leq \frac{\nu}{t}. \qquad (3)$
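To get some intuition for (3) (a short argument, not spelled out in the original text): by first-order optimality $t c = - \nabla F(x^*(t))$, hence for any $x \in \mathcal{X}$, $c^{\top} (x^*(t) - x) = \frac{1}{t} \nabla F(x^*(t))^{\top} (x - x^*(t)) \leq \frac{\nu}{t}$, where the last step uses the standard fact that a $\nu$-self-concordant barrier satisfies $\nabla F(y)^{\top} (x - y) \leq \nu$ for all $x, y$ in the domain.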
Path-following scheme
Let us recap what we have seen so far. The basic path-following scheme with a $\nu$-self-concordant barrier $F$ for $\mathcal{X}$ goes as follows. Assume that we can find $x_0 = x^*(t_0)$ for some small value $t_0 > 0$. Then for $k \geq 0$, let

$t_{k+1} = t_k \left(1 + \frac{1}{4 \sqrt{\nu}}\right), \qquad x_{k+1} = x_k - [\nabla^2 F(x_k)]^{-1} \big( t_{k+1} c + \nabla F(x_k) \big).$

Using the machinery that we developed it is not hard to analyze the above scheme. Indeed it is reasonable to expect that $x_k$ will be ‘close’ to $x^*(t_k)$, and thus an inequality similar to (3) should hold for $x_k$. This is indeed the case, up to a multiplicative numerical constant. Furthermore, since one has $t_k = t_0 \left(1 + \frac{1}{4 \sqrt{\nu}}\right)^k$, this leads to the following bound:

$c^{\top} x_k - \min_{x \in \mathcal{X}} c^{\top} x \lesssim \frac{\nu}{t_0} \left(1 + \frac{1}{4 \sqrt{\nu}}\right)^{-k}.$

In other words, with $k = O\left(\sqrt{\nu} \, \log \frac{\nu}{t_0 \epsilon}\right)$ iterations one can get an $\epsilon$-optimal point with the path-following scheme.
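As a concrete illustration, here is a minimal sketch of the scheme in Python for minimizing $c^{\top} x$ over the box $[0,1]^n$ with the barrier $F(x) = -\sum_i (\log x_i + \log(1 - x_i))$; the starting value $t_0 = 1$ and the barrier parameter $\nu = 2n$ are choices made for this example, not prescriptions from the original text:

```python
import numpy as np

def path_following(c, n_iters=300):
    """Sketch: min c^T x over [0,1]^n with F(x) = -sum(log x + log(1-x))."""
    n = len(c)
    nu = 2 * n                   # each coordinate contributes two log barriers
    x = 0.5 * np.ones(n)         # analytical center of the box, i.e. x*(0)
    t = 1.0                      # assumed small enough to start on the path
    for _ in range(n_iters):
        t *= 1.0 + 1.0 / (4.0 * np.sqrt(nu))      # increase the penalization
        g = t * c - 1.0 / x + 1.0 / (1.0 - x)     # grad of t c^T x + F(x)
        h = 1.0 / x**2 + 1.0 / (1.0 - x)**2       # diagonal Hessian of F
        x = x - g / h                             # one Newton step
    return x

print(path_following(np.array([1.0, -2.0])))  # -> approximately (0, 1)
```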
At this point we still need to explain how one can get close to an initial point $x^*(t_0)$ of the central path. This can be done with a damped Newton method on $F$, which uses the following iterates:

$x_{k+1} = x_k - \frac{1}{1 + \lambda_F(x_k)} [\nabla^2 F(x_k)]^{-1} \nabla F(x_k).$

The complexity of this step is roughly of the same order as the rest of the path-following scheme.
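And a matching sketch of the damped iteration (again illustrative; the gradient and Hessian of $F$ are passed as callables):

```python
import numpy as np

def damped_newton(grad_F, hess_F, x0, tol=0.25, max_iters=500):
    """Damped Newton method for approaching the analytical center x*(0)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_F(x)
        d = np.linalg.solve(hess_F(x), g)
        lam = np.sqrt(g @ d)                # Newton decrement lambda_F(x)
        if lam <= tol:                      # inside the quadratic region
            return x
        x = x - d / (1.0 + lam)             # damped step keeps x in int(X)
    return x
```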
By Chenyi February 17, 2013 - 11:38 am
Excuse me, can you explain what is the exact format of the self-concordance condition in Definition 2, $\nabla^3 f(x)[h,h,h] \leq M \|h\|_x^3$? So taking $f(x) = -\log x$ as an example, does it mean $\left(-\frac{2}{x_k^3}\right)(x_{k+1} - x_k)^3 \leq M \left( \sqrt{\frac{1}{x_k^2}(x_{k+1} - x_k)^2} \right)^3$? But the LHS format is a little bit different from the previous Lipschitz assumption.
Thank you!
By Sebastien Bubeck February 18, 2013 - 1:26 pm
For $x$ and $h$ fixed, let $\varphi(t) = f(x + t h)$. Then, by definition,

$\nabla^3 f(x)[h,h,h] = \varphi'''(0).$

Thus the standard self-concordance assumption is equivalent to

$|\varphi'''(0)| \leq 2 \, \big( \varphi''(0) \big)^{3/2}$

for all $x$ and $h$. In particular if the function $f$ is defined over the reals, then the definition is

$|f'''(x)| \leq 2 \, \big( f''(x) \big)^{3/2}.$

It is immediate that the convex function $f(x) = -\log x$ satisfies this: $f''(x) = 1/x^2$ and $f'''(x) = -2/x^3$, so $|f'''(x)| = 2/x^3 = 2 (f''(x))^{3/2}$ for $x > 0$.