Khachiyan’s ellipsoid method was such a breakthrough that it made it to The New York Times in 1979. The next time that optimization theory was featured in The New York Times was only a few years later, in 1984, and it was to report a new breakthrough: Karmarkar’s Interior Point Method to solve LPs. Similarly to the ellipsoid method, it was shown that Karmarkar’s algorithm is a polynomial time algorithm for LPs, but this time it was also shown that the method had very good empirical performance! We will now describe the idea behind this new class of algorithms, following the presentation of Nesterov and Nemirovski.

**Interior point methods**

Consider the following standard optimization problem:

$$\min_{x \in \mathcal{X}} c^\top x,$$

where $\mathcal{X} \subset \mathbb{R}^n$ is a closed convex set with non-empty interior.

Assume that we can construct a *barrier* for the set $\mathcal{X}$, that is a real-valued function $F$ defined on $\mathrm{int}(\mathcal{X})$ and such that

$$F(x) \to +\infty \quad \text{as } x \to \partial \mathcal{X}.$$

Let us extend the function $F$ to $\mathbb{R}^n$ with $F(x) = +\infty$ for $x \notin \mathrm{int}(\mathcal{X})$. Then, instead of the original problem, one could consider the following penalized version, for $t \geq 0$:

$$\min_{x \in \mathbb{R}^n} \; t \, c^\top x + F(x).$$

Let $x^*(t)$ be a solution to the above problem; we call the curve $\{x^*(t) : t \geq 0\}$ the *central path*. Note that $x^*(0)$ is simply the minimizer of $F$; we call this point the *analytical center* of $\mathcal{X}$. Remark also that $x^*(t) \to \operatorname{argmin}_{x \in \mathcal{X}} c^\top x$ as $t \to +\infty$, that is, the central path converges to the solution of our original problem.

The whole idea of interior point methods is to ‘boost’ an efficient locally convergent optimization algorithm, such as Newton’s method, with the following scheme: Assume that we have found $x^*(t)$ for some $t \geq 0$. Then let $t' > t$, and use the locally convergent algorithm, initialized at $x^*(t)$, to find $x^*(t')$. In other words the idea is to move along the central path, using some basic algorithm to do small steps along the central path. The ‘basic algorithm’ that is used in interior point methods is Newton’s method, because of its very fast local convergence.

To make this (beautiful) idea more formal we need to take care of several things:

- First we need to describe precisely the region of fast convergence for Newton’s method. This will lead us to define self-concordant functions, which are ‘natural’ functions for Newton’s method.
- Then we need to evaluate how much larger $t'$ can be compared to $t$, so that $x^*(t)$ is still in the region of fast convergence of Newton’s method when optimizing the function $x \mapsto t' c^\top x + F(x)$ (which is minimized at $x^*(t')$). This will lead us to define $\nu$-self-concordant barriers.
- Can we easily find the analytical center of $\mathcal{X}$?

**Traditional analysis of Newton’s method**

In this section we denote $\|\cdot\|$ for both the Euclidean norm on $\mathbb{R}^n$ and the operator norm on matrices (in particular $\|Ax\| \leq \|A\| \cdot \|x\|$).

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a $C^2$ function. Newton’s method is a simple iterative optimization scheme. Using a Taylor expansion of $f$ around $x$ one obtains

$$f(x+h) = f(x) + h^\top \nabla f(x) + \frac{1}{2} h^\top \nabla^2 f(x)\, h + o(\|h\|^2).$$

Thus, starting at $x$, in order to minimize $f$ it seems natural to move in the direction $h$ that minimizes

$$h^\top \nabla f(x) + \frac{1}{2} h^\top \nabla^2 f(x)\, h.$$

If $\nabla^2 f(x)$ is invertible then the solution to this problem is given by $h = - [\nabla^2 f(x)]^{-1} \nabla f(x)$. Thus Newton’s method simply iterates this idea: starting at some point $x_1 \in \mathbb{R}^n$, it computes

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k).$$
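As a concrete (hypothetical) illustration, the iteration above can be written in a few lines of numpy; the test function used here is an arbitrary choice with minimizer at the origin, not anything from the analysis.

```python
import numpy as np

def newton(grad, hess, x0, n_iter=10):
    """Newton's method: x_{k+1} = x_k - [Hessian f(x_k)]^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        # Solve the linear system rather than forming the inverse explicitly.
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Hypothetical test function f(x) = sum_i (x_i^4 + x_i^2 / 2), minimized at 0.
grad = lambda x: 4.0 * x**3 + x
hess = lambda x: np.diag(12.0 * x**2 + 1.0)
x_min = newton(grad, hess, x0=[1.0, -0.5])
```

Started close enough to the minimum, a handful of iterations already reach machine precision, in line with the quadratic rate discussed next.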

While this method can have an arbitrarily bad behavior in general, if started close enough to a strict local minimum of , it can have a very fast convergence:

**Theorem 1 (Local quadratic convergence of Newton’s method)** Assume that $f$ has a Lipschitz Hessian, that is $\|\nabla^2 f(x) - \nabla^2 f(y)\| \leq M \|x - y\|$. Let $x^*$ be a local minimum of $f$ with strictly positive Hessian, that is $\nabla^2 f(x^*) \succeq \mu I_n$, $\mu > 0$. Suppose that the initial starting point $x_1$ of Newton’s method is such that

$$\|x_1 - x^*\| \leq \frac{\mu}{2M}.$$

Then Newton’s method is well-defined and converges to $x^*$ at a quadratic rate:

$$\|x_{k+1} - x^*\| \leq \frac{M}{\mu} \|x_k - x^*\|^2.$$

*Proof:* We use the following simple formula: for $x, y \in \mathbb{R}^n$,

$$\nabla f(x) - \nabla f(y) = \int_0^1 \nabla^2 f(y + s(x-y)) \, (x - y) \, ds.$$

Now note that $\nabla f(x^*) = 0$, and thus with the above formula one obtains

$$\nabla f(x_k) = \int_0^1 \nabla^2 f(x^* + s(x_k - x^*)) \, (x_k - x^*) \, ds,$$

which allows us to write:

$$x_{k+1} - x^* = x_k - x^* - [\nabla^2 f(x_k)]^{-1} \int_0^1 \nabla^2 f(x^* + s(x_k - x^*)) \, (x_k - x^*) \, ds = [\nabla^2 f(x_k)]^{-1} \int_0^1 \big[ \nabla^2 f(x_k) - \nabla^2 f(x^* + s(x_k - x^*)) \big] (x_k - x^*) \, ds.$$

In particular one has

$$\|x_{k+1} - x^*\| \leq \big\| [\nabla^2 f(x_k)]^{-1} \big\| \left( \int_0^1 \big\| \nabla^2 f(x_k) - \nabla^2 f(x^* + s(x_k - x^*)) \big\| \, ds \right) \|x_k - x^*\|.$$

Using the Lipschitz property of the Hessian one immediately obtains that

$$\int_0^1 \big\| \nabla^2 f(x_k) - \nabla^2 f(x^* + s(x_k - x^*)) \big\| \, ds \leq \int_0^1 M (1-s) \|x_k - x^*\| \, ds = \frac{M}{2} \|x_k - x^*\|.$$

Using again the Lipschitz property of the Hessian, the hypothesis on $x_1$, and an induction hypothesis that $\|x_k - x^*\| \leq \frac{\mu}{2M}$, one has

$$\nabla^2 f(x_k) \succeq \nabla^2 f(x^*) - M \|x_k - x^*\| I_n \succeq \frac{\mu}{2} I_n, \quad \text{and thus} \quad \big\| [\nabla^2 f(x_k)]^{-1} \big\| \leq \frac{2}{\mu},$$

which concludes the proof (note that the induction hypothesis is maintained, since $\frac{M}{\mu} \|x_k - x^*\|^2 \leq \frac{\mu}{4M} \leq \frac{\mu}{2M}$).

**Self-concordant functions**

Let $A$ be an $n \times n$ invertible matrix. We will use the following simple formulas:

$$\nabla (f \circ A)(x) = A^\top \nabla f(Ax), \qquad \nabla^2 (f \circ A)(x) = A^\top \nabla^2 f(Ax) \, A. \qquad (1)$$

Let us try to get some insight into the ‘geometry’ of Newton’s method by looking at the iterations that would be performed using the function $\varphi(y) = f(Ay)$. In other words, we view $A$ as a mapping from the ‘$y$-space’ to the ‘$x$-space’, and the function $\varphi$ is defined on the ‘$y$-space’, so that its evaluation at a point $y$ coincides with the evaluation of $f$ at $x = Ay$. Assume that we start Newton’s method on $f$ in the ‘$x$-space’ at $x_1$, and on $\varphi$ in the ‘$y$-space’ at $y_1 = A^{-1} x_1$. The next iterates are then given by the following formulas:

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k), \qquad y_{k+1} = y_k - [\nabla^2 \varphi(y_k)]^{-1} \nabla \varphi(y_k).$$

Using (1) one can see in a few lines that $y_{k+1} = A^{-1} x_{k+1}$, that is Newton’s method is *affine invariant*: it follows the same trajectory in the ‘$x$-space’ and the ‘$y$-space’!

This fact casts some concerns on the assumptions of the previous analysis: indeed the Lipschitz assumption on the Hessian is not affine invariant. The idea of self-concordance is to improve this situation.

Assume from now on that $f$ is $C^3$. Consider the third order differential operator $\nabla^3 f(x) : \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. Then the Lipschitz assumption on the Hessian that we used can be written as:

$$\nabla^3 f(x)[h, h, h] \leq M \|h\|_2^3.$$

The left-hand side in the inequality is affine invariant, but the right-hand side is not. A natural idea to make the right-hand side affine invariant is to replace the Euclidean metric by the local metric given by the function $f$ itself at $x$, that is:

$$\|h\|_x := \sqrt{h^\top \nabla^2 f(x) \, h}.$$

**Definition 2** Let $\mathcal{X}$ be a convex set with non-empty interior, and $f$ a $C^3$ convex function defined on $\mathrm{int}(\mathcal{X})$. Then $f$ is self-concordant (with constant $M$) if for all $x \in \mathrm{int}(\mathcal{X})$, $h \in \mathbb{R}^n$,

$$\nabla^3 f(x)[h, h, h] \leq M \|h\|_x^3.$$

We say that $f$ is standard self-concordant if $f$ is self-concordant with constant $M = 2$.

An easy consequence of the definition is that a self-concordant function is a barrier for the set $\mathcal{X}$. The main example to keep in mind of a standard self-concordant function is $f(x) = -\log x$ for $x > 0$. The next definition will be key in order to describe the region of quadratic convergence for Newton’s method on self-concordant functions. Recall first that for an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^n$, its dual norm $\|\cdot\|^*$ is defined by

$$\|g\|^* = \sup_{h : \|h\| \leq 1} g^\top h.$$

In particular, by definition, Hölder’s inequality holds: $g^\top h \leq \|g\|^* \cdot \|h\|$. Recall also that for a norm of the form $\|h\| = \sqrt{h^\top A h}$ with $A \succ 0$, the dual norm is given by $\|g\|^* = \sqrt{g^\top A^{-1} g}$.
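The dual-norm formula can be checked numerically; a small sketch (the positive definite matrix $A$ and vector $g$ below are arbitrary illustration data) verifies that the supremum defining the dual norm is attained at a vector proportional to $A^{-1} g$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                          # a positive definite matrix

norm = lambda h: np.sqrt(h @ A @ h)                  # ||h||   = sqrt(h^T A h)
dual = lambda g: np.sqrt(g @ np.linalg.solve(A, g))  # ||g||*  = sqrt(g^T A^{-1} g)

g = rng.standard_normal(n)
h_star = np.linalg.solve(A, g)
h_star = h_star / norm(h_star)   # candidate maximizer of g^T h over ||h|| <= 1
attained = g @ h_star            # should equal dual(g) up to rounding
```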

**Definition 3** Let $f$ be a standard self-concordant function on $\mathcal{X}$. For $x \in \mathrm{int}(\mathcal{X})$, we say that

$$\lambda_f(x) := \|\nabla f(x)\|_x^* = \sqrt{\nabla f(x)^\top [\nabla^2 f(x)]^{-1} \nabla f(x)}$$

is the *Newton decrement* of $f$ at $x$.

We state the next theorem without a proof.

**Theorem 4** Let $f$ be a standard self-concordant function on $\mathcal{X}$, and $x \in \mathrm{int}(\mathcal{X})$ such that $\lambda_f(x) \leq 1/4$; then

$$\lambda_f\big(x - [\nabla^2 f(x)]^{-1} \nabla f(x)\big) \leq 2 \, \lambda_f(x)^2.$$

In other words the above theorem states that, if initialized at a point $x_1$ such that $\lambda_f(x_1) \leq 1/4$, then Newton’s iterates satisfy $\lambda_f(x_{k+1}) \leq 2 \lambda_f(x_k)^2$. Thus, Newton’s region of quadratic convergence for self-concordant functions can be described as a ‘Newton decrement ball’ $\{x : \lambda_f(x) \leq 1/4\}$.
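This quadratic decrease of the decrement can be observed numerically; in the sketch below the function $f(x) = c^\top x - \sum_i \log x_i$ is standard self-concordant on the positive orthant, and the vector $c$ and starting point are arbitrary illustration choices.

```python
import numpy as np

def newton_decrement(grad, hess, x):
    """lambda_f(x) = sqrt( grad f(x)^T [hess f(x)]^{-1} grad f(x) )."""
    g = grad(x)
    return np.sqrt(g @ np.linalg.solve(hess(x), g))

# f(x) = c^T x - sum_i log x_i, minimized at x_i = 1 / c_i.
c = np.array([1.0, 2.0, 4.0])
grad = lambda x: c - 1.0 / x
hess = lambda x: np.diag(1.0 / x**2)

x = 1.05 / c            # start inside the Newton decrement ball (lambda <= 1/4)
lams = []
for _ in range(4):
    lams.append(newton_decrement(grad, hess, x))
    x = x - np.linalg.solve(hess(x), grad(x))   # plain Newton step
```

Successive decrements decrease quadratically, as predicted by Theorem 4.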

**$\nu$-self-concordant barriers**

Let us go back to our central path idea. Assume that we have computed $x^*(t) = \operatorname{argmin}_x F_t(x)$, where $F_t(x) = t \, c^\top x + F(x)$ and $F$ is a standard self-concordant function on $\mathcal{X}$. We want to use Newton’s method to compute $x^*(t')$, starting at $x^*(t)$. Thus, using the analysis of the previous section, we need to ensure that

$$\lambda_{F_{t'}}(x^*(t)) \leq 1/4.$$

Since the Hessian of $F_{t'}$ is the Hessian of $F$, one has with $x = x^*(t)$,

$$\lambda_{F_{t'}}(x) = \|t' c + \nabla F(x)\|_x^*.$$

Now, by the first-order optimality condition, one has $t c + \nabla F(x) = 0$, which yields

$$\lambda_{F_{t'}}(x) = (t' - t) \|c\|_x^* = \frac{t' - t}{t} \|\nabla F(x)\|_x^*.$$

Thus we can take

$$t' = t \left( 1 + \frac{1}{4 \|\nabla F(x^*(t))\|_{x^*(t)}^*} \right),$$

and according to the analysis of the previous section Newton’s method will converge very rapidly to $x^*(t')$. Now to obtain a reasonable optimization method overall, we need to make sure that we are increasing $t$ fast enough; that is, we need to control the ratio $t'/t$. In particular we could do this if we had a uniform control over $\|\nabla F(x)\|_x^*$. There is a natural class of functions for which such a control exists. This is the set of functions $F$ such that

$$\nabla^2 F(x) \succeq \frac{1}{\nu} \nabla F(x) \nabla F(x)^\top. \qquad (2)$$

Indeed in that case, applying (2) to the vector $h = [\nabla^2 F(x)]^{-1} \nabla F(x)$, one has:

$$\big( \|\nabla F(x)\|_x^* \big)^2 = \nabla F(x)^\top [\nabla^2 F(x)]^{-1} \nabla F(x) \leq \nu.$$

Thus a safe choice to increase the penalization parameter is $t' = t \left( 1 + \frac{1}{4 \sqrt{\nu}} \right)$. Note that the condition (2) can also be written as the fact that the function $F$ is $\nu$-exp-concave, that is $x \mapsto \exp(-F(x)/\nu)$ is concave. We arrive at the following definition.

**Definition 5** $F$ is a $\nu$-self-concordant barrier if it is a standard self-concordant function, and it is $\nu$-exp-concave.

Again the canonical example is the logarithmic function, $F(x) = -\log x$, which is a $1$-self-concordant barrier for the set $(0, +\infty)$. We state the next (difficult) theorem without a proof.

**Theorem 6** Let $\mathcal{X} \subset \mathbb{R}^n$ be a closed convex set with non-empty interior. There exists $F$ which is a $(c \cdot n)$-self-concordant barrier for $\mathcal{X}$ (where $c$ is some universal constant).

A key property of $\nu$-self-concordant barriers is the following inequality:

$$c^\top x^*(t) - \min_{x \in \mathcal{X}} c^\top x \leq \frac{\nu}{t}. \qquad (3)$$

**Path-following scheme**

Let us recap what we have seen so far. The basic path-following scheme with a $\nu$-self-concordant barrier $F$ for $\mathcal{X}$ goes as follows. Assume that we can find $x^*(t_0)$ for some small value $t_0 > 0$. Then for $k \geq 0$, let

$$t_{k+1} = t_k \left( 1 + \frac{1}{4 \sqrt{\nu}} \right),$$

and let $x_{k+1}$ be the result of one step of Newton’s method on $F_{t_{k+1}}$, initialized at $x_k$.

Using the machinery that we developed it is not hard to analyze the above scheme. Indeed it is reasonable to expect that $x_k$ will be ‘close’ to $x^*(t_k)$, and thus an inequality similar to (3) should hold for $x_k$. This is indeed the case, up to a multiplicative numerical constant. Furthermore, since $t_k = t_0 \left( 1 + \frac{1}{4 \sqrt{\nu}} \right)^k$, this leads to the following bound:

$$c^\top x_k - \min_{x \in \mathcal{X}} c^\top x \lesssim \frac{\nu}{t_0} \left( 1 + \frac{1}{4 \sqrt{\nu}} \right)^{-k}.$$

In other words, with $k = O\!\left( \sqrt{\nu} \, \log \frac{\nu}{t_0 \varepsilon} \right)$ iterations one can get an $\varepsilon$-optimal point with the path-following scheme.
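A minimal numerical sketch of path-following, for the toy problem $\min c^\top x$ over the box $[0,1]^2$ with the log-barrier $F(x) = -\sum_i (\log x_i + \log(1 - x_i))$, which is a $\nu$-self-concordant barrier with $\nu = 4$. For robustness this sketch takes a few damped Newton steps per update of $t$ instead of the single step used in the analysis; the objective $c$, the value of $t_0$, and the iteration counts are all illustrative choices.

```python
import numpy as np

c = np.array([1.0, 2.0])
nu = 4.0  # one unit of self-concordance per log term in the barrier

grad_F = lambda x: -1.0 / x + 1.0 / (1.0 - x)
hess_F = lambda x: np.diag(1.0 / x**2 + 1.0 / (1.0 - x) ** 2)

def newton_steps(t, x, n_steps=5):
    """A few damped Newton steps on F_t(x) = t * c^T x + F(x)."""
    for _ in range(n_steps):
        g = t * c + grad_F(x)
        H = hess_F(x)
        lam = np.sqrt(g @ np.linalg.solve(H, g))     # Newton decrement
        x = x - np.linalg.solve(H, g) / (1.0 + lam)  # damped step stays in (0,1)^2
    return x

t = 0.5                             # small initial penalization parameter
x = np.array([0.5, 0.5])            # analytical center of the box
for _ in range(200):
    t *= 1.0 + 1.0 / (4.0 * np.sqrt(nu))   # t' = t (1 + 1/(4 sqrt(nu)))
    x = newton_steps(t, x)

gap = c @ x                         # optimal value is 0, attained at the corner (0,0)
```

The final gap is far below $\nu / t$, consistent with the bound above.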

At this point we still need to explain how one can get close to an initial point $x^*(t_0)$ of the central path. This can be done with a *damped Newton method* on $F$, which uses the following iterates:

$$x_{k+1} = x_k - \frac{1}{1 + \lambda_F(x_k)} [\nabla^2 F(x_k)]^{-1} \nabla F(x_k).$$

The complexity of this step is roughly of the same order as the rest of the path-following scheme.
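The damped iteration can be sketched as follows, here used to find the analytical center of the box $[0,1]^3$ (whose center is at $(1/2, 1/2, 1/2)$ by symmetry); the box, the starting point, and the iteration count are arbitrary illustration choices.

```python
import numpy as np

# Log-barrier of the box [0,1]^3: F(x) = -sum_i (log x_i + log(1 - x_i)).
grad_F = lambda x: -1.0 / x + 1.0 / (1.0 - x)
hess_F = lambda x: np.diag(1.0 / x**2 + 1.0 / (1.0 - x) ** 2)

x = np.array([0.9, 0.2, 0.7])   # any interior starting point works
for _ in range(30):
    g = grad_F(x)
    H = hess_F(x)
    lam = np.sqrt(g @ np.linalg.solve(H, g))     # Newton decrement lambda_F(x)
    # Damped step: the 1/(1 + lambda) factor keeps the iterate in the domain.
    x = x - np.linalg.solve(H, g) / (1.0 + lam)
```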

## By Chenyi February 17, 2013 - 11:38 am

Excuse me, can you explain what is exact format of Definition 2 self-concordant "delta^3 f(x)[h,h,h]<=M||h||^3_x", so take f(x)=-log x as an example, does it means "(-2/x^3_k)*(x_(k+1)-x_k)^3<=M*sqrt(1/x^2_k*(x_(k+1)-x_k)^2)^3" ? But the LHS format is a little bit different from the previous Lipschitz assumption.

Thank you!

## By Sebastien Bubeck February 18, 2013 - 1:26 pm

For $x$ and $h$ fixed, let $\varphi(t) = f(x + t h)$. Then, by definition,

$$\varphi''(t) = h^\top \nabla^2 f(x + t h) \, h = \|h\|_{x + t h}^2, \qquad \varphi'''(t) = \nabla^3 f(x + t h)[h, h, h].$$

Thus the standard self-concordant assumption is equivalent to $\varphi'''(0) \leq 2 \, \varphi''(0)^{3/2}$ for all $x$ and $h$.

In particular if the function is defined over the reals, then the definition is $|f'''(x)| \leq 2 f''(x)^{3/2}$. It is immediate that the convex function $f(x) = -\log x$ satisfies this, since $f''(x) = 1/x^2$ and $f'''(x) = -2/x^3$.