ORF523: IPMs for LPs and SDPs

We have seen that, roughly, the complexity of Interior Point Methods for minimizing a linear function over a convex set {\mathcal{X}} (with non-empty interior) equipped with a {\nu}-self-concordant barrier {F} is of order {O\left(M \sqrt{\nu} \log \frac{\nu}{\epsilon} \right)}, where {M} is the cost of computing the Newton direction (which can be done by computing and inverting the Hessian of {F}). Thus the efficiency of the method is directly related to the form of the self-concordant barrier that we can construct for {\mathcal{X}}. A few rules allow us to construct simple self-concordant barriers for the important cases of LPs and SDPs:
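Before turning to these rules, it may help to see the outer loop this complexity bound refers to. Below is a minimal numpy sketch of a short-step path-following scheme; it assumes a starting point that is approximately centered for {t=1}, user-supplied oracles F_grad and F_hess for the barrier, and the update {t \leftarrow (1 + \frac{1}{13 \sqrt{\nu}}) t}, which is one standard choice of constants (the names and constants here are illustrative, not part of the lecture):

```python
import numpy as np

def short_step_ipm(c, x, F_grad, F_hess, nu, eps):
    """Sketch of short-step path following for min c^T x over X.

    F_grad/F_hess are oracles for the nu-self-concordant barrier F;
    each iteration takes one Newton step on F_t(x) = t c^T x + F(x).
    """
    t = 1.0  # assumes x is (approximately) centered for t = 1
    while nu / t > eps:  # on the central path the gap is at most nu / t
        t *= 1.0 + 1.0 / (13.0 * np.sqrt(nu))
        g = t * c + F_grad(x)          # gradient of F_t at x
        H = F_hess(x)                  # Hessian of F_t = Hessian of F
        x = x - np.linalg.solve(H, g)  # Newton step: this is the cost M
    return x
```

The while loop runs {O\left(\sqrt{\nu} \log \frac{\nu}{\epsilon}\right)} times and each iteration costs {M}, which gives the bound stated above.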

Proposition 7 Let {F_1} be a {\nu_1}-self-concordant barrier on {\mathcal{X}_1}, and {F_2} a {\nu_2}-self-concordant barrier on {\mathcal{X}_2}. Then {F_1 + F_2} is a {(\nu_1+\nu_2)}-self-concordant barrier on {\mathcal{X}_1 \cap \mathcal{X}_2}.


Proposition 8 Let {\mathcal{A}} be a linear map from {{\mathbb R}^n} to {{\mathbb R}^m}. Let {F} be a {\nu}-self-concordant barrier on {\mathcal{X} \subset {\mathbb R}^m}. Then {F \circ \mathcal{A}} is a {\nu}-self-concordant barrier on {\mathcal{A}^{-1}(\mathcal{X}) = \{x \in {\mathbb R}^n : \mathcal{A}(x) \in \mathcal{X}\} \subset {\mathbb R}^n}.


Proposition 9 {F(x) = - \sum_{i=1}^n \log x_i} is an {n}-self-concordant barrier on {{\mathbb R}_{++}^n}, and {F(X) = - \log \mathrm{det}(X)} is an {n}-self-concordant barrier on {\mathbb{S}_{++}^n}.

Note that the only non-trivial statement in the above propositions is the fact that {- \log \mathrm{det}(X)} is an {n}-self-concordant barrier on {\mathbb{S}_{++}^n}.
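For concreteness, here is a small numpy sketch of these two barriers together with their gradients, which satisfy {\nabla F(x) = - (1/x_1, \hdots, 1/x_n)} and {\nabla F(X) = - X^{-1}} (the function names are mine, and the code is only a sketch):

```python
import numpy as np

def log_barrier(x):
    """F(x) = -sum_i log x_i on the positive orthant."""
    return -np.sum(np.log(x))

def log_barrier_grad(x):
    return -1.0 / x  # componentwise: (grad F(x))_i = -1/x_i

def logdet_barrier(X):
    """F(X) = -log det X on the positive definite cone."""
    sign, logdet = np.linalg.slogdet(X)  # numerically safer than log(det(X))
    assert sign > 0, "X must be positive definite"
    return -logdet

def logdet_barrier_grad(X):
    return -np.linalg.inv(X)  # grad F(X) = -X^{-1}
```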

Let us see how to use these rules to construct a self-concordant barrier for LPs. First consider the following LP, which does not contain equality constraints:

\displaystyle \min_{x \in {\mathbb R}^n : A x < b} c^{\top} x, \ \text{where} \ A \in {\mathbb R}^{m \times n}, b \in {\mathbb R}^m .

We need to construct a self-concordant barrier on {\{x \in {\mathbb R}^n : A x < b\}}. Consider the linear map {\mathcal{A}(x) = b - A x}; then clearly {\mathcal{A}^{-1}({\mathbb R}^m_{++}) = \{x \in {\mathbb R}^n : A x < b\}}, and thus using the composition rule and the fact that {F(x) = - \sum_{i=1}^m \log x_i} is an {m}-self-concordant barrier on {{\mathbb R}_{++}^m}, we obtain that {F \circ \mathcal{A} (x) = - \sum_{i=1}^m \log(b_i - A_i x)} is an {m}-self-concordant barrier on {\{x \in {\mathbb R}^n : A x < b\}}. Computing the Hessian takes {O(m n^2)} operations, and inverting it takes {O(n^3)}. Thus the total complexity of IPM for this problem is {\tilde{O}(\max(m,n) \sqrt{m} n^{2})}. In particular, compared to the ellipsoid method we have a speed-up of {O(n/\sqrt{m})}, and thus (at least theoretically) IPM is more interesting than the ellipsoid method for this problem when {m=O(n^2)}.
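With {s = b - Ax > 0} the derivatives of this barrier take the explicit form {A^{\top} (1/s)} for the gradient and {A^{\top} \mathrm{diag}(1/s^2) A} for the Hessian, which makes the {O(m n^2)} Hessian cost apparent. A minimal numpy sketch (the function name is mine):

```python
import numpy as np

def lp_barrier_oracle(A, b, x):
    """Gradient and Hessian of F(x) = -sum_i log(b_i - a_i^T x)."""
    s = b - A @ x
    assert np.all(s > 0), "x must be strictly feasible"
    grad = A.T @ (1.0 / s)                    # A^T (1/s): O(m n)
    hess = A.T @ ((1.0 / s**2)[:, None] * A)  # A^T diag(1/s^2) A: O(m n^2)
    return grad, hess
```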


Why did we restrict ourselves to LPs without equality constraints? Because in that case the set {\{x \in {\mathbb R}^n : A x < b\}} is either empty or has a non-empty interior in {{\mathbb R}^n}, and thus the whole theory that we developed above applies directly. If we consider an LP which may contain equality constraints, that is, a constraint set of the form {\mathcal{X} = \{x \in {\mathbb R}^n : A x \leq b\}}, then {\mathcal{X}} might have an empty interior in {{\mathbb R}^n}. In that case the trick is to consider the affine subspace defined by the equality constraints, and to do all the computations in that subspace (this will change the evaluation of the Newton direction!). Let us see more precisely how this works. We now consider the general LP:

\displaystyle \min_{x \in {\mathbb R}^n : A x \leq b} c^{\top} x, \ \text{where} \ A \in {\mathbb R}^{m \times n}, b \in {\mathbb R}^m . \ \ \ \ \ (4)

Clearly, by splitting {x} into its positive and negative parts and introducing one slack variable per inequality, this problem can be rewritten in the following form:

\displaystyle \min_{x \in {\mathbb R}_+^{n} : A x = b} c^{\top} x, \ \text{where} \ A \in {\mathbb R}^{m \times n}, m < n \; \text{and rows of} \; A \; \text{linearly independent}, b \in {\mathbb R}^m . \ \ \ \ \ (5)

Note that {n} (respectively {m}) in (5) corresponds to {2n+m} (respectively to {m}) in (4). Now to solve (5) we can use the restriction of the function {F(x) = - \sum_{i=1}^{n} \log x_i} to the affine subspace {\mathcal{L}=\{x \in {\mathbb R}^{n}: A x = b\}}, which is an {n}-self-concordant barrier on {\{x\in {\mathbb R}^{n}_+ : A x = b\}}. To do everything in a clean way, we should reparametrize {\mathcal{L}} as {{\mathbb R}^{\mathrm{dim}(\mathcal{L})}} and compute the gradients and Hessians of {F_t} in this reparametrization. While this would be cleaner, there is an equivalent way to proceed which is more convenient for implementation. The idea is that when we are looking for a direction {h} in which to perform a Newton step on some function {f} defined on a subset of {{\mathbb R}^n} with non-empty interior, but we are trying to optimize {f} over an affine subspace {\mathcal{L}}, we can simply restrict our attention to directions that end up in {\mathcal{L}}. This trick allows us to differentiate {f} in {{\mathbb R}^n}, which is more elegant. The Newton direction at {x \in \mathcal{L}} is thus given by:

\displaystyle \mathrm{argmin}_{h : x+h \in \mathcal{L}} \nabla f(x)^{\top} h + \frac12 h^{\top} \nabla^2 f(x) h .

In other words, a Newton step requires solving an optimization problem of the form:

\displaystyle \min_{x \in {\mathbb R}^n : A x = 0} d^{\top} x + \frac12 x^{\top} B x , \ \ \ \ \ (6)

where {A \in {\mathbb R}^{m \times n}}, {m < n}, the rows of {A} are linearly independent, and {B \in \mathbb{S}_{++}^n}. Using the KKT conditions, we obtain that at a pair of primal-dual optimal points {(x,\lambda)} the gradient (in {x}) of the Lagrangian {L(x,\lambda) = d^{\top} x + \frac12 x^{\top} B x + \lambda^{\top} A x} must be zero, which means {d + B x + A^{\top} \lambda = 0}, that is {x = - B^{-1} (A^{\top} \lambda + d) }. Furthermore the point {x} must be feasible, that is {A x = 0}, which can be rewritten as {A B^{-1} (A^{\top} \lambda + d) = 0}. Thus we can solve (6) with {\lambda = - [A B^{-1} A^{\top}]^{-1} A B^{-1} d} (note that {A B^{-1} A^{\top} \in \mathbb{S}_{++}^m}) and {x = - B^{-1} (A^{\top} \lambda + d)}, and these computations can be done in {O(n^3)} (since {m < n}). Thus we arrive at a complexity of {O(n^{3.5})} to solve (5), which corresponds to a complexity of {O(\max(m,n)^{3.5})} to solve (4) (slightly better results can be obtained by being more careful).
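These two formulas translate directly into code; here is a minimal numpy sketch (the function name is mine, and in practice one would reuse a Cholesky factorization of {B} rather than calling separate solves):

```python
import numpy as np

def equality_constrained_newton(d, B, A):
    """Solve min_{A x = 0} d^T x + (1/2) x^T B x via the KKT formulas."""
    Binv_d  = np.linalg.solve(B, d)    # B^{-1} d
    Binv_At = np.linalg.solve(B, A.T)  # B^{-1} A^T
    S = A @ Binv_At                    # A B^{-1} A^T, which lies in S_{++}^m
    lam = -np.linalg.solve(S, A @ Binv_d)
    x = -np.linalg.solve(B, A.T @ lam + d)  # then A x = 0 by construction
    return x, lam
```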


For the general SDP:

\displaystyle \min_{X \in \mathbb{S}^n_+ : \langle X, A_i \rangle = b_i, i \in \{1, \hdots, m\}} \langle X, C \rangle , \ \text{where} \ A_1, \hdots, A_m, C \in {\mathbb R}^{n \times n}, b \in {\mathbb R}^m ,

one can use the restriction of {F(X) = - \log \mathrm{det}(X)} to {\{X \in {\mathbb R}^{n \times n} : \langle X, A_i \rangle = b_i, i \in \{1, \hdots, m\}\}}, which is an {n}-self-concordant barrier on the constraint set of the above SDP. One needs to work out the details for the gradient and Hessian of {F}, but eventually one arrives at an overall complexity of {O(\max(m,n) m n^{2.5})}, which is much more reasonable than what we obtained with the ellipsoid method!
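The relevant derivatives are {\nabla F(X) = - X^{-1}} and {\nabla^2 F(X)[H_1, H_2] = \mathrm{Tr}(X^{-1} H_1 X^{-1} H_2)} (this follows from expanding {\log \mathrm{det}(X+H)} around {X}). A small numpy sketch of the Hessian bilinear form, which an IPM would evaluate on directions lying in the subspace {\langle H, A_i \rangle = 0} (the function name is mine):

```python
import numpy as np

def logdet_hessian_form(X, H1, H2):
    """nabla^2 F(X)[H1, H2] = Tr(X^{-1} H1 X^{-1} H2) for F = -log det."""
    Xinv = np.linalg.inv(X)
    return np.trace(Xinv @ H1 @ Xinv @ H2)
```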
