ORF523: Lagrangian duality

In this post I review very basic facts on Lagrangian duality. The presentation is extracted from Chapter 5 of Boyd and Vandenberghe.

We consider the following problem:

$\displaystyle \inf_{x \in {\mathbb R}^n : f_i(x) \leq 0, h_j(x) = 0, i=1,\hdots,m, j=1,\hdots,p} f(x) . \ \ \ \ \ (1)$

Let ${\mathcal{D}}$ be the non-empty domain of definition for this problem (that is, the intersection of the domains of definition of ${f, f_1, \hdots, f_m, h_1, \hdots, h_p}$). For ${x \in \mathcal{D}, \lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p}$, the Lagrangian associated to the above problem is defined by

$\displaystyle L(x, \lambda, \nu) = f(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) .$

Weak duality

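First note that if ${x \in \mathcal{D}}$ is not a valid point for the primal problem (1), then ${f_i(x) > 0}$ for some ${i}$ or ${h_j(x) \neq 0}$ for some ${j}$, and letting the corresponding multiplier tend to infinity (with the appropriate sign in the case of ${\nu_j}$) shows that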

$\displaystyle \sup_{\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p} L(x, \lambda, \nu) = + \infty,$

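whereas for ${x \in \mathcal{D}}$ which is a valid point for (1), every term ${\lambda_i f_i(x)}$ is non-positive and every term ${\nu_j h_j(x)}$ vanishes, so the supremum is attained at ${\lambda = 0}$ and one has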

$\displaystyle \sup_{\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p} L(x, \lambda, \nu) = f(x) .$

Thus one can conclude that

$\displaystyle \inf_{x\in \mathcal{D}} \sup_{\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p} L(x, \lambda, \nu) = \inf_{x \in {\mathbb R}^n : f_i(x) \leq 0, h_j(x) = 0, i=1,\hdots,m, j=1,\hdots,p} f(x) . \ \ \ \ \ (2)$

On the other hand, note that for any ${x \in \mathcal{D}}$ which is a valid point for (1), and any ${\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p}$, one also has

$\displaystyle L(x, \lambda, \nu) \leq f(x) ,$

so that

$\displaystyle \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \leq \inf_{x \in {\mathbb R}^n : f_i(x) \leq 0, h_j(x) = 0, i=1,\hdots,m, j=1,\hdots,p} f(x),$

which, since the right-hand side does not depend on ${(\lambda, \nu)}$, also implies

$\displaystyle \sup_{\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p} \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \leq \inf_{x \in {\mathbb R}^n : f_i(x) \leq 0, h_j(x) = 0, i=1,\hdots,m, j=1,\hdots,p} f(x) . \ \ \ \ \ (3)$

Putting together (2) and (3) we arrive at the statement of weak duality:

$\displaystyle \sup_{\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p} \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \leq \inf_{x \in \mathcal{D}} \sup_{\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p} L(x, \lambda, \nu). \ \ \ \ \ (4)$

As we have seen in (2), the right-hand side of the above inequality corresponds to the primal problem given in (1). We now refer to the left-hand side as the dual problem, which corresponds to the maximization over ${\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p}$ of the following function:

$\displaystyle g(\lambda, \nu) := \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) . \ \ \ \ \ (5)$
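As a small illustration of (5), consider the toy problem of minimizing ${x^2}$ over ${x \in {\mathbb R}}$ subject to ${1 - x \leq 0}$ (this example is an assumption made here for illustration, it is not taken from the reference). In this case ${L(x, \lambda) = x^2 + \lambda (1 - x)}$, the infimum over ${x}$ is attained at ${x = \lambda / 2}$, and thus ${g(\lambda) = \lambda - \lambda^2/4}$. The Python sketch below evaluates ${g}$ on a grid of values of ${\lambda}$ and compares it to the primal value ${f(x^*) = 1}$.

```python
# Toy problem (an assumption for illustration): minimize x^2 subject to 1 - x <= 0.
# The primal optimum is x* = 1, with value f(x*) = 1.

def lagrangian(x, lam):
    """L(x, lambda) = f(x) + lambda * f_1(x), with f(x) = x^2 and f_1(x) = 1 - x."""
    return x * x + lam * (1.0 - x)

def dual_function(lam):
    """g(lambda) = inf_x L(x, lambda); for this problem the infimum is attained at x = lambda / 2."""
    return lagrangian(lam / 2.0, lam)

primal_value = 1.0
lams = [k / 100.0 for k in range(0, 501)]  # grid over lambda in [0, 5]
dual_values = [dual_function(lam) for lam in lams]

best_dual = max(dual_values)
best_lam = lams[dual_values.index(best_dual)]

print("best dual value on the grid:", best_dual, "attained at lambda =", best_lam)
print("g <= primal value on the whole grid:",
      all(g <= primal_value + 1e-12 for g in dual_values))
```

On this example the best dual value matches the primal value, which is consistent with the fact that the problem is convex and strictly feasible.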

Strong duality

Weak duality always holds true (note that we did not make any convexity assumption so far); it corresponds to the fact that a ‘max-min’ is never larger than a ‘min-max’. We say that strong duality holds if the inequality (4) holds with equality. The next two results give different conditions under which a ‘max-min’ is equal to a ‘min-max’.

Theorem 1 Assume that ${f, f_1, \hdots, f_m}$ are convex functions, ${h_1, \hdots, h_p}$ are affine functions, and that there exists a point ${x}$ in the relative interior of the convex set ${\mathcal{D}}$ (that is, in the interior of ${\mathcal{D}}$ when ${\mathcal{D}}$ is viewed as a subset of the affine subspace generated by ${\mathcal{D}}$) such that

$\displaystyle f_i(x) < 0, \; i =1,\hdots, m, \quad h_j(x) = 0, \; j = 1, \hdots, p \; \text{(Slater's condition)} .$

Then strong duality holds, that is (4) holds with equality.

Theorem 2 (Sion’s minimax) Let ${\mathcal{X} \subset {\mathbb R}^n, \mathcal{Y} \subset {\mathbb R}^m}$ be convex sets such that one of them is compact. Let ${\phi : \mathcal{X} \times \mathcal{Y} \rightarrow {\mathbb R}}$ be a continuous function such that ${\phi(\cdot, y)}$ is convex and ${\phi(x, \cdot)}$ is concave. Then:

$\displaystyle \sup_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}} \phi(x,y) = \inf_{x \in \mathcal{X}} \sup_{y \in \mathcal{Y}} \phi(x,y) .$
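To see the role played by convexity in Theorem 2, here is a minimal sketch on the ‘matching pennies’ payoff matrix (an example chosen here for illustration). With pure strategies the sets ${\mathcal{X}}$ and ${\mathcal{Y}}$ are two-point sets, hence not convex, and the ‘max-min’ is strictly smaller than the ‘min-max’. With mixed strategies the sets become the interval ${[0,1]}$, the function ${\phi}$ is bilinear, and the two values coincide. The suprema and infima over ${[0,1]}$ are only approximated on a grid.

```python
# Matching-pennies payoff matrix (illustrative assumption):
# the row player minimizes, the column player maximizes.
A = [[1.0, -1.0],
     [-1.0, 1.0]]

# Pure strategies: the index sets {0, 1} are not convex, so Theorem 2 does not apply.
pure_max_min = max(min(A[i][j] for i in range(2)) for j in range(2))
pure_min_max = min(max(A[i][j] for j in range(2)) for i in range(2))

def phi(p, q):
    """Expected payoff when row 0 is played with probability p and column 0 with probability q."""
    return (p * q * A[0][0] + p * (1 - q) * A[0][1]
            + (1 - p) * q * A[1][0] + (1 - p) * (1 - q) * A[1][1])

# Mixed strategies: p and q range over the convex compact set [0, 1], discretized on a grid.
grid = [k / 100.0 for k in range(101)]
mixed_max_min = max(min(phi(p, q) for p in grid) for q in grid)
mixed_min_max = min(max(phi(p, q) for q in grid) for p in grid)

print("pure strategies :", pure_max_min, "<=", pure_min_max)
print("mixed strategies:", mixed_max_min, "vs", mixed_min_max)
```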

KKT conditions

Now let us assume that we are in the conditions of Theorem 1 (with convex objective, convex inequality constraints, affine equality constraints, and Slater’s condition). Furthermore we assume from now on that all functions in play are differentiable.

Let ${x^*}$ be a primal solution (that is a solution to the optimization problem (1)), and let ${(\lambda^*, \nu^*)}$ be a dual solution (that is a solution of the maximization of ${g(\lambda, \nu)}$ defined in (5) over ${\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p}$).

The following holds true (the first line follows by strong duality, the second by definition, the third is trivial, and the last one by the fact that ${x^*}$ and ${(\lambda^*, \nu^*)}$ are valid points for their respective optimization problems):

$\displaystyle \begin{array}{rcl} f(x^*) & = & g(\lambda^*, \nu^*) \\ & = & \inf_{x \in \mathcal{D}} L(x, \lambda^*, \nu^*) \\ & \leq & L(x^*, \lambda^*, \nu^*) \\ & \leq & f(x^*) . \end{array}$

Thus both inequalities are in fact equalities. For the second one this implies that ${\sum_{i=1}^m \lambda_i^* f_i(x^*) = 0}$, but since each term is non-positive, this means

$\displaystyle \forall i =1, \hdots, m, \; \lambda_i^* f_i(x^*) = 0 \; \text{(Complementary slackness at optimum).}$

On the other hand the first inequality now says that ${x^*}$ is a minimizer over ${\mathcal{D}}$ of ${L(x, \lambda^*, \nu^*)}$, thus by first order optimality one has

$\displaystyle \nabla f(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0 .$

Complementary slackness, together with the above equality and the fact that ${x^*}$ and ${(\lambda^*, \nu^*)}$ are valid points for their respective optimization problems, are called the KKT (Karush-Kuhn-Tucker) conditions. We just proved that these conditions are necessary (and for this we did not need convexity, just strong duality), but in fact (thanks to convexity) the KKT conditions are also sufficient for optimality.
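The KKT conditions can be checked numerically on a small example. The sketch below uses a toy problem chosen here for illustration: minimize ${x_1^2 + x_2^2}$ subject to ${1 - x_1 - x_2 \leq 0}$, for which one can verify by hand that ${x^* = (1/2, 1/2)}$ and ${\lambda^* = 1}$. The function name `kkt_residuals` is hypothetical; it simply returns how much each condition is violated.

```python
# Toy convex problem (illustrative assumption):
#   minimize x1^2 + x2^2  subject to  f_1(x) = 1 - x1 - x2 <= 0.
# Candidate primal-dual pair: x* = (0.5, 0.5), lambda* = 1.

def grad_f(x):
    """Gradient of the objective f(x) = x1^2 + x2^2."""
    return [2.0 * x[0], 2.0 * x[1]]

def f1(x):
    """Inequality constraint function f_1(x) = 1 - x1 - x2."""
    return 1.0 - x[0] - x[1]

def grad_f1(x):
    """Gradient of f_1 (constant, since f_1 is affine)."""
    return [-1.0, -1.0]

def kkt_residuals(x, lam):
    """Return the violation of each KKT condition; all are zero at a KKT point."""
    g0, g1 = grad_f(x), grad_f1(x)
    return {
        "stationarity": max(abs(g0[k] + lam * g1[k]) for k in range(2)),
        "primal_feasibility": max(f1(x), 0.0),
        "dual_feasibility": max(-lam, 0.0),
        "complementary_slackness": abs(lam * f1(x)),
    }

at_optimum = kkt_residuals([0.5, 0.5], 1.0)
elsewhere = kkt_residuals([1.0, 1.0], 0.0)  # feasible, but not stationary

print("residuals at (x*, lambda*):", at_optimum)
print("residuals at x = (1, 1), lambda = 0:", elsewhere)
```

Since this toy problem is convex, the vanishing residuals at ${(x^*, \lambda^*)}$ certify optimality, whereas the second pair fails the stationarity condition.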

Duality gap

By weak duality one always has ${g(\lambda, \nu) \leq f(x^*)}$ for any ${\lambda \in {\mathbb R}_+^m, \nu \in {\mathbb R}^p}$, and thus for any ${x \in \mathcal{D}}$,

$\displaystyle f(x) - f(x^*) \leq f(x) - g(\lambda, \nu) .$

The right-hand side in the above inequality is called the duality gap, and it gives an upper bound on how suboptimal the point ${x}$ is for the primal problem. In other words, a dual point ${(\lambda, \nu)}$ gives a certificate of how good a primal point ${x}$ is.
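Here is a minimal numerical sketch of this certificate, on the toy problem of minimizing ${x^2}$ subject to ${1 - x \leq 0}$ (an example assumed here for illustration, with ${f(x^*) = 1}$ and ${g(\lambda) = \lambda - \lambda^2/4}$ computed by hand). The primal point ${x = 1.1}$ and the dual point ${\lambda = 1.8}$ are arbitrary choices.

```python
# Toy problem (illustrative assumption): minimize x^2 subject to 1 - x <= 0.

def f(x):
    """Primal objective."""
    return x * x

def g(lam):
    """Dual function, computed by hand for this problem: g(lambda) = lambda - lambda^2 / 4."""
    return lam - lam * lam / 4.0

x, lam = 1.1, 1.8           # a feasible primal point and a feasible dual point
optimal_value = 1.0         # f(x*) with x* = 1, used here only to check the bound

duality_gap = f(x) - g(lam)             # computable without knowing x*
true_suboptimality = f(x) - optimal_value

print("duality gap       :", duality_gap)
print("true suboptimality:", true_suboptimality)
```

The point of the certificate is that the duality gap can be evaluated without knowing ${x^*}$; the true suboptimality is computed above only to check the bound.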

Interior Point Methods can be turned into an optimization procedure for both the primal and the dual, by tracing a central path in each problem. This is particularly useful, as by doing so one can evaluate the suboptimality gap in the primal at any given time.
