In optimization we are ‘given’ a set , and a real-valued function
defined on this set. The objective is to find a point
which minimizes
on
.
This goal is ambitious. It encompasses an enormous variety of problems, from engineering to politics. Here is an example that immediately suggests the difficulty of the problem: building an AI (Artificial Intelligence) for a game is an optimization problem, since one wants to find a strategy that maximizes the chances of winning.
In this course we are going to explore the frontiers of what we know about mathematical optimization. We will discuss both positive results: how can you solve such and such optimization problems; but also negative ones with statements like: ‘this family of problems is too hard to be solved in reasonable time’. The course will be divided into three parts, each one explores a different aspect of mathematical optimization.
Part 1: Computational complexity of optimization
The idea here is to look at optimization from the most fundamental perspective: how many ‘elementary operations’ do I need to do to find ? We will see that this question is very far from being resolved, and that there might be good reasons why this is such a difficult question. Here is sample of contents that will go into this chapter:
- The ellipsoid method
- Conic Programming and Interior point methods
- P vs. NP
- SDP relaxation of MAX-CUT
- Extended formulations and Yannakakis’ Theorem
- Dynamic programming
Part 2: Oracle complexity of optimization
The first part of the course focuses on the complexity in terms of ‘elementary operations’. Here we take another point of view: we simply count the number of ‘questions’ we ask about the function . More formally we assume that we have an oracle that can give us the value and gradient of
at any point
. The oracle complexity is then the number of calls to this oracle one needs to do to find
. While being somehow less fundamental than the computational complexity, the oracle complexity allows for an almost ‘complete’ theory. Here is a partial list of topics for this part:
- Oracle complexity of Lipschitz optimization
- Gradient Descent for smooth convex optimization (respectively smooth and strongly convex)
- Oracle complexity lower bounds for smooth convex optimization (respectively smooth and strongly convex)
- Optimal methods: Nesterov’s Accelerated Gradient Descent, FISTA
- Mirror Descent
- Mirror Descent for saddle points
- Coordinate Descent, Conditional Gradient Method (aka Frank-Wolfe)
Part 3: Statistical complexity of stochastic optimization
In many situations one does not have a perfect knowledge about the function . In this last part of the course we explore what happens when the oracle gives noisy responses. Partial list of topics:
- Mirror Descent for stochastic optimization
- Lower bounds for stochastic optimization
- Randomized methods for deterministic optimization
order stochastic optimization
In most of this course we focus on convex set of constraints , and convex functions
. I would like here to review basic definitions and results related to convexity. I will also show why, ‘in some sense’, we can restrict our attention to convexity without loss of generality.
Convex sets
Definition 1
is convex if for all pair of points
, the segment
is included in
, that is
is a convex body if it is convex, compact, and with non-empty interior.
Definition 2 The convex hull
of
is defined as the set of convex combinations of points from
, that is
It is easy to check that the convex hull can also be defined as the minimal (for inclusion) convex set that contains . A basic result about convex sets that we shall use extensively is the Separation Theorem.
Theorem 3 (Separation Theorem) Let
be a closed convex set, and
. Then, there exists
and
such that
Note that if is not closed then one can only guarantee that
(and
). A related result is the Supporting Hyperplane Theorem.
Theorem 4 (Supporting Hyperplane Theorem) Let
be a convex set, and
. Then, there exists
such that
Convex functions
Definition 5 Let
be convex, and
.
is convex if it always lies below its chords, that is
It is obvious that a function is convex if and only if its epigraph
is a convex set.
Definition 6 Let
, and
. Then
is a subgradient of
at
if for any
one has
The set of subgradients of
at
is denoted
.
Proposition 7 Let
be convex, and
. If
then
is convex. Conversely if
is convex then for any
. Furthermore if
is convex and differentiable at
then
.
Proof: The first claim is almost trivial: let , then by definition one has
which clearly shows that is convex by adding the two (appropriately rescaled) inequalities.
Now let us prove that a convex function has subgradients in the interior of
. We build a subgradient by using a supporting hyperplane to the epigraph of the function. Let
. Then clearly
, and
is a convex set. Thus by using the Supporting Hyperplane Theorem, there exists
such that
Clearly, by letting tends to infinity, one can see that
. Now let us assume that
is in the interior of
. Then for
small enough,
, which implies that
cannot be equal to
. Thus rewriting (1) for
one obtains
Thus which concludes the proof of the second claim.
Finally let be a convex and differentiable function. Then by definition:
which shows that .
Why convexity?
This section is neither meant to provide a formal argument, nor to give an algorithmic idea. However it shows that, intuitively, assuming that and
are convex comes ‘without loss of generality’. In particular this should be taken as a word of caution: an optimization problem with convex constraints and convex objective is not easy in general. Somehow the difficulty of the problem is encoded in the geometry of
: if
is ‘nice’, then minimizing convex objectives over
is ‘easy’. The rest of the class will make this statement much more precise. Let us now show why convexity is ‘universal’. First note that we can do the following rewriting of our problem:
and a solution to the latter problem immediately gives a solution to the former. Thus we have already reduced the case of a general function
, to a simple linear function of the form
for some
. Now remark that we always have
This follows from the following computation:
Here a solution to gives a solution to
if one is computationally able to decompose
into a convex combination of points in
(this can be a very difficult task in some cases!). In any case we showed that
and thus we reduced an arbitrary optimization problem to a linear optimization over a convex set.
By Kunal Chaudhury March 11, 2015 - 6:41 am
Nice post!
I was wondering if the requiring X to have a non-empty interior is somewhat restrictive. It appears that the analysis on this page (and the subsequent analysis of the ellipsoid method) would go through (with minor changes) if one required X to have a non-empty relative interior instead.
By mbkv September 28, 2013 - 6:07 am
I got here from Amir’s guest post on SOS. First of all, thanks to your nice post. So,the take-home point from this is that “Any arbitrary optimization problem, regardless of the nature of the objective f(x) we want to minimlze and regardless of the nature of the set(the set can be semialgebraic?) over which we want to minimize, can be reduced to a linear optimization over a convex set”.
Have i understood and stated it correctly?
By Sebastien Bubeck September 28, 2013 - 2:25 pm
Yes this is correct, but note that the reduction is absolutely non-algorithmic, this is only a formal rewriting!
By haipeng February 12, 2013 - 10:10 am
Proposition 7 says that every interior point has subgradients for a convex function, what about points on the boundary?
By Sebastien Bubeck February 12, 2013 - 4:20 pm
Convex functions are really nice in the interior of the domain, but they can have basically an arbitrary behavior on the boundary. Think for example of a convex function defined on a disk
. The only constraint for the function on the circle
is basically that it has to take values larger than inside the disk. Apart from that the function can do anything on the circle.
If you add continuity to your function, then things cannot be too wild, but subgradients can still be empty, think for example of
on ![[0,1] . [0,1] .](https://web.archive.org/web/20210117230332im_/https://s0.wp.com/latex.php?latex=%5B0%2C1%5D+.&bg=ffffff&fg=000&s=0&c=20201002)
By Emre February 5, 2013 - 10:49 pm
First!
Also, really good lecture notes. Please keep this up.
By Sebastien Bubeck February 6, 2013 - 8:17 am
Thanks Emre ;).