In optimization we are ‘given’ a set $\mathcal{X} \subseteq \mathbb{R}^n$, and a real-valued function $f$ defined on this set. The objective is to find a point $x^* \in \mathcal{X}$ which minimizes $f$ on $\mathcal{X}$.
This goal is ambitious. It encompasses an enormous variety of problems, from engineering to politics. Here is an example that immediately suggests the difficulty of the problem: building an AI (Artificial Intelligence) for a game is an optimization problem, since one wants to find a strategy that maximizes the chances of winning.
In this course we are going to explore the frontiers of what we know about mathematical optimization. We will discuss both positive results (how to solve such and such optimization problem) and negative ones, with statements like: ‘this family of problems is too hard to be solved in reasonable time’. The course is divided into three parts, each of which explores a different aspect of mathematical optimization.
Part 1: Computational complexity of optimization
The idea here is to look at optimization from the most fundamental perspective: how many ‘elementary operations’ do I need to perform to find $x^*$? We will see that this question is very far from being resolved, and that there might be good reasons why it is such a difficult question. Here is a sample of the contents that will go into this part:
- The ellipsoid method
- Conic Programming and Interior point methods
- P vs. NP
- SDP relaxation of MAX-CUT
- Extended formulations and Yannakakis’ Theorem
- Dynamic programming
Part 2: Oracle complexity of optimization
The first part of the course focuses on the complexity in terms of ‘elementary operations’. Here we take another point of view: we simply count the number of ‘questions’ we ask about the function $f$. More formally, we assume that we have an oracle that can give us the value $f(x)$ and the gradient $\nabla f(x)$ at any point $x \in \mathcal{X}$. The oracle complexity is then the number of calls to this oracle one needs to make to find $x^*$. While somewhat less fundamental than the computational complexity, the oracle complexity allows for an almost ‘complete’ theory. Here is a partial list of topics for this part:
- Oracle complexity of Lipschitz optimization
- Gradient Descent for smooth convex optimization (respectively smooth and strongly convex)
- Oracle complexity lower bounds for smooth convex optimization (respectively smooth and strongly convex)
- Optimal methods: Nesterov’s Accelerated Gradient Descent, FISTA
- Mirror Descent
- Mirror Descent for saddle points
- Coordinate Descent, Conditional Gradient Method (aka Frank-Wolfe)
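The first-order oracle model described above can be sketched in a few lines of code (a minimal illustration, assuming numpy; the class name `FirstOrderOracle` and the gradient-descent loop are mine, not part of the course material):

```python
import numpy as np

class FirstOrderOracle:
    """Hypothetical first-order oracle: returns (f(x), grad f(x))
    and counts how many times it has been queried."""
    def __init__(self, f, grad):
        self.f = f
        self.grad = grad
        self.calls = 0

    def query(self, x):
        self.calls += 1
        return self.f(x), self.grad(x)

# Example: run 100 steps of gradient descent on f(x) = ||x||^2;
# the oracle complexity of this run is simply oracle.calls.
oracle = FirstOrderOracle(lambda x: x @ x, lambda x: 2 * x)
x = np.ones(5)
for _ in range(100):
    _, g = oracle.query(x)
    x = x - 0.25 * g

assert oracle.calls == 100
```

The point of the abstraction is that the algorithm only interacts with $f$ through `query`, so counting calls is exactly counting oracle complexity.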
Part 3: Statistical complexity of stochastic optimization
In many situations one does not have perfect knowledge of the function $f$. In this last part of the course we explore what happens when the oracle gives noisy responses. Partial list of topics:
- Mirror Descent for stochastic optimization
- Lower bounds for stochastic optimization
- Randomized methods for deterministic optimization
- Zeroth-order stochastic optimization
In most of this course we focus on a convex constraint set $\mathcal{X} \subseteq \mathbb{R}^n$ and convex functions $f : \mathcal{X} \to \mathbb{R}$. I would like here to review basic definitions and results related to convexity. I will also show why, ‘in some sense’, we can restrict our attention to convexity without loss of generality.
Definition 1 A set $\mathcal{X} \subseteq \mathbb{R}^n$ is convex if for every pair of points $x, y \in \mathcal{X}$, the segment $[x, y]$ is included in $\mathcal{X}$, that is
\[ \forall \gamma \in [0, 1], \ (1 - \gamma) x + \gamma y \in \mathcal{X} . \]
$\mathcal{X}$ is a convex body if it is convex, compact, and has non-empty interior.
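As a quick sanity check of Definition 1, the following sketch verifies numerically that the closed unit disk in $\mathbb{R}^2$ is convex (an illustration only, assuming numpy; the helper `in_unit_disk` is mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def in_unit_disk(p):
    # Membership test for the closed unit disk in R^2
    # (small tolerance for floating-point rounding).
    return np.linalg.norm(p) <= 1.0 + 1e-9

# Sample pairs of points in the disk and check that every convex
# combination (1 - gamma) * x + gamma * y stays in the disk.
checked = 0
for _ in range(2000):
    x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if in_unit_disk(x) and in_unit_disk(y):
        gamma = rng.uniform(0, 1)
        assert in_unit_disk((1 - gamma) * x + gamma * y)
        checked += 1
```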
Definition 2 The convex hull of a set $S \subseteq \mathbb{R}^n$, denoted $\mathrm{conv}(S)$, is defined as the set of convex combinations of points from $S$, that is
\[ \mathrm{conv}(S) = \left\{ \sum_{i=1}^{k} \gamma_i x_i \ : \ k \geq 1, \ x_1, \dots, x_k \in S, \ \gamma_1, \dots, \gamma_k \geq 0, \ \sum_{i=1}^{k} \gamma_i = 1 \right\} . \]
It is easy to check that the convex hull can also be defined as the minimal (for inclusion) convex set that contains $S$. A basic result about convex sets that we shall use extensively is the Separation Theorem.
Theorem 3 (Separation Theorem) Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a closed convex set, and $x_0 \in \mathbb{R}^n \setminus \mathcal{X}$. Then, there exist $w \in \mathbb{R}^n$ and $t \in \mathbb{R}$ such that
\[ \langle w, x_0 \rangle < t, \quad \text{and} \quad \forall x \in \mathcal{X}, \ \langle w, x \rangle \geq t . \]
Note that if $\mathcal{X}$ is not closed then one can only guarantee that $\langle w, x_0 \rangle \leq t$ (and $w \neq 0$). A related result is the Supporting Hyperplane Theorem.
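For a concrete instance of the Separation Theorem, one can build a separating hyperplane for the unit disk explicitly: project the outside point onto the disk and take $w$ along the direction from the point to its projection. A numerical sketch (assuming numpy; the specific point $x_0$ is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Closed convex set: the unit disk in R^2. A point outside it:
x0 = np.array([2.0, 1.0])

# Project x0 onto the disk; the direction from x0 to its projection
# gives a separating hyperplane <w, .> = t.
p = x0 / np.linalg.norm(x0)   # projection of x0 onto the disk
w = p - x0
t = w @ p

assert w @ x0 < t             # x0 is strictly on one side
# Every point of the disk satisfies <w, x> >= t.
for _ in range(1000):
    x = rng.uniform(-1, 1, 2)
    if np.linalg.norm(x) <= 1.0:
        assert w @ x >= t - 1e-9
```

The projection step is what the usual proof of the theorem formalizes: $w$ points from $x_0$ toward its closest point in the set.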
Theorem 4 (Supporting Hyperplane Theorem) Let $\mathcal{X} \subseteq \mathbb{R}^n$ be a convex set, and $x_0 \in \partial \mathcal{X}$. Then, there exists $w \in \mathbb{R}^n$, $w \neq 0$, such that
\[ \forall x \in \mathcal{X}, \ \langle w, x \rangle \geq \langle w, x_0 \rangle . \]
Definition 5 Let $\mathcal{X} \subseteq \mathbb{R}^n$ be convex, and $f : \mathcal{X} \to \mathbb{R}$. $f$ is convex if it always lies below its chords, that is
\[ \forall x, y \in \mathcal{X}, \ \forall \gamma \in [0, 1], \ f((1 - \gamma) x + \gamma y) \leq (1 - \gamma) f(x) + \gamma f(y) . \]
It is obvious that a function $f$ is convex if and only if its epigraph
\[ \mathrm{epi}(f) = \{ (x, t) \in \mathcal{X} \times \mathbb{R} \ : \ t \geq f(x) \} \]
is a convex set.
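To make the chord condition of Definition 5 concrete, here is a numerical check for the convex function $f(x) = x^2$ (an illustration of the definition, assuming numpy; the function is my choice):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return x ** 2   # a convex function on R

# Definition 5: f lies below its chords.
for _ in range(1000):
    x, y = rng.uniform(-5, 5, 2)
    gamma = rng.uniform(0, 1)
    lhs = f((1 - gamma) * x + gamma * y)
    rhs = (1 - gamma) * f(x) + gamma * f(y)
    assert lhs <= rhs + 1e-9   # tolerance for rounding
```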
Definition 6 Let $\mathcal{X} \subseteq \mathbb{R}^n$, and $f : \mathcal{X} \to \mathbb{R}$. Then $g \in \mathbb{R}^n$ is a subgradient of $f$ at $x \in \mathcal{X}$ if for any $y \in \mathcal{X}$ one has
\[ f(x) - f(y) \leq \langle g, x - y \rangle . \]
The set of subgradients of $f$ at $x$ is denoted $\partial f(x)$.
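A standard example: at $x = 0$ the function $f(x) = |x|$ is not differentiable, yet every $g \in [-1, 1]$ satisfies the subgradient inequality, so $\partial f(0) = [-1, 1]$. A numerical sketch of both directions (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return abs(x)

# Every g in [-1, 1] is a subgradient of |x| at x = 0:
# f(0) - f(y) <= g * (0 - y) for all y.
x = 0.0
for g in np.linspace(-1.0, 1.0, 21):
    for y in rng.uniform(-10, 10, 200):
        assert f(x) - f(y) <= g * (x - y) + 1e-12

# A g outside [-1, 1] is not a subgradient: the inequality
# fails for some y.
g = 1.5
assert any(f(x) - f(y) > g * (x - y) for y in [-1.0, 1.0])
```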
Proposition 7 Let $\mathcal{X} \subseteq \mathbb{R}^n$ be convex, and $f : \mathcal{X} \to \mathbb{R}$. If $\partial f(x) \neq \emptyset$ for all $x \in \mathcal{X}$, then $f$ is convex. Conversely, if $f$ is convex then for any $x$ in the interior of $\mathcal{X}$, $\partial f(x) \neq \emptyset$. Furthermore, if $f$ is convex and differentiable at $x$, then $\nabla f(x) \in \partial f(x)$.
Proof: The first claim is almost trivial: let $x, y \in \mathcal{X}$, $\gamma \in [0, 1]$, and $g \in \partial f((1 - \gamma) x + \gamma y)$, then by definition one has
\[ f((1 - \gamma) x + \gamma y) \leq f(x) + \gamma \langle g, y - x \rangle , \]
\[ f((1 - \gamma) x + \gamma y) \leq f(y) + (1 - \gamma) \langle g, x - y \rangle , \]
which clearly shows that $f$ is convex by adding the two (appropriately rescaled) inequalities: multiply the first by $1 - \gamma$, the second by $\gamma$, and sum, so that the inner-product terms cancel.
Now let us prove that a convex function has subgradients in the interior of $\mathcal{X}$. We build a subgradient by using a supporting hyperplane to the epigraph of the function. Let $x \in \mathcal{X}$. Then clearly $(x, f(x)) \in \partial \, \mathrm{epi}(f)$, and $\mathrm{epi}(f)$ is a convex set. Thus by the Supporting Hyperplane Theorem, there exists $(a, b) \in \mathbb{R}^n \times \mathbb{R}$, $(a, b) \neq 0$, such that
\[ \langle a, x \rangle + b f(x) \geq \langle a, y \rangle + b t, \ \forall (y, t) \in \mathrm{epi}(f). \qquad (1) \]
Clearly, by letting $t$ tend to infinity, one can see that $b \leq 0$. Now let us assume that $x$ is in the interior of $\mathcal{X}$. Then for $\varepsilon > 0$ small enough, $y = x + \varepsilon a \in \mathcal{X}$, which implies that $b$ cannot be equal to $0$: if $b = 0$ then (1) applied to this $y$ gives $\varepsilon \|a\|^2 \leq 0$, hence $a = 0$, contradicting $(a, b) \neq 0$. Thus, rewriting (1) for $t = f(y)$, one obtains
\[ f(x) - f(y) \leq \frac{1}{|b|} \langle a, x - y \rangle . \]
Thus $a / |b| \in \partial f(x)$, which concludes the proof of the second claim.
Finally, let $f$ be a convex and differentiable function. Then by definition:
\[ f(y) \geq \frac{f((1 - \gamma) x + \gamma y) - (1 - \gamma) f(x)}{\gamma} = f(x) + \frac{f(x + \gamma (y - x)) - f(x)}{\gamma} \underset{\gamma \to 0}{\longrightarrow} f(x) + \langle \nabla f(x), y - x \rangle , \]
which shows that $\nabla f(x) \in \partial f(x)$.
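The last claim of Proposition 7 can be checked numerically for $f(x) = \|x\|^2$, whose gradient is $\nabla f(x) = 2x$ (a sketch of the subgradient inequality at random pairs of points, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    return x @ x          # convex and differentiable

def grad_f(x):
    return 2 * x

# Subgradient inequality: f(x) - f(y) <= <grad f(x), x - y>.
# For this f the slack is exactly ||x - y||^2 >= 0.
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert f(x) - f(y) <= grad_f(x) @ (x - y) + 1e-9
```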
This section is neither meant to provide a formal argument, nor to give an algorithmic idea. However it shows that, intuitively, assuming that $\mathcal{X}$ and $f$ are convex comes ‘without loss of generality’. In particular this should be taken as a word of caution: an optimization problem with convex constraints and convex objective is not easy in general. In a sense, the difficulty of the problem is encoded in the geometry of $\mathcal{X}$: if $\mathcal{X}$ is ‘nice’, then minimizing convex objectives over $\mathcal{X}$ is ‘easy’. The rest of the class will make this statement much more precise. Let us now show why convexity is ‘universal’. First note that we can do the following rewriting of our problem:
\[ \min_{x \in \mathcal{X}} f(x) = \min_{(x, t) \in \mathrm{epi}(f)} t , \]
and a solution to the latter problem immediately gives a solution to the former. Thus we have already reduced the case of a general function $f$ to the case of a simple linear function of the form $z \mapsto \langle c, z \rangle$ for some $c$ (here $z = (x, t)$ and $c = (0, \dots, 0, 1)$).
Now remark that we always have
\[ \min_{x \in \mathcal{X}} \langle c, x \rangle = \min_{x \in \mathrm{conv}(\mathcal{X})} \langle c, x \rangle . \]
This follows from the following computation: for any $x = \sum_{i} \gamma_i x_i \in \mathrm{conv}(\mathcal{X})$ (with $x_i \in \mathcal{X}$, $\gamma_i \geq 0$, $\sum_i \gamma_i = 1$),
\[ \langle c, x \rangle = \sum_{i} \gamma_i \langle c, x_i \rangle \geq \min_{i} \langle c, x_i \rangle \geq \min_{x \in \mathcal{X}} \langle c, x \rangle . \]
Here a solution to the problem over $\mathrm{conv}(\mathcal{X})$ gives a solution to the problem over $\mathcal{X}$ if one is computationally able to decompose a point of $\mathrm{conv}(\mathcal{X})$ into a convex combination of points of $\mathcal{X}$ (this can be a very difficult task in some cases!). In any case we showed that
\[ \min_{x \in \mathcal{X}} f(x) = \min_{(x, t) \in \mathrm{conv}(\mathrm{epi}(f))} t , \]
and thus we reduced an arbitrary optimization problem to a linear optimization over a convex set.
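The identity $\min_{x \in \mathcal{X}} \langle c, x \rangle = \min_{x \in \mathrm{conv}(\mathcal{X})} \langle c, x \rangle$ can be illustrated numerically for a finite set of points: sampled convex combinations never beat the best point of $\mathcal{X}$ itself (a sketch assuming numpy; the random set and objective are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(2)

# A finite set X of 10 points in R^3 (so X itself is not convex)
# and a linear objective <c, .>.
X = rng.normal(size=(10, 3))
c = rng.normal(size=3)

vertex_min = (X @ c).min()          # minimum over the points of X

# Sample points of conv(X) as convex combinations: Dirichlet draws
# give weight vectors on the probability simplex.
W = rng.dirichlet(np.ones(10), size=5000)
hull_vals = (W @ X) @ c

# A linear objective over conv(X) never beats the best point of X.
assert hull_vals.min() >= vertex_min - 1e-9
```

This is exactly the computation in the text: $\langle c, \sum_i \gamma_i x_i \rangle$ is an average of the values $\langle c, x_i \rangle$, hence at least their minimum.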