In optimization we are ‘given’ a set , and a real-valued function defined on this set. The objective is to find a point which minimizes on .

This goal is ambitious. It encompasses an enormous variety of problems, from engineering to politics. Here is an example that immediately suggests the difficulty of the problem: building an AI (Artificial Intelligence) for a game is an optimization problem, since one wants to find a strategy that maximizes the chances of winning.

In this course we are going to explore the frontiers of what we know about mathematical optimization. We will discuss both positive results: how can you solve such and such optimization problems; but also negative ones with statements like: ‘this family of problems is too hard to be solved in reasonable time’. The course will be divided into three parts, each one explores a different aspect of mathematical optimization.

**Part 1: Computational complexity of optimization**

The idea here is to look at optimization from the most fundamental perspective: how many ‘elementary operations’ do I need to do to find ? We will see that this question is very far from being resolved, and that there might be good reasons why this is such a difficult question. Here is sample of contents that will go into this chapter:

- The ellipsoid method
- Conic Programming and Interior point methods
- P vs. NP
- SDP relaxation of MAX-CUT
- Extended formulations and Yannakakis’ Theorem
- Dynamic programming

**Part 2: Oracle complexity of optimization**

The first part of the course focuses on the complexity in terms of ‘elementary operations’. Here we take another point of view: we simply count the number of ‘questions’ we ask about the function . More formally we assume that we have an oracle that can give us the value and gradient of at any point . The oracle complexity is then the number of calls to this oracle one needs to do to find . While being somehow less fundamental than the computational complexity, the oracle complexity allows for an almost ‘complete’ theory. Here is a partial list of topics for this part:

- Oracle complexity of Lipschitz optimization
- Gradient Descent for smooth convex optimization (respectively smooth and strongly convex)
- Oracle complexity lower bounds for smooth convex optimization (respectively smooth and strongly convex)
- Optimal methods: Nesterov’s Accelerated Gradient Descent, FISTA
- Mirror Descent
- Mirror Descent for saddle points
- Coordinate Descent, Conditional Gradient Method (aka Frank-Wolfe)

**Part 3: Statistical complexity of stochastic optimization**

In many situations one does not have a perfect knowledge about the function . In this last part of the course we explore what happens when the oracle gives noisy responses. Partial list of topics:

- Mirror Descent for stochastic optimization

- Lower bounds for stochastic optimization
- Randomized methods for deterministic optimization
- order stochastic optimization

In most of this course we focus on convex set of constraints , and convex functions . I would like here to review basic definitions and results related to convexity. I will also show why, ‘in some sense’, we can restrict our attention to convexity without loss of generality.

**Convex sets**

Definition 1is convex if for all pair of points , the segment is included in , that is

is a convex body if it is convex, compact, and with non-empty interior.

Definition 2The convex hull of is defined as the set of convex combinations of points from , that is

It is easy to check that the convex hull can also be defined as the minimal (for inclusion) convex set that contains . A basic result about convex sets that we shall use extensively is the Separation Theorem.

Theorem 3 (Separation Theorem)Let be a closed convex set, and . Then, there exists and such that

Note that if is not closed then one can only guarantee that (and ). A related result is the Supporting Hyperplane Theorem.

Theorem 4 (Supporting Hyperplane Theorem)Let be a convex set, and . Then, there exists such that

**Convex functions**

Definition 5Let be convex, and . is convex if it always lies below its chords, that is

It is obvious that a function is convex if and only if its epigraph

is a convex set.

Definition 6Let , and . Then is a subgradient of at if for any one has

The set of subgradients of at is denoted .

Proposition 7Let be convex, and . If then is convex. Conversely if is convex then for any . Furthermore if is convex and differentiable at then .

*Proof:* The first claim is almost trivial: let , then by definition one has

which clearly shows that is convex by adding the two (appropriately rescaled) inequalities.

Now let us prove that a convex function has subgradients in the interior of . We build a subgradient by using a supporting hyperplane to the epigraph of the function. Let . Then clearly , and is a convex set. Thus by using the Supporting Hyperplane Theorem, there exists such that

Clearly, by letting tends to infinity, one can see that . Now let us assume that is in the interior of . Then for small enough, , which implies that cannot be equal to . Thus rewriting (1) for one obtains

Thus which concludes the proof of the second claim.

Finally let be a convex and differentiable function. Then by definition:

which shows that .

**Why convexity?**

This section is neither meant to provide a formal argument, nor to give an algorithmic idea. However it shows that, intuitively, assuming that and are convex comes ‘without loss of generality’. In particular this should be taken as a word of caution: an optimization problem with convex constraints and convex objective is **not** easy in general. Somehow the difficulty of the problem is encoded in the geometry of : if is ‘nice’, then minimizing convex objectives over is ‘easy’. The rest of the class will make this statement much more precise. Let us now show why convexity is ‘universal’. First note that we can do the following rewriting of our problem:

and a solution to the latter problem immediately gives a solution to the former. Thus we have already reduced the case of a general function , to a simple linear function of the form for some . Now remark that we always have

This follows from the following computation:

Here a solution to gives a solution to if one is computationally able to decompose into a convex combination of points in (this can be a very difficult task in some cases!). In any case we showed that

and thus we reduced an arbitrary optimization problem to a linear optimization over a convex set.

## By Kunal Chaudhury March 11, 2015 - 6:41 am

Nice post!

I was wondering if the requiring X to have a non-empty interior is somewhat restrictive. It appears that the analysis on this page (and the subsequent analysis of the ellipsoid method) would go through (with minor changes) if one required X to have a non-empty relative interior instead.

## By mbkv September 28, 2013 - 6:07 am

I got here from Amir’s guest post on SOS. First of all, thanks to your nice post. So,the take-home point from this is that “Any arbitrary optimization problem, regardless of the nature of the objective f(x) we want to minimlze and regardless of the nature of the set(the set can be semialgebraic?) over which we want to minimize, can be reduced to a linear optimization over a convex set”.

Have i understood and stated it correctly?

## By Sebastien Bubeck September 28, 2013 - 2:25 pm

Yes this is correct, but note that the reduction is absolutely non-algorithmic, this is only a formal rewriting!

## By haipeng February 12, 2013 - 10:10 am

Proposition 7 says that every interior point has subgradients for a convex function, what about points on the boundary?

## By Sebastien Bubeck February 12, 2013 - 4:20 pm

Convex functions are really nice in the interior of the domain, but they can have basically an arbitrary behavior on the boundary. Think for example of a convex function defined on a disk . The only constraint for the function on the circle is basically that it has to take values larger than inside the disk. Apart from that the function can do anything on the circle.

If you add continuity to your function, then things cannot be too wild, but subgradients can still be empty, think for example of on

## By Emre February 5, 2013 - 10:49 pm

First!

Also, really good lecture notes. Please keep this up.

## By Sebastien Bubeck February 6, 2013 - 8:17 am

Thanks Emre ;).