In this post I would like to argue that showing $\mathcal{NP}$-completeness of a problem is in fact only a weak certificate of difficulty.
Dynamic Programming
Consider the problem $\mathrm{PARTITION}$ of the previous lecture (which is $\mathcal{NP}$-complete): given positive integers $a_1, \dots, a_n$, does there exist a subset $S \subset [n]$ such that $\sum_{i \in S} a_i = \sum_{i \notin S} a_i$?
Let $A = \sum_{i=1}^n a_i$. We will describe an algorithm which has a time complexity of $O(nA)$. At first sight this looks like a polynomial-time complexity, but if that were the case it would prove that $\mathcal{P} = \mathcal{NP}$. In fact this complexity is not quite polynomial: note that the size of the encoding of the input for the problem (that is, the integers $a_1, \dots, a_n$) is of order $\sum_{i=1}^n \log a_i$. Thus $A$ is exponential in the size of the encoding, which means that the time complexity $O(nA)$ is not polynomial. However the algorithm is polynomial with respect to the ‘numerical value’ of the input, and we say in that case that the algorithm is pseudo-polynomial. An $\mathcal{NP}$-complete problem with a pseudo-polynomial algorithm is called weakly $\mathcal{NP}$-complete. In particular $\mathrm{PARTITION}$ is weakly $\mathcal{NP}$-complete. On the contrary a problem is strongly $\mathcal{NP}$-complete if it is $\mathcal{NP}$-complete even when one uses unary coding for numbers rather than binary coding (this is a trick to enforce that polynomial-time now refers to polynomial time in the numerical size of the input).
Let us now describe the algorithm with $O(nA)$-time complexity for $\mathrm{PARTITION}$. The idea is very simple, yet very powerful. It builds on the following observations: first remark that the question equivalently asks to find $S \subset [n]$ such that $\sum_{i \in S} a_i = A/2$ ($A$ has to be even, otherwise the partition problem is clearly impossible). Now let $P(i, j) \in \{0, 1\}$ indicate whether there exists a subset of $\{a_1, \dots, a_i\}$ whose elements sum to $j$; we want to know the value of $P(n, A/2)$. Note that $P(i, j)$ is one if and only if either $P(i-1, j)$ is one, or $P(i-1, j - a_i)$ is one. This relation gives us a dynamic programming algorithm: first compute $P(1, j)$ for $j = 0, \dots, A/2$, then using these values compute $P(2, j)$ for $j = 0, \dots, A/2$, and repeat this iteratively until computing $P(n, A/2)$ with the previously computed values $P(n-1, j)$. Each step of this iterative procedure is of complexity $O(A)$, and there is a total of $n$ steps, resulting in a complexity of $O(nA)$ to solve $\mathrm{PARTITION}$.
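Here is a minimal Python sketch of this dynamic program (the function name is my own choice; it stores a single row of the table, which is enough since $P(i, \cdot)$ only depends on $P(i-1, \cdot)$):

```python
# A pseudo-polynomial dynamic program for PARTITION, storing one row
# P(i, .) of the table at a time: reachable[j] is True iff some subset
# of the numbers seen so far sums to j.
def partition(a):
    total = sum(a)
    if total % 2 == 1:
        return False  # A must be even, otherwise no balanced partition
    half = total // 2
    reachable = [False] * (half + 1)
    reachable[0] = True  # the empty subset sums to 0
    for x in a:                               # n steps...
        for j in range(half, x - 1, -1):      # ...of O(A) work each
            if reachable[j - x]:
                reachable[j] = True
    return reachable[half]
```

For instance `partition([3, 1, 1, 2, 2, 1])` returns `True` (take the subset $\{3, 2\}$, which sums to $A/2 = 5$), while `partition([1, 2])` returns `False` since $A = 3$ is odd.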
In general the idea of dynamic programming is to recursively break down the problem into a limited number of simpler pieces. It is a very powerful technique of wide applicability, but we won’t have time to see other examples.
SDP relaxations
We discuss now the problem $\mathrm{MaxCut}$, which is the optimization version of the decision problem that we have seen in the previous lecture, where one is looking for the cut of maximal size. Let $W$ be the adjacency matrix of the graph $G$ with vertex set $[n]$, or more generally let $W$ be a symmetric matrix of non-negative weights. We encode a partition of $[n]$ into two sets as a vector $x \in \{-1, 1\}^n$. Then $x_i x_j$ is equal to $1$ if both $i$ and $j$ are in the same element of the partition, and it is equal to $-1$ if they are in different elements. In particular the value of the cut corresponding to $x$ is given by

$$\frac{1}{4} \sum_{i,j=1}^n W_{i,j} (1 - x_i x_j).$$
Thus finding the maximal cut is equivalent to the following optimization problem:

$$\max_{x \in \{-1,1\}^n} \; \frac{1}{4} \sum_{i,j=1}^n W_{i,j} (1 - x_i x_j). \qquad (1)$$
Since $\mathrm{MaxCut}$ is $\mathcal{NP}$-complete (in the sense that the corresponding decision problem is $\mathcal{NP}$-complete) we know that the above problem is ‘hard’, in the sense that a polynomial-time algorithm solving (1) would prove that $\mathcal{P} = \mathcal{NP}$. Note that naively one may overlook the difficulty of (1) since it is ‘simply’ a quadratic objective with quadratic equality constraints ($x_i^2 = 1$). Remark in particular that if one replaces the hypercube $\{-1,1\}^n$ in the constraint by the Euclidean sphere of radius $\sqrt{n}$, then one obtains an efficiently solvable problem (it is the problem of computing the maximal eigenvalue of some symmetric matrix).
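To see concretely why the spherical version is easy, note that the maximum of a quadratic form $x^{\top} B x$ over the sphere of radius $\sqrt{n}$ is exactly $n \lambda_{\max}(B)$. Here is an illustrative pure-Python power iteration for $\lambda_{\max}$ (a sketch only; in practice one would call a library eigensolver, and power iteration assumes the top eigenvalue dominates in absolute value):

```python
# Power iteration to approximate the maximal eigenvalue of a symmetric
# matrix, in pure Python for illustration. Assumes the top eigenvalue
# dominates in absolute value and the start vector is not orthogonal
# to the top eigenvector.
def lambda_max(B, iters=200):
    n = len(B)
    v = [1.0] * n  # generic starting vector
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T B v approximates lambda_max for unit v
    return sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))
```

For the matrix with rows $(2, 1)$ and $(1, 2)$ (eigenvalues $1$ and $3$) this returns a value close to $3$, so the spherical maximum of $x^{\top} B x$ is $n \lambda_{\max}(B) = 6$.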
We will see now that while (1) is a difficult optimization problem, it is in fact possible to find relatively good approximate solutions! Let $x$ be uniformly drawn on the hypercube $\{-1,1\}^n$; then clearly (using $\mathbb{E}[x_i x_j] = 0$ for $i \neq j$, together with the fact that the value of any cut is at most $\frac{1}{2} \sum_{i \neq j} W_{i,j}$):

$$\mathbb{E}\, \frac{1}{4} \sum_{i,j=1}^n W_{i,j} (1 - x_i x_j) \;=\; \frac{1}{4} \sum_{i \neq j} W_{i,j} \;\geq\; \frac{1}{2}\, \mathrm{MaxCut}(W).$$
This means that, on average, the obtained cut is at least half as good (in terms of the value of the cut) as the optimal cut. In fact, by a Markov-type argument, this implies that for any $\epsilon > 0$, with probability at least $\epsilon$, the obtained cut is a $\big(\frac{1}{2} - \epsilon\big)$-approximation of the maximum. Thus by repeatedly sampling from the hypercube, one can get arbitrarily close (with probability approaching $1$) to a $\frac{1}{2}$-approximation of $\mathrm{MaxCut}$!
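This trivial strategy can be sketched in a few lines of Python (an illustrative implementation, not optimized):

```python
import random

# The trivial randomized approximation: draw uniform random sign vectors
# and keep the best cut found.
def cut_value(W, x):
    n = len(W)
    return 0.25 * sum(W[i][j] * (1 - x[i] * x[j])
                      for i in range(n) for j in range(n))

def random_cut(W, samples=1000, rng=random):
    n = len(W)
    best_x, best_value = None, -1.0
    for _ in range(samples):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        value = cut_value(W, x)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value
```

On the triangle graph (adjacency matrix of $K_3$), whose maximal cut has value $2$, a handful of samples already finds an optimal cut, since every non-constant sign vector cuts two of the three edges.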
Next we show that in fact one can get much better than a $\frac{1}{2}$-approximation with a more subtle sampling strategy. First note that, since for any $x \in \{-1,1\}^n$ the matrix $X = x x^{\top}$ satisfies $X \succeq 0$ and $X_{i,i} = 1$:

$$\max_{x \in \{-1,1\}^n} \frac{1}{4} \sum_{i,j=1}^n W_{i,j} (1 - x_i x_j) \;\leq\; \max_{\substack{X \succeq 0 \\ X_{i,i} = 1, \, i \in [n]}} \frac{1}{4} \sum_{i,j=1}^n W_{i,j} (1 - X_{i,j}),$$
and the last optimization problem is an SDP (with linear equality constraints) which can thus be solved efficiently. We call the solution $\mathrm{SDP}(W)$ to the problem

$$\max_{\substack{X \succeq 0 \\ X_{i,i} = 1, \, i \in [n]}} \frac{1}{4} \sum_{i,j=1}^n W_{i,j} (1 - X_{i,j}) \qquad (2)$$

the SDP relaxation of $\mathrm{MaxCut}$.
The next theorem is of tremendous importance. It gives a bound on the quality of the SDP relaxation, and its proof gives a randomized algorithm to compute a $0.878$-approximate solution to (1).
Theorem 1 (Goemans and Williamson 1995) The SDP relaxation for $\mathrm{MaxCut}$ satisfies

$$\mathrm{SDP}(W) \;\geq\; \mathrm{MaxCut}(W) \;\geq\; 0.878 \cdot \mathrm{SDP}(W).$$
The proof of this result is based on the following elementary geometric lemma.
Lemma 2 Let $X \succeq 0$ with $X_{i,i} = 1$ for $i \in [n]$, $\xi \sim \mathcal{N}(0, I_n)$, and $\sigma = \mathrm{sign}(V \xi)$ where $V V^{\top} = X$. Then

$$\mathbb{E}\, \sigma_i \sigma_j = \frac{2}{\pi} \arcsin\big(X_{i,j}\big).$$
Proof: Let $V$ (with $i$-th row $v_i^{\top}$) be such that $X = V V^{\top}$. Note that since $X_{i,i} = 1$ one has $|v_i| = 1$ (remark also that necessarily $X_{i,j} = \langle v_i, v_j \rangle \in [-1, 1]$, which will be important in the proof of the theorem). Let $\theta \in [0, \pi]$ be such that $\cos\theta = \langle v_i, v_j \rangle$. Then $\sigma_i \sigma_j = 1$ if $\langle v_i, \xi \rangle$ and $\langle v_j, \xi \rangle$ have the same sign and $-1$ otherwise, and in particular

$$\mathbb{E}\, \sigma_i \sigma_j = 1 - 2\, \mathbb{P}\big(\mathrm{sign}(\langle v_i, \xi \rangle) \neq \mathrm{sign}(\langle v_j, \xi \rangle)\big).$$
Now a quick picture shows that $\mathbb{P}\big(\mathrm{sign}(\langle v_i, \xi \rangle) \neq \mathrm{sign}(\langle v_j, \xi \rangle)\big) = \frac{\theta}{\pi}$: the two signs differ exactly when the Gaussian direction $\xi$ falls in the wedges of angle $\theta$ determined by the hyperplanes orthogonal to $v_i$ and $v_j$. Using the facts that $\cos\theta = X_{i,j}$ and $\arcsin(t) + \arccos(t) = \frac{\pi}{2}$ one can conclude the proof. $\square$
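For the skeptical reader, the lemma is easy to check numerically. The following Python sketch estimates the expectation by Monte Carlo for two unit vectors in the plane with a prescribed inner product $\rho$:

```python
import math, random

# Monte Carlo sanity check of the lemma for two unit vectors in the
# plane with inner product rho: the empirical average of
# sign(<v1, xi>) * sign(<v2, xi>) should approach (2/pi) * arcsin(rho).
def estimate(rho, samples=200_000, seed=0):
    rng = random.Random(seed)
    v1 = (1.0, 0.0)
    v2 = (rho, math.sqrt(1.0 - rho * rho))  # unit vector, <v1, v2> = rho
    acc = 0
    for _ in range(samples):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        s1 = 1 if v1[0] * g1 + v1[1] * g2 >= 0 else -1
        s2 = 1 if v2[0] * g1 + v2[1] * g2 >= 0 else -1
        acc += s1 * s2
    return acc / samples
```

For instance `estimate(0.5)` is close to $\frac{2}{\pi} \arcsin(0.5) = \frac{1}{3}$.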
We can now get to the proof of Goemans and Williamson’s result.
Proof: Let $X$ be the solution to the SDP relaxation of $\mathrm{MaxCut}$ (that is, a solution of (2)), $\xi \sim \mathcal{N}(0, I_n)$, and $\sigma = \mathrm{sign}(V \xi)$ where $V V^{\top} = X$. We shall use the following inequality, valid for all $t \in [-1, 1]$:

$$1 - \frac{2}{\pi} \arcsin(t) \;\geq\; 0.878\, (1 - t).$$
Now remark that, thanks to the previous lemma and the facts that $X_{i,j} \in [-1, 1]$ and $W_{i,j} \geq 0$, one has

$$\mathrm{MaxCut}(W) \;\geq\; \mathbb{E}\, \frac{1}{4} \sum_{i,j=1}^n W_{i,j} (1 - \sigma_i \sigma_j) \;=\; \frac{1}{4} \sum_{i,j=1}^n W_{i,j} \Big(1 - \frac{2}{\pi} \arcsin(X_{i,j})\Big) \;\geq\; 0.878 \cdot \mathrm{SDP}(W). \qquad \square$$
The previous result crucially used the form of the $\mathrm{MaxCut}$ objective. We show next a result that applies more generally but with a worse constant of approximation. First note that, as we already pointed out, the $\mathrm{MaxCut}$ objective is of the form:

$$\max_{x \in \{-1,1\}^n} x^{\top} B x, \qquad (3)$$

where $B \succeq 0$. Indeed in $\mathrm{MaxCut}$ one has $B = \frac{1}{4}(D - W)$ with $D = \mathrm{diag}(W \mathbf{1})$, where $D - W$ is also called the graph Laplacian of $G$, and it is easy to verify that in this case $x^{\top} (D - W) x = \sum_{i < j} W_{i,j} (x_i - x_j)^2 \geq 0$, which shows that $B \succeq 0$.
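One can quickly check numerically that the cut value equals $\frac{1}{4} x^{\top} (D - W) x$; the following Python snippet does so on a toy example (the path graph on three vertices):

```python
# A numeric check (on a toy example) that the MaxCut objective equals
# (1/4) x^T L x, where L = D - W is the graph Laplacian.
def laplacian(W):
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def quad_form(B, x):
    n = len(B)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

# path graph 1-2-3; the cut {2} vs {1,3} cuts both edges, so has value 2
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [1, -1, 1]
objective = 0.25 * sum(W[i][j] * (1 - x[i] * x[j])
                       for i in range(3) for j in range(3))
assert objective == quad_form(laplacian(W), x) / 4 == 2.0
```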
Theorem 3 (Nesterov 1997) Let $B \succeq 0$, then

$$\max_{x \in \{-1,1\}^n} x^{\top} B x \;\geq\; \frac{2}{\pi} \max_{\substack{X \succeq 0 \\ X_{i,i} = 1, \, i \in [n]}} \langle B, X \rangle.$$
Proof: Let $X$ be the solution of

$$\max_{\substack{X \succeq 0 \\ X_{i,i} = 1, \, i \in [n]}} \langle B, X \rangle,$$

let $\xi \sim \mathcal{N}(0, I_n)$, and let $\sigma = \mathrm{sign}(V \xi)$ where $V V^{\top} = X$. Remark that thanks to the previous lemma one has

$$\mathbb{E}\, \sigma^{\top} B \sigma \;=\; \frac{2}{\pi} \sum_{i,j=1}^n B_{i,j} \arcsin(X_{i,j}) \;=\; \frac{2}{\pi} \langle B, \arcsin(X) \rangle,$$

where $\arcsin(X)$ denotes the matrix with entries $\arcsin(X_{i,j})$.
Thus to prove the result it is enough to show that $\langle B, \arcsin(X) \rangle \geq \langle B, X \rangle$, which is itself implied by $\arcsin(X) \succeq X$ (since $B$ is positive semi-definite, just write its eigendecomposition to see that $\langle B, M \rangle \geq 0$ whenever $M \succeq 0$). Now the last inequality is true thanks to a simple Taylor expansion. Indeed recall that

$$\arcsin(t) = \sum_{k=0}^{+\infty} \frac{\binom{2k}{k}}{4^k (2k+1)}\, t^{2k+1},$$

and thus, denoting by $X^{\circ k}$ the matrix where the entries are raised to the power $k$, one has

$$\arcsin(X) = X + \sum_{k=1}^{+\infty} \frac{\binom{2k}{k}}{4^k (2k+1)}\, X^{\circ (2k+1)}.$$
Finally one can conclude using the fact that if $M, N \succeq 0$ then $M \circ N \succeq 0$ (the Schur product theorem). This can be seen by writing $M = V V^{\top}$ and $N = U U^{\top}$ (with rows $v_i^{\top}$ and $u_i^{\top}$ respectively), and thus

$$(M \circ N)_{i,j} = \langle v_i, v_j \rangle \langle u_i, u_j \rangle = \langle v_i \otimes u_i, v_j \otimes u_j \rangle,$$

which means that $M \circ N$ is a Gram matrix and thus it is positive semi-definite. $\square$
Hardness of approximation
In the previous section we have seen how to improve a trivial $\frac{1}{2}$-approximation algorithm into a highly non-trivial $0.878$-approximation algorithm for $\mathrm{MaxCut}$. Of course at this point one may wonder whether better approximation guarantees are possible for $\mathrm{MaxCut}$. In the last two decades some technology has been developed to answer this question. In the early nineties, Sanjeev Arora and his co-authors discovered the so-called $\mathrm{PCP}$ Theorem (Probabilistically Checkable Proofs). This result gives a new randomized perspective on the class $\mathcal{NP}$, and it can be used to prove theorems such as the following one. We refer the interested reader to Chapter 11 in Arora and Barak for more details on this (including the proof).
Theorem 4 There is some $\epsilon > 0$ such that computing a $(1 - \epsilon)$-approximation to $\mathrm{MaxCut}$ is $\mathcal{NP}$-hard. For every $\epsilon > 0$, computing a $\big(\frac{16}{17} + \epsilon\big)$-approximation to $\mathrm{MaxCut}$ is $\mathcal{NP}$-hard.
The above theorem is nice, but for example for $\mathrm{MaxCut}$ it might not give a sharp constant in terms of difficulty of approximation. The Unique Games Conjecture, a now decade-old conjecture made by Subhash Khot, implies sharper results for approximation. For example this conjecture (together with $\mathcal{P} \neq \mathcal{NP}$) implies that the constant $0.878$ derived by Goemans and Williamson for $\mathrm{MaxCut}$ is unimprovable: there is no polynomial-time algorithm that achieves a better constant of approximation.
Extended formulations
Remark that (3) can be written as

$$\max_{X \in \mathrm{COR}(n)} \langle B, X \rangle, \quad \text{where } \mathrm{COR}(n) = \mathrm{conv}\big(\{x x^{\top} : x \in \{-1,1\}^n\}\big),$$

which is ‘just’ a linear optimization over the polytope $\mathrm{COR}(n)$ (which is called the correlation polytope). We have seen in previous lectures that linear optimization over a polytope can be solved in polynomial time if the polytope can be described with a polynomial number of faces. Of course here we can immediately see that the correlation polytope has an exponential number of vertices. But this tells us nothing in terms of the faces, as one can see with the hypercube for example ($2^n$ vertices but only $2n$ faces). Even worse, the number of faces in the ambient space tells us nothing about possible representations of the polytope in higher dimension (in the sense that the polytope would be a projection of this higher-dimensional representation). Think for example of the $\ell_1$-ball, which can be represented as follows:

$$\{x \in \mathbb{R}^n : \|x\|_1 \leq 1\} = \Big\{x \in \mathbb{R}^n : \exists y \in \mathbb{R}^n \text{ s.t. } -y_i \leq x_i \leq y_i, \; i \in [n], \; \sum_{i=1}^n y_i \leq 1\Big\}.$$
In other words while the $\ell_1$-ball has $2^n$ faces, it can be represented as the projection of a polytope in dimension $2n$ with only $2n + 1$ faces. This type of representation is called an extended formulation. Let $P \subset \mathbb{R}^n$ be a polytope; then $Q \subset \mathbb{R}^{n+m}$ is an extension of $P$ if

$$P = \{x \in \mathbb{R}^n : \exists y \in \mathbb{R}^m \text{ s.t. } (x, y) \in Q\}.$$
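To make the contrast concrete, here is a small Python sketch comparing the two descriptions of the $\ell_1$-ball in low dimension: membership can be tested either through the $2^n$ facet inequalities, or through the extended formulation (where the witness $y_i = |x_i|$ always works):

```python
from itertools import product

# Two descriptions of the l1-ball: 2^n facet inequalities versus the
# extended formulation with 2n + 1 constraints (illustrative sketch).
def in_l1_ball_facets(x):
    # one inequality per sign pattern s: sum_i s_i x_i <= 1
    return all(sum(s * xi for s, xi in zip(signs, x)) <= 1
               for signs in product((-1, 1), repeat=len(x)))

def in_l1_ball_extended(x):
    # the witness y_i = |x_i| satisfies -y_i <= x_i <= y_i, so membership
    # reduces to the single remaining constraint sum_i y_i <= 1
    return sum(abs(xi) for xi in x) <= 1
```

The two tests agree on every point, but the first enumerates exponentially many inequalities while the second checks a linear number of them (plus one).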
A quantity of great interest, called the extension complexity of $P$, is the minimal number of faces of an extension of $P$. The extension complexity characterizes in some sense the difficulty of linear optimization over $P$ when one restricts to Linear Programming techniques. While this quantity seems completely intractable, a famous result of Yannakakis shows that it can be described as a completely combinatorial quantity: it is equal to the non-negative rank of the slack matrix of $P$ (the matrix where each row corresponds to a vertex $v$ of $P$, each column corresponds to a face $\{x : a^{\top} x \leq b\}$ of $P$, and the corresponding entry is the slack $b - a^{\top} v$). Controlling the non-negative rank is no easy task, but in some cases it can be done. For instance a very recent result of Fiorini et al. shows that the extension complexity of the correlation polytope is at least $2^{cn}$ for some constant $c > 0$. This can be viewed as an unconditional lower bound on the complexity of $\mathrm{MaxCut}$ when one restricts to Linear Programming techniques.
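As a toy illustration of the slack matrix, here is a Python snippet computing it for the $\ell_1$-ball in $\mathbb{R}^2$ (a square with $4$ vertices and $4$ facets $s^{\top} x \leq 1$, $s \in \{-1,1\}^2$); all entries are non-negative, and each row has a zero since every vertex lies on some facet:

```python
from itertools import product

# The slack matrix of the l1-ball in R^2: 4 vertices, 4 facets
# s^T x <= 1 for s in {-1,1}^2, and entry = slack 1 - s^T v >= 0.
vertices = [(1, 0), (-1, 0), (0, 1), (0, -1)]
facet_normals = list(product((-1, 1), repeat=2))  # offsets are all b = 1
slack = [[1 - (s[0] * v[0] + s[1] * v[1]) for s in facet_normals]
         for v in vertices]
```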
Average-case complexity, smoothed analysis
So far we restricted ourselves to worst-case complexity: an algorithm is polynomial only if for every instance $x$ the running time is polynomial in the size of $x$. This may be considered too strict, as it might be that an algorithm runs in polynomial time on any ‘real-life’ instance, but has a very bad running time on artificial instances. This is exactly what happens with the simplex method that you have studied in ORF522: there are instances of linear programs where the simplex method takes exponential time (the Klee-Minty cube), yet in practice it is much more efficient than the ellipsoid method (and it is competitive with interior-point methods). A great result of Spielman and Teng from 2001 shows that indeed the examples that make the simplex method fail are artificial: if one adds a little bit of noise to the instance, then with high probability the simplex method will run in polynomial time. This is called the smoothed analysis of the simplex method, and it falls under the more general research agenda of understanding average-case complexity rather than worst-case complexity.