We are now going to restrict our attention to Boolean functions $f : \{0,1\}^* \rightarrow \{0,1\}$. Computing a Boolean function is called a *decision problem*, since we need to decide for each $x \in \{0,1\}^*$ whether or not it is in the set $L = \{x : f(x) = 1\}$. We call such a set $L \subseteq \{0,1\}^*$ a *language*.

Note that while for optimization it is more natural to consider functions with values in $\{0,1\}^*$ (since one needs to write down the solution of the optimization problem), if one is interested in proving hardness results it is fine to restrict to Boolean functions, since one can always consider decision problems of the form ‘does the optimal solution achieve an objective value larger than $t$’, for some $t$ fixed beforehand. If the latter decision problem is hard, then the corresponding optimization problem is hard too, since solving the optimization problem immediately answers the decision problem.

We now define the class $\mathbf{P}$ as the set of languages $L$ such that there exists a Turing Machine that computes $L$ in polynomial time. One often says that the class $\mathbf{P}$ consists of ‘easy’ decision problems.

Next we define the class $\mathbf{NP}$ as the set of languages $L$ such that there exists some polynomial $p$, and a polynomial-time Turing Machine $M$ with the following property:

$$x \in L \; \Leftrightarrow \; \exists \, u \in \{0,1\}^{p(|x|)} \text{ such that } M(x, u) = 1 .$$

In words, if $L \in \mathbf{NP}$, then for any $x \in L$ there exists a ‘small’ certificate $u$ such that, if it is provided to the Turing Machine $M$, then $M$ can verify in polynomial time that $x \in L$. Thus the class $\mathbf{NP}$ consists of decision problems for which one can easily convince someone else that a candidate is indeed a solution to the decision problem (that is, that $x$ is indeed in $L$).
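As a small illustration of the certificate idea, here is a minimal Python sketch. The language chosen (composite integers, with a nontrivial divisor as certificate) and the function names `verify_composite` and `in_language_by_search` are assumptions made for this example only. The verifier plays the role of the machine $M$: it runs in time polynomial in the number of bits of its input, whereas the naive search over all candidate certificates does not.

```python
def verify_composite(x: int, u: int) -> bool:
    """Verifier M(x, u): accept iff u is a nontrivial divisor of x.

    One comparison and one division, hence polynomial in the bit length of x.
    """
    return 1 < u < x and x % u == 0


def in_language_by_search(x: int) -> bool:
    """Decide membership by trying every candidate certificate.

    The number of candidates is exponential in the bit length of x,
    so this is a brute-force baseline, not a polynomial-time algorithm.
    """
    return any(verify_composite(x, u) for u in range(2, x))
```

For instance, `verify_composite(91, 7)` accepts because $91 = 7 \times 13$, while no certificate makes the verifier accept the prime $97$.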

The million-dollar question asks whether $\mathbf{P} = \mathbf{NP}$. It is widely believed that this is not the case, but nobody has been able to prove it. Informally, for problems in $\mathbf{NP}$ one simply needs to be able to verify a solution (the certificate), while in $\mathbf{P}$ one needs to find the solution. In some sense the question is really about the problem of *creativity*, and whether or not creativity ‘truly’ exists.

**NP-completeness**

We now define a concept of hardness related to the class $\mathbf{NP}$. A language $L$ is $\mathbf{NP}$-hard if for every language $L' \in \mathbf{NP}$ there exists a polynomial-time computable function $f$ such that for any $x \in \{0,1\}^*$, $x \in L'$ if and only if $f(x) \in L$. We call the function $f$ a polynomial-time reduction of $L'$ to $L$. Intuitively a decision problem is $\mathbf{NP}$-hard if it is at least as hard as any other decision problem in $\mathbf{NP}$.

The notion of $\mathbf{NP}$-hardness can be viewed as a lower bound conditioned on the fact that $\mathbf{P} \neq \mathbf{NP}$. Indeed, if $\mathbf{P} \neq \mathbf{NP}$, then $\mathbf{NP}$-hard decision problems cannot be decided in polynomial time.
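To spell out the reasoning behind this claim, here is a short sketch. The polynomials $r$ and $q$ are names introduced here for illustration: $r$ bounds the running time of a reduction $f$ from some language $L' \in \mathbf{NP}$ to an $\mathbf{NP}$-hard language $L$, and $q$ (taken nondecreasing) bounds the running time of a hypothetical polynomial-time algorithm deciding $L$. Since $f$ cannot write an output longer than its running time, one gets

$$x \in L' \; \Leftrightarrow \; f(x) \in L, \qquad |f(x)| \leq r(|x|), \qquad \text{time to decide } x \in L' \; \leq \; r(|x|) + q(r(|x|)) .$$

The last bound is polynomial in $|x|$, so every language in $\mathbf{NP}$ would be in $\mathbf{P}$, which contradicts $\mathbf{P} \neq \mathbf{NP}$.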

If a language $L$ is $\mathbf{NP}$-hard, and if $L \in \mathbf{NP}$, then one says that $L$ is $\mathbf{NP}$-complete. It is not obvious at first sight that any such language exists. Indeed, for a language to be $\mathbf{NP}$-complete it must be in $\mathbf{NP}$ and at the same time be at least as hard as any other language in $\mathbf{NP}$. In the next section we will see our first example of an $\mathbf{NP}$-complete language.

**Boolean formulas and the satisfiability problem**

Let $x_1, \dots, x_n$ be Boolean variables. A Boolean formula over these variables consists of the variables and the logical operations AND ($\wedge$), OR ($\vee$), and NOT ($\neg$). A Boolean formula is satisfiable if there exists an assignment $z \in \{0,1\}^n$ to the Boolean variables (that is, $x_i$ is assigned TRUE if $z_i = 1$, and FALSE if $z_i = 0$) such that the corresponding value of the formula is TRUE. We define the language $\mathrm{SAT}$ as the set of all satisfiable Boolean formulae.
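To make the definition concrete, here is a minimal Python sketch. The nested-tuple encoding of formulas (`("var", i)`, `("not", f)`, `("and", f, g)`, `("or", f, g)`, with variables indexed from 0) is an assumption of this example, not a standard format. Evaluating a formula on a given assignment is fast, while the satisfiability check below simply tries all $2^n$ assignments.

```python
from itertools import product


def evaluate(formula, z):
    """Evaluate a formula on an assignment z (a sequence of 0/1 values)."""
    op = formula[0]
    if op == "var":
        return bool(z[formula[1]])
    if op == "not":
        return not evaluate(formula[1], z)
    if op == "and":
        return evaluate(formula[1], z) and evaluate(formula[2], z)
    if op == "or":
        return evaluate(formula[1], z) or evaluate(formula[2], z)
    raise ValueError(f"unknown operation: {op!r}")


def is_satisfiable(formula, n):
    """Brute force over all 2**n assignments (exponential time)."""
    return any(evaluate(formula, z) for z in product([0, 1], repeat=n))
```

For example, the formula $(x_1 \vee x_2) \wedge \neg x_1$ is encoded as `("and", ("or", ("var", 0), ("var", 1)), ("not", ("var", 0)))` and is satisfied by the assignment `(0, 1)`.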

Theorem (Cook-Levin 1971) $\mathrm{SAT}$ is $\mathbf{NP}$-complete.

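*Proof:* First it is clear that $\mathrm{SAT} \in \mathbf{NP}$, since it suffices to provide a satisfying truth assignment as a certificate. Now consider a language $L \in \mathbf{NP}$. We need to exhibit a polynomial-time algorithm that takes $x \in \{0,1\}^*$ as input and outputs a Boolean formula $\varphi_x$ with the following property: $x \in L$ if and only if $\varphi_x$ is satisfiable (that is, we want to reduce $L$ to $\mathrm{SAT}$). Let $M$ be the Turing Machine that proves that $L \in \mathbf{NP}$ (with certificate of size $p(|x|)$ and running time $r(|x|)$, both being polynomial). The idea of the proof is that $\varphi_x$ is a Boolean formula that encodes the run of the Turing Machine $M$ on the input $x$ and ‘some’ certificate (the certificate being encoded as part of the Boolean variables). Let us now describe how to construct the above Boolean formula. We will use the notation:

$$n = |x|, \qquad m = p(n), \qquad \tau = r(n) .$$

First we describe the Boolean variables involved: $T_{a,j,k}$ (which intuitively encodes the fact that the symbol $a$ is on the tape at location $j$ during the $k^{th}$ step of computation), $Q_{q,k}$ (which encodes the fact that the state of the Machine is $q$ during the $k^{th}$ step of computation), and $H_{j,k}$ (which encodes the fact that the head is at location $j$ during the $k^{th}$ step of computation). Note that we can restrict to $1 \leq j \leq \tau + 1$ and $0 \leq k \leq \tau$ (in $\tau$ steps the head cannot travel further, and we assume without loss of generality that $\tau \geq n + m$), so that we have a polynomial number of variables.

The first ‘part’ of the formula describes the start configuration of $M$ on an input where the first portion is given by $x$, that is

$$Q_{q_{\mathrm{start}}, 0} \wedge H_{1, 0} \wedge \bigwedge_{j=1}^{n} T_{x_j, j, 0} \wedge \bigwedge_{j=n+1}^{n+m} \left( T_{0, j, 0} \vee T_{1, j, 0} \right) \wedge \bigwedge_{j=n+m+1}^{\tau+1} T_{\sqcup, j, 0} ,$$

where $q_{\mathrm{start}}$ is the start state and $\sqcup$ is the blank symbol.

Note in particular that the initial symbols on the tape at locations $n+1, \dots, n+m$ are left free, which is the critical idea since then they can model *any* possible certificate. The next parts of the formula will model the computation steps. If the transition function maps the state $q$ and the symbol $a$ to the state $q'$ and the symbol $a'$, then the corresponding part is of the form (assuming the head moves right for sake of notation)

$$\left( Q_{q, k} \wedge H_{j, k} \wedge T_{a, j, k} \right) \Rightarrow \left( Q_{q', k+1} \wedge H_{j+1, k+1} \wedge T_{a', j, k+1} \right) ,$$

where $A \Rightarrow B$ is shorthand for $\neg A \vee B$, with the additional constraints that there is only one state (respectively one head position and one symbol per location) at each time, that is for $q \neq q'$, $j \neq j'$ and $a \neq a'$,

$$\bigvee_{q} Q_{q, k}, \quad \bigvee_{j} H_{j, k}, \quad \bigvee_{a} T_{a, j, k}, \qquad Q_{q, k} \Rightarrow \neg Q_{q', k}, \quad H_{j, k} \Rightarrow \neg H_{j', k}, \quad T_{a, j, k} \Rightarrow \neg T_{a', j, k} ,$$

as well as the fact that a symbol can change only if the head is at the corresponding location ($a \neq a'$ in the following formula)

$$\left( T_{a, j, k} \wedge T_{a', j, k+1} \right) \Rightarrow H_{j, k} .$$

Taking the AND of the above formulae over all $j$ and $k$ (and over all states and symbols), and the AND with the formula that encodes the fact that when the Machine halts its output is $1$, one obtains the formula $\varphi_x$. It has polynomially many variables and polynomially many parts of constant size, so it is indeed polynomial-time computable. If $x \in L$ then there exists an assignment that satisfies $\varphi_x$ (assigning the variables $T_{a, j, 0}$, $n < j \leq n + m$, to the value of a valid certificate, and the rest of the variables to what would happen during the computation of the Turing Machine with this certificate). Conversely, from a satisfying assignment one reads off a certificate $u$ at locations $n+1, \dots, n+m$ for which $M(x, u) = 1$, so that $x \in L$.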

**More NP-complete problems**

By now thousands of problems are known to be $\mathbf{NP}$-complete. Here I just want to give a list of a few interesting examples. Many of these examples are languages over pairs $(G, k)$ where $G$ is a graph and $k$ is an integer. The language is then usually the set of such pairs such that in $G$ there exists a subgraph with at least $k$ vertices and with a certain prespecified structure.

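A literal is a Boolean variable $x_i$ or its negation $\neg x_i$, a clause is an OR of literals (such as $x_1 \vee \neg x_2 \vee x_3$), and a CNF is a Boolean formula which is an AND of clauses. A $k$CNF is a CNF formula in which all clauses contain at most $k$ literals. We denote by $k\mathrm{SAT}$ the language of all satisfiable $k$CNF formulae. One can show that $3\mathrm{SAT}$ is $\mathbf{NP}$-complete (by reducing $\mathrm{SAT}$ to $3\mathrm{SAT}$). One can also show that $2\mathrm{SAT} \in \mathbf{P}$. The latter statement is easy to understand: a clause with two literals $a \vee b$ is a statement of the form $\neg a \Rightarrow b$ (equivalently $\neg b \Rightarrow a$), thus a 2CNF can be represented as a graph of implications (over the Boolean variables and their negations). Verifying the satisfiability of the formula is then equivalent to searching for paths between a variable and its negation: the formula is unsatisfiable if and only if for some variable $x_i$ there is a path from $x_i$ to $\neg x_i$ and a path from $\neg x_i$ to $x_i$.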
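The implication-graph argument for 2CNF formulas can be sketched in a few lines of Python. The encoding is an assumption of this example: literals are nonzero integers in the DIMACS style (`i` for $x_i$, `-i` for $\neg x_i$) and each clause is a pair of literals. The function name `two_sat_satisfiable` is likewise hypothetical. The sketch uses plain breadth-first search from each literal, which is polynomial but not the fastest known approach.

```python
from collections import defaultdict, deque


def two_sat_satisfiable(num_vars, clauses):
    """Return True iff the 2CNF given by `clauses` is satisfiable."""
    # A clause (a OR b) yields the implications (NOT a => b) and (NOT b => a).
    graph = defaultdict(list)
    for a, b in clauses:
        graph[-a].append(b)
        graph[-b].append(a)

    def reaches(src, dst):
        """Breadth-first search for a path from literal src to literal dst."""
        seen = {src}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if u == dst:
                return True
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        return False

    # Unsatisfiable iff some variable implies its negation and vice versa.
    return not any(
        reaches(i, -i) and reaches(-i, i) for i in range(1, num_vars + 1)
    )
```

A single-literal clause such as $x_1$ can be passed as the pair `(1, 1)`.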

Does a given graph $G$ contain a set of at least $k$ vertices with no edges between them (such a set is called an *independent set*)? This ‘question’ (i.e., the corresponding language over pairs of graphs and integers, which we denote $\mathrm{INDSET}$) is $\mathbf{NP}$-complete.

Let us see how to prove this claim. First it is clear that the independent set language $\mathrm{INDSET}$ is in $\mathbf{NP}$ (the independent set itself is a certificate). Thus it suffices now to show how to reduce (in polynomial time) any instance of $3\mathrm{SAT}$ to an instance of $\mathrm{INDSET}$. Specifically, we show how to map an $m$-clause 3CNF $\varphi$ to a graph $G$ with $7m$ vertices such that $\varphi$ is satisfiable if and only if $G$ contains an independent set of size at least $m$. The idea is very simple: first, for any clause $C$ in $\varphi$ one associates a cluster of $7$ vertices in $G$ such that each vertex in this cluster corresponds to one of the possible satisfying assignments to the three variables involved in $C$ (we repeat one of the assignments in case the clause involves only one or two variables). Then one puts an edge between any two vertices in the same cluster, and also an edge between any two vertices that correspond to two inconsistent assignments to the Boolean variables. An independent set of size $m$ must then pick exactly one vertex per cluster, and the picked partial assignments are pairwise consistent, so they combine into an assignment satisfying every clause. Conversely, a satisfying assignment selects one vertex in each cluster. Clearly one can construct this graph in polynomial time.
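The construction just described can be sketched in Python as follows. The encoding is an assumption of this example: each clause is a non-empty tuple of nonzero integers in the DIMACS style (`i` for a variable, `-i` for its negation), and the function names are hypothetical. The helper `has_independent_set` is a brute-force check meant only for sanity-testing the reduction on tiny instances.

```python
from itertools import combinations, product


def clause_satisfying_assignments(clause):
    """All assignments to the variables of `clause` that satisfy it."""
    variables = sorted({abs(lit) for lit in clause})
    result = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            result.append(assignment)
    return result


def reduce_3sat_to_indset(clauses):
    """Map a 3CNF to (vertices, edges, k) with k = number of clauses."""
    vertices = []  # each vertex is (clause index, partial assignment)
    for c, clause in enumerate(clauses):
        sat = clause_satisfying_assignments(clause)
        sat += [sat[0]] * (7 - len(sat))  # pad the cluster to 7 vertices
        vertices.extend((c, a) for a in sat)

    edges = set()
    for u, v in combinations(range(len(vertices)), 2):
        (cu, au), (cv, av) = vertices[u], vertices[v]
        inconsistent = any(au[x] != av[x] for x in au.keys() & av.keys())
        if cu == cv or inconsistent:
            edges.add((u, v))  # stored with u < v
    return vertices, edges, len(clauses)


def has_independent_set(num_vertices, edges, k):
    """Brute-force check (exponential), for testing small instances only."""
    for subset in combinations(range(num_vertices), k):
        if all(pair not in edges for pair in combinations(subset, 2)):
            return True
    return False
```

The reduction itself examines every pair of vertices once, so it runs in time quadratic in the number of clauses.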

The following ‘questions’ can be shown to be $\mathbf{NP}$-complete via a reduction from $\mathrm{INDSET}$ (the independent set problem):

- Does a graph contain a *clique* (a subgraph where any two distinct vertices are connected) of size at least $k$?
- Does a graph contain a *vertex cover* (a subset $S$ of vertices such that any edge in the graph has at least one endpoint in $S$) of size at most $k$?
- Does a graph contain a *cut* of weight at least $k$ (i.e. a subset $S$ of vertices such that there are at least $k$ edges with one endpoint in $S$ and one endpoint in $S^c$)?
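For the first two questions, the reductions rest on two standard identities: a set $S$ of vertices is independent in $G$ if and only if it is a clique in the complement graph of $G$, and if and only if its complement $V \setminus S$ is a vertex cover of $G$. The Python sketch below checks these identities on a small example. Representing edges as a set of `frozenset` pairs and the function names are assumptions of this illustration, and the cut question is not covered by it.

```python
from itertools import combinations


def normalize(edges):
    """Turn an iterable of vertex pairs into a set of unordered edges."""
    return {frozenset(e) for e in edges}


def is_independent_set(edges, subset):
    """No edge joins two vertices of `subset`."""
    return all(frozenset(p) not in edges for p in combinations(subset, 2))


def is_clique(edges, subset):
    """Every two distinct vertices of `subset` are joined by an edge."""
    return all(frozenset(p) in edges for p in combinations(subset, 2))


def is_vertex_cover(edges, subset):
    """Every edge has at least one endpoint in `subset`."""
    return all(e & subset for e in edges)


def complement(num_vertices, edges):
    """Edges of the complement graph on vertices 0..num_vertices-1."""
    all_pairs = {frozenset(p) for p in combinations(range(num_vertices), 2)}
    return all_pairs - edges
```

On the path with edges $\{0,1\}, \{1,2\}, \{2,3\}$, the set $\{0, 2\}$ is independent, it is a clique in the complement graph, and $\{1, 3\}$ is a vertex cover.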

The following problem is interesting because it looks extremely simple, yet it is $\mathbf{NP}$-complete. Given integers $a_1, \dots, a_n$ and $t$, does there exist $S \subseteq \{1, \dots, n\}$ such that $\sum_{i \in S} a_i = t$?
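For the subset-sum reading of this problem (integers $a_1, \dots, a_n$, a target $t$, and a subset $S$ of indices whose elements sum to $t$), the gap between verifying and finding a solution can be sketched in Python. The function names and the 0-based indexing are assumptions of this example. Checking a proposed subset takes linear time, while the naive search below goes through all $2^n$ subsets.

```python
from itertools import combinations


def verify_subset_sum(numbers, target, certificate):
    """Check that `certificate` lists distinct valid indices summing to target."""
    indices = set(certificate)
    return (
        len(indices) == len(certificate)
        and all(0 <= i < len(numbers) for i in indices)
        and sum(numbers[i] for i in indices) == target
    )


def subset_sum_brute_force(numbers, target):
    """Try every subset of indices; return one that works, or None."""
    n = len(numbers)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if verify_subset_sum(numbers, target, subset):
                return subset
    return None
```

With the numbers `[3, 34, 4, 12, 5, 2]` and target `9`, the indices `[2, 4]` form a valid certificate since $4 + 5 = 9$.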