Guest post by Amir Ali Ahmadi: Sum of Squares (SOS) Techniques: An Introduction, Part II/II

The reader is encouraged to read Part I of this series before reading this post. For ease of reference, we recall the polynomial optimization problem from our previous post:

(1)   \begin{equation*} \begin{array}{lll} \mbox{minimize} & p(x) &\ \\ \mbox{subject to} & x\in K\mathrel{\mathop:}=\{x\in\mathbb{R}^n\ |\ g_i(x)\geq 0, h_i(x)=0\}. &\ \end{array} \end{equation*}

This is the task of minimizing a multivariate polynomial p over a basic semialgebraic set K defined by the polynomials g_i, h_i.


Sum of squares and semidefinite programming

If a polynomial is nonnegative, can we write it in a way that its nonnegativity becomes obvious? This is the meta-question behind Hilbert’s 17th problem. As the title of this post suggests, one way to achieve this goal is to try to write the polynomial as a sum of squares of polynomials. We say that a polynomial p is a sum of squares (sos), if it can be written as p(x)=\sum_i{q_i^2(x)} for some polynomials q_i. Existence of an sos decomposition is an algebraic certificate for nonnegativity. Remarkably, it can be decided by solving a single semidefinite program.

 Theorem 1 A multivariate polynomial p in n variables and of degree 2d is a sum of squares if and only if there exists a positive semidefinite matrix Q (often called the Gram matrix) such that

(2)   \begin{equation*} p(x)=z^{T}Qz, \end{equation*}

where z is the vector of monomials of degree up to d

    \begin{equation*} z=[1,x_{1},x_{2},\ldots,x_{n},x_{1}x_{2},\ldots,x_{n}^d]. \end{equation*}

The proof is straightforward and is left to the reader. Note that the feasible set defined by these constraints is the intersection of an affine subspace (arising from the equality constraints matching the coefficients of p with the entries of Q in (2)) with the cone of positive semidefinite matrices. This is precisely the semidefinite programming (SDP) problem. The size of the Gram matrix Q is \binom{n+d}{d} \times \binom{n+d}{d}, which for fixed d is polynomial in n. Depending on the structure of p, there are well-documented techniques for further reducing the size of the Gram matrix Q and the monomial vector z. We do not pursue this direction here but state as an example that if p is homogeneous of degree 2d, then it suffices to place in the vector z only monomials of degree exactly d.

Example: Consider the task proving nonnegativity of the polynomial

    \begin{equation*} \begin{array}{lll} p(x)&=&x_1^4-6x_1^3x_2+2x_1^3x_3+6x_1^2x_3^2+9x_1^2x_2^2-6x_1^2x_2x_3-14x_1x_2x_3^2 \\ \ &\ &+4x_1x_3^3+5x_3^4-7x_2^2x_3^2+16x_2^4. \end{array} \end{equation*}

Since this is a form (i.e., a homogeneous polynomial), we take


One feasible solution to the SDP in (2) is given by

    \[Q=\begin{pmatrix} 1 & -3 & 0 & 1 & 0 & 2 \\ -3 & 9 & 0 & -3 & 0 & -6 \\ 0 & 0 & 16 & 0 & 0 & -4 \\ 1 & -3 & 0 & 2 & -1 & 2 \\ 0 & 0 & 0 & -1 & 1 & 0 \\ 2 & -6 & 4 & 2 & 0 & 5 \end{pmatrix}.\]

Upon a decomposition Q=\sum_{i=1}^3a_i a_i^T, with a_1=(1,-3,0,1,0,2)^T, a_2=(0,0,0,1,-1,0)^T, a_3=(0,0,4,0,0,-1)^T, one obtains the sos decomposition

(3)   \begin{equation*} p(x)=(x_1^2-3x_1x_2+x_1x_3+2x_3^2)^2+(x_1x_3-x_2x_3)^2+(4x_2^2-x_3^2)^2. \end{equation*}


You are probably asking yourself right now whether every nonnegative polynomial can be written as a sum of squares. Did we just get lucky on the above example? Well, from complexity considerations alone (see previous post), we know that we should expect a gap between nonnegative and sos polynomials, at least for large n.

In a seminal 1888 paper, Hilbert was the first to show that there exist nonnegative polynomials that are not sos. In fact, for each combination of degree and dimension, he showed whether such polynomials do or do not exist. For example, one famous theorem of Hilbert from that paper is that all nonnegative ternary quartic forms are sos. So we did not really use any luck in our example above! The same would have happened for any other nonnegative quartic form in three variables. Hilbert’s proof of existence of nonnegative polynomials that are not sos was not constructive. The first explicit example interestingly appeared nearly 80 years later and is due to Motzkin:

(4)   \begin{equation*} M(x_1,x_2,x_3)\mathrel{\mathop:}=x_1^4x_2^2+x_1^2x_2^4-3x_1^2x_2^2x_3^2+x_3^6. \end{equation*}

Nonnegativity of M follows from the arithmetic-geometric inequality (why?). Non-existence of an sos decomposition can be shown by some algebraic manipulations or by the systematic tools we have today for showing infeasibility of a semidefinite program (separating hyperplane theorems). From an application viewpoint, the good news for sum of squares optimization is that constructing polynomials of the type in (4) is not a trivial task. This is especially true if additional structure is required on the polynomial. For example, the following problem is still open.

Open problem. Construct an explicit example of a convex, nonnegative polynomial that is not a sum of squares.

This question is due to Parrilo. The motivation behind it is quite clear: we would like to understand whether sos optimization is exact for the special case of convex polynomial optimization problems. Blekherman has shown with non-constructive arguments that such “bad” convex polynomials must exist when the degree is four or larger and the number of variables goes to infinity. However, we do not know the smallest (or any reasonable) dimension for which this is possible and lack any explicit examples. For the reader interested in tackling this problem, it is known that any such convex polynomial must necessarily be not “sos-convex”. Roughly speaking, sos-convex polynomials are convex polynomials whose convexity is certified by an appropriately defined sum of squares identity. Examples of convex but not sos-convex polynomials have recently appeared and a characterization of the degrees and dimensions where they exist is now available. Interestingly, this characterization coincides with that of Hilbert, for reasons that are also not well-understood; see Chapter 3 of our thesis.

Having shown that not every nonnegative polynomial is a sum of squares of polynomials, Hilbert asked in his 17th problem whether every such polynomial can be written as a sum of squares of rational functions. Artin answered the question in the affirmative in 1927. As we will see next, such results allow for a hierarchy of semidefinite programs that approximate the set of nonnegative polynomials better and better.


Positivstellensatz and the SOS hierarchy

Consider proving a statement that we all learned in high school:

    \[\forall a,b,c,x,\ ax^2+bx+c=0\Rightarrow b^2-4ac\geq0.\]

Just for the sake of illustration, let us pull an algebraic identity out of our hat which certifies this claim:

(5)   \begin{equation*} b^2-4ac=(2ax+b)^2-4a(ax^2+bx+c). \end{equation*}

Think for a second why this constitutes a proof. The Positivstellensatz is a very powerful algebraic proof system that vastly generalizes what we just did here. It gives a systematic way of certifying infeasibility of any system of polynomial equalities and inequalities over the reals. Sum of squares representations play a central role in it. (They already did in our toy example above if you think about the role of the first term on the right hand side of (5)). Modern optimization theory adds a wonderfully useful aspect to this proof system: we can now use semidefinite programming to automatically find suitable algebraic certificates of the type in (5).

The Positivstellensatz is an example of a theorem of the alternative. The reader may already be familiar with similar fundamental theorems, such as the Hilbert’s (weak) Nullstellensatz (1893),

a system of polynomial equations f_i(z)=0 is infeasible over the complex numbers


there exist polynomials t_i(z) such that \sum_i t_i(z)f_i(z)=-1,

or the Farkas Lemma (1902) of linear programming,

a system of linear (in)equalities Ax+b=0, Cx+d\geq0 is infeasible over the reals


there exist \lambda\geq0, \mu such that A^T\mu+C^T\lambda=0, b^T\mu+d^T\lambda=-1.

These theorems typically have an “easy” (well, trivial) direction and a “hard” direction. The same is true for the Positivstellensatz.

Theorem 2  (Positivstellensatz — Stengle (1974)) The basic semialgebraic set

    \[K\mathrel{\mathop:}=\{x\in\mathbb{R}^n\ |\ g_i(x)\geq 0, i=1,\ldots,m, h_i(x)=0, i=1,\ldots,k\}\]

 is empty


there exist polynomials t_1,\ldots,t_k and sum of squares polynomials s_0,s_1,\ldots,s_m, s_{12},s_{13},\ldots,s_{m-1m}, s_{123},\ldots,s_{m-2m-1m}, \ldots,s_{12\ldots m} such that

(6)   \begin{equation*} \begin{array}{lll} -1&=&\sum\limits_{i=1}^kt_i(x)h_i(x)+s_0(x)+ \sum\limits_{\{i\}}s_i(x)g_i(x)\\ \ &+&\sum\limits_{\{i,j\}}s_{ij}(x)g_i(x)g_j(x)+\sum\limits_{\{i,j,k\}}s_{ijk}(x)g_i(x)g_j(x)g_k(x) \\ \ &+&\cdots + s_{ijk\ldots m}(x)g_i(x)g_j(x)g_k(x)\ldots g_m(x). \end{array} \end{equation*}

The number of terms in this expression is finite since we never raise any polynomial g_i to a power larger than one. The sum of squares polynomials s_{i\ldots j} are of course allowed to be the zero polynomial, and in practice many of them often are. There are bounds in the literature on the degree of the polynomials t_i, s_{i\ldots j}, but of exponential size as one would expect for complexity reasons. There is substantial numerical evidence, however, from diverse application areas, indicating that in practice (whatever that means) the degrees of these polynomials are usually quite low. We remark that the Positivstellensatz is a very powerful result. For example, it is straightforward to show that the solution to Hilbert’s 17th problem follows as a corollary of this theorem (how?).

Under minor additional assumptions, refined versions of the Positivstellensatz we presented are available. The two most well-known are perhaps due to Schmüdgen and Putinar. For example, Putinar’s Positivstellensatz states that if the set K satisfies the so-called Archimedean property (a property slightly stronger than compactness), then emptiness of K guarantees a representation of the type (6), where the second and third line are scratched out; i.e., there is no need to take products of the constraints g_i(x)\geq 0. While this may look like a simplification at first, there is a tradeoff: the degree of the sos multipliers s_i may need to be higher in Putinar’s representation than in Stengle’s. This makes intuitive sense as the proof system needs to additionally prove statements of the type g_i\geq 0,g_j\geq 0\Rightarrow g_ig_j\geq 0, while in Stengle’s representation this is taken as an axiom.

SOS hierarchies. Positivstellensatz results form the basis of sos hierarchies of Parrilo and Lasserre for solving the polynomial optimization problem (1). The two approaches only differ in the version of the Positivstellensatz they use (originally, Parrilo’s paper follows Stengle’s version and Lasserre’s follows Putinar’s), and the fact that Lasserre presents the methodology from the dual (but equivalent) viewpoint of moment sequences. In either case though, the basic idea is pretty simple. We try to obtain the largest lower bound for problem (1), by finding the largest \gamma for which the set \{x\in K, p(x)\leq\gamma\} is empty. We certify this emptiness by finding Positivstellensatz certificates. In level l of the hierarchy, the degree of the polynomials t_i and the sos polynomails s_i in (6) is bounded by l. As l increases, the quality of the lower bound monotonically increases, and for each fixed l, the search for the optimal \gamma, and the polynomials t_i,s_i is a semidefinite optimization problem (possibly with some bisection over \gamma).


Revisiting MAXCUT

Recall from Seb’s earlier lecture that we saw a beautiful algorithm for MAXCUT due to Goemans and Williamson which produced an approximation ratio of 0.878. The algorithm had two steps: first we solved a semidefinite program, then we performed some clever randomized rounding. In this section we focus only on the first step. We show that even low degree Positivstellensatz refutations can produce stronger bounds than the standard SDP relaxation.

Consider the 5-cycle with all edge weights equal to one. It is easy to see that the MAXCUT value of this graph is equal to 4. However, the standard SDP relaxation (i.e. the one used in the Goemans and Williamson algorithm) produces an upper bound of \frac{5}{8}(\sqrt{5}+5)\approx 4.5225.

The MAXCUT value of the 5-cycle is equal to minus the optimal value of the quadratic program

(7)   \begin{equation*}\nonumber \begin{array}{lll} \mbox{minimize} & \frac{1}{2}(x_1x_2+x_2x_3+x_3x_4+x_4x_5+x_1x_5)-\frac{5}{2} &\ \\ \mbox{subject to} & x_i^2=1, \ \ \ i=1,\ldots,5. &\ \end{array} \end{equation*}

We will find the largest constant \gamma such that the objective function minus \gamma is algebraically certified to be nonnegative on the feasible set. To do this, we solve the sos optimization problem

    \begin{equation*} \begin{array}{ll} \mbox{maximize} & \gamma  \\ \mbox{such that} & \frac{1}{2}(x_1x_2+x_2x_3+x_3x_4+x_4x_5+x_1x_5)-\frac{5}{2}-\gamma+\sum\limits_{i=1}^5 t_i(x)(x_i^2-1) \\ & \mbox{is sos.} \\ \end{array} \end{equation*}

The decision variables of this problem are the constant \gamma and the coefficients of the polynomials t_i(x), which in this case we parametrize to be quadratic functions. This sos program results in a polynomially sized semidefinite optimization problem via Theorem 1. The optimal value of the program is -4; i.e., we have solved the MAXCUT instance exactly.

You may be wondering, “can we show that a certain level of the sos hierarchy combined with an appropriately designed rounding procedure produces an approximation ratio of better than 0.878?” Let’s just say that if you did this, you would probably become an overnight celebrity.



There are very nice implementations of sum of squares optimization solvers that automate the process of setting up the resulting semidefinite programs. The interested reader may want to play around with SOSTOOLS, YALMIP, or GloptiPoly.



While we focused in this post on the polynomial optimization problem, the impact of sum of squares optimization goes much beyond this area. In dynamics and control, sos optimization has enabled a paradigm shift from classical linear control to an efficient framework for design of nonlinear controllers that are provably safer, more agile, and more robust. Papers on applications of sos optimization have appeared in areas as diverse as quantum information theory, robotics, geometric theorem proving, formal verification, derivative pricing, stochastic optimization, and game theory, among others. In theoretical computer science, sos techniques are currently a subject of intense study. Not too long ago, I attended a plenary presentation at the CCC by Subhash Khot, where one of the last things he said was, “the sos hierarchies are currently the greatest threat to the unique games conjecture.” If you are an ORFE student and still not convinced that this SOS business is actually useful, you may find relief in knowing that SOS is in fact the core discipline of ORFE 😉

This entry was posted in Optimization. Bookmark the permalink.

One Response to "Guest post by Amir Ali Ahmadi: Sum of Squares (SOS) Techniques: An Introduction, Part II/II"