Convex body chasing, Steiner point, Sellke point, and SODA 2020 best papers

Big congratulations to my former intern Mark Sellke, and to the CMU team (C. J. Argue, Anupam Gupta, Guru Guruganesh, and Ziye Tang) for jointly winning the best paper award at SODA 2020 (as well as the best student paper for Mark)! They obtain a linear in the dimension competitive ratio for convex body chasing, a problem which was entirely open for any d \geq 3 just two years ago. What’s more is that they found the algorithm (and proof) from The Book! Let me explain.


Convex body chasing

In convex body chasing an online algorithm is presented at each time step with a convex body K_t \subset \mathbb{R}^d, and it must choose a point x_t \in K_t. The online algorithm is trying to minimize its total movement, namely \sum_{t \geq 0} \|x_{t+1} - x_{t}\| (all the norms here will be Euclidean, for simplicity). To evaluate the performance of the algorithm, one benchmarks it against the smallest movement one could achieve for this sequence of bodies, if one knew in advance the whole sequence. A good example to have in mind is if the sequence K_t corresponds to lines rotating around some fixed point, then the best thing to do is to move to this fixed point and never move again, while a greedy algorithm might have infinite movement (in the continuous time limit). The competitive ratio is the worst case (over all possible sequences of convex bodies) ratio of the algorithm’s performance to those of the oracle optimum. This problem was introduced in 1993 by Linial and Friedman, mainly motivated by considering a more geometric version of the k-server problem.

As it turns out, this problem in fact has a lot of “real” applications. It is not too surprising given how elementary the problem is. Just to give a flavor, Adam Wierman and friends show that dynamic powering of data centers can be viewed as an instance of convex *function* chasing. In convex function chasing, instead of a convex body one gets a convex function f_t, and there is then a movement cost \|x_t - x_{t-1}\| and a *service cost* f_t(x_t). While this is more general than convex body chasing, it turns out that convex body chasing in dimension d+1 is enough to solve convex function chasing in dimension d (this is left to the reader as an exercise). Roughly speaking in dynamic power management, one controls various power levels (number of servers online, etc), and a request correspond to a new set of jobs. Turning on/off servers has an associated cost (the movement cost), while servicing the jobs with a certain number of servers has an associated delay (the service cost). You get the idea.


The Steiner point

In a paper to appear at SODA too, we propose (with Yin Tat Lee, Yuanzhi Li, Bo’az Klartag, and Mark Sellke) to use the Steiner point for the nested convex body chasing problem. In nested convex body chasing the sequence of bodies is nested, and to account for irrelevant additive constant we assume that the oracle optimum starts from the worst point in the starting set K_0. In other words, opt is starting at the point the furthest away in K_0 from the closest point in K_T, so its movement is exactly equal to the Hausdorff distance between K_0 and K_T. So all we have to do is to give an online algorithm whose movement is proportional to this Hausdorff distance. Some kind of Lipschitzness in Hausdorff distance. Well, it turns out that mathematicians have been looking at this for centuries already, and Steiner back in the 1800’s introduced just what we need: a selector (a map from convex sets to points in them, i.e. S(K) \in K for any convex set K) which is Lipschitz with respect to the Hausdorff distance, that is:

    \[ \|S(K) - S(K')\| \leq L \cdot \mathrm{dist}_H(K,K') \,. \]

Note that obtaining such a selector is not trivial, for example the first natural guess would be the center of gravity, but it turns out that this is not Lipschitz (consider for example what happens with K being a very thin triangle and K' being the base of this triangle). Crazy thing is, this Steiner point selector (to be defined shortly) is not only Lipschitz, it is in fact the most Lipschitz selector for convex sets. How would you prove such a thing? Well, you clearly need a miracle to happen, and the miracle here is that the Steiner point satisfies some symmetries, which define it uniquely. From there all you need to do is that starting with any selector, you can add some of these symmetries while also improving the Lipschitz constant (for more on this see the references in the linked paper above).

OK, so what is the Steiner point? It has many definitions, but a particular appealing one from an optimizer’s viewpoint is as follows. For a convex set K and a vector \theta \in \mathbb{R}^d let g_K(\theta) be the maximizer of the linear function \theta in the set K. Then the Steiner point of K, \mathrm{St}(K) is defined by:

    \[ \mathrm{St}(K) = \mathbb{E}_{\theta : \|\theta\| \leq 1} [g_K(\theta)] \,. \]

In words: for any direction take the furthest point in K in that direction, and average over all directions. This feels like a really nice object, and clearly it satisfies \mathrm{St}(K) \in K (i.e., it is a selector). The algorithm we proposed in our SODA paper is simply to move to x_t = \mathrm{St}(K_t). Let us now see how to analyze this algorithm.


The nested case proof

First we give an alternative formula for the Steiner point. Denote h_K(\theta) = \max_{x \in K} \theta \cdot x, and observe that g_K(\theta) = \nabla h_K(\theta) for almost all \theta. Thus using the divergence theorem one obtains:

    \[ \mathrm{St}(K) = d \cdot \mathbb{E}_{\theta : \|\theta\| = 1} [\theta h_K(\theta)] \,. \]

(The factor d comes from the ratio of the volume of the ball to the sphere, and the \theta in the expectation is because the normal to \theta is \theta on the sphere.)

Now we have the following one-line inequality to control the movement of the Steiner point:

    \begin{align*} \|\mathrm{St}(K) - \mathrm{St}(K')\| & = d \cdot \| \mathbb{E}_{\theta : \|\theta\| = 1} [\theta (h_K(\theta) - h_{K'}(\theta))] \| \\ & \leq d \cdot \mathbb{E}_{\theta : \|\theta\| = 1} [ |h_K(\theta) - h_{K'}(\theta)| ] \end{align*}

It only remains to observe that |h_K(\theta) - h_{K'}(\theta)| \leq \mathrm{dist}_H(K,K') to obtain a proof that Steiner is d-Lipschitz (in fact as an exercise you can try to improve the above argument to obtain O(\sqrt{d})-Lipschitz). Now for nested convex body chasing we will have K' \subset K so that

|h_K(\theta) - h_{K'}(\theta)| = h_K(\theta) - h_{K'}(\theta) (i.e., no absolute values), and thus the upper bound on the movement will telescope! More precisely:

    \begin{align*} \sum_{t \geq 0} \|\mathrm{St}(K_t) - \mathrm{St}(K_{t+1})\| & \leq d \cdot \sum_{t \geq 0} \mathbb{E}_{\theta : \|\theta\| = 1} [ h_{K_t}(\theta) - h_{K_{t+1}}(\theta) ] \\ & = d \cdot \mathbb{E}_{\theta : \|\theta\| = 1} [ h_{K_0}(\theta) - h_{K_{T}}(\theta) ] \\ & \leq d \cdot \mathrm{dist}_H(K_0, K_T) \,. \end{align*}

This proves that the Steiner point is d-competitive for nested convex body chasing. How to generalize this to the non-nested case seemed difficult, and the breakthrough of Mark and the CMU team is to bring back an old friend of the online algorithm community: the work function.


The work function

The work function W_t : \mathbb{R}^d \rightarrow \mathbb{R} is defined as follows. For any point x \in \mathbb{R}^d it is the smallest cost one can pay to satisfy all the requests K_0, \hdots, K_t and end up at x. First observe that this function is convex. Indeed take two point x and y, and z the middle point between x and y (any point on the segment between x and y would be treated similarly). In the definition of the work function there is an associated trajectory for both x and y. By convexity of the requests, the sequence of mid points between those two trajectories is a valid trajectory for z! And moreover the movement of this mid trajectory is (by triangle inequality) less than the average of the movement of the trajectory for x and y. Hence W_t((x+y)/2) \leq (W_t(x) + W_t(y))/2.

A very simple property we will need from the work function is that the norm of its gradient carries some information on the current request. Namely, x \not\in K_t \Rightarrow \|\nabla W_t(x)\| = 1 (indeed, if x is not in the current request set, then the best way to end up there is to move to a point z in K_t and then move to x, and if K_t is a polytope then when you move a little bit x you don’t move z, so the cost is changing at rate 1, hence the norm of the gradient being 1). Or to put it differently: \|\nabla W_t(x)\| < 1 \Rightarrow x \in K_t.


The Sellke point

Mark’s beautiful idea (and the CMU team very much related –in fact equivalent– idea) to generalize the Steiner point to the non-nested case is to use the work function as a surrogate for the request. Steiner of K_t will clearly not work since all of K_t does not matter in the same way, as some points might be essentially irrelevant because they are very far from the previous requests, a fact which will be uncovered by the work function (while on the other hand the random direction from the Steiner point definition is oblivious to the geometry of the previous requests). So how to pick an appropriately random point in K_t while respecting the work function structure? Well we just saw that \|\nabla W_t(x)\| < 1 \Rightarrow x \in K_t. So how about taking a random direction \theta, and applying the inverse of the gradient map, namely the gradient of the Fenchel dual W_t^*? Recall that the Fenchel dual is defined by:

    \[ W_t^*(\theta) = \max_{x \in \mathbb{R}^d} \theta \cdot x - W_t(x) \,. \]

This is exactly the algorithm Mark proposes and it goes like this (Mark calls it the *functional Steiner point*):

    \[ \mathrm{Se}(K_t) = \mathbb{E}_{\theta : \|\theta\| < 1} [\nabla W_t^*(\theta)] \,. \]

Crucially this is a valid point, namely \mathrm{Se}(K_t) \in K_t. Moreover just like for the Steiner point we can apply the divergence theorem and obtain:

    \[ \mathrm{Se}(K_t) = d \cdot \mathbb{E}_{\theta : \|\theta\| = 1} [\theta W_t^*(\theta)] \,. \]

The beautiful thing is that we can now exactly repeat the nested convex body argument, since W_{t+1}^*(\theta) \leq W_t^*(\theta) for all \theta (just like we had in the nested case h_{K_{t+1}}(\theta) \leq h_{K_t}(\theta)) and so we get:

    \[ \sum_{t \geq 0} \|\mathrm{Se}(K_t) - \mathrm{Se}(K_{t+1})\| \leq d \cdot \mathbb{E}_{\theta : \|\theta\| = 1} [ W_{0}^*(\theta) - W^*_{{T}}(\theta) ] \]

The first term with W_0^* is just some additive constant which we can ignore, while the second term is bounded as follows:

    \begin{align*} \mathbb{E}_{\theta : \|\theta\| = 1} [ - W^*_{{T}}(\theta) ] & = \mathbb{E}_{\theta : \|\theta\| = 1} [ \min_{x \in \mathbb{R}^d} W_t(x) - \theta \cdot x ] \\ & \leq \min_{x \in \mathbb{R}^d} \mathbb{E}_{\theta : \|\theta\| = 1}[ W_t(x) - \theta \cdot x ] \\ & = \min_{x \in \mathbb{R}^d} W_t(x) \,. \end{align*}

Thus we exactly proved that the movement of the Sellke point is upped bounded by a constant plus d times the minimum of the work function, and the latter is nothing but the value of the oracle optimum!

Note that once everything is done and said, the proof has only two inequalities (triangle inequality as in the nested case, and the minimum of the expectation is less than the expectation of the minimum). It doesn’t get any better than this!

This entry was posted in Theoretical Computer Science. Bookmark the permalink.

32 Responses to "Convex body chasing, Steiner point, Sellke point, and SODA 2020 best papers"