k-server, part 3: entropy regularization for weighted k-paging

If you have been following the first two posts (post 1, post 2), now is time to reap the rewards! I will show here how to obtain a O(\log(k))-competitive algorithm for (weighted) paging, i.e., when the metric space corresponds to the leafs of a weighted star. This was viewed as a breakthrough result 10 years ago (with a JACM publication by Bansal, Buchbinder and Naor in 2012), and for good reasons as this simplest instance of k-server was in fact the one studied in the seminal paper by Sleator and Tarjan in 1985 which introduced the competitive analysis of online algorithms (actually to be precise Sleator and Tarjan considered the unweighted case, for which a O(\log(k)) algorithm was known much before).

State space for weighted k-paging

Let w_i be the weight of the edge from leaf i \in [n] to the root. Recall from the previous post that we want to find a norm and a convex body that can express the Wasserstein-1 distance between two fractional k-server configurations.

Consider a fractional move from z \in \Delta_n(k) to z + \xi. Then clearly, the amount of mass |\xi_i| has to transfer through the edge (i, \mathrm{root}), so that the Wasserstein-1 distance is at least \sum_i w_i |\xi_i|. Furthermore there is trivially a transport plan achieving that much total mass transfer. In other words we just proved that in this case the appropriate norm is a weighted \ell_1 norm (namely \|\xi\| := \sum_{i=1}^n w_i |\xi_i|) and one can simply use the basic state space K = \{x \in [0,1]^n : \sum_i x_i = n-k\} (recall from the previous post that we have to work with anticonfiguration, and that the mapping to a configuration is simply given by z=1-x).

Applying the general mirror descent framework

Given a request at location r \in [n] and a current anticonfiguration x_0, our proposed (fractional) algorithm is to run the mirror descent dynamics with a continuous time linear cost c(t) = e_r from x(0)=x_0 (i.e., x'(t) = (\nabla^2 \Phi(x(t)))^{-1} (e_r + \lambda(t)) for some \lambda(t) \in N_K(x(t))) and until the first time t at which x_r(t) = 0 (i.e., one has a server at location r in z(t) := 1-x(t)). By the lemma at the end of the previous post one has (denote r(t) for the request being serviced at time t)

    \[ \int x_{r(t)}(t) dt \leq \mathrm{Lip}_{\|\cdot\|}(\Phi) \times \mathrm{OPT} + O(1) \]

One can think of \int x_{r(t)}(t) dt as a “virtual service cost”. In k-server this quantity has no real meaning, but the above inequality shows that this quantity, which only depends on the algorithm, is tightly related to the value of OPT (provided that \Phi has a small Lipschitz norm). Thus we see that we have two key desiderata for the choice of the mirror map \Phi: (i) it should have a small Lipschitz norm, (ii) one should be able to relate the movement cost \|x'(t)\| = \|(\nabla^2 \Phi(x(t)))^{-1} (e_r + \lambda(t))\| to this “virtual service cost” x_{r}(t), say up to a multiplicative factor \alpha. One would then obtain a \alpha \times \mathrm{Lip}_{\|\cdot\|}(\Phi)-competitive algorithm.

Entropy regularization for weighted k-paging

Let us look at (ii), and we shall see that the entropy regularization comes out very naturally. Ignore for a moment the Lagrange multiplier \lambda(t) and let us search for a separable regularizer of the form \Phi(x) = \sum_i \phi_i(x_i). We want to relate x_r(t) to w_r/\phi_i''(x_r(t)) (= (\nabla^2 \Phi(x(t)))^{-1} e_r). Making those two quantities equal gives \phi_i(x) = w_i x_i \log(x_i) and thus the regularizer is a weighted negentropy: \Phi(x) = \sum_i w_i x_i \log x_i.

We now need to verify that this relation between the virtual service cost and the movement cost remains true even when the Lagrange mutilplier \lambda(t) is taken into account. Note that because of the form of the state space K the multiplier contains a term of the form (\mu(t), \mu(t), \hdots, \mu(t)) (which corresponds to the constraint \sum_{i=1}^n x_i = n-k) and for each location a term forcing the derivative to be 0 if the value of the missing mass has reached 1. In other words we obtain the following dynamics for mirror descent with the weighted negentropy regularization:

    \begin{align*} & x_r'(t) = \frac{x_r(t)}{w_r} (-1 + \mu(t)) \\ & x_i'(t) = \frac{x_i(t)}{w_i} \mu(t) 1\{x_i(t) < 1\} \ \text{for} \ i \neq r \end{align*}

Notice that up to a factor 2 one can focus on controlling \|(x'(t))_-\| (that is only movement into a location is charged). In that view the Lagrange multipliers simply have no effect, since one has \mu(t) \geq 0 (indeed recall that \sum_{i=1}^n x_i'(t) = 0). Thus we see that the movement cost \|(x'(t))_-\| is exactly bounded by the virtual service cost x_r(t) in this case.

Making \Phi Lipschitz

It remains to deal with a non-trivial issue, namely the entropy is not Lipschitz on the simplex! A similar issue is faced in online learning when one tries to prove tracking expert regret bounds, i.e., bounds with respect to a shifting expert. The standard solution (perhaps first used by Herbster and Warmuth in 98, see also Blum and Burch 00) is to shift the variables so that they never get below some \delta>0, in which case the Lipschitz constant would be O(\log(1/\delta)). In the k-server scenario one can stop the dynamics when x_{r(t)}(t)=\delta (instead of x_{r(t)}(t)=0) provided that the mapping from x to the actual fractional configuration z is now given by z=\frac{1-x}{1-\delta}. This raises a final issue: the total amount of server mass in such a z is \frac{k}{1-\delta} > k. Next we show that if \delta is small enough then z can be “rounded” online to a fractional k-server configuration at the expense of a multiplicative O(1) movement. Precisely we show that \delta<1/(2k) is sufficient, which in turns gives a final competitive ratio of O(\log(k)) for weighted k-paging.

From k+\epsilon servers to k servers

Consider a fractional server configuration z with total mass k+\epsilon (i.e., \sum_{i=1}^n z_i = k+\epsilon), which we want to transform into a server configuration \sigma(z) with total mass k. A natural way to “round down” is at each location to put 0 if the mass was \leq \epsilon. Furthermore a mass of 1 should stay 1. This suggests the mapping \sigma(z_i) = \frac{z_i-\epsilon}{1-\epsilon} 1\{z_i > \epsilon\}, which is \frac{1}{1-\epsilon}-Lipschitz. Thus the movement of \sigma(z(t)) is controlled by the movement of z(t) up to a multiplicative factor \frac{1}{1-\epsilon}. Moreover one clearly has \sum_{i=1}^n \sigma(z_i) \leq k (in fact the inequality can be strict, in which case one should think of the “lost mass” as being stored at the root, this incurs no additional movement cost).

This entry was posted in Theoretical Computer Science. Bookmark the permalink.

30 Responses to "k-server, part 3: entropy regularization for weighted k-paging"