First Big Data workshop at the Simons Institute

This week at the Simons Institute we had the first Big Data workshop on Succinct Data Representations and Applications. Here I would like to briefly talk about one of the ‘stars’ of this workshop: the squared-length sampling technique. I will illustrate this method on three examples (taken from three seminal papers).


Fast low rank approximation

Frieze, Kannan and Vempala proposed the following algorithm to compute an approximate low rank decomposition of a matrix A \in \mathbb{R}^{m \times n} (this specific version is taken from Chapter 6 here). We denote by A_1, \hdots, A_n the columns of A.

Let j_1, \hdots, j_s be s i.i.d. random variables taking values in [n], distributed proportionally to the squared length of the columns; more precisely, p_i := \mathbb{P}(j_1 = i) = \|A_i\|_2^2 / \|A\|_F^2 for i \in [n]. Let B \in \mathbb{R}^{m \times s} be the matrix whose i^{th} column is \frac{1}{\sqrt{s p_{j_i}}} A_{j_i}. Next compute an SVD of B and let u_1, \hdots, u_k be the top k left singular vectors of B. The low rank approximation to A is finally given by

    \[\tilde{A} = \sum_{i=1}^k u_i u_i^{\top} A .\]

One can prove that for s = 4 k / \epsilon^2 this approximation satisfies:

    \[\mathbb{E} \|A - \tilde{A}\|_F^2 \leq \|A - A_k\|^2_F + \epsilon \|A\|_F^2 ,\]

where A_k is the best rank k approximation to A.

The amazing feature of this algorithm is that the complexity of computing the projection matrix \sum_{i=1}^k u_i u_i^{\top} is independent of n (one needs to be careful with how the sampling of j_1, \hdots, j_s is done, but this can be taken care of). In fact it is even possible to obtain an algorithm whose complexity is independent of both n and m and depends only polynomially on k and 1/\epsilon (though the version described above might be better in practice because its complexity is simply of order m k^2 /\epsilon^4).
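The sampling-and-projection scheme above can be sketched in a few lines of NumPy. This is only an illustration of the technique, not the authors' implementation; the function name and signature are my own.

```python
import numpy as np

def fkv_low_rank(A, k, s, rng=None):
    """Sketch of squared-length column sampling for low rank approximation:
    sample s columns of A with probability proportional to their squared
    length, rescale them into a sketch B, and project A onto the span of
    the top k left singular vectors of B."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Squared-length sampling distribution: p_i = ||A_i||^2 / ||A||_F^2.
    p = np.sum(A**2, axis=0) / np.sum(A**2)
    idx = rng.choice(n, size=s, p=p)
    # Sketch B: column t is A[:, j_t] / sqrt(s * p_{j_t}).
    B = A[:, idx] / np.sqrt(s * p[idx])
    # Top-k left singular vectors of B (reduced SVD).
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    U_k = U[:, :k]
    # Low rank approximation: orthogonal projection of A onto span(U_k).
    return U_k @ (U_k.T @ A)
```

Note that the SVD is computed on the m-by-s sketch B rather than on A itself, which is where the independence of n comes from.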


Fast graph sparsification

Spielman and Teng looked at the following graph sparsification problem: given a weighted graph G, find a ‘sparse’ weighted graph H such that

    \[(1-\epsilon) L_G \preceq L_H \preceq (1+\epsilon) L_G ,\]

where L_H and L_G are the graph Laplacians of H and G. The idea is that if one is able to find ‘fast enough’ a sparse approximation H to G then one can solve many problems on G (which is dense) by instead solving them much faster on H (think of solving linear systems of the form L_G x = b for example).

Spielman and Srivastava proposed a method to carry out this graph sparsification task using the squared-length sampling technique. First they reduce the problem to the following: given vectors v_1, \hdots, v_N that form a decomposition of the identity, that is

    \[\sum_{i=1}^N v_i v_i^{\top} = I_n ,\]

find a small subset of them such that, appropriately reweighted, they form an approximate decomposition of the identity. Rudelson showed that one can solve this problem by simply sampling s = O \left(n \log(n) / \epsilon^2 \right) vectors from \{v_1, \hdots, v_N\} i.i.d. proportionally to their squared length, reweighting each sampled vector v_{j_t} by \frac{1}{\sqrt{s p_{j_t}}}. In other words, he showed that if j_1, \hdots, j_s is an i.i.d. sequence in [N] such that p_i = \mathbb{P}(j_1 = i) is proportional to \|v_i\|_2^2, then with high probability one has

    \[(1-\epsilon) I_n \preceq \frac{1}{s} \sum_{t=1}^s \frac{1}{p_{j_t}} v_{j_t} v_{j_t}^{\top} \preceq (1+\epsilon) I_n.\]

Batson, Spielman and Srivastava also showed how to find such a decomposition with a deterministic algorithm; see also this very nice survey by Naor.
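Rudelson's sampling step can be illustrated with a short NumPy sketch. The construction of the input (rows of a matrix with orthonormal columns, obtained via a QR factorization) and the function name are my own choices for the sake of a self-contained example.

```python
import numpy as np

def sample_identity_decomposition(V, s, rng=None):
    """Sketch of squared-length sampling for identity decompositions:
    given the rows v_1, ..., v_N of V (shape N x n) with
    sum_i v_i v_i^T = I_n, sample s rows i.i.d. with probability
    p_i proportional to ||v_i||^2 and return the reweighted sum
    (1/s) sum_t v_{j_t} v_{j_t}^T / p_{j_t}, which approximates I_n."""
    rng = np.random.default_rng(rng)
    N, n = V.shape
    # Squared-length sampling distribution over the rows.
    p = np.sum(V**2, axis=1) / np.sum(V**2)
    idx = rng.choice(N, size=s, p=p)
    # Reweight each sampled row by 1 / sqrt(s * p_{j_t}).
    W = V[idx] / np.sqrt(s * p[idx, None])
    # W.T @ W equals (1/s) sum_t v_{j_t} v_{j_t}^T / p_{j_t}.
    return W.T @ W
```

Checking the eigenvalues of the returned matrix against 1 \pm \epsilon verifies the two-sided spectral bound above.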


Approximation guarantee for k-means

The last example is the now famous k-means++ algorithm of Arthur and Vassilvitskii. Given x_1, \hdots, x_N \in \mathbb{R}^n one would like to find c_1, \hdots, c_k \in \mathbb{R}^n minimizing the following quantity:

    \[\sum_{t=1}^N \min_{1 \leq i \leq k} \|x_t - c_i\|_2^2 .\]

The following strategy gives a randomized O(\log(k))-approximation algorithm for this NP-hard problem. First select c_1 uniformly at random from x_1, \hdots, x_N. Then select the new centers iteratively: at each step, sample a point from x_1, \hdots, x_N with probability proportional to its squared distance to the nearest already-chosen center.
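This seeding procedure (often called D^2 sampling) is easy to sketch in NumPy; the function name and the toy data in the usage below are mine, not from the paper.

```python
import numpy as np

def kmeans_pp_seeding(X, k, rng=None):
    """Sketch of the k-means++ seeding of Arthur and Vassilvitskii:
    pick the first center uniformly at random, then pick each new
    center with probability proportional to the squared distance
    to its nearest already-chosen center."""
    rng = np.random.default_rng(rng)
    N = X.shape[0]
    centers = [X[rng.integers(N)]]  # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest current center.
        d2 = np.min([np.sum((X - c)**2, axis=1) for c in centers], axis=0)
        # Sample the next center proportionally to these squared distances.
        centers.append(X[rng.choice(N, p=d2 / d2.sum())])
    return np.array(centers)
```

The returned centers are typically fed to Lloyd's algorithm as its initialization; the O(\log k) guarantee already holds for the seeding alone.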
