This week at the Simons Institute we had the first Big Data workshop on Succinct Data Representations and Applications. Here I would like to briefly talk about one of the ‘stars’ of this workshop: the squared-length sampling technique. I will illustrate this method on three examples (taken from three seminal papers).
Fast low rank approximation
Frieze, Kannan and Vempala proposed the following algorithm to compute an approximate low rank decomposition of a matrix (this specific version is taken from Chapter 6 here). We denote by the columns of .
Let be i.i.d. random variables taking values in whose distribution is proportional to the squared-length of the columns, more precisely the probability that is equal to is proportional to . Let be such that its column is . Next compute an SVD of and let be the top left singular vectors of . The low rank approximation to is finally given by
One can prove that for this approximation satisfies:
where is the best rank approximation to .
The amazing feature of this algorithm is that the complexity of computing the projection matrix is independent of (one needs to be careful with how the sampling of is done but this can be taken care of). In fact it is even possible to obtain an algorithm whose complexity is independent of both and and depends only polynomially in and (though the version described above might be better in practice because the complexity is simply ).
Fast graph sparsification
Spielman and Teng looked at the following graph sparsification problem: given a weighted graph , find a ‘sparse’ weighted graph such that
where and are the graph Laplacians of and . The idea is that if one is able to find ‘fast enough’ a sparse approximation to then one can solve many problems on (which is dense) by instead solving them much faster on (think of solving linear systems of the form for example).
Spielman and Srivastava proposed a method to carry this graph sparsification task using the squared-length sampling technique. First they reduce the problem to the following: given vectors that forms a decomposition of the identity, that is
find a small subset of them such that appropriately reweighted they form an approximate decomposition of the identity. Rudelson showed that one can solve this problem by simply sampling vectors from i.i.d. proportionally to their squared-length and reweight them by the square root of the inverse probability of selecting this vector. In other words, he showed that if is an i.i.d. sequence in such that is proportional to then with high probability one has
Approximation guarantee for k-means
The last example is the now famous k-means++ algorithm of Arthur and Vassilvitskii. Given one would like to find minimizing the following quantity:
The following strategy gives a randomized -approximation algorithm to this NP-hard problem. First select at random from . Then select iteratively the new centers at random, by sampling a point at random from proportionally to their distance squared to the current nearest center.