This week at the Simons Institute we had the first Big Data workshop on Succinct Data Representations and Applications. Here I would like to briefly talk about one of the ‘stars’ of this workshop: the squared-length sampling technique. I will illustrate this method on three examples (taken from three seminal papers).

**Fast low rank approximation**

Frieze, Kannan and Vempala proposed the following algorithm to compute an approximate low-rank decomposition of a matrix $A \in \mathbb{R}^{m \times n}$ (this specific version is taken from Chapter 6 here). We denote by $A_1, \dots, A_n \in \mathbb{R}^m$ the columns of $A$.

Let $i_1, \dots, i_s$ be i.i.d. random variables taking values in $\{1, \dots, n\}$ whose distribution is proportional to the *squared-length of the columns*, more precisely the probability that $i_t$ is equal to $j$ is proportional to $\|A_j\|^2$, that is $p_j = \|A_j\|^2 / \|A\|_F^2$. Let $B \in \mathbb{R}^{m \times s}$ be such that its $t^{\text{th}}$ column is $A_{i_t} / \sqrt{s \, p_{i_t}}$. Next compute an SVD of $B$ and let $u_1, \dots, u_k$ be the top $k$ left singular vectors of $B$. The low rank approximation to $A$ is finally given by

$$\tilde{A} = \sum_{t=1}^{k} u_t u_t^{\top} A .$$

One can prove that for $s \geq k / \varepsilon$ this approximation satisfies:

$$\mathbb{E} \, \|A - \tilde{A}\|_F^2 \leq \|A - A_k\|_F^2 + \varepsilon \, \|A\|_F^2 ,$$

where $A_k$ is the best rank-$k$ approximation to $A$ (obtained by truncating the SVD of $A$).

The amazing feature of this algorithm is that the complexity of computing the projection matrix $\sum_{t=1}^{k} u_t u_t^{\top}$ is *independent* of $n$ (one needs to be careful with how the sampling of $i_1, \dots, i_s$ is done, but this can be taken care of). In fact it is even possible to obtain an algorithm whose complexity is independent of both $m$ and $n$ and depends only polynomially on $k$ and $1/\varepsilon$ (though the version described above might be better in practice because the complexity is simply $O(m s^2)$).
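The algorithm above can be sketched in a few lines of numpy. This is only an illustrative sketch (the function name and test matrix are my own, and a practical implementation would sample the columns without first scanning all of $A$):

```python
import numpy as np

def fkv_low_rank(A, k, s, seed=None):
    """Squared-length sampling low-rank approximation.

    Samples s columns of A with probability proportional to their squared
    length, rescales them to keep the estimator unbiased, and projects A
    onto the top-k left singular vectors of the sampled matrix.
    """
    rng = np.random.default_rng(seed)
    # p_j = ||A_j||^2 / ||A||_F^2
    col_norms_sq = np.sum(A ** 2, axis=0)
    p = col_norms_sq / col_norms_sq.sum()
    idx = rng.choice(A.shape[1], size=s, p=p)
    # rescale each sampled column by 1 / sqrt(s * p_i)
    B = A[:, idx] / np.sqrt(s * p[idx])
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    Uk = U[:, :k]               # top-k left singular vectors of B
    return Uk @ (Uk.T @ A)      # rank-k approximation of A

# usage: a random 200 x 1000 matrix of rank 50
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 1000))
A_tilde = fkv_low_rank(A, k=10, s=200, seed=1)
```

Note that the SVD is computed on the $m \times s$ matrix $B$, not on $A$ itself, which is where the speedup comes from.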

**Fast graph sparsification**

Spielman and Teng looked at the following graph sparsification problem: given a weighted graph $G$, find a ‘sparse’ weighted graph $H$ such that

$$(1 - \varepsilon) \, L_G \preceq L_H \preceq (1 + \varepsilon) \, L_G ,$$

where $L_G$ and $L_H$ are the graph Laplacians of $G$ and $H$. The idea is that if one is able to find ‘fast enough’ a sparse approximation $H$ to $G$, then one can solve many problems on $G$ (which is dense) by instead solving them much faster on $H$ (think of solving linear systems of the form $L_G x = b$, for example).

Spielman and Srivastava proposed a method to carry out this graph sparsification task using the squared-length sampling technique. First they reduce the problem to the following: given vectors $v_1, \dots, v_n \in \mathbb{R}^m$ that form a decomposition of the identity, that is

$$\sum_{i=1}^{n} v_i v_i^{\top} = \mathrm{I}_m ,$$

find a small subset of them such that, appropriately reweighted, they form an approximate decomposition of the identity. Rudelson showed that one can solve this problem by simply sampling $s = O(\varepsilon^{-2} m \log m)$ vectors from $v_1, \dots, v_n$ i.i.d. proportionally to their squared-length and reweighting them by the square root of the inverse probability of selecting this vector. In other words, he showed that if $i_1, \dots, i_s$ is an i.i.d. sequence in $\{1, \dots, n\}$ such that $\mathbb{P}(i_t = j)$ is proportional to $\|v_j\|^2$ (note that the squared-lengths sum to $m$ here, so $p_j = \|v_j\|^2 / m$), then with high probability one has

$$\left\| \frac{1}{s} \sum_{t=1}^{s} \frac{1}{p_{i_t}} \, v_{i_t} v_{i_t}^{\top} - \mathrm{I}_m \right\| \leq \varepsilon .$$

Batson, Spielman and Srivastava also showed how to find this decomposition with a deterministic algorithm, see also this very nice survey of Naor.
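One can check this sampling scheme numerically. In the sketch below (the construction of the $v_i$ via a QR factorization and the specific dimensions are my own illustrative choices), the rows of any $n \times m$ matrix with orthonormal columns give a decomposition of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 2000, 10, 1500

# Build vectors v_1, ..., v_n in R^m with sum_i v_i v_i^T = I_m:
# the rows of an n x m matrix with orthonormal columns do the job.
V, _ = np.linalg.qr(rng.standard_normal((n, m)))

# Squared-length sampling: p_i proportional to ||v_i||^2
# (here the squared norms sum to m, so p_i = ||v_i||^2 / m).
p = np.sum(V ** 2, axis=1)
p = p / p.sum()
idx = rng.choice(n, size=s, p=p)

# Reweighted empirical sum (1/s) * sum_t v_{i_t} v_{i_t}^T / p_{i_t},
# written as W^T W with each sampled row scaled by 1/sqrt(s * p).
W = V[idx] / np.sqrt(s * p[idx, None])
approx = W.T @ W               # should be close to I_m

# deviation from the identity in operator norm
err = np.linalg.norm(approx - np.eye(m), ord=2)
```

With $s$ much larger than $m \log m$, the operator-norm error `err` is small, matching Rudelson's guarantee.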

**Approximation guarantee for k-means**

The last example is the now famous k-means++ algorithm of Arthur and Vassilvitskii. Given $x_1, \dots, x_n \in \mathbb{R}^d$ one would like to find $k$ centers $c_1, \dots, c_k \in \mathbb{R}^d$ minimizing the following quantity:

$$\sum_{i=1}^{n} \min_{1 \leq j \leq k} \|x_i - c_j\|^2 .$$

The following strategy gives a randomized $O(\log k)$-approximation algorithm (in expectation) to this NP-hard problem. First select $c_1$ uniformly at random from $\{x_1, \dots, x_n\}$. Then select the new centers $c_2, \dots, c_k$ iteratively, each time sampling a point from $\{x_1, \dots, x_n\}$ with probability proportional to its squared distance to the current nearest center.
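This seeding step can be sketched as follows (only the seeding is shown; the usual Lloyd iterations would run afterwards, and the function name and test data are illustrative):

```python
import numpy as np

def kmeans_pp_seeding(X, k, seed=None):
    """k-means++ seeding: pick the first center uniformly at random,
    then each new center with probability proportional to the squared
    distance to the nearest center chosen so far."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    # d2[i] = squared distance from X[i] to its nearest current center
    d2 = np.sum((X - centers[0]) ** 2, axis=1)
    for _ in range(k - 1):
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
        d2 = np.minimum(d2, np.sum((X - centers[-1]) ** 2, axis=1))
    return np.array(centers)

# usage: three well-separated Gaussian blobs in the plane
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.1, size=(100, 2))
                    for c in ([0, 0], [5, 5], [10, 0])])
C = kmeans_pp_seeding(X, k=3, seed=1)
```

The squared-distance sampling makes it very likely that each new center lands in a cluster not yet covered, which is exactly where the approximation guarantee comes from.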

## By Jeremie September 23, 2013 - 7:53 am

Why is there a weight over the selected columns of A in the B matrix?

How does one draw columns with probability proportional to their Frobenius norm without having computed the norms? I guess you have to introduce an assumption on the distribution of the norms of the columns.

Is it possible to just use the s columns with the highest Frobenius norm?

## By Sebastien Bubeck September 23, 2013 - 11:20 am

Hey Jeremie,

thanks for your questions! There are weights in the matrix B in order to ensure unbiasedness. Also, the randomization is absolutely key: selecting the columns with the highest Frobenius norm would not work. As far as the precise sampling algorithm goes, you should check the linked papers; it is not difficult, but not trivial either.