This week at the Simons Institute we had the first Big Data workshop on Succinct Data Representations and Applications. Here I would like to briefly talk about one of the ‘stars’ of this workshop: the squared-length sampling technique. I will illustrate this method on three examples (taken from three seminal papers).
Fast low rank approximation
Frieze, Kannan and Vempala proposed the following algorithm to compute an approximate low rank decomposition of a matrix $A \in \mathbb{R}^{m \times n}$ (this specific version is taken from Chapter 6 here). We denote by $a_1, \dots, a_n$ the columns of $A$.

Let $i_1, \dots, i_s$ be i.i.d. random variables taking values in $\{1, \dots, n\}$ whose distribution is proportional to the squared-length of the columns, more precisely the probability that $i_1$ is equal to $j$ is proportional to $\|a_j\|^2$. Let $B \in \mathbb{R}^{m \times s}$ be such that its $t^{\mathrm{th}}$ column is $\frac{\|A\|_F}{\sqrt{s}\, \|a_{i_t}\|} a_{i_t}$. Next compute an SVD of $B$ and let $u_1, \dots, u_k$ be the $k$ top left singular vectors of $B$. The low rank approximation to $A$ is finally given by

$$\tilde{A} = \sum_{t=1}^{k} u_t u_t^{\top} A .$$
One can prove that for $s \geq k / \varepsilon$ this approximation satisfies:

$$\mathbb{E}\, \|A - \tilde{A}\|_F^2 \leq \|A - A_k\|_F^2 + \varepsilon \|A\|_F^2 ,$$

where $A_k$ is the best rank $k$ approximation to $A$.
The amazing feature of this algorithm is that the complexity of computing the projection matrix $\sum_{t=1}^{k} u_t u_t^{\top}$ is independent of $n$ (one needs to be careful with how the sampling of $i_1, \dots, i_s$ is done, but this can be taken care of). In fact it is even possible to obtain an algorithm whose complexity is independent of both $m$ and $n$ and depends only polynomially on $k$ and $1/\varepsilon$ (though the version described above might be better in practice because, once the columns are sampled, the complexity is simply that of computing an SVD of the $m \times s$ matrix $B$, which takes $O(m s^2)$ time).
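To make the sampling step concrete, here is a minimal numpy sketch of the procedure described above (function and variable names are my own, and the dense SVD is just the simplest choice, not a claim about how the original algorithm is implemented):

```python
import numpy as np

def low_rank_by_column_sampling(A, k, s, rng=np.random.default_rng()):
    """Approximate rank-k decomposition of A via squared-length column sampling."""
    n = A.shape[1]
    col_sq_norms = np.sum(A ** 2, axis=0)        # ||a_j||^2 for each column
    probs = col_sq_norms / col_sq_norms.sum()    # squared-length sampling distribution
    idx = rng.choice(n, size=s, p=probs)         # i_1, ..., i_s drawn i.i.d.
    # Rescale column a_{i_t} by 1 / sqrt(s * p_{i_t}) so that E[B B^T] = A A^T.
    B = A[:, idx] / np.sqrt(s * probs[idx])
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    U_k = U[:, :k]                               # top-k left singular vectors of B
    return U_k @ (U_k.T @ A)                     # projection of A onto span(u_1, ..., u_k)
```

Note that the SVD is computed on the small $m \times s$ matrix $B$, which is exactly where the independence from $n$ comes from (apart from the sampling pass itself).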
Fast graph sparsification
Spielman and Teng looked at the following graph sparsification problem: given a weighted graph $G$, find a ‘sparse’ weighted graph $H$ such that

$$(1-\varepsilon) L_G \preceq L_H \preceq (1+\varepsilon) L_G ,$$

where $L_G$ and $L_H$ are the graph Laplacians of $G$ and $H$ (here $\preceq$ is the positive semidefinite order, so the condition says that $x^{\top} L_H x$ is within a factor $1 \pm \varepsilon$ of $x^{\top} L_G x$ for every $x$). The idea is that if one is able to find ‘fast enough’ a sparse approximation $H$ to $G$, then one can solve many problems on $G$ (which is dense) by instead solving them much faster on $H$ (think of solving linear systems of the form $L_G x = b$ for example).
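Recall that the Laplacian of a weighted graph can be written as a sum of weighted edge outer products, $L_G = \sum_{(u,v)} w_{uv} (e_u - e_v)(e_u - e_v)^{\top}$; this is the form that the reduction below exploits. A tiny sketch (the edge-list format is my own choice):

```python
import numpy as np

def laplacian(n, weighted_edges):
    """Graph Laplacian L = sum over edges (u, v, w) of w * (e_u - e_v)(e_u - e_v)^T."""
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

# Example: a weighted triangle on 3 vertices.
L = laplacian(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 0.5)])
```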
Spielman and Srivastava proposed a method to carry out this graph sparsification task using the squared-length sampling technique. First they reduce the problem to the following: given vectors $v_1, \dots, v_m \in \mathbb{R}^n$ that form a decomposition of the identity, that is

$$\sum_{j=1}^{m} v_j v_j^{\top} = I_n ,$$

find a small subset of them such that, appropriately reweighted, they form an approximate decomposition of the identity. Rudelson showed that one can solve this problem by simply sampling $s = O(n \log(n) / \varepsilon^2)$ vectors from $v_1, \dots, v_m$ i.i.d. proportionally to their squared-length and reweighting them by the square root of the inverse probability of selecting this vector. In other words, he showed that if $i_1, \dots, i_s$ is an i.i.d. sequence in $\{1, \dots, m\}$ such that $\mathbb{P}(i_1 = j)$ is proportional to $\|v_j\|^2$, then with high probability one has

$$(1-\varepsilon) I_n \preceq \frac{1}{s} \sum_{t=1}^{s} \frac{v_{i_t} v_{i_t}^{\top}}{\mathbb{P}(i_1 = i_t)} \preceq (1+\varepsilon) I_n .$$
Batson, Spielman and Srivastava also showed how to find such an approximate decomposition with a deterministic algorithm; see also this very nice survey by Naor.
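As a quick illustration of Rudelson's sampling result (not of the full Spielman–Srivastava sparsification, which additionally needs the reduction from graphs to such vectors), here is a small numpy sketch on a synthetic decomposition of the identity; all names are mine:

```python
import numpy as np

def subsample_identity_decomposition(V, s, rng=np.random.default_rng()):
    """Rows v_1, ..., v_m of V satisfy sum_j v_j v_j^T = I_n; return s reweighted
    rows whose outer products still sum approximately to I_n."""
    m = V.shape[0]
    sq_norms = np.sum(V ** 2, axis=1)      # ||v_j||^2 (these sum to tr(I_n) = n)
    probs = sq_norms / sq_norms.sum()      # squared-length sampling distribution
    idx = rng.choice(m, size=s, p=probs)   # i_1, ..., i_s drawn i.i.d.
    # Reweight v_{i_t} by sqrt(1 / (s * p_{i_t})) so that E[W^T W] = I_n.
    return V[idx] / np.sqrt(s * probs[idx])[:, None]

# Sanity check on a random decomposition of the identity in dimension 20.
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((5000, 20)))   # rows of V: sum_j v_j v_j^T = I_20
W = subsample_identity_decomposition(V, s=1000, rng=rng)
print(np.linalg.norm(W.T @ W - np.eye(20), 2))          # small spectral error
```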
Approximation guarantee for k-means
The last example is the now famous k-means++ algorithm of Arthur and Vassilvitskii. Given $x_1, \dots, x_n \in \mathbb{R}^d$ one would like to find $k$ centers $c_1, \dots, c_k$ minimizing the following quantity:

$$\sum_{i=1}^{n} \min_{1 \leq j \leq k} \|x_i - c_j\|^2 .$$

The following strategy gives a randomized $O(\log k)$-approximation algorithm to this NP-hard problem. First select $c_1$ uniformly at random from $x_1, \dots, x_n$. Then select the new centers iteratively, each time sampling a point from $x_1, \dots, x_n$ with probability proportional to its squared distance to the current nearest center.
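A minimal sketch of this seeding procedure in numpy (names are my own; a full k-means++ run would follow the seeding with the usual Lloyd iterations):

```python
import numpy as np

def kmeans_pp_seeding(X, k, rng=np.random.default_rng()):
    """Pick k initial centers from the rows of X by squared-distance sampling."""
    n = len(X)
    centers = [X[rng.integers(n)]]                       # first center: uniform at random
    d2 = np.sum((X - centers[0]) ** 2, axis=1)           # squared distance to nearest center
    for _ in range(1, k):
        new_center = X[rng.choice(n, p=d2 / d2.sum())]   # squared-distance sampling
        centers.append(new_center)
        d2 = np.minimum(d2, np.sum((X - new_center) ** 2, axis=1))
    return np.array(centers)
```

This is once again squared-length sampling, applied to the vectors joining each point to its current nearest center.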
By Jeremie September 23, 2013 - 7:53 am
Why is there a weight on the selected columns of A in the matrix B?
How can one draw columns with probability proportional to their Frobenius norm without having computed the norms? I guess you have to introduce an assumption on the distribution of the norms of the columns.
Is it possible to just use the s columns with the highest Frobenius norm?
By Sebastien Bubeck September 23, 2013 - 11:20 am
Hey Jeremie,
thanks for your questions! There are weights in the matrix B in order to ensure unbiasedness. Also, the randomization is absolutely key: selecting the columns with the highest Frobenius norm would not work. As far as the precise sampling algorithm goes, you should check the linked papers; it is not difficult but not trivial either.