Embeddings of finite metric spaces in Hilbert space

In this post we discuss the following notion: let (X,d_X) and (Y,d_Y) be two metric spaces. One says that X embeds into Y with distortion D \geq 1 if there exist \lambda >0 and f : X \rightarrow Y such that for any distinct x,y \in X,

    \[1 \leq \frac{d_Y(f(x), f(y))}{\lambda \ d_X(x,y)} \leq D .\]

We write this as X \overset{D}\hookrightarrow Y.

Note that if Y is a normed vector space (over the reals) then the scaling parameter \lambda in the above definition is unnecessary, since one can absorb it into the embedding itself: indeed, in this case d_Y(\lambda f(x), \lambda f(y)) = \|\lambda f(x) - \lambda f(y)\| = \lambda \, d_Y(f(x), f(y)).

Informally if X embeds into Y with distortion D then up to an ‘error’ of order D one can view X (once mapped with the embedding f) as a subset of Y. This is particularly useful when Y has a specific known structure that we may want to exploit, such as Y = \ell_p.

The case p=2 is of particular interest since then one embeds into a Hilbert space which comes with a lot of ‘structure’. Thus we mainly focus on embeddings in \ell_2, and we also focus on (X,d) being a finite metric space with n = |X|.

First observe that the case p=\infty (embeddings in \ell_{\infty}) is trivial:

Proposition X \overset{1}\hookrightarrow \ell_{\infty}^n

Proof: Consider the embedding f(x) = (d(x,z))_{z \in X} \in \mathbb{R}^n. Taking the coordinate z = y shows that

    \[\|(d(x,z) - d(y,z))_{z \in X}\|_{\infty} \geq d(x,y) ,\]

and the reverse inequality follows from the triangle inequality for d (for every z one has |d(x,z) - d(y,z)| \leq d(x,y)). Thus \|f(x) - f(y)\|_{\infty} = d(x,y), which concludes the proof.

\Box
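The proof can be checked mechanically on a small example. Here is a minimal sketch in Python (the helper names `frechet_embedding` and `linf` are mine), using the four-point tree metric discussed just below:

```python
import itertools

def frechet_embedding(points, d):
    """Map each x to the vector (d(x, z))_{z in points} in R^n."""
    return {x: [d(x, z) for z in points] for x in points}

def linf(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

# Four-point tree metric: root r at distance 1 from three
# leaves x, y, z, which are at distance 2 from each other.
X = ["r", "x", "y", "z"]
def d(a, b):
    if a == b:
        return 0
    return 1 if "r" in (a, b) else 2

f = frechet_embedding(X, d)
# The embedding is an isometry: ||f(a) - f(b)||_inf == d(a, b) for all pairs.
for a, b in itertools.combinations(X, 2):
    assert linf(f[a], f[b]) == d(a, b)
```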

Could it be that the situation is that simple for p=2? It seems very unlikely, because \ell_2 is a much more structured space than an arbitrary metric space. Consider for example X=\{r,x,y,z\} with d(r,x) = d(r,y) = d(r,z) = 1 and d(x,y) = d(x,z) = d(y,z) = 2 (that is, X is a tree with root r and three leaves x,y,z). It is impossible to embed X into \ell_2 with distortion 1: f(x), f(y) and f(z) would lie on a Euclidean sphere of radius 1 centered at f(r) while being at distance 2 from each other, which is impossible for 3 points.
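To spell out the impossibility: normalize so that f(r) = 0. An isometric embedding would require

    \[\|f(x)\| = \|f(y)\| = 1 \quad \text{and} \quad \|f(x) - f(y)\| = 2 = \|f(x)\| + \|f(y)\| ,\]

i.e., equality in the triangle inequality for the Euclidean norm, which forces f(y) = -f(x). The same argument gives f(z) = -f(x), hence f(y) = f(z), contradicting d(y,z) = 2.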

So what can we say in general about embeddings into \ell_2? The following beautiful result gives the answer:

Theorem (Bourgain 1985) X \overset{O(\log n)}\hookrightarrow \ell_{2}

It can be shown that this result cannot be improved in general: expander graphs require distortion \Omega(\log n). However, for specific classes of finite metric spaces one can hope for some improvement, and we refer to the survey by Linial for more details on this.
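The construction behind Bourgain's proof takes as coordinates the distances to random subsets sampled at geometrically decreasing densities. The sketch below is only illustrative: the sampling scheme is simplified and the constants (such as the factor 4 in the number of repetitions) are arbitrary, not those of the original proof.

```python
import itertools
import math
import random

def bourgain_embedding(X, d, seed=0):
    """Sketch of Bourgain's random-subset embedding: coordinates are
    x -> d(x, S) for random sets S, where each point is kept with
    probability 2^{-i} at scale i, with O(log n) repetitions per scale."""
    rng = random.Random(seed)
    n = len(X)
    scales = max(1, int(math.log2(n)))
    reps = 4 * scales  # O(log n) repetitions; the constant 4 is arbitrary
    sets = []
    for i in range(1, scales + 1):
        for _ in range(reps):
            S = [x for x in X if rng.random() < 2.0 ** (-i)]
            if S:
                sets.append(S)
    m = len(sets)
    # Each coordinate x -> d(x, S) is 1-Lipschitz, so after dividing by
    # sqrt(m) the embedding can only contract distances.
    return {x: [min(d(x, s) for s in S) / math.sqrt(m) for S in sets]
            for x in X}

def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy metric space: the n-cycle with its shortest-path distance.
n = 16
X = list(range(n))
def d(a, b):
    return min(abs(a - b), n - abs(a - b))

f = bourgain_embedding(X, d)
ratios = [l2(f[a], f[b]) / d(a, b)
          for a, b in itertools.combinations(X, 2)]
distortion = max(ratios) / min(ratios)
assert max(ratios) <= 1 + 1e-9  # the embedding is non-expansive
print(f"empirical distortion on the 16-cycle: {distortion:.2f}")
```

On such a small space the empirical distortion is of course far from the worst-case \Theta(\log n) behavior; the point is only to make the shape of the construction concrete.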

 

Embeddings in \ell_2 with small distortion

While Bourgain’s theorem is a very strong statement in the sense that it applies to any finite metric space, it is also quite weak in the sense that the distortion is large, of order \log n. For some applications it is desirable to have a bounded distortion that does not grow with the size of X, or even a distortion of order 1+\epsilon with small \epsilon. As we said previously, the only way to obtain such a statement is to make further assumptions on X. We will not take this route in this post, but let me just mention the Johnson-Lindenstrauss lemma, which should be well known to machine learning researchers and the like:

Lemma (Johnson-Lindenstrauss 1984) Let X \subset \ell_2 with |X| = n and let \epsilon \in (0,1). Then X\overset{1+\epsilon}\hookrightarrow \ell_{2}^{O(\log(n) / \epsilon^2)}.
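A quick numerical illustration of the lemma, using a random Gaussian projection (one standard way to realize it; the constant 8 in the target dimension below is illustrative, not optimal):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# n points in a high-dimensional l2 space.
n, dim = 20, 1000
X = rng.standard_normal((n, dim))

# Project with a random Gaussian matrix scaled by 1/sqrt(k),
# where k = O(log(n) / eps^2).
eps = 0.5
k = int(8 * np.log(n) / eps ** 2)
G = rng.standard_normal((dim, k)) / np.sqrt(k)
Y = X @ G

# All pairwise distances are preserved up to a factor 1 + eps
# (with high probability over the draw of G).
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(n), 2)]
assert 1 - eps < min(ratios) and max(ratios) < 1 + eps
```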

Now what can we say in general about embeddings with small distortion into \ell_2? There is a well-known result in convex geometry with the same flavor as this question, namely Dvoretzky’s theorem: any symmetric convex body in \mathbb{R}^n has a section of dimension c(\epsilon) \log n which is Euclidean up to an error of order \epsilon. The analogue in the theory of metric embeddings is called the non-linear Dvoretzky theorem, and it goes as follows:

Theorem (Bourgain–Figiel–Milman 1986) Let \epsilon > 0. There exists Y \subset X with |Y| \geq c(\epsilon) \log n such that Y\overset{1+\epsilon}\hookrightarrow \ell_{2}.

Again this theorem has a weakness: while the distortion is small, it only applies to a small subset of the original metric space (of size O(\log n)). In some sense we now have two extreme results: Bourgain’s theorem, which applies to the entire metric space but with a large distortion, and Bourgain–Figiel–Milman, which applies only to a small subset but with a small distortion. A natural question at this point is to try to ‘interpolate’ between these two results. Such a statement was first obtained by Bartal, Linial, Mendel and Naor (2002) and later improved in Mendel and Naor (2006):

Theorem (Mendel and Naor 2006) Let \epsilon > 0. There exists Y \subset X with |Y| \geq n^{1-\epsilon} such that Y\overset{O(1/\epsilon)}\hookrightarrow \ell_{2}.

 

In future posts I will try to prove some of these theorems. This might take a while as in the coming weeks I will be traveling to various conferences (including COLT 2013 and ICML 2013) and I will try to blog about some of the stuff that I learn there.

This entry was posted in Theoretical Computer Science.