
Guest post by Miklos Racz: Confidence sets for the root in uniform and preferential attachment trees

In the final post of this series (see here for the previous posts) we consider yet another point of view for understanding networks. In the previous posts we studied random graph models with community structure and also models with an underlying geometry. While these models are important and lead to fascinating problems, they are also static in time. Many real-world networks are constantly evolving, and their understanding requires models that reflect this. This point of view brings about a host of new interesting and challenging statistical inference questions that concern the temporal dynamics of these networks. In this post we study such questions for two canonical models of randomly growing trees: uniform attachment and preferential attachment.

Models of growing graphs

A natural general model of randomly growing graphs can be defined as follows. For n \geq k \geq 1 and a graph S on k vertices, define the random graph G(n,S) by induction. First, set G(k,S) = S; we call S the seed of the graph evolution process. Then, given G(n,S), G(n+1, S) is formed from G(n,S) by adding a new vertex and some new edges according to some adaptive rule. If S is a single vertex, we write simply G(n) instead of G(n,S).

There are several rules one can consider; here we study perhaps the two most natural ones: uniform attachment and preferential attachment (denoted \mathrm{UA} and \mathrm{PA} in the following). Moreover, for simplicity we focus on the case of growing trees, where at every time step a single edge is added. Uniform attachment trees are defined recursively as follows: given \mathrm{UA}(n,S), the tree \mathrm{UA}(n+1,S) is formed from \mathrm{UA}(n,S) by adding a new vertex u together with a new edge uv, where the vertex v is chosen uniformly at random among the vertices of \mathrm{UA} \left( n, S \right), independently of all past choices. Preferential attachment trees are defined similarly, except that v is chosen with probability proportional to its degree:

    \[ \mathbb{P}\left( v = i \, \middle| \, \mathrm{PA}(n, S) \right) = \frac{d_{\mathrm{PA}(n, S)}(i)}{2 \left( n - 1 \right)}, \]

where for a tree T we denote by d_{T} (u) the degree of vertex u in T.
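
To make the two growth rules concrete, here is a minimal simulation sketch in Python (the helper name grow_tree is ours; vertices are labeled chronologically from 0 and a tree is stored as a parent array). The degree-biased choice in PA is implemented by sampling uniformly from a pool that contains each vertex once per incident edge.

    import random

    def grow_tree(n, model="UA"):
        """Grow a random tree on n >= 2 vertices; parents[v] is the parent of v.

        model="UA": the new vertex attaches to a uniformly chosen existing vertex.
        model="PA": the new vertex attaches to an existing vertex with probability
        proportional to its degree; as in the post, PA starts from a single edge.
        """
        if model == "UA":
            parents, start, pool = [None], 1, None
        else:
            parents, start, pool = [None, 0], 2, [0, 1]  # seed edge between 0 and 1
        for v in range(start, n):
            if model == "UA":
                w = random.randrange(v)   # uniform over the v existing vertices
            else:
                w = random.choice(pool)   # one pool entry per edge endpoint,
                pool.extend([v, w])       # so this choice is degree-biased
            parents.append(w)
        return parents

    ua, pa = grow_tree(1000, "UA"), grow_tree(1000, "PA")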

Questions: detection and estimation

The most basic questions to consider are those of detection and estimation. Can one detect the influence of the initial seed graph? If so, is it possible to estimate the seed? Can one find the root if the process was started from a single node? We introduce these questions in the general model of randomly growing graphs described above, even though we study them in the special cases of uniform and preferential attachment trees later.

The detection question can be rephrased in the terminology of hypothesis testing. Given two potential seed graphs S and T, and an observation R which is a graph on n vertices, one wishes to test whether R \sim G(n, S) or R \sim G(n, T). The question then boils down to whether one can design a test with asymptotically (in n) nonnegligible power. This is equivalent to studying the total variation distance between G(n, S) and G(n, T), so we naturally define

    \begin{equation*} \delta(S, T) := \lim_{n \to \infty} \mathrm{TV}(G(n, S), G(n, T)), \end{equation*}

where G(n,S) and G(n,T) are random elements in the finite space of unlabeled graphs with n vertices. This limit is well-defined because \mathrm{TV}(G(n, S), G(n, T)) is nonincreasing in n (since if G(n,S) = G(n,T), then the evolution of the random graphs can be coupled such that G(n', S) = G(n', T) for all n' \geq n) and always nonnegative.

If the seed has an influence, it is natural to ask whether one can estimate S from G(n,S) for large n. If so, can the subgraph corresponding to the seed be located in G(n,S)? We study this latter question in the simple case when the process starts from a single vertex called the root. (In the case of preferential attachment, starting from a single vertex is not well-defined; in this case we start the process from a single edge and the goal is to find one of its endpoints.) A root-finding algorithm is defined as follows. Given G(n) and a target accuracy \epsilon \in (0,1), a root-finding algorithm outputs a set H\left( G(n), \epsilon \right) of K(\epsilon) vertices such that the root is in H\left( G(n), \epsilon \right) with probability at least 1-\epsilon (with respect to the random generation of G(n)).

An important aspect of this definition is that the size of the output set is allowed to depend on \epsilon, but not on the size n of the input graph. Therefore it is not clear that root-finding algorithms exist at all. Indeed, there are examples where they do not exist: consider a path that grows by picking one of its two ends uniformly at random and extending it by a single edge. The root then ends up near the middle of the path, at a position that fluctuates on the order of \sqrt{n}, so any set of constant size contains it with vanishing probability. However, it turns out that in many interesting cases root-finding algorithms do exist. In such cases it is natural to ask for the best possible value of K(\epsilon).

The influence of the seed

Consider distinguishing between a PA tree started from a star with 10 vertices, S_{10}, and a PA tree started from a path with 10 vertices, P_{10}. Since the preferential attachment mechanism incorporates the rich-get-richer phenomenon, one expects the degree of the center of the star in \mathrm{PA}(n,S_{10}) to be significantly larger than the degree of any of the initial vertices in the path in \mathrm{PA}(n,P_{10}). This intuition guided Bubeck, Mossel, and Racz when they initiated the theoretical study of the influence of the seed in PA trees. They showed that this intuition is correct: the limiting distribution of the maximum degree of the PA tree indeed depends on the seed. Using this they were able to show that for any two seeds S and T with at least 3 vertices and different degree profiles we have

    \[\delta_{\mathrm{PA}} (S,T) > 0.\]
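
To see this degree separation numerically, here is a hedged sketch along the same lines (the helper grow_pa_from_seed is ours; the degree-biased pool is simply initialized from the seed's edges): it grows PA trees from S_{10} and from P_{10} and compares maximum degrees, which should come out markedly larger for the star seed.

    import random
    from collections import Counter

    def grow_pa_from_seed(n, seed_edges):
        """Grow a PA tree to n vertices from a seed given as an edge list on
        vertices 0..k-1; returns the list of edge endpoints, one entry per
        endpoint, from which all degrees can be read off."""
        pool = [u for e in seed_edges for u in e]
        k = max(pool) + 1
        for v in range(k, n):
            w = random.choice(pool)       # degree-biased attachment
            pool.extend([v, w])
        return pool

    star = [(0, i) for i in range(1, 10)]   # S_10: vertex 0 is the center
    path = [(i, i + 1) for i in range(9)]   # P_10
    for seed in (star, path):
        degrees = Counter(grow_pa_from_seed(20000, seed))
        print(max(degrees.values()))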

However, statistics based solely on degrees cannot distinguish all pairs of nonisomorphic seeds. This is because if S and T have the same degree profiles, then it is possible to couple \mathrm{PA}(n,S) and \mathrm{PA}(n,T) such that they have the same degree profiles for every n. In order to distinguish between such seeds, it is necessary to incorporate information about the graph structure into the statistics that are studied. This was done successfully by Curien, Duquesne, Kortchemski, and Manolescu, who analyzed statistics that measure the geometry of large degree nodes. These results can be summarized in the following theorem.

Theorem 1: The seed has an influence in PA trees in the following sense. For any trees S and T that are nonisomorphic and have at least 3 vertices, we have

    \[\delta_{\mathrm{PA}}(S,T) > 0.\]

In the case of uniform attachment, degrees do not play a special role, so initially one might even think that the seed has no influence in the limit. However, it turns out that the right perspective is not to look at degrees but rather the sizes of appropriate subtrees (we shall discuss such statistics later). By extending the approach of Curien et al. to deal with such statistics, Bubeck, Eldan, Mossel, and Racz showed that the seed has an influence in uniform attachment trees as well.

Theorem 2: The seed has an influence in UA trees in the following sense. For any trees S and T that are nonisomorphic and have at least 3 vertices, we have

    \[\delta_{\mathrm{UA}}(S,T) > 0.\]

These results, together with a lack of examples showing opposite behavior, suggest that for most models of randomly growing graphs the seed has an influence.

Question: How common is the phenomenon observed in Theorems 1 and 2? Is there a natural large class of randomly growing graph models for which the seed has an influence, that is, where \delta (S,T) > 0 for any two seeds S and T (perhaps satisfying an extra condition)? Is there a natural model where the seed has no influence?

Finding Adam

These theorems about the influence of the seed open up the problem of finding the seed. Here we present the results of Bubeck, Devroye, and Lugosi, who first studied root-finding algorithms in the case of UA and PA trees.

They showed that root-finding algorithms indeed exist for PA trees and that the size of the best confidence set is polynomial in 1/\epsilon.

Theorem 3: There exists a polynomial-time root-finding algorithm for PA trees with

    \[K(\epsilon) \leq c \frac{\log^{2} (1/\epsilon)}{\epsilon^{4}}\]

for some finite constant c. Furthermore, there exists a positive constant c' such that any root-finding algorithm for PA trees must satisfy

    \[K(\epsilon) \geq \frac{c'}{\epsilon}.\]

They also showed the existence of root-finding algorithms for UA trees. In this model, however, there are confidence sets whose size is subpolynomial in 1/\epsilon. Moreover, the size of any confidence set has to be superpolylogarithmic in 1/\epsilon.

Theorem 4: There exists a polynomial-time root-finding algorithm for UA trees with

    \[K(\epsilon) \leq \exp \left( c \tfrac{\log (1/\epsilon)}{\log \log (1/\epsilon)} \right)\]

for some finite constant c. Furthermore, there exists a positive constant c' such that any root-finding algorithm for UA trees must satisfy

    \[K(\epsilon) \geq \exp \left( c' \sqrt{\log (1/\epsilon)} \right).\]

These theorems show an interesting quantitative difference between the two models: finding the root is exponentially more difficult in PA than in UA. While this might seem counter-intuitive at first, the reason behind this can be traced back to the rich-get-richer phenomenon: the effect of a rare event where not many vertices attach to the root gets amplified by preferential attachment, making it harder to find the root.

Proofs using Pólya urns

We now explain the basic ideas that go into proving Theorems 3 and 4, and prove some of the simpler cases. While UA and PA are arguably the most basic models of randomly growing graphs, the evolution of various simple statistics, such as degrees or subtree sizes, can be described using even simpler building blocks: Pólya urns. In this post we assume familiarity with Pólya urns; we refer the interested reader to the lecture notes for a primer.

A root-finding algorithm based on the centroid

We start by presenting a simple root-finding algorithm for UA trees. This algorithm is not optimal, but its analysis is simple and highlights the basic ideas.

For a tree T, if we remove a vertex v \in V(T), then the tree becomes a forest consisting of disjoint subtrees of the original tree. Let \psi_{T} \left( v \right) denote the size (i.e., the number of vertices) of the largest component of this forest. A vertex v that minimizes \psi_{T} \left( v \right) is known as a centroid of T; one can show that there can be at most two centroids. We define the confidence set H_{\psi} by taking the set of K vertices with smallest \psi values.
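
As a sketch of how one might compute H_{\psi} (assuming the parent-array convention of the earlier grow_tree sketch, where every vertex has a smaller label than its children), the whole confidence set can be computed in a couple of linear passes: subtree sizes fall out of a single backward sweep, and \psi(v) is the larger of the component above v and the largest child subtree of v.

    def psi_confidence_set(parents, K):
        """Return the K vertices with smallest psi, where psi(v) is the size
        of the largest component left after deleting v from the tree."""
        n = len(parents)
        size = [1] * n
        for v in range(n - 1, 0, -1):     # children have larger labels
            size[parents[v]] += size[v]
        max_child = [0] * n               # largest child subtree of each vertex
        for v in range(1, n):
            max_child[parents[v]] = max(max_child[parents[v]], size[v])
        psi = [max(n - size[v], max_child[v]) for v in range(n)]
        return sorted(range(n), key=psi.__getitem__)[:K]

    # e.g. the root (vertex 0) should usually be in the confidence set:
    # print(0 in psi_confidence_set(grow_tree(100000, "UA"), 50))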

Theorem 5: The centroid-based H_{\psi} defined above is a root-finding algorithm for the UA tree. More precisely, if

    \[K \geq \frac{5}{2} \frac{\log \left( 1 / \epsilon \right)}{\epsilon},\]

then

    \[ \liminf_{n \to \infty} \mathbb{P} \left( 1 \in H_{\psi} \left( \mathrm{UA} \left( n \right)^{\circ} \right) \right) \geq 1 - \frac{4\epsilon}{1-\epsilon}, \]

where 1 denotes the root, and \mathrm{UA} \left( n \right)^{\circ} denotes the unlabeled version of \mathrm{UA} \left( n \right).

Proof: We label the vertices of the UA tree in chronological order. We start by introducing some notation that is useful throughout the proof. For 1 \leq i \leq k, denote by T_{i,k} the tree containing vertex i in the forest obtained by removing from \mathrm{UA}\left( n \right) all edges between the vertices \left\{ 1, \dots, k \right\}. See the figure below for an illustration.

[Figure: the forest of subtrees T_{1,k}, \dots, T_{k,k} obtained by removing the edges among the first k vertices of the UA tree.]

Let \left| T \right| denote the size of a tree T, i.e., the number of vertices it contains. Note that the vector

    \[\left( \left| T_{1, k} \right|, \dots, \left| T_{k,k} \right| \right)\]

evolves according to the classical Pólya urn with k colors, with initial state \left( 1, \dots, 1 \right). Therefore the normalized vector

    \[\left( \left| T_{1, k} \right|, \dots, \left| T_{k,k} \right| \right) / n\]

converges in distribution to a Dirichlet distribution with parameters \left( 1, \dots, 1 \right).

Now observe that

    \[ \mathbb{P} \left( 1 \notin H_{\psi} \right) \leq \mathbb{P} \left( \exists i > K : \psi \left( i \right) \leq \psi \left( 1 \right) \right) \leq \mathbb{P} \left( \psi \left( 1 \right) \geq \left( 1 - \epsilon \right) n \right) + \mathbb{P} \left( \exists i > K : \psi \left( i \right) \leq \left( 1 - \epsilon \right) n \right). \]

We bound the two terms appearing above separately, starting with the first one. Note that

    \[\psi \left( 1 \right) \leq \max \left\{ \left| T_{1,2} \right|, \left| T_{2,2} \right| \right\},\]

and both \left| T_{1,2} \right| / n and \left| T_{2,2} \right| / n converge in distribution to a uniform random variable in \left[ 0, 1 \right]. Hence a union bound gives us that

    \[ \limsup_{n \to \infty} \mathbb{P} \left( \psi \left( 1 \right) \geq \left( 1 - \epsilon \right) n \right) \leq 2 \lim_{n \to \infty} \mathbb{P} \left( \left| T_{1,2} \right| \geq \left( 1 - \epsilon \right) n \right) = 2 \epsilon. \]

For the other term, first observe that for any i > K we have

    \[ \psi \left( i \right) \geq \min_{1 \leq k \leq K} \sum_{j = 1, j \neq k}^{K} \left| T_{j,K} \right|. \]

Now using results on Pólya urns we have that for every k such that 1 \leq k \leq K, the random variable

    \[\frac{1}{n} \sum_{j = 1, j \neq k}^{K} \left| T_{j,K} \right|\]

converges in distribution to the \mathrm{Beta} \left( K - 1, 1 \right) distribution, whose cumulative distribution function is x \mapsto x^{K-1} on \left[ 0, 1 \right]. Hence by a union bound we have that

    \begin{align*} \limsup_{n \to \infty} \mathbb{P} \left( \exists i > K : \psi \left( i \right) \leq \left( 1 - \epsilon \right) n \right) &\leq \lim_{n \to \infty} \mathbb{P} \left( \exists 1 \leq k \leq K : \sum_{j = 1, j \neq k}^{K} \left| T_{j,K} \right| \leq \left( 1 - \epsilon \right) n \right) \\ &\leq K \left( 1 - \epsilon \right)^{K - 1}. \end{align*}

Putting together the two bounds gives that

    \[ \limsup_{n \to \infty} \mathbb{P} \left( 1 \notin H_{\psi} \right) \leq 2 \epsilon + K \left( 1 - \epsilon \right)^{K-1}, \]

which concludes the proof: the assumption on K guarantees that K \left( 1 - \epsilon \right)^{K-1} \leq 2 \epsilon / \left( 1 - \epsilon \right), so the total error probability is at most 2\epsilon + 2\epsilon / \left( 1 - \epsilon \right) \leq 4 \epsilon / \left( 1 - \epsilon \right).
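
As a quick Monte Carlo sanity check on the Pólya urn limits used in the proof, one can simulate UA trees and look at the normalized component sizes; here is a minimal sketch (reusing the hypothetical grow_tree from above, with 0-indexed labels, so that T_{1,2} is the component of vertex 0 when k = 2).

    def component_sizes(parents, k):
        """Sizes of T_{1,k},...,T_{k,k}: remove all edges among the first k
        vertices and count the vertices hanging off each of them (assumes
        parents[v] < v, as in grow_tree)."""
        n = len(parents)
        anchor = list(range(n))
        sizes = [1] * k
        for v in range(k, n):
            anchor[v] = anchor[parents[v]]   # already resolved since parents[v] < v
            sizes[anchor[v]] += 1
        return sizes

    # |T_{1,2}|/n should be approximately Uniform[0,1], so its mean is near 1/2
    n, trials = 2000, 500
    m = sum(component_sizes(grow_tree(n, "UA"), 2)[0] for _ in range(trials))
    print(m / (trials * n))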

The same estimator H_{\psi} works for the preferential attachment tree as well, if one takes

    \[K \geq C \frac{\log^{2} \left( 1 / \epsilon \right)}{\epsilon^{4}}\]

for some positive constant C. The proof mirrors the one above, but involves a few additional steps; we refer to Bubeck et al. for details.

For uniform attachment the bound on K given by Theorem 5 is not optimal. It turns out that it is possible to write down the maximum likelihood estimator (MLE) for the root in the UA model; we do not do so here, see Bubeck et al. One can view the estimator H_{\psi} based on the centroid as a certain “relaxation” of the MLE. By constructing a “tighter” relaxation of the MLE, one can obtain a confidence set whose size is subpolynomial in 1/\epsilon, as described in Theorem 4. The analysis of this is the most technical part of Bubeck et al., and we refer to that paper for more details.

Lower bounds

As mentioned above, the MLE for the root can be written down explicitly. This aids in showing a lower bound on the size of a confidence set. In particular, Bubeck et al. define a set of trees whose probability of occurrence under the UA model is not too small, yet the MLE provably fails, giving the lower bound described in Theorem 4. We refer to Bubeck et al. for details.

On the other hand, for the PA model it is not necessary to use the structure of the MLE to obtain a lower bound. A simple symmetry argument suffices to show the lower bound in Theorem 3, which we now sketch.

First observe that the probability of error for the optimal procedure is non-decreasing with n, since otherwise one could simulate the process to obtain a better estimate. Thus it suffices to show that the optimal procedure must have a probability of error of at least \epsilon for some finite n. We show that there is some finite n such that with probability at least 2\epsilon, the root is isomorphic to at least 2c / \epsilon vertices in \mathrm{PA}(n). Thus if a procedure outputs at most c/\epsilon vertices, then it must make an error at least half the time (so with probability at least \epsilon).

Observe that the probability that the root is a leaf in \mathrm{PA}(n) is

    \[\frac{1}{2} \times \frac{3}{4} \times \dots \times \left( 1 - \frac{1}{2n} \right) = \Theta \left( 1 / \sqrt{n} \right).\]

By choosing n = \Theta \left( 1 / \epsilon^{2} \right), this happens with probability \Theta \left( \epsilon \right). Furthermore, conditioned on the root being a leaf, with constant probability vertex 2 is connected to \Theta \left( \sqrt{n} \right) = \Theta \left( 1 / \epsilon \right) leaves, which are then isomorphic to the root.
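
This scaling is easy to probe by simulation; the following minimal sketch (the helper name is ours, using the same degree-biased pool trick as before) estimates the probability that a fixed endpoint of the initial edge is still a leaf, an estimate that should roughly halve when n is quadrupled.

    import random

    def root_leaf_prob(n, trials=2000):
        """Monte Carlo estimate of P(vertex 0 still has degree 1 in PA(n))."""
        hits = 0
        for _ in range(trials):
            pool, deg0 = [0, 1], 1
            for v in range(2, n):
                w = random.choice(pool)
                pool.extend([v, w])
                deg0 += (w == 0)
            hits += (deg0 == 1)
        return hits / trials

    print(root_leaf_prob(250), root_leaf_prob(1000))   # ratio should be about 2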

Open problems

There are many open problems and further directions that one can pursue; the four papers we have discussed contain 20 open problems and conjectures alone, and we urge the reader to have a look and try to solve them!
