The preferential attachment model, introduced in 1992 by Mahmoud and popularized in 1999 by Barabási and Albert, has attracted a lot of attention in the last decade. In its simplest form it describes the evolution of a random tree. Formally we denote by the preferential attachment tree on vertices which is defined by induction as follows. First is the unique tree on two vertices. Then, given , is formed from by adding a new vertex and a new edge where is selected at random among vertices in according to the following probability distribution:

where denotes the degree of vertex in a tree . In other words vertices of large degrees are more likely to attract the new nodes. This model of evolution is argued to be a good approximation for things such a network of citations, or the internet network.

One of the main reason for the success of the preferential attachment model is the following theorem, which shows that the degree distribution in follows a power-law, a feature that many real-world networks (such as the internet) exhibit but which is not reproduced by standard random graph models such as the Erdös-Rényi model.

Theorem [Bollobás, Riordan, Spencer and Tusnády (2001)]Let be fixed. Then as , the proportion of vertices with degree tends in probability to

While the above theorem is a fine and interesting mathematical result, I do not view it as the critical aspect of the preferential attachment model (note that Wikipedia disagrees). In my opinion is simply interesting merely because of its natural rule of evolution.

Now think about the application of the PA model to the internet. Of course there is a few obvious objections, such as the fact that in a website can only link to one other website. While this is clearly ridiculous I think that still contains the essence of what one would like to capture to model the evolution of the internet. However there is one potentially important aspect which is overlooked: in the early days of the internet the PA model was probably very far from being a good approximation to the evolution of the network. It is perhaps reasonable to assume that after 1995 the network was evolving according to PA, but certainly from 1970 to 1995 the evolution followed fundamentally different rules. This observation suggests to study the preferential attachment model *with a seed*.

Thus we are now interested in , where is a finite seed tree. Formally is also defined by induction, where and is formed from as before. A very basic question which seems to have been overlooked in the literature is the following: **what is the influence of the seed on as goes to infinity?**

In our recent joint work with Elchanan Mossel and Miklos Racz we looked exactly at this question. More precisely we ask the following: given two seed trees and , do the distributions and remain separated (say in total variation distance) as goes to infinity? In other words we are interested in the following quantity:

A priori it could be that for any and , which would mean that the seed has no influence and that the preferential attachment “forgets” its initial conditions. We prove that this is far from true:

Theorem [Bubeck, Mossel and Racz (2014)]Let and be two finite trees on at least vertices. If the degree distributions in and are different, then .

If I wanted to make a bold statement I could say that this theorem implies the following: by looking at the internet network today, one can still “see” the influence of the topological structure of the internet back in the 90’s. In other words to a certain extent one can go back in time and potentially infer some properties that people may have believed to be lost (perhaps some army’s secrets hidden in the structure of the ARPANET?). Of course at this stage this is pure science fiction, but the theorem certainly leaves that possibility open. Note that we believe that the theorem can even be strengthen to the following statement:

ConjectureLet and be two finite trees on at least vertices. If and are non-isomorphic, then .

These statements show that even when is large one can still “see” in the influence of the original seed . However by considering the total variation distance we allow global statistics that depend on entire tree . What about local statistics that could be computed by an agent looking only at a finite neighborhood around her? Mathematically this question can be interpreted in the framework of the Benjamini-Schramm limit. Recall that a sequence of random graphs tends to a random infinite rooted tree ( is the random root) if for any , the random ball of radius around a random vertex in tends in distribution to the random ball of radius around in . In other words when is large enough a random agent cannot tell if she is in or in by looking at a finite neighborhood around her. One has the following theorem for the weak limit of the PA model:

Theorem [Berger, Borgs, Chayes and Saberi (2014)]The Benjamini-Schramm limit of is the Pólya-point graph with .

We extend this result to an arbitrary seed and show that locally the seed has no influence:

Theorem [Bubeck, Mossel and Racz (2014)]For any tree the Benjamini-Schramm limit of is the Pólya-point graph with .

Thus, while the army’s secret of the 90’s might be at risk if one looks at the overall topology of the current internet network, these secrets are safe from any local agent who would only access a (random) finite part of the network.

These new results on the PA model naturally lead to a host of new problems. We end the paper with a list of 7 open problems, I recommend to take a look at them (and try to solve them)!

## By Community Detection in Graphs — a Casual Tour | Math ∩ Programming May 19, 2014 - 10:30 pm

[…] model, but I think it’s likely to be trivial and uninteresting. On the other hand, Bubeck et al. have looked at changing the initial graph (the “seed”) from a 2-clique to […]

## By Jeremy Kun May 15, 2014 - 5:28 pm

I can’t say what the first reference is, but someone I talked to about this post seemed to be at least mildly bothered by it.

A different question: if the initial seed is a clique whose size depends on T, do you know if it’s easy to recover the clique after T steps? Or, since the clique may have a nontrivial chance to grow, to find any clique of comparable size?

## By morteza April 21, 2014 - 8:43 pm

I am missing something in understanding the conjecture. It looks that the degree distribution of PA(n+1,T) only depends on the degree distribution of the previous step (PA(n,T)) and not on its tree structure. So if S and T have the same degree distribution, then \delta(S,T) = 0 even if S and T are not isomorph.

What am i missing?

## By Sebastien Bubeck April 22, 2014 - 8:14 am

Your first observation is absolutely correct: if T and S have the same degree sequences then one can couple the evolution of PA(n,T) and PA(n, S) so that they have the same degree sequences for any n. Thus any statistics based on the degree sequence (such as the maximum degree) will fail to distinguish between PA(n,T) and PA(n,S) in this case. However one can think of more complicated statistics (we propose one in the paper).

## By Jeremy Kun April 8, 2014 - 12:33 pm

It looks like the history of the preferential attachment model predates the 90’s by quite a bit…

## By Sebastien Bubeck April 9, 2014 - 5:16 pm

Yes you’re right, sorry about that… What is the first mathematical reference in your opinion?