The preferential attachment model, introduced in 1992 by Mahmoud and popularized in 1999 by Barabási and Albert, has attracted a lot of attention in the last decade. In its simplest form it describes the evolution of a random tree. Formally we denote by the preferential attachment tree on
vertices which is defined by induction as follows. First
is the unique tree on two vertices. Then, given
,
is formed from
by adding a new vertex
and a new edge
where
is selected at random among vertices in
according to the following probability distribution:
where denotes the degree of vertex
in a tree
. In other words vertices of large degrees are more likely to attract the new nodes. This model of evolution is argued to be a good approximation for things such a network of citations, or the internet network.
One of the main reason for the success of the preferential attachment model is the following theorem, which shows that the degree distribution in follows a power-law, a feature that many real-world networks (such as the internet) exhibit but which is not reproduced by standard random graph models such as the Erdös-Rényi model.
Theorem [Bollobás, Riordan, Spencer and Tusnády (2001)] Let
be fixed. Then as
, the proportion of vertices with degree
tends in probability to
While the above theorem is a fine and interesting mathematical result, I do not view it as the critical aspect of the preferential attachment model (note that Wikipedia disagrees). In my opinion is simply interesting merely because of its natural rule of evolution.
Now think about the application of the PA model to the internet. Of course there is a few obvious objections, such as the fact that in a website can only link to one other website. While this is clearly ridiculous I think that
still contains the essence of what one would like to capture to model the evolution of the internet. However there is one potentially important aspect which is overlooked: in the early days of the internet the PA model was probably very far from being a good approximation to the evolution of the network. It is perhaps reasonable to assume that after 1995 the network was evolving according to PA, but certainly from 1970 to 1995 the evolution followed fundamentally different rules. This observation suggests to study the preferential attachment model with a seed.
Thus we are now interested in , where
is a finite seed tree. Formally
is also defined by induction, where
and
is formed from
as before. A very basic question which seems to have been overlooked in the literature is the following: what is the influence of the seed
on
as
goes to infinity?
In our recent joint work with Elchanan Mossel and Miklos Racz we looked exactly at this question. More precisely we ask the following: given two seed trees and
, do the distributions
and
remain separated (say in total variation distance) as
goes to infinity? In other words we are interested in the following quantity:
A priori it could be that for any
and
, which would mean that the seed has no influence and that the preferential attachment “forgets” its initial conditions. We prove that this is far from true:
Theorem [Bubeck, Mossel and Racz (2014)] Let
and
be two finite trees on at least
vertices. If the degree distributions in
and
are different, then
.
If I wanted to make a bold statement I could say that this theorem implies the following: by looking at the internet network today, one can still “see” the influence of the topological structure of the internet back in the 90’s. In other words to a certain extent one can go back in time and potentially infer some properties that people may have believed to be lost (perhaps some army’s secrets hidden in the structure of the ARPANET?). Of course at this stage this is pure science fiction, but the theorem certainly leaves that possibility open. Note that we believe that the theorem can even be strengthen to the following statement:
Conjecture Let
and
be two finite trees on at least
vertices. If
and
are non-isomorphic, then
.
These statements show that even when is large one can still “see” in
the influence of the original seed
. However by considering the total variation distance we allow global statistics that depend on entire tree
. What about local statistics that could be computed by an agent looking only at a finite neighborhood around her? Mathematically this question can be interpreted in the framework of the Benjamini-Schramm limit. Recall that a sequence of random graphs
tends to a random infinite rooted tree
(
is the random root) if for any
, the random ball of radius
around a random vertex
in
tends in distribution to the random ball of radius
around
in
. In other words when
is large enough a random agent cannot tell if she is in
or in
by looking at a finite neighborhood around her. One has the following theorem for the weak limit of the PA model:
Theorem [Berger, Borgs, Chayes and Saberi (2014)] The Benjamini-Schramm limit of
is the Pólya-point graph with
.
We extend this result to an arbitrary seed and show that locally the seed has no influence:
Theorem [Bubeck, Mossel and Racz (2014)] For any tree
the Benjamini-Schramm limit of
is the Pólya-point graph with
.
Thus, while the army’s secret of the 90’s might be at risk if one looks at the overall topology of the current internet network, these secrets are safe from any local agent who would only access a (random) finite part of the network.
These new results on the PA model naturally lead to a host of new problems. We end the paper with a list of 7 open problems, I recommend to take a look at them (and try to solve them)!
By Babak April 5, 2016 - 2:55 pm
Have you seen these papers by any chance? Because they address the same question—the similarities of the findings might be useful.
bit.ly/1UTHhVq (pure preferential attachment as well as uniform attachment)
bit.ly/1RM0xOq (shifted-linear preferential attachment)
By Sebastien Bubeck July 27, 2016 - 2:29 pm
Hi Babak,
yes I have seen these papers: they study a different question in the sense that they look at the influence of the seed for FINITE n. This is a major difference in my opinion (for instance they consider the behavior of the tail of degrees, which is clearly unaffected by the seed for large n).
By Babak July 27, 2016 - 2:42 pm
Thanks for your response.
You are correct that the papers I alluded to focus on finite n,
however, our analysis is not confined to “tail”—it’s for general k.
Also, asymptotic temporal behavior can readily be obtained by taking the limit of large n in the given solution.
In any case, I thought those papers were relevant and you might enjoy reading them, which is why I mentioned them. I wish you all the best for your future research in this direction.
Cheers.
By Community Detection in Graphs — a Casual Tour | Math ∩ Programming May 19, 2014 - 10:30 pm
[…] model, but I think it’s likely to be trivial and uninteresting. On the other hand, Bubeck et al. have looked at changing the initial graph (the “seed”) from a 2-clique to […]
By Jeremy Kun May 15, 2014 - 5:28 pm
I can’t say what the first reference is, but someone I talked to about this post seemed to be at least mildly bothered by it.
A different question: if the initial seed is a clique whose size depends on T, do you know if it’s easy to recover the clique after T steps? Or, since the clique may have a nontrivial chance to grow, to find any clique of comparable size?
By morteza April 21, 2014 - 8:43 pm
I am missing something in understanding the conjecture. It looks that the degree distribution of PA(n+1,T) only depends on the degree distribution of the previous step (PA(n,T)) and not on its tree structure. So if S and T have the same degree distribution, then \delta(S,T) = 0 even if S and T are not isomorph.
What am i missing?
By Sebastien Bubeck April 22, 2014 - 8:14 am
Your first observation is absolutely correct: if T and S have the same degree sequences then one can couple the evolution of PA(n,T) and PA(n, S) so that they have the same degree sequences for any n. Thus any statistics based on the degree sequence (such as the maximum degree) will fail to distinguish between PA(n,T) and PA(n,S) in this case. However one can think of more complicated statistics (we propose one in the paper).
By Jeremy Kun April 8, 2014 - 12:33 pm
It looks like the history of the preferential attachment model predates the 90’s by quite a bit…
By Sebastien Bubeck April 9, 2014 - 5:16 pm
Yes you’re right, sorry about that… What is the first mathematical reference in your opinion?