Lecture 5. Giant component (2)

Consider the Erdös-Rényi graph model G(n,\frac{c}{n}), and denote as usual by C_v the connected component of the graph that contains the vertex v. In the last lecture, we focused mostly on the subcritical case c < 1, where we showed that \max_v|C_v|\lesssim\log n. Today we will begin developing the supercritical case c > 1, where \max_v|C_v| \sim (1-\rho)n for a suitable constant 0<\rho<1. In particular, our aim for this and the next lecture is to prove the following theorem.

Theorem. Let c>1. Then

    \[\frac{\max_v|C_v|}{n}\xrightarrow{n\to\infty} 1 - \rho 	\quad\mbox{in probability},\]

where \rho is the smallest positive solution of the equation \rho = e^{c(\rho-1)}. Moreover, there is a \beta>0 such that all but one of the components have size \leq \beta\log n with probability tending to 1 as n\to\infty.
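
Before turning to the proof, it is instructive to see the theorem in action numerically. The following sketch (assuming Python with numpy; the function names, the sample size, and the union-find bookkeeping are our own choices, not part of the lecture) solves the fixed-point equation \rho=e^{c(\rho-1)} by iteration and compares 1-\rho with the largest component fraction of one sample of G(n,\frac{c}{n}).

    import numpy as np

    def rho(c, tol=1e-12):
        # Smallest positive solution of rho = exp(c*(rho - 1)), via fixed-point iteration.
        # Starting below the fixed point, the iterates increase monotonically to the smallest root.
        r = 0.0
        while True:
            r_new = np.exp(c * (r - 1.0))
            if abs(r_new - r) < tol:
                return r_new
            r = r_new

    def largest_component_fraction(n, c, rng):
        # Sample G(n, c/n) and return (size of the largest component)/n, using union-find.
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        p = c / n
        for i in range(n):
            # each pair {i, j} with j > i is an edge independently with probability c/n
            js = np.nonzero(rng.random(n - i - 1) < p)[0] + i + 1
            for j in js:
                ri, rj = find(i), find(int(j))
                if ri != rj:
                    parent[ri] = rj
        sizes = {}
        for v in range(n):
            r = find(v)
            sizes[r] = sizes.get(r, 0) + 1
        return max(sizes.values()) / n

    rng = np.random.default_rng(0)
    c = 2.0
    print("1 - rho =", 1 - rho(c))                           # theoretical giant component fraction
    print("simulated:", largest_component_fraction(5000, c, rng))

For c=2 this gives \rho\approx 0.203, so 1-\rho\approx 0.797; the simulated fraction should already be close to this value at moderate n.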

Beginning of the proof. Define the set

    \[K = \{ v: |C_v| > \beta \log n\}.\]

The proof of the Theorem consists of two main parts:

  • Part 1: \mathbb{P}[C_v = C_{v'}~ \forall v,v' \in K] \xrightarrow{n \to \infty} 1.
  • Part 2: \frac{|K|}{n} \xrightarrow{n \to \infty} 1-\rho in probability.

Part 1 states that all “large” components of the graph must intersect, forming one giant component. Some intuition for why this is the case was given at the end of the previous lecture. Part 2 computes the size of this giant component. In this lecture, we will concentrate on proving Part 2, and we will find out where the mysterious constant \rho comes from. In the next lecture, we will prove Part 1, and we will develop a detailed understanding of why all large components must intersect.

Before we proceed, let us complete the proof of the Theorem assuming Parts 1 and 2 have been proved. First, note that with probability tending to one, the set K is itself a connected component. Indeed, if v\in K and v'\not\in K then v,v' must lie in disjoint connected components by the definition of K. On the other hand, with probability tending to one, all v,v'\in K must lie in the same connected component by Part 1. Therefore, with probability tending to one, the set K forms a single connected component of the graph. By Part 2, the size of this component is \sim (1-\rho)n, while by the definition of K, all other components have size \le \beta\log n. This completes the proof. \quad\square

The remainder of this lecture is devoted to proving Part 2 above. We will first prove that the claim holds on average, and then prove concentration around the average. More precisely, we will show:

  1. \mathbb{E}\big[\frac{|K|}{n}\big] \xrightarrow{n\to\infty} 1-\rho.
  2. \mathrm{Var}\big[\frac{|K|}{n}\big] \xrightarrow{n\to\infty}0.

Together, these two claims evidently prove Part 2.
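Indeed, for any \varepsilon>0, Chebyshev's inequality gives

    \[\mathbb{P}\bigg[\bigg|\frac{|K|}{n}-\mathbb{E}\bigg[\frac{|K|}{n}\bigg]\bigg| > \varepsilon\bigg] \le \frac{1}{\varepsilon^2}\,\mathrm{Var}\bigg[\frac{|K|}{n}\bigg] \xrightarrow{n\to\infty} 0,\]

which together with the first claim shows that \frac{|K|}{n}\to 1-\rho in probability.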

Mean size of the giant component

We begin by writing out the mean size of the giant component:

    \[\mathbb{E}\bigg[\frac{|K|}{n}\bigg] =  	\mathbb{E}\Bigg[ 	\frac{1}{n} \sum_{v \in [n]}  	\mathbf{1}_{|C_v|>\beta\log n} 	\Bigg] =  	\frac{1}{n} \sum_{v \in [n]}  	\mathbb{P}[\left|C_v\right| > \beta \log n ] = 	\mathbb{P}[\left|C_v\right| > \beta \log n ],\]

where we note that \mathbb{P}[\left|C_v\right| > \beta \log n ] does not depend on the vertex v by the symmetry of the Erdös-Rényi model. Therefore, to prove convergence of the mean size of the giant component, it suffices to prove that

    \[\mathbb{P}[\left|C_v\right| \le \beta \log n ] 	\xrightarrow{n\to\infty}\rho.\]

This is what we will now set out to accomplish.

In the previous lecture we defined the exploration process (U_t, A_t, R_t). We showed that

    \[|C_v| = \tau := \inf \{t: |A_t| = 0\}\]

and that for t<\tau

    \[|A_{t+1}| = |A_t| - 1 + \sum_{w \in U_t} \eta_{wi_t}, 	\qquad |A_0|=1,\]

where i_t \in A_t is an arbitrary nonanticipating choice, say, i_t = \min A_t (recall that (\eta_{ij})_{i,j\in[n]} denotes the adjacency matrix of the random graph). As \eta_{ij} are i.i.d. \mathrm{Bernoulli}(\frac{c}{n}) and as edges emanating from the set of unexplored vertices U_t have not yet appeared in previous steps, the process |A_t| is “almost” a random walk: it fails to be a random walk as we only add |U_t| Bernoulli variables in each iteration, rather than a constant number. In the last lecture, we noted that we can estimate |A_t| from above by a genuine random walk S_t by adding some fictitious vertices. To be precise, we define

    \[S_{t+1}=S_t - 1 + \sum_{w\in U_t} \eta_{wi_t} + 	\sum_{w=1}^{n-|U_t|}\tilde\eta_w^t,\qquad S_0=1,\]

where \tilde\eta_w^t are i.i.d. \mathrm{Bernoulli}(\frac{c}{n}) independent of the \eta_{ij} (if t\ge\tau, then A_t=\varnothing and thus i_t is undefined; in this case, we simply add all n variables \tilde\eta_w^t). In the present lecture, we also need to bound |A_t| from below. To this end, we introduce another process \bar S_t as follows:

    \[\bar S_{t+1}=\left\{ 	\begin{array}{ll} 	\bar S_t - 1 + \sum_{w\in U_t} \eta_{wi_t} + 		\sum_{w=1}^{n-\beta\log n-|U_t|}\tilde\eta_w^t 	& \mbox{if }|U_t|< n-\beta\log n,\\ 	\bar S_t - 1 + \sum_{w\in \bar U_t} \eta_{wi_t} 	& \mbox{if }|U_t|\ge n-\beta\log n, 	\end{array}\right. 	\qquad \bar S_0=1,\]

where \bar U_t is the set consisting of the first n-\beta\log n elements of U_t in increasing order of the vertices (if t\ge\tau, we add n-\beta\log n variables \tilde\eta_w^t). The idea behind these processes is that S_t is engineered, by including “fictitious” vertices, to always add n i.i.d. Bernoulli variables in every iteration, while \bar S_t is engineered, by including “fictitious” vertices when |U_t| is small and omitting vertices when |U_t| is large, to always add n-\beta\log n i.i.d. Bernoulli variables in every iteration. The following facts are immediate:

  • S_t is a random walk with i.i.d. \mathrm{Binomial}(n,\frac{c}{n})-1 increments.
  • \bar S_t is a random walk with i.i.d. \mathrm{Binomial}(n-\beta\log n,\frac{c}{n})-1 increments.
  • S_t\ge |A_t| for all t\le\tau.
  • \bar S_t\le |A_t| for all t\le\tau on the event \{|C_v|\le\beta\log n\}.

To see the last property, note that the exploration process can only explore as many vertices as are present in the connected component C_v, so that |U_t|\ge n-|C_v| for all t; therefore, in this situation only the second possibility in the definition of \bar S_t occurs, and it is obvious by construction that then \bar S_t\le |A_t| (nonetheless, the first possibility in the definition must be included to ensure that \bar S_t is a random walk).
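
To make the coupling concrete, here is a small simulation sketch (assuming Python with numpy; the function name, the particular cutoff playing the role of \beta\log n, and the sanity checks are our own choices, not part of the lecture). It runs the exploration process on one sample of G(n,\frac{c}{n}), builds S_t and \bar S_t from the same edge variables together with independent fictitious Bernoulli variables, and checks the domination properties listed above.

    import numpy as np

    def explore_with_coupled_walks(n, c, v=0, cutoff=None, seed=0):
        # One run of the exploration process from vertex v in G(n, c/n), together with the
        # coupled walks S_t (upper bound) and Sbar_t (lower bound).
        # Returns the list of (|A_t|, S_t, Sbar_t) for t = 0, ..., tau = |C_v|.
        rng = np.random.default_rng(seed)
        p = c / n
        if cutoff is None:
            cutoff = int(np.ceil(4 * np.log(n)))   # plays the role of beta*log(n); the factor 4 is arbitrary
        m = n - cutoff                             # number of increments added to Sbar in every step
        eta = rng.random((n, n)) < p               # adjacency matrix: symmetrize, no self-loops
        eta = np.triu(eta, 1)
        eta = eta | eta.T

        U = set(range(n)) - {v}                    # unexplored vertices U_t
        A = [v]                                    # active vertices A_t
        S = Sbar = 1
        traj = [(len(A), S, Sbar)]
        while A:
            i = min(A)                             # the nonanticipating choice i_t = min A_t
            A.remove(i)
            U_sorted = sorted(U)
            found = [w for w in U_sorted if eta[w, i]]   # real increments: sum over w in U_t of eta_{w, i_t}
            # S_t: pad with fictitious Bernoulli(c/n) variables so that n increments are added
            S += -1 + len(found) + int(rng.binomial(n - len(U), p))
            # Sbar_t: pad up to m increments if |U_t| < m, else keep only the first m unexplored vertices
            if len(U) < m:
                Sbar += -1 + len(found) + int(rng.binomial(m - len(U), p))
            else:
                Sbar += -1 + sum(1 for w in U_sorted[:m] if eta[w, i])
            A.extend(found)
            U.difference_update(found)
            traj.append((len(A), S, Sbar))
        return traj

    traj = explore_with_coupled_walks(2000, 1.5)
    assert all(a <= s for a, s, _ in traj)         # S_t >= |A_t| for all t <= tau
    if len(traj) - 1 <= int(np.ceil(4 * np.log(2000))):
        assert all(sb <= a for a, _, sb in traj)   # Sbar_t <= |A_t| on the event {|C_v| <= cutoff}
    print("|C_v| =", len(traj) - 1)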

We now define the hitting times

    \[T = \inf\{t:S_t=0\},\qquad 	\bar T = \inf\{t:\bar S_t=0\}.\]

Then we evidently have

    \[\mathbb{P}[T \le \beta \log n] \le  	\mathbb{P}[|C_v| \le \beta \log n] \le  	\mathbb{P}[\bar T \le \beta \log n].\]

(Note how we cleverly chose the random walk \bar S_t precisely so that \bar T\le |C_v| whenever |C_v|\le \beta\log n). We have therefore reduced the problem of computing \mathbb{P}[|C_v| \le \beta \log n] to computing the hitting probabilities of random walks. Now we are in business, as this is something we know how to do using martingales!

The hitting time computation

Let us take a moment to gather some intuition. The random walks S_t and \bar S_t have increments distributed as \mathrm{Binomial}(n,\frac{c}{n})-1 and \mathrm{Binomial}(n-\beta\log n,\frac{c}{n})-1, respectively. As n\to\infty, both increment distributions converge to a \mathrm{Poisson}(c)-1 distribution, so we expect that \mathbb{P}[|C_v|\le\beta\log n]\sim\mathbb{P}[T_0\le\beta\log n], where T_0 is the first time that the random walk with i.i.d. \mathrm{Poisson}(c)-1 increments, started at 1, hits zero. On the other hand, as \mathbb{P}[T_0\le\beta\log n] \to \mathbb{P}[T_0<\infty], we expect that \mathbb{P}[|C_v|\le\beta\log n]\to \mathbb{P}[T_0<\infty]. The problem then reduces to computing the probability that this Poisson random walk ever hits the origin. This computation can be done explicitly, and this is precisely where the mysterious constant \rho=\mathbb{P}[T_0<\infty] comes from!
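
As a quick sanity check on this intuition, one can estimate \mathbb{P}[T_0<\infty] by Monte Carlo (a sketch assuming Python with numpy; the function name, the truncation level and the number of trials are our own choices). Since the walk has positive drift for c>1, truncating at a moderate number of steps changes the answer only by an exponentially small amount, as the Lemma below makes precise. For c=2 the fixed point is \rho\approx 0.203, which the estimate should reproduce approximately.

    import numpy as np

    def poisson_walk_hits_zero(c, t_max, rng):
        # One trajectory of the random walk with i.i.d. Poisson(c) - 1 increments, started at 1.
        # Returns True if it hits 0 within t_max steps (a proxy for the event {T_0 < infinity}).
        s = 1
        for _ in range(t_max):
            s += rng.poisson(c) - 1
            if s == 0:
                return True
        return False

    rng = np.random.default_rng(1)
    c, trials, t_max = 2.0, 10000, 200
    estimate = sum(poisson_walk_hits_zero(c, t_max, rng) for _ in range(trials)) / trials
    print("Monte Carlo estimate of P[T_0 < infinity]:", estimate)   # should be close to 0.203 for c = 2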

We now proceed to make this intuition precise. First, we show that the probability \mathbb{P}[T\le\beta\log n] can indeed be replaced by \mathbb{P}[T<\infty], as one might expect.

Lemma. \mathbb{P}[T \le \beta \log n] = \mathbb{P}[T < \infty] - o(1) as n\to\infty.

Proof. We need to show that

    \[\mathbb{P} [\beta \log n < T < \infty] \xrightarrow{n \to \infty} 0.\]

Note that as S_T=0 when T<\infty,

    \[\mathbb{P}[k \le T < \infty] = \sum_{t=k}^{\infty}  	\mathbb{P}[T=t] \le \sum_{t=k}^{\infty} \mathbb{P}[S_t = 0].\]

We can evidently write

    \[\mathbb{P}[S_t = 0] =  	\mathbb{E}[\mathbf{1}_{S_t=0} e^{\gamma S_t} ]  	\le  	\mathbb{E} [e^{\gamma S_t}] =  	e^{\gamma} \mathbb{E}[e^{\gamma X_1}]^t\]

where S_t - S_{t-1} =: X_t \sim \mathrm{Binomial}(n,\frac{c}{n})-1 and

    \[\mathbb{E} [e^{\gamma X_1}] = e^{-\gamma}  	(1+\tfrac{c}{n}(e^{\gamma}-1))^n  	\le e^{c(e^{\gamma}-1)-\gamma}.\]

Choosing \gamma = -\log c, we obtain \mathbb{E} [e^{\gamma X_1}] \le e^{1-c + \log c} < 1 for c \ne 1, since \log c < c-1 whenever c\ne 1. Therefore,

    \[\mathbb{P} [\beta\log n < T < \infty] \le  	\frac{1}{c} \sum_{t=\beta\log n}^{\infty}  	e^{(1-c + \log c)t} 	\xrightarrow{n\to\infty}0.\]

This completes the proof. \qquad\square

By the above Lemma, and a trivial upper bound, we obtain

    \[\mathbb{P} [T< \infty] - o(1) \le  	\mathbb{P}[|C_v| \le \beta \log n] \le \mathbb{P}[\bar T < \infty].\]

To complete the computation of the mean size of the giant component, it therefore remains to show that \mathbb{P} [T<\infty] and \mathbb{P}[\bar T<\infty] converge to \rho. In fact, we can compute these quantities exactly.

Lemma. Let c>1, \frac{c}{n} < 1. Then

    \[\mathbb{P} [T<\infty] = \rho_n\]

where \rho_n is the smallest positive solution of \rho_n = (1+\frac{c}{n}(\rho_n-1))^n.

Proof. Recall the martingale M_t used in the last lecture:

    \[M_t = e^{\gamma S_t - \phi(\gamma)t}, 	\qquad 	\phi(\gamma)=\log \mathbb{E}[e^{\gamma X_1}] = 	\log\big[e^{-\gamma}(1+\tfrac{c}{n}(e^{\gamma}-1))^n\big].\]

Suppose that \gamma < 0 and \phi(\gamma) > 0. Then

    \[\mathbb{E} [e^{-\phi(\gamma)T}] =  	\mathbb{E} \Big[\lim_{k \to \infty} M_{k \wedge T}\Big]  	= \lim_{k \to \infty} \mathbb{E} [M_{k \wedge T}] = M_0 = e^{\gamma}.\]

The first equality holds since if T < \infty then S_T = 0 and M_{k \wedge T} \to M_T = e^{-\phi(\gamma)T}, while if T = \infty then S_{k} \ge 0 and M_k \to 0 = e^{-\phi(\gamma)T}. The second equality holds by dominated convergence since 0 \le M_{k \wedge T} \le 1: indeed, S_{k\wedge T}\ge 0 (the walk starts at 1 and has increments \ge -1, so it cannot jump below zero without first hitting zero), and as \gamma<0 and \phi(\gamma)>0 this gives M_{k\wedge T}\le 1. The third equality is by the optional stopping theorem.

Now suppose we can find \gamma_n < 0 such that \phi(\gamma) \downarrow 0 as \gamma \uparrow \gamma_n. Then we have

    \[\rho_n := 	e^{\gamma_n} = \lim_{\gamma \uparrow \gamma_n}  	\mathbb{E}[e^{-\phi(\gamma)T}] = 	\mathbb{E} \Big[\lim_{\gamma \uparrow \gamma_n} e^{-\phi(\gamma)T}\Big]  	= \mathbb{P}[T < \infty]\]

by dominated convergence. Thus, evidently, it suffices to find \gamma_n with the requisite properties. Now note that as \phi(\gamma_n)=0, \gamma_n<0, and \phi(\gamma)>0 for \gamma<\gamma_n, we evidently must have

    \[\rho_n = (1+\tfrac{c}{n}(\rho_n-1))^n,\qquad 	\rho_n < 1,\qquad 	\rho < (1+\tfrac{c}{n}(\rho-1))^n \mbox{ for }\rho<\rho_n.\]

We can find such \rho_n by inspecting the following illustration:

[Figure: the diagonal \rho\mapsto\rho and the convex curve \rho\mapsto(1+\frac{c}{n}(\rho-1))^n on [0,1]; for c>1 these intersect at the smallest root \rho_n<1 and again at \rho=1.]

Evidently the requisite assumptions are satisfied when \rho_n is the smallest root of the equation \rho_n=(1+\tfrac{c}{n}(\rho_n-1))^n (but not for the larger root at 1!). \quad\square

Remark. Note that the supercritical case c>1 is essential here. If c\le 1 then the equation for \rho_n has no solutions <1, and the argument in the proof does not work. In fact, when c\le 1, we have \mathbb{P}[T<\infty]=1.
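
It is also easy to check numerically that the finite-n root \rho_n converges rapidly to the limiting constant \rho from the Theorem as n\to\infty, as will be used below. Here is a small sketch (assuming Python with numpy; the helper name is our own). The fixed-point iteration started at zero converges monotonically to the smallest root, because both maps are increasing and convex with a positive value at zero; this is the same iteration as in the sketch following the Theorem above.

    import numpy as np

    def smallest_fixed_point(f, tol=1e-12):
        # Iterate r -> f(r) starting from r = 0; for an increasing convex f with f(0) > 0
        # and a fixed point below 1, the iterates increase monotonically to the smallest fixed point.
        r = 0.0
        while True:
            r_new = f(r)
            if abs(r_new - r) < tol:
                return r_new
            r = r_new

    c = 2.0
    rho = smallest_fixed_point(lambda r: np.exp(c * (r - 1.0)))                      # rho = e^{c(rho-1)}
    for n in [10**2, 10**3, 10**4, 10**5]:
        rho_n = smallest_fixed_point(lambda r: (1.0 + (c / n) * (r - 1.0)) ** n)     # rho_n = (1+c/n(rho_n-1))^n
        print(f"n = {n:6d}   rho_n = {rho_n:.6f}   (rho = {rho:.6f})")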

By an immediate adaptation of the proof of the previous lemma, we obtain

    \[\mathbb{P}[\bar T < \infty] = \bar \rho_n\]

where \bar\rho_n is the smallest positive solution of \bar \rho_n = (1+\frac{c}{n}(\bar \rho_n-1))^{n-\beta \log n}. Letting n\to\infty, we see that

    \[\lim_{n \to \infty} \mathbb{P}[|C_v|\le\beta\log n] =  	\lim_{n \to \infty} \mathbb{P}[T<\infty] =  	\lim_{n \to \infty} \mathbb{P}[\bar T<\infty] = \rho,\]

where \rho is the smallest positive solution of the equation \rho=e^{c(\rho-1)} (which is precisely the probability that the Poisson random walk hits zero, by a proof identical to that of the lemma above). We have therefore proved

    \[\mathbb{E}\bigg[\frac{|K|}{n}\bigg] \xrightarrow{n\to\infty} 1-\rho.\]

Variance of the giant component size

To complete the proof of Part 2 of the giant component theorem, it remains to show that

    \[\mathrm{Var}\bigg[\frac{|K|}{n}\bigg] = 	\mathrm{Var}\bigg[1-\frac{|K|}{n}\bigg]  	\xrightarrow{n\to\infty} 0.\]

To this end, let us consider

    \[\mathbb{E}\bigg[\bigg(1-\frac{|K|}{n}\bigg)^2\bigg] = 	\mathbb{E}\bigg[\bigg(\frac{1}{n} \sum_{v \in [n]}  		\mathbf{1}_{|C_v| \le \beta \log n}\bigg)^2\bigg] =  	\frac{1}{n^2} \sum_{v,w \in [n]}  	\mathbb{P}[|C_v| \le \beta \log n, |C_w| \le \beta \log n].\]

To estimate the terms in this sum, we condition on one of the components:

    \[\begin{array}{lcl} 	\mathbb{P}[|C_v| \le \beta \log n, |C_w| \le \beta \log n] 	&=& 	\mathbb{E}[ 	\,\mathbb{P}[|C_v| \le \beta \log n|C_w]\, 	\mathbf{1}_{|C_w| \le \beta \log n}\,] \\ 	&=& 	\sum_{I \subseteq [n], |I| \le \beta\log(n)}  	\mathbb{P}[C_w = I]\, \mathbb{P}[|C_v| \le \beta\log n|C_w = I]. 	\end{array}\]

To proceed, note that the event \{C_w=I\} can be written as

    \[\{C_w=I\} = \{(\eta_{ij})_{i,j\in I}\mbox{ defines a connected  	subgraph and }\eta_{ij}=0\mbox{ when }i\in I,~j\not\in I\}.\]

In particular, the event \{C_w=I\} is independent of the edges \eta_{ij} for i,j\not\in I. Therefore, for v\not\in I, the conditional law of C_v given C_w=I coincides with the (unconditional) law of C_v^{[n]\backslash I}, the connected component containing v in the induced subgraph on the vertices [n]\backslash I:

    \[\mathbb{P}[|C_v| \le \beta\log n|C_w = I] = 	\mathbb{P}[|C_v^{[n]\backslash I}|\le \beta\log n]\qquad 	\mbox{for }v\not\in I.\]

As this quantity only depends on |I| by the symmetry of the Erdös-Rényi model, we can evidently write

    \[\mathbb{P}[|C_v| \le \beta\log n|C_w = I] = 	\mathbb{P}[|C_1^{[n-|I|]}|\le \beta\log n] \le 	\mathbb{P}[|C_1^{[n-\beta\log n]}|\le \beta\log n]\]

for v\not\in I, |I|\le\beta\log n. Splitting the sum over v into the at most \beta\log n vertices v\in I (for which we simply bound the conditional probability by one) and the remaining vertices v\not\in I, we obtain

    \[\sum_{v,w\in[n]} 	\mathbb{P}[|C_v| \le \beta \log n, |C_w| \le \beta \log n] 	\le 	\mathbb{E}[n-|K|]\,\{ 	\beta\log n + n\, \mathbb{P}[|C_1^{[n-\beta\log n]}|\le \beta\log n]\}.\]

Now note that, by its definition, C_1^{[k]} is distributed precisely as the component containing vertex 1 in the G(k,\frac{c}{n}) random graph model. We can therefore show, repeating exactly the proof of the mean size of the giant component above, that

    \[\mathbb{P}[|C_1^{[n-\beta\log n]}|\le \beta\log n] = 	\rho + o(1) =  	\mathbb{E}\bigg[1-\frac{|K|}{n}\bigg] + o(1).\]

We have therefore shown that

    \[\mathbb{E}\bigg[\bigg(1-\frac{|K|}{n}\bigg)^2\bigg] = 	\frac{1}{n^2}\sum_{v,w\in[n]}  	\mathbb{P}[|C_v| \le \beta \log n, |C_w| \le \beta \log n] 	\le 	\mathbb{E}\bigg[1-\frac{|K|}{n}\bigg]^2+o(1),\]

which evidently implies

    \[\mathrm{Var}\bigg[1-\frac{|K|}{n}\bigg] \le o(1).\]

This is what we set out to prove.

Remark. It should be noted that the proof of Part 2 did not depend on the value of \beta, or even on the \beta\log n rate, in the definition of the set K: any sequence that grows sublinearly to infinity would have given the same result. This suggests that all but a vanishing fraction of vertices are contained in connected components of order \sim 1 or \sim n. We find out only in the next lecture why the rate \beta\log n (for \beta sufficiently large!) is important: only sufficiently large connected components are guaranteed to intersect, while there might (and do) exist components of order \sim\log n that are disjoint from the giant component. If we do not exclude the latter, we will not be able to prove Part 1.

Many thanks to Weichen Wang for scribing this lecture!
