Lecture 6. Giant component (3)

Let us begin with a brief recap from the previous lecture. We consider the Erdős-Rényi random graph G(n,\frac{c}{n}) in the supercritical case c>1. Recall that C_v denotes the connected component of the graph that contains the vertex v. Our goal is to prove the existence of the giant component with size \max_v|C_v|\sim (1-\rho)n, while the remaining components have size \lesssim\log n.

Fix \beta>0 sufficiently large (to be chosen in the proof), and define the set

    \[K=\{v :  |C_v| > \beta \log n\}\]

of vertices contained in “large” components. The proof consists of two parts:

  • Part 1: \mathbb{P}[C_v = C_{v'}~\forall v,v' \in K] \to 1.
  • Part 2: \frac{|K|}{n} \to 1-\rho in probability.

Part 1 states that all the sufficiently large components must intersect, forming the giant component. Part 2 counts the number of vertices in the giant component. Part 2 was proved in the previous lecture. The goal of this lecture is to prove Part 1, which completes the proof of the giant component theorem.

Overview

As in the previous lectures, the central idea in the study of the giant component is the exploration process (U_t,A_t,R_t), where

    \[|C_v| = \tau := \inf\{ t  :  |A_t|=0\}.\]

We have seen that |A_t| \approx S_t, where S_t is a random walk with increments

    \[X_t = S_t -S_{t-1} \sim \text{Binomial}(n,\tfrac{c}{n})-1, 	\qquad |A_0| = S_0 =1.\]

When c>1, we have \mathbb{E}[X_t] = c-1 > 0. Thus |A_t| is approximately a random walk with positive drift. The intuitive idea behind the proof of Part 1 is as follows. Initially, the random walk can hit 0 rapidly, in which case the component is small. However, if the random walk drifts away from zero, then with high probability it will never hit zero, in which case the component must keep growing until the random walk approximation is no longer accurate. Thus there do not exist any components of intermediate size: each component is either very small (|C_v|\le \beta\log n) or very large (we will show |C_v|\ge n^{2/3}, but the precise exponent is not important).
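To make the random walk picture concrete, here is a minimal Python sketch of the exploration process. It assumes (as recalled from the earlier lectures, not restated in this section) that at each step the vertex i_t=\min A_t is removed and its neighbors among the unexplored vertices become active. The function name explore_component and its parameters are illustrative choices, and edges are revealed lazily rather than by sampling the whole graph first.

```python
import random


def explore_component(n, c, v=0, seed=None):
    """Run the exploration process from vertex v in G(n, c/n).

    Edges are revealed lazily: when i_t is explored, each unexplored vertex
    is joined to it independently with probability c/n.

    Returns the list [|A_0|, |A_1|, ..., |A_tau|], so that
    tau = len(result) - 1 = |C_v|.
    """
    rng = random.Random(seed)
    p = c / n
    unexplored = set(range(n)) - {v}    # U_t
    active = {v}                        # A_t
    sizes = [len(active)]               # |A_0| = 1
    while active:
        i_t = min(active)               # i_t = min A_t (assumed convention)
        # Reveal eta_{w, i_t} for w in U_t: independent Bernoulli(c/n).
        new = {w for w in unexplored if rng.random() < p}
        unexplored -= new
        active |= new
        active.remove(i_t)              # i_t moves to the removed set R_t
        sizes.append(len(active))
    return sizes


if __name__ == "__main__":
    # Component sizes |C_0| over a few independent runs.
    taus = [len(explore_component(2000, 2.0, seed=s)) - 1 for s in range(20)]
    print(sorted(taus))
```

Running this over many seeds should show the dichotomy described above: the trajectory is either very short or has length comparable to n, with essentially nothing in between.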

We now want to argue that any pair of large components must necessarily intersect. Consider two disjoint sets I and J of vertices of size |I|,|J|\ge n^{2/3}. As each edge is present in the graph with probability c/n, the probability that there is no edge between I and J is

    \[\mathbb{P}[\eta_{ij}=0\mbox{ for all }i\in I,~j\in J]= 	\bigg(1-\frac{c}{n}\bigg)^{|I|\,|J|} \le 	\bigg(1-\frac{c}{n}\bigg)^{n^{4/3}} \le 	e^{-cn^{1/3}}.\]

We therefore expect that any pair of large components must intersect with high probability. The problem with this argument is that we assumed that the sets I and J are nonrandom, while the random sets C_v themselves depend on the edge structure of the random graph (so the events \{C_v=I,C_{v'}=J\} and \{\mbox{no edges between }I,J\} are highly correlated). To actually implement this idea, we therefore need a slightly more sophisticated approach.
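For fixed (nonrandom) sets the estimate above is elementary, and it can be evaluated numerically. The following sketch compares the exact no-edge probability with the bound e^{-cn^{1/3}}; the function names are hypothetical and the computation is done in log space to avoid underflow in the intermediate power.

```python
import math


def no_edge_probability(n, c, size_i, size_j):
    """Exact P[no edge between fixed disjoint sets I, J] = (1 - c/n)^(|I||J|).

    Assumes 0 < c < n so that log1p(-c/n) is defined.
    """
    return math.exp(size_i * size_j * math.log1p(-c / n))


def no_edge_bound(n, c):
    """The bound exp(-c n^{1/3}), valid when |I|, |J| >= n^{2/3}."""
    return math.exp(-c * n ** (1.0 / 3.0))


if __name__ == "__main__":
    c = 2.0
    for n in (10**3, 10**4, 10**5, 10**6):
        k = math.ceil(n ** (2.0 / 3.0))   # smallest admissible set size
        print(n, no_edge_probability(n, c, k, k), no_edge_bound(n, c))
```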

To make the proof work, we revisit more carefully our earlier random walk argument. The process |A_t|\approx S_t has positive drift as \mathbb{E}[S_t] = S_0 + (c-1)t. Thus the process |A_t|-(c-1)t/2 is still approximately a random walk with positive drift! Applying the above intuition, either |A_t| dies rapidly (the component is small), or |A_t| grows linearly in t as is illustrated in the following figure:

[Figure not reproduced here: an illustration of the two possible behaviors of |A_t| described above.]

This means that the exploration process for a component of size >\beta\log n will not only grow large (|A_{n^{2/3}}|>0) with high probability, but that the exploration process will also possess a large number of active vertices (|A_{n^{2/3}}|\gtrsim n^{2/3}). To prove that all large components intersect, we will run different exploration processes simultaneously starting from different vertices. We will show that if two of these processes reach a large number of active vertices then there must be an edge between them with high probability, and thus the corresponding components must coincide. This resolves the dependence problem in our naive argument, as the edges between the sets of active vertices have not yet been explored and are therefore independent of the history of the exploration process.

The component size dichotomy

We now begin the proof in earnest. We will first show the dichotomy between large and small components: either the component size is \le\beta\log n, or the number of active vertices |A_t| grows linearly up to time n^{2/3}. To be precise, we consider the following event:

    \[B := \Big\{\text{either} ~  	|C_v| \le \beta \log n , 	~ \text{or} ~  	|A_t| \ge \Big(\frac{c-1}{2}\Big) t  	~ \text{for all} ~ \beta \log n \le t  	\le n^{2/3}\Big\}.\]

Our goal is to show that \mathbb{P}[B] is close to one.

Define the stopping time

    \[\sigma := \inf\Big\{t \ge \beta \log n  : 	|A_t| < \Big(\frac{c-1}{2}\Big)t \Big\}.\]

We can write

    \[B = \{ \tau \le \beta \log n ~ \text{or} ~ \sigma > n^{2/3} \}.\]

Now suppose \tau >\beta \log n and \sigma =t. Then \tau \ge t, as the exploration process is alive at time \beta \log n and stays alive until time t (since |A_r|\ge(\frac{c-1}{2})r>0 for \beta\log n\le r<t). We can therefore write

    \[\mathbb{P}[B^c] = \sum_{s=\beta \log n}^{n^{2/3}}  	\mathbb{P}[\tau>\beta \log n ~ \text{and} ~ \sigma = s] \le 	\sum_{s=\beta \log n}^{n^{2/3}}  	\mathbb{P}\Big[|A_s|  	< \Big(\frac{c-1}{2}\Big)s 	~ \text{and} ~ 	s\le\tau\Big].\]

To bound the probabilities inside the sum, we compare |A_s| to a suitable random walk.

The random walk argument

To bound the probability that |A_s| < (c-1)s/2, we must introduce a comparison random walk that lies beneath |A_t|. We use the same construction as was used in the previous lecture. Let

    \begin{equation*} 	\bar S_{t+1}=\left\{ 	\begin{array}{ll} 	\bar S_t -1 +  	\sum_{w \in U_t} \eta_{wi_t} +  	\sum_{w=1}^{n-(\frac{c+1}{2})n^{2/3} - |U_t|}  	\tilde \eta _w^t &\quad\text{if }  	|U_t|<n-(\frac{c+1}{2}) n^{2/3},\\ 	\bar S_t -1 +  	\sum_{w \in \bar U_t} \eta_{wi_t} & \quad   	\text{if } 	|U_t|\ge n-(\frac{c+1}{2}) n^{2/3} . 	\end{array}  	\right. \end{equation*}

where \bar S_0=1, \tilde \eta_w^t are i.i.d. \text{Bernoulli}(\frac{c}{n}) random variables independent of \eta_{ij}, i_t=\min A_t (the same i_t used in the exploration process), and \bar U_t is the set of the first n-(\frac{c+1}{2})n^{2/3} elements of U_t (if t \ge \tau, then A_t=\varnothing and thus i_t is undefined; then we simply add n-(\frac{c+1}{2})n^{2/3} variables \tilde\eta_w^t).

As in the previous lecture, we have:

  • \bar S_t is a random walk with \text{Binomial}(n-(\frac{c+1}{2})n^{2/3},\frac{c}{n})-1 increments.
  • \bar S_t \le |A_t| whenever t \le \tau and |U_t| \ge n-(\frac{c+1}{2})n^{2/3}.

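Now suppose that s\le\tau, s\le n^{2/3} and |A_s|<(\frac{c-1}{2})s. As exactly one vertex is removed in each step of the exploration process, we have |R_s|=s, and thus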

    \[|U_s| = n - |A_s| - |R_s| \ge 	n - (\tfrac{c+1}{2})s \ge n - (\tfrac{c+1}{2})n^{2/3}.\]

We therefore obtain for s\le n^{2/3}

    \[\mathbb{P}\Big[|A_s|  	< \Big(\frac{c-1}{2}\Big)s 	~ \text{and} ~ 	s\le\tau\Big] \le 	\mathbb{P}\Big[\bar S_s < \Big(\frac{c-1}{2}\Big)s\Big].\]

Thus computing \mathbb{P}[B^c] reduces to computing the tail probability of a random walk (or, in less fancy terms, a sum of i.i.d. random variables). That is something we know how to do.

Lemma (Chernoff bound). Let X\sim \text{Binomial}(n,p). Then

    \[\mathbb{P}[X\le np-t] \le e^{-t^2/2np}.\]

Proof. Let \gamma>0. Then

    \begin{eqnarray*} 	\mathbb{P}[X\le np-t] &=& 	\mathbb{P}[e^{-\gamma X} \ge e^{-\gamma np + \gamma t}] \\ 	&\le& e^{\gamma np - \gamma t} \, \mathbb{E}[e^{-\gamma X}]\\ 	&=& e^{\gamma np - \gamma t} (1- (1-e^{-\gamma})p)^n \\ 	&\le& e^{\{\gamma-(1-e^{-\gamma})\}np - \gamma t}\\ 	&\le& e^{\gamma^2 np/2 - \gamma t}. \end{eqnarray*}

The result follows by optimizing over \gamma>0: the choice \gamma=t/np gives the exponent \gamma^2np/2-\gamma t=-t^2/2np. \quad\square
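As a sanity check of the lemma, the exact lower tail of a binomial distribution can be compared with the Chernoff bound for small parameters. This is only an illustrative sketch: the function names are hypothetical, and the term-by-term sum is suitable only for moderate n (here n=1000), since it uses exact binomial coefficients.

```python
import math


def binomial_lower_tail(n, p, k):
    """Exact P[X <= k] for X ~ Binomial(n, p); equals 0 when k < 0."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j)
        for j in range(0, k + 1)
    )


def chernoff_lower_bound(n, p, t):
    """The bound exp(-t^2 / (2 n p)) on P[X <= np - t] from the lemma."""
    return math.exp(-t * t / (2.0 * n * p))


if __name__ == "__main__":
    n, p = 1000, 0.01                    # np = 10
    for t in range(1, 11):
        k = math.floor(n * p - t)        # X <= np - t  iff  X <= floor(np - t)
        print(t, binomial_lower_tail(n, p, k), chernoff_lower_bound(n, p, t))
```

The exact tail should lie below the bound in every row; the gap indicates how much is lost in the inequalities of the proof.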

Note that \bar S_s \sim 1 - s + \text{Binomial}(\{n-(\frac{c+1}{2})n^{2/3}\}s,\frac{c}{n}). We therefore have by the Chernoff bound

    \[\mathbb{P}\Big[\bar S_s < \Big(\frac{c-1}{2}\Big)s\Big] \le 	\mathbb{P}\Big[ 	\text{Binomial}(\{n-(\tfrac{c+1}{2})n^{2/3}\}s,\tfrac{c}{n}) 	\le \Big(\frac{c+1}{2}\Big)s\Big] \le 	e^{-(c-1-o(1))^2 s/8c}\]

for all s (here o(1) depends only on n and c). In particular, we have

    \[\mathbb{P}\Big[\bar S_s < \Big(\frac{c-1}{2}\Big)s\Big] \le 	n^{-\beta(c-1)^2/9c} 	\quad\text{for all }s\ge\beta\log n\]

provided n is sufficiently large. Thus we can estimate

    \[\mathbb{P}[B^c] \le 	\sum_{s=\beta \log n}^{n^{2/3}}  	\mathbb{P}\Big[\bar S_s < \Big(\frac{c-1}{2}\Big)s\Big]  	\le 	n^{2/3-\beta(c-1)^2/9c},\]

which goes to zero as n\to\infty provided that \beta is chosen sufficiently large. In particular, the component size dichotomy follows: choosing any \beta>15c/(c-1)^2, we obtain

    \[\mathbb{P}[|C_v|\le\beta\log n 	\text{ or }|C_v|\ge n^{2/3}\text{ for all }v] \ge 	1-n\mathbb{P}[B^c] \ge  	1-n^{5/3-\beta(c-1)^2/9c}\xrightarrow{n\to\infty}1.\]
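For readers who want to trace the exponents in the preceding displays, here is one way to carry out the computation. The symbols N and \varepsilon_n are introduced only for this sketch and do not appear in the original argument; integrality of N is ignored.

```latex
% Notation for this sketch only:
%   \varepsilon_n := \tfrac{c+1}{2}\, n^{-1/3}, \qquad
%   N := \{n - \tfrac{c+1}{2} n^{2/3}\}\, s, \qquad p := \tfrac{c}{n}.
\begin{align*}
Np &= c\,s\,(1-\varepsilon_n), \\
t  &:= Np - \tfrac{c+1}{2}\,s
    = \tfrac{s}{2}\,\big(c-1-2c\,\varepsilon_n\big)
    \qquad (\text{positive once } 2c\,\varepsilon_n < c-1), \\
\frac{t^2}{2Np}
   &= \frac{(c-1-2c\,\varepsilon_n)^2\, s}{8c\,(1-\varepsilon_n)}
    \;\ge\; \frac{(c-1-2c\,\varepsilon_n)^2\, s}{8c}.
\end{align*}
% Since 2c\,\varepsilon_n \to 0, for n sufficiently large
%   (c-1-2c\,\varepsilon_n)^2 / 8c \ge (c-1)^2 / 9c,
% so for s \ge \beta\log n the Chernoff bound gives
%   e^{-(c-1)^2 s/9c} \le n^{-\beta (c-1)^2/9c}.
% The sum over \beta\log n \le s \le n^{2/3} has at most n^{2/3} terms,
% and the union bound over the n starting vertices v contributes a factor n.
```

Here the o(1) term in the Chernoff estimate is 2c\varepsilon_n=c(c+1)n^{-1/3}, which indeed depends only on n and c.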

Remark: Unlike in the proof of Part 2 in the previous lecture, here we do need to choose \beta sufficiently large for the proof to work. If \beta is too small, then the random walk \bar S_t cannot move sufficiently far away from zero to ensure that it will never return. In particular, even in the supercritical case, the second largest component has size of order \log n.
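The remark can be illustrated by simulation. The sketch below samples the component sizes of G(n,\frac{c}{n}) by exploring one component after another with lazily revealed edges, and compares the largest component with (1-\rho)n. It assumes, as in the earlier lectures (the definition is not restated in this section), that \rho is the smallest positive solution of \rho=e^{c(\rho-1)}; the function names are hypothetical.

```python
import math
import random


def extinction_probability(c, iterations=200):
    """Smallest fixed point of rho = exp(c (rho - 1)), by iteration from 0.

    Assumption: this is the rho of the earlier lectures (extinction
    probability of a Poisson(c) branching process). Intended for c > 1.
    """
    rho = 0.0
    for _ in range(iterations):
        rho = math.exp(c * (rho - 1.0))
    return rho


def component_sizes(n, c, seed=None):
    """Sizes of all connected components of G(n, c/n), largest first."""
    rng = random.Random(seed)
    p = c / n
    unexplored = set(range(n))
    sizes = []
    while unexplored:
        active = [unexplored.pop()]      # start a new component
        size = 0
        while active:
            active.pop()                 # explore one active vertex
            size += 1
            # Its neighbors among unexplored vertices: Bernoulli(c/n) each.
            new = [w for w in unexplored if rng.random() < p]
            unexplored.difference_update(new)
            active.extend(new)
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    n, c = 2000, 2.0
    sizes = component_sizes(n, c, seed=0)
    print("largest:", sizes[0], "predicted:", (1 - extinction_probability(c)) * n)
    print("second largest:", sizes[1], "log n:", math.log(n))
```

One would expect the largest component to be close to (1-\rho)n and the second largest to be a small multiple of \log n, in line with the remark; the simulation is a plausibility check, not a proof.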

Large components must intersect

To complete the proof, it remains to show that all large components must intersect. To do this, we will run several exploration processes at once starting from different vertices. If the sets of active vertices of two of these processes grow large, then there must be an edge between them with high probability, and thus the corresponding components intersect. Let us make this argument precise.

In the following, we denote by (U_t^v,A_t^v,R_t^v) the exploration process started at A_0^v=\{v\}. For each such process, we define the corresponding event B_v that we have investigated above:

    \[B_v := \Big\{\text{either} ~  	|C_v| \le \beta \log n , 	~ \text{or} ~  	|A_t^v| \ge \Big(\frac{c-1}{2}\Big) t  	~ \text{for all} ~ \beta \log n \le t  	\le n^{2/3}\Big\}.\]

We have shown above that, provided \beta>15c/(c-1)^2, we have

    \[\mathbb{P}\bigg[\bigcap_v B_v\bigg] 	\ge 1 - \sum_v \mathbb{P}[B_v^c] \ge 	1-o(1).\]

We can therefore estimate

    \begin{eqnarray*} 	&& 	\mathbb{P}[\exists\, v,v' \in K \text{ such that } C_v \neq C_{v'}] \\ 	&& \mbox{}= 	\mathbb{P}\big[\exists\, v,v' \in K \text{ such that } C_v \neq C_{v'}, 	\text{ and } 	|A^v_{n^{2/3}}| \ge (\tfrac{c-1}{2})n^{2/3} 	~\forall\, v \in K\big] + o(1) \phantom{\sum}\\ 	&& \mbox{}\le 	\sum_{v,v'} 	\mathbb{P}\big[C_v \neq C_{v'} \text{ and } 	 |A^v_{n^{2/3}}| \wedge |A_{n^{2/3}}^{v'}| \ge  	(\tfrac{c-1}{2})n^{2/3}\big]+o(1). \end{eqnarray*}

Now note that by time t, the exploration process (U_t^v,A_t^v,R_t^v) has only explored edges \eta_{ij} where i\in R_t^v (or j\in R_t^v), and similarly for (U_t^{v'},A_t^{v'},R_t^{v'}). It follows that

    \[\text{The conditional law of } 	\{\eta_{ij} : i,j \not\in R_t^v\cup R_t^{v'}\} 	\text{ given } 	(A_t^v,R_t^v,A_t^{v'},R_t^{v'}) 	\text{ is i.i.d.\ Bernoulli}(\tfrac{c}{n}).\]

In particular, if I^v,J^v,I^{v'},J^{v'} are disjoint subsets of vertices, then

    \[\mathbb{P}[\text{no edge between } I^v, I^{v'}  	| A_t^v =I^v, R_t^v=J^v,A_t^{v'}=I^{v'},R_t^{v'}=J^{v'}]  	= \bigg(1-\frac cn \bigg)^{|I^v|\, |I^{v'}|}.\]

On the other hand, C_v \ne C_{v'} implies that R_t^v,A_t^v,R_t^{v'},A_t^{v'} must be disjoint at every time t. Thus if C_v \ne C_{v'}, there can be no edges between vertices in A_t^v and A_t^{v'} at any time t (if such an edge exists, then the vertices connected by this edge will eventually be explored by both exploration processes, and then the sets of removed vertices will no longer be disjoint). Therefore,

    \begin{eqnarray*} 	&&  	\mathbb{P}\big[C_v \neq C_{v'} \text{ and } 	 |A^v_{n^{2/3}}| \wedge |A_{n^{2/3}}^{v'}| \ge  	(\tfrac{c-1}{2})n^{2/3}\big] \\ 	&& \mbox{} \le 	\mathbb{P}\big[ 	\text{no edge between }A^v_{n^{2/3}},A^{v'}_{n^{2/3}},\\ 	&& \phantom{\mbox{} \le \mathbb{P}\big[} 	A^v_{n^{2/3}},R^v_{n^{2/3}}, 	A^{v'}_{n^{2/3}},R^{v'}_{n^{2/3}}\text{ are disjoint}, ~ 	|A^v_{n^{2/3}}| \wedge |A_{n^{2/3}}^{v'}| \ge  	(\tfrac{c-1}{2})n^{2/3}\big] \\ 	&& \mbox{}\le 	\bigg(1-\frac{c}{n} \bigg)^{(c-1)^2n^{4/3}/4} 	\le e^{-c(c-1)^2n^{1/3}/4}. \end{eqnarray*}

Thus we finally obtain

    \[\mathbb{P}[C_v = C_{v'}~\forall v,v' \in K] \ge 	1- n^2e^{-c(c-1)^2n^{1/3}/4} - o(1) 	\xrightarrow{n\to\infty}1,\]
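Both error terms in the proof are asymptotic, and it can be instructive to see for which n they actually become small. The following sketch evaluates the exponent 5/3-\beta(c-1)^2/9c from the dichotomy estimate and the logarithm of n^2e^{-c(c-1)^2n^{1/3}/4}; the function names are hypothetical and the values of c, \beta, n are arbitrary illustrations.

```python
import math


def dichotomy_exponent(beta, c):
    """Exponent a in the estimate n * P[B^c] <= n**a,
    where a = 5/3 - beta (c-1)^2 / (9c)."""
    return 5.0 / 3.0 - beta * (c - 1) ** 2 / (9.0 * c)


def log_intersection_bound(n, c):
    """Natural log of n^2 * exp(-c (c-1)^2 n^{1/3} / 4)."""
    return 2.0 * math.log(n) - c * (c - 1) ** 2 * n ** (1.0 / 3.0) / 4.0


if __name__ == "__main__":
    c = 2.0
    beta_threshold = 15.0 * c / (c - 1) ** 2   # the threshold 15c/(c-1)^2
    print("beta threshold:", beta_threshold)
    for beta in (10, 30, 40):
        print("beta =", beta, "exponent =", dichotomy_exponent(beta, c))
    for n in (10**4, 10**6, 10**8):
        print("n =", n, "log bound =", log_intersection_bound(n, c))
```

For c=2 the second bound is vacuous at n=10^4 (its logarithm is positive) and only becomes small for considerably larger n, which is a reminder that the constants in this proof are far from optimized.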

and the proof of the giant component theorem is complete.

Many thanks to Quentin Berthet for scribing this lecture!

27. April 2013 by Ramon van Handel
Categories: Random graphs
