## Lecture 4. Giant component (1)

Consider the Erdős-Rényi graph model $G(n, p)$. In previous lectures, we focused on the “high complexity regime”, i.e., as $n$ goes to infinity, $p$ is fixed. We discussed topics such as clique numbers and chromatic numbers. From now on, we shall consider the “low complexity regime”, where as $n$ goes to infinity, $p = c/n$ for a fixed constant $c > 0$. As before, let $A$ be the adjacency matrix of $G(n, c/n)$, i.e., $A_{ij} = \mathbf{1}_{\{i,j\} \in E}$. Then $(A_{ij})_{i < j}$ are i.i.d. Bernoulli random variables with success probability $c/n$.

**Theorem 1.** Let $C_v$ be the connected component of $G(n, c/n)$ that contains the vertex $v$, and let $C_{\max}$ denote the largest component.

- If $c < 1$, then $\max_v |C_v| = O(\log n)$ in probability.
- If $c > 1$, then $\frac{|C_{\max}|}{n} \to \zeta_c$ in probability, for some constant $\zeta_c > 0$.
- If $c = 1$, then $\frac{|C_{\max}|}{n^{2/3}}$ converges in distribution.

In the following lectures, we will aim to prove at least parts 1 and 2.

**The exploration process**

How to study $|C_v|$? We will explore $C_v$ by starting an “exploration process” at $v$ that moves around $C_v$ until all its sites have been visited. This walk will be constructed so that it hits each site exactly once. So, the time it takes to explore all of $C_v$ is exactly $|C_v|$. As a consequence, studying $|C_v|$ reduces to studying a hitting time of a certain random walk, and to the latter we can apply martingale theory.

At each time $t$, we maintain three sets of vertices:

- the *active* set $A_t$: vertices that have been reached but not yet explored;
- the *unexplored* set $U_t$: vertices that have not yet been reached;
- the *removed* set $R_t$: vertices that have been fully explored.

Below is an illustration of how these sets are updated on a simple example.

- At $t = 0$, initialize $A_0 = \{v\}$, $U_0 = [n] \setminus \{v\}$, and $R_0 = \emptyset$. Namely, only $v$ is active, all the vertices other than $v$ are unexplored, and no vertices have been removed.
- At $t = 1$, update $A_1 = \{w \in U_0 : \{v, w\} \in E\}$, $U_1 = U_0 \setminus A_1$, and $R_1 = \{v\}$. Namely, all neighbors of $v$ are moved from the unexplored set to the active set, and $v$ itself is removed.
- At $t = 2$, pick some $v_2 \in A_1$ and update $A_2 = \big(A_1 \cup \{w \in U_1 : \{v_2, w\} \in E\}\big) \setminus \{v_2\}$, $U_2 = U_1 \setminus A_2$, and $R_2 = \{v, v_2\}$. Namely, all unexplored neighbors of $v_2$ are moved into the active set, and $v_2$ itself is removed.
- At time $t$, we pick some vertex $v_t$ from the current active set $A_{t-1}$, activate all unexplored neighbors of $v_t$, and remove $v_t$ itself.

This is a sort of local search along the connected component $C_v$: much like playing a game of Minesweeper! At each $t$, the choice of $v_t$ can be made arbitrarily (e.g., selecting the vertex with the smallest index, or randomly selecting a vertex in $A_{t-1}$). The only requirement is that it is nonanticipating (depending only on the edges visited up to time $t$). For example, we cannot pick the vertex in $A_{t-1}$ which has the largest number of unexplored neighbors, as this choice relies on unexplored edges.

A formal description of the “exploration process”:

- Initialize $A_0 = \{v\}$, $U_0 = [n] \setminus \{v\}$, and $R_0 = \emptyset$.
- For $t \ge 1$, we set $A_t = \big(A_{t-1} \cup \{w \in U_{t-1} : \{v_t, w\} \in E\}\big) \setminus \{v_t\}$, $U_t = U_{t-1} \setminus A_t$, and $R_t = R_{t-1} \cup \{v_t\}$, where $v_t \in A_{t-1}$ is a nonanticipating but otherwise arbitrary choice.

This process stops when there are no more active vertices. It hits each vertex in $C_v$ once and only once: at each time $t$, we remove one vertex in $C_v$. So the stopping time

$$\tau = \inf\{t \ge 1 : |A_t| = 0\}$$

is exactly equal to the size of $C_v$:

$$\tau = |C_v|.$$

So, we only need to study the stopping time $\tau$.
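As a sanity check (not part of the original notes), the exploration process is easy to simulate. Since a pair $\{v_t, w\}$ with $w$ unexplored has never been looked at before, its edge indicator can be sampled on the fly; all names in this Python sketch are ours:

```python
import random

def explore_component(n, p, v, rng):
    """Exploration process on G(n, p) started at vertex v.

    Edges are sampled lazily: the pair {v_t, w} is examined only when
    v_t is being explored and w is still unexplored, so each pair is
    looked at most once -- this matches the nonanticipating requirement.
    Returns the stopping time tau, which equals |C_v|.
    """
    unexplored = set(range(n)) - {v}     # U_{t-1}
    active = {v}                         # A_{t-1}
    t = 0
    while active:
        vt = min(active)                 # any nonanticipating rule works
        newly_active = {w for w in unexplored if rng.random() < p}
        unexplored -= newly_active       # move new neighbors out of U ...
        active |= newly_active           # ... into the active set
        active.remove(vt)                # and remove v_t itself
        t += 1
    return t

rng = random.Random(42)
n, c = 2000, 0.5                         # subcritical: c < 1
sizes = [explore_component(n, c / n, 0, rng) for _ in range(300)]
print(max(sizes), n)                     # components are tiny compared to n
```

Choosing `min(active)` as the rule for $v_t$ is one valid nonanticipating choice; any rule that does not peek at unexplored edges would do.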

Recall that $\mathbf{1}_{\{i,j\} \in E}$ indicates whether there is an edge between vertices $i$ and $j$, and $\mathbb{P}[\{i,j\} \in E] = c/n$. By construction,

$$|A_t| = |A_{t-1}| + \sum_{w \in U_{t-1}} \mathbf{1}_{\{v_t, w\} \in E} - 1.$$

Now, let's do a thought experiment (wrong, but intuitive). Let's forget for the moment that some sites were previously visited, and assume that in each step all $n-1$ potential neighbors of $v_t$ are still unvisited (note that when $n$ is really large and $C_v$ is relatively small, this assumption makes sense). Then $|A_t| - |A_{t-1}| + 1$ is the sum of $n-1$ independent $\mathrm{Bernoulli}(c/n)$ variables, which has a $\mathrm{Binomial}(n-1, c/n)$ distribution. This binomial variable is independent of the past because it only depends on unexplored edges; in addition, its distribution does not depend on $t$. Therefore, $(|A_t|)$ would be a random walk with increment distribution $\mathrm{Binomial}(n-1, c/n) - 1 \approx \mathrm{Poisson}(c) - 1$. Then, studying $\tau$ boils down to studying the first time a Poisson random walk hits zero! Of course, we cannot really ignore previously visited sites, but this rough intuition nonetheless captures the right idea as $n \to \infty$ and will serve as a guiding principle for the proof.
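How good is the binomial-to-Poisson approximation? By Le Cam's inequality, the total-variation distance between $\mathrm{Binomial}(n-1, c/n)$ and $\mathrm{Poisson}(c)$ is at most $(n-1)(c/n)^2$, which vanishes as $n$ grows. A quick numerical check (illustrative, not from the notes; $n$ and $c$ chosen arbitrarily):

```python
from math import comb, exp, factorial

# Total-variation distance between Binomial(n-1, c/n) and Poisson(c).
# By Le Cam's inequality it is at most (n-1) * (c/n)^2 ~ c^2 / n.
n, c = 1000, 2.0
p = c / n
tv = 0.5 * sum(
    abs(comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)
        - exp(-c) * c**k / factorial(k))
    for k in range(60)        # probability mass beyond k = 60 is negligible
)
print(tv)                      # on the order of c^2 / n
```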

**A comparison random walk**

The reason that $(|A_t|)$ is not a random walk is that there are only $|U_{t-1}|$ edges (not $n-1$) to explore at time $t$. We can artificially create a random walk by adding “fictitious” points at each time as follows.

Let $I_t^k$ be i.i.d. $\mathrm{Bernoulli}(c/n)$ for $t \ge 1$, $1 \le k \le n-1$, which are independent of the graph. Define $S_0 = 1$ and

$$S_t = S_{t-1} + \sum_{w \in U_{t-1}} \mathbf{1}_{\{v_t, w\} \in E} + \sum_{k=1}^{n-1-|U_{t-1}|} I_t^k - 1.$$

(When $t > \tau$, then $A_{t-1} = \emptyset$ and thus $v_t$ is undefined; in this case, we simply add all $n-1$ variables $I_t^k$.)

Note that $\sum_{w \in U_{t-1}} \mathbf{1}_{\{v_t, w\} \in E}$ is the sum of the $|U_{t-1}|$ edges from $v_t$ to $U_{t-1}$. Since we have not explored $U_{t-1}$ yet, those edges are independent of all edges explored up to time $t-1$ (here we use that the choice of $v_t$ is nonanticipating). We therefore see that $(S_t)$ is indeed a random walk with increment distribution

$$S_t - S_{t-1} \sim \mathrm{Binomial}(n-1, c/n) - 1.$$

Moreover, since all $I_t^k$ are nonnegative,

$$S_t - S_{t-1} \ge \sum_{w \in U_{t-1}} \mathbf{1}_{\{v_t, w\} \in E} - 1 = |A_t| - |A_{t-1}|$$

as long as $t \le \tau$. It follows that $(|A_t|)$ is dominated by the genuine random walk $(S_t)$, that is,

$$S_t \ge |A_t| \quad \text{for all } t \le \tau.$$

We can now obtain bounds on $\tau = |C_v|$ by analyzing hitting times of the random walk $(S_t)$.
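The coupling between $|A_t|$ and $S_t$ can also be checked numerically: running the exploration and adding the fictitious $\mathrm{Bernoulli}(c/n)$ points on the same realization, the comparison walk stays above $|A_t|$ at every step. A sketch (our own illustration, with hypothetical names):

```python
import random

def coupled_walks(n, c, v, rng):
    """One realization of the exploration at v, together with the
    comparison walk S_t built from the same real edges plus fictitious
    Bernoulli(c/n) points, run until the exploration stops.

    Returns the two trajectories (|A_t|) and (S_t).
    """
    p = c / n
    unexplored = set(range(n)) - {v}
    active = {v}
    A, S = [1], [1]
    while active:
        vt = min(active)
        u_prev = len(unexplored)                  # |U_{t-1}|
        new = {w for w in unexplored if rng.random() < p}
        # fictitious points restore the increment to Binomial(n-1, c/n) - 1
        fict = sum(rng.random() < p for _ in range(n - 1 - u_prev))
        S.append(S[-1] + len(new) + fict - 1)
        unexplored -= new
        active |= new
        active.remove(vt)
        A.append(len(active))
    return A, S

rng = random.Random(7)
A, S = coupled_walks(1000, 0.8, 0, rng)
print(all(s >= a for s, a in zip(S, A)))   # S_t dominates |A_t|: True
```

Since the fictitious increments are nonnegative and both walks start at $1$, the printed check is `True` on every realization.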

**The subcritical case $c < 1$**

Define the first time the comparison walk hits zero as

$$\tau' = \inf\{t \ge 1 : S_t = 0\}.$$

Since $S_t \ge |A_t|$ for $t \le \tau$, it is obvious that

$$\tau \le \tau'.$$

Now we study $\tau'$. The intuition is that as $n \to \infty$, $(S_t)$ is a random walk with *negative* drift in the subcritical case $c < 1$: indeed, $\mathbb{E}[S_t - S_{t-1}] = \frac{(n-1)c}{n} - 1 < c - 1 < 0$. Thus $\tau' < \infty$ a.s., and in fact the hitting time $\tau'$ has nice tails!

**Lemma 2.** Let $c < 1$ and $a = c - 1 - \log c > 0$. Then for any positive integer $t$,

$$\mathbb{P}[\tau' \ge t] \le \frac{1}{c}\, e^{-at}.$$

We will prove this lemma below. Using the lemma, we immediately obtain:

**Corollary 3.** If $c < 1$, then for any $\delta > 0$,

$$\mathbb{P}\bigg[\max_{v \in [n]} |C_v| \ge \frac{(1+\delta)\log n}{a}\bigg] \to 0 \quad \text{as } n \to \infty.$$

**Proof.** Since $|C_v| = \tau \le \tau'$, applying Lemma 2 and the union bound gives

$$\mathbb{P}\bigg[\max_{v \in [n]} |C_v| \ge \frac{(1+\delta)\log n}{a}\bigg] \le \sum_{v=1}^{n} \mathbb{P}\bigg[\tau' \ge \frac{(1+\delta)\log n}{a}\bigg] \le n \cdot \frac{1}{c}\, e^{-(1+\delta)\log n} = \frac{n^{-\delta}}{c} \to 0. \qquad \square$$

This corollary proves part 1 of Theorem 1. In fact, it turns out that the constant $1/a$ is tight: by using the second moment method, one can prove a matching lower bound on $\max_v |C_v|$ (see, for example, the lecture notes by van der Hofstad), which implies that in fact $\frac{\max_v |C_v|}{\log n} \to \frac{1}{a}$ in probability. The proof is not much more difficult, but we prefer to move on to the supercritical case.
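The $\log n / a$ scale is visible even at modest sizes. An illustrative brute-force check (our own sketch; BFS over an explicitly sampled graph, so only feasible for small $n$):

```python
import random
from math import log

def largest_component(n, p, rng):
    """Size of the largest component of one sample of G(n, p), via BFS."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    seen = [False] * n
    best = 0
    for v in range(n):
        if seen[v]:
            continue
        seen[v] = True
        stack, size = [v], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        best = max(best, size)
    return best

rng = random.Random(3)
n, c = 1500, 0.5
a = c - 1 - log(c)                 # the rate from Lemma 2
best = largest_component(n, c / n, rng)
print(best, log(n) / a)            # |C_max| is on the order of log(n) / a
```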

**Remark.** It might seem somewhat surprising that the result we obtain is so sharp, considering that we have blindly replaced $|A_t|$ by the larger quantity $S_t$. However, in going from $|A_t|$ to $S_t$ we do not lose as much as one might think at first sight. When $n$ is large and $t$ is relatively small, the excess term $\sum_{k=1}^{n-1-|U_{t-1}|} I_t^k$ in the definition of $S_t$ is zero with high probability, as most vertices are unexplored and the Bernoulli success probability $c/n$ of the $I_t^k$ is very small. With a bit of work, one can show that $|A_t|$ and $S_t$ will actually stick together for times $t \ll \sqrt{n}$ with probability going to one as $n \to \infty$. Thus, in the subcritical case where the random walk only lives for $O(\log n)$ time steps, nothing is lost in going from $|A_t|$ to $S_t$, and our rough intuition that $|A_t|$ should behave like a random walk as $n \to \infty$ is vindicated.

To wrap up the subcritical case, it remains to prove the lemma.

**Proof of Lemma 2.** Let $\varphi$ denote the moment generating function of the increments of $(S_t)$ (defined below), and fix $\lambda > 0$ such that $\varphi(\lambda) < 1$. By the Markov inequality,

$$\mathbb{P}[\tau' \ge t] = \mathbb{P}\big[\varphi(\lambda)^{-\tau'} \ge \varphi(\lambda)^{-t}\big] \le \varphi(\lambda)^{t}\, \mathbb{E}\big[\varphi(\lambda)^{-\tau'}\big].$$

It remains to bound $\mathbb{E}[\varphi(\lambda)^{-\tau'}]$, which is a standard exercise in martingale theory.

Recall that

$$S_t = S_{t-1} + \xi_t - 1, \qquad S_0 = 1,$$

where $\xi_t$ are i.i.d. $\mathrm{Binomial}(n-1, c/n)$. Define the moment generating function $\varphi(\lambda) = \mathbb{E}[e^{\lambda(\xi_1 - 1)}]$, and let

$$M_t = \frac{e^{\lambda S_t}}{\varphi(\lambda)^t}.$$

Since $S_t = S_{t-1} + \xi_t - 1$ and $\xi_t$ is independent of $\mathcal{F}_{t-1} = \sigma(\xi_1, \dots, \xi_{t-1})$,

$$\mathbb{E}[M_t \mid \mathcal{F}_{t-1}] = \frac{e^{\lambda S_{t-1}}}{\varphi(\lambda)^t}\, \mathbb{E}\big[e^{\lambda(\xi_t - 1)}\big] = M_{t-1},$$

where we have used $\mathbb{E}[e^{\lambda(\xi_t - 1)}] = \varphi(\lambda)$. So $(M_t)$ is a martingale.

In the case $\lambda > 0$ and $\varphi(\lambda) < 1$,

$$\mathbb{E}\big[\varphi(\lambda)^{-\tau'}\big] = \mathbb{E}\Big[\liminf_{t \to \infty} M_{\tau' \wedge t}\Big] \le \liminf_{t \to \infty} \mathbb{E}[M_{\tau' \wedge t}] = \mathbb{E}[M_0] = e^{\lambda}.$$

The inequality is by Fatou's lemma and the second equality is by the optional stopping theorem. To see the first equality, note that if $\tau' < \infty$, then $S_{\tau'} = 0$ and $M_{\tau' \wedge t} \to \varphi(\lambda)^{-\tau'}$ as $t \to \infty$, while if $\tau' = \infty$, then $S_t \ge 1$ for all $t$ and $M_{\tau' \wedge t} = e^{\lambda S_t} \varphi(\lambda)^{-t} \to \infty$. Thus $\liminf_{t \to \infty} M_{\tau' \wedge t} = \varphi(\lambda)^{-\tau'}$ in both cases.

Next, we compute $\varphi(\lambda)$. Recall that $\xi_1 \sim \mathrm{Binomial}(n-1, c/n)$. It has the same distribution as the sum of $n-1$ i.i.d. $\mathrm{Bernoulli}(c/n)$ variables. For a $\mathrm{Bernoulli}(c/n)$ variable $B$, we have $\mathbb{E}[e^{\lambda B}] = 1 + \frac{c}{n}(e^{\lambda} - 1)$. Therefore,

$$\varphi(\lambda) = e^{-\lambda}\Big(1 + \frac{c}{n}\big(e^{\lambda} - 1\big)\Big)^{n-1} \le e^{-\lambda} e^{c(e^{\lambda} - 1)},$$

where the last inequality is because $1 + x \le e^x$ for any $x$. Now, by setting $\lambda = \log(1/c) > 0$, we obtain that $\varphi(\lambda) \le c\, e^{1-c} = e^{-a} < 1$. Thus we have shown $\mathbb{P}[\tau' \ge t] \le e^{\lambda}\, \varphi(\lambda)^t \le \frac{1}{c}\, e^{-at}$. $\square$

**The supercritical case $c > 1$**

The goal of the following lectures is to prove part 2 of Theorem 1. More precisely, we will prove:

**Theorem 4.** Let $c > 1$. Then

$$\frac{|C_{\max}|}{n} \to \zeta_c$$

in probability, where $\zeta_c$ is the smallest positive solution of the equation $\zeta = 1 - e^{-c\zeta}$. Moreover, there is a constant $K > 0$ such that all but one of the components have size $\le K \log n$, with probability tending to $1$ as $n \to \infty$.

This theorem says that with probability tending to $1$, there is a *unique* giant component whose size is $\approx \zeta_c n$, and all other components are small, with size $O(\log n)$.
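Although $\zeta_c$ has no closed form, it is easy to compute: the fixed-point iteration $\zeta \mapsto 1 - e^{-c\zeta}$ started at $1$ decreases monotonically to the smallest positive root. A small sketch (our own, with hypothetical names):

```python
from math import exp

def giant_fraction(c, tol=1e-12, max_iter=10_000):
    """Smallest positive root of zeta = 1 - exp(-c * zeta), for c > 1.

    The map g(z) = 1 - exp(-c z) is increasing, and starting from z = 1
    the iterates decrease monotonically to the root, so plain
    fixed-point iteration converges without any special care.
    """
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - exp(-c * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

for c in (1.1, 1.5, 2.0, 3.0):
    print(c, giant_fraction(c))    # limiting fraction of vertices in the giant
```

As $c \downarrow 1$ the fraction tends to $0$ (the phase transition), while for large $c$ it approaches $1$.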

Here we provide some vague intuition for this theorem. When $c > 1$, the random walk $(S_t)$ satisfies $\mathbb{E}[S_t - S_{t-1}] = \frac{(n-1)c}{n} - 1 > 0$ for large $n$, i.e., $(S_t)$ has *positive* drift. Then $\mathbb{P}[\tau' = \infty] > 0$! In fact, the further away it starts from $0$, the smaller the probability that it will ever hit $0$. Consider the two situations:

- $(S_t)$ dies quickly: this implies that the component $C_v$ is small.
- $(S_t)$ lives long: then it must live *very* long, as once it gets far away from $0$, the probability of returning to $0$ is very small. This implies that the component $C_v$ must be *very* large (if we pretend that $|C_v| = \tau'$).

Of course, $(|A_t|)$ is not $(S_t)$ ($|A_t|$ obviously eventually hits $0$). But the intuition explains that there cannot be components of *intermediate* size: given any vertex $v$, either $|C_v|$ is small ($\le K \log n$), or it must get very large ($\ge n^{2/3}$, say). In fact, we will find that all components of size $\ge n^{2/3}$ must grow all the way to size $\approx \zeta_c n$. However, any pair of large components must intersect with high probability, as there are many potential edges between them! Therefore, all vertices $v$ with $|C_v| \ge n^{2/3}$ should be in the *same* giant component. We then show that the number of such vertices is $\approx \zeta_c n$ with high probability.

*Many thanks to Tracy Ke for scribing this lecture!*