## Lecture 5. Giant component (2)

Consider the Erdős–Rényi graph model $G(n, \lambda/n)$, and denote as usual by $C_v$ the connected component of the graph that contains the vertex $v$. In the last lecture, we focused mostly on the subcritical case $\lambda < 1$, where we showed that $\max_v |C_v| = O(\log n)$ with probability tending to one. Today we will begin developing the supercritical case $\lambda > 1$, where $\max_v |C_v| = \gamma n + o(n)$ for a suitable constant $\gamma > 0$. In particular, our aim for this and the next lecture is to prove the following theorem.

Theorem. Let $\lambda > 1$. Then

$$\frac{\max_v |C_v|}{n} \xrightarrow{n\to\infty} 1 - \eta \quad \text{in probability},$$

where $\eta$ is the smallest positive solution of the equation $\eta = e^{-\lambda(1-\eta)}$. Moreover, there is a $\beta > 0$ such that all but one of the components have size at most $\beta \log n$ with probability tending to one as $n \to \infty$.
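Before starting the proof, it is instructive to see the theorem numerically. The following Python sketch (not part of the lecture notes; the choices $\lambda = 2$, $n = 2000$ and the helper names are our own) computes $\eta$ by fixed-point iteration and compares $1 - \eta$ with the largest component fraction of one sampled graph.

```python
import math
import random
from collections import deque

def eta_fixed_point(lam, iters=200):
    """Smallest positive solution of eta = exp(-lam*(1 - eta)), by iteration from 0."""
    x = 0.0
    for _ in range(iters):
        x = math.exp(-lam * (1.0 - x))
    return x

def largest_component_fraction(n, lam, rng):
    """Sample G(n, lam/n) and return |C_max| / n via breadth-first search."""
    p = lam / n
    adj = [[] for _ in range(n)]
    for v in range(n):                 # O(n^2) edge sampling; fine for modest n
        for w in range(v + 1, n):
            if rng.random() < p:
                adj[v].append(w)
                adj[w].append(v)
    seen = [False] * n
    best = 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        size, queue = 1, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    size += 1
                    queue.append(w)
        best = max(best, size)
    return best / n

lam = 2.0
predicted = 1.0 - eta_fixed_point(lam)                       # = 1 - eta
observed = largest_component_fraction(2000, lam, random.Random(0))
```

Since $x \mapsto e^{-\lambda(1-x)}$ is increasing and exceeds $x$ at $x = 0$, iterating from $0$ converges monotonically to the smallest fixed point.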

**Beginning of the proof.** Define the set

$$H := \{v \in \{1, \dots, n\} : |C_v| > k_n\}, \qquad k_n := \beta \log n.$$

The proof of the Theorem consists of two main parts:

**Part 1:** $P(C_v = C_w \text{ for all } v, w \in H) \xrightarrow{n\to\infty} 1$.
**Part 2:** $\frac{|H|}{n} \xrightarrow{n\to\infty} 1 - \eta$ in probability.

Part 1 states that all “large” components of the graph must intersect, forming one *giant component*. Some intuition for why this is the case was given at the end of the previous lecture. Part 2 computes the size of this giant component. In this lecture, we will concentrate on proving Part 2, and we will find out where the mysterious constant $\eta$ comes from. In the next lecture, we will prove Part 1, and we will develop a detailed understanding of why all large components must intersect.

Before we proceed, let us complete the proof of the Theorem assuming Parts 1 and 2 have been proved. First, note that with probability tending to one, the set $H$ is itself a connected component. Indeed, if $v \in H$ and $w \notin H$, then $v$ and $w$ must lie in disjoint connected components by the definition of $H$. On the other hand, with probability tending to one, all $v, w \in H$ must lie in the same connected component by Part 1. Therefore, with probability tending to one, the set $H$ forms a single connected component of the graph. By Part 2, the size of this component is $|H| = (1 - \eta + o(1))\,n$, while by the definition of $H$, all other components have size at most $\beta \log n$. This completes the proof.

The remainder of this lecture is devoted to proving Part 2 above. We will first prove that the claim holds on average, and then prove concentration around the average. More precisely, we will show:

- $\frac{E|H|}{n} \xrightarrow{n\to\infty} 1 - \eta$.
- $\frac{\mathrm{Var}(|H|)}{n^2} \xrightarrow{n\to\infty} 0$.

Together, these two claims evidently prove Part 2 (by Chebyshev's inequality, $|H|/n$ concentrates about its mean).

**Mean size of the giant component**

We begin by writing out the mean size of the giant component:

$$E|H| = \sum_{v=1}^n P(|C_v| > k_n) = n\,P(|C_1| > k_n),$$

where we note that $P(|C_v| > k_n)$ does not depend on the vertex $v$ by the symmetry of the Erdős–Rényi model. Therefore, to prove convergence of the mean size of the giant component, it suffices to prove that

$$P(|C_1| > k_n) \xrightarrow{n\to\infty} 1 - \eta.$$

This is what we will now set out to accomplish.

In the previous lecture we defined the exploration process $(S_t)_{t \ge 0}$ of the component $C_v$. We showed that

$$S_0 = 1, \qquad |C_v| = \tau_0 := \min\{t \ge 1 : S_t = 0\},$$

and that for $1 \le t \le \tau_0$

$$S_t = S_{t-1} - 1 + \sum_{w \in U_{t-1}} \eta_{w_t w},$$

where $U_{t-1}$ is the set of unexplored vertices at time $t-1$, so that $|U_{t-1}| = n - (t-1) - S_{t-1}$, and $w_t$ is an arbitrary nonanticipating choice of active vertex, say, the smallest (recall that $(\eta_{vw})$ denotes the adjacency matrix of the random graph). As the entries $(\eta_{vw})_{v<w}$ are i.i.d. and as edges emanating from the set of unexplored vertices have not yet appeared in previous steps, the process $(S_t)$ is “almost” a random walk: it fails to be a random walk as we only add a variable number $|U_{t-1}|$ of Bernoulli variables in each iteration, rather than a constant number. In the last lecture, we noted that we can estimate $S_t$ from above by a genuine random walk by adding some fictitious vertices. To be precise, we define

$$\bar S_t = \bar S_{t-1} - 1 + \sum_{w \in U_{t-1}} \eta_{w_t w} + \sum_{i=1}^{n - |U_{t-1}|} \xi^t_i,$$

where $(\xi^t_i)_{i \ge 1}$ are i.i.d. $\mathrm{Bernoulli}(\lambda/n)$ independent of the $(\eta_{vw})$ (if $S_{t-1} = 0$, then the set of active vertices is empty and thus $w_t$ is undefined; in this case, we simply add all $n$ variables $\xi^t_i$). In the present lecture, we also need to bound $S_t$ from below. To this end, we introduce another process $(\tilde S_t)_{t \ge 0}$ as follows:

$$\tilde S_t = \tilde S_{t-1} - 1 + \sum_{w \in \tilde U_{t-1}} \eta_{w_t w} + \sum_{i=1}^{(n - k_n - |U_{t-1}|)^+} \xi^t_i,$$

where $\tilde U_{t-1}$ is the set consisting of the first $(n - k_n) \wedge |U_{t-1}|$ elements of $U_{t-1}$ in increasing order of the vertices (if $|U_{t-1}| < n - k_n$, we add the $n - k_n - |U_{t-1}|$ variables $\xi^t_i$). The idea behind these processes is that $\bar S_t$ is engineered, by including “fictitious” vertices, to always add $n$ i.i.d. Bernoulli variables in every iteration, while $\tilde S_t$ is engineered, by including “fictitious” vertices when $|U_{t-1}|$ is small and omitting vertices when $|U_{t-1}|$ is large, to always add $n - k_n$ i.i.d. Bernoulli variables in every iteration. The following facts are immediate:

- $(\bar S_t)_{t \ge 0}$ is a random walk with i.i.d. $\mathrm{Binomial}(n, \lambda/n) - 1$ increments.
- $(\tilde S_t)_{t \ge 0}$ is a random walk with i.i.d. $\mathrm{Binomial}(n - k_n, \lambda/n) - 1$ increments.
- $S_t \le \bar S_t$ for all $t$.
- $\tilde S_t \le S_t$ for all $t$ on the event $\{|C_v| \le k_n\}$.

To see the last property, note that the exploration process can only explore as many vertices as are present in the connected component $C_v$, so that $(t-1) + S_{t-1} \le |C_v| \le k_n$ for all $t$ on this event; therefore, in this situation only the second possibility in the definition of $\tilde U_{t-1}$ occurs (vertices are omitted, as $|U_{t-1}| \ge n - k_n$), and it is obvious by construction that then $\tilde S_t \le S_t$ (nonetheless, the first possibility in the definition must be included to ensure that $\tilde S_t$ is a random walk).
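The bookkeeping above is easy to check in code. The sketch below (our own illustration, with hypothetical helper names; the small graph with $\lambda = 2$ is an arbitrary choice) runs the exploration process from a vertex $v$ and verifies that the walk first hits zero exactly at time $|C_v|$.

```python
import random

def sample_graph(n, lam, rng):
    """Adjacency matrix (eta_{vw}): i.i.d. Bernoulli(lam/n) entries for v < w."""
    p = lam / n
    adj = [[False] * n for _ in range(n)]
    for v in range(n):
        for w in range(v + 1, n):
            if rng.random() < p:
                adj[v][w] = adj[w][v] = True
    return adj

def explore(adj, v):
    """Exploration process from v: return the path (S_0, S_1, ..., S_{tau_0})."""
    n = len(adj)
    active = {v}                          # discovered, not yet explored
    unexplored = set(range(n)) - {v}      # the set U_t
    S = [1]                               # S_0 = 1
    while active:
        wt = min(active)                  # nonanticipating choice: smallest active vertex
        active.remove(wt)
        newly = {w for w in unexplored if adj[wt][w]}
        unexplored -= newly
        active |= newly
        S.append(S[-1] - 1 + len(newly))  # S_t = S_{t-1} - 1 + (# newly found vertices)
    return S

def component_size(adj, v):
    """|C_v| by depth-first search, for comparison."""
    stack, seen = [v], {v}
    while stack:
        x = stack.pop()
        for w in range(len(adj)):
            if adj[x][w] and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)

adj = sample_graph(60, 2.0, random.Random(1))
S = explore(adj, 0)   # the walk hits 0 exactly when the component is exhausted
```

The same loop makes the coupling transparent: replacing `newly` by a fixed number of Bernoulli draws in every iteration yields the genuine random walks $\bar S$ and $\tilde S$.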

We now define the hitting times

$$\bar\tau_0 := \min\{t \ge 1 : \bar S_t = 0\}, \qquad \tilde\tau_0 := \min\{t \ge 1 : \tilde S_t = 0\}.$$

Then we evidently have

$$P(\tilde\tau_0 > k_n) \le P(|C_v| > k_n) \le P(\bar\tau_0 > k_n).$$

(Note how we cleverly chose the random walk $\tilde S$ precisely so that $\tilde\tau_0 \le \tau_0 = |C_v| \le k_n$ whenever $|C_v| \le k_n$.) We have therefore reduced the problem of computing $P(|C_v| > k_n)$ to computing the hitting probabilities of random walks. Now we are in business, as this is something we know how to do using martingales!

**The hitting time computation**

Let us take a moment to gather some intuition. The random walks $\bar S$ and $\tilde S$ have increments distributed as $\mathrm{Binomial}(n, \lambda/n) - 1$ and $\mathrm{Binomial}(n - k_n, \lambda/n) - 1$, respectively. As $n \to \infty$, both increment distributions converge to a $\mathrm{Poisson}(\lambda) - 1$ distribution, so we expect that $P(\bar\tau_0 > k_n) \approx P(\tau_0^{\mathrm{Pois}} > k_n)$, where $\tau_0^{\mathrm{Pois}}$ is the first hitting time of the origin of the Poisson random walk. On the other hand, as $k_n \to \infty$, we expect that $P(\tau_0^{\mathrm{Pois}} > k_n) \approx P(\tau_0^{\mathrm{Pois}} = \infty)$. The problem then reduces to computing the probability that a Poisson random walk *ever* hits the origin. This computation can be done explicitly, and this is precisely where the mysterious constant $\eta$ comes from!

We now proceed to make this intuition precise. First, we show that the probability $P(\bar\tau_0 > k_n)$ can indeed be replaced by $P(\bar\tau_0 = \infty)$, as one might expect.

Lemma. $P(\bar\tau_0 > k_n) - P(\bar\tau_0 = \infty) \to 0$ as $n \to \infty$.

**Proof.** We need to show that

$$P(k_n < \bar\tau_0 < \infty) \xrightarrow{n\to\infty} 0.$$

Note that as $\bar S_t = 1 + \sum_{s=1}^t (\bar X_s - 1)$ with i.i.d. increments $\bar X_s \sim \mathrm{Binomial}(n, \lambda/n)$, when $\bar\tau_0 = t$,

$$\sum_{s=1}^t \bar X_s = t - 1 \le t.$$

We can evidently write

$$P(\bar\tau_0 = t) \le P\Bigg(\sum_{s=1}^t \bar X_s \le t\Bigg) \le e^{\gamma t}\, E\big[e^{-\gamma \bar X_1}\big]^t \le e^{-ct},$$

where $\gamma > 0$ is arbitrary and

$$c = \lambda(1 - e^{-\gamma}) - \gamma, \qquad E\big[e^{-\gamma \bar X_1}\big] = \Big(1 - \tfrac{\lambda}{n}\big(1 - e^{-\gamma}\big)\Big)^n \le e^{-\lambda(1 - e^{-\gamma})}.$$

Choosing $\gamma = \log\lambda$, we obtain $c = \lambda - 1 - \log\lambda > 0$ for $\lambda > 1$. Therefore,

$$P(k_n < \bar\tau_0 < \infty) \le \sum_{t > k_n} e^{-ct} \le \frac{e^{-c k_n}}{1 - e^{-c}} \xrightarrow{n\to\infty} 0,$$

as $k_n \to \infty$.

This completes the proof.
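The Chernoff bound in the proof can be sanity-checked numerically. The snippet below (our own check, with an illustrative choice of $\lambda$, $n$, and a hypothetical helper name) compares the exact tail $P(\mathrm{Bin}(nt, \lambda/n) \le t)$ with the bound $e^{-ct}$, $c = \lambda - 1 - \log\lambda$.

```python
import math

def binom_cdf(m, p, k):
    """Exact P(Bin(m, p) <= k)."""
    return sum(math.comb(m, j) * p**j * (1 - p)**(m - j) for j in range(k + 1))

lam, n = 2.0, 10**4
c = lam - 1 - math.log(lam)      # the exponent obtained with gamma = log(lam)
checks = []
for t in [5, 10, 20]:
    exact_tail = binom_cdf(n * t, lam / n, t)   # P(sum of t Bin(n, lam/n) variables <= t)
    checks.append(exact_tail <= math.exp(-c * t))
```

Since the bound is a theorem, `checks` should contain only `True`; the comparison merely shows how much room the estimate leaves.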

By the above Lemma, and the trivial upper bound $P(\tilde\tau_0 > k_n) \ge P(\tilde\tau_0 = \infty)$, we obtain

$$P(\tilde\tau_0 = \infty) \le P(|C_v| > k_n) \le P(\bar\tau_0 = \infty) + o(1).$$

To complete the computation of the mean size of the giant component, it therefore remains to show that $P(\bar\tau_0 = \infty)$ and $P(\tilde\tau_0 = \infty)$ converge to $1 - \eta$. In fact, we can compute these quantities exactly.

Lemma. Let $\lambda > 1$. Then

$$P(\bar\tau_0 < \infty) = \bar\eta_n,$$

where $\bar\eta_n$ is the smallest positive solution of $\eta = \big(1 - \tfrac{\lambda}{n}(1 - \eta)\big)^n$.

**Proof.** Recall the martingale used in last lecture:

$$M_t := \eta^{\bar S_t}, \qquad \text{where } 0 < \eta \le 1 \text{ satisfies } E\big[\eta^{\bar X_1 - 1}\big] = 1.$$

Suppose that $0 < \eta < 1$ and $E\big[\eta^{\bar X_1 - 1}\big] = 1$. Then

$$P(\bar\tau_0 < \infty) = E\Big[\lim_{t\to\infty} \eta^{\bar S_{t \wedge \bar\tau_0}}\Big] = \lim_{t\to\infty} E\big[\eta^{\bar S_{t \wedge \bar\tau_0}}\big] = E\big[\eta^{\bar S_0}\big] = \eta.$$

The first equality holds since if $\bar\tau_0 < \infty$ then $\bar S_{t \wedge \bar\tau_0} \to 0$ and $\eta^{\bar S_{t \wedge \bar\tau_0}} \to 1$, while if $\bar\tau_0 = \infty$ then $\bar S_t \to \infty$ (by the law of large numbers, as the increments have mean $\lambda - 1 > 0$) and $\eta^{\bar S_t} \to 0$. The second equality holds by dominated convergence since $0 \le \eta^{\bar S_{t \wedge \bar\tau_0}} \le 1$, and the third equality is by the optional stopping theorem.

Now suppose we can find $0 < \eta < 1$ such that $E[\eta^{\bar X_1 - 1}] = 1$. Then we have

$$P(\bar\tau_0 < \infty) = \eta$$

by the above identity. Thus, evidently, it suffices to find $\eta$ with the requisite properties. Now note that, as $\varphi(\eta) := E[\eta^{\bar X_1}] = \big(1 - \tfrac{\lambda}{n}(1 - \eta)\big)^n$ is convex, $\varphi(1) = 1$, and $\varphi'(1) = \lambda > 1$, we evidently must have

$$\varphi(\eta) = \eta \quad \text{for some } 0 < \eta < 1.$$

We can find such $\eta$ by inspecting the following illustration:

*[Figure: the convex curve $\varphi$ crosses the diagonal at the smallest root $\bar\eta_n < 1$ and again at $\eta = 1$.]*

Evidently the requisite assumptions are satisfied when $\eta$ is the smallest root of the equation $\eta = \big(1 - \tfrac{\lambda}{n}(1 - \eta)\big)^n$ (but not for the larger root at $\eta = 1$!). This completes the proof.
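A quick Monte Carlo check of the lemma (ours, not from the lecture; all parameters and helper names are illustrative): simulate the random walk with $\mathrm{Bin}(n, \lambda/n) - 1$ increments started at $1$, and compare the fraction of paths that reach $0$ with the smallest root of $\eta = (1 - \tfrac{\lambda}{n}(1-\eta))^n$.

```python
import random

def hit_fraction(n, lam, trials, horizon, rng):
    """Estimate P(tau_0 < infinity) for the walk S_0 = 1 with Bin(n, lam/n) - 1 increments.

    Paths surviving `horizon` steps are counted as never hitting 0; since the walk
    has positive drift lam - 1 > 0, the truncation error is negligible."""
    p = lam / n
    hits = 0
    for _ in range(trials):
        s = 1
        for _ in range(horizon):
            s += sum(1 for _ in range(n) if rng.random() < p) - 1
            if s == 0:        # increments are >= -1, so 0 is hit exactly
                hits += 1
                break
    return hits / trials

def fixed_point(lam, n, iters=500):
    """Smallest positive solution of eta = (1 - lam*(1 - eta)/n)**n, by iteration from 0."""
    x = 0.0
    for _ in range(iters):
        x = (1 - lam * (1 - x) / n) ** n
    return x

n, lam = 50, 2.0
exact = fixed_point(lam, n)
estimate = hit_fraction(n, lam, trials=1000, horizon=100, rng=random.Random(7))
```

The agreement is within Monte Carlo error of a thousand trials, which is all one can ask of such a check.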

**Remark.** Note that the supercritical case $\lambda > 1$ is essential here. If $\lambda \le 1$ then the equation $\eta = \big(1 - \tfrac{\lambda}{n}(1 - \eta)\big)^n$ for $\eta$ has no solutions $0 < \eta < 1$, and the argument in the proof does not work. In fact, when $\lambda \le 1$, we have $P(\bar\tau_0 < \infty) = 1$.

By an immediate adaptation of the proof of the previous lemma, we obtain

$$P(\tilde\tau_0 < \infty) = \tilde\eta_n,$$

where $\tilde\eta_n$ is the smallest positive solution of $\eta = \big(1 - \tfrac{\lambda}{n}(1 - \eta)\big)^{n - k_n}$. Letting $n \to \infty$ (note that $k_n = o(n)$), we see that

$$P(\bar\tau_0 = \infty) \to 1 - \eta, \qquad P(\tilde\tau_0 = \infty) \to 1 - \eta,$$

where $\eta$ is the smallest solution of the equation $\eta = e^{-\lambda(1 - \eta)}$ (which is precisely the probability that the Poisson random walk hits zero, by the identical proof to the lemma above). We have therefore proved

$$\frac{E|H|}{n} = P(|C_1| > k_n) \xrightarrow{n\to\infty} 1 - \eta.$$
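Numerically, the finite-$n$ roots converge quickly to the Poisson fixed point; the bisection helper below is our own illustration (with $\lambda = 2$ as an arbitrary choice).

```python
import math

def smallest_root(f):
    """Smallest fixed point of f in (0, 1), by bisection on g(x) = f(x) - x.

    Assumes f(x) > x near 0 and f(x) < x just below 1 (the supercritical picture)."""
    lo, hi = 1e-12, 1.0 - 1e-9
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

lam = 2.0
eta_pois = smallest_root(lambda x: math.exp(-lam * (1 - x)))
etas_n = [smallest_root(lambda x, n=n: (1 - lam * (1 - x) / n) ** n)
          for n in (10**2, 10**3, 10**4)]
gaps = [abs(e - eta_pois) for e in etas_n]   # should shrink as n grows
```

Since $(1 - a/n)^n$ increases to $e^{-a}$, the finite-$n$ roots increase monotonically to the Poisson root, which is what the shrinking gaps reflect.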

**Variance of the giant component size**

To complete the proof of Part 2 of the giant component theorem, it remains to show that

$$\frac{\mathrm{Var}(|H|)}{n^2} \xrightarrow{n\to\infty} 0.$$

To this end, note that $\mathrm{Var}(|H|) = \mathrm{Var}(n - |H|)$, where $n - |H| = \#\{v : |C_v| \le k_n\}$ counts the vertices in small components; it will be convenient to work with the latter quantity. Let us consider

$$E\big[(n - |H|)^2\big] = \sum_{v,w=1}^n P(|C_v| \le k_n,\ |C_w| \le k_n).$$

To estimate the terms in this sum, we condition on one of the components:

$$P(|C_v| \le k_n,\ |C_w| \le k_n) = \sum_{C \ni v \,:\, |C| \le k_n} P(|C_w| \le k_n \mid C_v = C)\, P(C_v = C).$$

To proceed, note that the event $\{C_v = C\}$ can be written as

$$\{C_v = C\} = \{\text{the subgraph induced on } C \text{ is connected and contains } v\} \cap \{\eta_{xy} = 0 \text{ for all } x \in C,\ y \notin C\}.$$

In particular, the event $\{C_v = C\}$ is independent of the edges $(\eta_{xy})$ for $x, y \notin C$. Therefore, for $w \notin C$, the conditional law of $|C_w|$ given $\{C_v = C\}$ coincides with the (unconditional) law of $|\tilde C_w|$, the connected component containing $w$ in the induced subgraph on the vertices $\{1, \dots, n\} \setminus C$:

$$P(|C_w| \le k_n \mid C_v = C) = P(|\tilde C_w| \le k_n).$$

As this quantity only depends on $|C|$ by the symmetry of the Erdős–Rényi model, we can evidently write

$$P(|C_w| \le k_n \mid C_v = C) = P\big(|\tilde C^{\,n-k}_1| \le k_n\big)$$

for $w \notin C$, $|C| = k$, where $\tilde C^{\,n-k}_1$ denotes the component of a fixed vertex in a graph on $n - k$ vertices with edge probability $\lambda/n$. In particular, we obtain

$$E\big[(n - |H|)^2\big] \le n\,k_n + n^2 \max_{0 \le k \le k_n} P\big(|\tilde C^{\,n-k}_1| \le k_n\big),$$

where the first term bounds the contribution of the pairs with $w \in C_v$ (on the event $\{|C_v| \le k_n\}$ there are at most $k_n$ such $w$ for each $v$).

Now note that, by its definition, $\tilde C^{\,n-k}_1$ is distributed precisely as the component containing a fixed vertex in the random graph model $G(n-k, \lambda/n)$. We can therefore show, repeating exactly the proof of the mean size of the giant component above (the corresponding random walks now have $\mathrm{Binomial}(n-k, \lambda/n) - 1$ and $\mathrm{Binomial}(n-k-k_n, \lambda/n) - 1$ increments, whose laws still converge to $\mathrm{Poisson}(\lambda) - 1$ since $k \le k_n = o(n)$), that

$$\max_{0 \le k \le k_n} P\big(|\tilde C^{\,n-k}_1| \le k_n\big) \xrightarrow{n\to\infty} \eta.$$

We have therefore shown that

$$\limsup_{n\to\infty} \frac{E[(n - |H|)^2]}{n^2} \le \eta^2 = \lim_{n\to\infty} \bigg(\frac{E[n - |H|]}{n}\bigg)^2,$$

which evidently implies

$$\frac{\mathrm{Var}(|H|)}{n^2} = \frac{E[(n - |H|)^2] - \big(E[n - |H|]\big)^2}{n^2} \xrightarrow{n\to\infty} 0.$$

This is what we set out to prove.
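As a final sanity check of the concentration just proved (our own simulation, not from the lecture; the parameters $\lambda = 2$, $n = 1500$, $\beta = 2$ are illustrative), one can compute $|H|/n$ for a few independent samples and observe that the values cluster near $1 - \eta$.

```python
import math
import random
from collections import deque

def fraction_in_large_components(n, lam, beta, rng):
    """Fraction of vertices v with |C_v| > beta * log(n) in one sample of G(n, lam/n)."""
    p = lam / n
    adj = [[] for _ in range(n)]
    for v in range(n):
        for w in range(v + 1, n):
            if rng.random() < p:
                adj[v].append(w)
                adj[w].append(v)
    seen = [False] * n
    threshold = beta * math.log(n)
    count = 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        comp, queue = 1, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    comp += 1
                    queue.append(w)
        if comp > threshold:   # vertices of this component belong to H
            count += comp
    return count / n

lam = 2.0
target = 1 - 0.203188      # 1 - eta for lam = 2
samples = [fraction_in_large_components(1500, lam, 2.0, random.Random(seed))
           for seed in range(3)]
```

The small spread between independent samples is exactly the variance bound at work.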

**Remark.** It should be noted that the proof of Part 2 did not depend on the value of $\beta$, or even on the rate $\log n$, in the definition of the set $H$: any sequence $k_n$ that grows sublinearly to infinity would have given the same result. This suggests that all but a vanishing fraction of vertices are contained in connected components of order $\log n$ or of order $n$. We find out only in the next lecture why the rate $\beta \log n$ (for $\beta$ sufficiently large!) is important: only sufficiently large connected components are guaranteed to intersect, while there might (and do) exist components of order $\log n$ that are disjoint from the giant component. If we do not exclude the latter, we will not be able to prove Part 1.

*Many thanks to Weichen Wang for scribing this lecture!*