## Lecture 7. Entropic CLT (4)

This lecture completes the proof of the entropic central limit theorem.

**From Fisher information to entropy (continued)**

In the previous lecture, we proved the following:

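**Theorem.** If $X_1,\dots,X_n$ are independent random variables with $I(X_i)<\infty$ for every $i$, then

$$I\Big(\sum_{i\in[n]}X_i\Big)\le\sum_{s\in\mathcal G}\frac{w_s^2}{\beta_s}\,I\Big(\sum_{i\in s}X_i\Big),\tag{1}$$

where $\{w_s:s\in\mathcal G\}$ are weights satisfying $\sum_{s\in\mathcal G}w_s=1$ and $\{\beta_s:s\in\mathcal G\}$ is any fractional packing of the hypergraph $\mathcal G$ (a collection of subsets of $[n]=\{1,\dots,n\}$) in the sense that $\sum_{s\in\mathcal G:\,s\ni i}\beta_s\le1$ for every $i\in[n]$.

By optimizing the choice of the weights (the right-hand side of (1), minimized over $\sum_s w_s=1$, is smallest at $w_s\propto\beta_s/I(\sum_{i\in s}X_i)$), it is easy to check that (1) is equivalent to

$$\frac{1}{I\big(\sum_{i\in[n]}X_i\big)}\ge\sum_{s\in\mathcal G}\frac{\beta_s}{I\big(\sum_{i\in s}X_i\big)}.$$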
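The optimization step is a constrained minimum of a positive-definite quadratic, so it is easy to check numerically. The sketch below uses made-up stand-ins `fisher` for the values $I(\sum_{i\in s}X_i)$ and the packing $\beta_s=\tfrac12$ on the three 2-element subsets of $\{1,2,3\}$; these inputs are hypothetical and not computed from any distribution.

```python
import random

def rhs_of_1(w, beta, fisher):
    """Right-hand side of (1): sum over s of w_s^2 / beta_s * I_s."""
    return sum(ws * ws / bs * Is for ws, bs, Is in zip(w, beta, fisher))

def optimal_weights(beta, fisher):
    """Minimizer of rhs_of_1 under sum(w) = 1: w_s proportional to beta_s / I_s."""
    raw = [bs / Is for bs, Is in zip(beta, fisher)]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical inputs: G = the three 2-element subsets of {1, 2, 3},
# beta_s = 1/2 (each index lies in two sets), and made-up values I_s > 0.
beta = [0.5, 0.5, 0.5]
fisher = [1.3, 2.0, 0.7]

w_star = optimal_weights(beta, fisher)
best = rhs_of_1(w_star, beta, fisher)
closed_form = 1.0 / sum(bs / Is for bs, Is in zip(beta, fisher))

random.seed(0)
for _ in range(1000):
    raw = [random.uniform(0.0, 1.0) for _ in beta]
    w = [r / sum(raw) for r in raw]          # random weights summing to one
    assert rhs_of_1(w, beta, fisher) >= best - 1e-12
```

At the optimum the right-hand side of (1) equals $\big(\sum_s\beta_s/I_s\big)^{-1}$, which is exactly the reciprocal form stated above.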

In fact, an analogous statement holds for entropy:

$$e^{2h(\sum_{i\in[n]}X_i)}\ge\sum_{s\in\mathcal G}\beta_s\,e^{2h(\sum_{i\in s}X_i)}.\tag{2}$$

Proving (2) for general fractional packings and general hypergraphs requires some additional steps that we will avoid here. Instead, we will prove a special case, due to Artstein, Ball, Barthe and Naor (2004), that suffices to resolve the question of monotonicity of entropy in the CLT.

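In the following, let $\mathcal G$ be the set of all subsets of $[n]$ of size $n-1$ and take $\beta_s=\frac{1}{n-1}$ for every $s\in\mathcal G$. Since each $i\in[n]$ belongs to exactly $n-1$ of these sets, this is a fractional packing (with equality for every $i$). In this case, (2) takes the following form.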
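That the leave-one-out hypergraph with $\beta_s=\frac1{n-1}$ is a fractional packing can be confirmed by direct counting. A minimal sketch, using exact rational arithmetic; the helper names are illustrative only.

```python
from itertools import combinations
from fractions import Fraction

def leave_one_out_packing(n):
    """All (n-1)-subsets of {1,...,n}, each with weight beta_s = 1/(n-1)."""
    ground = range(1, n + 1)
    return {frozenset(s): Fraction(1, n - 1) for s in combinations(ground, n - 1)}

def coverage(packing, n):
    """For each index i, the total weight sum_{s containing i} beta_s."""
    return {i: sum(b for s, b in packing.items() if i in s) for i in range(1, n + 1)}

for n in range(2, 8):
    cov = coverage(leave_one_out_packing(n), n)
    # every index is covered with total weight exactly one
    assert all(c == 1 for c in cov.values())
```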

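**Theorem EPI.** If $X_1,\dots,X_n$ ($n\ge2$) are independent random variables, then

$$e^{2h(X_1+\cdots+X_n)}\ge\frac{1}{n-1}\sum_{j=1}^n e^{2h(\sum_{i\ne j}X_i)},$$

provided that all the entropies exist.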
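Theorem EPI is tight for Gaussians: if $X_i\sim N(0,v_i)$, then $e^{2h(N(0,v))}=2\pi e v$ and both sides equal $2\pi e\sum_i v_i$. A minimal numerical check, with arbitrarily chosen variances:

```python
import math

def gaussian_entropy(var):
    """Differential entropy (in nats) of N(0, var)."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def epi_sides(variances):
    """Both sides of Theorem EPI for independent X_i ~ N(0, v_i)."""
    n = len(variances)
    total = sum(variances)
    lhs = math.exp(2 * gaussian_entropy(total))
    # sum_{i != j} X_i ~ N(0, total - v_j)
    rhs = sum(math.exp(2 * gaussian_entropy(total - v)) for v in variances) / (n - 1)
    return lhs, rhs

lhs, rhs = epi_sides([0.5, 1.0, 2.0, 3.5])
```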

To prove Theorem EPI, we will use the de Bruijn identity, which was discussed in the previous lecture. Let us restate it in a form suited to the coming proof and give a proof.

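**Theorem (de Bruijn identity).** Let $X$ be a random variable with density $f$ on $\mathbb R$. Let $X^t=e^{-t}X+\sqrt{1-e^{-2t}}\,Z$, $t\ge0$, where $Z\sim N(0,1)$ is independent of $X$. We have

$$h(Z)-h(X)=\int_0^\infty\big[I(X^t)-1\big]\,dt.\tag{3}$$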

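**Proof.** Let $f_t$ be the density of $X^t$. Then

$$f_t(x)=\int_{\mathbb R}e^{t}f(e^{t}y)\,\varphi_{1-e^{-2t}}(x-y)\,dy,$$

where $\varphi_{\sigma^2}$ denotes the density of $N(0,\sigma^2)$. It is easy to check that $f_t$ solves the Fokker–Planck equation of the Ornstein–Uhlenbeck process, $\partial_tf_t=f_t''+(xf_t)'$. Hence, integrating by parts (boundary terms vanish) and using $\int\partial_tf_t\,dx=0$ and $\int xf_t'(x)\,dx=-1$,

$$\frac{d}{dt}h(X^t)=-\int\partial_tf_t\,(1+\log f_t)\,dx=\int\big(f_t'+xf_t\big)\frac{f_t'}{f_t}\,dx=I(X^t)-1.$$

Integrating from $0$ to $\infty$ gives (3), since $X^0=X$ and $h(X^t)\to h(Z)$ as $t\to\infty$.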
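For a Gaussian input both sides of (3) are explicit: if $X\sim N(0,v)$ then $X^t\sim N(0,s_t)$ with $s_t=e^{-2t}v+1-e^{-2t}$, $I(X^t)=1/s_t$, and $h(Z)-h(X)=-\frac12\log v$. The sketch below approximates the time integral with a midpoint rule truncated at a hypothetical horizon `t_max` (the integrand decays like $e^{-2t}$).

```python
import math

def debruijn_integral(v, t_max=20.0, steps=100_000):
    """Midpoint-rule approximation of int_0^t_max [I(X^t) - 1] dt for X ~ N(0, v).

    X^t = e^{-t} X + sqrt(1 - e^{-2t}) Z is N(0, s_t) with
    s_t = e^{-2t} v + 1 - e^{-2t}, and the Fisher information of N(0, s) is 1/s.
    """
    dt = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        s_t = math.exp(-2 * t) * v + 1 - math.exp(-2 * t)
        total += (1.0 / s_t - 1.0) * dt
    return total

v = 3.0
lhs = -0.5 * math.log(v)      # h(Z) - h(X) for X ~ N(0, v)
rhs = debruijn_integral(v)
```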

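**Proof of Theorem EPI.** In (1), let $\mathcal G$ be the set of all subsets of $[n]$ of size $n-1$ and $\beta_s=\frac{1}{n-1}$. We have

$$I(X_1+\cdots+X_n)\le(n-1)\sum_{j=1}^n w_j^2\,I\Big(\sum_{i\ne j}X_i\Big),\tag{4}$$

where $w_j$ is the weight corresponding to the set $[n]\setminus\{j\}$. It is easy to check (apply (4) to $a_1X_1,\dots,a_nX_n$) that (4) is equivalent, for every $a=(a_1,\dots,a_n)$ with nonzero entries and $\sum_{i=1}^na_i^2=1$, to

$$I\Big(\sum_{i=1}^na_iX_i\Big)\le(n-1)\sum_{j=1}^nw_j^2\,I\Big(\sum_{i\ne j}a_iX_i\Big)=(n-1)\sum_{j=1}^n\frac{w_j^2}{1-a_j^2}\,I\Big(\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i\Big).$$

Note the last equality is due to the scaling property of Fisher information, $I(cW)=c^{-2}I(W)$ for $c\ne0$, applied with $c=\sqrt{1-a_j^2}>0$. Recalling that the $w_j$ are arbitrarily chosen weights with $\sum_jw_j=1$, we conclude that

$$I\Big(\sum_{i=1}^na_iX_i\Big)\le\sum_{j=1}^n\frac{(n-1)w_j^2}{1-a_j^2}\,I\Big(\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i\Big)\tag{5}$$

for any choice of $a$ with nonzero entries and $\sum_ia_i^2=1$, and any $w$ with $\sum_jw_j=1$.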
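For Gaussian inputs (5) can be checked by hand, since $I(N(0,s))=1/s$; this is only an illustration, not a proof. The sketch below assumes $X_i\sim N(0,v_i)$ and draws random admissible $a$ and $w$.

```python
import math
import random

def sides_of_5(variances, a, w):
    """Both sides of (5) for independent X_i ~ N(0, v_i), using I(N(0, s)) = 1/s.

    Assumes sum(a_i^2) = 1 with all a_i != 0 (so a_j^2 < 1) and sum(w_j) = 1.
    """
    n = len(variances)
    lhs = 1.0 / sum(ai * ai * vi for ai, vi in zip(a, variances))
    rhs = 0.0
    for j in range(n):
        scale = 1.0 - a[j] ** 2
        # variance of sum_{i != j} a_i X_i / sqrt(1 - a_j^2)
        var_j = sum(a[i] ** 2 * variances[i] for i in range(n) if i != j) / scale
        rhs += (n - 1) * w[j] ** 2 / scale * (1.0 / var_j)
    return lhs, rhs

random.seed(1)
for _ in range(500):
    n = random.randint(2, 6)
    v = [random.uniform(0.1, 5.0) for _ in range(n)]
    raw_a = [random.uniform(0.1, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in raw_a))
    a = [x / norm for x in raw_a]
    raw_w = [random.uniform(0.01, 1.0) for _ in range(n)]
    total_w = sum(raw_w)
    w = [x / total_w for x in raw_w]
    lhs, rhs = sides_of_5(v, a, w)
    assert lhs <= rhs * (1 + 1e-12)
```

With the optimal weights (here $w_j$ proportional to $\sum_{i\ne j}a_i^2v_i$) the Gaussian case holds with equality.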

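Let us keep the choice of $a$ arbitrary for the moment and let $w_j=\frac{1-a_j^2}{n-1}$. Then $\sum_jw_j=\frac{n-\sum_ja_j^2}{n-1}=1$ and $\frac{(n-1)w_j^2}{1-a_j^2}=w_j$, so (5) becomes

$$I\Big(\sum_{i=1}^na_iX_i\Big)\le\sum_{j=1}^nw_j\,I\Big(\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i\Big),\qquad\sum_{j=1}^nw_j=1.\tag{6}$$

Let $X_i^t=e^{-t}X_i+\sqrt{1-e^{-2t}}\,Z_i$, where $Z_1,\dots,Z_n$ are i.i.d. $N(0,1)$, independent of all else. For $t>0$ the $X_i^t$ have finite Fisher information, so (6) applies to them:

$$I\Big(\sum_{i=1}^na_iX_i^t\Big)\le\sum_{j=1}^nw_j\,I\Big(\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i^t\Big).$$

Note that

$$\sum_{i=1}^na_iX_i^t=e^{-t}\sum_{i=1}^na_iX_i+\sqrt{1-e^{-2t}}\sum_{i=1}^na_iZ_i\overset{d}{=}\Big(\sum_{i=1}^na_iX_i\Big)^t,$$

where the equality in distribution is due to the fact that $\sum_ia_iZ_i\sim N(0,1)$, because $\sum_ia_i^2=1$. Likewise, since $\sum_{i\ne j}\frac{a_i^2}{1-a_j^2}=1$, we have $\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i^t\overset{d}{=}\big(\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i\big)^t$. Hence, by (6),

$$I\Big(\Big(\sum_{i=1}^na_iX_i\Big)^t\Big)-1\le\sum_{j=1}^nw_j\Big[I\Big(\Big(\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i\Big)^t\Big)-1\Big].$$

We use the de Bruijn identity to integrate this over time from $0$ to $\infty$ and get, with $Z\sim N(0,1)$,

$$h(Z)-h\Big(\sum_{i=1}^na_iX_i\Big)\le\sum_{j=1}^nw_j\Big[h(Z)-h\Big(\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i\Big)\Big].$$

By (6) again, $\sum_jw_j=1$, so the $h(Z)$ terms cancel and

$$h\Big(\sum_{i=1}^na_iX_i\Big)\ge\sum_{j=1}^nw_j\,h\Big(\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}X_i\Big).\tag{7}$$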
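As a sanity check of (7) with the weights $w_j=\frac{1-a_j^2}{n-1}$, one can again take Gaussian inputs $X_i\sim N(0,v_i)$ (an illustrative assumption), where every entropy is $\frac12\log(2\pi e\,\mathrm{var})$. For Gaussians, (7) reduces to concavity of the logarithm, with equality when all $v_i$ coincide.

```python
import math
import random

def gaussian_entropy(var):
    """Differential entropy (nats) of N(0, var)."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

def sides_of_7(variances, a):
    """Both sides of (7) for independent X_i ~ N(0, v_i) and w_j = (1 - a_j^2)/(n - 1)."""
    n = len(variances)
    total_var = sum(ai * ai * vi for ai, vi in zip(a, variances))
    lhs = gaussian_entropy(total_var)
    rhs = 0.0
    for j in range(n):
        w_j = (1 - a[j] ** 2) / (n - 1)
        # variance of sum_{i != j} a_i X_i / sqrt(1 - a_j^2)
        var_j = (total_var - a[j] ** 2 * variances[j]) / (1 - a[j] ** 2)
        rhs += w_j * gaussian_entropy(var_j)
    return lhs, rhs

random.seed(4)
for _ in range(500):
    n = random.randint(2, 6)
    v = [random.uniform(0.1, 5.0) for _ in range(n)]
    raw = [random.uniform(0.1, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in raw))
    a = [x / norm for x in raw]
    lhs, rhs = sides_of_7(v, a)
    assert lhs >= rhs - 1e-12
```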

This is the entropy analog of the Fisher information inequality obtained above.

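As a final step, let $N_j=e^{2h(\sum_{i\ne j}X_i)}$ for $j=1,\dots,n$, and $S=\sum_{j=1}^nN_j$. We are to show

$$e^{2h(X_1+\cdots+X_n)}\ge\frac{S}{n-1}.$$

If $N_j\ge\frac{S}{n-1}$ for some $j$, the result is immediate due to the general fact $h(X+Y)\ge h(X)$ for independent $X$ and $Y$ (convolution increases entropy), since then $e^{2h(X_1+\cdots+X_n)}\ge N_j\ge\frac{S}{n-1}$. Hence, we assume that $N_j<\frac{S}{n-1}$ for every $j$, so

$$\lambda_j:=1-\frac{(n-1)N_j}{S}\in(0,1)\quad\text{for every }j,\qquad\sum_{j=1}^n\lambda_j=n-(n-1)=1.$$

Set $a_j=\sqrt{\lambda_j}$, $Y_j=X_j/a_j$. Note that

$$\sum_{j=1}^na_j^2=1,\qquad1-a_j^2=\frac{(n-1)N_j}{S},\qquad\sum_{j=1}^na_jY_j=\sum_{j=1}^nX_j,\qquad\sum_{i\ne j}\frac{a_i}{\sqrt{1-a_j^2}}Y_i=\frac{1}{\sqrt{1-a_j^2}}\sum_{i\ne j}X_i.$$

With this choice of $a$, we apply (7), in which $w_j=\frac{1-a_j^2}{n-1}$, to the independent random variables $Y_1,\dots,Y_n$. Using $h(cW)=h(W)+\log|c|$, we get

$$h\Big(\sum_{j=1}^nX_j\Big)\ge\sum_{j=1}^n\frac{1-a_j^2}{n-1}\Big[h\Big(\sum_{i\ne j}X_i\Big)-\frac12\log(1-a_j^2)\Big]=\sum_{j=1}^n\frac{1-a_j^2}{n-1}\cdot\frac12\log\frac{S}{n-1}=\frac12\log\frac{S}{n-1}.$$

Using the definition of the $N_j$, the final result follows by exponentiating: $e^{2h(X_1+\cdots+X_n)}\ge\frac{1}{n-1}\sum_{j=1}^ne^{2h(\sum_{i\ne j}X_i)}$.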
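The bookkeeping in the final step (building $\lambda_j$ and $a_j$, and checking that every bracket collapses to $\frac12\log\frac{S}{n-1}$) can be replayed numerically. The values of $N_j$ below are hypothetical numbers satisfying $N_j<\frac{S}{n-1}$, not entropy powers of actual random variables.

```python
import math

def final_step(N):
    """Replay the final step for hypothetical values N_j = e^{2h(sum_{i != j} X_i)}.

    Returns lambda_j, the weights w_j = (1 - a_j^2)/(n - 1), the lower bound
    sum_j w_j [ (1/2) log N_j - (1/2) log(1 - a_j^2) ], and (1/2) log(S/(n-1)).
    """
    n = len(N)
    S = sum(N)
    if any(Nj >= S / (n - 1) for Nj in N):
        raise ValueError("some N_j >= S/(n-1): the trivial case of the proof applies")
    lam = [1 - (n - 1) * Nj / S for Nj in N]
    a = [math.sqrt(l) for l in lam]
    w = [(1 - aj * aj) / (n - 1) for aj in a]
    bound = sum(wj * (0.5 * math.log(Nj) - 0.5 * math.log(1 - aj * aj))
                for wj, Nj, aj in zip(w, N, a))
    return lam, w, bound, 0.5 * math.log(S / (n - 1))

lam, w, bound, target = final_step([2.0, 3.0, 4.0])
```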

**Proof of the Entropic Central Limit Theorem**

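**Entropic CLT.** Suppose that $X_1,X_2,\dots$ are i.i.d. random variables with $\mathbb EX_1=0$ and $\mathrm{Var}(X_1)=1$. Let $S_n=\frac{1}{\sqrt n}\sum_{i=1}^nX_i$ and $Z\sim N(0,1)$, and write $D(X\|Z)$ for the relative entropy of the law of $X$ with respect to that of $Z$. If $h(S_n)>-\infty$ for some $n$, then $h(S_n)\uparrow h(Z)$ as $n\to\infty$. Equivalently, if $D(S_n\|Z)<\infty$ for some $n$, then $D(S_n\|Z)\downarrow0$ as $n\to\infty$.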

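In the case of i.i.d. random variables, Theorem EPI gives

$$e^{2h(X_1+\cdots+X_n)}\ge\frac{n}{n-1}\,e^{2h(X_1+\cdots+X_{n-1})},$$

which is equivalent to $h(S_n)\ge h(S_{n-1})$ because of the identity $h(aX)=h(X)+\log|a|$. Therefore, it remains to show $h(S_n)\to h(Z)$, that is, $D(S_n\|Z)\to0$, as $n\to\infty$. For this, we need some analytical properties of the relative entropy. Henceforth, we use $\mathcal P(E)$ to denote the set of all probability measures on some Polish space $E$ (take $E=\mathbb R^d$ for instance).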
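The monotonicity $h(S_n)\ge h(S_{n-1})$ can be observed numerically in a concrete case. The sketch below assumes i.i.d. Uniform$(0,1)$ summands (an illustrative choice), uses the closed-form Irwin–Hall density of $U_1+\cdots+U_n$, integrates $-f\log f$ with a midpoint rule, and standardizes with $h(cX)=h(X)+\log|c|$.

```python
import math

def irwin_hall_density(x, n):
    """Density of U_1 + ... + U_n for i.i.d. Uniform(0, 1) (Irwin-Hall), 0 <= x <= n."""
    total = 0.0
    for k in range(int(math.floor(x)) + 1):
        total += (-1) ** k * math.comb(n, k) * (x - k) ** (n - 1)
    return total / math.factorial(n - 1)

def entropy_of_standardized_sum(n, points_per_unit=4000):
    """h(S_n) for S_n = (U_1 + ... + U_n - n/2) / sqrt(n/12), via the midpoint rule."""
    dx = 1.0 / points_per_unit
    h_sum = 0.0
    for m in range(n * points_per_unit):
        f = irwin_hall_density((m + 0.5) * dx, n)
        if f > 0:   # guard against tiny negative values from cancellation
            h_sum -= f * math.log(f) * dx
    # scaling identity h(cX) = h(X) + log|c| with c = 1/sqrt(n/12)
    return h_sum - 0.5 * math.log(n / 12)

h_gauss = 0.5 * math.log(2 * math.pi * math.e)
values = [entropy_of_standardized_sum(n) for n in range(1, 7)]
```

The computed sequence increases and stays below $h(Z)=\frac12\log(2\pi e)$, as the theorem predicts; for $n=1,2$ the exact values are $\frac12\log12$ and $\frac12+\frac12\log6$.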

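**Proposition (Variational characterization of $D$).** Let $\mu,\nu\in\mathcal P(E)$. Then

$$D(\mu\|\nu)=\sup_{g\in B(E)}\Big\{\int g\,d\mu-\log\int e^g\,d\nu\Big\},$$

where $B(E)$ denotes the set of all bounded measurable functions $g:E\to\mathbb R$.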
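On a finite set every function is bounded, so the supremum is attained at $g=\log\frac{d\mu}{d\nu}$. A minimal sketch on a four-point space (the measures are arbitrary illustrative choices) checks that this $g$ recovers $D(\mu\|\nu)$ and that random $g$ never exceed it.

```python
import math
import random

def kl(mu, nu):
    """D(mu || nu) for probability vectors on a finite set (assumes nu > 0 where mu > 0)."""
    return sum(m * math.log(m / q) for m, q in zip(mu, nu) if m > 0)

def dv_functional(g, mu, nu):
    """The objective int g dmu - log int e^g dnu from the variational characterization."""
    return (sum(gi * m for gi, m in zip(g, mu))
            - math.log(sum(math.exp(gi) * q for gi, q in zip(g, nu))))

mu = [0.1, 0.2, 0.3, 0.4]
nu = [0.25, 0.25, 0.25, 0.25]
g_star = [math.log(m / q) for m, q in zip(mu, nu)]   # g = log(dmu/dnu), bounded here

random.seed(2)
for _ in range(1000):
    g = [random.uniform(-3, 3) for _ in mu]
    assert dv_functional(g, mu, nu) <= kl(mu, nu) + 1e-12
```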

**Remark.**

- In the variational characterization of $D(\mu\|\nu)$ above, it is enough to take the supremum over the set $C_b(E)$ of all bounded continuous functions $g:E\to\mathbb R$.
- For fixed $g\in C_b(E)$, the mapping $(\mu,\nu)\mapsto\int g\,d\mu-\log\int e^g\,d\nu$ is convex and continuous (with respect to weak convergence). Since $D(\mu\|\nu)$ is the supremum over a class of convex, continuous functions, it is jointly convex and lower semicontinuous. These properties of relative entropy, made transparent by the variational characterization, are very useful in many different contexts.
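Joint convexity, $D(\theta\mu_0+(1-\theta)\mu_1\,\|\,\theta\nu_0+(1-\theta)\nu_1)\le\theta D(\mu_0\|\nu_0)+(1-\theta)D(\mu_1\|\nu_1)$, is easy to spot-check on a finite set. The sketch below draws random strictly positive probability vectors; a passing run is evidence, not a proof.

```python
import math
import random

def kl(mu, nu):
    """D(mu || nu) for strictly positive probability vectors on a finite set."""
    return sum(m * math.log(m / q) for m, q in zip(mu, nu))

def random_prob(k):
    """A random strictly positive probability vector of length k."""
    raw = [random.uniform(0.05, 1.0) for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

def mix(p, q, theta):
    """The mixture theta * p + (1 - theta) * q."""
    return [theta * x + (1 - theta) * y for x, y in zip(p, q)]

random.seed(3)
violations = 0
for _ in range(2000):
    mu0, mu1, nu0, nu1 = (random_prob(5) for _ in range(4))
    theta = random.random()
    lhs = kl(mix(mu0, mu1, theta), mix(nu0, nu1, theta))
    rhs = theta * kl(mu0, nu0) + (1 - theta) * kl(mu1, nu1)
    if lhs > rhs + 1e-12:
        violations += 1
```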

**Corollary.** Sublevel sets of $D(\cdot\|\nu)$ are compact, that is, the set $\{\mu\in\mathcal P(E):D(\mu\|\nu)\le M\}$ is compact (with respect to the topology of weak convergence) for every $\nu\in\mathcal P(E)$ and $M<\infty$.

Before we prove the corollary, let us recall the definition of tightness and Prohorov Theorem.

**Definition (Tightness).** A set $A\subseteq\mathcal P(E)$ is called tight if for every $\varepsilon>0$ there exists a compact set $K\subseteq E$ such that $\mu(K^c)<\varepsilon$ for every $\mu\in A$.

**Prohorov Theorem.** A set $A\subseteq\mathcal P(E)$ is weakly precompact if and only if it is tight.

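**Proof of Corollary.** Let $(\mu_n)$ be a sequence in $\mathcal P(E)$ with $D(\mu_n\|\nu)\le M$. By the variational characterization of $D$, for every $g\in B(E)$, we have

$$\int g\,d\mu_n\le\log\int e^g\,d\nu+M.\tag{8}$$

Note that $\{\nu\}$ is a tight set as a singleton. We claim that the sequence $(\mu_n)$ is also tight. Indeed, let $\varepsilon>0$ and let $K$ be a compact set with $\nu(K^c)<\varepsilon$. We take $g=\log(1+\frac1\varepsilon)\,\mathbf 1_{K^c}$ and apply (8) to get

$$\log\Big(1+\frac1\varepsilon\Big)\mu_n(K^c)\le\log\Big(\nu(K)+\Big(1+\frac1\varepsilon\Big)\nu(K^c)\Big)+M=\log\Big(1+\frac{\nu(K^c)}{\varepsilon}\Big)+M\le\log2+M.$$

Hence,

$$\sup_n\mu_n(K^c)\le\frac{\log2+M}{\log(1+1/\varepsilon)},$$

where the rightmost term can be made arbitrarily small by choosing $\varepsilon$ small. Hence, $(\mu_n)$ is tight. By Prohorov Theorem, there exist a subsequence $(\mu_{n_k})$ and $\mu\in\mathcal P(E)$ such that $\mu_{n_k}\to\mu$ weakly as $k\to\infty$. By lower semicontinuity of $D(\cdot\|\nu)$, we have

$$D(\mu\|\nu)\le\liminf_{k\to\infty}D(\mu_{n_k}\|\nu)\le M.$$

This finishes the proof of the corollary.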
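The key inequality in the tightness argument is (8) with $g=\log(1+\frac1\varepsilon)\mathbf 1_{K^c}$, namely $\log(1+\frac1\varepsilon)\mu(K^c)\le\log(1+\nu(K^c)/\varepsilon)+D(\mu\|\nu)$. The sketch below spot-checks it on a six-point space, with a random subset standing in for $K^c$; the finite space and all inputs are illustrative assumptions.

```python
import math
import random

def kl(mu, nu):
    """D(mu || nu) on a finite set, assuming nu > 0 everywhere."""
    return sum(m * math.log(m / q) for m, q in zip(mu, nu) if m > 0)

def random_prob(k):
    """A random strictly positive probability vector of length k."""
    raw = [random.uniform(0.05, 1.0) for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

def bound_8_holds(mu, nu, outside, eps):
    """(8) with g = log(1 + 1/eps) * 1_{K^c}; `outside` lists the atoms of K^c."""
    mu_out = sum(mu[i] for i in outside)
    nu_out = sum(nu[i] for i in outside)
    lhs = math.log(1 + 1 / eps) * mu_out
    rhs = math.log(1 + nu_out / eps) + kl(mu, nu)
    return lhs <= rhs + 1e-12

random.seed(5)
results = []
for _ in range(2000):
    mu, nu = random_prob(6), random_prob(6)
    outside = [i for i in range(6) if random.random() < 0.5]
    eps = random.uniform(1e-4, 1.0)
    results.append(bound_8_holds(mu, nu, outside, eps))
```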

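**Proof of Variational Characterization of $D$.** As a first case, suppose $\mu\not\ll\nu$. So $D(\mu\|\nu)=\infty$ and there exists a Borel set $A$ with $\mu(A)>0$ and $\nu(A)=0$. Let $g_n=n\mathbf 1_A$. We have

$$\int g_n\,d\mu-\log\int e^{g_n}\,d\nu=n\mu(A)-\log\big(\nu(A^c)+e^n\nu(A)\big)=n\mu(A)\to\infty$$

as $n\to\infty$. Hence, both sides of the variational characterization are equal to $+\infty$.

For the rest, we assume $\mu\ll\nu$. First, we show the $\ge$ part. If $D(\mu\|\nu)=\infty$, the inequality is obvious. Suppose $D(\mu\|\nu)<\infty$. Given $g\in B(E)$, define a probability measure $\gamma$ by

$$\frac{d\gamma}{d\nu}=\frac{e^g}{\int e^g\,d\nu}.$$

Since $\gamma$ and $\nu$ are mutually absolutely continuous, $\mu\ll\gamma$. So

$$D(\mu\|\nu)-\Big[\int g\,d\mu-\log\int e^g\,d\nu\Big]=\int\log\frac{d\mu}{d\nu}\,d\mu-\int\log\frac{d\gamma}{d\nu}\,d\mu=D(\mu\|\gamma)\ge0.$$

Since $g$ is chosen arbitrarily, taking supremum on the right hand side gives the $\ge$ part.

Next, we prove the $\le$ part. Write $f=\frac{d\mu}{d\nu}$. Note that if $g=\log f$, then $\int g\,d\mu-\log\int e^g\,d\nu=D(\mu\|\nu)$. However, this choice of $g$ may not be in $B(E)$, that is, $f$ may fail to be bounded or bounded away from zero. So we employ the following truncation argument. Let

$$g_n=\log\Big(\big(f\wedge n\big)\vee\tfrac1n\Big),\qquad n\ge1,$$

so that $g_n\in B(E)$ and $g_n\to\log f$ as $n\to\infty$. Note that $e^{g_n}\uparrow f$ on $\{f\ge1\}$ and $e^{g_n}\downarrow f$ on $\{f<1\}$, where $e^{g_n}\le1$. Thus we have

$$\int e^{g_n}\,d\nu\to\int f\,d\nu=1$$

by monotone convergence (applied separately on the two sets). On the other hand, $g_n\ge\log f\wedge0$ and $\int(\log f\wedge0)\,d\mu=\int_{\{f<1\}}f\log f\,d\nu\ge-e^{-1}$, so by Fatou's Lemma, we have

$$\liminf_{n\to\infty}\int g_n\,d\mu\ge\int\log f\,d\mu=D(\mu\|\nu).$$

Hence,

$$\liminf_{n\to\infty}\Big[\int g_n\,d\mu-\log\int e^{g_n}\,d\nu\Big]\ge D(\mu\|\nu),$$

from which the $\le$ part of the result follows.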
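The truncation can be watched converging on a finite example where $f=\frac{d\mu}{d\nu}$ vanishes on one atom and is large on another. The measures below are illustrative; each truncated value stays below $D(\mu\|\nu)$ (by the $\ge$ part) and approaches it as $n$ grows.

```python
import math

def kl(mu, nu):
    """D(mu || nu) on a finite set, assuming nu > 0 everywhere."""
    return sum(m * math.log(m / q) for m, q in zip(mu, nu) if m > 0)

def truncated_value(mu, nu, n):
    """Objective at g_n = log((f ∧ n) ∨ 1/n), where f = dmu/dnu (finite space, nu > 0)."""
    g = [math.log(max(min(m / q, n), 1.0 / n)) for m, q in zip(mu, nu)]
    return (sum(gi * m for gi, m in zip(g, mu))
            - math.log(sum(math.exp(gi) * q for gi, q in zip(g, nu))))

# f = dmu/dnu is 0 on the first atom and 50 on the last one
mu = [0.0, 0.2, 0.3, 0.5]
nu = [0.4, 0.3, 0.29, 0.01]
values = [truncated_value(mu, nu, n) for n in (1, 2, 5, 10, 100, 1000)]
```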

Building on these now standard facts (whose exposition above follows that in the book of Dupuis and Ellis), Harremoes and Vignat (2005) gave a short proof of the desired convergence, which we will follow below. It relies on the fact that for uniformly bounded densities within the appropriate moment class, pointwise convergence implies convergence of entropies.

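**Lemma.** If $X_n$ are random variables with $\mathbb EX_n=0$, $\mathrm{Var}(X_n)=1$, and the corresponding densities $f_n$ are uniformly bounded with $f_n\to f$ as $n\to\infty$ (pointwise) for some density $f$, then, writing $X$ for a random variable with density $f$, we have $\mathbb EX^2\le1$ and $h(X_n)\to h(X)$ as $n\to\infty$.

**Proof.** Recall $D(W\|Z)=h(Z)-h(W)$ for $W$ with mean $0$ and variance $1$; more generally, for $Z_\sigma\sim N(0,\sigma^2)$ and $\mathbb EW^2<\infty$,

$$D(W\|Z_\sigma)=-h(W)+\tfrac12\log(2\pi\sigma^2)+\frac{\mathbb EW^2}{2\sigma^2}.$$

By Scheffé's lemma, $X_n\to X$ in distribution, and $\mathbb EX^2\le\liminf_n\mathbb EX_n^2=1$ by Fatou's Lemma. By lower semicontinuity of $D(\cdot\|Z_\sigma)$, we have

$$-h(X)+\tfrac12\log(2\pi\sigma^2)\le D(X\|Z_\sigma)\le\liminf_{n\to\infty}D(X_n\|Z_\sigma)=-\limsup_{n\to\infty}h(X_n)+\tfrac12\log(2\pi\sigma^2)+\frac{1}{2\sigma^2},$$

so $\limsup_nh(X_n)\le h(X)+\frac{1}{2\sigma^2}$ for every $\sigma>0$, that is, $\limsup_nh(X_n)\le h(X)$. On the other hand, letting $c=\sup_n\sup_xf_n(x)<\infty$ (so that also $f\le c$), we have

$$h(X_n)=\int f_n\log\frac{c}{f_n}\,dx-\log c,$$

and using Fatou's Lemma (the integrand is nonnegative),

$$\liminf_{n\to\infty}h(X_n)\ge\int f\log\frac cf\,dx-\log c=h(X).$$

Hence, $h(X_n)\to h(X)$ as $n\to\infty$.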

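**End of proof of Entropic CLT.** Assume $D(S_{n_0}\|Z)<\infty$ for some $n_0$. We will use $J(W)=I(W)-1$ to denote the normalized Fisher information of a random variable $W$ with unit variance, and write $S_n^{(t)}=e^{-t}S_n+\sqrt{1-e^{-2t}}\,Z'$ with $Z'\sim N(0,1)$ independent of $S_n$. For any $t>0$, we have that $J(S_n^{(t)})\le J(S_m^{(t)})$ for $n\ge m$: indeed, $S_n^{(t)}$ has the law of $\frac1{\sqrt n}\sum_{i=1}^nX_i^{(t)}$ with $X_i^{(t)}=e^{-t}X_i+\sqrt{1-e^{-2t}}Z_i$ i.i.d. with finite Fisher information, and (4) with $w_j=\frac1n$ gives $I(S_n)\le I(S_{n-1})$ for i.i.d. summands. So $J(S_n^{(t)})\downarrow J_\infty(t)$ for some $J_\infty(t)\ge0$ as $n\to\infty$, for every $t>0$. We want to show that $J_\infty(t)=0$ for almost every $t$, since then we will get by Lebesgue's dominated convergence theorem (with the dominating function $J(S_{n_0}^{(t)})$ for $n\ge n_0$, integrable because its integral is $D(S_{n_0}\|Z)<\infty$) that

$$D(S_n\|Z)=\int_0^\infty J(S_n^{(t)})\,dt\to0$$

as $n\to\infty$, where the equality is the de Bruijn identity (3). But since

$$D(S_n^{(t)}\|Z)=\int_t^\infty J(S_n^{(s)})\,ds\ge\int_t^\infty J_\infty(s)\,ds,$$

it is enough to show that $D(S_n^{(t)}\|Z)\to0$ as $n\to\infty$ for each $t>0$.

By the monotonicity property we have proved, we know that

$$D(S_n\|Z)=h(Z)-h(S_n)\le h(Z)-h(S_{n_0})=D(S_{n_0}\|Z)<\infty$$

for any $n\ge n_0$. By compactness of sublevel sets of $D(\cdot\|Z)$, the sequence $(S_n)$ must therefore have a subsequence $(S_{n_k})$ whose distribution converges to a probability measure (let us call $Y$ a random variable with this limiting measure as its distribution). For $t>0$, the smoothing caused by Gaussian convolution implies that the density of $S_{n_k}^{(t)}$ converges pointwise to that of $Y^{(t)}$, and also that the density of $S_{2n_k}^{(t)}$ converges pointwise to that of $\big(\frac{Y+Y'}{\sqrt2}\big)^{(t)}$, where $Y'$ is an independent copy of $Y$ (recall $S_{2n}\overset d=\frac{S_n+S_n'}{\sqrt2}$ with $S_n'$ an independent copy of $S_n$). All these densities are bounded by $(2\pi(1-e^{-2t}))^{-1/2}$. By the previous lemma

$$h(S_{n_k}^{(t)})\to h(Y^{(t)})$$

as $k\to\infty$, and

$$h(S_{2n_k}^{(t)})\to h\Big(\Big(\frac{Y+Y'}{\sqrt2}\Big)^{(t)}\Big)=h\Big(\frac{Y^{(t)}+Y'^{(t)}}{\sqrt2}\Big).$$

Since $h(S_n^{(t)})$ is nondecreasing in $n$ (Theorem EPI applied to the $X_i^{(t)}$), both limits coincide, so that necessarily

$$h\Big(\frac{Y^{(t)}+Y'^{(t)}}{\sqrt2}\Big)=h(Y^{(t)}).$$

By the equality condition in the entropy power inequality, this can only happen if $Y^{(t)}$ is Gaussian, which in turn implies (using that $Y$ has mean zero and unit variance, by uniform integrability of $(S_n^2)$) that $Y^{(t)}\sim N(0,1)$. Hence $h(S_n^{(t)})\uparrow h(Z)$, that is, $D(S_n^{(t)}\|Z)\to0$, for every $t>0$.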

*Lecture by Mokshay Madiman* | *Scribed by Cagin Ararat*