Lecture 4. Entropic CLT (1)
The subject of the next lectures will be the entropic central limit theorem (entropic CLT) and its proof.
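Theorem (Entropic CLT). Let $X_1, X_2, \ldots$ be i.i.d. real-valued random variables with mean zero and unit variance. Let
$$S_n := \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i.$$
If $h(S_n) > -\infty$ for some $n$, then $h(S_n) \uparrow h(N(0,1))$, or equivalently $D(\mathcal{L}(S_n) \,\|\, N(0,1)) \downarrow 0$. That is, the entropy of $S_n$ increases monotonically to that of the standard Gaussian.
Recall that $D(\mathcal{L}(X) \,\|\, \mathcal{L}(G)) = h(G) - h(X)$ when $G$ is Gaussian with the same mean and variance as $X$, which explains the equivalence stated in the theorem.
Let us note that the assumption $h(S_n) > -\infty$ for some $n$ represents a genuine (but not unexpected) restriction: in particular, it implies that the entropic CLT does not apply if the $X_i$ are discrete.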
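The monotone convergence in the theorem can be observed numerically. The following is a minimal sketch, assuming $X_i$ uniform on $[-\sqrt{3}, \sqrt{3}]$ (mean zero, unit variance) and approximating densities on a grid; the function name and grid step are illustrative choices, not part of the lecture. It uses the scaling rule $h(aX) = h(X) + \log|a|$ to pass from the sum $X_1 + \cdots + X_n$ to $S_n = (X_1 + \cdots + X_n)/\sqrt{n}$.

```python
import numpy as np

def entropies_of_normalized_sums(n_max=6, dx=0.005):
    """Approximate h(S_n), n = 1..n_max, for X_i uniform on [-sqrt(3), sqrt(3)]."""
    half_width = np.sqrt(3.0)                      # chosen so that Var(X_i) = 1
    m = int(round(2 * half_width / dx))            # number of grid cells
    base = np.full(m, 1.0 / (m * dx))              # piecewise-constant density of X_1
    dens = base.copy()                             # density of X_1 + ... + X_n
    result = []
    for n in range(1, n_max + 1):
        if n > 1:
            dens = np.convolve(dens, base) * dx    # density of the n-fold sum
        p = dens[dens > 0]
        h_sum = -np.sum(p * np.log(p)) * dx        # Riemann sum for h(X_1 + ... + X_n)
        result.append(h_sum - 0.5 * np.log(n))     # h(S_n) = h(sum) - log(sqrt(n))
    return result

h_values = entropies_of_normalized_sums()
h_gauss = 0.5 * np.log(2 * np.pi * np.e)           # h(N(0,1)), about 1.419
```

Up to discretization error of order `dx`, the entries of `h_values` should increase with $n$ and approach `h_gauss`.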
Entropy power inequality
Historically, the first result on monotonicity of entropy in the CLT was that $h(S_{2n}) \ge h(S_n)$ for all $n$. This follows directly from an important inequality for entropy, the entropy power inequality (EPI). The rest of this lecture and part of the next lecture will be devoted to proving the EPI. While the EPI does not suffice to establish the full entropic CLT, the same tools will prove to be crucial later on.
Entropy power inequality. Let $X_1$ and $X_2$ be independent real-valued random variables such that $h(X_1)$, $h(X_2)$, and $h(X_1+X_2)$ all exist. Then
$$e^{2h(X_1+X_2)} \ge e^{2h(X_1)} + e^{2h(X_2)},$$
with equality if and only if $X_1$ and $X_2$ are Gaussian.
Before we embark on the proof, let us make some remarks.
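Remark. None of the assumptions about existence of entropies is redundant: it can happen that $h(X_1)$ and $h(X_2)$ exist but $h(X_1+X_2)$ does not.
Remark. If $X_1$ and $X_2$ are i.i.d., $S_1 = X_1$, and $S_2 = (X_1+X_2)/\sqrt{2}$, then the EPI implies
$$2e^{2h(X_1)} \le e^{2h(X_1+X_2)} = e^{2h(\sqrt{2}\,S_2)} = 2e^{2h(S_2)},$$
which implies $h(S_1) \le h(S_2)$. Here we have used the easy-to-check equality $h(aX) = h(X) + \log|a|$, which of course implies $e^{2h(aX)} = a^2 e^{2h(X)}$. From this observation, the proof of the claim that $h(S_{2n}) \ge h(S_n)$ is immediate: simply note that $\sqrt{2}\,S_{2n}$ is the sum of two independent copies of $S_n$, so the same computation applies with $X_1, X_2$ replaced by these copies.
Remark. It is easy to check that $h(X_1+X_2) \ge h(X_1+X_2 \mid X_2) = h(X_1 \mid X_2) = h(X_1)$. In fact, this is true in much more general settings (e.g. on locally compact groups, with entropy defined relative to Haar measure). The EPI is a much stronger statement particular to real-valued random variables.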
Remark. The EPI admits the following multidimensional extension.
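Multidimensional EPI. Let $X_1$ and $X_2$ be independent $\mathbb{R}^n$-valued random vectors such that $h(X_1)$, $h(X_2)$, and $h(X_1+X_2)$ all exist. Then
$$e^{2h(X_1+X_2)/n} \ge e^{2h(X_1)/n} + e^{2h(X_2)/n},$$
with equality if and only if $X_1$ and $X_2$ are Gaussian with proportional covariance matrices.
Define the entropy power $N(X)$ for an $\mathbb{R}^n$-valued random vector $X$ by
$$N(X) := \exp\left(\frac{2h(X)}{n}\right).$$
The EPI says that $N$ is superadditive under convolution.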
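Superadditivity of the entropy power can be checked numerically in one dimension, where $N(X) = e^{2h(X)}$. The sketch below is a rough grid approximation under assumed test densities (two Gaussians, and two uniforms on $[-1,1]$); the helper names `entropy_power`, `epi_sides`, and `normalize` are hypothetical and introduced only for this illustration.

```python
import numpy as np

def normalize(density, dx):
    """Rescale a sampled density so that its Riemann sum equals 1."""
    return density / (np.sum(density) * dx)

def entropy_power(density, dx):
    """Riemann-sum approximation of N(X) = exp(2 h(X)) in dimension n = 1."""
    p = density[density > 0]
    h = -np.sum(p * np.log(p)) * dx
    return np.exp(2.0 * h)

def epi_sides(f1, f2, dx):
    """Return (N(X1 + X2), N(X1) + N(X2)) for independent X1, X2 with grid densities f1, f2."""
    f_sum = np.convolve(f1, f2) * dx               # density of X1 + X2
    return entropy_power(f_sum, dx), entropy_power(f1, dx) + entropy_power(f2, dx)

dx = 0.01
x = np.arange(-20.0, 20.0 + dx / 2, dx)

def gauss(sigma):
    return normalize(np.exp(-x**2 / (2 * sigma**2)), dx)

uniform = normalize(np.where(np.abs(x) <= 1.0, 1.0, 0.0), dx)   # uniform on [-1, 1]

lhs_g, rhs_g = epi_sides(gauss(1.0), gauss(2.0), dx)   # Gaussian case: both sides nearly equal
lhs_u, rhs_u = epi_sides(uniform, uniform, dx)         # uniform case: strict inequality
```

For the two uniforms the sum has a triangular density on $[-2,2]$ with entropy $\tfrac12 + \log 2$, so the left side should be close to $4e \approx 10.87$, against $4 + 4 = 8$ on the right.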
Digression: EPI and Brunn-Minkowski
A good way to develop an appreciation for what the EPI is saying is in analogy with the Brunn-Minkowski inequality. If $A, B \subset \mathbb{R}^n$ are Borel sets and $|\cdot|$ denotes $n$-dimensional Lebesgue measure, then
$$|A+B|^{1/n} \ge |A|^{1/n} + |B|^{1/n},$$
where $A+B := \{a+b : a \in A,\ b \in B\}$ is the Minkowski sum. In particular, note that $|A|^{1/n}$ is proportional, up to a constant depending only on $n$, to the radius of the $n$-dimensional Euclidean ball whose volume matches that of $A$. The Brunn-Minkowski inequality expresses superadditivity of this functional (and we clearly have equality for balls). The Brunn-Minkowski inequality is of fundamental importance in various areas of mathematics: for example, it implies the isoperimetric inequality in $\mathbb{R}^n$, which states that Euclidean balls with volume $V$ have the minimal surface area among all subsets of $\mathbb{R}^n$ with volume $V$.
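A simple case where the Brunn-Minkowski inequality $|A+B|^{1/n} \ge |A|^{1/n} + |B|^{1/n}$ can be evaluated exactly is that of axis-aligned boxes, since the Minkowski sum of two such boxes is again a box whose side lengths add. The sketch below assumes this special case only; the function name is illustrative.

```python
import math

def brunn_minkowski_sides(sides_a, sides_b):
    """Return (|A+B|^(1/n), |A|^(1/n) + |B|^(1/n)) for axis-aligned boxes A and B."""
    n = len(sides_a)
    vol_a = math.prod(sides_a)
    vol_b = math.prod(sides_b)
    vol_sum = math.prod(a + b for a, b in zip(sides_a, sides_b))  # A+B is again a box
    return vol_sum ** (1 / n), vol_a ** (1 / n) + vol_b ** (1 / n)

lhs, rhs = brunn_minkowski_sides([1.0, 4.0], [4.0, 1.0])      # lhs = 5, rhs = 4
lhs_h, rhs_h = brunn_minkowski_sides([1.0, 2.0], [3.0, 6.0])  # B = 3A: both sides agree
```

The second call uses homothetic boxes, for which the two sides coincide, just as they do for balls.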
In a sense, the EPI is to random variables as the Brunn-Minkowski inequality is to sets. The Gaussians play the role of the balls, and variance corresponds to radius. In one dimension, for example, since
$$h(N(0,\sigma^2)) = \tfrac{1}{2}\log(2\pi e\,\sigma^2), \qquad\text{that is,}\qquad e^{2h(N(0,\sigma^2))} = 2\pi e\,\sigma^2,$$
we see that $e^{2h(X)}$ is proportional to the variance of the Gaussian whose entropy matches that of $X$. The entropy power inequality expresses superadditivity of this functional, with equality for Gaussians.
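Proposition. The EPI is equivalent to the following statement: if $X_1$ and $X_2$ are independent and $Z_1$ and $Z_2$ are independent Gaussians with $h(Z_1) = h(X_1)$ and $h(Z_2) = h(X_2)$, then $h(X_1+X_2) \ge h(Z_1+Z_2)$, provided that all of the entropies exist.
Proof. Both implications follow from
$$e^{2h(X_1)} + e^{2h(X_2)} = e^{2h(Z_1)} + e^{2h(Z_2)} = e^{2h(Z_1+Z_2)},$$
where the last equality holds because $Z_1+Z_2$ is Gaussian with variance $\mathrm{Var}(Z_1) + \mathrm{Var}(Z_2)$ and the entropy power of a Gaussian is proportional to its variance.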
Proof of the entropy power inequality
There are many proofs of the EPI. It was stated by Shannon (1948) but first fully proven by Stam (1959); different proofs were later provided by Blachman (1965), Lieb (1978), and many others. We will follow a simplified version of Stam’s proof. We work from now on in the one-dimensional case for simplicity.
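Definition (Fisher information). Let $X$ be an $\mathbb{R}$-valued random variable whose density $f$ is an absolutely continuous function. The score function of $X$ is defined as
$$\rho(x) := \begin{cases} f'(x)/f(x) & \text{if } f(x) > 0 \text{ and } f \text{ is differentiable at } x, \\ 0 & \text{otherwise.} \end{cases}$$
The Fisher information (FI) of $X$ is defined as
$$I(X) := E[\rho(X)^2].$$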
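Remark. Let $\{P_\theta : \theta \in \Theta\}$ be a parametric statistical model with densities $f_\theta$. In statistics, the score function is usually defined by $\rho(\theta,x) := \frac{\partial}{\partial\theta}\log f_\theta(x)$, and the Fisher information by $I(\theta) := E_\theta[\rho(\theta,X)^2]$. This reduces to our definition in the special case of location families, where $f_\theta(x) = f(x-\theta)$ for some probability density $f$: in this case, $\rho(\theta,x) = -f'(x-\theta)/f(x-\theta)$ and we have
$$I(\theta) = \int \left[\frac{f'(x-\theta)}{f(x-\theta)}\right]^2 f(x-\theta)\,dx = \int \left[\frac{f'(x)}{f(x)}\right]^2 f(x)\,dx.$$
Thus for location families $I(\theta)$ does not depend on $\theta$ and coincides precisely with the Fisher information $I(Z)$ as we defined it above for a random variable $Z$ with density $f$. The statistical interpretation allows us to derive a useful inequality. Suppose for simplicity that $E[Z] = 0$. Then $E_\theta[X] = \theta$ for every $\theta$, so $X$ is an unbiased estimator of $\theta$. The Cramér-Rao bound therefore implies the inequality
$$I(Z) \ge \frac{1}{\mathrm{Var}(Z)},$$
with equality if and only if $Z$ is Gaussian. The same conclusion holds when $E[Z]$ is arbitrary, as both FI and variance are invariant under translation.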
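The Cramér-Rao inequality $I(Z) \ge 1/\mathrm{Var}(Z)$ can be illustrated numerically, using $I(Z) = \int f'(x)^2/f(x)\,dx$ for a density $f$. The following is a minimal sketch, assuming a finite-difference derivative on a grid and two assumed test densities: $N(0,4)$, where equality is expected, and the standard logistic density, for which $I = 1/3$ and $\mathrm{Var} = \pi^2/3$. The function name is hypothetical.

```python
import numpy as np

def fisher_information_and_variance(f, x):
    """Approximate I = int f'(x)^2 / f(x) dx and the variance for a density f sampled on the grid x."""
    dx = x[1] - x[0]
    df = np.gradient(f, dx)                        # finite-difference approximation of f'
    safe_f = np.where(f > 0, f, 1.0)               # avoid division by zero where f vanishes
    fisher = np.sum(np.where(f > 0, df**2 / safe_f, 0.0)) * dx
    mean = np.sum(x * f) * dx
    var = np.sum((x - mean) ** 2 * f) * dx
    return fisher, var

x = np.arange(-40.0, 40.0 + 0.005, 0.01)
gaussian = np.exp(-x**2 / 8.0) / (2.0 * np.sqrt(2 * np.pi))   # density of N(0, 4)
logistic = 0.25 / np.cosh(x / 2.0) ** 2                       # standard logistic density

i_gauss, var_gauss = fisher_information_and_variance(gaussian, x)
i_logis, var_logis = fisher_information_and_variance(logistic, x)
```

The product `i_gauss * var_gauss` should be close to 1, while `i_logis * var_logis` should be close to $\pi^2/9 \approx 1.097 > 1$.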
Remark. There is a Fisher information analogue of the entropic CLT: in the setup of the entropic CLT, subject to an appropriate finiteness assumption ($I(S_n) < \infty$ for some $n$), we have $I(S_n) \downarrow I(N(0,1)) = 1$. Moreover, among random variables with a given variance, Fisher information is minimized by Gaussians. This is often stated in terms of normalized Fisher information, defined as $J(X) := \mathrm{Var}(X)\,I(X) - 1$. Note that $J$ is both translation and scale invariant: $J(X+a) = J(X)$ for all $a$ and $J(aX) = J(X)$ for $a \ne 0$. We have $J(X) \ge 0$, with equality if and only if $X$ is Gaussian, by the previous remark. The Fisher information analogue of the entropic CLT can now be restated as $J(S_n) \downarrow 0$.
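The strategy of the proof of the EPI is as follows:
- (1) Prove an inequality for Fisher information: for independent $X_1$ and $X_2$,
$$\frac{1}{I(X_1+X_2)} \ge \frac{1}{I(X_1)} + \frac{1}{I(X_2)}.$$
- (2) Develop an integral identity relating $I$ and $h$.
- (3) Combine (1) and (2) to get the EPI.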
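The Fisher information inequality in the first step, $1/I(X_1+X_2) \ge 1/I(X_1) + 1/I(X_2)$ for independent $X_1, X_2$, can be explored numerically before it is proved. The sketch below is a rough grid approximation, assuming $I(X) = \int f'^2/f$ computed by finite differences and a discrete convolution for the density of the sum; the helper names and test densities are assumptions made for illustration.

```python
import numpy as np

def fisher_information(f, dx):
    """Approximate I = int f'(x)^2 / f(x) dx for a density sampled with grid step dx."""
    df = np.gradient(f, dx)
    safe_f = np.where(f > 0, f, 1.0)               # avoid division by zero where f vanishes
    return np.sum(np.where(f > 0, df**2 / safe_f, 0.0)) * dx

def inverse_fisher_sides(f1, f2, dx):
    """Return (1/I(X1+X2), 1/I(X1) + 1/I(X2)) for independent X1, X2 with grid densities f1, f2."""
    f_sum = np.convolve(f1, f2) * dx               # density of X1 + X2
    lhs = 1.0 / fisher_information(f_sum, dx)
    rhs = 1.0 / fisher_information(f1, dx) + 1.0 / fisher_information(f2, dx)
    return lhs, rhs

dx = 0.02
x = np.arange(-40.0, 40.0 + dx / 2, dx)
gauss_1 = np.exp(-x**2 / 2.0) / np.sqrt(2 * np.pi)             # N(0, 1), I = 1
gauss_2 = np.exp(-x**2 / 4.5) / (1.5 * np.sqrt(2 * np.pi))     # N(0, 2.25), I = 1/2.25
logistic = 0.25 / np.cosh(x / 2.0) ** 2                        # standard logistic, I = 1/3

lhs_g, rhs_g = inverse_fisher_sides(gauss_1, gauss_2, dx)      # Gaussian case: near equality
lhs_l, rhs_l = inverse_fisher_sides(logistic, logistic, dx)    # logistic case: strict inequality
```

In the Gaussian case both sides should be close to $1 + 2.25 = 3.25$; in the logistic case the right side is close to 6 and the left side should be strictly larger.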
The reason to concentrate on Fisher information is that $I$ is an $L^2$-type quantity, as opposed to the entropy, which is an $L \log L$-type quantity. This makes $I$ easier to work with.
We begin with some technical results about Fisher information.
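Lemma. If $X$ has absolutely continuous density $f$ and $I(X) < \infty$, then $f$ has bounded variation. In particular, $f$ is bounded.
Proof. Let $D_f$ denote the set of points at which $f$ is differentiable, so that $|D_f^c| = 0$. Define $f'(x) := 0$ for $x \notin D_f$. Let $S = \{x \in \mathbb{R} : f(x) > 0\}$. Then
$$\int_{\mathbb{R}} |f'(x)|\,dx = \int_S |f'(x)|\,dx = \int_S |\rho(x)|\,f(x)\,dx = E|\rho(X)| \le \sqrt{E[\rho(X)^2]} = \sqrt{I(X)} < \infty.$$
The first equality holds because $f' = 0$ at every point of differentiability where $f$ vanishes (such a point is a minimum of $f \ge 0$), and the inequality is Cauchy-Schwarz. Since $f$ is integrable and has bounded variation, it tends to zero at $\pm\infty$, so that $\sup_x f(x) \le \int |f'(x)|\,dx < \infty$.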
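As a sanity check of the bound $\int |f'(x)|\,dx \le \sqrt{I(X)}$ behind this lemma, consider the Gaussian case, where every quantity is explicit. The computation below assumes $X \sim N(0,\sigma^2)$ and uses the standard fact $I(X) = 1/\sigma^2$; since $f$ increases up to $0$ and decreases afterwards, its total variation is $2f(0)$.

```latex
% X ~ N(0, sigma^2) with density f(x) = exp(-x^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
\int_{\mathbb{R}} |f'(x)|\,dx
  = 2 f(0)
  = \frac{2}{\sigma\sqrt{2\pi}}
  = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}}
  \approx \frac{0.80}{\sigma}
  \;\le\; \frac{1}{\sigma}
  = \sqrt{I(X)} .
```

The bound is therefore not sharp even for Gaussians, which is consistent with its role here as a purely qualitative regularity statement.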
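Lemma. Let $X$ have an absolutely continuous density $f$ with $I(X) < \infty$, and let $w : \mathbb{R} \to \mathbb{R}$ be measurable. Let $\eta : [a,b] \to [0,\infty]$ be defined as $\eta(u) = E[w(u+X)^2]$. If $\eta$ is bounded, then $g : [a,b] \to \mathbb{R}$ defined by $g(u) = E[w(u+X)]$ is absolutely continuous on $(a,b)$ with $g'(u) = -E[w(u+X)\rho(X)]$ a.e.
Proof. Let $\tilde g(u) := -E[w(u+X)\rho(X)]$; we want to show that $g$ is absolutely continuous with derivative $\tilde g$. First observe that $\tilde g$ is integrable on $[a,b]$ from the Cauchy-Schwarz inequality and the boundedness condition, since $|\tilde g(u)| \le \sqrt{\eta(u)\,I(X)}$. Thus we can apply Fubini's theorem to compute
$$\int_a^b \tilde g(u)\,du = -\int_a^b \int w(u+x)\,f'(x)\,dx\,du = -\int w(y) \int_a^b f'(y-u)\,du\,dy = \int w(y)\,[f(y-b) - f(y-a)]\,dy = g(b) - g(a).$$
The same computation applies with $[a,b]$ replaced by any subinterval, which proves the desired result.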
Corollary. $E[\rho(X)] = 0$ and, if $E[X^2] < \infty$, then $E[X\rho(X)] = -1$.
Proof. Take $w \equiv 1$ in the lemma above for the first claim. Take $w(x) = x$ for the second.
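Both identities of the corollary, $E[\rho(X)] = 0$ and $E[X\rho(X)] = -1$, are easy to confirm numerically for a specific density. The sketch below assumes the standard logistic density $f(x) = \tfrac14 \operatorname{sech}^2(x/2)$, whose score function $\rho(x) = f'(x)/f(x) = -\tanh(x/2)$ is available in closed form, and approximates the expectations by Riemann sums on a grid.

```python
import numpy as np

x = np.arange(-40.0, 40.0 + 0.005, 0.01)
dx = x[1] - x[0]
f = 0.25 / np.cosh(x / 2.0) ** 2           # standard logistic density
rho = -np.tanh(x / 2.0)                    # its score function f'/f, computed by hand

mean_score = np.sum(rho * f) * dx          # approximates E[rho(X)], expected near 0
mean_x_score = np.sum(x * rho * f) * dx    # approximates E[X rho(X)], expected near -1
```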
To be continued…
Lecture by Mokshay Madiman | Scribed by Dan Lacker