Lecture 5. Entropic CLT (2)
The goal of this lecture is to prove monotonicity of Fisher information in the central limit theorem. Next lecture we will connect Fisher information to entropy, completing the proof of the entropic CLT.
Two lemmas about the score function
Recall that for a random variable $X$ with absolutely continuous density $f$, the score function is defined as $\rho(x)=\rho_X(x)=f'(x)/f(x)$ and the Fisher information is $J(X)=\mathbb{E}[\rho_X(X)^2]$.
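As a quick numerical sanity check of these definitions, here is a minimal Python sketch. It approximates the score by a central finite difference of the density and then computes $J(X)=\mathbb{E}[\rho(X)^2]$ with a midpoint Riemann sum. The Gaussian test case (for which $J=1/\sigma^2$), the truncation interval and the step sizes are illustrative assumptions, not part of the lecture.

```python
import math

def gaussian_density(x, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def score(f, x, h=1e-5):
    """Score rho(x) = f'(x) / f(x), with f' approximated by a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h) / f(x)

def fisher_information(f, lo=-30.0, hi=30.0, steps=60_000):
    """Midpoint-rule approximation of J = int rho(x)^2 f(x) dx, truncated to [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += score(f, x) ** 2 * f(x)
    return total * dx

sigma = 2.0
J = fisher_information(lambda x: gaussian_density(x, sigma))
print(J)  # should be close to 1 / sigma**2 = 0.25
```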
The following lemma was proved in the previous lecture.
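Lemma 1. Let $X$ be a random variable with finite Fisher information. Let $h:\mathbb{R}\to\mathbb{R}$ be measurable and let $\eta(a)=\mathbb{E}[h(X+a)]$. If $a\mapsto\mathbb{E}[h(X+a)^2]$ is bounded on the interval $[a_0,a_1]$, then the function $\eta$ is absolutely continuous on $[a_0,a_1]$ with $\eta'(a)=-\mathbb{E}[h(X+a)\rho(X)]$ a.e.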
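The identity $\eta'(a)=-\mathbb{E}[h(X+a)\rho(X)]$ can be checked numerically in a simple case. The sketch below assumes $X\sim N(0,1)$, whose score is $\rho(x)=-x$, and uses the bounded test function $h=\cos$. For this choice $\eta(a)=e^{-1/2}\cos a$, so both sides should equal $-e^{-1/2}\sin a$. The quadrature routine and step sizes are illustrative.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def expect(g, lo=-12.0, hi=12.0, steps=24_000):
    """Midpoint-rule approximation of E[g(X)] for X ~ N(0, 1), truncated to [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += g(x) * std_normal_pdf(x)
    return total * dx

def rho(x):
    return -x  # score of N(0, 1): f'(x) / f(x) = -x

h = math.cos  # bounded measurable test function (illustrative choice)

def eta(a):
    return expect(lambda x: h(x + a))

a, da = 0.7, 1e-4
lhs = (eta(a + da) - eta(a - da)) / (2 * da)  # numerical derivative of eta at a
rhs = -expect(lambda x: h(x + a) * rho(x))    # -E[h(X + a) rho(X)]
exact = -math.exp(-0.5) * math.sin(a)
print(lhs, rhs, exact)
```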
There is a converse to the above lemma, which gives a useful characterization of the score function.
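Lemma 2. Let $X$ be a random variable with density $f$ and let $\tilde\rho:\mathbb{R}\to\mathbb{R}$ be a measurable function with $\mathbb{E}|\tilde\rho(X)|<\infty$. Suppose that for every bounded measurable function $h$, the function $\eta(a)=\mathbb{E}[h(X+a)]$ is locally absolutely continuous on $\mathbb{R}$ with $\eta'(a)=-\mathbb{E}[h(X+a)\tilde\rho(X)]$ a.e. Then there must exist an absolutely continuous version of the density $f$, and moreover $\tilde\rho(X)=\rho(X)$ a.s.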
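Proof. Fix $t\in\mathbb{R}$ and take $h=\mathbf{1}_{(-\infty,t]}$, so that $\eta(a)=F(t-a)$, where $F(x)=\mathbb{P}[X\le x]$. Define
$$\tilde f(x)=\mathbb{E}[\tilde\rho(X)\mathbf{1}_{\{X\le x\}}]=\int_{-\infty}^x\tilde\rho(y)f(y)\,dy.$$
As $\mathbb{E}|\tilde\rho(X)|<\infty$, the function $\tilde f$ is absolutely continuous (in particular continuous) with $\tilde f'=\tilde\rho f$ a.e. By our assumption, $\eta'(a)=-\tilde f(t-a)$ a.e., so that
$$F(t-a)=F(t)-\int_0^a\tilde f(t-b)\,db=F(t)-\int_{t-a}^{t}\tilde f(u)\,du\quad\text{for all }a\in\mathbb{R}.$$
Hence $F$ is in fact continuously differentiable with derivative $F'=\tilde f$. On the other hand, $F(x)=\int_{-\infty}^xf(y)\,dy$, so $\tilde f=f$ a.e., and $\tilde f$ is an absolutely continuous version of the density of $X$. Replacing $f$ by this version, we have
$$f'(x)=\tilde\rho(x)f(x)\quad\text{for a.e. }x.$$
Hence $\rho=f'/f=\tilde\rho$ a.e. on $\{f>0\}$. Since $\mathbb{P}[f(X)>0]=1$, we conclude that $\tilde\rho(X)=\rho(X)$ a.s., and the proof is complete.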
Score function of the sum of independent variables
We now show a key property of the score function: the score function of the sum of two independent random variables is a projection of the score function of a summand.
Proposition. Let $X$ and $Y$ be two independent random variables. Suppose $X$ has finite Fisher information. Then $\rho_{X+Y}(X+Y)=\mathbb{E}[\rho_X(X)\mid X+Y]$ a.s.
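Proof. Let $\tilde\rho$ be a measurable function such that $\tilde\rho(X+Y)=\mathbb{E}[\rho_X(X)\mid X+Y]$ a.s.; note that $\mathbb{E}|\tilde\rho(X+Y)|\le\mathbb{E}|\rho_X(X)|\le\sqrt{J(X)}<\infty$. By Lemma 2 (applied to $X+Y$, which has a density since $X$ does), we only need to show that the function $\eta(a)=\mathbb{E}[h(X+Y+a)]$ is locally absolutely continuous with $\eta'(a)=-\mathbb{E}[h(X+Y+a)\tilde\rho(X+Y)]$ a.e. for every bounded measurable function $h$.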
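Fix $y\in\mathbb{R}$. By independence of $X$ and $Y$, we can apply Lemma 1 to $X$ with the bounded function $x\mapsto h(x+y)$ (that is, to $X+Y$ conditioned on $Y=y$) to obtain
$$\mathbb{E}[h(X+y+a)]-\mathbb{E}[h(X+y)]=-\int_0^a\mathbb{E}[h(X+y+b)\rho_X(X)]\,db.$$
Taking expectation of both sides with respect to $Y$ and applying Fubini (justified since $h$ is bounded and $\mathbb{E}|\rho_X(X)|<\infty$), we get
$$\mathbb{E}[h(X+Y+a)]-\mathbb{E}[h(X+Y)]=-\int_0^a\mathbb{E}[h(X+Y+b)\rho_X(X)]\,db.$$
Noting that, by the tower property of conditional expectation,
$$\mathbb{E}[h(X+Y+b)\rho_X(X)]=\mathbb{E}\big[h(X+Y+b)\,\mathbb{E}[\rho_X(X)\mid X+Y]\big]=\mathbb{E}[h(X+Y+b)\tilde\rho(X+Y)],$$
we arrive at
$$\eta(a)-\eta(0)=-\int_0^a\mathbb{E}[h(X+Y+b)\tilde\rho(X+Y)]\,db,$$
which implies that $\eta$ is locally absolutely continuous with $\eta'(a)=-\mathbb{E}[h(X+Y+a)\tilde\rho(X+Y)]$ a.e.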
Remark. The well-known interpretation of conditional expectation as an orthogonal projection means that, under the assumption of finite Fisher information (i.e., score functions in $L^2$), the score function of the sum is just the projection of the score of a summand onto the closed subspace of $L^2$ consisting of square-integrable functions of $X+Y$. Since a projection does not increase the norm (by the Pythagorean theorem), this directly implies that convolution decreases Fisher information: $J(X+Y)\le J(X)$. In fact, we can do better, as we will see forthwith.
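The projection property can be checked numerically when everything is explicit. The following Python sketch is an illustration under assumptions not made in the lecture. $X$ is the symmetric Gaussian mixture $\frac12N(-m,s^2)+\frac12N(m,s^2)$ and $Y\sim N(0,t^2)$ is independent of $X$, so $X+Y$ is the same kind of mixture with variance $s^2+t^2$. The conditional density of $X$ given $X+Y=z$ is $f_X(x)f_Y(z-x)/f_{X+Y}(z)$, and $\rho_Xf_X=f_X'$. Hence $\mathbb{E}[\rho_X(X)\mid X+Y=z]=\int f_X'(x)f_Y(z-x)\,dx/f_{X+Y}(z)$. The code compares this with the score of $X+Y$ at a few points and also checks $J(X+Y)\le J(X)$. The parameter values and the quadrature grid are arbitrary choices.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mix_pdf(x, m, var):
    """Density of the mixture (1/2) N(-m, var) + (1/2) N(m, var)."""
    return 0.5 * (normal_pdf(x, -m, var) + normal_pdf(x, m, var))

def mix_pdf_deriv(x, m, var):
    """Derivative of mix_pdf with respect to x."""
    return 0.5 * (-(x + m) / var * normal_pdf(x, -m, var)
                  - (x - m) / var * normal_pdf(x, m, var))

def integrate(g, lo=-15.0, hi=15.0, steps=20_000):
    """Midpoint rule on [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(g(lo + (k + 0.5) * dx) for k in range(steps)) * dx

m, s2, t2 = 1.5, 0.5, 0.8  # X ~ mixture(m, s2), Y ~ N(0, t2), X + Y ~ mixture(m, s2 + t2)

def projected_score(z):
    """E[rho_X(X) | X + Y = z] = int f_X'(x) f_Y(z - x) dx / f_{X+Y}(z)."""
    num = integrate(lambda x: mix_pdf_deriv(x, m, s2) * normal_pdf(z - x, 0.0, t2))
    return num / mix_pdf(z, m, s2 + t2)

def score_sum(z):
    """rho_{X+Y}(z), computed from the explicit density of X + Y."""
    return mix_pdf_deriv(z, m, s2 + t2) / mix_pdf(z, m, s2 + t2)

def fisher(var):
    """Fisher information of mixture(m, var): int f'(x)^2 / f(x) dx."""
    return integrate(lambda x: mix_pdf_deriv(x, m, var) ** 2 / mix_pdf(x, m, var))

for z in (-2.0, -0.3, 0.0, 1.0, 2.5):
    print(z, projected_score(z), score_sum(z))
print("J(X) =", fisher(s2), " J(X+Y) =", fisher(s2 + t2))
```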
Monotonicity of FI in the CLT along a subsequence
We now make a first step towards proving monotonicity of the Fisher information in the CLT.
Theorem. Let $X$ and $Y$ be independent random variables both with finite Fisher information. Then
$$J(X+Y)\le\beta^2J(X)+(1-\beta)^2J(Y)$$
for any $\beta\in[0,1]$.
Before going into the proof, we make some remarks.
Remark. Taking $\beta=1$, we get $J(X+Y)\le J(X)$. Hence the above theorem is a significant strengthening of the simple fact that convolution decreases Fisher information.
Remark. By taking $\beta=J(Y)/(J(X)+J(Y))$, we get the following optimized version of the theorem:
$$\frac{1}{J(X+Y)}\ge\frac{1}{J(X)}+\frac{1}{J(Y)}.$$
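The following Python sketch checks the theorem and its optimized version numerically. It assumes an illustrative setup: $X=\frac12N(-m,s^2)+\frac12N(m,s^2)$ is a Gaussian mixture and $Y\sim N(0,t^2)$ is independent of $X$, so that $J(Y)=1/t^2$ and $X+Y$ is again a mixture with variance $s^2+t^2$. The optimized inequality is often called Stam's inequality. For Gaussian $X$ (the case $m=0$ below) it holds with equality. The parameters and grid are arbitrary choices.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fisher_mixture(m, var, lo=-15.0, hi=15.0, steps=30_000):
    """J of (1/2) N(-m, var) + (1/2) N(m, var), via the midpoint rule for int f'^2 / f."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p_minus, p_plus = normal_pdf(x, -m, var), normal_pdf(x, m, var)
        f = 0.5 * (p_minus + p_plus)
        fp = 0.5 * (-(x + m) * p_minus - (x - m) * p_plus) / var
        total += fp * fp / f
    return total * dx

m, s2, t2 = 1.5, 0.5, 0.8
J_X = fisher_mixture(m, s2)          # non-Gaussian X
J_Y = 1.0 / t2                       # Y ~ N(0, t2)
J_sum = fisher_mixture(m, s2 + t2)   # X + Y is again a symmetric mixture

# The theorem: J(X+Y) <= beta^2 J(X) + (1 - beta)^2 J(Y) for each beta in [0, 1].
bounds = [b * b * J_X + (1 - b) ** 2 * J_Y for b in (k / 20 for k in range(21))]

# The optimized beta turns the bound into J(X) J(Y) / (J(X) + J(Y)).
beta_star = J_Y / (J_X + J_Y)
best_bound = beta_star ** 2 * J_X + (1 - beta_star) ** 2 * J_Y
stam_gap = 1.0 / J_sum - (1.0 / J_X + 1.0 / J_Y)  # >= 0 by the optimized inequality

# Gaussian X (m = 0): the optimized inequality is an equality.
gauss_gap = 1.0 / fisher_mixture(0.0, s2 + t2) - (1.0 / fisher_mixture(0.0, s2) + t2)

print(J_sum, min(bounds), best_bound)
print(stam_gap, gauss_gap)
```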
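Remark. Let $X_1,X_2,\ldots$ be an i.i.d. sequence of random variables with mean zero and unit variance, and let $S_n=\frac{1}{\sqrt n}\sum_{i=1}^nX_i$. By taking $X=\sum_{i=1}^nX_i$, $Y=\sum_{i=n+1}^{2n}X_i$ and $\beta=\frac12$ in the theorem, and using the scaling property $J(aX)=J(X)/a^2$ of Fisher information, it is easy to obtain $J(S_{2n})\le J(S_n)$. Hence the above theorem already implies monotonicity of Fisher information along the subsequence of times $1,2,4,8,\ldots$: that is, $J(S_{2^k})$ is monotone (non-increasing) in $k$.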
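For readers who want the computation behind $J(S_{2n})\le J(S_n)$ spelled out, here is one way to write it. It assumes $J(X_1+\cdots+X_n)<\infty$, as the theorem requires, and uses $J(aZ)=J(Z)/a^2$ for $a\neq0$. The second step uses that the two blocks of $n$ summands have the same distribution.

```latex
\begin{align*}
J\big(\sqrt{2n}\,S_{2n}\big)
  &= J\Big(\sum_{i=1}^{n}X_i+\sum_{i=n+1}^{2n}X_i\Big)\\
  &\le \tfrac14\,J\Big(\sum_{i=1}^{n}X_i\Big)+\tfrac14\,J\Big(\sum_{i=n+1}^{2n}X_i\Big)
      && \text{(theorem with } \beta=\tfrac12\text{)}\\
  &= \tfrac12\,J\big(\sqrt{n}\,S_n\big)
      && \text{(the two blocks are identically distributed)}\\
  &= \tfrac{1}{2n}\,J(S_n)
      && \text{(scaling)}.
\end{align*}
```

By the same scaling property, the left-hand side equals $\frac{1}{2n}J(S_{2n})$, which gives $J(S_{2n})\le J(S_n)$.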
However, the theorem is not strong enough to give monotonicity without passing to a subsequence. For example, if we apply the optimized version of the theorem repeatedly, we only get $J(S_n)\le J(S_1)$, which is not very interesting. To prove full monotonicity of the Fisher information, we will need a strengthening of the above theorem. But it is instructive to first consider the proof of the simpler case.
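Proof. By the projection property of the score function,
$$\rho_{X+Y}(X+Y)=\mathbb{E}[\rho_X(X)\mid X+Y]=\mathbb{E}[\rho_Y(Y)\mid X+Y],$$
so that
$$\rho_{X+Y}(X+Y)=\mathbb{E}[\beta\rho_X(X)+(1-\beta)\rho_Y(Y)\mid X+Y].$$
Applying the conditional Jensen inequality, we obtain
$$\rho_{X+Y}(X+Y)^2\le\mathbb{E}\big[(\beta\rho_X(X)+(1-\beta)\rho_Y(Y))^2\,\big|\,X+Y\big].$$
Taking expectations of both sides and using independence, we obtain
$$J(X+Y)\le\beta^2J(X)+(1-\beta)^2J(Y)+2\beta(1-\beta)\,\mathbb{E}[\rho_X(X)]\,\mathbb{E}[\rho_Y(Y)].$$
By Lemma 1 (with $h\equiv1$, so that $\eta$ is constant), $\mathbb{E}[\rho_X(X)]=\mathbb{E}[\rho_Y(Y)]=0$. This finishes the proof.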
Monotonicity of Fisher information in the CLT
To prove monotonicity of the Fisher information in the CLT (without passing to a subsequence) we need a strengthening of the property of Fisher information given in the previous section.
In the following, we will use the common notation $[n]=\{1,\ldots,n\}$.
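Definition. Let $\mathcal{G}$ be a collection of non-empty subsets of $[n]$. Then a collection $\{\beta_s:s\in\mathcal{G}\}$ of non-negative numbers is called a fractional packing for $\mathcal{G}$ if $\sum_{s\in\mathcal{G}:\,s\ni i}\beta_s\le1$ for all $i\in[n]$.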
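The definition is easy to test mechanically. The Python sketch below checks the two conditions with exact rational arithmetic. The function name `is_fractional_packing` and the dictionary representation (a `frozenset` of indices mapped to its weight) are hypothetical choices made for illustration. The three examples for $n=4$ are: the leave-one-out sets with weight $\frac1{n-1}$, the singletons with weight $1$, and all pairs with weight $\frac12$. In the last case each index lies in three pairs, so the condition fails.

```python
from fractions import Fraction
from itertools import combinations

def is_fractional_packing(n, beta):
    """beta maps non-empty frozensets s of {1, ..., n} to weights beta_s.
    Returns True iff all weights are non-negative and, for every i in [n],
    the weights of the sets containing i sum to at most 1."""
    ground = set(range(1, n + 1))
    if any(not s or not s <= ground for s in beta):
        raise ValueError("every set must be a non-empty subset of [n]")
    if any(b < 0 for b in beta.values()):
        return False
    return all(sum(b for s, b in beta.items() if i in s) <= 1 for i in ground)

n = 4
leave_one_out = {frozenset(set(range(1, n + 1)) - {i}): Fraction(1, n - 1)
                 for i in range(1, n + 1)}
singletons = {frozenset({i}): Fraction(1) for i in range(1, n + 1)}
pairs = {frozenset(c): Fraction(1, 2) for c in combinations(range(1, n + 1), 2)}

print(is_fractional_packing(n, leave_one_out))  # True: each i lies in n - 1 sets of weight 1/(n - 1)
print(is_fractional_packing(n, singletons))     # True
print(is_fractional_packing(n, pairs))          # False: each i lies in 3 pairs, total 3/2
```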
We can now state the desired strengthening of our earlier theorem.
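Theorem. Let $\{\beta_s:s\in\mathcal{G}\}$ be a fractional packing for $\mathcal{G}$ and let $X_1,\ldots,X_n$ be independent random variables. Let $\{w_s:s\in\mathcal{G}\}$ be a collection of real numbers such that $\sum_{s\in\mathcal{G}}w_s=1$. Then
$$J(X_1+\cdots+X_n)\le\sum_{s\in\mathcal{G}}\frac{w_s^2}{\beta_s}\,J\Big(\sum_{i\in s}X_i\Big).$$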
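Remark. Suppose that the $X_i$ are identically distributed. Take $\mathcal{G}=\{[n]\setminus\{j\}:j\in[n]\}$. For each $s\in\mathcal{G}$, define $\beta_s=\frac{1}{n-1}$ and $w_s=\frac1n$. It is easy to check that $\{\beta_s\}$ is a fractional packing of $\mathcal{G}$, since each $i\in[n]$ belongs to exactly $n-1$ of the sets in $\mathcal{G}$. Then
$$J(X_1+\cdots+X_n)\le\sum_{s\in\mathcal{G}}\frac{(1/n)^2}{1/(n-1)}\,J\Big(\sum_{i\in s}X_i\Big)=\frac{n-1}{n}\,J(X_1+\cdots+X_{n-1})$$
by the above theorem. By the scaling property of Fisher information, this is equivalent to $J(S_n)\le J(S_{n-1})$, i.e., monotonicity of the Fisher information. This special case was first proved by Artstein, Ball, Barthe and Naor (2004) with a more complicated proof. The proof we will follow of the more general theorem above is due to Barron and Madiman (2007).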
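Monotonicity can be observed numerically in a case where $S_n$ has an explicit density. The Python sketch below assumes, purely for illustration, that $X_i=m\varepsilon_i+G_i$ with $\varepsilon_i=\pm1$ equally likely, $G_i\sim N(0,s^2)$, and all these variables independent. Then $S_n$ is a mixture of the Gaussians $N\big(m(2k-n)/\sqrt n,\,s^2\big)$ with $\mathrm{Binomial}(n,\frac12)$ weights. The variance is $s^2+m^2$ rather than $1$, which only rescales $J$ and does not affect monotonicity. The code computes $J(S_n)$ for $n=1,\ldots,8$ by quadrature. The values should be non-increasing and, by the Cramér-Rao bound $J(Z)\ge1/\mathrm{Var}(Z)$, stay above $1/(s^2+m^2)$.

```python
import math

def fisher_sn(n, m, s2, lo=-15.0, hi=15.0, steps=12_000):
    """J(S_n) = int f'(x)^2 / f(x) dx, where f is the mixture density of S_n
    (components N(m(2k - n)/sqrt(n), s2) with Binomial(n, 1/2) weights)."""
    comps = [(math.comb(n, k) / 2 ** n, m * (2 * k - n) / math.sqrt(n)) for k in range(n + 1)]
    dx = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x = lo + (j + 0.5) * dx
        f = fp = 0.0
        for w, mu in comps:
            p = w * math.exp(-(x - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
            f += p
            fp -= (x - mu) / s2 * p  # derivative of this weighted Gaussian component
        total += fp * fp / f
    return total * dx

m, s2 = 1.5, 0.5  # Var(X_i) = s2 + m^2 = 2.75 (illustrative values)
J = [fisher_sn(n, m, s2) for n in range(1, 9)]
for n, val in enumerate(J, start=1):
    print(n, round(val, 6))
```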
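The proof of the above theorem is based on an analysis of variance (ANOVA) type decomposition, which dates back at least to the classic paper of Hoeffding (1948) on U-statistics. To state this decomposition, let $X_1,\ldots,X_n$ be independent random variables, and define the Hilbert space
$$H=\{\phi:\mathbb{R}^n\to\mathbb{R}\text{ measurable}:\ \mathbb{E}[\phi(X_1,\ldots,X_n)^2]<\infty\}$$
(functions that agree a.s. are identified) with inner product
$$\langle\phi,\psi\rangle=\mathbb{E}[\phi(X_1,\ldots,X_n)\,\psi(X_1,\ldots,X_n)].$$
For every $j\in[n]$, define an operator $E_j$ on $H$ as
$$(E_j\phi)(x_1,\ldots,x_n)=\mathbb{E}[\phi(x_1,\ldots,x_{j-1},X_j,x_{j+1},\ldots,x_n)],$$
that is, $E_j$ integrates out the $j$-th coordinate.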
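Proposition. Each $\phi\in H$ can be decomposed as
$$\phi=\sum_{T\subseteq[n]}\phi_T,\qquad\phi_T:=\Big(\prod_{j\notin T}E_j\prod_{j\in T}(I-E_j)\Big)\phi,$$
where
- $\langle\phi_T,\phi_S\rangle=0$ if $T\ne S$;
- $E_j\phi_T=0$ if $j\in T$;
- $\phi_T$ does not depend on $x_j$ if $j\notin T$.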
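Proof. It is easy to verify that each $E_j$ is a projection operator, that is, $E_j^2=E_j$ and $\langle E_j\phi,\psi\rangle=\langle\phi,E_j\psi\rangle$ (self-adjointness). It is also easy to see, by Fubini, that $E_jE_k=E_kE_j$ for $j\ne k$. Hence we have
$$\phi=\prod_{j=1}^n\big(E_j+(I-E_j)\big)\phi=\sum_{T\subseteq[n]}\Big(\prod_{j\notin T}E_j\prod_{j\in T}(I-E_j)\Big)\phi=\sum_{T\subseteq[n]}\phi_T.$$
If $j\in T$, then, as the operators commute, $E_j\phi_T$ contains the factor $E_j(I-E_j)=E_j-E_j^2=0$, so $E_j\phi_T=0$. If $T\ne S$, choose $j$ that lies in one of the two sets but not the other, say $j\in T\setminus S$ (otherwise exchange the roles of $T$ and $S$). Then $E_j\phi_T=0$ and $E_j\phi_S=\phi_S$ (since $E_j^2=E_j$), so by self-adjointness of $E_j$,
$$\langle\phi_T,\phi_S\rangle=\langle\phi_T,E_j\phi_S\rangle=\langle E_j\phi_T,\phi_S\rangle=0.$$
Finally, as by definition $E_j\psi$ does not depend on $x_j$ for any $\psi\in H$, and $\phi_T=E_j(\cdots)$ whenever $j\notin T$, it is clear that $\phi_T$ does not depend on $x_j$ if $j\notin T$.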
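On a finite product space the operators $E_j$ are just weighted averages, so the decomposition can be computed directly and its three properties checked. The Python sketch below is an illustration under that assumption: each $X_j$ takes finitely many values with given probabilities, coordinates are indexed from $0$ rather than $1$, functions are stored as dictionaries keyed by points of the product space, and $\phi$ is a random function. The helper names (`apply_E`, `component`, `inner`) are hypothetical. It also checks the identity $\mathbb{E}[\phi^2]=\sum_T\mathbb{E}[\phi_T^2]$, which follows from orthogonality.

```python
import itertools
import random

def apply_E(phi, j, supports, probs):
    """(E_j phi)(x) = sum_v P(X_j = v) * phi(x with x_j replaced by v)."""
    return {x: sum(p * phi[x[:j] + (v,) + x[j + 1:]] for v, p in zip(supports[j], probs[j]))
            for x in phi}

def component(phi, T, supports, probs):
    """phi_T = prod_{j not in T} E_j prod_{j in T} (I - E_j) phi; the factors commute."""
    out = phi
    for j in range(len(supports)):
        e = apply_E(out, j, supports, probs)
        out = {x: out[x] - e[x] for x in out} if j in T else e
    return out

def inner(phi, psi, supports, probs):
    """<phi, psi> = E[phi(X) psi(X)] under the product distribution."""
    total = 0.0
    for x in phi:
        w = 1.0
        for j, xj in enumerate(x):
            w *= probs[j][supports[j].index(xj)]
        total += w * phi[x] * psi[x]
    return total

# Illustrative example: three independent discrete coordinates and a random phi.
supports = [(0, 1), (0, 1, 2), (-1, 1)]
probs = [(0.3, 0.7), (0.2, 0.5, 0.3), (0.5, 0.5)]
n = len(supports)
rng = random.Random(0)
points = list(itertools.product(*supports))
phi = {x: rng.gauss(0.0, 1.0) for x in points}
subsets = [frozenset(c) for r in range(n + 1) for c in itertools.combinations(range(n), r)]
parts = {T: component(phi, T, supports, probs) for T in subsets}

print("E[phi^2]       =", inner(phi, phi, supports, probs))
print("sum E[phi_T^2] =", sum(inner(parts[T], parts[T], supports, probs) for T in subsets))
```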
The decomposition will be used in the form of the following variance drop lemma, whose proof we postpone to the next lecture. Here, for $s\subseteq[n]$, we write $X_s=(X_i)_{i\in s}$.
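Lemma. Let $\{\beta_s:s\in\mathcal{G}\}$ be a fractional packing for $\mathcal{G}$. Let $U=\sum_{s\in\mathcal{G}}\psi_s(X_s)$. Suppose that $\mathbb{E}[\psi_s(X_s)]=0$ and $\mathbb{E}[\psi_s(X_s)^2]<\infty$ for each $s\in\mathcal{G}$. Then
$$\mathbb{E}[U^2]\le\sum_{s\in\mathcal{G}}\frac{1}{\beta_s}\,\mathbb{E}[\psi_s(X_s)^2].$$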
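Although the proof is postponed, the lemma can be sanity-checked numerically. The Python sketch below assumes $n=3$, each $X_j$ uniform on $\{0,1,2\}$, $\mathcal{G}$ the three leave-one-out pairs with $\beta_s=\frac12$, and random centred functions $\psi_s$. It computes $\mathbb{E}[U^2]$ exactly by enumeration. For comparison it also reports the bound $|\mathcal{G}|\sum_s\mathbb{E}[\psi_s(X_s)^2]$ that follows from the Cauchy-Schwarz inequality alone; here the lemma's bound is smaller by the factor $2/3$. The helper name `trial` and all parameter choices are hypothetical.

```python
import itertools
import random

SUPPORT = (0, 1, 2)              # each X_j is uniform on {0, 1, 2} (illustrative)
N = 3
G = [(0, 1), (0, 2), (1, 2)]     # leave-one-out subsets of [3], 0-indexed
BETA = {s: 0.5 for s in G}       # each index lies in two sets: a fractional packing

def trial(seed):
    """Return (E[U^2], lemma bound, Cauchy-Schwarz bound) for random centred psi_s."""
    rng = random.Random(seed)
    psi = {}
    for s in G:
        table = {xs: rng.uniform(-1.0, 1.0) for xs in itertools.product(SUPPORT, repeat=len(s))}
        mean = sum(table.values()) / len(table)  # E[psi_s(X_s)] under the uniform law
        psi[s] = {xs: v - mean for xs, v in table.items()}
    points = list(itertools.product(SUPPORT, repeat=N))
    eu2 = sum(sum(psi[s][tuple(x[i] for i in s)] for s in G) ** 2 for x in points) / len(points)
    second = {s: sum(v * v for v in psi[s].values()) / len(psi[s]) for s in G}
    lemma_bound = sum(second[s] / BETA[s] for s in G)
    naive_bound = len(G) * sum(second.values())
    return eu2, lemma_bound, naive_bound

for seed in range(3):
    print(trial(seed))
```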
To be continued…
Lecture by Mokshay Madiman | Scribed by Liyao Wang