Lecture 7. Entropic CLT (4)

This lecture completes the proof of the entropic central limit theorem.

From Fisher information to entropy (continued)

In the previous lecture, we proved the following:

Theorem. If X_1, ..., X_n are independent random variables with I(X_i)<+\infty, then

(1)   \begin{equation*} I\left(\sum_{i\in[n]}X_{i}\right)\leq\sum_{s\in\mathcal{G}}\frac{\omega_{s}^{2}}{\beta_{s}}\, I\left(\sum_{i\in s}X_{i}\right), \end{equation*}

where \{\omega_{s}: s\in\mathcal{G}\} are weights satisfying \sum_{s\in\mathcal{G}}\omega_{s}=1 and \{\beta_{s}\colon s\in\mathcal{G}\} is any fractional packing of the hypergraph \mathcal{G} in the sense that \sum_{s\in\mathcal{G}, s\ni i}\beta_{s}\leq 1 for every i\in[n]=\{1,...,n\}.

By optimizing the choice of the weights, it is easy to check that (1) is equivalent to

    \begin{equation*} \frac{1}{I\left(\sum_{i\in[n]}X_{i}\right)}\geq \sum_{s\in\mathcal{G}}\frac{\beta_{s}}{I\left(\sum_{i\in s}X_{i}\right)}. \end{equation*}
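To see the equivalence, write I_{s}=I\left(\sum_{i\in s}X_{i}\right) and choose the optimal weights \omega_{s}=\frac{\beta_{s}/I_{s}}{\sum_{t\in\mathcal{G}}\beta_{t}/I_{t}} in (1). The right-hand side of (1) then becomes

    \begin{equation*} \sum_{s\in\mathcal{G}}\frac{\omega_{s}^{2}}{\beta_{s}}\, I_{s}=\frac{\sum_{s\in\mathcal{G}}\beta_{s}/I_{s}}{\left(\sum_{t\in\mathcal{G}}\beta_{t}/I_{t}\right)^{2}}=\frac{1}{\sum_{s\in\mathcal{G}}\beta_{s}/I_{s}}, \end{equation*}

which gives the displayed inequality. Conversely, the displayed inequality implies (1) for arbitrary weights by the Cauchy-Schwarz inequality 1=\left(\sum_{s\in\mathcal{G}}\omega_{s}\right)^{2}\leq\left(\sum_{s\in\mathcal{G}}\frac{\omega_{s}^{2}}{\beta_{s}}I_{s}\right)\left(\sum_{s\in\mathcal{G}}\frac{\beta_{s}}{I_{s}}\right).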

In fact, an analogous statement holds for entropy:

(2)   \begin{equation*} e^{2h\left(\sum_{i\in[n]}X_{i}\right)}\geq \sum_{s\in\mathcal{G}}\beta_{s}\, e^{2h\left(\sum_{i\in s}X_{i}\right)}. \end{equation*}

Proving (2) for general fractional packings and general hypergraphs requires some additional steps that we will avoid here. Instead, we will prove a special case, due to Artstein, Ball, Barthe and Naor (2004), that suffices to resolve the question of monotonicity of entropy in the CLT.

In the following, let \mathcal{G} be the set of all subsets of [n] of size n-1 and take \beta_{s}=\frac{1}{n-1} for every s\in\mathcal{G}. In this case, (2) takes the following form.

Theorem EPI. If X_1, ..., X_n are independent random variables, then

    \begin{equation*} e^{2h\left(\sum_{i\in[n]}X_{i}\right)}\geq \frac{1}{n-1}\sum_{i=1}^{n}e^{2h\left(\sum_{j\neq i}X_{j}\right)}, \end{equation*}

provided that all the entropies exist.
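As a quick sanity check (not part of the original notes), equality holds in Theorem EPI when the X_i are i.i.d. N(0,1): both sides equal 2\pi e\, n, using h(N(0,\sigma^{2}))=\frac{1}{2}\log(2\pi e\sigma^{2}). A minimal numerical verification:

```python
import math

def h_gauss(var):
    # differential entropy of N(0, var): (1/2) log(2*pi*e*var)
    return 0.5 * math.log(2 * math.pi * math.e * var)

n = 5
# X_i ~ N(0,1) i.i.d., so a sum over a set of size m has variance m
lhs = math.exp(2 * h_gauss(n))                            # e^{2h(X_1+...+X_n)}
rhs = sum(math.exp(2 * h_gauss(n - 1)) for _ in range(n)) / (n - 1)
assert abs(lhs - rhs) < 1e-9                              # equality in the Gaussian case
```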

To prove Theorem EPI, we will use the de Bruijn identity, which was discussed in the previous lecture. Let us restate it in a form convenient for the coming proof, and then prove it.

Theorem (de Bruijn identity). Let X be a random variable with density f on \mathbb{R}. Let X^{t}=e^{-t}X+\sqrt{1-e^{-2t}}Z, t\geq 0, where Z\sim N(0,1) is independent of X. We have

    \begin{equation*} \frac{d}{dt}h(X^{t})=I(X^{t})-1. \end{equation*}

In particular,

(3)   \begin{equation*} h(N(0,1))-h(X)=\int_{0}^{\infty}[I(X^{t})-1]dt. \end{equation*}

Proof. Let f_{t} be the density of X^{t}. Then

    \begin{equation*} \frac{\partial f_{t}(x)}{\partial t}=(Lf_{t})(x), \end{equation*}

where (L\psi)(x)=\psi^{\prime\prime}(x)+\frac{d}{dx}[x\psi(x)] is the Fokker-Planck operator of the Ornstein-Uhlenbeck process. It is easy to check that \int (Lf_{t})(x)dx=0. Hence,

    \begin{equation*} \begin{split} \frac{d}{dt}\left[-\int f_{t}(x)\log f_{t}(x)dx\right]&=-\int \frac{\partial f_{t}(x)}{\partial t}\log f_{t}(x)dx-\int f_{t}(x)\frac{1}{f_{t}(x)}\frac{\partial f_{t}(x)}{\partial t}dx\\ &=-\int (Lf_{t})(x)\log f_{t}(x)dx-\int (Lf_{t})(x)dx\\ &=-\int f_{t}^{\prime\prime}(x)\log f_{t}(x)dx-\int \frac{\partial}{\partial x}[xf_{t}(x)]\log f_{t}(x)dx\\ &=\int \frac{(f_{t}^{\prime}(x))^{2}}{f_{t}(x)}dx+\int xf_{t}^{\prime}(x)dx\\ &=I(X^{t})-\int f_{t}(x)dx\\ &=I(X^{t})-1, \end{split} \end{equation*}

where the fourth equality follows by integration by parts (the boundary terms vanish).

Integrating from 0 to \infty and noting that h(X^{t})\rightarrow h(N(0,1)) as t\rightarrow\infty gives (3). \square
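For Gaussian X the de Bruijn identity can be checked in closed form: if X\sim N(0,s^{2}), then X^{t}\sim N(0,v(t)) with v(t)=s^{2}e^{-2t}+1-e^{-2t}, so h(X^{t})=\frac{1}{2}\log(2\pi e\,v(t)) and I(X^{t})=1/v(t). A small numerical check of the identity (a sketch, not part of the original notes):

```python
import math

# X ~ N(0, s2) and X^t = e^{-t} X + sqrt(1 - e^{-2t}) Z, so X^t ~ N(0, v(t))
def v(s2, t):
    return s2 * math.exp(-2 * t) + 1 - math.exp(-2 * t)

def h(s2, t):
    # entropy of X^t
    return 0.5 * math.log(2 * math.pi * math.e * v(s2, t))

def fisher(s2, t):
    # Fisher information of X^t (Gaussian: I = 1/variance)
    return 1.0 / v(s2, t)

s2, t, eps = 4.0, 0.7, 1e-6
dh_dt = (h(s2, t + eps) - h(s2, t - eps)) / (2 * eps)   # central difference for d/dt h(X^t)
assert abs(dh_dt - (fisher(s2, t) - 1)) < 1e-6          # matches I(X^t) - 1
```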

Proof of Theorem EPI. In (1), let \mathcal{G} be the set of all subsets of [n] of size n-1 and \beta_{s}=\frac{1}{n-1}. We have

(4)   \begin{equation*} I\left(\sum_{i\in[n]}X_{i}\right)\leq(n-1)\sum_{i\in[n]}w_{i}^{2}\, I\left(\sum_{j\neq i}X_{j}\right), \end{equation*}

where w_{i}=\omega_{[n]\setminus\{i\}} is the weight corresponding to the set [n]\setminus\{i\}\in\mathcal{G}. Applying (4) to the scaled variables a_{1}X_{1},...,a_{n}X_{n}, we see that for every (a_{1},...,a_{n})\in\mathbb{R}^{n} with \sum_{i=1}^{n}a_{i}^{2}=1,

    \begin{equation*} I\left(\sum_{i=1}^{n}a_{i}X_{i}\right)\leq(n-1)\sum_{i=1}^{n}w_{i}^{2}\, I\left(\sum_{j\neq i}a_{j}X_{j}\right)=(n-1)\sum_{i=1}^{n}\frac{w_{i}^{2}}{\sum_{j\neq i}a_{j}^{2}}\, I\left(\frac{\sum_{j\neq i}a_{j}X_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\right). \end{equation*}

Note the last equality is due to the scaling property I(cY)=c^{-2}I(Y) of Fisher information. Recalling that w_{1},...,w_{n} are arbitrarily chosen weights with \sum_{i=1}^{n}w_{i}=1, and setting b_{i}=w_{i}/\sqrt{\sum_{j\neq i}a_{j}^{2}}, we conclude that

(5)   \begin{equation*} I\left(\sum_{i=1}^{n}a_{i}X_{i}\right)\leq(n-1)\sum_{i=1}^{n}b_{i}^{2}\, I\left(\frac{\sum_{j\neq i}a_{j}X_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\right), \end{equation*}

for any choice of (a_1,...,a_n)\in\mathbb{R}^n with \sum_{i=1}^{n}a_{i}^{2}=1 and (b_1,...,b_n)\in \mathbb{R}^{n}_{+} with \sum_{i=1}^{n}b_{i}\sqrt{\sum_{j\neq i}a_{j}^{2}}=1.

Let us keep the choice of such (a_1,...,a_n) arbitrary for the moment and let b_{i}=\frac{1}{n-1}\sqrt{\sum_{j\neq i}a_{j}^{2}}. Then

(6)   \begin{equation*}\sum_{i=1}^{n}b_{i}\sqrt{\sum_{j\neq i}a_{j}^{2}}=\sum_{i=1}^{n}\frac{1}{n-1}\sum_{j\neq i}a_{j}^{2}=\frac{1}{n-1}\,(n-1)\sum_{i=1}^{n}a_{i}^{2}=1. \end{equation*}
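As a quick numerical sanity check of this normalization (not part of the original notes): for any unit vector (a_{1},...,a_{n}), the choice b_{i}=\frac{1}{n-1}\sqrt{\sum_{j\neq i}a_{j}^{2}} satisfies the constraint in (5):

```python
import math
import random

random.seed(0)
# Check (6): with b_i = sqrt(sum_{j != i} a_j^2) / (n - 1), the constraint
# sum_i b_i * sqrt(sum_{j != i} a_j^2) = 1 holds for any unit vector a.
for _ in range(100):
    n = random.randint(2, 8)
    a = [random.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in a))
    a = [x / norm for x in a]                       # now sum a_i^2 = 1
    total = 0.0
    for i in range(n):
        s2 = sum(a[j] ** 2 for j in range(n) if j != i)
        b_i = math.sqrt(s2) / (n - 1)
        total += b_i * math.sqrt(s2)
    assert abs(total - 1.0) < 1e-12
```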

Let X_{i}^{t}=e^{-t}X_{i}+\sqrt{1-e^{-2t}}Z_{i}, where Z_{1},...,Z_{n}\sim N(0,1) are independent of each other and of X_{1},...,X_{n}. Since \sum_{i=1}^{n}a_{i}^{2}=1, we have \sum_{i=1}^{n}a_{i}Z_{i}\sim N(0,1), so \left(\sum_{i=1}^{n}a_{i}X_{i}\right)^{t} and \sum_{i=1}^{n}a_{i}X_{i}^{t} have the same distribution. Then, by (5) with the above choice of b_{i},

    \begin{equation*}I\left(\left(\sum_{i=1}^{n}a_{i}X_{i}\right)^{t}\right)=I\left(\sum_{i=1}^{n}a_{i}X_{i}^{t}\right)\leq \frac{1}{n-1}\sum_{i=1}^{n}\left(\sum_{j\neq i}a_{j}^{2}\right)\, I\left(\frac{\sum_{j\neq i}a_{j}X_{j}^{t}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\right). \end{equation*}

Note that

    \begin{equation*} \begin{split}\frac{\sum_{j\neq i}a_{j}X_{j}^{t}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}&=\frac{\sum_{j\neq i}a_{j}\left(e^{-t}X_{j}+\sqrt{1-e^{-2t}}Z_{j}\right)}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\\&=e^{-t}\cdot\frac{\sum_{j\neq i}a_{j}X_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}+\sqrt{1-e^{-2t}}\cdot\frac{\sum_{j\neq i}a_{j}Z_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\\&\stackrel{(D)}{=}\left(\frac{\sum_{j\neq i}a_{j}X_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\right)^{t}, \end{split} \end{equation*}

where the equality in distribution is due to the fact that \frac{\sum_{j\neq i} a_{j}Z_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\sim N(0,1). Hence, by (6),

    \begin{equation*}I\left(\left(\sum_{i=1}^{n}a_{i}X_{i}\right)^{t}\right)-1\leq \frac{1}{n-1}\sum_{i=1}^{n}\left(\sum_{j\neq i}a_{j}^{2}\right)\left[I\left(\left(\frac{\sum_{j\neq i}a_{j}X_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\right)^{t}\right)-1\right]. \end{equation*}

Integrating this over t from 0 to \infty and using the de Bruijn identity (3), we get

    \begin{equation*}h(N(0,1))-h\left(\sum_{i=1}^{n}a_{i}X_{i}\right)\leq\frac{1}{n-1}\sum_{i=1}^{n}\left(\sum_{j\neq i}a_{j}^{2}\right)\left[h(N(0,1))-h\left(\frac{\sum_{j\neq i}a_{j}X_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\right)\right]. \end{equation*}

By (6) again,

(7)   \begin{equation*}h\left(\sum_{i=1}^{n}a_{i}X_{i}\right)\geq \frac{1}{n-1}\sum_{i=1}^{n}\left(\sum_{j\neq i}a_{j}^{2}\right)h\left(\frac{\sum_{j\neq i} a_{j}X_{j}}{\sqrt{\sum_{j\neq i}a_{j}^{2}}}\right) \end{equation*}

This is the entropy analog of the Fisher information inequality obtained above.
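As a sanity check of (7) (not part of the original notes), for Gaussian X_{i}\sim N(0,v_{i}) both sides are explicit via h(N(0,\sigma^{2}))=\frac{1}{2}\log(2\pi e\sigma^{2}), and the inequality reduces to concavity of the logarithm; it can be verified numerically for random parameters:

```python
import math
import random

random.seed(1)

def h_gauss(var):
    # differential entropy of N(0, var)
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Check (7) for Gaussian X_i ~ N(0, v_i) with random unit vectors a
for _ in range(1000):
    n = random.randint(2, 6)
    a = [random.uniform(-1, 1) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in a))
    a = [x / norm for x in a]                          # sum a_i^2 = 1
    v = [random.uniform(0.1, 5.0) for _ in range(n)]   # Var X_i
    lhs = h_gauss(sum(ai ** 2 * vi for ai, vi in zip(a, v)))
    rhs = 0.0
    for i in range(n):
        s2 = sum(a[j] ** 2 for j in range(n) if j != i)
        sv = sum(a[j] ** 2 * v[j] for j in range(n) if j != i)
        rhs += s2 * h_gauss(sv / s2)    # standardized sum has variance sv/s2
    rhs /= (n - 1)
    assert lhs >= rhs - 1e-12
```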

As a final step, let N_{i}=e^{2h\left(\sum_{j\neq i}X_{j}\right)}. We are to show

    \[e^{2h\left(\sum_{i=1}^{n}X_{i}\right)}\geq \frac{1}{n-1}\sum_{i=1}^{n}N_{i}.\]

If N_{k}\geq \frac{1}{n-1}\sum_{i=1}^{n}N_{i} for some k, the result is immediate due to the general fact that h(Y_{1}+Y_{2})\geq h(Y_{1}) for independent Y_{1},Y_{2} (convolution increases entropy). Hence, we assume that N_{k}\leq \frac{1}{n-1}\sum_{i=1}^{n}N_{i} for every k, so

    \begin{equation*} \lambda_{k}:=\frac{N_{k}}{\sum_{i=1}^{n}N_{i}}\leq \frac{1}{n-1}. \end{equation*}

Set a_{i}=\sqrt{1-(n-1)\lambda_{i}}, i\in[n]. Note that

    \begin{equation*} \sum_{i=1}^{n}a_{i}^{2}=\sum_{i=1}^{n}1-(n-1)\sum_{i=1}^{n}\lambda_{i}=n-(n-1)=1. \end{equation*}

With this choice of a_{1},...,a_{n}, we apply (7) to the independent random variables X_{1}/a_{1},...,X_{n}/a_{n} with the same coefficients (note that a_{i}>0 for every i: if a_{i}=0, then N_{i}=\frac{1}{n-1}\sum_{k=1}^{n}N_{k} and the result follows from the first case above). Since \sum_{i=1}^{n}a_{i}\cdot\frac{X_{i}}{a_{i}}=\sum_{i=1}^{n}X_{i} and \sum_{j\neq i}a_{j}^{2}=1-a_{i}^{2}=(n-1)\lambda_{i}, this gives

    \begin{equation*} h\left(\sum_{i=1}^{n}X_{i}\right)\geq \sum_{i=1}^{n}\lambda_{i}\,h\left(\frac{1}{\sqrt{(n-1)\lambda_{i}}}\sum_{j\neq i}X_{j}\right). \end{equation*}

Using the identity h(cY)=h(Y)+\log c and the definition \lambda_{i}=N_{i}/\sum_{k=1}^{n}N_{k}, the right-hand side equals

    \begin{equation*} \sum_{i=1}^{n}\lambda_{i}\left[\frac{1}{2}\log N_{i}-\frac{1}{2}\log\left((n-1)\lambda_{i}\right)\right]=\frac{1}{2}\log\left(\frac{1}{n-1}\sum_{k=1}^{n}N_{k}\right), \end{equation*}

and exponentiating gives e^{2h\left(\sum_{i=1}^{n}X_{i}\right)}\geq\frac{1}{n-1}\sum_{k=1}^{n}N_{k}, which is the desired result. \square

Proof of the Entropic Central Limit Theorem

Entropic CLT. Suppose that X_1, ..., X_n are i.i.d. random variables with \mathbb{E} X_{i}=0 and \text{Var}X_{i}=1. Let S_{n}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}. If h(S_{n})>-\infty for some n, then h(S_{n})\uparrow h(N(0,1)) as n\rightarrow\infty. Equivalently, if D(\mathcal{L}(S_{n})||N(0,1))<+\infty for some n, then D(\mathcal{L}(S_{n})||N(0,1))\downarrow 0 as n\rightarrow\infty.

In the case of i.i.d. random variables, Theorem EPI gives

    \begin{equation*} e^{2h\left(\sum_{i=1}^{n}X_{i}\right)}\geq \frac{n}{n-1}e^{2h\left(\sum_{j=1}^{n-1}X_{j}\right)}, \end{equation*}

which is equivalent to h(S_{n})\geq h(S_{n-1}) because of the identity h(aX)=h(X)+\log|a|: writing \sum_{i=1}^{n}X_{i}=\sqrt{n}\,S_{n} and taking logarithms, the inequality reads

    \begin{equation*} 2h(S_{n})+\log n\geq \log\frac{n}{n-1}+2h(S_{n-1})+\log(n-1)=2h(S_{n-1})+\log n. \end{equation*}

Therefore, it remains to show h(S_{n})\rightarrow h(N(0,1)) as n\rightarrow \infty. For this, we need some analytical properties of the relative entropy. Henceforth, we use \mathcal{P}(\mathcal{X}) to denote the set of all probability measures on a Polish space \mathcal{X} (take \mathcal{X}=\mathbb{R}^{d} for instance).
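The monotonicity h(S_{n-1})\leq h(S_{n}) can be illustrated numerically. The following sketch (assuming numpy; not part of the original notes) takes X_{i} uniform on [-\sqrt{3},\sqrt{3}] (mean 0, variance 1), computes the density of T_{n}=X_{1}+\cdots+X_{n} by grid convolution, and checks that h(S_{n})=h(T_{n})-\frac{1}{2}\log n increases toward h(N(0,1)):

```python
import numpy as np

dx = 0.001
half = np.sqrt(3.0)
grid = np.arange(-half, half + dx, dx)
f = np.ones_like(grid)
f /= f.sum() * dx                                # normalized density of a single X_i

def entropy(density):
    # Riemann sum for -int f log f
    p = density[density > 1e-300]
    return -np.sum(p * np.log(p)) * dx

h_gauss = 0.5 * np.log(2 * np.pi * np.e)         # h(N(0,1)), about 1.4189
prev = -np.inf
g = f.copy()                                     # density of T_1
for n in range(1, 6):
    h_Sn = entropy(g) - 0.5 * np.log(n)          # h(S_n) = h(T_n) - (1/2) log n
    assert prev < h_Sn < h_gauss                 # increasing, below the Gaussian limit
    prev = h_Sn
    g = np.convolve(g, f) * dx                   # density of T_{n+1}
    g /= g.sum() * dx                            # renormalize to control grid error
```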

Proposition (Variational characterization of D). Let \gamma,\theta\in\mathcal{P}(\mathcal{X}). Then

    \begin{equation*} D(\gamma||\theta)=\sup_{g\in\text{BM}(\mathcal{X})}\left(\int gd\gamma-\log\int e^{g}d\theta\right), \end{equation*}

where \text{BM}(\mathcal{X}) denotes the set of all bounded measurable functions g:\mathcal{X}\rightarrow\mathbb{R}.


Remarks:

  1. In the variational characterization of D above, it is enough to take the supremum over the set \text{BC}(\mathcal{X}) of all bounded continuous functions g:\mathcal{X}\rightarrow\mathbb{R}.
  2. For fixed g\in\text{BC}(\mathcal{X}), the mapping (\gamma,\theta)\mapsto\int gd\gamma-\log\int e^{g}d\theta is convex and continuous with respect to weak convergence. Since, by the first remark, D(\gamma||\theta) is the supremum over a class of convex, continuous functions, it is convex and lower semicontinuous. These properties of relative entropy, made transparent by the variational characterization, are very useful in many different contexts.
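On a finite space the variational characterization can be verified exactly: the supremum is attained at g=\log\frac{d\gamma}{d\theta}, and any other g gives a smaller value. A small numerical check (not part of the original notes):

```python
import math
import random

# gamma and theta: two probability vectors on a 3-point space
gamma = [0.2, 0.5, 0.3]
theta = [0.4, 0.4, 0.2]

D = sum(g * math.log(g / t) for g, t in zip(gamma, theta))  # D(gamma||theta)

def functional(g):
    # g |-> int g d(gamma) - log int e^g d(theta)
    return (sum(gi * pi for gi, pi in zip(g, gamma))
            - math.log(sum(math.exp(gi) * ti for gi, ti in zip(g, theta))))

# the supremum is attained at g = log(d gamma / d theta)
g_star = [math.log(p / t) for p, t in zip(gamma, theta)]
assert abs(functional(g_star) - D) < 1e-12

# any other bounded g gives a value <= D
random.seed(0)
for _ in range(1000):
    g = [random.uniform(-5, 5) for _ in range(3)]
    assert functional(g) <= D + 1e-12
```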

Corollary. Sublevel sets of D(\cdot||\theta) are compact, that is, the set \{\mu\in\mathcal{P}(\mathcal{X})\colon D(\mu||\theta)\leq M\} is compact (with respect to the topology of weak convergence) for every M<+\infty and \theta\in\mathcal{P}(\mathcal{X}).

Before we prove the corollary, let us recall the definition of tightness and Prohorov Theorem.

Definition (Tightness). A set A\subseteq\mathcal{P}(\mathcal{X}) is called tight if for every \varepsilon>0 there exists a compact set K\subseteq\mathcal{X} such that \mu(K^{c})<\varepsilon for every \mu\in A.

Prohorov Theorem. A set A\subseteq\mathcal{P}(\mathcal{X}) is weakly precompact if and only if it is tight.

Proof of Corollary. Let (\mu_{n})_{n\in\mathbb{N}} be a sequence in \mathcal{P}(\mathcal{X}) with \sup_{n\in\mathbb{N}}D(\mu_{n}||\theta)\leq M. By the variational characterization of D, for every g\in\text{BM}(\mathcal{X}), we have

(8)   \begin{equation*} \int gd\mu_{n}-\log \int e^{g}d\theta\leq D(\mu_{n}||\theta)\leq M. \end{equation*}

Note that \{\theta\} is a tight set as a singleton. We claim that the sequence (\mu_{n})_{n\in\mathbb{N}} is also tight. Indeed, let \varepsilon>0 and let K\subseteq\mathcal{X} be a compact set with \theta(K^{c})<\varepsilon. We take g=\log\left(1+\frac{1}{\varepsilon}\right)1_{K^{c}} and apply (8) to get

    \begin{equation*} \log\left(1+\frac{1}{\varepsilon}\right)\mu_{n}(K^{c})-\log\left[1+\frac{1}{\varepsilon}\,\theta(K^{c})\right]\leq M. \end{equation*}


Rearranging and using \theta(K^{c})<\varepsilon, we obtain

    \begin{equation*} \mu_{n}(K^{c})\leq\frac{1}{\log\left(1+\frac{1}{\varepsilon}\right)}\left(M+\log\left[1+\frac{1}{\varepsilon}\,\theta(K^{c})\right]\right)\leq \frac{M+\log 2}{\log\left(1+\frac{1}{\varepsilon}\right)}, \end{equation*}

where the rightmost term can be made arbitrarily small by choosing \varepsilon small. Hence, (\mu_{n})_{n\in\mathbb{N}} is tight. By Prohorov Theorem, there exist a subsequence (\mu_{n_{k}})_{k\in\mathbb{N}} and \mu\in\mathcal{P}(\mathcal{X}) such that \mu_{n_{k}}\Rightarrow\mu as k\rightarrow\infty. By lower semicontinuity of D, we have

    \begin{equation*} D(\mu||\theta)\leq \liminf_{k\rightarrow\infty}D(\mu_{n_{k}}||\theta)\leq M. \end{equation*}

Since the topology of weak convergence on \mathcal{P}(\mathcal{X}) is metrizable, this sequential compactness proves the claim.

This finishes the proof of the corollary. \square

Proof of Variational Characterization of D. As a first case, suppose \gamma\not\ll\theta. Then D(\gamma||\theta)=+\infty and there exists a Borel set A\subseteq \mathcal{X} with \theta(A)=0 and \gamma(A)>0. Let g_{n}=n\cdot 1_{A}\in\text{BM}(\mathcal{X}). We have

    \begin{equation*} \int g_{n}d\gamma-\log\int e^{g_{n}}d\theta=n\gamma(A)\rightarrow+\infty \end{equation*}

as n\rightarrow\infty. Hence, both sides of the variational characterization are equal to +\infty.

For the rest, we assume \gamma\ll\theta. First, we show the \geq part. If D(\gamma||\theta)=+\infty, the inequality is obvious. Suppose D(\gamma||\theta)<+\infty. Given g\in\text{BM}(\mathcal{X}), define a probability measure \gamma_{0}\sim\theta by

    \begin{equation*} \frac{d\gamma_{0}}{d\theta}=\frac{e^{g}}{\int e^{g}d\theta}. \end{equation*}


Then

    \begin{equation*} \begin{split} D(\gamma||\theta)&=\int \log\left(\frac{d\gamma}{d\theta}\right)d\gamma\\ &=\int \log\left(\frac{d\gamma}{d\gamma_{0}}\right)d\gamma+\int\log\left(\frac{d\gamma_{0}}{d\theta}\right)d\gamma\\ &=D(\gamma||\gamma_{0})+\int\left(g-\log\int e^{g}d\theta\right)d\gamma\\ &\geq \int gd\gamma-\log\int e^{g}d\theta, \end{split} \end{equation*}

where the last step uses D(\gamma||\gamma_{0})\geq 0.

Since g\in\text{BM}(\mathcal{X}) is chosen arbitrarily, taking supremum on the right hand side gives the \geq part.

Next, we prove the \leq part. Note that if g=\log\frac{d\gamma}{d\theta}, then \int gd\gamma-\log\int e^{g}d\theta=\int \log\frac{d\gamma}{d\theta}d\gamma=D(\gamma||\theta). However, this choice of g may not be in \text{BM}(\mathcal{X}), that is, \frac{d\gamma}{d\theta} may fail to be bounded or bounded away from zero. So we employ the following truncation argument. Let

    \begin{equation*} g_n=\min\{\max\{g,-n\},n\}\in\text{BM}(\mathcal{X}), \end{equation*}

so that g_n\rightarrow g as n\rightarrow\infty. Note that g_{n}1_{\{g\geq 0\}}\uparrow g1_{\{g\geq 0\}} and g_{n}1_{\{g< 0\}}\downarrow g1_{\{g< 0\}}. Thus we have

    \begin{equation*} \begin{split} \lim_{n\rightarrow\infty}\int e^{g_{n}}d\theta&=\lim_{n\rightarrow\infty}\left(\int e^{g_{n}}1_{\{g\geq 0\}}d\theta+\int e^{g_{n}}1_{\{g< 0\}}d\theta\right)\\ &=\lim_{n\rightarrow\infty}\left(\int e^{\min\{g,n\}}1_{\{g\geq 0\}}d\theta+\int e^{\max\{g,-n\}}1_{\{g< 0\}}d\theta\right)\\ &=\int e^{g}1_{\{g\geq 0\}}d\theta+\int e^{g}1_{\{g< 0\}}d\theta\\ &=\int e^{g}d\theta\\ &=\int \frac{d\gamma}{d\theta}d\theta\\ &=1 \end{split} \end{equation*}

by monotone convergence. On the other hand, Fatou’s Lemma applies since g_{n}\geq -g^{-} and \int g^{-}d\gamma=\int \frac{d\gamma}{d\theta}\log^{-}\left(\frac{d\gamma}{d\theta}\right)d\theta\leq e^{-1}<+\infty (as x\log^{-}x\leq e^{-1} for all x\geq 0), so we have

    \begin{equation*} \liminf_{n\rightarrow\infty}\int g_{n}d\gamma\geq \int gd\gamma=D(\gamma||\theta). \end{equation*}


Combining the last two displays, we get

    \begin{equation*} \liminf_{n\rightarrow\infty}\left(\int g_{n}d\gamma-\log\int e^{g_{n}}d\theta\right)\geq D(\gamma||\theta), \end{equation*}

from which the \leq part of the result follows. \square

Building on these now standard facts (whose exposition above follows that in the book of Dupuis and Ellis), Harremoes and Vignat (2005) gave a short proof of the desired convergence, which we will follow below. It relies on the fact that for uniformly bounded densities within the appropriate moment class, pointwise convergence implies convergence of entropies.

Lemma. If Y_1, Y_2, ... are random variables with \mathbb{E} Y_{n}=0, \mathbb{E} Y_{n}^{2}=1, and the corresponding densities f_1, f_2, ... are uniformly bounded with f_n\rightarrow f pointwise as n\rightarrow \infty for some density f, then h(f_n)\rightarrow h(f) and D(f_n||N(0,1))\rightarrow D(f||N(0,1)) as n\rightarrow\infty.

Proof. Recall D(f||N(0,1))=h(N(0,1))-h(f) for a density f with mean 0 and variance 1. Since f_{n}\rightarrow f pointwise, the distributions converge weakly (by Scheffé's lemma, pointwise convergence of densities even implies convergence in total variation), so by lower semicontinuity of D we have \liminf_{n\rightarrow\infty}D(f_{n}||N(0,1))\geq D(f||N(0,1)), that is,

    \begin{equation*} \limsup_{n\rightarrow\infty}h(f_n)\leq h(f). \end{equation*}

On the other hand, letting c=\sup_{n,x}f_{n}(x), we have

    \begin{equation*} h(f_n)=c\int \left(-\frac{f_n(x)}{c}\log\frac{f_n(x)}{c}\right)dx-\log c, \end{equation*}

and using Fatou’s Lemma (applicable since the integrand is nonnegative, as f_{n}\leq c),

    \begin{equation*}\liminf_{n\rightarrow\infty}h(f_n)\geq c\int \left(-\frac{f(x)}{c}\log\frac{f(x) }{c}\right)dx-\log c=-\int f(x)\log f(x)dx=h(f). \end{equation*}

Hence, h(f_n)\rightarrow h(f) as n\rightarrow\infty. \square

End of proof of Entropic CLT. Assume D(\mathcal{L}(S_N)||N(0,1))<+\infty. We will use J(X)=I(X)-1 to denote normalized Fisher information. For any t\in (0,\infty), we have J(S^{t}_{n-1})\geq J(S^{t}_{n}) for n>N by the monotonicity of Fisher information, and J(S^{t}_{n})\geq 0 since I(S^{t}_{n})\geq 1/\text{Var}(S^{t}_{n})=1 by the Cramér-Rao bound. So J(S^{t}_{n})\rightarrow g(t) as n\rightarrow \infty for every t\in(0,\infty), for some function g\geq 0. We want to show that g\equiv 0, since then Lebesgue’s dominated convergence theorem (with the integrable dominating function t\mapsto J(S^{t}_{N})) will give

    \[D(\mathcal{L}(S_n)||N(0,1))= \int_0^\infty J(S^{t}_{n}) dt \rightarrow \int_0^\infty g(t) dt= 0\]

as n\rightarrow\infty. But since, by the semigroup property (S_{n}^{u})^{t}\stackrel{(D)}{=}S_{n}^{u+t} and (3),

    \[D(\mathcal{L}(S_n^u)||N(0,1))= \int_u^\infty J(S^{t}_{n}) dt \rightarrow \int_u^\infty g(t) dt ,\]

it is enough to show that D(\mathcal{L}(S_n^u)||N(0,1))\rightarrow 0 for each u>0: then \int_{u}^{\infty}g(t)dt=0 for every u>0, and since g\geq 0, we get g\equiv 0.

By the monotonicity property we have proved, we know that

    \[D(\mathcal{L}(S_n)||N(0,1)) \leq D(\mathcal{L}(S_N)||N(0,1)) <\infty\]

for any n>N. By compactness of sublevel sets of D, the sequence S_n must therefore have a subsequence S_{n_k} whose distribution converges to a probability measure (let us call Z a random variable with this limiting measure as its distribution). For u>0, the smoothing caused by Gaussian convolution implies that the density of S_{n_k}^u converges pointwise to that of Z^u, and also that the density of S_{2n_k}^u converges pointwise to that of \frac{Z^u + \tilde{Z}^u}{\sqrt{2}}, where \tilde{Z} is an independent copy of Z. Moreover, all these densities are uniformly bounded by (2\pi(1-e^{-2u}))^{-1/2}, so the previous lemma applies:

    \[D(\mathcal{L}(S_{n_k}^u)||N(0,1)) \rightarrow D(\mathcal{L}(Z^u)\|N(0,1))\]

as k\rightarrow \infty, and

    \[D(\mathcal{L}(S_{2n_k}^u)||N(0,1)) \rightarrow  D(\mathcal{L}(\tfrac{Z^u + \tilde{Z}^u}{\sqrt{2}})\|N(0,1)) ,\]

Since D(\mathcal{L}(S_{n}^{u})||N(0,1)) is nonincreasing in n, the limits along the subsequences (n_{k}) and (2n_{k}) coincide, so that necessarily

    \[D(\mathcal{L}(\tfrac{Z^u + \tilde{Z}^u}{\sqrt{2}})\|N(0,1)) =D(\mathcal{L}(Z^u)\|N(0,1)) .\]

By the equality condition in the entropy power inequality, this can only happen if Z^u is Gaussian, which in turn implies that g(u)=J(Z^u)=0. \square

Lecture by Mokshay Madiman | Scribed by Cagin Ararat

05. December 2013 by Ramon van Handel
Categories: Information theoretic methods