An Inequality Involving a Product of Mutual Informations

Many inequal­i­ties in infor­ma­tion the­ory have entropies or mutual infor­ma­tions appear­ing as lin­ear addi­tive terms (e.g. Shannon-type inequal­i­ties). It’s also not uncom­mon to see entropies appear­ing as expo­nents (e.g. entropy power inequal­ity). But per­haps it’s not imme­di­ately seen how prod­ucts of mutual infor­ma­tions make sense.

Recently I have encoun­tered an inequal­ity involv­ing a prod­uct of mutual infor­ma­tions, which I can­not find a good way to prove (or dis­prove, though some numer­ics and asymp­totic analy­sis seem to sug­gest its valid­ity). I would much appre­ci­ate it if some­one could be smart and gra­cious enough to pro­vide a proof, or counter exam­ple, or generalization.

The for­mu­la­tion is quite sim­ple: sup­pose X is a binary ran­dom vari­able with  \displaystyle P(X=0)=P(X=1)=1/2. The ran­dom vari­able Y is the out­put of X pass­ing through a binary sym­met­ric chan­nel. More­over Z is another ran­dom vari­able such that X-Y-Z forms a Markov chain. The claim is that I(X;Z)\ge I(X;Y)I(Y;Z). (Note that the left side is also upper bounded by either one of the two fac­tors on the right side, by the well-known data pro­cess­ing  inequality.

At first glance this inequal­ity seems absurd because dif­fer­ent units appear on the two sides; but this may be resolved by con­sid­er­ing that Y has only one bit of information.

For a given joint dis­tri­b­u­tion of X and Y, the equal­ity can be achieved when Z is an injec­tive func­tion of Y.

An Extremal Conjecture: Experimenting with Online Collaboration

I have an extremal con­jec­ture that I have been work­ing on inter­mit­tently with some col­leagues (includ­ing Jiantao Jiao, Tsachy Weiss­man, Chan­dra Nair, and Kar­tik Venkat). Despite our efforts, we have not been able to prove it. Hence, I thought I would exper­i­ment with online col­lab­o­ra­tion by offer­ing it to the broader IT community.

In order to make things inter­est­ing, we are offer­ing a $1000 prize for the first cor­rect proof of the con­jec­ture, or a $250 award for the first coun­terex­am­ple! Feel free to post your thoughts in the pub­lic com­ments. You can also email me if you have ques­tions or want to bounce some ideas off me.

Although I have no way of enforc­ing them, please abide by the fol­low­ing ground rules:

  1. If you decide to work on this con­jec­ture, please send me an email to let me know that you are doing so. As part of this exper­i­ment with online col­lab­o­ra­tion, I want to gauge how many peo­ple become involved at var­i­ous degrees.
  2. If you solve the con­jec­ture or make sig­nif­i­cant progress, please keep me informed.
  3. If you repost this con­jec­ture, or pub­lish any results, please cite this blog post appropriately.

One final dis­claimer: this post is meant to be a brief intro­duc­tion to the con­jec­ture, with a few par­tial results to get the con­ver­sa­tion started; it is not an exhaus­tive account of the approaches we have tried.

1. The Conjecture

Con­jec­ture 1. Sup­pose {X,Y} are jointly Gauss­ian, each with unit vari­ance and cor­re­la­tion {\rho}. Then, for any {U,V} sat­is­fy­ing {U-X-Y-V}, the fol­low­ing inequal­ity holds:

\displaystyle 2^{-2I(Y;U)} 2^{-2I(X;V|U)} \geq (1-\rho^2)+ \rho^2 2^{-2I(X;U)} 2^{-2I(Y;V|U)} . \ \ \ \ \ (1)

2. Par­tial Results

There are sev­eral par­tial results which sug­gest the valid­ity of Con­jec­ture 1. More­over, numer­i­cal exper­i­ments have not pro­duced a counterexample.

Con­jec­ture 1 extends the fol­low­ing well-known con­se­quence of the con­di­tional entropy power inequal­ity to include long Markov chains.

Lemma 1 (Oohama, 1997). Sup­pose {X,Y} are jointly Gauss­ian, each with unit vari­ance and cor­re­la­tion {\rho}. Then, for any {U} sat­is­fy­ing {U-X-Y}, the fol­low­ing inequal­ity holds:

\displaystyle 2^{-2 I(Y;U)} \geq 1-\rho^2+\rho^2 2^{-2I(X;U)}. \ \ \ \ \ (2)

Proof: Con­sider any {U} sat­is­fy­ing {U-X-Y}. Let {Y_u, X_u} denote the ran­dom vari­ables {X,Y} con­di­tioned on {U=u}. By Markovity and def­i­n­i­tion of {X,Y}, we have that {Y_u = \rho X_u + Z}, where {Z\sim N(0,1-\rho^2)} is inde­pen­dent of {X_u}. Hence, the con­di­tional entropy power inequal­ity implies that

\displaystyle 2^{2h(Y|U)} \geq \rho^2 2^{2h(X|U)} + 2 \pi e(1-\rho^2) = 2 \pi e \rho^2 2^{-2I(X;U)} + 2 \pi e(1-\rho^2). \ \ \ \ \ (3)

From here, the lemma eas­ily follows. \Box

Lemma 1 can be applied to prove the fol­low­ing spe­cial case of Con­jec­ture 1. This result sub­sumes most of the spe­cial cases I can think of ana­lyz­ing analytically.

Propo­si­tion 1. Sup­pose {X,Y} are jointly Gauss­ian, each with unit vari­ance and cor­re­la­tion {\rho}. If {U-X-Y} are jointly Gauss­ian and {U-X-Y-V}, then

\displaystyle 2^{-2I(Y;U)} 2^{-2I(X;V|U)} \geq (1-\rho^2)+ \rho^2 2^{-2I(X;U)} 2^{-2I(Y;V|U)}. \ \ \ \ \ (4)

Proof: With­out loss of gen­er­al­ity, we can assume that {U} has zero mean and unit vari­ance. Define {\rho_u = E[XU]}. Since {U-X-Y} are jointly Gauss­ian, we have

\displaystyle I(X;U) =\frac{1}{2}\log\frac{1}{1-\rho_u^2} \ \ \ \ \ (5)

\displaystyle I(Y;U) =\frac{1}{2}\log\frac{1}{1-\rho^2\rho_u^2}. \ \ \ \ \ (6)

Let {X_u,Y_u,V_u} denote the ran­dom vari­ables {X,Y,V} con­di­tioned on {U=u}, respec­tively. Define {\rho_{XY|u}} to be the cor­re­la­tion coef­fi­cient between {X_u} and {Y_u}. It is read­ily ver­i­fied that

\displaystyle \rho_{XY|u} = \frac{\rho\sqrt{1-\rho_u^2}}{\sqrt{1-\rho^2\rho_u^2}}, \ \ \ \ \ (7)

which does not depend on the par­tic­u­lar value of {u}. By plug­ging (5)-(7) into (4), we see that (4) is equiv­a­lent to

\displaystyle 2^{-2I(X;V|U)} \geq (1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2I(Y;V|U)}. \ \ \ \ \ (8)

For every {u}, the vari­ables {X_u,Y_u} are jointly Gauss­ian with cor­re­la­tion coef­fi­cient {\rho_{XY|u}} and {X_u-Y_u-V_u} form a Markov chain, hence Lemma 1 implies

\displaystyle 2^{-2I(X_u;V_u)} \geq (1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2I(Y_u;V_u)}. \ \ \ \ \ (9)

The desired inequal­ity (8) fol­lows by con­vex­ity of

\displaystyle \log\left[(1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2z}\right] \ \ \ \ \ (10)

as a func­tion of {z}. \Box

3. Equiv­a­lent Forms

There are many equiv­a­lent forms of Con­jec­ture 1. For exam­ple, (1) can be replaced by the sym­met­ric inequality

\displaystyle 2^{-2(I(X;V)+I(Y;U))} \geq (1-\rho^2)2^{-2I(U;V)}+ \rho^2 2^{-2(I(X;U)+I(Y;V))}. \ \ \ \ \ (11)

Alter­na­tively, we can con­sider dual forms of Con­jec­ture 1. For instance, one such form is stated as follows:

Con­jec­ture 1′. Sup­pose {X,Y} are jointly Gauss­ian, each with unit vari­ance and cor­re­la­tion {\rho}. For {\lambda\in [{1}/({1+\rho^2}),1]}, the infi­mum of

\displaystyle I(X,Y;U,V)-\lambda\Big(I(X;UV)+I(Y;UV)\Big), \ \ \ \ \ (12)

taken over all {U,V} sat­is­fy­ing {U-X-Y-V} is attained when {U,X,Y,V} are jointly Gaussian.