An Inequality Involving a Product of Mutual Informations

Many inequalities in information theory have entropies or mutual informations appearing as linear additive terms (e.g. Shannon-type inequalities). It’s also not uncommon to see entropies appearing as exponents (e.g. the entropy power inequality). But it is perhaps not immediately clear how a product of mutual informations could make sense.

Recently I encountered an inequality involving a product of mutual informations, which I cannot find a good way to prove (or disprove, though some numerics and asymptotic analysis seem to suggest its validity). I would much appreciate it if someone could be smart and gracious enough to provide a proof, counterexample, or generalization.

The formulation is quite simple: suppose X,Y are binary random variables with  \displaystyle P(X=0)=P(Y=0)=1/2, and Z is another random variable such that X-Y-Z forms a Markov chain. The claim is that I(X;Z)\ge I(X;Y)I(Y;Z), with all mutual informations measured in bits. (Note that the left side is also upper bounded by either of the two factors on the right side, by the well-known data processing inequality.)
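The kind of numerical check mentioned above can be reproduced with a minimal sketch along the following lines (the helper mi_bits and the particular sampling of P(Z|Y) are just one possible choice, not part of the argument itself):

```python
import numpy as np

def mi_bits(pj):
    """Mutual information in bits from a 2-D joint probability array."""
    pa = pj.sum(axis=1, keepdims=True)          # marginal of the row variable
    pb = pj.sum(axis=0, keepdims=True)          # marginal of the column variable
    mask = pj > 0
    return float((pj[mask] * np.log2(pj[mask] / (pa @ pb)[mask])).sum())

rng = np.random.default_rng(0)
worst_gap = np.inf
for _ in range(10_000):
    eps = rng.uniform(0.0, 0.5)                 # eps = P(X != Y)
    W = np.array([[1 - eps, eps],               # uniform marginals force the
                  [eps, 1 - eps]])              # channel Y -> X to be BSC(eps)
    pxy = 0.5 * W                               # joint of (Y, X)
    k = int(rng.integers(2, 6))                 # alphabet size of Z
    V = rng.dirichlet(np.ones(k), size=2)       # random channel P(Z | Y)
    pyz = 0.5 * V                               # joint of (Y, Z)
    pxz = W.T @ pyz                             # joint of (X, Z) through X - Y - Z
    worst_gap = min(worst_gap, mi_bits(pxz) - mi_bits(pxy) * mi_bits(pyz))

print("smallest observed I(X;Z) - I(X;Y) I(Y;Z):", worst_gap)   # expected >= 0
```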

At first glance this inequality seems absurd because different units appear on the two sides; but this may be resolved by measuring everything in bits and noting that Y has only one bit of information, so that I(X;Y)\le H(Y)=1.

For a given joint distribution of X and Y, equality is achieved when Z is an injective function of Y.
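To see one instance of this, take Z=Y (the simplest injective choice): then

\displaystyle I(Y;Z)=H(Y)=1 \text{ bit} \quad\text{and}\quad I(X;Z)=I(X;Y),

so the right side I(X;Y)I(Y;Z) equals the left side I(X;Z).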

———————————-

Update: the inequality has been proved, thanks very much to Sudeep Kamath for the help (see the comments). An alternative proof (which avoids establishing the monotonicity of  f(r) = D(r*\epsilon\|1/2)/D(r\|1/2) = (1-h(r*\epsilon))/(1-h(r))) is sketched as follows: by convexity of the function x\mapsto h(\epsilon*h^{-1}(x)) (see, for example, the proof of Mrs. Gerber's Lemma), we have H(X|Z=z)=h(\epsilon*h^{-1}(H(Y|Z=z)))\le H(Y|Z=z)+(1-H(Y|Z=z))h(\epsilon) for every z, where \epsilon:=P(X\neq Y), h denotes the binary entropy function, and a*b:=a(1-b)+(1-a)b is the binary convolution. Averaging over z under P_Z gives H(X|Z)\le H(Y|Z)+(1-H(Y|Z))h(\epsilon), and the claim follows after rearrangement.
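To spell out the final rearrangement: since both marginals are uniform, the channel from Y to X is a BSC(\epsilon), so H(X)=H(Y)=1 and I(X;Y)=1-h(\epsilon). The averaged bound then gives

\displaystyle I(X;Z)=1-H(X|Z)\ge 1-H(Y|Z)-(1-H(Y|Z))h(\epsilon)=(1-H(Y|Z))(1-h(\epsilon))=I(Y;Z)I(X;Y),

which is exactly the claimed inequality.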