Many inequalities in information theory have entropies or mutual informations appearing as linear additive terms (e.g. Shannon-type inequalities). It’s also not uncommon to see entropies appearing as exponents (e.g. the entropy power inequality). But it is perhaps less obvious how a product of mutual informations could make sense.
Recently I encountered an inequality involving a product of mutual informations, which I cannot find a good way to prove (or disprove, though some numerics and asymptotic analysis seem to suggest its validity). I would much appreciate it if someone could be smart and gracious enough to provide a proof, a counterexample, or a generalization.
The formulation is quite simple: suppose $X$ and $Y$ are binary random variables with $P[X=0] = P[Y=0] = 1/2$, and $U$ is another random variable such that $U - X - Y$ forms a Markov chain. The claim is that
$$I(U;Y) \ge I(U;X)\, I(X;Y).$$
(Note that the left side is also upper bounded by either one of the two factors on the right side, by the well-known data processing inequality.) At first glance this inequality seems absurd because different units appear on the two sides; but this may be resolved by considering that $X$ carries only one bit of information, so that, measured in bits, each factor on the right side is at most $1$.
For a given joint distribution of $X$ and $Y$, equality can be achieved when $U$ is an injective function of $X$: in that case $I(U;X) = H(X) = 1$ bit and $I(U;Y) = I(X;Y)$.
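Along the lines of the numerics mentioned above, here is a minimal sanity check of the claim as stated (a sketch, not from the original post: the random sampling of $P_{U|X}$, the alphabet size of $U$, and the number of trials are arbitrary illustrative choices). It samples Markov chains $U - X - Y$ with $X$ uniform and $Y$ the output of a binary symmetric channel, computes the three mutual informations in bits, and records the smallest gap $I(U;Y) - I(U;X)\,I(X;Y)$ observed.

```python
import numpy as np

def mutual_info(joint):
    """I(A;B) in bits, from a joint pmf given as a 2-D array."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())

rng = np.random.default_rng(0)
worst_gap = np.inf
for _ in range(10000):
    p = rng.uniform(0.0, 0.5)                        # BSC crossover probability
    k = int(rng.integers(2, 6))                      # alphabet size of U (arbitrary)
    p_u_given_x = rng.dirichlet(np.ones(k), size=2)  # rows: P(U = . | X = x), x = 0, 1
    p_x = np.array([0.5, 0.5])                       # X uniform, hence Y uniform as well
    p_y_given_x = np.array([[1 - p, p], [p, 1 - p]])
    p_ux = (p_u_given_x * p_x[:, None]).T            # P(U=u, X=x), shape (k, 2)
    p_xy = p_y_given_x * p_x[:, None]                # P(X=x, Y=y), shape (2, 2)
    p_uy = p_ux @ p_y_given_x                        # P(U=u, Y=y), valid since U - X - Y
    gap = mutual_info(p_uy) - mutual_info(p_ux) * mutual_info(p_xy)
    worst_gap = min(worst_gap, gap)

print("smallest I(U;Y) - I(U;X) * I(X;Y) found:", worst_gap)
```

If the claim (and this reading of the setup) is correct, the reported gap should be nonnegative up to floating-point error.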
———————————-
Update: the inequality has been proved, thanks very much to Sudeep Kamath for the help (see comments). An alternative proof (which avoids the monotonicity computation in that argument) is sketched as follows. Let $h$ denote the binary entropy function, let $p := P[X \ne Y]$ be the crossover probability of the channel from $X$ to $Y$, and write $a * b := a(1-b) + (1-a)b$. By convexity of the function $x \mapsto h(p * h^{-1}(x))$ on $[0,1]$ (see, for example, the proof of Mrs. Gerber's Lemma), this function lies below its chord, i.e.
$$h\bigl(p * h^{-1}(x)\bigr) \le (1-x)\,h(p) + x \quad \text{for any } x \in [0,1].$$
Taking $x = H(X|U=u)$, for which $h(p * h^{-1}(x)) = H(Y|U=u)$ by the Markov chain property, and averaging over $u$ under $P_U$ gives
$$H(Y|U) \le \bigl(1 - H(X|U)\bigr)\,h(p) + H(X|U),$$
and the claim follows after rearrangements.
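For concreteness, one way to spell out the rearrangement at the end (using $H(X) = H(Y) = 1$ bit and $H(Y|X) = h(p)$, so that $I(U;X) = 1 - H(X|U)$ and $I(X;Y) = 1 - h(p)$):
$$I(U;Y) = H(Y) - H(Y|U) \ge 1 - \bigl[(1 - H(X|U))\,h(p) + H(X|U)\bigr] = \bigl(1 - H(X|U)\bigr)\bigl(1 - h(p)\bigr) = I(U;X)\, I(X;Y).$$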