An Inequality Involving a Product of Mutual Informations

Many inequalities in information theory have entropies or mutual informations appearing as linear additive terms (e.g. Shannon-type inequalities). It’s also not uncommon to see entropies appearing as exponents (e.g. the entropy power inequality). But it is perhaps less obvious how a product of mutual informations could arise.

Recently I encountered an inequality involving a product of mutual informations, which I cannot find a good way to prove (or disprove, though some numerics and asymptotic analysis seem to support its validity). I would much appreciate it if someone could provide a proof, a counterexample, or a generalization.

The formulation is quite simple: suppose X,Y are binary random variables with \displaystyle P(X=0)=P(Y=0)=1/2, and Z is another random variable such that X-Y-Z forms a Markov chain. The claim is that I(X;Z)\ge I(X;Y)I(Y;Z). (Note that the left side is also upper bounded by either of the two factors on the right side, by the well-known data processing inequality.)

At first glance this inequality seems absurd because different units appear on the two sides; but this may be resolved by noting that Y carries only one bit of information, so each factor on the right side, measured in bits, is at most 1.

For a given joint distribution of X and Y, equality is achieved when Z is an injective function of Y.
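The claim is easy to sanity-check numerically. Here is a quick Python sketch (my own illustration, not part of any proof): it draws random channels P_{Z|Y}, builds the joint distributions implied by the Markov chain X-Y-Z, and verifies the product inequality on each instance.

```python
import random
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(joint):
    """I(A;B) in bits from a joint distribution given as {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

random.seed(0)
for trial in range(1000):
    eps = random.uniform(0.01, 0.99)   # BSC crossover probability from X to Y
    m = random.randint(2, 4)           # alphabet size of Z (chosen arbitrarily)
    # random channel P(Z=k | Y=y)
    rows = []
    for y in (0, 1):
        w = [random.random() for _ in range(m)]
        s = sum(w)
        rows.append([v / s for v in w])
    # X, Y uniform with a BSC(eps) between them; Z depends on Y only (Markov X-Y-Z)
    pxy = {(x, y): 0.5 * ((1 - eps) if x == y else eps)
           for x in (0, 1) for y in (0, 1)}
    pyz = {(y, k): 0.5 * rows[y][k] for y in (0, 1) for k in range(m)}
    pxz = {(x, k): sum(pxy[(x, y)] * rows[y][k] for y in (0, 1))
           for x in (0, 1) for k in range(m)}
    Ixy, Iyz, Ixz = mutual_information(pxy), mutual_information(pyz), mutual_information(pxz)
    assert Ixz >= Ixy * Iyz - 1e-12, (eps, rows)
print("claim held on all random instances")
```

Of course, a random search like this can only fail to find counterexamples; it proves nothing on its own.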


Update: the inequality has been proved; many thanks to Sudeep Kamath for the help (see comments). An alternative proof (which avoids establishing the monotonicity of f(r) = D(r*\epsilon||1/2)/D(r||1/2) = (1-h(r*\epsilon))/(1-h(r))) is sketched as follows: by convexity of the function h(\epsilon*h^{-1}(x)) (see, for example, the proof of Mrs. Gerber's Lemma), we have H(X|Z=z)=h(\epsilon*h^{-1}(H(Y|Z=z)))\le H(Y|Z=z)+(1-H(Y|Z=z))h(\epsilon) for any z, where \epsilon:=Prob(X\neq Y). Averaging over z under P_Z gives H(X|Z)\le H(Y|Z)+(1-H(Y|Z))h(\epsilon), and the claim follows after rearrangement.
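The convexity step above can itself be checked numerically. The following sketch (my own illustration) verifies the chord bound h(\epsilon*h^{-1}(x)) \le x + (1-x)h(\epsilon) on a grid, computing h^{-1} on [0,1/2] by bisection:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def h2_inv(y):
    """Inverse of the binary entropy restricted to [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if h2(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def star(a, b):
    """Binary convolution a*b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

for eps in (0.05, 0.1, 0.25, 0.4):
    for i in range(1, 100):
        x = i / 100
        lhs = h2(star(eps, h2_inv(x)))   # H(X|Z=z) when H(Y|Z=z) = x
        rhs = x + (1 - x) * h2(eps)      # chord of the convex function
        assert lhs <= rhs + 1e-9, (eps, x)
print("chord bound verified on the grid")
```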

10 thoughts on “An Inequality Involving a Product of Mutual Informations”

  1. I have recently generalized this beautiful inequality to the following case: $Y$ is a uniform Bernoulli random variable and the channel from $Y$ to $X$ is binary-input symmetric-output (BISO).

    This generalized inequality has a nice interpretation in the context of privacy. See or message me for the proof.


  3. Thanks Jingbo,

    If this fails for non-uniform Y, how can it hold for non-uniform X? (Can we have non-uniform X and uniform Y at the same time?)
    I am trying to get a similar inequality for non-uniform, non-binary input, but so far without success!

    • Hi Shahab,
      Thanks for your interest. Regarding my previous comment, I should just say ‘keep P_{X|Y} symmetric’ and ‘keep P_{Y|X} symmetric’ in the two cases. In both cases, both X and Y are non-uniform.

      Here is a recent paper by Venkat, Amin, Sudeep, and Chandra dealing with a related problem: In this paper, (3) is shown to be an upper bound on the ratio of mutual informations. I believe a lower bound can also be obtained by simply replacing ‘sup’ in (3) with ‘inf’, and the resulting lower bound will be tight, by similar arguments. In other words, finding the range of the ratio of mutual informations is equivalent to finding the range of the ratio of divergences in (3). In the uniform case our inequality can be proved this way, as shown in Sudeep’s comment. I think the results in this paper will be very helpful for your generalizations.

  4. Recently, I came across an inequality involving products of entropies that I thought I would mention given the topic of this post. As in the original post, let X be a uniform binary random variable, and let Y be the result of passing X through a binary symmetric channel with crossover probability \alpha. Now, let (X^n,Y^n) be n i.i.d. copies of (X,Y). Define the function b_k : \{0,1\}^n \to \{0,1\} as follows: b_k(x^n) = 1 if the integer representation of the binary n-tuple x^n is less than k, and b_k(x^n) = 0 otherwise.

    In a recent paper with Gowtham Kumar, we have conjectured that

    H(b_k(X^n)|Y^n) \geq h_2(\alpha)h_2(k/2^n),

    where h_2(p) denotes the binary entropy of p. Note that the inequality depends only on the ratio k/2^n, rather than on the pair (k,n). In the paper, we give an algorithm capable of verifying the conjectured inequality for any given ratio k/2^n \in [0,1], which permits a computer-aided proof of the desired inequality. (The algorithm requires a sequence of steps which, while simple, are too tedious to do by hand.) It appears very challenging to find a “closed form” proof, since the quantity H(b_k(X^n)|Y^n) as a function of k/2^n is pathological in nature.
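    As a small-n sanity check (only my own illustration, not the algorithm from the paper), one can compute H(b_k(X^n)|Y^n) exactly by enumerating all y^n and using P(x^n|y^n) = \alpha^d (1-\alpha)^{n-d}, where d is the Hamming distance (valid because X^n is uniform):

```python
import math
from itertools import product

def h2(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cond_entropy(n, k, alpha):
    """H(b_k(X^n) | Y^n) for X^n uniform on {0,1}^n and Y^n = X^n through BSC(alpha)."""
    total = 0.0
    for y in product((0, 1), repeat=n):
        # P(x | y) = alpha^d (1-alpha)^(n-d) since X^n is uniform
        p_b1 = 0.0
        for xi in range(k):  # exactly the n-tuples with b_k = 1 (integer value < k)
            x = [(xi >> (n - 1 - j)) & 1 for j in range(n)]  # MSB-first bits of xi
            d = sum(a != b for a, b in zip(x, y))
            p_b1 += alpha ** d * (1 - alpha) ** (n - d)
        total += 2 ** (-n) * h2(p_b1)
    return total

alpha = 0.11  # arbitrary crossover probability for the check
for n in (3, 4, 5):
    for k in range(1, 2 ** n):
        lhs = cond_entropy(n, k, alpha)
        rhs = h2(alpha) * h2(k / 2 ** n)
        assert lhs >= rhs - 1e-9, (n, k)
print("conjecture held for the small cases tested")
```

    Note the equality case: k = 2^{n-1} makes b_k a function of the first bit alone, so H(b_k|Y^n) = h_2(\alpha) exactly.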

  5. Varun Jog and I were able to prove your claim. I will sketch the proof here.

    I use notation a*b = a(1-b) + b(1-a) and D(p||q) = p\log(p/q) + (1-p)\log((1-p)/(1-q)). Let the BSC from X to Y have error probability \epsilon.

    Let P(Z=i) = w_i and P(Y=1|Z=i) = r_i. Then, I(Y;Z) = \sum_i w_i D(r_i||1/2) and I(X;Z) = \sum_i w_i D(r_i*\epsilon ||1/2). Thus, to prove your claim, it suffices to show that D(r*\epsilon||1/2)/D(r||1/2) \geq 1-h(\epsilon) for all r in [0,1]. This is true and equality holds at r=0,1. I find this quite interesting because the left hand side does not depend on the base of the logarithm used but the right hand side requires you to use logarithms to base 2.

    We let f(r) = D(r*\epsilon||1/2)/D(r||1/2) = (1-h(r*\epsilon))/(1-h(r)). We define f(1/2) by the limit as (1-2\epsilon)^2. We compute the derivative f'(r) and show that f'(r) is positive over [0,1/2) and negative over (1/2,1], thus proving the claim. To do this, we define u = 1-2r, \delta = 1-2\epsilon, expand the h(.) and \log functions as Taylor series in u and \delta. Do contact me by e-mail if I was not clear or if I was mistaken somewhere above.
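    The reduction above is straightforward to check on a grid; the sketch below (my own illustration) verifies both the lower bound 1-h(\epsilon) and the upper bound (1-2\epsilon)^2 for f(r):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def star(a, b):
    """Binary convolution a*b = a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)

def f(r, eps):
    """f(r) = D(r*eps || 1/2) / D(r || 1/2) = (1 - h(r*eps)) / (1 - h(r))."""
    return (1 - h2(star(r, eps))) / (1 - h2(r))

for eps in (0.05, 0.2, 0.35, 0.45):
    lo, hi = 1 - h2(eps), (1 - 2 * eps) ** 2
    for i in range(0, 1001):
        r = i / 1000
        if abs(r - 0.5) < 1e-9:
            continue  # f(1/2) is defined only as the limit (1-2 eps)^2
        assert lo - 1e-9 <= f(r, eps) <= hi + 1e-9, (eps, r)
print("1 - h(eps) <= f(r) <= (1 - 2 eps)^2 verified on the grid")
```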


    • I think your proof is correct. Very nice work!
      Originally I also guessed that I(X;Z)\le I(Y;Z)(1-2\epsilon)^2 holds, although I didn't put this part in the above post. But now I see this is also correct by your argument.
      Thanks again and congratulations!

    • The inequality does not hold for non-uniform X inputs (keeping P_{Y|X} to be that of a BSC(\epsilon)). A counterexample: P(X=0)=0.677, \epsilon=0.403, P(Z=0|Y=0)=0, P(Z=0|Y=1)=0.001, where Z is binary. Then \frac{I(X;Z)}{I(Y;Z)}\approx 0.0238 whereas I(X;Y)\approx 0.0239. It’s easy to miss such counterexamples with an incomplete search. The inequality also fails for a non-uniform Y input (keeping P_{X|Y} to be that of a BSC), according to my numerical study.
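      The counterexample is quick to verify with a short script (my own sketch; it just recomputes the three mutual informations from the stated numbers):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mi(joint):
    """Mutual information in bits from a joint distribution {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0) + p
        pb[b] = pb.get(b, 0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

q, eps = 0.677, 0.403          # P(X=0) and the BSC crossover probability
pz0 = {0: 0.0, 1: 0.001}       # P(Z=0 | Y=y)

px = {0: q, 1: 1 - q}
pxy = {(x, y): px[x] * ((1 - eps) if x == y else eps)
       for x in (0, 1) for y in (0, 1)}
py = {y: pxy[(0, y)] + pxy[(1, y)] for y in (0, 1)}
pyz = {(y, z): py[y] * (pz0[y] if z == 0 else 1 - pz0[y])
       for y in (0, 1) for z in (0, 1)}
pxz = {(x, z): sum(pxy[(x, y)] * (pz0[y] if z == 0 else 1 - pz0[y]) for y in (0, 1))
       for x in (0, 1) for z in (0, 1)}

Ixy, Iyz, Ixz = mi(pxy), mi(pyz), mi(pxz)
print(Ixz / Iyz, Ixy)          # the ratio falls just below I(X;Y)
assert Ixz / Iyz < Ixy         # so I(X;Z) < I(X;Y) I(Y;Z): the claim fails here
```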

    • Jingbo,

      Thanks for the nice counterexample; it helped me a lot.

      I had spent a couple of days searching for such a counterexample without finding one.

      If I get your email address, I will let you know should I obtain any results in this regard.
