Many inequalities in information theory have entropies or mutual informations appearing as linear additive terms (e.g. Shannon-type inequalities). It’s also not uncommon to see entropies appearing as exponents (e.g. the entropy power inequality). But it is perhaps not immediately clear how products of mutual informations make sense.

Recently I encountered an inequality involving a product of mutual informations, which I cannot find a good way to prove (or disprove, though some numerics and asymptotic analysis seem to suggest its validity). I would much appreciate it if someone could be smart and gracious enough to provide a proof, a counterexample, or a generalization.

The formulation is quite simple: suppose $X$ and $Y$ are binary random variables with $X$ uniform and $Y$ the output of a binary symmetric channel $\mathrm{BSC}(\epsilon)$ with input $X$, and $Z$ is another random variable such that $X-Y-Z$ forms a Markov chain. The claim is that $I(X;Z) \ge I(X;Y)\, I(Y;Z)$, where mutual information is measured in bits. (Note that the left side is also upper bounded by either one of the two factors on the right side, by the well-known data processing inequality.)
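Before attempting a proof, the claim $I(X;Z) \ge I(X;Y)\,I(Y;Z)$ (in bits, with $X$ uniform and $Y$ a BSC($\epsilon$) output) is easy to test numerically. Below is a quick sanity check of mine over randomly drawn channels $P_{Z|Y}$; the value $\epsilon = 0.11$, the alphabet size $|\mathcal{Z}| = 4$, and the helper functions are arbitrary choices, not part of the original argument:

```python
import math
import random

def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_info(p_a, chan):
    """I(A;B) in bits for finite alphabets; chan[a][b] = P(B = b | A = a)."""
    nb = len(chan[0])
    p_b = [sum(p_a[a] * chan[a][b] for a in range(len(p_a))) for b in range(nb)]
    s = 0.0
    for a in range(len(p_a)):
        for b in range(nb):
            j = p_a[a] * chan[a][b]
            if j > 0 and p_b[b] > 0:
                s += j * math.log2(j / (p_a[a] * p_b[b]))
    return s

random.seed(0)
eps = 0.11                                 # BSC crossover probability
p_x = [0.5, 0.5]                           # X uniform
bsc = [[1 - eps, eps], [eps, 1 - eps]]     # P_{Y|X}
p_y = [0.5, 0.5]                           # Y is then uniform as well
I_XY = mutual_info(p_x, bsc)               # should equal 1 - h(eps)

worst = float("inf")
for _ in range(2000):
    m = 4                                  # |Z| = 4, arbitrary
    p_z_given_y = []
    for y in range(2):
        w = [random.random() for _ in range(m)]
        s = sum(w)
        p_z_given_y.append([v / s for v in w])
    I_YZ = mutual_info(p_y, p_z_given_y)
    # P_{Z|X} via the Markov chain X - Y - Z
    p_z_given_x = [[sum(bsc[x][y] * p_z_given_y[y][z] for y in range(2))
                    for z in range(m)] for x in range(2)]
    I_XZ = mutual_info(p_x, p_z_given_x)
    worst = min(worst, I_XZ - I_XY * I_YZ)

print(abs(I_XY - (1 - h(eps))) < 1e-9)     # sanity: I(X;Y) = 1 - h(eps)
print(worst >= -1e-9)                      # no violation found
```

No violation shows up in such random trials, consistent with the claim.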

At first glance this inequality seems absurd because different units appear on the two sides; but this may be resolved by noting that $X$ has only one bit of information, so $I(X;Y) \le 1$ bit and that factor effectively acts as a dimensionless ratio.

For a given joint distribution of $X$ and $Y$, equality is achieved when $Z$ is an injective function of $Y$.

———————————-

Update: the inequality has been proved, thanks very much to the help of Sudeep Kamath (see comments). An alternative proof (which avoids computing the monotonicity of $f(r) = (1-h(r*\epsilon))/(1-h(r))$) is sketched as follows: by convexity of the function $u \mapsto h(\epsilon * h^{-1}(u))$ (see, for example, the proof of Mrs. Gerber's Lemma), we have $1 - h(\epsilon * r) \ge (1-h(\epsilon))(1-h(r))$ for any $r \in [0,1]$, where $a*b := a(1-b)+b(1-a)$. Averaging over $z$ under $P_Z$, with $r = P(Y=1|Z=z)$, gives $I(X;Z) \ge (1-h(\epsilon))\, I(Y;Z)$, and the claim follows after rearrangements.
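The pointwise inequality $1 - h(\epsilon * r) \ge (1-h(\epsilon))(1-h(r))$ that drives the averaging argument can itself be confirmed by a brute-force grid check; a small sketch of mine (the grid resolution is an arbitrary choice):

```python
import math

def h(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def conv(a, b):
    """a * b in the post's notation: binary convolution."""
    return a * (1 - b) + b * (1 - a)

ok = True
for ei in range(1, 50):
    eps = ei / 100.0
    for ri in range(0, 1001):
        r = ri / 1000.0
        lhs = 1 - h(conv(eps, r))
        rhs = (1 - h(eps)) * (1 - h(r))
        if lhs < rhs - 1e-12:
            ok = False
print(ok)
```

Equality holds at $r = 0, 1$, and the gap is strictly positive in between.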


Thanks Jingbo,

If this fails for non-uniform Y, how can it hold for non-uniform X? (can we have non-uniform X and uniform Y at the same time?)

I am trying to get a similar inequality for non-uniform, non-binary inputs, but so far without success!

Hi Shahab,

Thanks for your interest. Regarding my previous comment, I should just say ‘keep P_{X|Y} symmetric’ and ‘keep P_{Y|X} symmetric’ in the two cases. In both cases, both X and Y are non-uniform.

Here is a recent paper by Venkat, Amin, Sudeep, and Chandra dealing with a related problem: http://arxiv.org/pdf/1304.6133v1.pdf. In this paper, (3) is shown to be an upper bound on the ratio of mutual informations. I believe a lower bound can also be obtained by just replacing ‘sup’ in (3) with ‘inf’, and the resulting lower bound will be tight, by similar arguments. In other words, finding the range of the ratio of mutual informations is equivalent to finding the range of the ratio of divergences in (3). In the uniform case our inequality can be proved this way, as shown in Sudeep’s comment. I think the results in this paper will be very helpful for your generalizations.

Recently, I came across an inequality involving products of entropies that I thought I would mention, given the topic of this post. As in the original post, let $X$ be a uniform binary random variable, and let $Y$ be the result of passing $X$ through a binary symmetric channel with crossover probability $\alpha$. Now, let $(X_1,Y_1),\dots,(X_n,Y_n)$ be i.i.d. copies of $(X,Y)$. Define the function $\mathrm{b}\colon \{0,1\}^n \to \{0,1\}$ as follows: $\mathrm{b}(x^n) = 1$ if the integer representation of the binary $n$-tuple $x^n$ is less than $k$, and let $\mathrm{b}(x^n) = 0$ otherwise.

In a recent paper (http://arxiv.org/abs/1302.2512) with Gowtham Kumar, we have conjectured that

,

where $h(\cdot)$ denotes the binary entropy function. Note that the inequality depends only on the ratio $k/2^n$, rather than the pair $(k,n)$. In the paper, we have an algorithm capable of verifying the conjectured inequality (and hence permitting a computer-aided proof of the desired inequality) for any given ratio $k/2^n$. (The algorithm requires a sequence of steps which, while simple, are too tedious to do by hand.) It appears very challenging to find a “closed form” proof, since the quantity as a function of $k/2^n$ is pathological in nature.
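For concreteness, the quantity $I(\mathrm{b}(X^n); Y^n)$ for this threshold function can be computed exactly for small $n$ by brute-force enumeration. A minimal sketch, with function and parameter names of my own choosing; the dictator case $k = 2^{n-1}$ (where $\mathrm{b}$ depends only on the first bit, so the value must be exactly $1 - h(\alpha)$) serves as a sanity check:

```python
import math

def h(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def I_b_Y(n, k, alpha):
    """Exact I(b(X^n); Y^n) in bits, where X^n is i.i.d. uniform, Y^n is the
    BSC(alpha) output, and b(x) = 1 iff the integer value of x is below k."""
    N = 1 << n
    H_b = h(k / N)                     # b(X^n) is Bernoulli(k / 2^n)
    H_b_given_Y = 0.0
    for y in range(N):                 # Y^n is uniform, so P(Y^n = y) = 1/N
        p1 = 0.0                       # accumulates P(b = 1 | Y^n = y)
        for x in range(k):             # exactly the x with b(x) = 1
            d = bin(x ^ y).count("1")  # Hamming distance between x and y
            p1 += alpha ** d * (1 - alpha) ** (n - d)
        H_b_given_Y += h(p1) / N
    return H_b - H_b_given_Y

# Sanity check: k = 2^(n-1) makes b a dictator function of the first bit,
# whose mutual information with Y^n is exactly 1 - h(alpha).
print(abs(I_b_Y(4, 8, 0.1) - (1 - h(0.1))) < 1e-9)
```

The enumeration is $O(2^n \cdot k)$, so it is only practical for small $n$, which is consistent with the need for a cleverer verification algorithm for general ratios.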

Varun Jog and I were able to prove your claim. I will sketch the proof here.

I use the notation a*b = a(1-b) + b(1-a) and D(p||q) = p\log(p/q) + (1-p)\log((1-p)/(1-q)). Let the BSC from X to Y have error probability \epsilon.

Let P(Z=i) = w_i and P(Y=1|Z=i) = r_i. Then, I(Y;Z) = \sum_i w_i D(r_i||1/2) and I(X;Z) = \sum_i w_i D(r_i*\epsilon ||1/2). Thus, to prove your claim, it suffices to show that D(r*\epsilon||1/2)/D(r||1/2) \geq 1-h(\epsilon) for all r in [0,1]. This is true and equality holds at r=0,1. I find this quite interesting because the left hand side does not depend on the base of the logarithm used but the right hand side requires you to use logarithms to base 2.

We let f(r) = D(r*\epsilon||1/2)/D(r||1/2) = (1-h(r*\epsilon))/(1-h(r)). We define f(1/2) by continuity as the limit (1-2\epsilon)^2. We compute the derivative f'(r) and show that f'(r) is positive over [0,1/2) and negative over (1/2,1], thus proving the claim. To do this, we define u = 1-2r and \delta = 1-2\epsilon, and expand the h(.) and \log functions as Taylor series in u and \delta. Do contact me by e-mail if I was not clear or if I was mistaken somewhere above.

Thanks.

Sudeep
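The shape of f claimed in the proof above (increasing on [0,1/2), decreasing on (1/2,1], minimum 1-h(\epsilon) at the endpoints, maximum (1-2\epsilon)^2 at 1/2) is easy to confirm numerically; a rough grid check of mine, with \epsilon = 0.2 and the grid resolution arbitrary:

```python
import math

def h(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def f(r, eps):
    """f(r) = (1 - h(r * eps)) / (1 - h(r)), with the limit value at r = 1/2."""
    if abs(r - 0.5) < 1e-12:
        return (1 - 2 * eps) ** 2
    re = r * (1 - eps) + eps * (1 - r)   # binary convolution r * eps
    return (1 - h(re)) / (1 - h(r))

eps = 0.2
grid = [i / 1000 for i in range(1001)]
vals = [f(r, eps) for r in grid]
inc = all(vals[i] <= vals[i + 1] + 1e-12 for i in range(500))      # rising to 1/2
dec = all(vals[i] >= vals[i + 1] - 1e-12 for i in range(500, 1000)) # falling after
print(inc and dec)
print(abs(vals[0] - (1 - h(eps))) < 1e-9)        # minimum at the endpoints
print(abs(max(vals) - (1 - 2 * eps) ** 2) < 1e-9) # maximum at r = 1/2
```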

I think your proof is correct. Very nice work!

Originally I also guessed that I(X;Z) <= I(Y;Z)(1-2\epsilon)^2 holds, although I didn't put this part in the above post. But now I see this is also correct by your argument.

Thanks again and congratulations!
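Both bounds follow from the two-sided estimate 1-h(\epsilon) <= (1-h(r*\epsilon))/(1-h(r)) <= (1-2\epsilon)^2 on the pointwise ratio. A quick grid check of mine (the grid resolution is arbitrary, and r = 1/2, where the denominator vanishes, is skipped):

```python
import math

def h(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

ok = True
for ei in range(1, 50):
    eps = ei / 100.0
    lo = 1 - h(eps)                  # lower bound on the ratio
    hi = (1 - 2 * eps) ** 2          # upper bound on the ratio
    for ri in range(0, 1001):
        if ri == 500:
            continue                 # skip r = 1/2 (denominator is zero)
        r = ri / 1000.0
        ratio = (1 - h(r * (1 - eps) + eps * (1 - r))) / (1 - h(r))
        if not (lo - 1e-9 <= ratio <= hi + 1e-9):
            ok = False
print(ok)
```

Averaging the pointwise bounds over P_Z, as in the proofs above, gives (1-h(\epsilon)) I(Y;Z) <= I(X;Z) <= (1-2\epsilon)^2 I(Y;Z).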

Does this hold for non-uniform binary inputs?

The inequality does not hold for non-uniform inputs (keeping $P_{Y|X}$ to be that of a BSC($\epsilon$)). A counterexample can be found with a binary $Z$ and suitable choices of the input distribution and channel parameters, in which case $I(X;Z)$ comes out strictly smaller than $I(X;Y)\,I(Y;Z)$. It’s easy to miss such counterexamples with an incomplete search. The inequality also fails with a non-uniform $Y$ (keeping $P_{X|Y}$ to be that of a BSC), according to my numerical study.
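A sketch of the kind of numerical search that can look for such counterexamples; the particular parameters below (P_X = (0.9, 0.1), \epsilon = 0.25, binary Z) are arbitrary choices of mine, and an incomplete search like this one may well miss violations, as noted above:

```python
import math
import random

def mi(p_a, chan):
    """I(A;B) in bits for finite alphabets; chan[a][b] = P(B = b | A = a)."""
    nb = len(chan[0])
    p_b = [sum(p_a[a] * chan[a][b] for a in range(len(p_a))) for b in range(nb)]
    s = 0.0
    for a in range(len(p_a)):
        for b in range(nb):
            j = p_a[a] * chan[a][b]
            if j > 0 and p_b[b] > 0:
                s += j * math.log2(j / (p_a[a] * p_b[b]))
    return s

random.seed(1)
eps = 0.25                                   # arbitrary crossover probability
px = [0.9, 0.1]                              # non-uniform X (arbitrary choice)
bsc = [[1 - eps, eps], [eps, 1 - eps]]       # keep P_{Y|X} symmetric
py = [px[0] * (1 - eps) + px[1] * eps, px[0] * eps + px[1] * (1 - eps)]
I_XY = mi(px, bsc)

best = None        # smallest observed value of I(X;Z) - I(X;Y) I(Y;Z)
for _ in range(20000):
    a, b = random.random(), random.random()
    p_z_given_y = [[a, 1 - a], [b, 1 - b]]   # random channel with binary Z
    I_YZ = mi(py, p_z_given_y)
    p_z_given_x = [[sum(bsc[x][y] * p_z_given_y[y][z] for y in range(2))
                    for z in range(2)] for x in range(2)]
    I_XZ = mi(px, p_z_given_x)
    assert I_XZ <= min(I_XY, I_YZ) + 1e-9    # data processing always holds
    gap = I_XZ - I_XY * I_YZ
    if best is None or gap < best:
        best = gap
print("smallest gap found:", best)
```

A negative smallest gap would exhibit a violation; the data-processing bound, by contrast, can never fail, which the assertion double-checks along the way.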

Jingbo,

Thanks for the nice counterexample you gave,

it helped me a lot.

I had spent a couple of days trying to find such a counterexample.

If I have your email address, I will let you know should I get any results in this regard.