Update (4/3/2014): I believe I have solved the conjecture, and proven it to be correct. I will make a preprint available shortly(see link below). The original blog post remains available below. — Tom

Preprint available here: “An extremal inequality for long Markov chains“.

I have an extremal conjecture that I have been working on intermittently with some colleagues (including Jiantao Jiao, Tsachy Weissman, Chandra Nair, and Kartik Venkat). Despite our efforts, we have not been able to prove it. Hence, I thought I would experiment with online collaboration by offering it to the broader IT community.

In order to make things interesting, we are offering a $1000 prize for the first correct proof or counterexample! Feel free to post your thoughts in the public comments. You can also email me if you have questions or want to bounce some ideas off me.

Although I have no way of enforcing them, please abide by the following ground rules:

If you decide to work on this conjecture, please send me an email to let me know that you are doing so. As part of this experiment with online collaboration, I want to gauge how many people become involved at various degrees.
If you solve the conjecture or make significant progress, please keep me informed.
If you repost this conjecture, or publish any results, please cite this blog post appropriately.

One final disclaimer: this post is meant to be a brief introduction to the conjecture, with a few partial results to get the conversation started; it is not an exhaustive account of the approaches we have tried.

1. The Conjecture

Conjecture 1. Suppose ${X,Y}$ are jointly Gaussian, each with unit variance and correlation ${\rho}$ . Then, for any ${U,V}$ satisfying ${U-X-Y-V}$ , the following inequality holds:

$\displaystyle 2^{-2I(Y;U)} 2^{-2I(X;V|U)} \geq (1-\rho^2)+ \rho^2 2^{-2I(X;U)} 2^{-2I(Y;V|U)} . \ \ \ \ \ (1)$

2. Partial Results

There are several partial results which suggest the validity of Conjecture 1. Moreover, numerical experiments have not produced a counterexample.

Conjecture 1 extends the following well-known consequence of the conditional entropy power inequality to include long Markov chains.

Lemma 1 (Oohama, 1997). Suppose ${X,Y}$ are jointly Gaussian, each with unit variance and correlation ${\rho}$ . Then, for any ${U}$ satisfying ${U-X-Y}$ , the following inequality holds:

$\displaystyle 2^{-2 I(Y;U)} \geq 1-\rho^2+\rho^2 2^{-2I(X;U)}. \ \ \ \ \ (2)$

Proof: Consider any ${U}$ satisfying ${U-X-Y}$ . Let ${Y_u, X_u}$ denote the random variables ${X,Y}$ conditioned on ${U=u}$ . By Markovity and definition of ${X,Y}$ , we have that ${Y_u = \rho X_u + Z}$ , where ${Z\sim N(0,1-\rho^2)}$ is independent of ${X_u}$ . Hence, the conditional entropy power inequality implies that

$\displaystyle 2^{2h(Y|U)} \geq \rho^2 2^{2h(X|U)} + 2 \pi e(1-\rho^2) = 2 \pi e \rho^2 2^{-2I(X;U)} + 2 \pi e(1-\rho^2). \ \ \ \ \ (3)$

From here, the lemma easily follows. $\Box$

Lemma 1 can be applied to prove the following special case of Conjecture 1. This result subsumes most of the special cases I can think of analyzing analytically.

Proposition 1. Suppose ${X,Y}$ are jointly Gaussian, each with unit variance and correlation ${\rho}$ . If ${U-X-Y}$ are jointly Gaussian and ${U-X-Y-V}$ , then

$\displaystyle 2^{-2I(Y;U)} 2^{-2I(X;V|U)} \geq (1-\rho^2)+ \rho^2 2^{-2I(X;U)} 2^{-2I(Y;V|U)}. \ \ \ \ \ (4)$

Proof: Without loss of generality, we can assume that ${U}$ has zero mean and unit variance. Define ${\rho_u = E[XU]}$ . Since ${U-X-Y}$ are jointly Gaussian, we have

$\displaystyle I(X;U) =\frac{1}{2}\log\frac{1}{1-\rho_u^2} \ \ \ \ \ (5)$

$\displaystyle I(Y;U) =\frac{1}{2}\log\frac{1}{1-\rho^2\rho_u^2}. \ \ \ \ \ (6)$

Let ${X_u,Y_u,V_u}$ denote the random variables ${X,Y,V}$ conditioned on ${U=u}$ , respectively. Define ${\rho_{XY|u}}$ to be the correlation coefficient between ${X_u}$ and ${Y_u}$ . It is readily verified that

$\displaystyle \rho_{XY|u} = \frac{\rho\sqrt{1-\rho_u^2}}{\sqrt{1-\rho^2\rho_u^2}}, \ \ \ \ \ (7)$

which does not depend on the particular value of ${u}$ . By plugging (5)–(7) into (4), we see that (4) is equivalent to

$\displaystyle 2^{-2I(X;V|U)} \geq (1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2I(Y;V|U)}. \ \ \ \ \ (8)$

For every ${u}$ , the variables ${X_u,Y_u}$ are jointly Gaussian with correlation coefficient ${\rho_{XY|u}}$ and ${X_u-Y_u-V_u}$ form a Markov chain, hence Lemma 1 implies

$\displaystyle 2^{-2I(X_u;V_u)} \geq (1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2I(Y_u;V_u)}. \ \ \ \ \ (9)$

The desired inequality (8) follows by convexity of

$\displaystyle \log\left[(1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2z}\right] \ \ \ \ \ (10)$

as a function of ${z}$ . $\Box$

3. Equivalent Forms

There are many equivalent forms of Conjecture 1. For example, (1) can be replaced by the symmetric inequality

$\displaystyle 2^{-2(I(X;V)+I(Y;U))} \geq (1-\rho^2)2^{-2I(U;V)}+ \rho^2 2^{-2(I(X;U)+I(Y;V))}. \ \ \ \ \ (11)$

Alternatively, we can consider dual forms of Conjecture 1. For instance, one such form is stated as follows:

Conjecture 1′. Suppose ${X,Y}$ are jointly Gaussian, each with unit variance and correlation ${\rho}$ . For ${\lambda\in [{1}/({1+\rho^2}),1]}$ , the infimum of

$\displaystyle I(X,Y;U,V)-\lambda\Big(I(X;UV)+I(Y;UV)\Big), \ \ \ \ \ (12)$

taken over all ${U,V}$ satisfying ${U-X-Y-V}$ is attained when ${U,X,Y,V}$ are jointly Gaussian.

19 thoughts on “An Extremal Conjecture: Experimenting with Online Collaboration”

This example counters this conjecture.
U=2*U1 ;
X1=2*(U1+U2+ …+U20) ;
X=(X1-mean(X1))/std(X1);
Y=0.1*X+W1;
V=0.9*Y+0.1*W2;

where Ui(0,1) is uniform, (i=1,2, … ,20), W1~N(0,0.99), and W2~N(0,1).
I got the following numerical results for this setting using 10^7 samples.
H(U,V)= 81.4570,
H(Y,U)= 11.8206,
H(X,V)= 69.5232,
H(X,U)= 11.9757,
H(Y,V)= 7.4803,

Reply ↓

Just curious, what would the implications be if this conjecture is true?

Reply ↓

Mohammad Samavat on April 7, 2014 at 6:09 pm said:

Nice question, I would like to know too!

Has this conjecture been solved or have there been any further insights? Just curious as I’ve been trying to tackle this problem for a while.

Reply ↓

Thomas Courtade on March 3, 2014 at 11:18 am said:

No, it is not solved yet. However, I recently gave a talk on the conjecture at the 2014 International Zurich Seminar on Communications. The paper should be available on my website shortly.

[I have received several emails inquiring about the status of the conjecture, so I have added a comment at the top of the post to advertise the status. I will update it if the status changes.]
Josef Ziegler on March 3, 2014 at 4:45 pm said:

Oh, what a coincidence, I happen to study there. Thanks for the quick reply!

Does “UV” in Conjecture 1′ stands for the product of U and V?

Reply ↓

Thomas Courtade on January 15, 2014 at 11:36 am said:

No, it denotes the pair of random variables (U,V).

In the other puzzle (http://arxiv.org/pdf/1302.2512v2.pdf), Isn’t 1 − H(α) channel capacity? and hence mutual information between f(x) and y is less than the capacity for all functions?

Reply ↓

AJ on December 7, 2013 at 1:47 pm said:

$I(f(x),y) > max I(x,y)=1-H(alpha)$ seems to be impossible.
Thomas Courtade on January 15, 2014 at 11:34 am said:

You have to be careful here. The problem you are talking about considers mutual information of the form:

I( b(X^n) ; Y^n ), where (X^n,Y^n) are n i.i.d. pairs of correlated Bernoulli(1/2) random variables. In this case, the data processing inequality looking at the channel X^n –> Y^n (which is what you are driving at with your channel capacity statement) simply gives:

I( b(X^n) ; Y^n ) <= n*( 1-H(\alpha) ),

which is off by a factor of n. Clearly this is no good, since we can trivially upper bound:

I( b(X^n) ; Y^n ) <= 1.

Appealing to a so-called 'strong data processing' inequality improves things, but still only gives:

I( b(X^n) ; Y^n ) <= (1-2\alpha)^2,

which is still weaker than the conjecture.

Hello dear friends,
I saw this post some months ago, but because I was busy, I did not say anything about it. If it is not solved yet, so I will be so delighted to work on this Conjecture.

Reply ↓

Gowtham kumar on September 20, 2013 at 10:46 pm said:

The conjecture is not solved yet. That’s what Tom Courtade told me today. Good luck!

What do you mean by “satisfying U – X – Y – V”. Is this a typo that’s supposed to say something like “satisfying U – X = Y – V”, or is there some other notation that I’m not familiar with?

Reply ↓

Thomas Courtade on July 17, 2013 at 2:58 pm said:

The notation U-X-Y-V means that U,X,Y,V form a Markov chain, in that order. In other words, we can factor the joint distribution as p(x,y,u,v) = p(u|x)p(v|y)p(x,y).

I read your post by chance. I will have a try.

Reply ↓

I think you are undervaluing a good counter example!

Reply ↓

Thomas Courtade on April 27, 2013 at 9:34 pm said:

Perhaps. The prizes reflect our feeling that a counterexample would require one appropriately normalized unit of effort, while a proof would require four units. Hence, dollars earned per unit effort should be constant regardless of outcome 🙂

I want to think on this

Reply ↓

Information Theory b-log

An Extremal Conjecture: Experimenting with Online Collaboration

Update (4/3/2014): I believe I have solved the conjecture, and proven it to be correct. I will make a preprint available shortly(see link below). The original blog post remains available below. — Tom

19 thoughts on “An Extremal Conjecture: Experimenting with Online Collaboration”

Leave a Reply Cancel reply