An Extremal Conjecture: Experimenting with Online Collaboration

I have an extremal con­jec­ture that I have been work­ing on inter­mit­tently with some col­leagues (includ­ing Jiantao Jiao, Tsachy Weiss­man, Chan­dra Nair, and Kar­tik Venkat). Despite our efforts, we have not been able to prove it. Hence, I thought I would exper­i­ment with online col­lab­o­ra­tion by offer­ing it to the broader IT community.

In order to make things inter­est­ing, we are offer­ing a $1000 prize for the first cor­rect proof of the con­jec­ture, or a $250 award for the first coun­terex­am­ple! Feel free to post your thoughts in the pub­lic com­ments. You can also email me if you have ques­tions or want to bounce some ideas off me.

Although I have no way of enforc­ing them, please abide by the fol­low­ing ground rules:

  1. If you decide to work on this con­jec­ture, please send me an email to let me know that you are doing so. As part of this exper­i­ment with online col­lab­o­ra­tion, I want to gauge how many peo­ple become involved at var­i­ous degrees.
  2. If you solve the con­jec­ture or make sig­nif­i­cant progress, please keep me informed.
  3. If you repost this con­jec­ture, or pub­lish any results, please cite this blog post appropriately.

One final dis­claimer: this post is meant to be a brief intro­duc­tion to the con­jec­ture, with a few par­tial results to get the con­ver­sa­tion started; it is not an exhaus­tive account of the approaches we have tried.

1. The Conjecture

Con­jec­ture 1. Sup­pose {X,Y} are jointly Gauss­ian, each with unit vari­ance and cor­re­la­tion {\rho}. Then, for any {U,V} sat­is­fy­ing {U-X-Y-V}, the fol­low­ing inequal­ity holds:

\displaystyle 2^{-2I(Y;U)} 2^{-2I(X;V|U)} \geq (1-\rho^2)+ \rho^2 2^{-2I(X;U)} 2^{-2I(Y;V|U)} . \ \ \ \ \ (1)

2. Par­tial Results

There are sev­eral par­tial results which sug­gest the valid­ity of Con­jec­ture 1. More­over, numer­i­cal exper­i­ments have not pro­duced a counterexample.

Con­jec­ture 1 extends the fol­low­ing well-known con­se­quence of the con­di­tional entropy power inequal­ity to include long Markov chains.

Lemma 1 (Oohama, 1997). Sup­pose {X,Y} are jointly Gauss­ian, each with unit vari­ance and cor­re­la­tion {\rho}. Then, for any {U} sat­is­fy­ing {U-X-Y}, the fol­low­ing inequal­ity holds:

\displaystyle 2^{-2 I(Y;U)} \geq 1-\rho^2+\rho^2 2^{-2I(X;U)}. \ \ \ \ \ (2)

Proof: Con­sider any {U} sat­is­fy­ing {U-X-Y}. Let {Y_u, X_u} denote the ran­dom vari­ables {X,Y} con­di­tioned on {U=u}. By Markovity and def­i­n­i­tion of {X,Y}, we have that {Y_u = \rho X_u + Z}, where {Z\sim N(0,1-\rho^2)} is inde­pen­dent of {X_u}. Hence, the con­di­tional entropy power inequal­ity implies that

\displaystyle 2^{2h(Y|U)} \geq \rho^2 2^{2h(X|U)} + 2 \pi e(1-\rho^2) = 2 \pi e \rho^2 2^{-2I(X;U)} + 2 \pi e(1-\rho^2). \ \ \ \ \ (3)

From here, the lemma eas­ily follows. \Box

Lemma 1 can be applied to prove the fol­low­ing spe­cial case of Con­jec­ture 1. This result sub­sumes most of the spe­cial cases I can think of ana­lyz­ing analytically.

Propo­si­tion 1. Sup­pose {X,Y} are jointly Gauss­ian, each with unit vari­ance and cor­re­la­tion {\rho}. If {U-X-Y} are jointly Gauss­ian and {U-X-Y-V}, then

\displaystyle 2^{-2I(Y;U)} 2^{-2I(X;V|U)} \geq (1-\rho^2)+ \rho^2 2^{-2I(X;U)} 2^{-2I(Y;V|U)}. \ \ \ \ \ (4)

Proof: With­out loss of gen­er­al­ity, we can assume that {U} has zero mean and unit vari­ance. Define {\rho_u = E[XU]}. Since {U-X-Y} are jointly Gauss­ian, we have

\displaystyle I(X;U) =\frac{1}{2}\log\frac{1}{1-\rho_u^2} \ \ \ \ \ (5)

\displaystyle I(Y;U) =\frac{1}{2}\log\frac{1}{1-\rho^2\rho_u^2}. \ \ \ \ \ (6)

Let {X_u,Y_u,V_u} denote the ran­dom vari­ables {X,Y,V} con­di­tioned on {U=u}, respec­tively. Define {\rho_{XY|u}} to be the cor­re­la­tion coef­fi­cient between {X_u} and {Y_u}. It is read­ily ver­i­fied that

\displaystyle \rho_{XY|u} = \frac{\rho\sqrt{1-\rho_u^2}}{\sqrt{1-\rho^2\rho_u^2}}, \ \ \ \ \ (7)

which does not depend on the par­tic­u­lar value of {u}. By plug­ging (5)-(7) into (4), we see that (4) is equiv­a­lent to

\displaystyle 2^{-2I(X;V|U)} \geq (1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2I(Y;V|U)}. \ \ \ \ \ (8)

For every {u}, the vari­ables {X_u,Y_u} are jointly Gauss­ian with cor­re­la­tion coef­fi­cient {\rho_{XY|u}} and {X_u-Y_u-V_u} form a Markov chain, hence Lemma 1 implies

\displaystyle 2^{-2I(X_u;V_u)} \geq (1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2I(Y_u;V_u)}. \ \ \ \ \ (9)

The desired inequal­ity (8) fol­lows by con­vex­ity of

\displaystyle \log\left[(1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2z}\right] \ \ \ \ \ (10)

as a func­tion of {z}. \Box

3. Equiv­a­lent Forms

There are many equiv­a­lent forms of Con­jec­ture 1. For exam­ple, (1) can be replaced by the sym­met­ric inequality

\displaystyle 2^{-2(I(X;V)+I(Y;U))} \geq (1-\rho^2)2^{-2I(U;V)}+ \rho^2 2^{-2(I(X;U)+I(Y;V))}. \ \ \ \ \ (11)

Alter­na­tively, we can con­sider dual forms of Con­jec­ture 1. For instance, one such form is stated as follows:

Con­jec­ture 1′. Sup­pose {X,Y} are jointly Gauss­ian, each with unit vari­ance and cor­re­la­tion {\rho}. For {\lambda\in [{1}/({1+\rho^2}),1]}, the infi­mum of

\displaystyle I(X,Y;U,V)-\lambda\Big(I(X;UV)+I(Y;UV)\Big), \ \ \ \ \ (12)

taken over all {U,V} sat­is­fy­ing {U-X-Y-V} is attained when {U,X,Y,V} are jointly Gaussian.

Nevanlinna Prize

Nom­i­na­tions of peo­ple born on or after Jan­u­ary 1, 1974

for out­stand­ing con­tri­bu­tions in Math­e­mat­i­cal Aspects of Infor­ma­tion Sci­ences including:

  1. All math­e­mat­i­cal aspects of com­puter sci­ence, includ­ing com­plex­ity the­ory, logic of pro­gram­ming lan­guages, analy­sis of algo­rithms, cryp­tog­ra­phy, com­puter vision, pat­tern recog­ni­tion, infor­ma­tion pro­cess­ing and mod­el­ling of intelligence.
  2. Sci­en­tific com­put­ing and numer­i­cal analy­sis. Com­pu­ta­tional aspects of opti­miza­tion and con­trol the­ory. Com­puter algebra.

Nom­i­na­tion Procedure: http://www.mathunion.org/general/prizes/nevanlinna/details/

 

Information and Inference (new journal)

The first issue of Infor­ma­tion and Infer­ence has just appeared:

http://imaiai.oxfordjournals.org/content/current

It includes the fol­low­ing editorial:

In recent years, a great deal of energy and tal­ent have been devoted to new research prob­lems aris­ing from our era of abun­dant and var­ied data/information. These efforts have com­bined advanced meth­ods drawn from across the spec­trum of estab­lished aca­d­e­mic dis­ci­plines: dis­crete and applied math­e­mat­ics, com­puter sci­ence, the­o­ret­i­cal sta­tis­tics, physics, engi­neer­ing, biol­ogy and even finance. This new jour­nal is designed to serve as a meet­ing place for ideas con­nect­ing the the­ory and appli­ca­tion of infor­ma­tion and infer­ence from across these disciplines.

While the fron­tiers of research involv­ing infor­ma­tion and infer­ence are dynamic, we are cur­rently plan­ning to pub­lish in infor­ma­tion the­ory, sta­tis­ti­cal infer­ence, net­work analy­sis, numer­i­cal analy­sis, learn­ing the­ory, applied and com­pu­ta­tional har­monic analy­sis, prob­a­bil­ity, com­bi­na­torics, sig­nal and image pro­cess­ing, and high-dimensional geom­e­try; we also encour­age papers not fit­ting the above descrip­tion, but which expose novel prob­lems, inno­v­a­tive data types, sur­pris­ing con­nec­tions between dis­ci­plines and alter­na­tive approaches to infer­ence. This first issue exem­pli­fies this top­i­cal diver­sity of the sub­ject mat­ter, linked by the use of sophis­ti­cated math­e­mat­i­cal mod­el­ling, tech­niques of analy­sis, and focus on timely applications.

To enhance the impact of each man­u­script, authors are encour­aged to pro­vide soft­ware to illus– trate their algo­rithm and where pos­si­ble repli­cate the exper­i­ments pre­sented in their man­u­scripts. Man­u­scripts with accom­pa­ny­ing soft­ware are marked as “repro­ducible” and have the soft­ware linked on the jour­nal web­site under sup­ple­men­tary mate­r­ial. It is with plea­sure that we wel­come the scien– tific com­mu­nity to this new pub­li­ca­tion venue.

Robert Calder­bank David L. Donoho John Shawe-Taylor Jared Tanner

Comparing Variability of Random Variables

Con­sider exchange­able ran­dom vari­ables {X_1, \ldots, X_n, \ldots}. A cou­ple of facts seem quite intuitive:

State­ment 1. The “vari­abil­ity” of sam­ple mean {S_m = \frac{1}{m} \sum_{i=1}^{m} X_i} decreases with {m}.

State­ment 2. Let the aver­age of func­tions {f_1, f_2, \ldots, f_n} be defined as {\overline{f} (x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)}. Then {\max_{1\leq i \leq n} \overline{f}(X_i)} is less “vari­able” than {\max_{1\leq i \leq n} f_i (X_i)}.

 

To make these state­ments pre­cise, one faces the fun­da­men­tal ques­tion of com­par­ing two ran­dom vari­ables {W} and {Z} (or more pre­cisely com­par­ing two dis­tri­b­u­tions). One com­mon way we think of order­ing ran­dom vari­ables is the notion of sto­chas­tic dominance:

\displaystyle W \leq_{st} Z \Leftrightarrow F_W(t) \geq F_Z(t) \ \ \ \mbox{ for all real } t.

How­ever, this notion really is only a suit­able notion when one is con­cerned with the actual size of the ran­dom quan­ti­ties of inter­est, while, in our sce­nario of inter­est, a more nat­ural order would be that which com­pares the vari­abil­ity between two ran­dom vari­ables (or more pre­cisely, again, the two dis­tri­b­u­tions). It turns out that a very use­ful notion, used in a vari­ety of fields, is due to Ross (1983): Ran­dom vari­able {W} is said to be sto­chas­ti­cally less vari­able than ran­dom vari­able {Z} (denoted by {\leq_v}) when every risk-averse deci­sion maker will choose {W} over {Z} (given they have sim­i­lar means). More pre­cisely, for ran­dom vari­ables {W} and {Z} with finite means

\displaystyle W \leq_{v} Z \Leftrightarrow \mathbb{E}[f(X)] \leq \mathbb{E}[f(Y)] \ \ \mbox{ for increasing and convex function } f \in \mathcal{F}

where {\mathcal{F}} is the set of func­tions for which the above expec­ta­tions exist.

One inter­est­ing, but per­haps not entirely obvi­ous, fact is that this notion of order­ing {W\leq_v Z} is equiv­a­lent to say­ing that there is a sequence of mean pre­serv­ing spreads that in the limit trans­forms the dis­tri­b­u­tion of {W} into the dis­tri­b­u­tion of another ran­dom vari­able {W'} with finite mean such that {W'\leq_{st} Z}! Also, using results by Hardy, Lit­tle­wood and Polya (1929), the sto­chas­tic vari­abil­ity order intro­duced above can be shown to be equiv­a­lent to Lorenz (1905) order­ing used in eco­nom­ics to mea­sure income equality.

Now with this, we are ready to for­mal­ize our pre­vi­ous state­ments. The first state­ment is actu­ally due to Arnold and Vil­lasenor (1986):

\displaystyle \frac{1}{m} \sum_{i=1}^{m} X_i \leq_v \frac{1}{m-1} \sum_{i=1}^{m-1} X_i \ \ \ \ \ \ \ \ \ \ \ \ \mbox{for all }\ \ m \in \mathbb{N}.

Note that when you apply this fact to a sequence of iid ran­dom vari­ables with finite mean {\mu}, it strength­ens the strong law of large num­ber in that it ensures that the almost sure con­ver­gence of the sam­ple mean to the mean value {\mu} occurs with monot­o­n­i­cally decreas­ing vari­abil­ity (as the sam­ple size grows).

The sec­ond state­ment comes up in prov­ing cer­tain opti­mal­ity result in shar­ing par­al­lel servers in fork-join queue­ing sys­tems (J. 2008) and has a sim­i­lar flavor:

\displaystyle \max_{1\leq i \leq n} \overline{f}(X_i) \leq_v \max_{1\leq i \leq n} f_i (X_i).

The clean­est way to prove both state­ments, to the best of my knowl­edge, is based on the fol­low­ing the­o­rem first proved by Black­well in 1953 (later strength­ened to ran­dom ele­ments in sep­a­ra­ble Banach spaces by Strassen in 1965, hence referred to by some as Strassen’s theorem):

The­o­rem 1 Let {W} and {Z} be two ran­dom vari­ables with finite means. A nec­es­sary and suf­fi­cient con­di­tion for {W \leq_v Z} is that there are two ran­dom vari­ables {\hat{W}} and {\hat{Z}} with the same mar­gin­als as {W} and {Z}, respec­tively, such that {\mathbb{E}[\hat{Z} |\hat{W}] \geq \hat{W}} almost surely.

For instance, to prove the first state­ment we con­sider {\hat{W} = W = \frac{1}{n} \sum_{i=1}^n X_i} and {Z = \frac{1}{n-1} \sum_{i=1}^{n-1} X_i}. All that is nec­es­sary now is to note that {\hat{Z} : = \frac{1}{n-1} \sum_{i\in I, i \neq J} X_i}, {J} is an inde­pen­dent uni­form rv on the set {I := \{1,2, \ldots, n\}}, has the same dis­tri­b­u­tion as ran­dom vari­able {Z}. Fur­ther­more,

\displaystyle \mathbb{E} [ \hat{Z} | W ] = \mathbb{E} [ \frac{1}{n} \sum_{J=1}^{n} (\frac{1}{n-1} \sum_{i\in I, i \neq J} X_i ) | W ] = \mathbb{E} [ \frac{1}{n} \sum_{j=1}^{n} X_j | W ] = W.

Sim­i­larly to prove the sec­ond state­ment, one can con­struct {\hat{Z}} by select­ing a ran­dom per­mu­ta­tion of func­tions {f_1, \ldots, f_n}.