# An Extremal Conjecture: Experimenting with Online Collaboration

I have an extremal conjecture that I have been working on intermittently with some colleagues (including Jiantao Jiao, Tsachy Weissman, Chandra Nair, and Kartik Venkat). Despite our efforts, we have not been able to prove it. Hence, I thought I would experiment with online collaboration by offering it to the broader IT community.

In order to make things interesting, we are offering a $1000 prize for the first correct proof of the conjecture, or a $250 award for the first counterexample! Feel free to post your thoughts in the public comments. You can also email me if you have questions or want to bounce some ideas off me.

Although I have no way of enforcing them, please abide by the following ground rules:

1. If you decide to work on this conjecture, please send me an email to let me know that you are doing so. As part of this experiment with online collaboration, I want to gauge how many people become involved, and to what degree.
2. If you solve the conjecture or make significant progress, please keep me informed.
3. If you repost this conjecture, or publish any results, please cite this blog post appropriately.

One final disclaimer: this post is meant to be a brief introduction to the conjecture, with a few partial results to get the conversation started; it is not an exhaustive account of the approaches we have tried.

1. The Conjecture

Conjecture 1. Suppose ${X,Y}$ are jointly Gaussian, each with unit variance and correlation ${\rho}$. Then, for any ${U,V}$ satisfying ${U-X-Y-V}$, the following inequality holds:

$\displaystyle 2^{-2I(Y;U)} 2^{-2I(X;V|U)} \geq (1-\rho^2)+ \rho^2 2^{-2I(X;U)} 2^{-2I(Y;V|U)} . \ \ \ \ \ (1)$

2. Partial Results

There are several partial results which suggest the validity of Conjecture 1. Moreover, numerical experiments have not produced a counterexample.
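One family that is easy to check numerically is jointly Gaussian auxiliaries. The sketch below is my own construction (not part of the conjecture): it takes ${U = \alpha X + N(0,\sigma_1^2)}$ and ${V = \beta Y + N(0,\sigma_2^2)}$, which satisfy ${U-X-Y-V}$, and evaluates both sides of (1) from determinants of the joint covariance matrix, using the chain rule for the conditional mutual informations.

```python
import numpy as np

def gaussian_mi_bits(S, A, B):
    """I(Z_A; Z_B) in bits for a zero-mean Gaussian vector with covariance S."""
    d = lambda idx: np.linalg.det(S[np.ix_(idx, idx)])
    return 0.5 * np.log2(d(A) * d(B) / d(A + B))

def check_conjecture_gaussian(rho, alpha, beta, s1, s2):
    """Evaluate both sides of (1) for U = alpha*X + N(0,s1^2), V = beta*Y + N(0,s2^2)."""
    S = np.array([
        [1.0,        rho,         alpha,              beta * rho],
        [rho,        1.0,         alpha * rho,        beta],
        [alpha,      alpha * rho, alpha**2 + s1**2,   alpha * beta * rho],
        [beta * rho, beta,        alpha * beta * rho, beta**2 + s2**2],
    ])
    X, Y, U, V = [0], [1], [2], [3]
    I_XU = gaussian_mi_bits(S, X, U)
    I_YU = gaussian_mi_bits(S, Y, U)
    # Chain rule: I(X;V|U) = I(X;U,V) - I(X;U), and similarly for I(Y;V|U).
    I_XV_U = gaussian_mi_bits(S, X, U + V) - I_XU
    I_YV_U = gaussian_mi_bits(S, Y, U + V) - I_YU
    lhs = 2.0 ** (-2 * (I_YU + I_XV_U))
    rhs = (1 - rho**2) + rho**2 * 2.0 ** (-2 * (I_XU + I_YV_U))
    return lhs, rhs

rng = np.random.default_rng(0)
for _ in range(1000):
    lhs, rhs = check_conjecture_gaussian(rng.uniform(-0.99, 0.99),
                                         *rng.uniform(0.1, 2.0, size=4))
    assert lhs >= rhs - 1e-9  # (1) holds in this Gaussian family
```

In these runs the two sides agree to numerical precision, which suggests that jointly Gaussian auxiliaries are an equality case of (1).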

Conjecture 1 extends the following well-known consequence of the conditional entropy power inequality to longer Markov chains.

Lemma 1 (Oohama, 1997). Suppose ${X,Y}$ are jointly Gaussian, each with unit variance and correlation ${\rho}$. Then, for any ${U}$ satisfying ${U-X-Y}$, the following inequality holds:

$\displaystyle 2^{-2 I(Y;U)} \geq 1-\rho^2+\rho^2 2^{-2I(X;U)}. \ \ \ \ \ (2)$

Proof: Consider any ${U}$ satisfying ${U-X-Y}$. Let ${X_u, Y_u}$ denote the random variables ${X,Y}$ conditioned on ${U=u}$. By Markovity and the definition of ${X,Y}$, we have that ${Y_u = \rho X_u + Z}$, where ${Z\sim N(0,1-\rho^2)}$ is independent of ${X_u}$. Hence, the conditional entropy power inequality implies that

$\displaystyle 2^{2h(Y|U)} \geq \rho^2 2^{2h(X|U)} + 2 \pi e(1-\rho^2) = 2 \pi e \rho^2 2^{-2I(X;U)} + 2 \pi e(1-\rho^2). \ \ \ \ \ (3)$

From here, the lemma easily follows. $\Box$
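For jointly Gaussian ${U}$ with unit variance and ${E[XU]=\rho_u}$, the standard Gaussian formulas give ${2^{-2I(X;U)} = 1-\rho_u^2}$ and ${2^{-2I(Y;U)} = 1-\rho^2\rho_u^2}$, so (2) in fact holds with equality in that case. A quick numerical confirmation (a sketch using these closed forms):

```python
import numpy as np

# For jointly Gaussian U with unit variance and E[XU] = rho_u, both sides of
# inequality (2) reduce to 1 - rho^2 * rho_u^2: Gaussian U is an equality case.
for rho in np.linspace(-0.9, 0.9, 13):
    for rho_u in np.linspace(-0.9, 0.9, 13):
        lhs = 1 - rho**2 * rho_u**2                   # 2^(-2 I(Y;U))
        rhs = (1 - rho**2) + rho**2 * (1 - rho_u**2)  # RHS of (2)
        assert abs(lhs - rhs) < 1e-12
```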

Lemma 1 can be applied to prove the following special case of Conjecture 1, which subsumes most of the special cases I know how to analyze analytically.

Proposition 1. Suppose ${X,Y}$ are jointly Gaussian, each with unit variance and correlation ${\rho}$. If ${U-X-Y}$ are jointly Gaussian and ${U-X-Y-V}$, then

$\displaystyle 2^{-2I(Y;U)} 2^{-2I(X;V|U)} \geq (1-\rho^2)+ \rho^2 2^{-2I(X;U)} 2^{-2I(Y;V|U)}. \ \ \ \ \ (4)$

Proof: Without loss of generality, we can assume that ${U}$ has zero mean and unit variance. Define ${\rho_u = E[XU]}$. Since ${U-X-Y}$ are jointly Gaussian, we have

$\displaystyle I(X;U) =\frac{1}{2}\log\frac{1}{1-\rho_u^2} \ \ \ \ \ (5)$

$\displaystyle I(Y;U) =\frac{1}{2}\log\frac{1}{1-\rho^2\rho_u^2}. \ \ \ \ \ (6)$

Let ${X_u,Y_u,V_u}$ denote the random variables ${X,Y,V}$ conditioned on ${U=u}$, respectively. Define ${\rho_{XY|u}}$ to be the correlation coefficient between ${X_u}$ and ${Y_u}$. It is readily verified that

$\displaystyle \rho_{XY|u} = \frac{\rho\sqrt{1-\rho_u^2}}{\sqrt{1-\rho^2\rho_u^2}}, \ \ \ \ \ (7)$

which does not depend on the particular value of ${u}$. By plugging (5)-(7) into (4), we see that (4) is equivalent to

$\displaystyle 2^{-2I(X;V|U)} \geq (1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2I(Y;V|U)}. \ \ \ \ \ (8)$

For every ${u}$, the variables ${X_u,Y_u}$ are jointly Gaussian with correlation coefficient ${\rho_{XY|u}}$, and ${X_u-Y_u-V_u}$ form a Markov chain; hence Lemma 1 implies

$\displaystyle 2^{-2I(X_u;V_u)} \geq (1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2I(Y_u;V_u)}. \ \ \ \ \ (9)$

The desired inequality (8) then follows by taking logarithms in (9) and averaging over ${u}$ (note that ${I(X;V|U)}$ and ${I(Y;V|U)}$ are the averages over ${u}$ of ${I(X_u;V_u)}$ and ${I(Y_u;V_u)}$, respectively), and then applying Jensen's inequality to

$\displaystyle \log\left[(1-\rho_{XY|u}^2)+ \rho_{XY|u}^2 2^{-2z}\right] \ \ \ \ \ (10)$

which is convex as a function of ${z}$. $\Box$
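One step in the proof worth double-checking is formula (7). A short numerical sketch recovers the conditional correlation from the Schur complement of the covariance matrix of ${(X,Y,U)}$:

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(200):
    rho, rho_u = rng.uniform(-0.95, 0.95, size=2)
    # Covariance of (X, Y, U) when U is jointly Gaussian with unit variance and
    # E[XU] = rho_u; Markovity U-X-Y forces E[YU] = rho * rho_u.
    S = np.array([[1.0,   rho,          rho_u],
                  [rho,   1.0,          rho * rho_u],
                  [rho_u, rho * rho_u,  1.0]])
    # Conditional covariance of (X, Y) given U: Schur complement of the U entry.
    C = S[:2, :2] - np.outer(S[:2, 2], S[2, :2])
    cond_corr = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])
    target = rho * np.sqrt(1 - rho_u**2) / np.sqrt(1 - rho**2 * rho_u**2)  # formula (7)
    assert abs(cond_corr - target) < 1e-10
```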

3. Equivalent Forms

There are many equivalent forms of Conjecture 1. For example, (1) can be replaced by the symmetric inequality

$\displaystyle 2^{-2(I(X;V)+I(Y;U))} \geq (1-\rho^2)2^{-2I(U;V)}+ \rho^2 2^{-2(I(X;U)+I(Y;V))}. \ \ \ \ \ (11)$
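The symmetric form (11) can also be spot-checked for jointly Gaussian auxiliaries. The sketch below is an illustrative construction of my own, taking ${U = a X + N(0,s_1^2)}$ and ${V = b Y + N(0,s_2^2)}$ and evaluating every mutual information in (11) from covariance determinants:

```python
import numpy as np

def mi_bits(S, A, B):
    """I(Z_A; Z_B) in bits for a zero-mean Gaussian vector with covariance S."""
    d = lambda idx: np.linalg.det(S[np.ix_(idx, idx)])
    return 0.5 * np.log2(d(A) * d(B) / d(A + B))

def check_symmetric_form(rho, a, b, s1, s2):
    """Evaluate both sides of (11) for U = a*X + N(0,s1^2), V = b*Y + N(0,s2^2)."""
    S = np.array([[1.0,     rho,     a,              b * rho],
                  [rho,     1.0,     a * rho,        b],
                  [a,       a * rho, a**2 + s1**2,   a * b * rho],
                  [b * rho, b,       a * b * rho,    b**2 + s2**2]])
    X, Y, U, V = [0], [1], [2], [3]
    lhs = 2.0 ** (-2 * (mi_bits(S, X, V) + mi_bits(S, Y, U)))
    rhs = ((1 - rho**2) * 2.0 ** (-2 * mi_bits(S, U, V))
           + rho**2 * 2.0 ** (-2 * (mi_bits(S, X, U) + mi_bits(S, Y, V))))
    return lhs, rhs

rng = np.random.default_rng(2)
for _ in range(500):
    lhs, rhs = check_symmetric_form(rng.uniform(-0.99, 0.99),
                                    *rng.uniform(0.1, 2.0, size=4))
    assert lhs >= rhs - 1e-9  # (11) holds in this Gaussian family
```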

Alternatively, we can consider dual forms of Conjecture 1. For instance, one such form is stated as follows:

Conjecture 1′. Suppose ${X,Y}$ are jointly Gaussian, each with unit variance and correlation ${\rho}$. For ${\lambda\in [{1}/({1+\rho^2}),1]}$, the infimum of

$\displaystyle I(X,Y;U,V)-\lambda\Big(I(X;UV)+I(Y;UV)\Big), \ \ \ \ \ (12)$

taken over all ${U,V}$ satisfying ${U-X-Y-V}$, is attained when ${U,X,Y,V}$ are jointly Gaussian.

# President Obama bestows 2011 National Medals of Science and Technology

including Claude E. Shannon Award winner Sol Golomb

http://www.whitehouse.gov/photos-and-video/video/2013/02/01/president-obama-honors-countrys-top-innovators-and-scientists-2011

# Nevanlinna Prize

Nominations of people born on or after January 1, 1974, are invited for outstanding contributions in Mathematical Aspects of Information Sciences, including:

1. All mathematical aspects of computer science, including complexity theory, logic of programming languages, analysis of algorithms, cryptography, computer vision, pattern recognition, information processing and modelling of intelligence.
2. Scientific computing and numerical analysis. Computational aspects of optimization and control theory. Computer algebra.

Nomination Procedure: http://www.mathunion.org/general/prizes/nevanlinna/details/

# The Epijournal: a new publication model

http://www.nature.com/news/mathematicians-aim-to-take-publishers-out-of-publishing-1.12243

# Information and Inference (new journal)

The first issue of Information and Inference has just appeared:

http://imaiai.oxfordjournals.org/content/current

It includes the following editorial:

In recent years, a great deal of energy and talent have been devoted to new research problems arising from our era of abundant and varied data/information. These efforts have combined advanced methods drawn from across the spectrum of established academic disciplines: discrete and applied mathematics, computer science, theoretical statistics, physics, engineering, biology and even finance. This new journal is designed to serve as a meeting place for ideas connecting the theory and application of information and inference from across these disciplines.

While the frontiers of research involving information and inference are dynamic, we are currently planning to publish in information theory, statistical inference, network analysis, numerical analysis, learning theory, applied and computational harmonic analysis, probability, combinatorics, signal and image processing, and high-dimensional geometry; we also encourage papers not fitting the above description, but which expose novel problems, innovative data types, surprising connections between disciplines and alternative approaches to inference. This first issue exemplifies the topical diversity of the subject matter, linked by the use of sophisticated mathematical modelling, techniques of analysis, and focus on timely applications.

To enhance the impact of each manuscript, authors are encouraged to provide software to illustrate their algorithm and where possible replicate the experiments presented in their manuscripts. Manuscripts with accompanying software are marked as "reproducible" and have the software linked on the journal website under supplementary material. It is with pleasure that we welcome the scientific community to this new publication venue.

Robert Calderbank, David L. Donoho, John Shawe-Taylor and Jared Tanner

# Comparing Variability of Random Variables

Consider exchangeable random variables ${X_1, \ldots, X_n, \ldots}$. A couple of facts seem quite intuitive:

Statement 1. The "variability" of the sample mean ${S_m = \frac{1}{m} \sum_{i=1}^{m} X_i}$ decreases with ${m}$.

Statement 2. Let the average of functions ${f_1, f_2, \ldots, f_n}$ be defined as ${\overline{f} (x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)}$. Then ${\max_{1\leq i \leq n} \overline{f}(X_i)}$ is less "variable" than ${\max_{1\leq i \leq n} f_i (X_i)}$.

To make these statements precise, one faces the fundamental question of comparing two random variables ${W}$ and ${Z}$ (or, more precisely, comparing their two distributions). One common way to order random variables is the notion of stochastic dominance:

$\displaystyle W \leq_{st} Z \Leftrightarrow F_W(t) \geq F_Z(t) \ \ \ \mbox{ for all real } t.$

However, stochastic dominance is only suitable when one is concerned with the actual size of the random quantities of interest; in our scenario, a more natural order is one that compares the variability of two random variables (or, more precisely again, of the two distributions). It turns out that a very useful notion, used in a variety of fields, is due to Ross (1983): random variable ${W}$ is said to be stochastically less variable than random variable ${Z}$ (denoted by ${\leq_v}$) when every risk-averse decision maker will choose ${W}$ over ${Z}$ (given they have similar means). More precisely, for random variables ${W}$ and ${Z}$ with finite means,

$\displaystyle W \leq_{v} Z \Leftrightarrow \mathbb{E}[f(W)] \leq \mathbb{E}[f(Z)] \ \ \mbox{ for every increasing convex function } f \in \mathcal{F},$

where ${\mathcal{F}}$ is the set of functions for which the above expectations exist.

One interesting, but perhaps not entirely obvious, fact is that the ordering ${W\leq_v Z}$ is equivalent to saying that there is a sequence of mean-preserving spreads that, in the limit, transforms the distribution of ${W}$ into the distribution of another random variable ${W'}$ with finite mean such that ${W'\leq_{st} Z}$! Also, using results by Hardy, Littlewood and Polya (1929), the stochastic variability order introduced above can be shown to be equivalent to the Lorenz (1905) ordering used in economics to measure income inequality.
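A minimal illustration of both the order and the mean-preserving-spread picture, using a toy example of my own: if ${Z}$ spreads the mass of a constant ${W}$ while preserving its mean, then ${W \leq_v Z}$. This can be spot-checked against the test functions ${f_t(x)=\max(x-t,0)}$, which (as far as I am aware, a standard fact) generate the increasing convex order:

```python
import numpy as np

# W is the constant 3; Z is a mean-preserving spread of W, taking values
# 1 and 5 with probability 1/2 each (same mean, more spread).
W = np.array([3.0, 3.0])   # two equally likely outcomes
Z = np.array([1.0, 5.0])
# Spot-check E[f_t(W)] <= E[f_t(Z)] for the increasing convex test functions
# f_t(x) = max(x - t, 0) over a grid of thresholds t.
for t in np.linspace(-2.0, 6.0, 81):
    assert np.maximum(W - t, 0).mean() <= np.maximum(Z - t, 0).mean() + 1e-12
```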

Now, with this, we are ready to formalize our previous statements. The first statement is actually due to Arnold and Villasenor (1986):

$\displaystyle \frac{1}{m} \sum_{i=1}^{m} X_i \leq_v \frac{1}{m-1} \sum_{i=1}^{m-1} X_i \ \ \ \ \ \ \ \ \ \ \ \ \mbox{for all }\ \ m \in \mathbb{N}.$
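This monotonicity can be checked by exact enumeration. The sketch below uses iid fair dice and the (illustrative, arbitrarily chosen) increasing convex test function ${f(x)=\max(x-3.5,0)}$; the theorem guarantees the decrease for every such ${f}$:

```python
from itertools import product

# Illustrative increasing convex test function (a "call option" payoff).
f = lambda x: max(x - 3.5, 0.0)

die = range(1, 7)
values = []
for m in range(1, 5):
    # E[f(S_m)] for the mean S_m of m iid fair dice, by exact enumeration.
    Ef = sum(f(sum(xs) / m) for xs in product(die, repeat=m)) / 6 ** m
    values.append(Ef)
# Variability (as seen through f) decreases with the sample size m.
assert all(values[k + 1] <= values[k] for k in range(len(values) - 1))
```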

Note that when you apply this fact to a sequence of iid random variables with finite mean ${\mu}$, it strengthens the strong law of large numbers in that it ensures that the almost sure convergence of the sample mean to the mean value ${\mu}$ occurs with monotonically decreasing variability (as the sample size grows).

The second statement comes up in proving certain optimality results for sharing parallel servers in fork-join queueing systems (J. 2008) and has a similar flavor:

$\displaystyle \max_{1\leq i \leq n} \overline{f}(X_i) \leq_v \max_{1\leq i \leq n} f_i (X_i).$
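This ordering can also be spot-checked by enumeration. The sketch below uses two iid dice and the arbitrarily chosen functions ${f_1(x)=x}$, ${f_2(x)=x^2}$ (my illustrative example), and tests the increasing convex order through the stop-loss transforms ${t \mapsto \mathbb{E}[(\cdot-t)_+]}$:

```python
from itertools import product
import numpy as np

# Illustrative choice of functions and their average.
f1 = lambda x: float(x)
f2 = lambda x: float(x * x)
fbar = lambda x: 0.5 * (f1(x) + f2(x))

die = range(1, 7)
outcomes = list(product(die, die))                    # exchangeable (iid) X_1, X_2
W = np.array([max(fbar(a), fbar(b)) for a, b in outcomes])  # max_i fbar(X_i)
Z = np.array([max(f1(a), f2(b)) for a, b in outcomes])      # max_i f_i(X_i)

# W <=_v Z iff E[(W-t)_+] <= E[(Z-t)_+] for all t (stop-loss characterization);
# spot-check on a grid of thresholds covering both supports.
for t in np.linspace(0.0, 40.0, 401):
    assert np.maximum(W - t, 0).mean() <= np.maximum(Z - t, 0).mean() + 1e-12
```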

The cleanest way to prove both statements, to the best of my knowledge, is based on the following theorem, first proved by Blackwell in 1953 (later strengthened to random elements in separable Banach spaces by Strassen in 1965, hence referred to by some as Strassen's theorem):

Theorem 1. Let ${W}$ and ${Z}$ be two random variables with finite means. A necessary and sufficient condition for ${W \leq_v Z}$ is that there exist two random variables ${\hat{W}}$ and ${\hat{Z}}$ with the same marginals as ${W}$ and ${Z}$, respectively, such that ${\mathbb{E}[\hat{Z} |\hat{W}] \geq \hat{W}}$ almost surely.

For instance, to prove the first statement, we consider ${\hat{W} = W = \frac{1}{n} \sum_{i=1}^n X_i}$ and ${Z = \frac{1}{n-1} \sum_{i=1}^{n-1} X_i}$. All that remains is to note that ${\hat{Z} := \frac{1}{n-1} \sum_{i\in I, i \neq J} X_i}$, where ${J}$ is an independent uniform random variable on the set ${I := \{1,2, \ldots, n\}}$, has the same distribution as ${Z}$ by exchangeability. Furthermore,

$\displaystyle \mathbb{E} [ \hat{Z} | W ] = \mathbb{E} \Big[ \frac{1}{n} \sum_{j=1}^{n} \Big(\frac{1}{n-1} \sum_{i\in I, i \neq j} X_i \Big) \Big| W \Big] = \mathbb{E} \Big[ \frac{1}{n} \sum_{j=1}^{n} X_j \Big| W \Big] = W.$

Similarly, to prove the second statement, one can construct ${\hat{Z}}$ by selecting a random permutation of the functions ${f_1, \ldots, f_n}$.
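The conditional-expectation identity used for the first statement can be verified exactly by enumeration; the sketch below uses three iid dice (an arbitrary illustrative choice) and checks that averaging the leave-one-out means over the random index ${J}$ recovers the sample mean ${W}$ pointwise:

```python
from itertools import product

die = range(1, 7)
n = 3
for xs in product(die, repeat=n):
    W = sum(xs) / n
    # E[Zhat | X_1,...,X_n]: average over the uniform index J of the
    # leave-one-out means (1/(n-1)) * sum_{i != J} X_i.
    E_Zhat = sum(sum(xs[i] for i in range(n) if i != j) / (n - 1)
                 for j in range(n)) / n
    assert abs(E_Zhat - W) < 1e-12  # the identity holds exactly
```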