James Massey, pioneering coding and information theorist, passed away today June 16, 2013 after a battle with cancer.
An extended interview can be found in
Please add your name to the petition list:
I have an extremal conjecture that I have been working on intermittently with some colleagues (including Jiantao Jiao, Tsachy Weissman, Chandra Nair, and Kartik Venkat). Despite our efforts, we have not been able to prove it. Hence, I thought I would experiment with online collaboration by offering it to the broader IT community.
In order to make things interesting, we are offering a $1000 prize for the first correct proof or counterexample! Feel free to post your thoughts in the public comments. You can also email me if you have questions or want to bounce some ideas off me.
Although I have no way of enforcing them, please abide by the following ground rules:
One final disclaimer: this post is meant to be a brief introduction to the conjecture, with a few partial results to get the conversation started; it is not an exhaustive account of the approaches we have tried.
1. The Conjecture
2. Partial Results
There are several partial results which suggest the validity of Conjecture 1. Moreover, numerical experiments have not produced a counterexample.
Conjecture 1 extends the following well-known consequence of the conditional entropy power inequality to include long Markov chains.
Proof: Consider any satisfying . Let denote the random variables conditioned on . By Markovity and definition of , we have that , where is independent of . Hence, the conditional entropy power inequality implies that
From here, the lemma easily follows.
For every , the variables are jointly Gaussian with correlation coefficient and form a Markov chain, hence Lemma 1 implies
The desired inequality (8) follows by convexity of
as a function of .
3. Equivalent Forms
Alternatively, we can consider dual forms of Conjecture 1. For instance, one such form is stated as follows:
Conjecture 1′. Suppose are jointly Gaussian, each with unit variance and correlation . For , the infimum of
taken over all satisfying is attained when are jointly Gaussian.
including Claude E. Shannon Award winner Sol Golomb
Nominations of people born on or after January 1, 1974
for outstanding contributions in Mathematical Aspects of Information Sciences including:
Nomination Procedure: http://www.mathunion.org/general/prizes/nevanlinna/details/
The first issue of Information and Inference has just appeared:
It includes the following editorial:
In recent years, a great deal of energy and talent have been devoted to new research problems arising from our era of abundant and varied data/information. These efforts have combined advanced methods drawn from across the spectrum of established academic disciplines: discrete and applied mathematics, computer science, theoretical statistics, physics, engineering, biology and even finance. This new journal is designed to serve as a meeting place for ideas connecting the theory and application of information and inference from across these disciplines.
While the frontiers of research involving information and inference are dynamic, we are currently planning to publish in information theory, statistical inference, network analysis, numerical analysis, learning theory, applied and computational harmonic analysis, probability, combinatorics, signal and image processing, and high-dimensional geometry; we also encourage papers not fitting the above description, but which expose novel problems, innovative data types, surprising connections between disciplines and alternative approaches to inference. This first issue exemplifies this topical diversity of the subject matter, linked by the use of sophisticated mathematical modelling, techniques of analysis, and focus on timely applications.
To enhance the impact of each manuscript, authors are encouraged to provide software to illus- trate their algorithm and where possible replicate the experiments presented in their manuscripts. Manuscripts with accompanying software are marked as “reproducible” and have the software linked on the journal website under supplementary material. It is with pleasure that we welcome the scien- tific community to this new publication venue.
Robert Calderbank David L. Donoho John Shawe-Taylor Jared Tanner
Consider exchangeable random variables . A couple of facts seem quite intuitive:
Statement 1. The “variability” of sample mean decreases with .
Statement 2. Let the average of functions be defined as . Then is less “variable” than .
To make these statements precise, one faces the fundamental question of comparing two random variables and (or more precisely comparing two distributions). One common way we think of ordering random variables is the notion of stochastic dominance:
However, this notion really is only a suitable notion when one is concerned with the actual size of the random quantities of interest, while, in our scenario of interest, a more natural order would be that which compares the variability between two random variables (or more precisely, again, the two distributions). It turns out that a very useful notion, used in a variety of fields, is due to Ross (1983): Random variable is said to be stochastically less variable than random variable (denoted by ) when every risk-averse decision maker will choose over (given they have similar means). More precisely, for random variables and with finite means
where is the set of functions for which the above expectations exist.
One interesting, but perhaps not entirely obvious, fact is that this notion of ordering is equivalent to saying that there is a sequence of mean preserving spreads that in the limit transforms the distribution of into the distribution of another random variable with finite mean such that ! Also, using results by Hardy, Littlewood and Polya (1929), the stochastic variability order introduced above can be shown to be equivalent to Lorenz (1905) ordering used in economics to measure income equality.
Now with this, we are ready to formalize our previous statements. The first statement is actually due to Arnold and Villasenor (1986):
Note that when you apply this fact to a sequence of iid random variables with finite mean , it strengthens the strong law of large number in that it ensures that the almost sure convergence of the sample mean to the mean value occurs with monotonically decreasing variability (as the sample size grows).
The second statement comes up in proving certain optimality result in sharing parallel servers in fork-join queueing systems (J. 2008) and has a similar flavor:
The cleanest way to prove both statements, to the best of my knowledge, is based on the following theorem first proved by Blackwell in 1953 (later strengthened to random elements in separable Banach spaces by Strassen in 1965, hence referred to by some as Strassen’s theorem):
Theorem 1 Let and be two random variables with finite means. A necessary and sufficient condition for is that there are two random variables and with the same marginals as and , respectively, such that almost surely.
For instance, to prove the first statement we consider and . All that is necessary now is to note that , is an independent uniform rv on the set , has the same distribution as random variable . Furthermore,
Similarly to prove the second statement, one can construct by selecting a random permutation of functions .