Lecture 8. Entropic cone and matroids

This lecture introduces the notion of the entropic cone and its connection with entropy inequalities.

Entropic cone

Recall that if X is a discrete random variable with distribution P_X, the entropy of X is defined as

    \[H(X) = H(P_X) = -\sum_{x\in\mathrm{supp}(P_X)}P_X(x)\log_2 P_X(x).\]

Now let (X_1,\ldots,X_m)\sim \mu be (not necessarily independent) discrete random variables. It will be convenient to define the entropy function

    \[H(S) := H(X_S) = H(\mu|_S) \quad\mbox{where}\quad S\subseteq[m]:=\{1,\ldots,m\},\quad X_S:=\{X_i:i\in S\}.\]

As the entropy depends only on the probabilities of the outcomes of a random variable and not on its values, we will assume without loss of generality in the sequel that X_1,\ldots,X_m take values in \mathbb{Z}_+.

For any probability \mu on \mathbb{Z}_+^m, let \vec{H_\mu} be the (2^m-1)-dimensional vector

    \[\vec{H_\mu} := (H(S):S\subseteq[m],~S\ne\varnothing).\]
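
For concreteness, here is a minimal Python sketch (ours, not from the lecture; all function names are hypothetical) that computes \vec{H_\mu} from a joint pmf given as a dictionary mapping outcomes (x_1,\ldots,x_m) to probabilities:

    import itertools
    from math import log2

    def entropy(pmf):
        # H(P) = -sum over the support of P(x) log2 P(x)
        return -sum(p * log2(p) for p in pmf.values() if p > 0)

    def marginal(pmf, S):
        # Distribution of X_S; S is a tuple of 0-based coordinates.
        out = {}
        for x, p in pmf.items():
            key = tuple(x[i] for i in S)
            out[key] = out.get(key, 0.0) + p
        return out

    def entropic_vector(pmf, m):
        # {S: H(X_S)} for all nonempty S, i.e. the vector H_mu.
        return {S: entropy(marginal(pmf, S))
                for k in range(1, m + 1)
                for S in itertools.combinations(range(m), k)}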

We can now define the entropic cone.

Definition. The entropic cone is the set

    \[\Gamma_m^* := \{\vec{H_\mu}:\mu\mbox{ a probability on }\mathbb{Z}_+^m\}\subset\mathbb{R}^{2^m-1},\]

i.e., the set of all vectors that can be obtained as the entropy function of m variables.

Question. Can we characterize \Gamma_m^*?

Examples.

  1. If m=1, then \Gamma_1^*=\mathbb{R}_+, as there exist random variables of arbitrary (nonnegative) entropy.
  2. If m=2, the vector \vec{H_\mu} for (X_1,X_2)\sim\mu is

        \[\vec{H_\mu} = \left( \begin{array}{c} H(X_1) \\ H(X_2) \\ H(X_1,X_2) \end{array} \right).\]

    What constraints must this vector satisfy? We must certainly have

        \[H(X_1) \vee H(X_2) \leq H(X_1,X_2) \leq H(X_1)+H(X_2),\]

    where the first inequality follows from the chain rule together with the nonnegativity of conditional entropy, and the second from the fact that conditioning reduces entropy. Are these the only constraints? In fact, many vectors satisfying these constraints can be realized. For example:

    • The vector (\alpha,\alpha,\alpha)^T is obtained by taking Z such that H(Z)=\alpha, and letting X_1=X_2=Z.
    • The vector (\alpha,\alpha,2\alpha)^T is obtained by taking X_1,X_2 to be i.i.d. copies of Z.
    • Convex combinations of these vectors can be obtained, up to a correction of at most one bit that becomes negligible after scaling, by taking mixtures of their distributions.

    In fact, a careful analysis shows that the above constraints completely characterize the entropic cone for m=2: that is,

        \[\Gamma_2^* = \{(x,y,z)\in\mathbb{R}_+^3: x\vee y\le z\le x+y\}.\]

  3. If m=3, what happens? Here we are still subject to the same constraints as in the case m=2, but we pick up some new constraints as well, such as

        \[H(X_1,X_2) + H(X_2,X_3) \ge H(X_1,X_2,X_3) + H(X_2)\]

    which is equivalent to the inequality H(X_1|X_2) \ge H(X_1|X_2,X_3). (A quick numerical check of these examples is sketched below.)
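
As a sanity check, the examples above can be verified numerically with the hypothetical entropic_vector sketch from earlier (continuing that code):

    # X1 = X2 = Z, Z uniform on {0,1}: expect H-vector (1, 1, 1).
    copy = {(0, 0): 0.5, (1, 1): 0.5}
    # X1, X2 i.i.d. uniform on {0,1}: expect H-vector (1, 1, 2).
    iid = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
    print(entropic_vector(copy, 2), entropic_vector(iid, 2))

    # m = 3: H(X1,X2) + H(X2,X3) >= H(X1,X2,X3) + H(X2) for a random pmf.
    import random
    outcomes = list(itertools.product((0, 1), repeat=3))
    w = [random.random() for _ in outcomes]
    mu = {x: wi / sum(w) for x, wi in zip(outcomes, w)}
    H = entropic_vector(mu, 3)
    assert H[(0, 1)] + H[(1, 2)] >= H[(0, 1, 2)] + H[(1,)] - 1e-12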

Evidently, the constraints on the entropic cone correspond to entropy inequalities. What type of inequalities must hold for any given m? Let us think about this question a bit more systematically.

Definition. A set function f:2^{[m]}\to\mathbb{R} is called a polymatroid if

  1. f(\varnothing)=0, f(S)\ge 0 for all S.
  2. f(S)\le f(T) when S\subseteq T (monotonicity).
  3. f(S\cup T)+f(S\cap T)\le f(S)+f(T) (submodularity).

It is not difficult to check that

Lemma. The entropy function H is a polymatroid.
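
Indeed, nonnegativity is clear, monotonicity follows from the chain rule as H(T)-H(S)=H(X_{T\setminus S}\mid X_S)\ge 0, and submodularity is precisely the nonnegativity of conditional mutual information: writing S'=S\setminus T, T'=T\setminus S, R=S\cap T,

    \[H(S)+H(T)-H(S\cup T)-H(S\cap T) = H(X_{S'}\mid X_R)-H(X_{S'}\mid X_{T'},X_R) = I(X_{S'};X_{T'}\mid X_R)\ge 0.\]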

Other examples of polymatroids: (1) entropy, mutual information; (2) max-weight f(S)=\max_{x\in S}w(x) for given weights w; (3) flows; (4) cuts (not monotone, but submodular); (5) von Neumann entropy.

Let us define

    \[\Gamma_m = \{ \mbox{polymatroid vectors on }[m]\}\subset\mathbb{R}^{2^m-1}.\]

Evidently \Gamma_m is a polyhedral cone (the intersection of a finite number of halfspaces).

Theorem.

  1. \Gamma_3^*\ne \Gamma_3.
  2. \mathrm{cl}\,\Gamma_3^*=\Gamma_3.
  3. \mathrm{cl}\,\Gamma_m^*\ne\Gamma_m for m\ge 4.
  4. \mathrm{cl}\,\Gamma_m^* is a convex cone for all m\ge 1 (but not polyhedral for m\ge 4).

For the proof, we refer to the book of R. Yeung.

As an example of \Gamma_3^*\ne \Gamma_3, we note that the vector (\alpha,\alpha,\alpha,2\alpha,2\alpha,2\alpha,2\alpha)^T lies in \Gamma_3 for every \alpha\ge 0, but is achievable only when \alpha=\log_2M for some M\in\mathbb{Z}_+ (see the sketch below). Such issues arise only on the boundary, however: convexity of \Gamma_3^* fails there, while every point in the interior of \Gamma_3 is entropic, so no new entropy inequalities appear beyond the polymatroid inequalities when m=3.
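
For instance, with M=2 the point (1,1,1,2,2,2,2)^T is realized by taking X_1,X_2 independent uniform bits and X_3=X_1\oplus X_2. A quick check with the hypothetical entropic_vector sketch from above:

    # X1, X2 independent uniform bits, X3 = X1 XOR X2: each singleton
    # has entropy 1 and each pair determines the triple, so the
    # entropic vector is (1, 1, 1, 2, 2, 2, 2).
    xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
    print(entropic_vector(xor, 3))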

On the other hand, when m\ge 4, genuinely new inequalities appear that are not consequences of the polymatroid inequalities, so that \mathrm{cl}\,\Gamma_m^* is strictly smaller than \Gamma_m. One such inequality (the Zhang-Yeung inequality, expressed in terms of mutual information) is

    \[2I(X_3;X_4)\leq I(X_1;X_2) + I(X_1;X_3,X_4) + 3I(X_3;X_4 \mid X_1) + I(X_3;X_4 \mid X_2).\]

Using the entropic cone

The entropic cone can be used to obtain new information theoretic inequalities. For example, the following question arose in the study of mechanism design in computer science.

Problem. Let X_1,\ldots,X_m be discrete random variables. Define the m\times m matrix A by A_{ij}=I(X_i;X_j)=H(X_i)+H(X_j)-H(X_i,X_j), so that A_{ii}=H(X_i). Is A positive semidefinite?

When m=2, we have \det(A)=H(X_1)H(X_2)-I(X_1;X_2)^2 \geq 0, since I(X_1;X_2)\le H(X_1) and I(X_1;X_2)\le H(X_2). Thus all the principal minors of A have nonnegative determinant, so A is positive semidefinite.

How about m=3? Note that A depends linearly on the entries of the vector \vec{H_\mu} (where \mu is the joint distribution of X_1,X_2,X_3). Thus if A is positive semidefinite for entropic vectors on the extreme rays (ER) of the entropic cone, then A must be positive semidefinite for any distribution. More generally:

Proposition. If F : \mathbb{R}_+^7 \rightarrow \mathbb{R} is concave, then F(\vec{H}) \geq 0 holds for all \vec H \in \Gamma_3^* if and only if F(\vec H_i) \geq 0 for all \vec H_i \in ER(\Gamma_3).

Proof. It suffices to note that

    \[F(\lambda H_i + (1-\lambda) H_j) \geq \lambda F(H_i) + (1-\lambda)F(H_j) \geq 0\]

for every 0\le\lambda\le 1, and to use that \Gamma_3=\mathrm{cl}\,\Gamma_3^*. \square

This necessary and sufficient condition for m=3 generalizes to a sufficient condition for m\ge 4.

Proposition. If F : \mathbb{R}_+^{2^m-1} \rightarrow \mathbb{R} is concave, then F(\vec{H}) \geq 0 holds for all \vec H \in \Gamma_m^* if F(\vec H_i) \geq 0 for all \vec H_i \in ER(\Gamma_m).

As \Gamma_m is polyhedral, this simplifies problems such as checking positive semidefiniteness of A considerably: it suffices to check a finite number of cases, which can essentially be done by hand. It can be shown this way that A is always positive semidefinite for m=3, but this can fail for m\ge 4.
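
As an illustration (ours, not from the lecture; it samples random distributions rather than enumerating the extreme rays), one can build A from the entropic vector and check its smallest eigenvalue numerically, continuing the earlier Python sketch:

    import random
    import numpy as np

    def mi_matrix(H, m):
        # A_ij = I(X_i;X_j) = H(X_i) + H(X_j) - H(X_i,X_j); note that
        # A_ii = H(X_i) and that A is linear in the entries of H.
        A = np.zeros((m, m))
        for i in range(m):
            for j in range(m):
                A[i, j] = (H[(i,)] if i == j else
                           H[(i,)] + H[(j,)] - H[tuple(sorted((i, j)))])
        return A

    outcomes = list(itertools.product((0, 1), repeat=3))
    for _ in range(1000):
        w = [random.random() for _ in outcomes]
        mu = {x: wi / sum(w) for x, wi in zip(outcomes, w)}
        A = mi_matrix(entropic_vector(mu, 3), 3)
        assert np.linalg.eigvalsh(A).min() >= -1e-9

The relevant concave function here is F(\vec H)=\lambda_{\min}(A(\vec H)): the smallest eigenvalue is a concave function of a symmetric matrix, and A is linear in \vec H.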

Matroids

It is sometimes of interest to investigate discrete variants of the above questions.

Definition. A matroid f : 2^{[m]} \rightarrow \mathbb{Z}_+ is defined by the conditions

  1. f(S)\le |S| for all S;
  2. f is monotone;
  3. f is submodular.

Examples.

  1. Vector matroids (also called \mathbb{F}-representable matroids, where \mathbb{F} is a field). Given a matrix A with entries in \mathbb{F}, define f(S) = \mathrm{rank}(A(S)), where A(S) denotes the submatrix formed by the columns indexed by S.
  2. Graphical matroids. Let G=(V,E) be a graph, and set m=|E|. For S\subseteq E, define f(S) = \max\{|H| : H \subseteq S,~H\mbox{ acyclic}\}, i.e., the largest number of edges of a forest contained in S.
  3. Entropic matroids. Let f(S)=H(X_S). For what distributions of X_1,\ldots,X_m is this a matroid?

Denote by \tilde\Gamma^*_{m} the entropic cone induced by those distributions where X_1,\ldots,X_m take values in \{0,1\}. Since each X_i is binary, the constraint H(X_S)\le|S| holds automatically, so in order for such a distribution \mu to define an entropic matroid it remains only to require that the vector \vec{H_\mu} take values in \mathbb{Z}^{2^m-1}. Thus we are led to consider the set

    \[\Pi_{m} := \tilde\Gamma^*_{m} \cap \mathbb{Z}^{2^m-1}.\]

Can one characterize what type of matroids can arise as entropic matroids?

Theorem. \Pi_m coincides with the set of (rank functions of) GF(2)-representable matroids on the ground set [m].

For the proof and further results around this theme, see the paper of E. Abbe.
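
To see the correspondence in one example (a sketch under our own encoding conventions; gf2_rank is hypothetical), take G with columns (1,0), (0,1), (1,1) over GF(2), so that X=UG for uniform U\in GF(2)^2 is exactly the XOR example from before; then H(X_S) equals the GF(2)-rank of the columns of G indexed by S:

    def gf2_rank(vecs, nbits):
        # Gaussian elimination over GF(2); vectors are bitmask integers.
        pivot = {}                    # leading bit -> basis vector
        for v in vecs:
            for i in reversed(range(nbits)):
                if not (v >> i) & 1:
                    continue
                if i in pivot:
                    v ^= pivot[i]     # eliminate the leading bit
                else:
                    pivot[i] = v      # new basis vector
                    break
        return len(pivot)

    cols = [0b01, 0b10, 0b11]         # columns of G as bitmasks
    for k in range(1, 4):
        for S in itertools.combinations(range(3), k):
            print(S, gf2_rank([cols[i] for i in S], 2))
    # Singletons have rank 1, all larger sets rank 2, matching the
    # entropic vector (1, 1, 1, 2, 2, 2, 2) computed earlier.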

Lecture by Emmanuel Abbe | Scribed by Danny Gitelman

06. December 2013 by Ramon van Handel
Categories: Information theoretic methods
