1 The Khinchin (Shannon?) axioms for entropy

Note. In this course, “random variable” will mean “discrete random variable” (unless otherwise specified).

All logarithms will be base 2 (unless otherwise specified).

Definition (Entropy). The entropy of a discrete random variable X is a quantity H[X] that takes real values and has the following properties:

  • (i) Normalisation: If X is uniform on {0,1} then H[X]=1.
  • (ii) Invariance: If X takes values in A, Y takes values in B, f is a bijection from A to B, and for every a ∈ A we have ℙ[X=a]=ℙ[Y=f(a)], then H[Y]=H[X].
  • (iii) Extendability: If X takes values in a set A, B is disjoint from A, and Y takes values in A ∪ B with ℙ[Y=a]=ℙ[X=a] for all a ∈ A, then H[Y]=H[X].
  • (iv) Maximality: If X takes values in a finite set A and Y is uniformly distributed on A, then H[X] ≤ H[Y].
  • (v) Continuity: H depends continuously on X with respect to total variation distance (the distance between X and Y is defined to be sup_E |ℙ[X ∈ E] − ℙ[Y ∈ E]|).

For the last axiom we need a definition:

Let X and Y be random variables. The conditional entropy H[X|Y] of X given Y is

∑_y ℙ[Y=y] H[X|Y=y].

  • (vi) Additivity: H[X,Y]=H[Y]+H[X|Y].
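
The additivity axiom is easy to test numerically. Here is a minimal Python sketch, assuming the explicit formula H = ∑ p log(1/p) (which Theorem 1.8 below derives from the axioms) and an arbitrarily chosen joint distribution:

```python
from math import log2

def H(dist):
    """Shannon entropy (base 2) of a {outcome: probability} dict."""
    return sum(q * log2(1 / q) for q in dist.values() if q > 0)

# Joint distribution of (X, Y); the numbers are arbitrary test values.
p = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distribution of Y.
pY = {}
for (x, y), q in p.items():
    pY[y] = pY.get(y, 0) + q

# H[X|Y] = sum over y of P[Y=y] * H[X | Y=y].
H_X_given_Y = sum(
    pY[y] * H({x: q / pY[y] for (x, yy), q in p.items() if yy == y})
    for y in pY
)

# Additivity: H[X,Y] = H[Y] + H[X|Y].
assert abs(H(p) - (H(pY) + H_X_given_Y)) < 1e-12
```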

Lemma 1.1. Assuming that:

  • X and Y are independent random variables

Then
H[X,Y]=H[X]+H[Y].

Proof. H[X|Y] = ∑_y ℙ[Y=y] H[X|Y=y].

Since X and Y are independent, the distribution of X is unaffected by conditioning on Y, so by invariance

H[X|Y=y] = H[X]

for all y. Hence H[X|Y] = H[X], and the result follows by additivity.

Corollary 1.2. If X_1,…,X_n are independent, then

H[X_1,…,X_n] = H[X_1] + ⋯ + H[X_n].

Proof. Lemma 1.1 and obvious induction.

Lemma 1.3 (Chain rule). Assuming that:

  • X_1,…,X_n are random variables

Then
H[X_1,…,X_n] = H[X_1] + H[X_2|X_1] + H[X_3|X_1,X_2] + ⋯ + H[X_n|X_1,…,X_{n−1}].

Proof. The case n=2 is additivity. In general,

H[X_1,…,X_n] = H[X_1,…,X_{n−1}] + H[X_n|X_1,…,X_{n−1}]

so we are done by induction.
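
The chain rule is likewise easy to check numerically; here is a sketch for n = 3, with a randomly generated joint distribution (test data only):

```python
import itertools
import random
from math import log2

def H(dist):
    return sum(q * log2(1 / q) for q in dist.values() if q > 0)

def marginal(dist, coords):
    """Marginal distribution of the chosen coordinates of a joint distribution."""
    out = {}
    for xs, q in dist.items():
        key = tuple(xs[i] for i in coords)
        out[key] = out.get(key, 0) + q
    return out

def cond_H(dist, target, given):
    """H[target | given], computed via additivity as H[target, given] - H[given]."""
    return H(marginal(dist, given + target)) - H(marginal(dist, given))

# Random joint distribution of (X1, X2, X3) on {0,1}^3.
random.seed(0)
w = [random.random() for _ in range(8)]
s = sum(w)
p = {xyz: v / s for xyz, v in zip(itertools.product((0, 1), repeat=3), w)}

# Chain rule: H[X1,X2,X3] = H[X1] + H[X2|X1] + H[X3|X1,X2].
lhs = H(p)
rhs = H(marginal(p, (0,))) + cond_H(p, (1,), (0,)) + cond_H(p, (2,), (0, 1))
assert abs(lhs - rhs) < 1e-12
```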

Lemma 1.4. Assuming that:

  • Y=f(X)

Then
H[X,Y]=H[X].

Also,

H[Z|X,Y]=H[Z|X].

Proof. The map g : x ↦ (x, f(x)) is a bijection, and (X,Y)=g(X). So the first statement follows by invariance. For the second statement:

H[Z|X,Y] = H[Z,X,Y] − H[X,Y]   (by additivity)
         = H[Z,X] − H[X]       (by the first part)
         = H[Z|X]              (by additivity)

Lemma 1.5. Assuming that:

  • X takes only one value

Then H[X]=0.

Proof. Since X takes only one value, X and X are independent. Therefore, by Lemma 1.1, H[X,X]=2H[X]. But by invariance, H[X,X]=H[X]. So H[X]=0.

Proposition 1.6. Assuming that:

  • X is uniformly distributed on a set of size 2^n

Then H[X]=n.

Proof. Let X_1,…,X_n be independent random variables uniformly distributed on {0,1}. By Corollary 1.2 and normalisation, H[X_1,…,X_n]=n. But (X_1,…,X_n) is uniformly distributed on {0,1}^n, so by invariance, the result follows.

Proposition 1.7. Assuming that:

  • X is uniformly distributed on a set A of size n

Then H[X] = log n.

Reminder: log here is to the base 2 (which is the convention for this course).

Proof. Let r be a positive integer and let X_1,…,X_r be independent copies of X.

Then (X_1,…,X_r) is uniform on A^r and

H[X_1,…,X_r] = rH[X].

Now pick k such that 2^k ≤ n^r ≤ 2^{k+1}. Then by invariance, maximality, and Proposition 1.6, we have that

k ≤ rH[X] ≤ k+1.

So

k/r ≤ log n ≤ (k+1)/r  and  k/r ≤ H[X] ≤ (k+1)/r,

so |H[X] − log n| ≤ 1/r for every positive integer r. Therefore H[X] = log n, as claimed.
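
A numerical illustration of the sandwich argument, for the concrete (arbitrarily chosen) case n = 3: both log 3 and H[X] are trapped in [k/r, (k+1)/r], so they agree to within 1/r.

```python
from math import floor, log2

n = 3
for r in (1, 10, 100, 1000):
    k = floor(r * log2(n))  # then 2^k <= n^r < 2^(k+1)
    print(f"r={r}: {k/r:.6f} <= log2(3) = {log2(n):.6f} <= {(k+1)/r:.6f}")
```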

Notation. We will write p_a = ℙ[X=a].

We will also use the notation [n] = {1,2,…,n}.

Theorem 1.8 (Khinchin). Assuming that:

  • H satisfies the Khinchin axioms

  • X takes values in a finite set A

Then
H[X] = ∑_{a∈A} p_a log(1/p_a).

Proof. First we do the case where all p_a are rational (we can then finish easily using the continuity axiom).

Pick n such that, for every a, there is some m_a ∈ ℕ ∪ {0} such that p_a = m_a/n.

Let Z be uniform on [n]. Let (E_a : a ∈ A) be a partition of [n] into sets with |E_a| = m_a. By invariance we may assume that X = a if and only if Z ∈ E_a. Then

log n = H[Z] = H[Z,X] = H[X] + H[Z|X]   (Lemma 1.4 and additivity)
      = H[X] + ∑_{a∈A} p_a H[Z|X=a]
      = H[X] + ∑_{a∈A} p_a log(m_a)   (given X=a, Z is uniform on E_a)
      = H[X] + ∑_{a∈A} p_a (log p_a + log n)   (since m_a = p_a n)

Hence, subtracting log n = ∑_{a∈A} p_a log n from both sides,

H[X] = −∑_{a∈A} p_a log p_a = ∑_{a∈A} p_a log(1/p_a).

By continuity, since the formula holds whenever all p_a are rational, it holds in general.
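
A worked instance of the formula (the bias 3/4 is an arbitrary illustrative choice): a coin with ℙ[heads] = 3/4 has entropy about 0.81 bits, strictly less than the fair coin's 1 bit, as maximality requires.

```python
from math import log2

# Biased coin: P[heads] = 3/4 (an arbitrary illustrative value).
p = {"heads": 0.75, "tails": 0.25}
H = sum(q * log2(1 / q) for q in p.values())
print(H)  # 0.75*log2(4/3) + 0.25*log2(4) ≈ 0.8113 < 1
```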

Corollary 1.9. Assuming that:

  • X and Y random variables

Then H[X] ≥ 0 and H[X|Y] ≥ 0.

Proof. Immediate consequence of Theorem 1.8.

Corollary 1.10. Assuming that:

  • Y=f(X)

Then H[Y] ≤ H[X].

Proof. H[X] = H[X,Y] = H[Y] + H[X|Y]. But H[X|Y] ≥ 0.

Proposition 1.11 (Subadditivity). Assuming that:

  • X and Y are random variables

Then H[X,Y] ≤ H[X] + H[Y].

Proof. Note that for any two random variables X,Y we have

H[X,Y] ≤ H[X] + H[Y] ⟺ H[X|Y] ≤ H[X] ⟺ H[Y|X] ≤ H[Y]

Next, observe that H[X|Y] ≤ H[X] if X is uniform on a finite set. That is because

H[X|Y] = ∑_y ℙ[Y=y] H[X|Y=y]
       ≤ ∑_y ℙ[Y=y] H[X]   (by maximality)
       = H[X]

By the equivalence noted above, we also have that H[X|Y] ≤ H[X] if Y is uniform.

Now let p_{ab} = ℙ[(X,Y)=(a,b)] and assume that all p_{ab} are rational. Pick n such that we can write p_{ab} = m_{ab}/n with each m_{ab} an integer. Partition [n] into sets E_{ab} of size m_{ab}. Let Z be uniform on [n]. Without loss of generality (by invariance), (X,Y) = (a,b) if and only if Z ∈ E_{ab}.

Let E_b = ⋃_a E_{ab} for each b, so that Y = b if and only if Z ∈ E_b. Now define a random variable W as follows: if Y = b then W ∈ E_b, and conditionally on Y = b, W is uniformly distributed on E_b and independent of X (or of Z, if you prefer).

So W and X are conditionally independent given Y, and W is uniform on [n].

Then

H[X|Y] = H[X|Y,W]   (by conditional independence)
       = H[X|W]     (as W determines Y)
       ≤ H[X]       (as W is uniform)

By continuity, we get the result for general probabilities.
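
A sketch of subadditivity at its two extremes (both joint distributions are toy examples): an independent pair attains equality, while the fully dependent pair Y = X is strict.

```python
from math import log2

def H(dist):
    return sum(q * log2(1 / q) for q in dist.values() if q > 0)

def marginals(p):
    pX, pY = {}, {}
    for (x, y), q in p.items():
        pX[x] = pX.get(x, 0) + q
        pY[y] = pY.get(y, 0) + q
    return pX, pY

indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # independent fair bits
copy = {(0, 0): 0.5, (1, 1): 0.5}                       # Y = X

for p in (indep, copy):
    pX, pY = marginals(p)
    print(H(p), "<=", H(pX) + H(pY))  # 2.0 <= 2.0, then 1.0 <= 2.0
```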

Corollary 1.12. Assuming that:

  • X a random variable

Then H[X] ≥ 0.

Proof (without using the formula). By Subadditivity, H[X|X] ≤ H[X]. But H[X|X] = 0.

Corollary 1.13. Assuming that:

  • X_1,…,X_n are random variables

Then
H[X_1,…,X_n] ≤ H[X_1] + ⋯ + H[X_n].

Proof. Induction using Subadditivity.

Proposition 1.14 (Submodularity). Assuming that:

  • X,Y,Z are random variables

Then
H[X|Y,Z] ≤ H[X|Z].

Proof. Calculate:

H[X|Y,Z] = ∑_z ℙ[Z=z] H[X|Y,Z=z]
         ≤ ∑_z ℙ[Z=z] H[X|Z=z]   (by subadditivity, applied conditionally on Z=z)
         = H[X|Z]

Submodularity can be expressed in many ways.

Expanding using additivity gives the following inequalities:

H[X,Y,Z] − H[Y,Z] ≤ H[X,Z] − H[Z]

H[X,Y,Z] ≤ H[X,Z] + H[Y,Z] − H[Z]

H[X,Y,Z] + H[Z] ≤ H[X,Z] + H[Y,Z]
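
The last form is convenient for a numerical sanity check; here is a sketch over random joint distributions on {0,1}^3 (test data only):

```python
import itertools
import random
from math import log2

def H(dist):
    return sum(q * log2(1 / q) for q in dist.values() if q > 0)

def marginal(dist, coords):
    out = {}
    for xs, q in dist.items():
        key = tuple(xs[i] for i in coords)
        out[key] = out.get(key, 0) + q
    return out

random.seed(1)
for _ in range(1000):
    w = [random.random() for _ in range(8)]
    s = sum(w)
    p = dict(zip(itertools.product((0, 1), repeat=3), (v / s for v in w)))
    lhs = H(p) + H(marginal(p, (2,)))                      # H[X,Y,Z] + H[Z]
    rhs = H(marginal(p, (0, 2))) + H(marginal(p, (1, 2)))  # H[X,Z] + H[Y,Z]
    assert lhs <= rhs + 1e-9
```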

Lemma 1.15. Assuming that:

  • X,Y,Z random variables

  • Z=f(Y)

Then
H[X|Y] ≤ H[X|Z].

Proof.

H[X|Y] = H[X,Y] − H[Y]
       = H[X,Y,Z] − H[Y,Z]   (by Lemma 1.4, as Z = f(Y))
       ≤ H[X,Z] − H[Z]       (Submodularity)
       = H[X|Z]

Lemma 1.16. Assuming that:

  • X,Y,Z random variables

  • Z=f(X)=g(Y)

Then
H[X,Y] + H[Z] ≤ H[X] + H[Y].

Proof. Submodularity says

H[X,Y,Z] + H[Z] ≤ H[X,Z] + H[Y,Z].

Since Z = f(X) = g(Y), Lemma 1.4 gives H[X,Z] = H[X], H[Y,Z] = H[Y] and H[X,Y,Z] = H[X,Y], which yields the result.

Lemma 1.17. Assuming that:

  • X takes values in a finite set A

  • Y is uniform on A

  • H[X]=H[Y]

Then X is uniform.

Proof. Let p_a = ℙ[X=a]. Then

H[X] = ∑_{a∈A} p_a log(1/p_a) = |A| · 𝔼_{a∈A}[p_a log(1/p_a)],

where 𝔼_{a∈A} denotes the average over a uniformly random a ∈ A.

The function x ↦ x log(1/x) is concave on [0,1]. So, by Jensen's inequality, this is at most

|A| (𝔼_a p_a) log(1/𝔼_a p_a) = log(|A|) = H[Y],

since 𝔼_a p_a = 1/|A|.

Equality holds if and only if a ↦ p_a is constant, i.e. X is uniform.
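
A quick numerical illustration of the inequality just proved (the non-uniform distribution is an arbitrary example): on a 4-element set, only the uniform distribution reaches log 4 = 2 bits.

```python
from math import log2

def H(ps):
    return sum(q * log2(1 / q) for q in ps if q > 0)

print(H([0.25, 0.25, 0.25, 0.25]))  # 2.0 (uniform: maximal)
print(H([0.4, 0.3, 0.2, 0.1]))      # ≈ 1.846 < 2 (non-uniform: strictly smaller)
```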

Corollary 1.18. Assuming that:

  • X,Y random variables

  • H[X,Y]=H[X]+H[Y]

Then X and Y are independent.

Proof. We go through the proof of Subadditivity and check when equality holds.

Suppose that X is uniform on A. Then

H[X|Y] = ∑_y ℙ[Y=y] H[X|Y=y] ≤ H[X]

with equality if and only if the conditional distribution of X given Y=y is uniform on A for every y (by maximality and Lemma 1.17), which implies that X and Y are independent.

At the last stage of the proof we used

H[X|Y] = H[X|Y,W] = H[X|W] ≤ H[X]

where W was uniform. So equality holds only if X and W are independent, which implies (since Y depends on W) that X and Y are independent.

Definition (Mutual information). Let X and Y be random variables. The mutual information I[X:Y] is

H[X] + H[Y] − H[X,Y] = H[X] − H[X|Y] = H[Y] − H[Y|X]

Subadditivity is equivalent to the statement that I[X:Y] ≥ 0, and Corollary 1.18 implies that I[X:Y] = 0 if and only if X and Y are independent.

Note that

H[X,Y] = H[X] + H[Y] − I[X:Y].

Definition (Conditional mutual information). Let X, Y and Z be random variables. The conditional mutual information of X and Y given Z, denoted by I[X:Y|Z], is

∑_z ℙ[Z=z] I[X|Z=z : Y|Z=z] = ∑_z ℙ[Z=z] (H[X|Z=z] + H[Y|Z=z] − H[X,Y|Z=z])
                            = H[X|Z] + H[Y|Z] − H[X,Y|Z]
                            = H[X,Z] + H[Y,Z] − H[X,Y,Z] − H[Z]

Submodularity is equivalent to the statement that I[X:Y|Z] ≥ 0.
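
A sketch computing I[X:Y|Z] from the last expression above; the test case (chosen purely for illustration) is X, Y independent fair bits with Z = X XOR Y.

```python
from math import log2

def H(dist):
    return sum(q * log2(1 / q) for q in dist.values() if q > 0)

def marginal(dist, coords):
    out = {}
    for xs, q in dist.items():
        key = tuple(xs[i] for i in coords)
        out[key] = out.get(key, 0) + q
    return out

# Joint distribution of (X, Y, Z) with X, Y independent fair bits, Z = X XOR Y.
p = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

# I[X:Y|Z] = H[X,Z] + H[Y,Z] - H[X,Y,Z] - H[Z]
I = H(marginal(p, (0, 2))) + H(marginal(p, (1, 2))) - H(p) - H(marginal(p, (2,)))
print(I)  # 1.0, and in particular >= 0
```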