6 Entropy in additive combinatorics

We shall need two “simple” results from additive combinatorics due to Imre Ruzsa.

Definition (Sum set / difference set / etc). Let $G$ be an abelian group and let $A, B \subseteq G$.

The sumset $A + B$ is the set $\{x + y : x \in A, y \in B\}$.

The difference set $A - B$ is the set $\{x - y : x \in A, y \in B\}$.

We write $2A$ for $A + A$, $3A$ for $A + A + A$, etc.
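As a quick illustration, here is a minimal Python sketch of these two operations, taking $G = \mathbb{Z}$ (the sets chosen are arbitrary examples):

```python
# Sumset and difference set over G = Z, using plain Python sets.
A = {1, 2, 4}
B = {0, 3}

sumset = {a + b for a in A for b in B}    # A + B
diffset = {a - b for a in A for b in B}   # A - B

print(sorted(sumset))    # [1, 2, 4, 5, 7]
print(sorted(diffset))   # [-2, -1, 1, 2, 4]
```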

Definition (Ruzsa distance). The Ruzsa distance $d(A, B)$ is
$$\frac{|A - B|}{|A|^{1/2} |B|^{1/2}}.$$

Lemma 6.1 (Ruzsa triangle inequality). $d(A, C) \le d(A, B) \, d(B, C)$.

Proof. This is equivalent to the statement
$$|A - C| \, |B| \le |A - B| \, |B - C|.$$

For each $x \in A - C$, pick $a(x) \in A$ and $c(x) \in C$ such that $a(x) - c(x) = x$. Define a map
$$\phi : (A - C) \times B \to (A - B) \times (B - C), \qquad (x, b) \mapsto (a(x) - b, \; b - c(x)).$$

Adding the coordinates of $\phi(x, b)$ gives $x$, so we can calculate $a(x)$ (and $c(x)$) from $\phi(x, b)$, and hence can calculate $b$. So $\phi$ is an injection.
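The inequality is easy to test numerically. Below is a hedged sketch (the helper names are mine, not from the notes) checking $d(A, C) \le d(A, B) \, d(B, C)$ on random subsets of $\mathbb{Z}$:

```python
# Numerical sanity check of the Ruzsa triangle inequality on random sets.
import random

def diff_set(A, B):
    return {a - b for a in A for b in B}

def ruzsa_dist(A, B):
    # d(A,B) = |A - B| / (|A|^(1/2) |B|^(1/2))
    return len(diff_set(A, B)) / (len(A) ** 0.5 * len(B) ** 0.5)

random.seed(0)
for _ in range(1000):
    A, B, C = ({random.randrange(30) for _ in range(8)} for _ in range(3))
    assert ruzsa_dist(A, C) <= ruzsa_dist(A, B) * ruzsa_dist(B, C) + 1e-9
```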

Lemma 6.2 (Ruzsa covering lemma). Assuming that:

  • $G$ an abelian group

  • $A, B$ finite subsets of $G$

Then $A$ can be covered by at most $\frac{|A + B|}{|B|}$ translates of $B - B$.

Proof. Let $\{x_1, \ldots, x_k\}$ be a maximal subset of $A$ such that the sets $x_i + B$ are disjoint.

Then if $a \in A$, there exists $i$ such that $(a + B) \cap (x_i + B) \neq \emptyset$. Then $a \in x_i + B - B$.

So $A$ can be covered by $k$ translates of $B - B$. But since the sets $x_i + B$ are disjoint and $\{x_1, \ldots, x_k\} + B \subseteq A + B$,
$$k \, |B| = |\{x_1, \ldots, x_k\} + B| \le |A + B|.$$
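The proof is effectively a greedy algorithm, so it can be run directly. The following illustrative sketch (function names are mine) checks that the chosen translates of $B - B$ cover $A$ and that $k \le |A + B| / |B|$:

```python
# Greedy construction from the proof of the Ruzsa covering lemma.
import random

def covering_centers(A, B):
    """Greedily pick x_i in A so that the sets x_i + B are pairwise disjoint."""
    centers, used = [], set()
    for a in sorted(A):
        translate = {a + b for b in B}
        if translate.isdisjoint(used):
            centers.append(a)
            used |= translate
    return centers

random.seed(1)
A = {random.randrange(40) for _ in range(12)}
B = {random.randrange(10) for _ in range(5)}

xs = covering_centers(A, B)
BB = {b1 - b2 for b1 in B for b2 in B}
assert all(any(a - x in BB for x in xs) for a in A)            # A covered
assert len(xs) * len(B) <= len({a + b for a in A for b in B})  # k|B| <= |A+B|
```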

Let $X$, $Y$ be discrete random variables taking values in an abelian group. What is the distribution of $X + Y$ when $X$ and $Y$ are independent?

For each $z$, $\mathbb{P}(X + Y = z) = \sum_{x + y = z} \mathbb{P}(X = x) \, \mathbb{P}(Y = y)$. Writing $p_x$ and $q_y$ for $\mathbb{P}(X = x)$ and $\mathbb{P}(Y = y)$ respectively, this gives
$$\mathbb{P}(X + Y = z) = \sum_{x + y = z} p_x q_y = p * q(z),$$
where $p(x) = p_x$ and $q(y) = q_y$.

So sums of independent random variables correspond to convolutions.
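Concretely, the law of $X + Y$ can be computed as a convolution. Here is a small sketch on the group $\mathbb{Z}_6$ (the distributions are arbitrary examples):

```python
# The law of X + Y for independent X, Y is the convolution p * q.
n = 6
p = {0: 0.5, 1: 0.3, 4: 0.2}   # law of X on Z_6
q = {0: 0.6, 2: 0.1, 5: 0.3}   # law of Y on Z_6

conv = {}
for x, px in p.items():
    for y, qy in q.items():
        z = (x + y) % n
        conv[z] = conv.get(z, 0.0) + px * qy   # P(X+Y=z) = sum_{x+y=z} p_x q_y

assert abs(sum(conv.values()) - 1) < 1e-12
print(conv)
```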

Definition (Entropic Ruzsa distance). Let $G$ be an abelian group and let $X$, $Y$ be $G$-valued random variables. The entropic Ruzsa distance $d[X; Y]$ is
$$H[X' - Y'] - \tfrac{1}{2} H[X] - \tfrac{1}{2} H[Y]$$
where $X'$, $Y'$ are independent copies of $X$ and $Y$.
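In code, the definition reads as follows. This is a minimal sketch for distributions on $\mathbb{Z}_n$; the function names are illustrative, and any logarithm base works (natural log here):

```python
# Entropic Ruzsa distance d[X;Y] = H[X'-Y'] - H[X]/2 - H[Y]/2.
from math import log

def H(p):
    """Shannon entropy of a distribution {value: probability}."""
    return -sum(v * log(v) for v in p.values() if v > 0)

def entropic_dist(p, q, n):
    """d[X;Y] for X ~ p, Y ~ q on Z_n, using independent copies."""
    diff = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % n
            diff[z] = diff.get(z, 0.0) + px * qy   # law of X' - Y'
    return H(diff) - H(p) / 2 - H(q) / 2

# Example: X uniform on {0,1}, Y constant; d[X;Y] = (log 2)/2.
print(entropic_dist({0: 0.5, 1: 0.5}, {0: 1.0}, 4))
```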

Lemma 6.3. Assuming that:

  • $A$, $B$ are finite subsets of $G$

  • $X$, $Y$ are uniformly distributed on $A$, $B$ respectively

Then
$$d[X; Y] \le \log d(A, B).$$

Proof. Without loss of generality $X$, $Y$ are independent. Since $X - Y$ takes values in $A - B$, we have $H[X - Y] \le \log |A - B|$, so
$$d[X; Y] = H[X - Y] - \tfrac{1}{2} H[X] - \tfrac{1}{2} H[Y] \le \log |A - B| - \tfrac{1}{2} \log |A| - \tfrac{1}{2} \log |B| = \log d(A, B).$$
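A quick numerical check of the lemma, a sketch under the same conventions as above (random subsets of $\mathbb{Z}$, natural logarithms):

```python
# Check d[X;Y] <= log d(A,B) for X, Y uniform on random A, B.
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

random.seed(2)
for _ in range(200):
    A = {random.randrange(50) for _ in range(8)}
    B = {random.randrange(50) for _ in range(6)}
    diff = {}
    for a in A:                  # law of X - Y for independent uniform X, Y
        for b in B:
            diff[a - b] = diff.get(a - b, 0.0) + 1 / (len(A) * len(B))
    d_entropic = H(diff) - log(len(A)) / 2 - log(len(B)) / 2
    d_ruzsa = len({a - b for a in A for b in B}) / (len(A) * len(B)) ** 0.5
    assert d_entropic <= log(d_ruzsa) + 1e-9
```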

Lemma 6.4. Assuming that:

  • $X$, $Y$ are $G$-valued random variables

Then
$$H[X + Y] \ge \max\{H[X], H[Y]\} - I[X:Y].$$

Proof.
$$\begin{aligned}
H[X + Y] &\ge H[X + Y \mid Y] && \text{(by Subadditivity)} \\
&= H[X + Y, Y] - H[Y] \\
&= H[X, Y] - H[Y] \\
&= H[X] + H[Y] - I[X:Y] - H[Y] \\
&= H[X] - I[X:Y]
\end{aligned}$$

By symmetry we also have
$$H[X + Y] \ge H[Y] - I[X:Y].$$
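Since $X$ and $Y$ need not be independent here, a numerical check has to start from a joint law. The following sketch does this for an arbitrary dependent joint distribution on $\mathbb{Z}_4 \times \mathbb{Z}_4$ (the numbers are illustrative):

```python
# Check H[X+Y] >= max(H[X], H[Y]) - I[X:Y] for a dependent pair (X, Y).
from math import log

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

n = 4
joint = {(0, 0): 0.4, (1, 1): 0.3, (1, 2): 0.2, (3, 2): 0.1}  # law of (X, Y)

pX, pY, pSum = {}, {}, {}
for (x, y), w in joint.items():
    pX[x] = pX.get(x, 0.0) + w
    pY[y] = pY.get(y, 0.0) + w
    pSum[(x + y) % n] = pSum.get((x + y) % n, 0.0) + w

I = H(pX) + H(pY) - H(joint)                    # mutual information I[X:Y]
assert H(pSum) >= max(H(pX), H(pY)) - I - 1e-12
```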

Corollary. Assuming that:

  • $X$, $Y$ are $G$-valued random variables

Then:
$$H[X - Y] \ge \max\{H[X], H[Y]\} - I[X:Y].$$

Corollary 6.5. Assuming that:

  • $X$, $Y$ are $G$-valued random variables

Then
$$d[X; Y] \ge 0.$$

Proof. Without loss of generality $X$, $Y$ are independent. Then $I[X:Y] = 0$, so
$$H[X - Y] \ge \max\{H[X], H[Y]\} \ge \tfrac{1}{2}(H[X] + H[Y]).$$

Lemma 6.6. Assuming that:

  • X, Y are G-valued random variables

Then d[X;Y]=0 if and only if there is some (finite) subgroup H of G such that X and Y are uniform on cosets of H.

Proof.

Recall Lemma 1.16: if $Z = f(X) = g(Y)$, then:
$$H[X, Y] + H[Z] \le H[X] + H[Y].$$

Lemma 6.7 (The entropic Ruzsa triangle inequality). Assuming that:

  • $X$, $Y$, $Z$ are $G$-valued random variables

Then
$$d[X; Z] \le d[X; Y] + d[Y; Z].$$

Proof. We must show (assuming without loss of generality that $X$, $Y$ and $Z$ are independent) that
$$H[X - Z] - \tfrac{1}{2} H[X] - \tfrac{1}{2} H[Z] \le H[X - Y] - \tfrac{1}{2} H[X] - \tfrac{1}{2} H[Y] + H[Y - Z] - \tfrac{1}{2} H[Y] - \tfrac{1}{2} H[Z],$$

i.e. that
$$H[X - Z] + H[Y] \le H[X - Y] + H[Y - Z]. \tag{$*$}$$

Since $X - Z$ is a function of $(X - Y, Y - Z)$ and is also a function of $(X, Z)$, we get using Lemma 1.16 that
$$H[X - Y, Y - Z, X, Z] + H[X - Z] \le H[X - Y, Y - Z] + H[X, Z].$$

This is the same as
$$H[X, Y, Z] + H[X - Z] \le H[X, Z] + H[X - Y, Y - Z].$$

By independence, $H[X, Y, Z] = H[X] + H[Y] + H[Z]$ and $H[X, Z] = H[X] + H[Z]$; cancelling the common terms and applying Subadditivity to $H[X - Y, Y - Z]$ gives $(*)$.
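As with the earlier lemmas, $(*)$ can be sanity-checked numerically. A sketch for independent random variables on $\mathbb{Z}_5$ with random laws (helper names are mine):

```python
# Numerical check of (*): H[X-Z] + H[Y] <= H[X-Y] + H[Y-Z].
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    t = sum(w)
    return {i: wi / t for i, wi in enumerate(w)}

def law_of_difference(p, q, n):
    # law of U - V for independent U ~ p, V ~ q on Z_n
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % n
            out[z] = out.get(z, 0.0) + px * qy
    return out

random.seed(3)
n = 5
for _ in range(200):
    p, q, r = rand_dist(n), rand_dist(n), rand_dist(n)
    lhs = H(law_of_difference(p, q, n)) + H(law_of_difference(q, r, n))
    rhs = H(law_of_difference(p, r, n)) + H(q)
    assert lhs >= rhs - 1e-9
```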

Lemma 6.8 (Submodularity for sums). Assuming that:

  • $X$, $Y$, $Z$ are independent $G$-valued random variables

Then
$$H[X + Y + Z] + H[Z] \le H[X + Z] + H[Y + Z].$$

Proof. $X + Y + Z$ is a function of $(X + Z, Y)$ and also a function of $(X, Y + Z)$. Therefore (using Lemma 1.16),
$$H[X + Z, Y, X, Y + Z] + H[X + Y + Z] \le H[X + Z, Y] + H[X, Y + Z].$$

Hence
$$H[X, Y, Z] + H[X + Y + Z] \le H[X + Z] + H[Y] + H[X] + H[Y + Z].$$

By independence and cancellation, we get the desired inequality.
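The same style of check works here; a sketch for random laws on $\mathbb{Z}_5$:

```python
# Numerical check of Lemma 6.8: H[X+Y+Z] + H[Z] <= H[X+Z] + H[Y+Z].
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    t = sum(w)
    return {i: wi / t for i, wi in enumerate(w)}

def add(p, q, n):
    # law of U + V for independent U ~ p, V ~ q on Z_n
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[(x + y) % n] = out.get((x + y) % n, 0.0) + px * qy
    return out

random.seed(4)
n = 5
for _ in range(200):
    p, q, r = rand_dist(n), rand_dist(n), rand_dist(n)
    lhs = H(add(add(p, q, n), r, n)) + H(r)
    assert lhs <= H(add(p, r, n)) + H(add(q, r, n)) + 1e-9
```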

Lemma 6.9. Assuming that:

  • $G$ an abelian group

  • $X$ a $G$-valued random variable

Then
$$d[X; -X] \le 2 \, d[X; X].$$

Proof. Let $X_1$, $X_2$, $X_3$ be independent copies of $X$. Then
$$\begin{aligned}
d[X; -X] &= H[X_1 + X_2] - \tfrac{1}{2} H[X_1] - \tfrac{1}{2} H[X_2] \\
&\le H[X_1 + X_2 - X_3] - H[X] && \text{(by the Corollary to Lemma 6.4)} \\
&\le H[X_1 - X_3] + H[X_2 - X_3] - H[X_3] - H[X] && \text{(by Lemma 6.8)} \\
&= 2 \, d[X; X]
\end{aligned}$$

(as $X_1, X_2, X_3$ are all copies of $X$).
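A numerical check of the lemma, with both distances computed directly from their definitions (note $H[-X] = H[X]$, so $d[X; -X] = H[X_1 + X_2] - H[X]$):

```python
# Check d[X;-X] <= 2 d[X;X] for random distributions on Z_n.
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    t = sum(w)
    return {i: wi / t for i, wi in enumerate(w)}

def combine(p, q, n, sign):
    # law of U + sign*V for independent U ~ p, V ~ q on Z_n
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x + sign * y) % n
            out[z] = out.get(z, 0.0) + px * qy
    return out

random.seed(5)
n = 6
for _ in range(200):
    p = rand_dist(n)
    d_neg = H(combine(p, p, n, +1)) - H(p)   # d[X;-X] = H[X1 + X2] - H[X]
    d_pos = H(combine(p, p, n, -1)) - H(p)   # d[X;X]  = H[X1 - X2] - H[X]
    assert d_neg <= 2 * d_pos + 1e-9
```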

Corollary 6.10. Assuming that:

  • $X$ and $Y$ are $G$-valued random variables

Then
$$d[X; -Y] \le 5 \, d[X; Y].$$

Proof.
$$\begin{aligned}
d[X; -Y] &\le d[X; Y] + d[Y; -Y] \\
&\le d[X; Y] + 2 \, d[Y; Y] \\
&\le d[X; Y] + 2 (d[Y; X] + d[X; Y]) \\
&= 5 \, d[X; Y]
\end{aligned}$$
Conditional Distances

Definition (Conditional distance). Let $X, Y, U, V$ be $G$-valued random variables (in fact, $U$ and $V$ don't have to be $G$-valued for the definition to make sense). Then the conditional distance is
$$d[X \mid U; Y \mid V] = \sum_{u, v} \mathbb{P}[U = u] \, \mathbb{P}[V = v] \; d[X \mid U = u; \, Y \mid V = v].$$

The next definition is not completely standard.

Definition (Simultaneous conditional distance). Let $X, Y, U$ be $G$-valued random variables. The simultaneous conditional distance of $X$ to $Y$ given $U$ is
$$d[X; Y \,\|\, U] = \sum_u \mathbb{P}[U = u] \; d[X \mid U = u; \, Y \mid U = u].$$

We say that $X'$, $Y'$ are conditionally independent trials of $X$, $Y$ given $U$ if:

  • $X'$ is distributed like $X$.

  • $Y'$ is distributed like $Y$.

  • For each value $u$ of $U$, $X' \mid U = u$ is distributed like $X \mid U = u$.

  • For each value $u$ of $U$, $Y' \mid U = u$ is distributed like $Y \mid U = u$.

  • $X' \mid U = u$ and $Y' \mid U = u$ are independent.

Then
$$d[X; Y \,\|\, U] = H[X' - Y' \mid U] - \tfrac{1}{2} H[X \mid U] - \tfrac{1}{2} H[Y \mid U]$$

(as can be seen directly from the formula).
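The defining formula translates directly into code. A sketch computing $d[X; Y \,\|\, U]$ from a joint law of $(X, Y, U)$ on $\mathbb{Z}_3$ (the joint law is an arbitrary example, and the helper names are mine):

```python
# Simultaneous conditional distance d[X;Y || U] from its defining formula.
from math import log

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def d_ent(p, q, n):
    # d[X;Y] for independent X ~ p, Y ~ q on Z_n
    diff = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % n
            diff[z] = diff.get(z, 0.0) + px * qy
    return H(diff) - H(p) / 2 - H(q) / 2

n = 3
# joint law of (X, Y, U); the numbers are an arbitrary example
joint = {(0, 0, 0): 0.2, (1, 2, 0): 0.2, (0, 1, 1): 0.3, (2, 2, 1): 0.3}

dist = 0.0
for u in {k[2] for k in joint}:
    pu = sum(w for k, w in joint.items() if k[2] == u)
    pX, pY = {}, {}
    for (x, y, uu), w in joint.items():
        if uu == u:
            pX[x] = pX.get(x, 0.0) + w / pu
            pY[y] = pY.get(y, 0.0) + w / pu
    dist += pu * d_ent(pX, pY, n)

print(dist)   # d[X;Y || U]
```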

Lemma 6.11 (The entropic BSG theorem). Assuming that:

  • $A$ and $B$ are $G$-valued random variables

Then
$$d[A; B \,\|\, A + B] \le 3 I[A:B] + 2 H[A + B] - H[A] - H[B].$$

Remark. The last few terms look like $2 \, d[A; B]$. But they aren't equal to it, because $A$ and $B$ aren't (necessarily) independent!

Proof.
$$d[A; B \,\|\, A + B] = H[A' - B' \mid A + B] - \tfrac{1}{2} H[A' \mid A + B] - \tfrac{1}{2} H[B' \mid A + B]$$

where $A'$, $B'$ are conditionally independent trials of $A$, $B$ given $A + B$. Now calculate
$$\begin{aligned}
H[A' \mid A + B] &= H[A \mid A + B] \\
&= H[A, A + B] - H[A + B] \\
&= H[A, B] - H[A + B] \\
&= H[A] + H[B] - I[A:B] - H[A + B]
\end{aligned}$$

Similarly, $H[B' \mid A + B]$ is the same, so $\tfrac{1}{2} H[A' \mid A + B] + \tfrac{1}{2} H[B' \mid A + B]$ is also the same.

Next,
$$H[A' - B' \mid A + B] \le H[A' - B'].$$

Let $(A_1, B_1)$ and $(A_2, B_2)$ be conditionally independent trials of $(A, B)$ given $A + B$. Then $H[A' - B'] = H[A_1 - B_2]$. By Submodularity,
$$H[A_1 - B_2] \le H[A_1 - B_2, A_1] + H[A_1 - B_2, B_1] - H[A_1 - B_2, A_1, B_1].$$

Now
$$H[A_1 - B_2, A_1] = H[A_1, B_2] \le H[A_1] + H[B_2] = H[A] + H[B]$$
and, since $A_1 - B_2 = A_2 - B_1$ (because $A_1 + B_1 = A_2 + B_2$),
$$H[A_1 - B_2, B_1] = H[A_2 - B_1, B_1] = H[A_2, B_1] \le H[A] + H[B].$$

Finally,
$$\begin{aligned}
H[A_1 - B_2, A_1, B_1] &= H[A_1, B_1, A_2, B_2] \\
&= H[A_1, B_1, A_2, B_2 \mid A + B] + H[A + B] \\
&= 2 H[A, B \mid A + B] + H[A + B] && \text{(by conditional independence of $(A_1, B_1)$ and $(A_2, B_2)$)} \\
&= 2 H[A, B] - H[A + B] \\
&= 2 H[A] + 2 H[B] - 2 I[A:B] - H[A + B]
\end{aligned}$$

Combining these three estimates gives
$$H[A_1 - B_2] \le 2(H[A] + H[B]) - (2 H[A] + 2 H[B] - 2 I[A:B] - H[A + B]) = 2 I[A:B] + H[A + B],$$
and hence
$$d[A; B \,\|\, A + B] \le 2 I[A:B] + H[A + B] - (H[A] + H[B] - I[A:B] - H[A + B]) = 3 I[A:B] + 2 H[A + B] - H[A] - H[B],$$
as required.
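To close, a hedged numerical check of the entropic BSG bound, computing both sides from a random (dependent) joint law of $(A, B)$ on $\mathbb{Z}_4 \times \mathbb{Z}_4$; the helper names are illustrative:

```python
# Numerical check of Lemma 6.11 (the entropic BSG theorem).
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def d_ent(p, q, n):
    # distance between independent trials of the laws p and q on Z_n
    diff = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % n
            diff[z] = diff.get(z, 0.0) + px * qy
    return H(diff) - H(p) / 2 - H(q) / 2

random.seed(6)
n = 4
for _ in range(100):
    w = {(a, b): random.random() for a in range(n) for b in range(n)}
    t = sum(w.values())
    joint = {k: v / t for k, v in w.items()}   # joint law of (A, B)

    pA, pB, pS = {}, {}, {}
    for (a, b), v in joint.items():
        pA[a] = pA.get(a, 0.0) + v
        pB[b] = pB.get(b, 0.0) + v
        s = (a + b) % n
        pS[s] = pS.get(s, 0.0) + v

    # d[A;B || A+B]: average over s of the distance between conditional laws
    lhs = 0.0
    for s, ps in pS.items():
        pAs = {a: v / ps for (a, b), v in joint.items() if (a + b) % n == s}
        pBs = {b: v / ps for (a, b), v in joint.items() if (a + b) % n == s}
        lhs += ps * d_ent(pAs, pBs, n)

    I = H(pA) + H(pB) - H(joint)               # mutual information I[A:B]
    rhs = 3 * I + 2 * H(pS) - H(pA) - H(pB)
    assert lhs <= rhs + 1e-9
```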