A proof of Marton’s conjecture in $𝔽_{2}^{n}$ - Entropy Methods in Combinatorics

7 A proof of Marton’s conjecture in $𝔽_{2}^{n}$

We shall prove the following theorem.

Theorem 7.1 (Green, Manners, Tao, Gowers). There is a polynomial $p$ with the following property: if $n \in ℕ$ and $A \subset 𝔽_{2}^{n}$ is such that $| A + A | \leq C | A |$, then there is a subspace $H \subset 𝔽_{2}^{n}$ of size at most $| A |$ such that $A$ is contained in the union of at most $p (C)$ translates of $H$. (Equivalently, there exists $K \subset 𝔽_{2}^{n}$ with $| K | \leq p (C)$ such that $A \subset K + H$.)

This is known as “Polynomial Freiman–Ruzsa”.

In fact, we shall prove the following statement.

Theorem 7.2 (Entropic Polynomial Freiman–Ruzsa). There exists an absolute constant $α$ satisfying the following: Let $G = 𝔽_{2}^{n}$ and let $X, Y$ be $G$ -valued random variables. Then there exists a subgroup $H$ of $G$ such that

d [X; U_{H}] + d [U_{H}; Y] \leq α d [X; Y]

where $U_{H}$ is the uniform distribution on $H$ .

Lemma 7.3. Assuming that:

$X$ is a discrete random variable (and write $p_{x}$ for $ℙ (X = x)$ )

Then there exists $x$ such that $p_{x} \geq 2^{- H [X]}$.

Proof. If not, then

H [X] = \sum_{x} p_{x} \log (\frac{1}{p_{x}}) > H [X] \sum_{x} p_{x} = H [X],

contradiction. □
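As a quick sanity check of Lemma 7.3, the following sketch computes $\max_{x} p_{x}$ and $2^{- H [X]}$ for a small distribution. The function names and the example distribution are illustrative assumptions, not part of the notes; entropy is taken in bits, matching the base-2 bound in the lemma.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H[X] in bits, skipping zero-probability atoms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def largest_atom_and_bound(probs):
    """Return (max_x p_x, 2^{-H[X]}); Lemma 7.3 says the first is >= the second."""
    return max(probs), 2.0 ** (-entropy_bits(probs))

# Hypothetical example distribution, chosen only for illustration.
probs = [0.5, 0.25, 0.125, 0.125]
largest, bound = largest_atom_and_bound(probs)
print(largest >= bound)
```

For a uniform distribution the two quantities coincide, so the lemma is sharp in that case.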

Proposition 7.4. Theorem 7.2 implies Theorem 7.1.

Proof. Let $A \subset 𝔽_{2}^{n}$ , $| A + A | \leq C | A |$ . Let $X$ and $Y$ be independent copies of $U_{A}$ . Then by Theorem 7.2, there exists $H$ (a subgroup) such that

d [X; U_{H}] + d [U_{H}; X] \leq α d [X; Y]

(recall that $Y$ has the same distribution as $X$), and hence, by symmetry of $d$,

d [X; U_{H}] \leq \frac{α}{2} d [X; Y] .

But

\begin{aligned} d [X; Y] & = H [U_{A} - U_{A}^{'}] - H [U_{A}] \\ & = H [U_{A} + U_{A}^{'}] - H [U_{A}] && \text{(characteristic 2)} \\ & \leq \log (C | A |) - \log | A | \\ & = \log C \end{aligned}

So $d [X; U_{H}] \leq \frac{α \log C}{2}$ . Therefore

\begin{aligned} H [X + U_{H}] & \leq \frac{1}{2} H [X] + \frac{1}{2} H [U_{H}] + \frac{α \log C}{2} \\ & = \frac{1}{2} \log | A | + \frac{1}{2} \log | H | + \frac{α \log C}{2} \end{aligned}

Therefore, by Lemma 7.3, there exists $z$ such that

ℙ (X + U_{H} = z) \geq | A |^{- \frac{1}{2}} | H |^{- \frac{1}{2}} C^{- \frac{α}{2}} .

But

ℙ (X + U_{H} = z) = \frac{| A \cap (z - H) |}{| A | | H |} = \frac{| A \cap (z + H) |}{| A | | H |}

(using characteristic 2). So there exists $z \in G$ such that

| A \cap (z + H) | \geq C^{- \frac{α}{2}} | A |^{\frac{1}{2}} | H |^{\frac{1}{2}} .

Let $B = A \cap (z + H)$ . By the Ruzsa covering lemma, we can cover $A$ by at most $\frac{| A + B |}{| B |}$ translates of $B + B$ . But $B \subset z + H$ so $B + B \subset H + H = H$ , so $A$ can be covered by at most $\frac{| A + B |}{| B |}$ translates of $H$ .

But using $B \subset A$ ,

| A + B | \leq | A + A | \leq C | A | .

Hence

\frac{| A + B |}{| B |} \leq \frac{C | A |}{C^{- \frac{α}{2}} | A |^{\frac{1}{2}} | H |^{\frac{1}{2}}} = C^{\frac{α}{2} + 1} \frac{| A |^{\frac{1}{2}}}{| H |^{\frac{1}{2}}} .

Since $B$ is contained in $z + H$ ,

| H | \geq C^{- \frac{α}{2}} | A |^{\frac{1}{2}} | H |^{\frac{1}{2}}

so $| H | \geq C^{- α} | A |$ , so

C^{\frac{α}{2} + 1} \frac{| A |^{\frac{1}{2}}}{| H |^{\frac{1}{2}}} \leq C^{α + 1} .

If $| H | \leq | A |$ then we are done. Otherwise, since $B \subset A$ ,

| A | \geq C^{- \frac{α}{2}} | A |^{\frac{1}{2}} | H |^{\frac{1}{2}}

so $| H | \leq C^{α} | A |$ .

Pick a subgroup $H^{'}$ of $H$ of size between $\frac{| A |}{2}$ and $| A |$ . Then $H$ is a union of at most $2 C^{α}$ translates of $H^{'}$ , so $A$ is contained in a union of at most $C^{α + 1} \cdot 2 C^{α} = 2 C^{2 α + 1}$ translates of $H^{'}$ . □
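The first step of the proof, $d [U_{A}; U_{A}^{'}] \leq \log C$, can be checked on a small example. The sketch below encodes elements of $𝔽_{2}^{n}$ as integers, so that addition is bitwise XOR; the set `A` and all function names are hypothetical choices made only for illustration.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def sum_law(dx, dy):
    """Law of X + Y for independent X, Y on F_2^n (integers under XOR)."""
    out = defaultdict(float)
    for x, px in dx.items():
        for y, py in dy.items():
            out[x ^ y] += px * py
    return dict(out)

def ruzsa_distance(dx, dy):
    """d[X;Y] = H[X - Y] - H[X]/2 - H[Y]/2; in characteristic 2, X - Y = X + Y."""
    return entropy(sum_law(dx, dy)) - 0.5 * entropy(dx) - 0.5 * entropy(dy)

def uniform(points):
    """Uniform distribution on a finite set of points."""
    return {x: 1.0 / len(points) for x in points}

# Hypothetical subset of F_2^3, written as 3-bit integers.
A = {0b000, 0b001, 0b010, 0b100, 0b111}
sumset = {a ^ b for a in A for b in A}
C = len(sumset) / len(A)              # doubling constant |A + A| / |A|
d = ruzsa_distance(uniform(A), uniform(A))
print(d <= math.log2(C))
```

Here $A + A$ is all of $𝔽_{2}^{3}$, so $C = 8 / 5$, and the bound holds because the entropy of $U_{A} + U_{A}^{'}$ is at most the logarithm of the size of its support.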

Now we reduce further. We shall prove the following statement:

Theorem 7.5 (EPFR $^{'}$ ). There is a constant $η > 0$ such that if $X$ and $Y$ are any two $𝔽_{2}^{n}$ -valued random variables with $d [X; Y] > 0$ , then there exist $𝔽_{2}^{n}$ -valued random variables $U$ and $V$ such that

d [U; V] + η (d [U; X] + d [V; Y]) < d [X; Y] .

Proposition 7.6. EPFR $^{'}$ ( $η$ ) $⟹$ EPFR( $η^{- 1}$ ).

Proof. By compactness we can find $U$ , $V$ such that

τ_{X, Y} [U; V] = d [U; V] + η (d [U; X] + d [V; Y])

is minimised. If $d [U; V] \neq 0$ then by EPFR $^{'}$ ( $η$ ) there exist $Z$ , $W$ such that $τ_{U, V} [Z; W] < d [U; V]$ .

But then

\begin{aligned} τ_{X, Y} [Z; W] & = d [Z; W] + η (d [Z; X] + d [W; Y]) \\ & \leq d [Z; W] + η (d [Z; U] + d [W; V]) + η (d [U; X] + d [V; Y]) && \text{(by the entropic Ruzsa triangle inequality)} \\ & < d [U; V] + η (d [U; X] + d [V; Y]) \\ & = τ_{X, Y} [U; V] \end{aligned}

Contradiction.

It follows that $d [U; V] = 0$ . So there exists $H$ such that $U$ and $V$ are uniform on cosets of $H$ , so

η (d [U_{H}; X] + d [U_{H}; Y]) < d [X; Y],

which gives us EPFR( $η^{- 1}$ ). □

Definition. Write $τ_{X, Y} [U | Z; V | W]$ for

\sum_{z, w} ℙ [Z = z] ℙ [W = w] τ_{X, Y} [U | Z = z; V | W = w]

Definition. Write $τ_{X, Y} [U; V ∥ Z]$ for

\sum_{z} ℙ [Z = z] τ_{X, Y} [U | Z = z; V | Z = z]

Remark. If we can prove EPFR $^{'}$ for conditional random variables, then by averaging we get it for some pair of random variables (e.g. of the form $U | Z = z$ and $V | W = w$ ).

Lemma 7.7 (Fibring lemma). Assuming that:

$G$ and $H$ are abelian groups
$ϕ : G \to H$ is a homomorphism
$X$ , $Y$ are independent $G$ -valued random variables

Then

d [X; Y] = d [ϕ (X); ϕ (Y)] + d [X | ϕ (X); Y | ϕ (Y)] + I [X - Y : ϕ (X), ϕ (Y) | ϕ (X) - ϕ (Y)] .

Proof.

\begin{aligned} d [X; Y] & = H [X - Y] - \frac{1}{2} H [X] - \frac{1}{2} H [Y] \\ & = H [ϕ (X) - ϕ (Y)] + H [X - Y | ϕ (X) - ϕ (Y)] - \frac{1}{2} H [ϕ (X)] \\ & \quad - \frac{1}{2} H [X | ϕ (X)] - \frac{1}{2} H [ϕ (Y)] - \frac{1}{2} H [Y | ϕ (Y)] \\ & = d [ϕ (X); ϕ (Y)] + d [X | ϕ (X); Y | ϕ (Y)] + H [X - Y | ϕ (X) - ϕ (Y)] \\ & \quad - H [X - Y | ϕ (X), ϕ (Y)] \end{aligned}

But the last two terms of this expression equal

H [X - Y | ϕ (X) - ϕ (Y)] - H [X - Y | ϕ (X), ϕ (Y), ϕ (X) - ϕ (Y)] = I [X - Y : ϕ (X), ϕ (Y) | ϕ (X) - ϕ (Y)] . □
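The fibring identity can be verified numerically on small examples. The sketch below takes $G = 𝔽_{2}^{2}$ encoded as the integers 0 to 3 under XOR and $ϕ$ the projection onto the lowest bit; these choices, the random distributions, and all function names are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def pushforward(dist, f):
    """Law of f(Z) when Z has law dist."""
    out = defaultdict(float)
    for outcome, p in dist.items():
        out[f(outcome)] += p
    return dict(out)

def product_law(dx, dy):
    """Joint law of independent (X, Y)."""
    return {(x, y): px * py for x, px in dx.items() for y, py in dy.items()}

def distance(dx, dy):
    """d[X;Y] for independent X, Y; in characteristic 2, X - Y is x ^ y."""
    diff = pushforward(product_law(dx, dy), lambda xy: xy[0] ^ xy[1])
    return entropy(diff) - 0.5 * entropy(dx) - 0.5 * entropy(dy)

def fibre(dist, phi, value):
    """Return (law of X given phi(X) = value, P(phi(X) = value))."""
    mass = sum(p for x, p in dist.items() if phi(x) == value)
    return {x: p / mass for x, p in dist.items() if phi(x) == value}, mass

def fibring_sides(dx, dy, phi):
    """Return (d[X;Y], right-hand side of the fibring lemma)."""
    joint = product_law(dx, dy)
    d_image = distance(pushforward(dx, phi), pushforward(dy, phi))
    d_fibre = 0.0
    for a in set(map(phi, dx)):
        for b in set(map(phi, dy)):
            cx, pa = fibre(dx, phi, a)
            cy, pb = fibre(dy, phi, b)
            d_fibre += pa * pb * distance(cx, cy)
    # I[A : B | C] = H[A, C] + H[B, C] - H[A, B, C] - H[C] with
    # A = X - Y, B = (phi(X), phi(Y)), C = phi(X) - phi(Y).
    def a_of(o): return o[0] ^ o[1]
    def b_of(o): return (phi(o[0]), phi(o[1]))
    def c_of(o): return phi(o[0]) ^ phi(o[1])
    info = (entropy(pushforward(joint, lambda o: (a_of(o), c_of(o))))
            + entropy(pushforward(joint, lambda o: (b_of(o), c_of(o))))
            - entropy(pushforward(joint, lambda o: (a_of(o), b_of(o), c_of(o))))
            - entropy(pushforward(joint, c_of)))
    return distance(dx, dy), d_image + d_fibre + info

def random_law(size, rng):
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return {x: w / total for x, w in enumerate(weights)}

rng = random.Random(0)  # fixed seed, arbitrary choice
dx, dy = random_law(4, rng), random_law(4, rng)
lhs, rhs = fibring_sides(dx, dy, lambda x: x & 1)
print(abs(lhs - rhs) < 1e-9)
```

Because the lemma is an exact identity, the two sides should agree up to floating-point error for any independent pair, not just this one.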

We shall be interested in the following special case.

Corollary 7.8. Assuming that:

$G = 𝔽_{2}^{n}$ and $X_{1}, X_{2}, X_{3}, X_{4}$ are independent $G$ -valued random variables

Then

\begin{aligned} d [(X_{1}, X_{2}); (X_{3}, X_{4})] & = d [X_{1}; X_{3}] + d [X_{2}; X_{4}] \\ & = d [X_{1} + X_{2}; X_{3} + X_{4}] + d [X_{1} | X_{1} + X_{2}; X_{3} | X_{3} + X_{4}] \\ & \quad + \underbrace{I [X_{1} + X_{3}, X_{2} + X_{4} : X_{1} + X_{2}, X_{3} + X_{4} | X_{1} + X_{2} + X_{3} + X_{4}]}_{(*)} \end{aligned}

Proof. Apply Lemma 7.7 with $X = (X_{1}, X_{2})$ , $Y = (X_{3}, X_{4})$ and $ϕ (x, y) = x + y$ . □

We shall now set $W = X_{1} + X_{2} + X_{3} + X_{4}$ .

Recall that Lemma 6.11 says

d [X; Y ∥ X + Y] \leq 3 I [X : Y] + 2 H [X + Y] - H [X] - H [Y] .

Equivalently,

I [X : Y] \geq \frac{1}{3} (d [X; Y ∥ X + Y] + H [X] + H [Y] - 2 H [X + Y]) .

Applying this to the information term ( $*$ ), we get that it is at least

\begin{array}{l} \frac{1}{3} (d [X_{1} + X_{3}, X_{2} + X_{4}; X_{1} + X_{2}, X_{3} + X_{4} ∥ X_{2} + X_{3}, W] + H [X_{1} + X_{3}, X_{2} + X_{4} | W] \\ + H [X_{1} + X_{2}, X_{3} + X_{4} | W] - 2 H [X_{2} + X_{3}, X_{2} + X_{3} | W]) \end{array}

which simplifies to

\begin{array}{l} \frac{1}{3} (d [X_{1} + X_{3}, X_{2} + X_{4}; X_{1} + X_{2}, X_{3} + X_{4} ∥ X_{2} + X_{3}, W] + H [X_{1} + X_{3} | W] \\ + H [X_{1} + X_{2} | W] - 2 H [X_{2} + X_{3} | W]) \end{array}

So Corollary 7.8 now gives us:

\begin{aligned} d [X_{1}; X_{3}] + d [X_{2}; X_{4}] & \geq d [X_{1} + X_{2}; X_{3} + X_{4}] + d [X_{1} | X_{1} + X_{2}; X_{3} | X_{3} + X_{4}] \\ & \quad + \frac{1}{3} (d [X_{1} + X_{2}; X_{1} + X_{3} ∥ X_{2} + X_{3}, W] \\ & \quad + H [X_{1} + X_{2} | W] + H [X_{1} + X_{3} | W] - 2 H [X_{2} + X_{3} | W]) \end{aligned}

Now apply this to $(X_{1}, X_{2}, X_{3}, X_{4})$ , $(X_{1}, X_{2}, X_{4}, X_{3})$ and $(X_{1}, X_{4}, X_{3}, X_{2})$ and add.

We look first at the entropy terms. We get

\begin{array}{l} 2 H [X_{1} + X_{2} | W] + H [X_{1} + X_{4} | W] + H [X_{1} + X_{3} | W] + H [X_{1} + X_{4} | W] + H [X_{1} + X_{3} | W] \\ - 2 H [X_{2} + X_{3} | W] - 2 H [X_{2} + X_{4} | W] - 2 H [X_{1} + X_{2} | W] \\ = 0 \end{array}

where we made heavy use of the observation that if $i, j, k, l$ are some permutation of $1, 2, 3, 4$ , then

H [X_{i} + X_{j} | W] = H [X_{k} + X_{l} | W] .

This also allowed us, e.g., to replace

d [X_{1} + X_{2}, X_{3} + X_{4}; X_{1} + X_{3}, X_{2} + X_{4} ∥ X_{2} + X_{3}, W]

by

d [X_{1} + X_{2}; X_{1} + X_{3} ∥ X_{2} + X_{3}, W] .

Therefore, we get the following inequality:

Lemma 7.9.

\begin{array}{l} 2 d [X_{1}; X_{3}] + 2 d [X_{2}; X_{4}] + d [X_{1}; X_{4}] + d [X_{2}; X_{3}] \\ \geq 2 d [X_{1} + X_{2}; X_{3} + X_{4}] + d [X_{1} + X_{4}; X_{2} + X_{3}] \\ + 2 d [X_{1} | X_{1} + X_{2}; X_{3} | X_{3} + X_{4}] + d [X_{1} | X_{1} + X_{4}; X_{2} | X_{2} + X_{3}] \\ + \frac{1}{3} (d [X_{1} + X_{2}; X_{1} + X_{3} ∥ X_{2} + X_{3}, W] + d [X_{1} + X_{2}; X_{1} + X_{4} ∥ X_{2} + X_{4}, W] \\ + d [X_{1} + X_{4}; X_{1} + X_{3} ∥ X_{3} + X_{4}, W]) \end{array}

Proof. Above. □
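The symmetry $H [X_{i} + X_{j} | W] = H [X_{k} + X_{l} | W]$ used in the derivation above holds because, given $W$, the sums $X_{i} + X_{j}$ and $X_{k} + X_{l}$ determine each other. A minimal numerical sketch, assuming four independent random laws on $𝔽_{2}^{2}$ (encoded as the integers 0 to 3 under XOR) and with hypothetical function names:

```python
import math
import random
from collections import defaultdict
from itertools import product

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def random_law(size, rng):
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return {x: w / total for x, w in enumerate(weights)}

def cond_entropy_given_w(laws, i, j):
    """H[X_i + X_j | W] for independent X_1..X_4 (0-based indices i, j),
    where W = X_1 + X_2 + X_3 + X_4 and addition is XOR."""
    pair_law = defaultdict(float)  # law of (X_i + X_j, W)
    w_law = defaultdict(float)     # law of W
    for atoms in product(*(law.items() for law in laws)):
        values = [x for x, _ in atoms]
        prob = math.prod(p for _, p in atoms)
        w = values[0] ^ values[1] ^ values[2] ^ values[3]
        pair_law[(values[i] ^ values[j], w)] += prob
        w_law[w] += prob
    # Chain rule: H[A | W] = H[A, W] - H[W].
    return entropy(pair_law) - entropy(w_law)

rng = random.Random(0)  # fixed seed, arbitrary choice
laws = [random_law(4, rng) for _ in range(4)]
h12 = cond_entropy_given_w(laws, 0, 1)
h34 = cond_entropy_given_w(laws, 2, 3)
print(abs(h12 - h34) < 1e-9)
```

The same check works for the other two pairings, for instance $X_{1} + X_{3}$ against $X_{2} + X_{4}$.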

Now let $X_{1}, X_{2}$ be copies of $X$ and $Y_{1}, Y_{2}$ copies of $Y$ and apply Lemma 7.9 to $(X_{1}, X_{2}, Y_{1}, Y_{2})$ (all independent), to get this.

Lemma 7.10. Assuming that:

$X_{1}, X_{2}, Y_{1}, Y_{2}$ satisfy: $X_{1}$ and $X_{2}$ are copies of $X$ , $Y_{1}$ and $Y_{2}$ are copies of $Y$ , and all of them are independent

Then

\begin{array}{l} 6 d [X; Y] \\ \geq 2 d [X_{1} + X_{2}; Y_{1} + Y_{2}] + d [X_{1} + Y_{2}; X_{2} + Y_{1}] \\ + 2 d [X_{1} | X_{1} + X_{2}; Y_{1} | Y_{1} + Y_{2}] + d [X_{1} | X_{1} + Y_{1}; X_{2} | X_{2} + Y_{2}] \\ + \frac{2}{3} d [X_{1} + X_{2}; X_{1} + Y_{1} ∥ X_{2} + Y_{1}, X_{1} + Y_{2}] \\ + \frac{1}{3} d [X_{1} + Y_{1}; X_{1} + Y_{2} ∥ X_{1} + X_{2}, Y_{1} + Y_{2}] \end{array}

Proof. Use above. □

Recall that we want $(U, V)$ such that

\begin{aligned} τ_{X, Y} [U; V] & = d [U; V] + η (d [U; X] + d [V; Y]) \\ & < d [X; Y] \end{aligned}

Lemma 7.10 gives us a collection of distances (some conditioned) whose coefficients sum to $2 + 1 + 2 + 1 + \frac{2}{3} + \frac{1}{3} = 7$ , so at least one of them is at most $\frac{6}{7} d [X; Y]$ . So it will be enough to show that for all of them we get

d [U; X] + d [V; Y] \leq C d [X; Y],

for some absolute constant $C$ . Then we can take $η < \frac{1}{7 C}$ .

Definition ( $C$ -relevant). Say that $(U, V)$ is $C$ -relevant to $(X, Y)$ if

d [U; X] + d [V; Y] \leq C d [X; Y] .

Lemma 7.11. $(Y, X)$ is $2$ -relevant to $(X, Y)$ .

Proof. $d [Y; X] + d [X; Y] = 2 d [X; Y]$ . □

Lemma 7.12. Assuming that:

$U, V, X$ are independent $𝔽_{2}^{n}$ -valued random variables

Then

d [U + V; X] \leq \frac{1}{2} (d [U; X] + d [V; X] + d [U; V]) .

Proof.

\begin{aligned} d [U + V; X] & = H [U + V + X] - \frac{1}{2} H [U + V] - \frac{1}{2} H [X] \\ & = H [U + V + X] - H [U + V] + \frac{1}{2} H [U + V] - \frac{1}{2} H [X] \\ & \leq \frac{1}{2} H [U + X] - \frac{1}{2} H [U] + \frac{1}{2} H [V + X] - \frac{1}{2} H [V] + \frac{1}{2} H [U + V] - \frac{1}{2} H [X] \\ & = \frac{1}{2} (d [U; X] + d [V; X] + d [U; V]) \end{aligned}

□
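Lemma 7.12 can be spot-checked numerically. The sketch below draws random laws on $𝔽_{2}^{3}$ (integers 0 to 7 under XOR) and reports the smallest slack between the two sides; the sampling scheme, the seed, and the function names are assumptions for illustration and do not replace the proof.

```python
import math
import random
from collections import defaultdict

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def add(dx, dy):
    """Law of X + Y for independent X, Y on F_2^n (integers under XOR)."""
    out = defaultdict(float)
    for x, px in dx.items():
        for y, py in dy.items():
            out[x ^ y] += px * py
    return dict(out)

def distance(dx, dy):
    """d[X;Y] = H[X + Y] - H[X]/2 - H[Y]/2 in characteristic 2."""
    return entropy(add(dx, dy)) - 0.5 * entropy(dx) - 0.5 * entropy(dy)

def random_law(n, rng):
    """A random law on F_2^n obtained by normalising uniform weights."""
    weights = [rng.random() for _ in range(2 ** n)]
    total = sum(weights)
    return {x: w / total for x, w in enumerate(weights)}

def slack(du, dv, dx):
    """Right-hand side minus left-hand side of Lemma 7.12 (should be >= 0)."""
    lhs = distance(add(du, dv), dx)
    rhs = 0.5 * (distance(du, dx) + distance(dv, dx) + distance(du, dv))
    return rhs - lhs

rng = random.Random(0)  # fixed seed, arbitrary choice
slacks = [slack(random_law(3, rng), random_law(3, rng), random_law(3, rng))
          for _ in range(200)]
print(min(slacks) >= -1e-9)
```

When $U$, $V$ and $X$ are all uniform on the same subgroup, every distance involved is zero and the inequality is an equality.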

Corollary 7.13. Assuming that:

$(U, V)$ is $C$ -relevant to $(X, Y)$
$U_{1}, U_{2}, V_{1}, V_{2}$ are independent copies of $U, V$

Then $(U_{1} + U_{2}, V_{1} + V_{2})$ is $2 C$-relevant to $(X, Y)$ .

Proof.

\begin{aligned} d [U_{1} + U_{2}; X] + d [V_{1} + V_{2}; Y] & \leq \frac{1}{2} (2 d [U; X] + d [U; U] + 2 d [V; Y] + d [V; V]) && \text{(by Lemma 7.12)} \\ & \leq 2 (d [U; X] + d [V; Y]) && \text{(by the entropic Ruzsa triangle inequality)} \\ & \leq 2 C d [X; Y] \end{aligned}

□

Corollary 7.14. $(X_{1} + X_{2}, Y_{1} + Y_{2})$ is $4$ -relevant to $(Y, X)$ .

Proof. $(X, Y)$ is $2$ -relevant to $(Y, X)$ , so by Corollary 7.13 we’re done. □

Corollary. Assuming that:

$(U, V)$ is $C$ -relevant to $(X, Y)$

Then $(U + V, U + V)$ is $(3 C + 2)$-relevant to $(X, Y)$ .

Proof. By Lemma 7.12,

\begin{aligned} d [U + V; X] + d [U + V; Y] & \leq \frac{1}{2} (d [U; X] + d [V; X] + d [U; Y] + d [V; Y] + 2 d [U; V]) \\ & \leq \frac{1}{2} (2 d [U; X] + 4 d [U; V] + 2 d [V; Y]) \\ & \leq \frac{1}{2} (6 d [U; X] + 6 d [V; Y] + 4 d [X; Y]) \\ & \leq (3 C + 2) d [X; Y] \end{aligned}

□

Corollary 7.15. Assuming that:

$(U, V)$ is $C$ -relevant to $(X, Y)$

Then $(U + V, U + V)$ is $2 (C + 1)$-relevant to $(X, Y)$ .

Proof.

\begin{aligned} d [U + V; X] & \leq \frac{1}{2} (d [U; X] + d [V; X] + d [U; V]) \\ & \leq \frac{1}{2} (d [U; X] + d [V; Y] + d [X; Y] + d [U; X] + d [X; Y] + d [V; Y]) \\ & = d [U; X] + d [V; Y] + d [X; Y] \end{aligned}

Similarly for $d [U + V; Y]$ . Adding the two bounds and using $d [U; X] + d [V; Y] \leq C d [X; Y]$ gives $2 (C + 1) d [X; Y]$ . □

Lemma 7.16. Assuming that:

$U, V, X$ are independent $𝔽_{2}^{n}$ -valued random variables

Then

d [U | U + V; X] \leq \frac{1}{2} (d [U; X] + d [V; X] + d [U; V]) .

Proof.

\begin{aligned} d [U | U + V; X] & \leq H [U + X | U + V] - \frac{1}{2} H [U | U + V] - \frac{1}{2} H [X] \\ & \leq H [U + X] - \frac{1}{2} H [U] - \frac{1}{2} H [V] + \frac{1}{2} H [U + V] - \frac{1}{2} H [X] \end{aligned}

But $d [U | U + V; X] = d [V | U + V; X]$ , so it’s also

\leq H [V + X] - \frac{1}{2} H [U] - \frac{1}{2} H [V] + \frac{1}{2} H [U + V] - \frac{1}{2} H [X] .

Averaging the two inequalities gives the result (as earlier). □

Corollary 7.17. Assuming that:

$U, V$ are independent random variables
$(U, V)$ is $C$ -relevant to $(X, Y)$

Then

(i) $(U_{1} | U_{1} + U_{2}, V_{1} | V_{1} + V_{2})$ is $2 C$ -relevant to $(X, Y)$ .
(ii) $(U_{1} | U_{1} + V_{1}, U_{2} | U_{2} + V_{2})$ is $2 (C + 1)$ -relevant to $(X, Y)$ .

Proof. Use Lemma 7.16. Then as soon as it is used, we are in exactly the situation we were in when bounding the relevance of $(U_{1} + U_{2}, V_{1} + V_{2})$ and $(U_{1} + V_{1}, U_{2} + V_{2})$ . □

It remains to tackle the last two terms in Lemma 7.10. For the fifth term we need to bound

d [X_{1} + X_{2} | X_{2} + Y_{1}, X_{1} + Y_{2}; X] + d [X_{1} + Y_{1} | X_{2} + Y_{1}, X_{1} + Y_{2}; Y] .

But the first term of this is at most (by Lemma 7.12)

\frac{1}{2} (d [X_{1} | X_{2} + Y_{1}, X_{1} + Y_{2}; X] + d [X_{2} | X_{2} + Y_{1}, X_{1} + Y_{2}; X] + d [X_{1}; X_{2} ∥ X_{2} + Y_{1}, X_{1} + Y_{2}]) .

By the entropic Ruzsa triangle inequality and independence, this is at most

\begin{aligned} & d [X_{1} | X_{1} + Y_{2}; X] + d [X_{2} | X_{2} + Y_{1}; X] \\ & = 2 d [X | X + Y; X] \end{aligned}

Now we can use Lemma 7.16, and similarly for the other terms.

In this way, we get that the fifth and sixth terms have relevances bounded above by $λ C$ for an absolute constant $λ$ .
