7 A proof of Marton’s conjecture in 𝔽2n

We shall prove the following theorem.

Theorem 7.1 (Green, Manners, Tao, Gowers). There is a polynomial p with the following property:

If n and A𝔽2n is such that |A+A|C|A|, then there is a subspace H𝔽2n of size at most |A| such that A is contained in the union of at most p(C) translates of H. (Equivalently, there exists K𝔽2n, |K|p(C) such that AK+H).

This is known as “Polynomial Freiman–Ruzsa”.

In fact, we shall prove the following statement.

Theorem 7.2 (Entropic Polynomial Freiman–Ruzsa). There exists an absolute constant α satisfying the following: Let G=𝔽2n and let X,Y be G-valued random variables. Then there exists a subsgroup H of G such that

d[X;UH]+d[UH;Y]αd[X;Y]

where UH is the uniform distribution on H.

Lemma 7.3. Assuming that:

  • X a discrete random variable (and write px for (X=x))

Then there exists x such that px2H[X].

Proof. If not, then

H[X]=xpx log (1px)>H[X]xpx=H[X],

contradiction.

Proposition 7.4. Theorem 7.2 implies Theorem 7.1.

Proof. Let A𝔽2n, |A+A|C|A|. Let X and Y be independent copies of UA. Then by Theorem 7.2, there exists H (a subgroup) such that

d[X;UH]+d[UH;X]αd[X;Y]

so

d[X;UH]α2d[X;Y].

But

d[X;Y]=H[UAUA]H[UA]=H[UA+UA]H[UA](characteristic 2)log(C|A|) log |A|=logC

So d[X;UH]αlogC2. Therefore

H[X+UH]12H[X]+12H[UH]+αlogC2=12log|A|+12 log |H|+α log C2

Therefore, by Lemma 7.3, there exists z such that

(X+UH=z)|A|12|H|12Cα2.

But

(X+UH=z)=|A(zH)||A||H|=|A(z+H)||A||H|

(using characteristic 2). So there exists zG such that

|A(z+H)|Cα2|A|12|H|12.

Let B=A(z+H). By the Ruzsa covering lemma, we can cover A by at most |A+B||B| translates of B+B. But Bz+H so B+BH+H=H, so A can be covered by at most |A+B||B| translates of H.

But using BA,

|A+B||A+A|C|A|.

So

|A+B||B|C|A|Cα2|A|12|H|12=Cα2+1|A|12|H|12.

Since B is contained in z+H,

|H|Cα2|A|12|H|12

so |H|Cα|A|, so

Cα2+1|A|12|H|12Cα+1.

If |H||A| then we are done. Otherwise, since BA,

|A|Cα2|A|12|H|12

so |H|Cα|A|.

Pick a subgroup H of H of size between |A|2 and |A|. Then H is a union of at most 2Cα translates of H, so A is a union of at most 2C2α+1 translates of H.

Now we reduce further. We shall prove the following statement:

Theorem 7.5 (EPFR). There is a constant η>0 such that if X and Y are any two 𝔽2n-valued random variables with d[X;Y]>0, then there exists 𝔽2n-valued random variables U and V such that

d[U;V]+η(d[U;X]+d[V;Y])<d[X;Y].

Proposition 7.6. EPFR(η) EPFR(η1).

Proof. By compactness we can find U, V such that

τX,Y[U;V]=d[U;V]+η(d[U;X]+d[V;Y])

is minimised. If d[U;V]0 then by EPFR(η) there exist Z, W such that τU,V[Z;W]<d[U;V].

But then

τX,Y[Z;W]=d[Z;W]+η(d[Z;X]+d[W;Y])d[Z;W]+η(d[Z;U]+d[W;V])+η(d[U;X]+d[V;Y])(by The entropic Ruzsa triangle inequality)<d[U;V]+η(d[U;X]+d[V;Y])=τX,Y[U;V]

Contradiction.

It follows that d[U;V]=0. So there exists H such that U and V are uniform on cosets of H, so

η(d[UH;X]+d[UH;Y])<d[X;Y],

which gives us EPFR(η1).

Definition. Write τX,Y[U|Z;V|W] for

Z,W[Z=z][W=w]τX,Y[U|Z=z;V|W=w]

Definition. Write τX,Y[U;VZ] for

z[Z=z]τX,Y[U|z=z;V|Z=z]

Remark. If we can prove EPFR for conditional random variables, then by averaging we get it for some pair of random variables (e.g. of the form U|Z=z and V|W=w).

Lemma 7.7 (Fibring lemma). Assuming that:

  • G and H are abelian groups

  • ϕ:GH a homomorphism

  • let X, Y be G-valued random variables.

Then
d[X;Y]=d[ϕ(X);ϕ(Y)]+d[X|ϕ(X);Y|ϕ(Y)]+I[XY:ϕ(X),ϕ(Y)|ϕ(X)ϕ(Y)].

Proof.

d[X;Y]=H[XY]12H[X]12H[Y]=H[ϕ(X)ϕ(Y)]+H[XY|ϕ(X)ϕ(Y)]12H[ϕ(X)]  12H[X|ϕ(X)]12H[ϕ(Y)]12H[Y|ϕ(Y)]=d[ϕ(X);ϕ(Y)]+d[X|ϕ(X);Y|ϕ(Y)]+H[XY|ϕ(X)ϕ(Y)]  H[XY|ϕ(X),ϕ(Y)]

But the last line of this expression equals

H[XY|ϕ(X)ϕ(Y)]H[XY|ϕ(X),ϕ(Y),ϕ(X)ϕ(Y)]=I[XY:ϕ(X),ϕ(Y)|ϕ(X)ϕ(Y)].

We shall be interested in the following special case.

Corollary 7.8. Assuming that:

  • G=𝔽2n and X1,X2,X3,X4 are independent G-valued random variables

Then d[(X1,X2);(X3,X4)]=d[X1;X3]+d[X2;X4]=d[X1+X2;X3+X4]+d[X1|X1+X2;X3|X3+X4]  +I[X1+X3,X2+X4:X1+X2,X3+X4|X1+X2+X3+X4]()

Proof. Apply Lemma 7.7 with X=(X1,X2), Y=(X3,X4) and ϕ(x,y)=x+y.

We shall now set W=X1+X2+X3+X4.

Recall that Lemma 6.11 says

d[X;Y  X+Y]3I[X:Y]+2H[X+Y]H[X]H[Y].

Equivalently,

I[X:Y]13(d[X;Y  X+Y]+H[X]+H[Y]2H[X+Y]).

Applying this to the information term (), we get that it is at least

13(d[X1+X3,X2+X4;X1+X2,X3+X4  X2+X3,W]+H[X1+X3,X2+X4|W]  +H[X1+X2,X3+X4|W]2H[X2+X3,X2+X3|W])

which simplifies to

13(d[X1+X3,X2+X4;X1+X2,X3+X4  X2+X3,W]+H[X1+X3|W]  +H[X1+X2|W]2H[X2+X3|W])

So Corollary 7.8 now gives us:

d[X1;X3]+d[X2;X4]d[X1+X2;X3+X4]+d[X1|X1+X2;X3|X4]  13(d[X1+X2;X1+X3  X2+X3,W]  +H[X1+X2|W]+H[X1+X3|W]H[X2+X3|W])

Now apply this to (X1,X2,X3,X4), (X1,X2,X4,X3) and (X1,X4,X3,X2) and add.

We look first at the entropy terms. We get

2H[X1+X2|W]+H[X1+X4|W]+H[X1+X3|W]+H[X1+X4|W]+H[X1+X2|W]  2H[X1+X2|W]2H[X2+X4|W]2H[X1+X2|W]=0

where we made heavy use of the observation that if i,j,k,l are some permutation of 1,2,3,4, then

H[Xi+Xj|W]=H[Xk+Xl|W].

This also allowed use e.g. to replace

d[X1+X2,X3+X4;X1+X3,X2+X4  X2+X3,W]

by

d[X1+X2;X1+X3  X2+X3,W].

Therefore, we get the following inequality:

Lemma 7.9.

2d[X1;X2]+2d[X3;X4]+d[X1;X4]+d[X2;X3]2d[X1+X2;X3+X4]+d[X1+X4;X2+X3]  +2d[X1|X1+X2;X3|X3+X4]+d[X1|X1+X4;X2|X2+X3]  +13(d[X1+X2;X1+X3  X2+X3,W]+d[X1+X2;X1+X4  X2+X4,W]  +d[X1+X4;X1+X3  X3+X4,W])

Proof. Above.

Now let X1,X2 be copies of X and Y1,Y2 copies of Y and apply Lemma 7.9 to (X1,X2,Y1,Y2) (all independent), to get this.

Lemma 7.10. Assuming that:

  • X1,X2,Y1,Y2 satisfy: X1 and X2 are copies of X, Y1 and Y2 are copies of Y, and all of them are independent

Then 6d[X;Y]2d[X1+X2;Y1+Y2]+d[X1+Y2;X2+Y1]  +2d[X1|X1+X2;Y1|Y1+Y2]+d[X1|X1+Y1;X2|X2+Y2]  +23d[X1+X2;X1+Y1  X2+Y1,X1+Y2]  +13d[X1+Y1;X1+Y2  X1+X2,Y1+Y2]

OR? TODO: figure out which is correct

6d[X;Y]2d[X1+X2;Y1+Y2]+d[X1+Y1;X2+Y2]  +2d[X1|X1+X2;Y1|Y1+Y2]+d[X1|X1+Y1;X2|X2+Y2]  +23d[X1+X2|X1+Y1;X2+Y1|X1+Y2]  +13d[X1+Y1|X1+Y2;X1+X1|Y1+Y2]

Proof. Use above.

Recall that we want (U,V) such that

τX,Y[U;V]=d[U;V]+η(d[U;X]+d[V;Y])<d[X;Y]

Lemma 7.10 gives us a collection of distances (some conditioned), at least one of which is at most 67d[X;Y]. So it will be enough to show that for all of them we get

d[U;X]+d[V;Y]Cd[X;Y],

for some absolute constant C. Then we can take η<17C.

Definition (C-relevant). Say that (U,V) is C-relevant to (X,Y) if

d[U;X]+d[V;Y]Cd[X;Y].

Lemma 7.11. (Y,X) is 2-relevant to (X,Y).

Proof. d[Y;X]+d[X;Y]=2d[X;Y].

Lemma 7.12. Assuming that:

  • U,V,X be independent 𝔽2n-valued random variables

Then
d[U+V;X]12(d[U;X]+d[V;X]+d[U;V]).

Proof.

d[U+V;X]=H[U+V+X]12H[U+V]12H[X]=H[U+V+X]H[U+V]+12H[U+V]12H[X]12H[U+X]12H[U]+12H[V+X]12H[V]+12H[U+V]12H[X]=12(d[U;X]+d[V;X]+d[U;V])

Corollary 7.13. Assuming that:

  • (U,V) is C-relevant to (X,Y)

  • U1,U2,V1,V2 are independent copies of U,V

Then (U1+U2,V1+V2) is 2C-relevant to (X,Y).

Proof.

d[U1+U2;X]+d[V1+V2;Y]12(2d[U;V]+d[U;U]+2d[V;Y]+d[V;V])(by Lemma 7.12)2(d[U;X]+d[V;Y])(by The entropic Ruzsa triangle inequality)2Cd[X;Y]

Corollary 7.14. (X1+X2,Y1+Y2) is 4-relevant to (Y,X).

Proof. (X,Y) is 2-relevant to (Y,X), so by Corollary 7.13 we’re done.

Corollary. Assuming that:

Then (U+V,U+V) is (3C+2)-relevant to (X,Y).

Proof. By Lemma 7.12,

d[U+V;X]+d[U+V;Y]12(d[U;X]+d[V;X]+d[U;Y]+d[V;Y]+2d[U;V])12(2d[U;X]+4d[U;V]+2d[V;Y])12(6d[U;X]+6d[V;Y]+4d[X;Y])

Corollary 7.15. Assuming that:

Then (U+V,U+V) is 2(C+1)-relevant to (X,Y).

Proof.

d[U+V;X]12(d[U;X]+d[V;X]+d[U;V])12(d[U;X]+d[V;Y]+d[X;Y]+d[U;X]+d[X;Y]+d[V;Y])=d[U;X]+d[V;Y]+d[X;Y]

Similarly for d[U+V;Y].

Lemma 7.16. Assuming that:

  • U,V,X are independent 𝔽2n-valued random variables

Then
d[U|U+V;X]12(d[U;X]+d[V;X]+d[U;V]).

Proof.

d[U|U+V;X]H[U+X|U+V]12H[U|U+V]12H[X]H[U+X]12H[U]12H[V]+12H[U+V]12H[X]

But d[U|U+V;X]=d[V|U+V;X], so it’s also

H[V+X]12H[U]12H[V]+12H[U+V]12H[X].

Averaging the two inequalities gives the result (as earlier).

Corollary 7.17. Assuming that:

  • U,V are independent random variables

  • (U,V) is C-relevant to (X,Y)

Then

Proof. Use Lemma 7.16. Then as soon as it is used, we are in exactly the situation we were in when bounding the relevance of (U1+U2,V1+V2) and (U1+V1,U2+V2).

It remains to tackle the last two terms in Lemma 7.10. For the fifth term we need to bound

d[X1+X2|X2+Y1,X1+Y2;X]+d[X1+Y1|X2+Y1,X1+Y2;Y].

But first term of this is at most (by Lemma 7.12)

12(d[X1;X2+Y1,X1+Y2]X+d[X2|X1+Y1,X1+Y2;X]+d[X1;X2  X2+Y1,X1+Y2]).

By The entropic Ruzsa triangle inequality and independence, this is at most

d[X1|X1+Y2;X]+d[X2|X2+Y1;X]=2d[X|X+Y;X]

Now we can use Lemma 7.16, and similarly for the other terms.

In this way, we get that the fifth and sixth terms have relevances bounded above by λC for an absolute constant λ.