6 Entropy in additive combinatorics

We shall need two “simple” results from additive combinatorics due to Imre Ruzsa.

Definition (Sum set / difference set / etc). Let $G$ be an abelian group and let $A, B \subseteq G$.

The sumset $A + B$ is the set $\{x + y : x \in A, y \in B\}$.

The difference set $A - B$ is the set $\{x - y : x \in A, y \in B\}$.

We write $2A$ for $A + A$, $3A$ for $A + A + A$, etc.
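As a quick illustration, here is a minimal Python sketch of these two operations, taking $G = \mathbb{Z}$ (the sets chosen are arbitrary examples):

```python
# Sumset and difference set over G = Z, using plain Python sets.
A = {1, 2, 4}
B = {0, 3}

sumset = {a + b for a in A for b in B}    # A + B
diffset = {a - b for a in A for b in B}   # A - B

print(sorted(sumset))    # [1, 2, 4, 5, 7]
print(sorted(diffset))   # [-2, -1, 1, 2, 4]
```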

Definition (Ruzsa distance). The Ruzsa distance $d(A, B)$ is
$$\frac{|A - B|}{|A|^{1/2} |B|^{1/2}}.$$

Lemma 6.1 (Ruzsa triangle inequality). $d(A, C) \le d(A, B) \, d(B, C)$.

Proof. This is equivalent to the statement
$$|A - C| \, |B| \le |A - B| \, |B - C|.$$

For each $x \in A - C$, pick $a(x) \in A$ and $c(x) \in C$ such that $a(x) - c(x) = x$. Define a map
$$\phi : (A - C) \times B \to (A - B) \times (B - C), \qquad (x, b) \mapsto (a(x) - b, \; b - c(x)).$$

Adding the coordinates of $\phi(x, b)$ gives $x$, so we can calculate $a(x)$ (and $c(x)$) from $\phi(x, b)$, and hence can calculate $b$. So $\phi$ is an injection.
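The inequality is easy to test numerically. Below is a hedged sketch (the helper names are mine, not from the notes) checking $d(A, C) \le d(A, B) \, d(B, C)$ on random subsets of $\mathbb{Z}$:

```python
# Numerical sanity check of the Ruzsa triangle inequality on random sets.
import random

def diff_set(A, B):
    return {a - b for a in A for b in B}

def ruzsa_dist(A, B):
    # d(A,B) = |A - B| / (|A|^(1/2) |B|^(1/2))
    return len(diff_set(A, B)) / (len(A) ** 0.5 * len(B) ** 0.5)

random.seed(0)
for _ in range(1000):
    A, B, C = ({random.randrange(30) for _ in range(8)} for _ in range(3))
    assert ruzsa_dist(A, C) <= ruzsa_dist(A, B) * ruzsa_dist(B, C) + 1e-9
```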

Lemma 6.2 (Ruzsa covering lemma). Assuming that:

  • $G$ an abelian group

  • $A, B$ finite subsets of $G$

Then $A$ can be covered by at most $\frac{|A + B|}{|B|}$ translates of $B - B$.

Proof. Let $\{x_1, \ldots, x_k\}$ be a maximal subset of $A$ such that the sets $x_i + B$ are disjoint.

Then if $a \in A$, there exists $i$ such that $(a + B) \cap (x_i + B) \neq \emptyset$. Then $a \in x_i + B - B$.

So $A$ can be covered by $k$ translates of $B - B$. But since the sets $x_i + B$ are disjoint and $\{x_1, \ldots, x_k\} + B \subseteq A + B$,
$$k \, |B| = |\{x_1, \ldots, x_k\} + B| \le |A + B|.$$
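The proof is effectively a greedy algorithm, so it can be run directly. The following illustrative sketch (function names are mine) checks that the chosen translates of $B - B$ cover $A$ and that $k \le |A + B| / |B|$:

```python
# Greedy construction from the proof of the Ruzsa covering lemma.
import random

def covering_centers(A, B):
    """Greedily pick x_i in A so that the sets x_i + B are pairwise disjoint."""
    centers, used = [], set()
    for a in sorted(A):
        translate = {a + b for b in B}
        if translate.isdisjoint(used):
            centers.append(a)
            used |= translate
    return centers

random.seed(1)
A = {random.randrange(40) for _ in range(12)}
B = {random.randrange(10) for _ in range(5)}

xs = covering_centers(A, B)
BB = {b1 - b2 for b1 in B for b2 in B}
assert all(any(a - x in BB for x in xs) for a in A)            # A covered
assert len(xs) * len(B) <= len({a + b for a in A for b in B})  # k|B| <= |A+B|
```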

Let $X$, $Y$ be discrete random variables taking values in an abelian group. What is the distribution of $X + Y$ when $X$ and $Y$ are independent?

For each $z$, $\mathbb{P}(X + Y = z) = \sum_{x + y = z} \mathbb{P}(X = x) \, \mathbb{P}(Y = y)$. Writing $p_x$ and $q_y$ for $\mathbb{P}(X = x)$ and $\mathbb{P}(Y = y)$ respectively, this gives
$$\mathbb{P}(X + Y = z) = \sum_{x + y = z} p_x q_y = p * q(z),$$
where $p(x) = p_x$ and $q(y) = q_y$.

So sums of independent random variables correspond to convolutions.
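Concretely, the law of $X + Y$ can be computed as a convolution. Here is a small sketch on the group $\mathbb{Z}_6$ (the distributions are arbitrary examples):

```python
# The law of X + Y for independent X, Y is the convolution p * q.
n = 6
p = {0: 0.5, 1: 0.3, 4: 0.2}   # law of X on Z_6
q = {0: 0.6, 2: 0.1, 5: 0.3}   # law of Y on Z_6

conv = {}
for x, px in p.items():
    for y, qy in q.items():
        z = (x + y) % n
        conv[z] = conv.get(z, 0.0) + px * qy   # P(X+Y=z) = sum_{x+y=z} p_x q_y

assert abs(sum(conv.values()) - 1) < 1e-12
print(conv)
```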

Definition (Entropic Ruzsa distance). Let $G$ be an abelian group and let $X$, $Y$ be $G$-valued random variables. The entropic Ruzsa distance $d[X; Y]$ is
$$H[X' - Y'] - \tfrac{1}{2} H[X] - \tfrac{1}{2} H[Y]$$
where $X'$, $Y'$ are independent copies of $X$ and $Y$.
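In code, the definition reads as follows. This is a minimal sketch for distributions on $\mathbb{Z}_n$; the function names are illustrative, and any logarithm base works (natural log here):

```python
# Entropic Ruzsa distance d[X;Y] = H[X'-Y'] - H[X]/2 - H[Y]/2.
from math import log

def H(p):
    """Shannon entropy of a distribution {value: probability}."""
    return -sum(v * log(v) for v in p.values() if v > 0)

def entropic_dist(p, q, n):
    """d[X;Y] for X ~ p, Y ~ q on Z_n, using independent copies."""
    diff = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % n
            diff[z] = diff.get(z, 0.0) + px * qy   # law of X' - Y'
    return H(diff) - H(p) / 2 - H(q) / 2

# Example: X uniform on {0,1}, Y constant; d[X;Y] = (log 2)/2.
print(entropic_dist({0: 0.5, 1: 0.5}, {0: 1.0}, 4))
```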

Lemma 6.3. Assuming that:

  • $A$, $B$ are finite subsets of $G$

  • $X$, $Y$ are uniformly distributed on $A$, $B$ respectively

Then
$$d[X; Y] \le \log d(A, B).$$

Proof. Without loss of generality $X$, $Y$ are independent. Since $X - Y$ takes values in $A - B$, we have $H[X - Y] \le \log |A - B|$, so
$$d[X; Y] = H[X - Y] - \tfrac{1}{2} H[X] - \tfrac{1}{2} H[Y] \le \log |A - B| - \tfrac{1}{2} \log |A| - \tfrac{1}{2} \log |B| = \log d(A, B).$$
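A quick numerical check of the lemma, a sketch under the same conventions as above (random subsets of $\mathbb{Z}$, natural logarithms):

```python
# Check d[X;Y] <= log d(A,B) for X, Y uniform on random A, B.
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

random.seed(2)
for _ in range(200):
    A = {random.randrange(50) for _ in range(8)}
    B = {random.randrange(50) for _ in range(6)}
    diff = {}
    for a in A:                  # law of X - Y for independent uniform X, Y
        for b in B:
            diff[a - b] = diff.get(a - b, 0.0) + 1 / (len(A) * len(B))
    d_entropic = H(diff) - log(len(A)) / 2 - log(len(B)) / 2
    d_ruzsa = len({a - b for a in A for b in B}) / (len(A) * len(B)) ** 0.5
    assert d_entropic <= log(d_ruzsa) + 1e-9
```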

Lemma 6.4. Assuming that:

  • $X$, $Y$ are $G$-valued random variables

Then
$$H[X + Y] \ge \max\{H[X], H[Y]\} - I[X:Y].$$

Proof.
$$\begin{aligned}
H[X + Y] &\ge H[X + Y \mid Y] && \text{(by Subadditivity)} \\
&= H[X + Y, Y] - H[Y] \\
&= H[X, Y] - H[Y] \\
&= H[X] + H[Y] - I[X:Y] - H[Y] \\
&= H[X] - I[X:Y]
\end{aligned}$$

By symmetry we also have
$$H[X + Y] \ge H[Y] - I[X:Y].$$
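Since $X$ and $Y$ need not be independent here, a numerical check has to start from a joint law. The following sketch does this for an arbitrary dependent joint distribution on $\mathbb{Z}_4 \times \mathbb{Z}_4$ (the numbers are illustrative):

```python
# Check H[X+Y] >= max(H[X], H[Y]) - I[X:Y] for a dependent pair (X, Y).
from math import log

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

n = 4
joint = {(0, 0): 0.4, (1, 1): 0.3, (1, 2): 0.2, (3, 2): 0.1}  # law of (X, Y)

pX, pY, pSum = {}, {}, {}
for (x, y), w in joint.items():
    pX[x] = pX.get(x, 0.0) + w
    pY[y] = pY.get(y, 0.0) + w
    pSum[(x + y) % n] = pSum.get((x + y) % n, 0.0) + w

I = H(pX) + H(pY) - H(joint)                    # mutual information I[X:Y]
assert H(pSum) >= max(H(pX), H(pY)) - I - 1e-12
```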

Corollary. Assuming that:

  • $X$, $Y$ are $G$-valued random variables

Then:
$$H[X - Y] \ge \max\{H[X], H[Y]\} - I[X:Y].$$

Corollary 6.5. Assuming that:

  • $X$, $Y$ are $G$-valued random variables

Then
$$d[X; Y] \ge 0.$$

Proof. Without loss of generality $X$, $Y$ are independent. Then $I[X:Y] = 0$, so
$$H[X - Y] \ge \max\{H[X], H[Y]\} \ge \tfrac{1}{2}(H[X] + H[Y]).$$

Lemma 6.6. Assuming that:

  • X, Y are G-valued random variables

Then d[X;Y]=0 if and only if there is some (finite) subgroup H of G such that X and Y are uniform on cosets of H.

Proof.

Recall Lemma 1.16: if $Z = f(X) = g(Y)$, then:
$$H[X, Y] + H[Z] \le H[X] + H[Y].$$

Lemma 6.7 (The entropic Ruzsa triangle inequality). Assuming that:

  • $X$, $Y$, $Z$ are $G$-valued random variables

Then
$$d[X; Z] \le d[X; Y] + d[Y; Z].$$

Proof. We must show (assuming without loss of generality that $X$, $Y$ and $Z$ are independent) that
$$H[X - Z] - \tfrac{1}{2} H[X] - \tfrac{1}{2} H[Z] \le H[X - Y] - \tfrac{1}{2} H[X] - \tfrac{1}{2} H[Y] + H[Y - Z] - \tfrac{1}{2} H[Y] - \tfrac{1}{2} H[Z],$$

i.e. that
$$H[X - Z] + H[Y] \le H[X - Y] + H[Y - Z]. \tag{$*$}$$

Since $X - Z$ is a function of $(X - Y, Y - Z)$ and is also a function of $(X, Z)$, we get using Lemma 1.16 that
$$H[X - Y, Y - Z, X, Z] + H[X - Z] \le H[X - Y, Y - Z] + H[X, Z].$$

This is the same as
$$H[X, Y, Z] + H[X - Z] \le H[X, Z] + H[X - Y, Y - Z].$$

By independence, $H[X, Y, Z] = H[X] + H[Y] + H[Z]$ and $H[X, Z] = H[X] + H[Z]$; cancelling the common terms and applying Subadditivity to $H[X - Y, Y - Z]$ gives $(*)$.
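As with the earlier lemmas, $(*)$ can be sanity-checked numerically. A sketch for independent random variables on $\mathbb{Z}_5$ with random laws (helper names are mine):

```python
# Numerical check of (*): H[X-Z] + H[Y] <= H[X-Y] + H[Y-Z].
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    t = sum(w)
    return {i: wi / t for i, wi in enumerate(w)}

def law_of_difference(p, q, n):
    # law of U - V for independent U ~ p, V ~ q on Z_n
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % n
            out[z] = out.get(z, 0.0) + px * qy
    return out

random.seed(3)
n = 5
for _ in range(200):
    p, q, r = rand_dist(n), rand_dist(n), rand_dist(n)
    lhs = H(law_of_difference(p, q, n)) + H(law_of_difference(q, r, n))
    rhs = H(law_of_difference(p, r, n)) + H(q)
    assert lhs >= rhs - 1e-9
```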

Lemma 6.8 (Submodularity for sums). Assuming that:

  • $X$, $Y$, $Z$ are independent $G$-valued random variables

Then
$$H[X + Y + Z] + H[Z] \le H[X + Z] + H[Y + Z].$$

Proof. $X + Y + Z$ is a function of $(X + Z, Y)$ and also a function of $(X, Y + Z)$. Therefore (using Lemma 1.16),
$$H[X + Z, Y, X, Y + Z] + H[X + Y + Z] \le H[X + Z, Y] + H[X, Y + Z].$$

Hence
$$H[X, Y, Z] + H[X + Y + Z] \le H[X + Z] + H[Y] + H[X] + H[Y + Z].$$

By independence and cancellation, we get the desired inequality.
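The same style of check works here; a sketch for random laws on $\mathbb{Z}_5$:

```python
# Numerical check of Lemma 6.8: H[X+Y+Z] + H[Z] <= H[X+Z] + H[Y+Z].
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    t = sum(w)
    return {i: wi / t for i, wi in enumerate(w)}

def add(p, q, n):
    # law of U + V for independent U ~ p, V ~ q on Z_n
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[(x + y) % n] = out.get((x + y) % n, 0.0) + px * qy
    return out

random.seed(4)
n = 5
for _ in range(200):
    p, q, r = rand_dist(n), rand_dist(n), rand_dist(n)
    lhs = H(add(add(p, q, n), r, n)) + H(r)
    assert lhs <= H(add(p, r, n)) + H(add(q, r, n)) + 1e-9
```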

Lemma 6.9. Assuming that:

  • $G$ an abelian group

  • $X$ a $G$-valued random variable

Then
$$d[X; -X] \le 2 \, d[X; X].$$

Proof. Let $X_1$, $X_2$, $X_3$ be independent copies of $X$. Then
$$\begin{aligned}
d[X; -X] &= H[X_1 + X_2] - \tfrac{1}{2} H[X_1] - \tfrac{1}{2} H[X_2] \\
&\le H[X_1 + X_2 - X_3] - H[X] && \text{(by the Corollary to Lemma 6.4)} \\
&\le H[X_1 - X_3] + H[X_2 - X_3] - H[X_3] - H[X] && \text{(by Lemma 6.8)} \\
&= 2 \, d[X; X]
\end{aligned}$$

(as $X_1, X_2, X_3$ are all copies of $X$).
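A numerical check of the lemma, with both distances computed directly from their definitions (note $H[-X] = H[X]$, so $d[X; -X] = H[X_1 + X_2] - H[X]$):

```python
# Check d[X;-X] <= 2 d[X;X] for random distributions on Z_n.
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def rand_dist(n):
    w = [random.random() for _ in range(n)]
    t = sum(w)
    return {i: wi / t for i, wi in enumerate(w)}

def combine(p, q, n, sign):
    # law of U + sign*V for independent U ~ p, V ~ q on Z_n
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x + sign * y) % n
            out[z] = out.get(z, 0.0) + px * qy
    return out

random.seed(5)
n = 6
for _ in range(200):
    p = rand_dist(n)
    d_neg = H(combine(p, p, n, +1)) - H(p)   # d[X;-X] = H[X1 + X2] - H[X]
    d_pos = H(combine(p, p, n, -1)) - H(p)   # d[X;X]  = H[X1 - X2] - H[X]
    assert d_neg <= 2 * d_pos + 1e-9
```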

Corollary 6.10. Assuming that:

  • $X$ and $Y$ are $G$-valued random variables

Then
$$d[X; -Y] \le 5 \, d[X; Y].$$

Proof.
$$\begin{aligned}
d[X; -Y] &\le d[X; Y] + d[Y; -Y] \\
&\le d[X; Y] + 2 \, d[Y; Y] \\
&\le d[X; Y] + 2 (d[Y; X] + d[X; Y]) \\
&= 5 \, d[X; Y]
\end{aligned}$$
Conditional Distances

Definition (Conditional distance). Let $X, Y, U, V$ be $G$-valued random variables (in fact, $U$ and $V$ don't have to be $G$-valued for the definition to make sense). Then the conditional distance is
$$d[X \mid U; Y \mid V] = \sum_{u, v} \mathbb{P}[U = u] \, \mathbb{P}[V = v] \; d[X \mid U = u; \, Y \mid V = v].$$

The next definition is not completely standard.

Definition (Simultaneous conditional distance). Let $X, Y, U$ be $G$-valued random variables. The simultaneous conditional distance of $X$ to $Y$ given $U$ is
$$d[X; Y \,\|\, U] = \sum_u \mathbb{P}[U = u] \; d[X \mid U = u; \, Y \mid U = u].$$

We say that $X'$, $Y'$ are conditionally independent trials of $X$, $Y$ given $U$ if:

  • $X'$ is distributed like $X$.

  • $Y'$ is distributed like $Y$.

  • For each value $u$ of $U$, $X' \mid U = u$ is distributed like $X \mid U = u$.

  • For each value $u$ of $U$, $Y' \mid U = u$ is distributed like $Y \mid U = u$.

  • $X' \mid U = u$ and $Y' \mid U = u$ are independent.

Then
$$d[X; Y \,\|\, U] = H[X' - Y' \mid U] - \tfrac{1}{2} H[X \mid U] - \tfrac{1}{2} H[Y \mid U]$$

(as can be seen directly from the formula).
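The defining formula translates directly into code. A sketch computing $d[X; Y \,\|\, U]$ from a joint law of $(X, Y, U)$ on $\mathbb{Z}_3$ (the joint law is an arbitrary example, and the helper names are mine):

```python
# Simultaneous conditional distance d[X;Y || U] from its defining formula.
from math import log

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def d_ent(p, q, n):
    # d[X;Y] for independent X ~ p, Y ~ q on Z_n
    diff = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % n
            diff[z] = diff.get(z, 0.0) + px * qy
    return H(diff) - H(p) / 2 - H(q) / 2

n = 3
# joint law of (X, Y, U); the numbers are an arbitrary example
joint = {(0, 0, 0): 0.2, (1, 2, 0): 0.2, (0, 1, 1): 0.3, (2, 2, 1): 0.3}

dist = 0.0
for u in {k[2] for k in joint}:
    pu = sum(w for k, w in joint.items() if k[2] == u)
    pX, pY = {}, {}
    for (x, y, uu), w in joint.items():
        if uu == u:
            pX[x] = pX.get(x, 0.0) + w / pu
            pY[y] = pY.get(y, 0.0) + w / pu
    dist += pu * d_ent(pX, pY, n)

print(dist)   # d[X;Y || U]
```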

Lemma 6.11 (The entropic BSG theorem). Assuming that:

  • $A$ and $B$ are $G$-valued random variables

Then
$$d[A; B \,\|\, A + B] \le 3 I[A:B] + 2 H[A + B] - H[A] - H[B].$$

Remark. The last few terms look like $2 \, d[A; B]$. But they aren't equal to it, because $A$ and $B$ aren't (necessarily) independent!

Proof.
$$d[A; B \,\|\, A + B] = H[A' - B' \mid A + B] - \tfrac{1}{2} H[A' \mid A + B] - \tfrac{1}{2} H[B' \mid A + B]$$

where $A'$, $B'$ are conditionally independent trials of $A$, $B$ given $A + B$. Now calculate
$$\begin{aligned}
H[A' \mid A + B] &= H[A \mid A + B] \\
&= H[A, A + B] - H[A + B] \\
&= H[A, B] - H[A + B] \\
&= H[A] + H[B] - I[A:B] - H[A + B]
\end{aligned}$$

Similarly, $H[B' \mid A + B]$ is the same, so $\tfrac{1}{2} H[A' \mid A + B] + \tfrac{1}{2} H[B' \mid A + B]$ is also the same.

Next,
$$H[A' - B' \mid A + B] \le H[A' - B'].$$

Let $(A_1, B_1)$ and $(A_2, B_2)$ be conditionally independent trials of $(A, B)$ given $A + B$. Then $H[A' - B'] = H[A_1 - B_2]$. By Submodularity,
$$H[A_1 - B_2] \le H[A_1 - B_2, A_1] + H[A_1 - B_2, B_1] - H[A_1 - B_2, A_1, B_1].$$

Now
$$H[A_1 - B_2, A_1] = H[A_1, B_2] \le H[A_1] + H[B_2] = H[A] + H[B]$$
and, since $A_1 - B_2 = A_2 - B_1$ (because $A_1 + B_1 = A_2 + B_2$),
$$H[A_1 - B_2, B_1] = H[A_2 - B_1, B_1] = H[A_2, B_1] \le H[A] + H[B].$$

Finally,
$$\begin{aligned}
H[A_1 - B_2, A_1, B_1] &= H[A_1, B_1, A_2, B_2] \\
&= H[A_1, B_1, A_2, B_2 \mid A + B] + H[A + B] \\
&= 2 H[A, B \mid A + B] + H[A + B] && \text{(by conditional independence of $(A_1, B_1)$ and $(A_2, B_2)$)} \\
&= 2 H[A, B] - H[A + B] \\
&= 2 H[A] + 2 H[B] - 2 I[A:B] - H[A + B]
\end{aligned}$$

Combining these three estimates gives
$$H[A_1 - B_2] \le 2(H[A] + H[B]) - (2 H[A] + 2 H[B] - 2 I[A:B] - H[A + B]) = 2 I[A:B] + H[A + B],$$
and hence
$$d[A; B \,\|\, A + B] \le 2 I[A:B] + H[A + B] - (H[A] + H[B] - I[A:B] - H[A + B]) = 3 I[A:B] + 2 H[A + B] - H[A] - H[B],$$
as required.
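To close, a hedged numerical check of the entropic BSG bound, computing both sides from a random (dependent) joint law of $(A, B)$ on $\mathbb{Z}_4 \times \mathbb{Z}_4$; the helper names are illustrative:

```python
# Numerical check of Lemma 6.11 (the entropic BSG theorem).
from math import log
import random

def H(p):
    return -sum(v * log(v) for v in p.values() if v > 0)

def d_ent(p, q, n):
    # distance between independent trials of the laws p and q on Z_n
    diff = {}
    for x, px in p.items():
        for y, qy in q.items():
            z = (x - y) % n
            diff[z] = diff.get(z, 0.0) + px * qy
    return H(diff) - H(p) / 2 - H(q) / 2

random.seed(6)
n = 4
for _ in range(100):
    w = {(a, b): random.random() for a in range(n) for b in range(n)}
    t = sum(w.values())
    joint = {k: v / t for k, v in w.items()}   # joint law of (A, B)

    pA, pB, pS = {}, {}, {}
    for (a, b), v in joint.items():
        pA[a] = pA.get(a, 0.0) + v
        pB[b] = pB.get(b, 0.0) + v
        s = (a + b) % n
        pS[s] = pS.get(s, 0.0) + v

    # d[A;B || A+B]: average over s of the distance between conditional laws
    lhs = 0.0
    for s, ps in pS.items():
        pAs = {a: v / ps for (a, b), v in joint.items() if (a + b) % n == s}
        pBs = {b: v / ps for (a, b), v in joint.items() if (a + b) % n == s}
        lhs += ps * d_ent(pAs, pBs, n)

    I = H(pA) + H(pB) - H(joint)               # mutual information I[A:B]
    rhs = 3 * I + 2 * H(pS) - H(pA) - H(pB)
    assert lhs <= rhs + 1e-9
```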