6 Analysis on the p-biased cube

Let $p \in [0,1]$. We define a measure $\mu_p$ on $\{-1,1\}^n$ by picking each $x_i$ independently to be $1$ with probability $q$ and $-1$ with probability $p$, where $q = 1 - p$. This convention looks counterintuitive at first (placing $p$ on the probability of the smaller value), but recall that under the correspondence $\{-1,1\}^n \leftrightarrow \{0,1\}^n$, we have $1 \leftrightarrow 0$ and $-1 \leftrightarrow 1$. So this convention means that we "take the identity most of the time, and take the non-identity with probability $p$".

Consider the random variable $x_i$ (where $x \sim \mu_p$). Then

$$\mu = \mathbb{E}\,x_i = q - p = 2q - 1 = 1 - 2p$$

and

$$\sigma^2 = \mathbb{E}(x_i - \mu)^2 = q(1-\mu)^2 + p(-1-\mu)^2 = q(2p)^2 + p(2q)^2 = 4pq,$$

so $\sigma = 2\sqrt{pq}$.
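These two identities are easy to sanity-check numerically; the following Python sketch (my own addition, not part of the notes) verifies them for an arbitrary choice of $p$:

```python
# Sketch: verify mu = q - p = 1 - 2p and sigma^2 = 4pq for one p-biased
# coordinate x_i, which is 1 with probability q and -1 with probability p.
p = 0.3
q = 1 - p

mu = q * 1 + p * (-1)                          # E[x_i]
var = q * (1 - mu) ** 2 + p * (-1 - mu) ** 2   # E[(x_i - mu)^2]

assert abs(mu - (1 - 2 * p)) < 1e-12
assert abs(var - 4 * p * q) < 1e-12
```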

Let $\phi_i = \frac{x_i - \mu}{\sigma}$. We write $\phi(t)$ for $\frac{t - \mu}{\sigma}$, so $\phi_i = \phi(x_i)$.

Let $\phi_A = \prod_{i \in A} \phi_i$. The $\phi_A$ will play the role of the $x^A$ in the unbiased case.

Fix $p$. Then given $f, g : \{-1,1\}^n \to \mathbb{R}$, we define

$$\langle f, g \rangle = \mathbb{E}_{x \sim \mu_p} f(x) g(x)$$

and

$$\|f\|_r = \left(\mathbb{E}_{x \sim \mu_p} |f(x)|^r\right)^{1/r}.$$

Lemma 6.1. $\langle \phi_A, \phi_B \rangle = \delta_{AB}$ for every $A, B \subseteq [n]$.

Proof.

$$\langle \phi_A, \phi_B \rangle = \mathbb{E}_{x \sim \mu_p} \prod_{i \in A} \phi_i(x) \prod_{i \in B} \phi_i(x) = \mathbb{E}_{x \sim \mu_p} \prod_{i \in A \triangle B} \phi_i(x) \prod_{i \in A \cap B} \phi_i(x)^2 = \delta_{AB},$$

since the $\phi_i$ are independent with $\mathbb{E}\,\phi_i = 0$ and $\mathbb{E}\,\phi_i^2 = 1$.
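For small $n$ one can verify Lemma 6.1 by brute force. The following Python sketch (my own; the helper names `weight`, `phi_A` and `inner` are invented for illustration) sums over all of $\{-1,1\}^n$ with the $\mu_p$ weights:

```python
from itertools import combinations, product

# Sketch: brute-force check that the phi_A are orthonormal under mu_p.
n, p = 3, 0.3
q = 1 - p
mu, sigma = q - p, 2 * (p * q) ** 0.5

def weight(x):
    """mu_p-probability of a point x in {-1,1}^n."""
    w = 1.0
    for xi in x:
        w *= q if xi == 1 else p
    return w

def phi_A(A, x):
    """Product of (x_i - mu)/sigma over i in A."""
    out = 1.0
    for i in A:
        out *= (x[i] - mu) / sigma
    return out

def inner(A, B):
    """<phi_A, phi_B> = E_{x ~ mu_p} phi_A(x) phi_B(x)."""
    return sum(weight(x) * phi_A(A, x) * phi_A(B, x)
               for x in product((-1, 1), repeat=n))

subsets = [s for k in range(n + 1) for s in combinations(range(n), k)]
for A in subsets:
    for B in subsets:
        assert abs(inner(A, B) - (1.0 if A == B else 0.0)) < 1e-9
```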

Definition 6.2 (p-biased Fourier coefficient). Let $f : \{-1,1\}^n \to \mathbb{R}$. The p-biased Fourier coefficient $\hat{f}(A)$ is defined by $\hat{f}(A) = \langle f, \phi_A \rangle = \mathbb{E}_{x \sim \mu_p} f(x) \phi_A(x)$.

Because the $\phi_A$ form an orthonormal basis, it follows that

$$\langle f, g \rangle = \sum_A \hat{f}(A)\hat{g}(A) \quad \text{(Plancherel)}$$
$$f = \sum_A \hat{f}(A)\phi_A \quad \text{(inversion formula)}$$

Definition 6.3. Let $f : \{-1,1\}^n \to \mathbb{R}$. Then define $D_i f$ by

$$D_i f(x) = \frac{\sigma}{2}\left(f(x_{i \to 1}) - f(x_{i \to -1})\right).$$

Then define $\mathrm{Inf}_i f$ to be $\mathbb{E}_{x \sim \mu_p} D_i f(x)^2 = \|D_i f\|_2^2$.

Now,

$$D_i \phi_A(x) = \frac{\sigma}{2}\left(\phi_A(x_{i \to 1}) - \phi_A(x_{i \to -1})\right).$$

If $i \notin A$ then this is $0$. Otherwise, it is

$$\frac{\sigma}{2}\,\phi_{A \setminus \{i\}}(x)\left(\phi(1) - \phi(-1)\right) = \frac{\sigma}{2}\,\phi_{A \setminus \{i\}}(x)\left(\frac{1-\mu}{\sigma} - \frac{-1-\mu}{\sigma}\right) = \phi_{A \setminus \{i\}}(x).$$

It follows that

$$D_i f = D_i\Big(\sum_A \hat{f}(A)\phi_A\Big) = \sum_{A \ni i} \hat{f}(A)\phi_{A \setminus \{i\}},$$

and therefore that

$$\|D_i f\|_2^2 = \sum_{A \ni i} \hat{f}(A)^2,$$

and therefore that

$$I(f) = \sum_i \mathrm{Inf}_i f = \sum_A |A|\,\hat{f}(A)^2.$$
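The identity $I(f) = \sum_A |A|\hat{f}(A)^2$ can be checked exactly on a small example. The sketch below (my own, not from the notes) does this for the 3-bit majority function:

```python
from itertools import combinations, product

# Sketch: check I(f) = sum_i Inf_i f = sum_A |A| fhat(A)^2 for the
# 3-bit majority function under mu_p.
n, p = 3, 0.3
q = 1 - p
mu, sigma = q - p, 2 * (p * q) ** 0.5

def f(x):
    return 1 if sum(x) > 0 else -1

def weight(x):
    w = 1.0
    for xi in x:
        w *= q if xi == 1 else p
    return w

def phi_A(A, x):
    out = 1.0
    for i in A:
        out *= (x[i] - mu) / sigma
    return out

points = list(product((-1, 1), repeat=n))
subsets = [s for k in range(n + 1) for s in combinations(range(n), k)]

fhat = {A: sum(weight(x) * f(x) * phi_A(A, x) for x in points)
        for A in subsets}

def D_i(i, x):
    up = x[:i] + (1,) + x[i + 1:]
    down = x[:i] + (-1,) + x[i + 1:]
    return sigma / 2 * (f(up) - f(down))

total_influence = sum(
    sum(weight(x) * D_i(i, x) ** 2 for x in points) for i in range(n)
)
spectral = sum(len(A) * fhat[A] ** 2 for A in subsets)
assert abs(total_influence - spectral) < 1e-9
```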
Noise and stability

Fix $p$, so that we don't have to write subscripts everywhere.

Let $x \in \{-1,1\}^n$ and let $\rho \in [0,1]$. We say that $y \sim N_\rho(x)$ if for each $i$, with probability $\rho$ we set $y_i = x_i$ and with probability $1 - \rho$ we let $y_i$ be $\mu_p$-random, with all $y_i$ independent.

If $x \sim \mu_p$, then

$$\mathbb{P}[y_i = 1] = \rho q + (1 - \rho) q = q.$$

So $y \sim \mu_p$. Then we say that $x$ and $y$ are $\rho$-correlated, and write $x \sim_\rho y$.

Note: $N_\rho$ depends on $p$, but we don't write $N_{\rho, p}$ (i.e. we omit the $p$ for convenience).

Definition 6.4. The p-biased noise operator $T_\rho$ is given by the formula

$$T_\rho f(x) = \mathbb{E}_{y \sim N_\rho(x)} f(y).$$

Lemma 6.5. For every $A \subseteq [n]$, $T_\rho \phi_A = \rho^{|A|} \phi_A$.

Reminders:

$$\phi(t) = \frac{t - \mu}{\sigma}, \qquad \mu = q - p = 2q - 1 = 1 - 2p, \qquad \sigma = 2\sqrt{pq},$$
$$\phi(1) = \sqrt{p/q}, \qquad \phi(-1) = -\sqrt{q/p}, \qquad \phi_A(x) = \prod_{i \in A} \phi(x_i).$$

Proof.

$$T_\rho \phi_A(x) = \mathbb{E}_{y \sim N_\rho(x)} \prod_{i \in A} \phi(y_i) = \prod_{i \in A} \mathbb{E}_{y \sim N_\rho(x)} \phi(y_i) = \prod_{i \in A} \left(\rho\,\phi(x_i) + (1 - \rho)\,\mathbb{E}_{\mu_p}\phi\right) = \rho^{|A|} \phi_A(x),$$

since $\mathbb{E}_{\mu_p}\phi = 0$.

Corollary 6.6. $\widehat{T_\rho f}(A) = \rho^{|A|} \hat{f}(A)$.

Proof. $T_\rho f = \sum_A \hat{f}(A)\, T_\rho \phi_A = \sum_A \rho^{|A|} \hat{f}(A) \phi_A$.

Definition 6.7. Let $\rho \in [0,1]$, $f : \{-1,1\}^n \to \mathbb{R}$. Then

$$\mathrm{Stab}_\rho f = \langle f, T_\rho f \rangle = \mathbb{E}_{x \sim_\rho y} f(x) f(y).$$

Remark. By Corollary 6.6, $\mathrm{Stab}_\rho f = \sum_A \rho^{|A|} \hat{f}(A)^2$, as in the unbiased case.
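This spectral formula for stability can be verified directly on a small example; the following sketch (my own, with an invented test function) computes $\mathbb{E}_{x \sim_\rho y} f(x)f(y)$ by exact enumeration and compares it with $\sum_A \rho^{|A|}\hat{f}(A)^2$:

```python
from itertools import combinations, product

# Sketch: Stab_rho f = sum_A rho^{|A|} fhat(A)^2, verified by computing
# E f(x) f(y) over rho-correlated p-biased inputs directly.
n, p, rho = 2, 0.3, 0.6
q = 1 - p
mu, sigma = q - p, 2 * (p * q) ** 0.5

def f(x):
    return 1 if x[0] == x[1] else -1   # a small test function

def weight(x):
    w = 1.0
    for xi in x:
        w *= q if xi == 1 else p
    return w

def phi_A(A, x):
    out = 1.0
    for i in A:
        out *= (x[i] - mu) / sigma
    return out

def transition(x, y):
    """P[y | x] under the noise N_rho(x): keep x_i w.p. rho, resample o/w."""
    pr = 1.0
    for xi, yi in zip(x, y):
        base = q if yi == 1 else p
        pr *= rho * (1.0 if yi == xi else 0.0) + (1 - rho) * base
    return pr

points = list(product((-1, 1), repeat=n))
subsets = [s for k in range(n + 1) for s in combinations(range(n), k)]
fhat = {A: sum(weight(x) * f(x) * phi_A(A, x) for x in points)
        for A in subsets}

stab_direct = sum(weight(x) * transition(x, y) * f(x) * f(y)
                  for x in points for y in points)
stab_spectral = sum(rho ** len(A) * fhat[A] ** 2 for A in subsets)
assert abs(stab_direct - stab_spectral) < 1e-9
```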

The Margulis–Russo formula

We shall be considering more than one value of p.

Convention: Let $f : \{-1,1\}^n \to \mathbb{R}$. If we write $f^{(p)}$, then all definitions should be understood to be p-biased.

Lemma 6.8. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a multilinear function, and write $f$ also for its restriction to $\{-1,1\}^n$. Then $\mathbb{E}_{x \sim \mu_p} f^{(p)}(x) = f(\mu, \mu, \ldots, \mu)$.

Note: whenever we write $f^{(p)}$, it is implicit that we are restricting to $\{-1,1\}^n$, since $\mu_p$ is only defined there.

We will give 3 proofs!

Proof 1. Write $f = \sum_A \theta_A x^A$, where $x^A = \prod_{i \in A} x_i$. Then $\mathbb{E}_{x \sim \mu_p} x^A = \prod_{i \in A} \mathbb{E}_{x \sim \mu_p} x_i = \prod_{i \in A}(q - p) = \mu^{|A|} = x^A(\mu, \mu, \ldots, \mu)$. Then by linearity, we're done.

You might think the above proof is a bit odd, since it uses the $x^A$ even though we're in the p-biased case. I would agree with you!

Proof 2. Write $f^{(p)} = \sum_A \hat{f}(A)\phi_A$. Then

$$\mathbb{E}_{x \sim \mu_p} \phi_A = \prod_{i \in A} \mathbb{E}_{x \sim \mu_p} \phi_i = \begin{cases} 0 & A \neq \emptyset \\ 1 & A = \emptyset \end{cases} = \phi_A(\mu, \mu, \ldots, \mu),$$

since $\phi(\mu) = 0$.

Proof 3. Induction on $n$.

$$\mathbb{E}\,f^{(p)}(x) = \mathbb{E}\left(q f^{(p)}(x_{n \to 1}) + p f^{(p)}(x_{n \to -1})\right) = \mathbb{E}\,f^{(p)}(x_{n \to \mu}) = f^{(p)}(\mu, \ldots, \mu),$$

where the second equality uses linearity in the last coordinate (since $q \cdot 1 + p \cdot (-1) = \mu$), and the last equality is by the induction hypothesis.
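Lemma 6.8 is easy to check numerically for a multilinear $f$ with arbitrary coefficients; the sketch below (my own, not from the notes) compares both sides exactly for $n = 3$:

```python
from itertools import combinations, product
import random

# Sketch: for a multilinear f with random coefficients,
# E_{x ~ mu_p} f(x) equals f(mu, ..., mu).
random.seed(0)
n, p = 3, 0.3
q = 1 - p
mu = q - p

subsets = [s for k in range(n + 1) for s in combinations(range(n), k)]
theta = {A: random.uniform(-1, 1) for A in subsets}

def f(x):
    # multilinear: sum_A theta_A * prod_{i in A} x_i
    total = 0.0
    for A, c in theta.items():
        m = c
        for i in A:
            m *= x[i]
        total += m
    return total

def weight(x):
    w = 1.0
    for xi in x:
        w *= q if xi == 1 else p
    return w

expectation = sum(weight(x) * f(x) for x in product((-1, 1), repeat=n))
assert abs(expectation - f((mu,) * n)) < 1e-12
```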

Theorem 6.9 (The Margulis–Russo formula). Let $f$ be as above. Then

$$\frac{d}{d\mu}\,\mathbb{E}\,f^{(p)} = \frac{1}{\sigma}\sum_{i=1}^n \widehat{f^{(p)}}(\{i\}).$$

Remark. Later we will care instead about $\frac{d}{dp}$. But for now it is easier to work with $\frac{d}{d\mu}$.

Proof. By Lemma 6.8,

$$\frac{d}{d\mu}\,\mathbb{E}\,f^{(p)} = \frac{d}{d\mu}\,f(\mu, \mu, \ldots, \mu) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mu, \mu, \ldots, \mu) = \sum_{i=1}^n \frac{1}{2}\left(f(\mu_{i \to 1}) - f(\mu_{i \to -1})\right) \quad \text{(by multilinearity)}$$
$$= \frac{1}{\sigma}\sum_{i=1}^n D_i f(\mu, \ldots, \mu) = \frac{1}{\sigma}\sum_{i=1}^n \mathbb{E}_{x \sim \mu_p} D_i f^{(p)}(x),$$

where $\mu_{i \to \pm 1}$ denotes the point $(\mu, \ldots, \mu)$ with the $i$th coordinate replaced by $\pm 1$, and the last equality is Lemma 6.8 applied to the multilinear function $D_i f$.

But $D_i f = \sum_{A \ni i} \hat{f}(A)\phi_{A \setminus \{i\}}$, so $\mathbb{E}\,D_i f = \hat{f}(\{i\})$. The result follows.

Corollary 6.10. Let $f : \{-1,1\}^n \to \{-1,1\}$ be a monotone Boolean function. Then

$$\frac{d}{dp}\,\mathbb{P}\left[f^{(p)}(x) = -1\right] = \frac{1}{\sigma^2}\,I(f^{(p)}).$$

Proof. $\mathbb{E}\,f^{(p)} = 1 - 2\,\mathbb{P}[f^{(p)}(x) = -1]$, so

$$\mathbb{P}\left[f^{(p)}(x) = -1\right] = \frac{1 - \mathbb{E}\,f^{(p)}}{2}.$$

Therefore,

$$\frac{d}{dp}\,\mathbb{P}\left[f^{(p)}(x) = -1\right] = -\frac{1}{2}\,\frac{d}{d\mu}\,\mathbb{E}\,f^{(p)} \cdot \frac{d\mu}{dp} = \frac{1}{\sigma}\sum_{i=1}^n \widehat{f^{(p)}}(\{i\}) \quad \text{(by Theorem 6.9, since } \tfrac{d\mu}{dp} = -2\text{)}.$$

From the proof of Theorem 6.9, this is $\frac{1}{\sigma}\sum_{i=1}^n \mathbb{E}\,D_i f^{(p)}$. Since $f$ is monotone, $D_i f^{(p)}(x) \in \{0, \sigma\}$, so this equals

$$\frac{1}{\sigma^2}\sum_{i=1}^n \mathbb{E}\left(D_i f^{(p)}\right)^2 = \frac{1}{\sigma^2}\sum_i \|D_i f^{(p)}\|_2^2 = \frac{1}{\sigma^2}\,I(f^{(p)}).$$
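Corollary 6.10 can be sanity-checked numerically for a small monotone function; the sketch below (my own, using 3-bit majority and a central finite difference in $p$) compares $\frac{d}{dp}\mathbb{P}[f^{(p)} = -1]$ with $I(f^{(p)})/\sigma^2$:

```python
from itertools import product

# Sketch: for 3-bit majority, compare a central finite difference of
# P[f = -1] in p against I(f)/sigma^2.
n, p = 3, 0.3

def f(x):
    return 1 if sum(x) > 0 else -1   # monotone

def prob_minus_one(pp):
    qq = 1 - pp
    total = 0.0
    for x in product((-1, 1), repeat=n):
        w = 1.0
        for xi in x:
            w *= qq if xi == 1 else pp
        if f(x) == -1:
            total += w
    return total

def total_influence(pp):
    qq = 1 - pp
    sig = 2 * (pp * qq) ** 0.5
    total = 0.0
    for x in product((-1, 1), repeat=n):
        w = 1.0
        for xi in x:
            w *= qq if xi == 1 else pp
        for i in range(n):
            up = x[:i] + (1,) + x[i + 1:]
            down = x[:i] + (-1,) + x[i + 1:]
            total += w * (sig / 2 * (f(up) - f(down))) ** 2
    return total

h = 1e-6
derivative = (prob_minus_one(p + h) - prob_minus_one(p - h)) / (2 * h)
sigma_sq = 4 * p * (1 - p)
assert abs(derivative - total_influence(p) / sigma_sq) < 1e-6
```

For 3-bit majority one can also check this by hand: $\mathbb{P}[f^{(p)} = -1] = 3p^2 - 2p^3$, whose derivative $6pq$ agrees with $I(f^{(p)})/\sigma^2 = 6pq$.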

Remark. Suppose that $p_1 < p_2$ are such that $\mathbb{P}[f^{(p_1)}(x) = -1] = \varepsilon$ and $\mathbb{P}[f^{(p_2)}(x) = -1] = 1 - \varepsilon$. Then by the mean value theorem there exists $p \in (p_1, p_2)$ such that

$$\frac{d}{dp}\,\mathbb{P}\left[f^{(p)}(x) = -1\right] = \frac{1 - 2\varepsilon}{p_2 - p_1}.$$

So by the Margulis–Russo formula, there exists $p \in (p_1, p_2)$ such that $I(f^{(p)}) = \sigma^2 \cdot \frac{1 - 2\varepsilon}{p_2 - p_1}$.

So if $p_2 - p_1$ isn't small, then there exists $p$ such that $\varepsilon \leq \mathbb{P}[f^{(p)} = -1] \leq 1 - \varepsilon$ and $I(f^{(p)})$ isn't too large.

Let $f : \{-1,1\}^n \to \mathbb{R}$. Then for each $i \in [n]$, $E_i f$ is defined by

$$E_i f(x) = q f(x_{i \to 1}) + p f(x_{i \to -1}).$$

Lemma 6.11. For every $f : \{-1,1\}^n \to \mathbb{R}$ and every $i \in [n]$, $f = E_i f + \phi_i D_i f$, and $E_i f$ and $\phi_i D_i f$ are orthogonal.

Proof. Reminders:

$$\mu = q - p = 2q - 1 = 1 - 2p, \qquad \sigma = 2\sqrt{pq}.$$

So

$$f(x) - E_i f(x) = \begin{cases} p\left(f(x_{i \to 1}) - f(x_{i \to -1})\right) & x_i = 1 \\ -q\left(f(x_{i \to 1}) - f(x_{i \to -1})\right) & x_i = -1 \end{cases}$$

Also,

$$\phi_i(x) = \begin{cases} \frac{2p}{\sigma} & x_i = 1 \\ -\frac{2q}{\sigma} & x_i = -1 \end{cases}$$

Therefore,

$$f(x) - E_i f(x) = \frac{\sigma}{2}\,\phi_i(x)\left(f(x_{i \to 1}) - f(x_{i \to -1})\right) = \phi_i(x) D_i f(x).$$

Orthogonality is easy ($E_i f$ and $D_i f$ don't depend on $x_i$; then use the fact that $\phi_i$ has average $0$).
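The decomposition in Lemma 6.11 can be verified pointwise for a random function; the following sketch (my own, not from the notes) checks both the identity and the orthogonality for every coordinate:

```python
from itertools import product
import random

# Sketch: f = E_i f + phi_i * D_i f pointwise, and <E_i f, phi_i D_i f> = 0,
# for a random f and each coordinate i.
random.seed(1)
n, p = 3, 0.3
q = 1 - p
mu, sigma = q - p, 2 * (p * q) ** 0.5

points = list(product((-1, 1), repeat=n))
values = {x: random.uniform(-1, 1) for x in points}
f = values.__getitem__

def weight(x):
    w = 1.0
    for xi in x:
        w *= q if xi == 1 else p
    return w

def phi(t):
    return (t - mu) / sigma

for i in range(n):
    def E_i(x):
        up = x[:i] + (1,) + x[i + 1:]
        down = x[:i] + (-1,) + x[i + 1:]
        return q * f(up) + p * f(down)

    def D_i(x):
        up = x[:i] + (1,) + x[i + 1:]
        down = x[:i] + (-1,) + x[i + 1:]
        return sigma / 2 * (f(up) - f(down))

    for x in points:
        assert abs(f(x) - (E_i(x) + phi(x[i]) * D_i(x))) < 1e-12
    inner = sum(weight(x) * E_i(x) * phi(x[i]) * D_i(x) for x in points)
    assert abs(inner) < 1e-12
```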

Lemma 6.12 (p-biased Bonami Lemma). Let $f^{(p)} : \{-1,1\}^n \to \mathbb{R}$ have degree at most $k$. Then $\|f\|_4 \leq C^{k/2}\|f\|_2$, where $C = 4/\sigma^2$.

Proof. Induction on $n$. Let $g = E_n f$, $h = D_n f$. By orthogonality,

$$\|f\|_2^2 = \|g\|_2^2 + \|\phi_n h\|_2^2 = \|g\|_2^2 + \|h\|_2^2.$$

Note also that $h = D_n f$ has degree at most $k - 1$.

So

$$\|f\|_4^4 = \mathbb{E}_x\left(g(x)^4 + 4\phi_n(x)g(x)^3h(x) + 6\phi_n(x)^2g(x)^2h(x)^2 + 4\phi_n(x)^3g(x)h(x)^3 + \phi_n(x)^4h(x)^4\right)$$
$$\leq \|g\|_4^4 + 6\|g\|_4^2\|h\|_4^2 + 4\,\big|\mathbb{E}\,\phi^3\big|\,\|g\|_4\|h\|_4^3 + \mathbb{E}\,\phi^4 \cdot \|h\|_4^4,$$

using that $g$ and $h$ do not depend on $x_n$ (so each term factorizes), that $\mathbb{E}\,\phi_n = 0$ and $\mathbb{E}\,\phi_n^2 = 1$, and Hölder's inequality.

Using that $\phi(1) = \sqrt{p/q}$, $\phi(-1) = -\sqrt{q/p}$, we get

$$\mathbb{E}\,\phi^3 = q\left(\sqrt{p/q}\right)^3 - p\left(\sqrt{q/p}\right)^3 = \frac{p^2 - q^2}{\sqrt{pq}} = \frac{2(p - q)}{\sigma},$$

so $|\mathbb{E}\,\phi^3| \leq 2/\sigma$. Also,

$$\mathbb{E}\,\phi^4 = q \cdot \frac{p^2}{q^2} + p \cdot \frac{q^2}{p^2} = \frac{p^3 + q^3}{pq} \leq \frac{4}{\sigma^2}.$$

So, applying the induction hypothesis to $g$ (degree at most $k$) and $h$ (degree at most $k - 1$),

$$\|f\|_4^4 \leq C^{2k}\left(\|g\|_2^4 + 6C^{-1}\|g\|_2^2\|h\|_2^2 + \frac{8}{\sigma}C^{-3/2}\|g\|_2\|h\|_2^3 + \frac{4}{\sigma^2}C^{-2}\|h\|_2^4\right).$$

Apply $ab \leq \frac{a^2 + b^2}{2}$ with $a = 2\|g\|_2\|h\|_2 C^{-1/2}$ and $b = \frac{4}{\sigma}C^{-1}\|h\|_2^2$. Then

$$\|f\|_4^4 \leq C^{2k}\left(\|g\|_2^4 + 8C^{-1}\|g\|_2^2\|h\|_2^2 + \frac{12}{\sigma^2}C^{-2}\|h\|_2^4\right).$$

Choose $C$ such that $8C^{-1} \leq 2$ and $\frac{12}{\sigma^2}C^{-2} \leq 1$. $C = 4/\sigma^2$ will do.

So $\|f\|_4^4 \leq C^{2k}\left(\|g\|_2^2 + \|h\|_2^2\right)^2 = C^{2k}\|f\|_2^4$.
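The moment computations in this proof can be checked numerically; the sketch below (my own addition) verifies the formulas for $\mathbb{E}\,\phi^3$ and $\mathbb{E}\,\phi^4$ together with the bounds $2/\sigma$ and $4/\sigma^2$:

```python
# Sketch: third and fourth moments of phi, plus the bounds
# |E phi^3| <= 2/sigma and E phi^4 <= 4/sigma^2, for one value of p.
p = 0.3
q = 1 - p
mu, sigma = q - p, 2 * (p * q) ** 0.5

phi1 = (1 - mu) / sigma     # = sqrt(p/q)
phi_m1 = (-1 - mu) / sigma  # = -sqrt(q/p)

m3 = q * phi1 ** 3 + p * phi_m1 ** 3
m4 = q * phi1 ** 4 + p * phi_m1 ** 4

assert abs(m3 - (p ** 2 - q ** 2) / (p * q) ** 0.5) < 1e-12
assert abs(m4 - (p ** 3 + q ** 3) / (p * q)) < 1e-12
assert abs(m3) <= 2 / sigma + 1e-12
assert m4 <= 4 / sigma ** 2 + 1e-12
```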

Remark. This proof does not recover the $p = \frac{1}{2}$ case that we saw before (Bonami's Lemma): in Lemma 4.1 we proved Lemma 6.12 for $p = \frac{1}{2}$ but with $C = 3$, whereas in the above proof we only get $C = 4$ in the case $p = \frac{1}{2}$.

Corollary 6.13. Let $\rho = \sigma/2$. Then for every $f : \{-1,1\}^n \to \mathbb{R}$ we have $\|T_\rho f\|_4 \leq \|f\|_2$.

Proof.

$$\|T_\rho f\|_4 \leq \sum_{k=0}^n \|T_\rho f^{(=k)}\|_4 = \sum_{k=0}^n \rho^k \|f^{(=k)}\|_4 \leq \sum_{k=0}^n \rho^k C^{k/2} \|f^{(=k)}\|_2 = \sum_{k=0}^n \|f^{(=k)}\|_2 \leq \sqrt{n+1}\,\|f\|_2,$$

since $\rho C^{1/2} = \frac{\sigma}{2} \cdot \frac{2}{\sigma} = 1$, and the last inequality is Cauchy–Schwarz.

By the tensor power trick, the result follows.

Corollary 6.14. For every $f : \{-1,1\}^n \to \mathbb{R}$, with $\rho = \sigma/2$ we also have $\|T_\rho f\|_2 \leq \|f\|_{4/3}$.

Proof. Identical to the uniform case, but with a different $\rho$.

Remark. As before, this gives us that $\mathrm{Stab}_{\rho^2} f \leq \|f\|_{4/3}^2$, i.e. $\mathrm{Stab}_{\sigma^2/4} f \leq \|f\|_{4/3}^2$.

Theorem 6.15 (p-biased Friedgut junta theorem). Let $f^{(p)} : \{-1,1\}^n \to \{-1,1\}$ be a Boolean function and suppose that $\|f^{(\leq k)}\|_2^2 \geq 1 - \varepsilon$. Then there exists an $m$-junta $g : \{-1,1\}^n \to \mathbb{R}$ with $\|g - f\|_2^2 \leq 2\varepsilon$ and $m \leq \rho^{-2k} I(f)^3 \varepsilon^{-2} \sigma^{-2}$, where $\rho = \sigma^2/4$.

Proof. Let $\tau > 0$ (to be chosen later) and let

$$J = \{i : \mathrm{Inf}_i f \geq \tau\},$$

so that $|J| \leq I(f)/\tau$. Let $\rho = \sigma^2/4$. Then

$$\sum_{i \notin J} \mathrm{Stab}_\rho(D_i f) \leq \sum_{i \notin J} \|D_i f\|_{4/3}^2 = \sum_{i \notin J} \left(\frac{\|D_i f\|_2^2}{\sigma^2}\right)^{3/2}\sigma^2 = \frac{1}{\sigma}\sum_{i \notin J} (\mathrm{Inf}_i f)^{3/2} \leq \sigma^{-1}\tau^{1/2} I(f),$$

where the first equality uses that $|D_i f(x)| \in \{0, \sigma\}$ for Boolean $f$, and the last inequality uses $\mathrm{Inf}_i f < \tau$ for $i \notin J$.

Let $g = \sum_{A \subseteq J,\,|A| \leq k} \hat{f}(A)\phi_A$. Then

$$\|f - g\|_2^2 \leq \sum_{B \not\subseteq J,\,|B| \leq k} \hat{f}(B)^2 + \sum_{|B| > k} \hat{f}(B)^2.$$

By hypothesis, the second term is at most $\varepsilon$. But

$$\sum_{i \notin J} \mathrm{Stab}_\rho(D_i f) = \rho^{-1}\sum_B |B \setminus J|\,\rho^{|B|}\hat{f}(B)^2 \geq \rho^{-1}\sum_{B \not\subseteq J,\,|B| \leq k} \rho^k \hat{f}(B)^2.$$

Therefore,

$$\sum_{B \not\subseteq J,\,|B| \leq k} \hat{f}(B)^2 \leq \rho^{-(k-1)}\sigma^{-1}\tau^{1/2} I(f).$$

Set $\tau = \varepsilon^2 \sigma^2 \rho^{2k} I(f)^{-2}$. Then the first term is at most $\rho\varepsilon \leq \varepsilon$, so $\|f - g\|_2^2 \leq 2\varepsilon$, and $g$ is an $m$-junta with $m = |J| \leq I(f)/\tau = \rho^{-2k} I(f)^3 \varepsilon^{-2}\sigma^{-2}$.

Remark. As in the unbiased case, we always have that $\|f^{(\leq k)}\|_2^2 \geq 1 - \varepsilon$ if $k \geq I(f)/\varepsilon$.