CS 294-92: Lecture 6 - Spectral Concentration and Low-Degree Learning¶
Instructor: Avishay Tal
Scribe & Notebook by: Gabriel Taboada
Reference: O'Donnell, Analysis of Boolean Functions, Chapter 3
Overview¶
This notebook explores the connection between Boolean function analysis and learning theory:
- Spectral Concentration: When is a function's Fourier weight concentrated on low-degree coefficients?
- Decision Trees: How decision tree depth relates to spectral concentration
- PAC Learning: The LMN Theorem for learning functions with spectral concentration
- Fourier Coefficient Estimation: Using samples to estimate Fourier coefficients
# Install/upgrade boofun (required for Colab)
# This ensures you have the latest version with all features
!pip install --upgrade boofun -q
import boofun as bf
print(f"BooFun version: {bf.__version__}")
BooFun version: 1.1.1
import numpy as np
import matplotlib.pyplot as plt
import boofun as bf
from boofun.analysis import learning, complexity
import warnings
warnings.filterwarnings('ignore')
1. Spectral Concentration¶
Definition: A function $f$ is $\varepsilon$-concentrated on degree $\leq k$ if: $$\mathbf{W}^{>k}[f] := \sum_{|S| > k} \hat{f}(S)^2 \leq \varepsilon$$
This means the "Fourier weight" on high-degree coefficients is small.
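As a quick check, the tail weight $\mathbf{W}^{>k}[f]$ can be computed directly from the coefficient table. A minimal sketch, assuming (as in the cells below) that f.fourier() returns coefficients indexed by subset bitmask:
import boofun as bf
def W_greater(f, k):
    # Tail weight W^{>k}[f]: squared Fourier mass strictly above degree k
    return sum(c ** 2 for s, c in enumerate(f.fourier())
               if bin(s).count('1') > k)
# Parity-4 puts all its weight at degree 4, so the tail above k = 3 is 1
print(W_greater(bf.parity(4), 3))  # 1.0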
Example: Decision Trees and Spectral Concentration¶
Theorem (O'Donnell 3.2): A depth-$d$ decision tree has all of its Fourier mass on degree $\leq d$. (Each root-to-leaf path of length $\leq d$ contributes an indicator polynomial of degree $\leq d$, so $f$ itself is a polynomial of degree $\leq d$.)
# Demonstrate spectral concentration for different functions
functions = [
("Dictator x₀", bf.dictator(4, 0)), # 4 vars, dictator index 0, depth 1
("Majority-5", bf.majority(5)), # Complex
("Parity-4", bf.parity(4)), # Max degree
("Tribes(2,3)", bf.tribes(2, 3)), # DNF
]
print("Spectral Concentration Analysis")
print("=" * 70)
print(f"{'Function':<15} {'DT depth':>10} {'W≤1':>10} {'W≤2':>10} {'W≤3':>10}")
print("-" * 70)
for name, f in functions:
dt_depth = complexity.decision_tree_depth(f)
# f.W_leq(k) = sum of f̂(S)² for |S| ≤ k
conc = [f.W_leq(k) for k in [1, 2, 3]]
print(f"{name:<15} {dt_depth:>10} {conc[0]:>10.4f} {conc[1]:>10.4f} {conc[2]:>10.4f}")
print("\nHigher DT depth → spectral weight spreads to higher degrees")
Spectral Concentration Analysis
======================================================================
Function          DT depth        W≤1        W≤2        W≤3
----------------------------------------------------------------------
Dictator x₀              1     1.0000     1.0000     1.0000
Majority-5               5     0.7031     0.7031     0.8594
Parity-4                 4     0.0000     0.0000     0.0000
Tribes(2,3)              3     0.7500     0.9375     1.0000

Higher DT depth → spectral weight spreads to higher degrees
# Figure 6.1 from lecture notes: Fourier spectra of different functions
# Shows W_k[f] = Σ_{|S|=k} f̂(S)² (Fourier weight at level k)
def compute_W_k(f):
"""Compute Fourier weight at each level k."""
fourier = f.fourier()
n = f.n_vars
W_k = [0.0] * (n + 1)
for s in range(len(fourier)):
degree = bin(s).count('1')
W_k[degree] += fourier[s] ** 2
return W_k
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Top-left: Dictator
n = 9
f_dict = bf.dictator(n, 0)
W_k = compute_W_k(f_dict)
axes[0,0].bar(range(n+1), W_k, color='green', alpha=0.7)
axes[0,0].set_xlabel('Level k')
axes[0,0].set_ylabel('$W_k[f]$')
axes[0,0].set_title('Fourier Spectrum of Dictator Function')
axes[0,0].set_xticks(range(0, n+1, 2))
# Top-right: XOR (Parity)
f_xor = bf.parity(n)
W_k = compute_W_k(f_xor)
axes[0,1].bar(range(n+1), W_k, color='indianred', alpha=0.7)
axes[0,1].set_xlabel('Level k')
axes[0,1].set_ylabel('$W_k[f]$')
axes[0,1].set_title('Fourier Spectrum of XOR Function')
axes[0,1].set_xticks(range(0, n+1, 2))
# Bottom-left: Majority with asymptotic decay
n_maj = 15 # Use odd n for majority
f_maj = bf.majority(n_maj)
W_k = compute_W_k(f_maj)
k_vals = np.array(range(n_maj+1))
axes[1,0].bar(k_vals, W_k, color='steelblue', alpha=0.7, label='Majority Spectrum')
# Add asymptotic decay curve: ~Θ(1/k^{3/2}) for odd k
k_odd = np.array([k for k in range(1, n_maj+1) if k % 2 == 1])
decay = 0.8 / (k_odd ** 1.5) # Scaled for visualization
axes[1,0].plot(k_odd, decay, 'o--', color='orange', linewidth=2, label=r'$\tilde{\Theta}(1/k^{3/2})$')
axes[1,0].set_xlabel('Level k')
axes[1,0].set_ylabel('$W_k[Maj_n]$')
axes[1,0].set_title(f'Fourier Spectrum of Majority Function (n={n_maj})')
axes[1,0].legend()
axes[1,0].set_xticks(range(0, n_maj+1, 2))
# Bottom-right: Decay comparison
k_range = np.arange(1, 20)
majority_decay = 1.0 / (k_range ** 1.5) # Θ(1/k^{3/2})
poly_decay = 1.0 / (k_range ** 2) # Polynomial: 1/k²
exp_decay = 0.7 ** k_range # Exponential: ρ^k
axes[1,1].plot(k_range, majority_decay, '-', linewidth=2, label=r'$1/k^{3/2}$ (Majority)')
axes[1,1].plot(k_range, poly_decay, '-', linewidth=2, label=r'Polynomial (Tribes, before $\log n$)')
axes[1,1].plot(k_range, exp_decay, '--', linewidth=2, label=r'Exp. tail (after $\log n$)')
axes[1,1].set_xlabel('Level k')
axes[1,1].set_ylabel('Decay')
axes[1,1].set_title('Fourier Decay Comparison')
axes[1,1].legend()
axes[1,1].set_ylim(0, 1.1)
axes[1,1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("Key observations (from lecture notes):")
print(" • Dictator: W₁ = 1, all other Wₖ = 0 (degree 1)")
print(" • XOR: Wₙ = 1, all other Wₖ = 0 (maximum degree)")
print(" • Majority: Wₖ ≈ Θ̃(1/k^{3/2}) for odd k (concentrated on low degrees)")
print(" • Tribes: Polynomial decay until log(n), then exponential")
Key observations (from lecture notes):
• Dictator: W₁ = 1, all other Wₖ = 0 (degree 1)
• XOR: Wₙ = 1, all other Wₖ = 0 (maximum degree)
• Majority: Wₖ ≈ Θ̃(1/k^{3/2}) for odd k (concentrated on low degrees)
• Tribes: Polynomial decay until log(n), then exponential
2. Fourier Coefficient Estimation¶
Lemma (O'Donnell 3.30): Given $m = O(\log(1/\delta)/\varepsilon^2)$ samples, we can estimate $\hat{f}(S)$ with error $\leq \varepsilon$ with probability $\geq 1 - \delta$.
The empirical estimator is: $$\tilde{f}(S) = \frac{1}{m} \sum_{i=1}^{m} f(x^{(i)}) \chi_S(x^{(i)})$$
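To make the estimator concrete before calling the library helper below, here is a hand-rolled Monte Carlo sketch. It assumes, as in the LMN cell later in this notebook, that f.evaluate(x) takes an integer input index (variable $j$ at bit $j$) and returns 0/1, which we map to $\pm 1$:
import numpy as np
import boofun as bf
def estimate_coeff(f, s, m, rng):
    # Empirical mean of f(x)·χ_S(x) over m uniform samples; s encodes S as a bitmask
    xs = rng.integers(0, 2 ** f.n_vars, size=m)
    # χ_S(x) = (-1)^{|x ∧ S|}; map f's 0/1 output to ±1
    chi = np.array([1 - 2 * (bin(int(x) & s).count('1') % 2) for x in xs])
    fx = np.array([1 - 2 * int(f.evaluate(int(x))) for x in xs])
    vals = fx * chi
    return vals.mean(), vals.std(ddof=1) / np.sqrt(m)
rng = np.random.default_rng(0)
est, stderr = estimate_coeff(bf.majority(4), 0b0001, 1000, rng)
print(f"f̂({{0}}) ≈ {est:.4f} ± {stderr:.4f}")  # table below: true value 0.3750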
# Demonstrate Fourier coefficient estimation using the library's built-in function
from boofun.analysis.learning import estimate_fourier_coefficient
f = bf.majority(4)
true_coeffs = f.fourier()
sample_sizes = [50, 200, 1000]
print("Fourier Coefficient Estimation (Majority-4)")
print("=" * 75)
print(f"{'Subset S':<10} {'True':>8} " + "".join(f"{'m='+str(m)+' (est±stderr)':>20}" for m in sample_sizes))
print("-" * 75)
# Important: use DIFFERENT random seeds for each sample size to get independent estimates
subsets_to_show = [0, 1, 2, 3, 5, 15] # Empty, singletons, pairs, full
for s in subsets_to_show:
subset = [i for i in range(4) if (s >> i) & 1]
true_val = true_coeffs[s]
row = f"{str(subset):<10} {true_val:>8.4f}"
for i, m in enumerate(sample_sizes):
# Use different seed for each sample size to get independent samples
rng = np.random.default_rng(42 + i * 1000 + s)
est, stderr = estimate_fourier_coefficient(f, s, num_samples=m, rng=rng)
row += f" {est:>9.4f}±{stderr:.4f}"
print(row)
print("\nStandard error decreases as 1/√m (more samples → better estimates)")
Fourier Coefficient Estimation (Majority-4)
===========================================================================
Subset S          True   m=50 (est±stderr)  m=200 (est±stderr)  m=1000 (est±stderr)
---------------------------------------------------------------------------
[]              0.3750       0.2400±0.1387       0.4200±0.0643       0.3880±0.0292
[0]             0.3750       0.6800±0.1047       0.3400±0.0667       0.3320±0.0298
[1]             0.3750       0.2800±0.1371       0.4600±0.0629       0.3640±0.0295
[0, 1]         -0.1250      -0.0800±0.1424      -0.2200±0.0692      -0.0980±0.0315
[0, 2]         -0.1250      -0.3200±0.1353      -0.1100±0.0705      -0.1100±0.0314
[0, 1, 2, 3]    0.3750       0.2800±0.1371       0.4100±0.0647       0.3100±0.0301

Standard error decreases as 1/√m (more samples → better estimates)
3. The LMN Theorem (PAC Learning)¶
Theorem (Linial-Mansour-Nisan, 1993): Let $\mathcal{C}$ be a concept class such that every $f \in \mathcal{C}$ is $\varepsilon$-concentrated on Fourier coefficients of degree $\leq k$. Then $\mathcal{C}$ is $(\varepsilon, \delta)$-PAC learnable in time $\text{poly}(n^k, 1/\varepsilon, \log(1/\delta))$.
Learning Algorithm¶
- Estimate all degree-$\leq k$ Fourier coefficients using samples
- Construct hypothesis: $h(x) = \text{sgn}\left(\sum_{|S| \leq k} \tilde{f}(S) \chi_S(x)\right)$
- Fourier concentration bounds the approximation error (see the bound below)
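Why thresholding works: let $g(x) = \sum_{|S| \leq k} \tilde{f}(S) \chi_S(x)$. Whenever $\text{sgn}(g(x)) \neq f(x) \in \{-1, 1\}$ we have $(f(x) - g(x))^2 \geq 1$, so by Plancherel: $$\Pr_x[\text{sgn}(g(x)) \neq f(x)] \leq \mathbf{E}_x[(f(x) - g(x))^2] = \sum_{|S| \leq k} \big(\hat{f}(S) - \tilde{f}(S)\big)^2 + \mathbf{W}^{>k}[f]$$ Estimating each of the $O(n^k)$ low-degree coefficients to accuracy $\sqrt{\varepsilon / n^k}$ keeps the first sum at $O(\varepsilon)$, and $\varepsilon$-concentration bounds the second term, so the hypothesis errs with probability $O(\varepsilon)$.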
# Demonstrate the LMN learning algorithm (simplified version)
from boofun.analysis.learning import estimate_fourier_coefficient
# Create a depth-2 decision tree: (x₀ AND x₁) OR (x₂ AND x₃)
# Truth table: f=1 when (x₀=1 and x₁=1) or (x₂=1 and x₃=1)
tt = []
for i in range(16):
x0, x1, x2, x3 = [(i >> j) & 1 for j in range(4)]
val = (x0 and x1) or (x2 and x3)
tt.append(val)
target = bf.create(tt)
print("LMN Learning Example")
print("=" * 60)
print("Target: (x₀ AND x₁) OR (x₂ AND x₃)")
print(f"Decision tree depth: {complexity.decision_tree_depth(target)}")
print()
# Show spectral concentration: W≤k = sum of f̂(S)² for |S| ≤ k
for k in [1, 2, 3, 4]:
conc = target.W_leq(k)
print(f"W≤{k} = {conc:.4f}")
print("\nLMN approach: estimate low-degree coefficients, threshold to classify")
# Estimate degree ≤ 2 coefficients
rng = np.random.default_rng(42)
estimated_coeffs = {}
for s in range(16):
degree = bin(s).count('1')
if degree <= 2:
est, _ = estimate_fourier_coefficient(target, s, num_samples=500, rng=rng)
if abs(est) > 0.05:
estimated_coeffs[s] = est
print(f"\nEstimated {len(estimated_coeffs)} non-negligible degree-≤2 coefficients")
# Use estimated coefficients for prediction
def predict(x):
val = sum(coeff * (1 - 2 * (bin(x & s).count('1') % 2))
for s, coeff in estimated_coeffs.items())
return 1 if val > 0 else -1
# Test accuracy
correct = sum(1 for x in range(16)
if predict(x) == (1 - 2 * int(target.evaluate(x))))
print(f"Learning accuracy: {correct}/16 = {correct/16:.2%}")
LMN Learning Example
============================================================
Target: (x₀ AND x₁) OR (x₂ AND x₃)
Decision tree depth: 4

W≤1 = 0.5781
W≤2 = 0.9219
W≤3 = 0.9844
W≤4 = 1.0000

LMN approach: estimate low-degree coefficients, threshold to classify

Estimated 11 non-negligible degree-≤2 coefficients
Learning accuracy: 16/16 = 100.00%
4. The Goldreich-Levin Algorithm¶
Problem: Find all "heavy" Fourier coefficients without enumerating all $2^n$ subsets.
Theorem (Goldreich-Levin, 1989): There exists an algorithm that, given oracle access to $f$, finds all $S$ with $|\hat{f}(S)| \geq \tau$ using $O(n/\tau^4)$ queries.
Key Idea: Self-Correction via Restrictions¶
Fix a set $J \subseteq [n]$ of "live" variables and draw a uniformly random assignment $b$ to the variables outside $J$. For any $S \subseteq J$: $$\mathbf{E}_{b}\big[\widehat{f|_{\bar{J}=b}}(S)\big] = \hat{f}(S), \qquad \mathbf{E}_{b}\big[\widehat{f|_{\bar{J}=b}}(S)^2\big] = \sum_{T \subseteq \bar{J}} \hat{f}(S \cup T)^2$$
The second identity lets the algorithm estimate the total Fourier weight in the "bucket" $\{S \cup T : T \subseteq \bar{J}\}$: buckets with weight below $\tau^2$ cannot contain a heavy coefficient and are pruned, while the rest are split by restricting more variables, so heavy coefficients are found recursively.
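The first identity can be checked by brute force before running the library routine below. A sketch that builds each restriction by hand from the truth table, assuming (as in the LMN cell above) that f.evaluate takes an integer index with variable $j$ at bit $j$, and that bf.create accepts a 0/1 truth table with the same sign convention:
import numpy as np
import boofun as bf
f = bf.majority(5)
live, fixed = [0, 1, 2], [3, 4]  # J and its complement
S = 0b001  # S = {0} ⊆ J, as a bitmask over the live variables
vals = []
for b in range(2 ** len(fixed)):  # every assignment to the fixed variables
    # truth table of f with x₃, x₄ pinned to the bits of b
    tt = [int(f.evaluate(y | (((b >> 0) & 1) << 3) | (((b >> 1) & 1) << 4)))
          for y in range(2 ** len(live))]
    vals.append(bf.create(tt).fourier()[S])
print(f"E_b[restricted f̂(S)] = {np.mean(vals):.4f}")  # should match f̂({0})
print(f"f̂(S)                 = {f.fourier()[0b00001]:.4f}")  # 0.3750 per the output below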
# Demonstrate Goldreich-Levin algorithm on multiple functions
from boofun.analysis.learning import goldreich_levin
# Use consistent threshold for both "true" and GL
tau = 0.15 # Threshold for heavy coefficients
test_functions = {
"Parity-4": bf.parity(4),
"Majority-5": bf.majority(5),
"Dictator": bf.dictator(4, 0),
}
for name, f in test_functions.items():
print(f"Goldreich-Levin on {name} (τ={tau})")
print("=" * 55)
# True heavy coefficients (using same threshold)
true_coeffs = f.fourier()
true_heavy = [(s, c) for s, c in enumerate(true_coeffs) if abs(c) >= tau]
print(f"True heavy coefficients (|f̂(S)| ≥ {tau}): {len(true_heavy)}")
for s, c in sorted(true_heavy, key=lambda x: -abs(x[1]))[:3]:
subset = [i for i in range(f.n_vars) if (s >> i) & 1]
print(f" S={subset}: f̂(S) = {c:.4f}")
if len(true_heavy) > 3:
print(f" ... and {len(true_heavy) - 3} more")
# Use Goldreich-Levin with same threshold
heavy = goldreich_levin(f, threshold=tau)
print(f"\nGL found {len(heavy)} coefficient(s):")
for s, c in sorted(heavy, key=lambda x: -abs(x[1]))[:3]:
subset = [i for i in range(f.n_vars) if (s >> i) & 1]
print(f" S={subset}: est = {c:.4f}")
print()
Goldreich-Levin on Parity-4 (τ=0.15)
=======================================================
True heavy coefficients (|f̂(S)| ≥ 0.15): 1
  S=[0, 1, 2, 3]: f̂(S) = 1.0000

GL found 1 coefficient(s):
  S=[0, 1, 2, 3]: est = 1.0000

Goldreich-Levin on Majority-5 (τ=0.15)
=======================================================
True heavy coefficients (|f̂(S)| ≥ 0.15): 6
  S=[0]: f̂(S) = 0.3750
  S=[1]: f̂(S) = 0.3750
  S=[2]: f̂(S) = 0.3750
  ... and 3 more

GL found 12 coefficient(s):
  S=[4]: est = 0.4076
  S=[0, 1, 2, 3, 4]: est = 0.3924
  S=[1]: est = 0.3863

Goldreich-Levin on Dictator (τ=0.15)
=======================================================
True heavy coefficients (|f̂(S)| ≥ 0.15): 1
  S=[0]: f̂(S) = 1.0000

GL found 1 coefficient(s):
  S=[0]: est = 1.0000
Summary¶
Key Concepts¶
Spectral Concentration: Functions with bounded decision tree depth have Fourier weight concentrated on low degrees
LMN Theorem: If a function class has spectral concentration at degree $k$, it's PAC-learnable in time $n^{O(k)}$
Fourier Coefficient Estimation: $O(\log(1/\delta)/\varepsilon^2)$ samples suffice to estimate any $\hat{f}(S)$ within $\varepsilon$
Goldreich-Levin: Find heavy Fourier coefficients efficiently without enumeration
Corollaries (from lecture notes)¶
- Depth-d decision trees: Learnable in time $n^{O(d)}$
- Size-s decision trees: Learnable in time $n^{O(\log s)}$ (via truncation; see below)
- Linear Threshold Functions: Learnable in time $n^{O(1/\varepsilon^2)}$
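The size-$s$ corollary reduces to the depth-$d$ case by truncation: cutting every path of a size-$s$ tree at depth $d = \log_2(s/\varepsilon)$ replaces at most $s$ subtrees, each reached with probability at most $2^{-d}$, so the function changes on at most $$s \cdot 2^{-d} = s \cdot \frac{\varepsilon}{s} = \varepsilon$$ of inputs. Hence size-$s$ trees are $O(\varepsilon)$-concentrated up to degree $\log_2(s/\varepsilon)$, and the LMN theorem applies.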
boofun API¶
from boofun.analysis.learning import estimate_fourier_coefficient, goldreich_levin
from boofun.analysis.pac_learning import pac_learn_low_degree
# Estimate single coefficient
est, stderr = estimate_fourier_coefficient(f, S, num_samples=1000)
# Find all heavy coefficients
heavy = goldreich_levin(f, threshold=0.1)
# PAC learn with low-degree assumption
coeffs = pac_learn_low_degree(f, max_degree=3, epsilon=0.1)
Open Questions¶
- Can depth-$d$ decision trees be learned in $\text{poly}(n, 2^d)$ time?
- Can $k$-juntas be learned in $\text{poly}(n)$ time for $k = \log n$?
- Efficient learning of small DNF/CNF formulas?