# Learning Theory Guide

Algorithms for learning Boolean functions from queries or samples.
## Overview

BooFun provides implementations of key learning algorithms from computational learning theory:

- **Goldreich-Levin**: Find heavy Fourier coefficients with query access
- **PAC Learning**: Probably Approximately Correct learning framework
- **Junta Learning**: Learn functions that depend on few variables
- **LMN Algorithm**: Learn decision trees from uniform samples
- **Sparse Fourier Learning**: Learn functions with few Fourier coefficients
## Goldreich-Levin Algorithm

The Goldreich-Levin algorithm finds all "heavy" Fourier coefficients (|f̂(S)| ≥ τ) using only O(n/τ²) queries.

```python
import boofun as bf
from boofun.analysis.learning import goldreich_levin, find_heavy_coefficients

f = bf.majority(7)

# Find all coefficients with |f̂(S)| ≥ 0.1
heavy = goldreich_levin(f, threshold=0.1, num_samples=5000)

print(f"Found {len(heavy)} heavy coefficients")
for S, coeff in heavy:
    print(f"  f̂({S}) ≈ {coeff:.4f}")
```
### Estimating Single Coefficients

```python
import boofun as bf
from boofun.analysis.learning import estimate_fourier_coefficient

f = bf.parity(5)

# Estimate f̂({0,1,2,3,4}) - should be ±1 for parity
S = 0b11111  # Bitmask selecting all five variables
estimate, std_err = estimate_fourier_coefficient(f, S, num_samples=10000)
print(f"Estimated f̂(S) = {estimate:.4f} ± {std_err:.4f}")
```
### GoldreichLevinLearner Class

For more control over the learning process:

```python
from boofun.analysis.learning import GoldreichLevinLearner

learner = GoldreichLevinLearner(threshold=0.1, confidence=0.95)

# Learn from query access
heavy_coeffs = learner.learn(f, num_samples=10000)

# Get learning statistics
print(f"Queries used: {learner.query_count}")
print(f"Coefficients found: {len(heavy_coeffs)}")
```
## PAC Learning

The PAC (Probably Approximately Correct) framework:

- **Given**: sample access to f (can draw (x, f(x)) for random x)
- **Goal**: output a hypothesis h such that Pr[h(x) ≠ f(x)] ≤ ε
- **With probability**: at least 1 - δ
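The error guarantee can be checked empirically. The sketch below is plain Python, not part of the boofun API: it estimates Pr[h(x) ≠ f(x)] from uniform samples, and uses a Hoeffding-style bound for how many samples suffice to estimate a fixed hypothesis's error to within ε with probability at least 1 - δ.

```python
import math
import random

def required_samples(epsilon, delta):
    """Hoeffding bound: this many uniform samples estimate the error
    of a fixed hypothesis to within epsilon with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

def empirical_error(f, h, n, m, seed=0):
    """Estimate Pr_x[h(x) != f(x)] over uniform x in {0,1}^n from m samples."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        disagreements += (f(x) != h(x))
    return disagreements / m

# Toy check: a hypothesis that agrees with f everywhere has error 0
f = lambda x: x[0] ^ x[1]
h = lambda x: x[0] ^ x[1]
m = required_samples(epsilon=0.1, delta=0.05)
print(m, empirical_error(f, h, n=2, m=m))  # 185 0.0
```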
### Learning Low-Degree Functions

Functions with spectral concentration at low degrees can be learned by estimating their low-degree coefficients:

```python
import boofun as bf
from boofun.analysis.pac_learning import pac_learn_low_degree

f = bf.majority(5)

# Learn a degree-2 approximation
hypothesis, error = pac_learn_low_degree(
    f,
    degree=2,
    epsilon=0.1,
    delta=0.05,
    num_samples=10000,
)

print(f"Approximation error: {error:.4f}")
```
### Learning Juntas

A k-junta depends on at most k variables:

```python
import boofun as bf
from boofun.analysis.pac_learning import pac_learn_junta

# Create a 3-junta (depends on variables 0, 2, 4 only)
def junta_func(x):
    return x[0] ^ x[2] ^ x[4]

f = bf.create(junta_func, n=7)

# Learn the junta
hypothesis, relevant_vars = pac_learn_junta(
    f,
    k=3,
    epsilon=0.1,
    delta=0.05,
)

print(f"Relevant variables: {relevant_vars}")
```
## LMN Algorithm

The Linial-Mansour-Nisan (LMN) algorithm learns decision trees from uniform random samples:

```python
import boofun as bf
from boofun.analysis.pac_learning import lmn_algorithm, sample_function

f = bf.tribes(2, 3)  # A read-once DNF

# Learn using LMN
hypothesis = lmn_algorithm(
    f,
    epsilon=0.1,
    delta=0.05,
)

# Test accuracy on fresh samples
test_samples = sample_function(f, 1000)
correct = sum(1 for x, y in test_samples if hypothesis(x) == y)
print(f"Accuracy: {correct/1000:.1%}")
```
## Learning Sparse Fourier Functions

Functions with few non-zero Fourier coefficients:

```python
import boofun as bf
from boofun.analysis.pac_learning import pac_learn_sparse_fourier

f = bf.parity(4)  # Has only one non-zero coefficient

hypothesis, support = pac_learn_sparse_fourier(
    f,
    sparsity=5,
    epsilon=0.1,
    delta=0.05,
)

print(f"Learned Fourier support: {support}")
```
## Learning Monotone Functions

Monotone functions can be learned from uniform samples:

```python
import boofun as bf
from boofun.analysis.pac_learning import pac_learn_monotone

f = bf.AND(4)  # A monotone function

hypothesis = pac_learn_monotone(
    f,
    epsilon=0.1,
    delta=0.05,
)
```
## PACLearner Class

General-purpose PAC learner:

```python
from boofun.analysis.pac_learning import PACLearner

learner = PACLearner(epsilon=0.1, delta=0.05)

# Learn from samples
hypothesis = learner.learn(f, num_samples=10000)

# Evaluate on fresh test samples
error = learner.evaluate(hypothesis, f, num_test=1000)
print(f"Test error: {error:.4f}")

# Get statistics
print(f"Samples used: {learner.sample_count}")
```
Sample Complexity
Function Class |
Sample Complexity |
Algorithm |
|---|---|---|
k-juntas |
O(2^k log n / ε) |
Influence-based |
Degree-d |
O(n^d / ε²) |
Low-degree learning |
s-sparse Fourier |
O(s log n / ε²) |
Sparse recovery |
Decision trees (size s) |
O(s log n / ε²) |
LMN |
Monotone |
O(n / ε²) |
Monotone learning |
## Connections to Other Topics

### Influences and Learning

High-influence variables are likely relevant:

```python
# Find the k most influential variables as junta candidates
infs = f.influences()
top_vars = sorted(range(len(infs)), key=lambda i: infs[i], reverse=True)[:k]
```
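For intuition, influence can also be computed exactly by brute force on small functions. This standalone sketch (independent of boofun) flips each bit in turn; Inf_i(f) = Pr_x[f(x) ≠ f(x ⊕ e_i)]:

```python
from itertools import product

def influences(f, n):
    """Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)] under uniform x."""
    infs = [0] * n
    points = list(product([0, 1], repeat=n))
    for x in points:
        fx = f(x)
        for i in range(n):
            flipped = list(x)
            flipped[i] ^= 1
            if f(tuple(flipped)) != fx:
                infs[i] += 1
    return [v / len(points) for v in infs]

# A 3-junta on 5 variables: only variables 0, 2, 4 have nonzero influence
f = lambda x: x[0] ^ x[2] ^ x[4]
print(influences(f, 5))  # [1.0, 0.0, 1.0, 0.0, 1.0]
```

For parity-like juntas the relevant variables have influence 1, so sorting by influence recovers the junta immediately.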
### Noise Stability and Learning

Noise-stable functions concentrate their Fourier weight at low degrees, which makes them easier to learn:

```python
stability = f.noise_stability(0.99)
# High stability → few high-degree coefficients → easier to learn
```
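The underlying identity is Stab_ρ(f) = Σ_S ρ^|S| f̂(S)², so high-degree weight is damped by ρ^|S|. This self-contained sketch (plain Python, not the boofun API) verifies it by brute force: for parity on 3 bits, all Fourier weight sits at degree 3, so Stab_ρ = ρ³.

```python
from itertools import product

def fourier_coefficients(f, n):
    """Brute-force f̂(S) = E_x[f(x) * chi_S(x)] for f: {0,1}^n -> {-1,+1}.
    S is encoded as a bitmask of variables."""
    points = list(product([0, 1], repeat=n))
    return {
        S: sum(f(x) * (-1) ** sum(x[i] for i in range(n) if S >> i & 1)
               for x in points) / len(points)
        for S in range(2 ** n)
    }

def noise_stability(f, n, rho):
    """Stab_rho(f) = sum_S rho^{|S|} f̂(S)^2."""
    return sum(rho ** bin(S).count("1") * c ** 2
               for S, c in fourier_coefficients(f, n).items())

# Parity on 3 bits as a +/-1 function: Stab_rho = rho^3
par3 = lambda x: (-1) ** (x[0] ^ x[1] ^ x[2])
print(noise_stability(par3, 3, 0.9))  # ~0.729 = 0.9**3
```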
### Hypercontractivity and Learning

KKL theorem bounds help with junta identification:

```python
from boofun import max_influence_bound

max_inf, kkl_bound, total = max_influence_bound(f)
# If max_inf >> kkl_bound, the function might be close to a junta
```
## Mathematical Background

### PAC Model

A concept class C is PAC-learnable if there exists an algorithm A such that:

- for any f ∈ C and any distribution D,
- given m = poly(n, 1/ε, 1/δ) samples from D,
- A outputs h with Pr_D[h(x) ≠ f(x)] ≤ ε
- with probability at least 1 - δ.
### Goldreich-Levin Theorem

**Theorem**: Given query access to f, we can find all S with |f̂(S)| ≥ τ using O(n/τ²) queries.
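The building block of the algorithm is estimating a single coefficient f̂(S) = E_x[f(x)·χ_S(x)] from random queries, which takes O(1/τ²) queries per candidate by a Chernoff bound; Goldreich-Levin organizes these estimates into a binary search over prefixes. A minimal sketch of the building block (plain Python, not the boofun API):

```python
import random

def estimate_coefficient(query_f, n, S, num_queries, seed=1):
    """Estimate f̂(S) = E_x[f(x) * chi_S(x)] using random queries to f.
    S is a set of variable indices; f returns +/-1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_queries):
        x = [rng.randint(0, 1) for _ in range(n)]
        chi = (-1) ** sum(x[i] for i in S)
        total += query_f(x) * chi
    return total / num_queries

# For 3-bit parity (as a +/-1 function), f(x) * chi_{0,1,2}(x) = 1 always,
# so the estimate of f̂({0,1,2}) is exactly 1.0
f = lambda x: (-1) ** (x[0] ^ x[1] ^ x[2])
est = estimate_coefficient(f, 3, S={0, 1, 2}, num_queries=2000)
print(est)  # 1.0
```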
### LMN Theorem

**Theorem (Linial-Mansour-Nisan)**: Decision trees of size s can be ε-approximated by polynomials of degree O(log(s/ε)).
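The theorem says a decision tree's Fourier weight above degree O(log(s/ε)) is at most ε. For a tiny tree this can be verified exactly; the sketch below (plain Python, not the boofun API) brute-forces the spectrum of a depth-2 tree and measures the weight above a given degree:

```python
from itertools import product

def fourier_coefficients(f, n):
    """Brute-force f̂(S) = E_x[f(x) * chi_S(x)] for f: {0,1}^n -> {-1,+1}."""
    points = list(product([0, 1], repeat=n))
    return {
        S: sum(f(x) * (-1) ** sum(x[i] for i in range(n) if S >> i & 1)
               for x in points) / len(points)
        for S in range(2 ** n)
    }

def weight_above_degree(coeffs, d):
    """Fourier weight sum of f̂(S)^2 over sets S with |S| > d."""
    return sum(c ** 2 for S, c in coeffs.items() if bin(S).count("1") > d)

# Depth-2 decision tree: query x0, then output x1 or x2 (as a +/-1 function)
tree = lambda x: (-1) ** (x[1] if x[0] == 0 else x[2])
coeffs = fourier_coefficients(tree, 3)
print(weight_above_degree(coeffs, 2))  # 0.0: all weight sits at degree <= 2
```

Here depth 2 forces every nonzero coefficient onto sets of size at most 2, matching the general fact that a depth-d tree has Fourier degree at most d.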
## See Also

- Spectral Analysis Guide: Fourier coefficients
- Query Complexity Guide: query models
- notebooks/lecture7_goldreich_levin.ipynb: Goldreich-Levin tutorial
- notebooks/lecture8_learning_juntas.ipynb: junta learning tutorial
- O'Donnell, *Analysis of Boolean Functions*, Chapter 3